Anna Geller
Aug 15, 2023

Well spotted! In a normal setting, you would typically want to match by ID, as here: "MERGE INTO fruits USING raw_fruits ON fruits.id = raw_fruits.id WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT *".
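
To make the two clauses of that statement concrete, here is a minimal in-memory sketch in plain Python. It is an illustration only, not how Glue, Athena or Iceberg actually execute a MERGE. The helper name `merge_into` and the sample rows are hypothetical. Matched rows are overwritten by the source row (UPDATE SET *), and unmatched source rows are appended (INSERT *).

```python
from typing import Any

Row = dict[str, Any]


def merge_into(target: list[Row], source: list[Row], key: str) -> list[Row]:
    """Hypothetical upsert of `source` into `target`, matching rows on `key`.

    WHEN MATCHED THEN UPDATE SET *  -> the matched target row is replaced by the source row
    WHEN NOT MATCHED THEN INSERT *  -> the source row is appended
    """
    result = [dict(row) for row in target]  # copy so the input list is not mutated
    # Map each key value to its row position (assumes the key is unique in target)
    positions = {row[key]: i for i, row in enumerate(result)}
    for src in source:
        if src[key] in positions:
            # Matched: overwrite all columns. Real SQL engines typically reject a MERGE
            # where several source rows match one target row; here the last one wins.
            result[positions[src[key]]] = dict(src)
        else:
            # Not matched: insert the whole source row
            positions[src[key]] = len(result)
            result.append(dict(src))
    return result


fruits = [{"id": 1, "fruit": "apple", "price": 1.0}]
raw_fruits = [
    {"id": 1, "fruit": "apple", "price": 1.2},
    {"id": 2, "fruit": "banana", "price": 0.5},
]
print(merge_into(fruits, raw_fruits, key="id"))
```

Because the key is a parameter, the same sketch can match on `"fruit"` instead of `"id"` without any other change.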

However, because this is a synthetic dataset, the IDs are sometimes the same for different fruits :D, so I had to match on the fruit name to make the merge logic work as intended.
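
One way to confirm whether a column is safe to use as a merge key is to check that each key value maps to only one entity. The sketch below is a hypothetical helper with made-up sample rows, not part of the original pipeline. With rows like these, an ID-based MERGE would match two different fruits against the same target row, while the fruit name stays unambiguous.

```python
from collections import defaultdict
from typing import Any


def conflicting_keys(rows: list[dict[str, Any]], key: str, entity: str) -> dict[Any, set[Any]]:
    """Return the key values that are shared by more than one distinct entity."""
    seen: dict[Any, set[Any]] = defaultdict(set)
    for row in rows:
        seen[row[key]].add(row[entity])  # collect every entity seen under this key value
    return {k: v for k, v in seen.items() if len(v) > 1}


# Hypothetical synthetic rows: id 3 is reused for two different fruits
raw_fruits = [
    {"id": 3, "fruit": "cherry", "price": 4.0},
    {"id": 3, "fruit": "mango", "price": 2.5},
    {"id": 7, "fruit": "kiwi", "price": 0.8},
]

print(conflicting_keys(raw_fruits, key="id", entity="fruit"))  # id 3 is shared by two fruits
print(conflicting_keys(raw_fruits, key="fruit", entity="id"))  # {}
```

An empty result suggests the column can serve as the `ON` condition. A non-empty one shows exactly which values would collide.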

Python is only used to operationalize it all, since it's easier to work with Glue from awswrangler, but you're right that it is not strictly needed 👍
