When does Iceberg beat Parquet + projection on AWS Glue, and when doesn't it?

DEV Community·Alessandra Bilardi·22 days ago
#iceberg #aws #glue #spark #catalog #article

Why this project

I built this repo because I didn't have one like it yet. Having worked on data ingestion with Glue for a while, I wanted to gather three things in one place: how to structure code so it stays testable, which Firehose and Glue features to use and on what criteria, and a few Docker and Terraform gems I'd always promised myself to slot in somewhere. I had also never set up Glue streaming from scratch, and for a personal project I needed a test bed to compare Iceberg and Parquet + partition projection on the same data flow and under the same Athena queries, to figure out when one solution wins over the other and why. This project mixes a lot of the experience I've gathered over the years with a couple of curiosities I hadn't had a chance to test. So there are no real challenges here: I already took those hits long ago. What I'm sharing is deliberate choices, driven by knowing these services inside out.…
