
Streaming ETL engine for TypeScript/Node for large files


Reddit r/typescript·u/pujansrt·about 1 month ago

Hi everyone,

I've been working on a project called Data-Genie. I built it because I was tired of OOM (out-of-memory) errors in my ETL jobs whenever a file grew beyond a few hundred MB.

It's a streaming-first engine that keeps a constant memory footprint (~15MB) regardless of whether you're processing 100KB or 100GB.
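The post doesn't show how Data-Genie keeps memory flat, so here is a minimal sketch of the general streaming technique, not Data-Genie's code. It assumes NDJSON input and uses Node's built-in `readline` over any `Readable`, so only the current chunk and line are held in memory. The helper names `readNdjson` and `sumField` are hypothetical.

```typescript
import { createInterface } from "node:readline";
import { Readable } from "node:stream";

// Hypothetical helper (not part of Data-Genie): yield one parsed NDJSON record per line.
// readline reassembles lines that are split across chunk boundaries.
async function* readNdjson(input: Readable): AsyncGenerator<Record<string, unknown>> {
  const lines = createInterface({ input, crlfDelay: Infinity });
  for await (const line of lines) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // skip blank lines
    yield JSON.parse(trimmed) as Record<string, unknown>;
  }
}

// Hypothetical helper: sum a numeric field across all records without buffering them.
async function sumField(input: Readable, field: string): Promise<number> {
  let total = 0;
  for await (const record of readNdjson(input)) {
    const value = record[field];
    if (typeof value === "number") total += value;
  }
  return total;
}
```

For a real file you would pass something like `fs.createReadStream("orders.ndjson")` (hypothetical filename). Memory is then bounded by the stream's buffer size plus the longest single line, not by the file size, which is why a single enormous line can still be a problem for line-oriented formats.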

Key features:

  • Sources and destinations: local files, AWS S3, and SQL databases
  • Multi-format: CSV, JSON, NDJSON, Excel, Parquet, and SQL
  • Validation, filtering, mapping, aggregation, and custom functions
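Data-Genie's actual API isn't shown in the post. As a rough sketch of how validation, filtering, mapping, and aggregation can be chained without buffering a whole file, each stage below is an async generator that pulls one record at a time from the previous stage. All names (`Row`, `validate`, `filter`, `map`, `groupSum`) are hypothetical and only illustrate the pattern.

```typescript
// Hypothetical record shape for illustration.
type Row = Record<string, unknown>;

// Validation stage: pass valid rows through, count rejects instead of throwing.
async function* validate(
  rows: AsyncIterable<Row>,
  isValid: (r: Row) => boolean,
  rejected: { count: number },
): AsyncGenerator<Row> {
  for await (const row of rows) {
    if (isValid(row)) yield row;
    else rejected.count++;
  }
}

// Filtering stage: keep only rows matching a predicate.
async function* filter<T>(rows: AsyncIterable<T>, keep: (r: T) => boolean): AsyncGenerator<T> {
  for await (const r of rows) if (keep(r)) yield r;
}

// Mapping stage: reshape each row.
async function* map<T, U>(rows: AsyncIterable<T>, fn: (r: T) => U): AsyncGenerator<U> {
  for await (const r of rows) yield fn(r);
}

// Aggregation is the only stateful stage: its memory grows with the number
// of distinct keys, not with the number of rows.
async function groupSum<T>(
  rows: AsyncIterable<T>,
  key: (r: T) => string,
  value: (r: T) => number,
): Promise<Map<string, number>> {
  const totals = new Map<string, number>();
  for await (const r of rows) {
    const k = key(r);
    totals.set(k, (totals.get(k) ?? 0) + value(r));
  }
  return totals;
}
```

Stages compose by nesting, e.g. `groupSum(map(filter(validate(rows, ...), ...), ...), ...)`. Because each stage only pulls when the next one asks, backpressure comes for free: a slow sink simply stops pulling. Whether Data-Genie uses generators, Node `Transform` streams, or something else isn't stated in the post.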

I'm really looking for some honest feedback on the API design and the architecture. Is this something that would be useful in your workflows?

Repo: https://github.com/pujansrt/data-genie (feel free to add GitHub stars)

What do you think?
