Contents
Where to draw the atomic line: defining transactional boundaries and idempotency
How to build durable checkpoints and idempotent task boundaries
Testing, CI/CD, and deployment strategies for reliable DAGs
Why compensation beats two-phase commit for batch jobs (and how to implement it)
How to classify failures and implement intelligent retry strategies
Practical Application: checklist and example DAG (atomic, retryable, compensating)

Where to draw the atomic line: defining transactional boundaries and idempotency

You must pick the unit of atomicity before you write a single @task. For a multi-step batch job, an atomic boundary is the smallest unit of work you will guarantee to be "all-or-nothing" from the business perspective, which is not necessarily a database transaction. Make those boundaries explicit: a step that reserves inventory, a step that charges a customer, a step that writes a reporting snapshot. Each needs its own success criteria and idempotency contract.…
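
To make the idea of an idempotency contract concrete, here is a minimal sketch in plain Python rather than Airflow itself. Everything named in it is an assumption for illustration: the `idempotency_key` function, the `StepLedger` class, the `completed_steps` table, and the choice to build the key from a run ID, step name, and business ID. It shows one possible contract: a step that has already succeeded under a given key is not executed again on retry.

```python
import hashlib
import sqlite3
from typing import Callable, List


def idempotency_key(run_id: str, step: str, business_id: str) -> str:
    """Derive a stable key: the same inputs always yield the same key.

    The (run_id, step, business_id) composition is an illustrative assumption.
    """
    raw = f"{run_id}:{step}:{business_id}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class StepLedger:
    """Hypothetical helper that records which atomic steps have completed."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS completed_steps ("
            "key TEXT PRIMARY KEY, result TEXT NOT NULL)"
        )

    def run_once(self, key: str, action: Callable[[], str]) -> str:
        """Run `action` unless `key` is already recorded; return the stored result."""
        row = self.conn.execute(
            "SELECT result FROM completed_steps WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            return row[0]  # step already succeeded: replay the recorded outcome
        result = action()
        with self.conn:  # commits on success, rolls back on exception
            self.conn.execute(
                "INSERT INTO completed_steps (key, result) VALUES (?, ?)",
                (key, result),
            )
        return result


# Usage: a simulated retry of the "charge a customer" step.
charges: List[str] = []


def charge_customer() -> str:
    charges.append("order-42")  # stands in for the real side effect
    return "charged"


ledger = StepLedger(sqlite3.connect(":memory:"))
key = idempotency_key("2024-01-01", "charge_customer", "order-42")
first = ledger.run_once(key, charge_customer)
second = ledger.run_once(key, charge_customer)  # retry: the charge is not repeated
```

Note the gap between running the action and recording its completion: if the process dies in between, a retry would run the action again. For external side effects such as a payment, a sketch like this is usually paired with passing the same key to the downstream system so that it can deduplicate as well.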