For a durable Lambda function, standard errors at any step are checkpointed and handled during the next replay by retry-strategy configurations. However, there can be scenarios where a durable step starts execution but fails to complete, while also leaving no ERROR checkpoint behind. A common example is a Lambda execution timeout. In such cases, step semantics become important because they define how the workflow should behave when the system cannot determine whether the step completed successfully or not. There are two types of semantics available: AtLeastOncePerRetry (Default) For a step, the SDK initiates a START checkpoint (by calling the internal checkpoint API) and proceeds to code execution without waiting for the checkpoint response. Now, if Lambda is interrupted mid-execution (timeout, OOM, sandbox crash), the step has no completion checkpoint. On the next replay, the SDK cannot determine that the step completed successfully and runs it again.…