DE Professional Practice Q9

A. Write in append mode without a checkpoint

Without a checkpoint, the query has no durable record of which offsets it has already processed, so a restart can re-read the same source data and append it again as duplicates.

B. Collect the micro-batch to the driver before writing

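Collecting the micro-batch to the driver only changes where the data is processed; it does nothing for duplicate handling or restart safety, and it risks exhausting driver memory on large batches.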

C. Delete the target and replay the full source after every restart

Full delete-and-replay is expensive and unnecessary when idempotent incremental processing is available.

D. Use a stable checkpoint and make each micro-batch idempotent, for example with `MERGE` on a business key

Structured Streaming records the offsets and commit status of each micro-batch in the checkpoint directory. When the job restarts against the same checkpoint, Spark resumes after the last committed micro-batch instead of starting over, and it re-runs any batch that was still in flight when the failure occurred. With `foreachBatch`, the write is user-managed and by default has only at-least-once guarantees: the function can be invoked again with the same `batchId` after a failure. The Delta write must therefore be idempotent. A `MERGE` keyed on a business key achieves this, because a replayed batch matches and updates the rows it already wrote rather than inserting them a second time.
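The difference between a blind append and a keyed merge under replay can be illustrated with a small plain-Python model. This is only a sketch of the idea, not Spark or Delta code: the function names, the `order_id` key, and the sample rows are all hypothetical, and a dict stands in for the target table.

```python
# Plain-Python model of a replayed micro-batch; no Spark or Delta APIs are used.
# All names here (append_batch, merge_batch, order_id) are hypothetical.

def append_batch(target_rows, batch):
    """Model of a blind append: every row in the batch is added again."""
    target_rows.extend(batch)


def merge_batch(target_by_key, batch, key="order_id"):
    """Model of MERGE on a business key: update if matched, insert if not."""
    for row in batch:
        target_by_key[row[key]] = row


batch_7 = [
    {"order_id": 1, "status": "shipped"},
    {"order_id": 2, "status": "pending"},
]

# Scenario: the batch is written, the job fails before the commit is
# recorded, and the same batch is delivered again after the restart.
appended = []
append_batch(appended, batch_7)
append_batch(appended, batch_7)  # replay of the same batch

merged = {}
merge_batch(merged, batch_7)
merge_batch(merged, batch_7)  # replay of the same batch

print(len(appended))  # 4: the replay duplicated both rows
print(len(merged))    # 2: the replay overwrote the same two keys
```

In a real job the merge step would run inside the `foreachBatch` function against the Delta target; the model only shows why keying on a business identifier makes a replay harmless.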

Question 9

Explanation

Why each option is right or wrong