Question 18
Domain 3: Data Transformation, Cleansing, and Quality
A streaming aggregation over event time receives records that arrive several hours late. The business accepts late updates for up to one day but wants state cleaned up after that. Which concept is most relevant?
Correct answer: C
Explanation
A watermark based on event time tracks how far event time has progressed in the stream and lets the engine decide when late data for a given window is no longer expected. In a streaming aggregation, a one-day watermark delay lets records that arrive up to one day late still update their results, and it lets the engine drop the state for older windows once that lateness bound has passed.
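To make the mechanism concrete, here is a minimal pure-Python sketch of a watermarked count per tumbling window. It is a toy model, not any engine's actual implementation: the class `WatermarkedCounter`, the one-hour window size, and the choice to advance the watermark on every record are assumptions made for illustration. Only the one-day lateness bound comes from the question.

```python
from datetime import datetime, timedelta

ALLOWED_LATENESS = timedelta(days=1)  # from the question: accept updates up to one day late
WINDOW = timedelta(hours=1)           # assumed tumbling-window size (not given in the question)


class WatermarkedCounter:
    """Toy event-time counter with a watermark (hypothetical, for illustration)."""

    def __init__(self):
        self.counts = {}            # window start -> record count (the retained state)
        self.max_event_time = None  # largest event time seen so far
        self.dropped = 0            # records rejected as too late

    @property
    def watermark(self):
        # Watermark = max event time seen minus the allowed lateness.
        if self.max_event_time is None:
            return None
        return self.max_event_time - ALLOWED_LATENESS

    def process(self, event_time):
        """Apply one record; return True if it updated state, False if dropped."""
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time

        # Align the event time to the start of its tumbling window.
        window_start = event_time - (event_time - datetime.min) % WINDOW
        accepted = window_start + WINDOW > self.watermark
        if accepted:
            self.counts[window_start] = self.counts.get(window_start, 0) + 1
        else:
            self.dropped += 1

        # Evict state for every window the watermark has already passed.
        expired = [s for s in self.counts if s + WINDOW <= self.watermark]
        for start in expired:
            del self.counts[start]
        return accepted


base = datetime(2024, 1, 1, 12, 0)
counter = WatermarkedCounter()
counter.process(base + timedelta(minutes=10))  # on-time record for the 12:00 window
counter.process(base + timedelta(hours=3))     # stream advances to 15:00
counter.process(base + timedelta(minutes=20))  # hours late, still within one day: accepted
counter.process(base + timedelta(hours=26))    # watermark moves past the 12:00 window
counter.process(base + timedelta(minutes=30))  # more than one day late: dropped
```

In this sketch the third record still updates the 12:00 window because the watermark trails the newest event by one day, while the last record is rejected because that window's state was already evicted. Real engines differ in details such as when the watermark is recomputed, so treat this only as a model of the idea.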
Why each option is right or wrong
A. Random repartitioning with no event-time logic
Random repartitioning changes data distribution, not late-data handling or event-time state expiration.
B. Deleting the checkpoint after every batch
Deleting checkpoints removes recovery progress and does not define acceptable lateness windows.
C. Watermarking based on event time
Event-time processing uses a watermark to mark how far the system believes event time has advanced, and state for a window is typically retained until the watermark passes the window end plus the allowed lateness. Here, the lateness bound is exactly 1 day, so records arriving within 24 hours can still update the aggregation, while anything beyond that can be treated as too late and the associated state safely evicted.
D. Sorting the dashboard alphabetically
Alphabetical dashboard sorting is a presentation concern, unrelated to streaming event-time aggregation.