Question 11
Domain 2
Your production demand forecasting pipeline preprocesses raw data using Dataflow before model training and prediction. This involves applying Z-score normalization to data in BigQuery and then writing it back. With new training data added weekly, your goal is to enhance efficiency by reducing both computation time and manual effort. What steps should you take to achieve this?
Correct answer: B
Explanation
BigQuery preprocessing is explicitly in scope: the exam guide lists “Data preprocessing (e.g., Dataflow, TFX, BigQuery)” and emphasizes creating “repeatable, reusable code.” Translating Z-score normalization into SQL lets BigQuery transform the data where it already lives, which avoids the Dataflow read-transform-write round trip and the manual pipeline updates each week.
Why each option is right or wrong
A. Normalize the data using Google Kubernetes Engine.
B. Translate the normalization algorithm into SQL for use with BigQuery.
Section 2.1 of the exam guide explicitly includes BigQuery as a valid preprocessing environment, alongside Dataflow and TFX, so moving the Z-score step into BigQuery is within scope. Because the guide also stresses creating “repeatable, reusable code,” expressing the normalization as SQL removes the weekly Dataflow rewrite and redeploy cycle. The transformation then runs in place on the updated BigQuery table, cutting both compute overhead and manual maintenance.
C. Use the normalizer_fn argument in TensorFlow's Feature Column API.
D. Normalize the data with Apache Spark using the Dataproc connector for BigQuery.
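To make option B concrete, here is a minimal sketch of what translating Z-score normalization into BigQuery SQL might look like. The table and column names are hypothetical, and the helper `build_zscore_sql` is an illustration rather than part of any Google library. The pure-Python `zscore` function is included only to show the arithmetic the SQL is meant to reproduce.

```python
import statistics


def build_zscore_sql(source_table: str, column: str, output_table: str) -> str:
    """Build a BigQuery Standard SQL statement that adds a Z-score column.

    All identifiers passed in are hypothetical examples. AVG and STDDEV_POP
    are used as window functions over the whole table, and SAFE_DIVIDE
    returns NULL instead of failing when the standard deviation is zero.
    """
    return (
        f"CREATE OR REPLACE TABLE `{output_table}` AS\n"
        f"SELECT\n"
        f"  *,\n"
        f"  SAFE_DIVIDE(\n"
        f"    {column} - AVG({column}) OVER (),\n"
        f"    STDDEV_POP({column}) OVER ()\n"
        f"  ) AS {column}_z\n"
        f"FROM `{source_table}`"
    )


def zscore(values):
    """Reference Z-score in plain Python: (x - mean) / population std dev."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    # Mirror SAFE_DIVIDE: yield None when the standard deviation is zero.
    return [(v - mean) / std if std else None for v in values]


if __name__ == "__main__":
    # Hypothetical dataset, table, and column names.
    print(build_zscore_sql("my_dataset.raw_demand", "demand",
                           "my_dataset.normalized_demand"))
    print(zscore([2, 4, 4, 4, 5, 5, 7, 9]))
```

A statement like this could be rerun whenever new data arrives, for example as a weekly BigQuery scheduled query, so no pipeline code has to change from week to week. Whether population (`STDDEV_POP`) or sample (`STDDEV_SAMP`) standard deviation is appropriate depends on how the original Dataflow step was defined, which the question does not specify.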