Question 20
Content Domain 3: Modeling
A data scientist wants to estimate how well a machine learning model will generalize to unseen data while making efficient use of a limited dataset. Which evaluation approach best fits this goal?
Correct answer: B
Explanation
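Cross-validation estimates model performance by repeatedly training on part of the available data and evaluating on the held-out remainder, rotating which part is held out. In k-fold cross-validation, every observation is used for evaluation exactly once and for training in the other rounds, which makes the approach useful when data is limited and generalization must be assessed. The data scientist should perform cross-validation.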
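The procedure described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation: the helper names (`k_fold_indices`, `cross_validate`, `fit_threshold`, `accuracy`), the toy one-feature dataset, and the choice of 5 folds are all assumptions made for the example and do not come from the question.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, fit, score, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, and repeat for each fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(
            score(model, [xs[i] for i in held_out], [ys[i] for i in held_out])
        )
    return scores


# Hypothetical toy model: classify as 1 when x exceeds the midpoint
# between the two class means seen during training.
def fit_threshold(xs, ys):
    mean_0 = mean(x for x, y in zip(xs, ys) if y == 0)
    mean_1 = mean(x for x, y in zip(xs, ys) if y == 1)
    return (mean_0 + mean_1) / 2


def accuracy(threshold, xs, ys):
    predictions = [1 if x > threshold else 0 for x in xs]
    return sum(p == y for p, y in zip(predictions, ys)) / len(ys)


# Synthetic data for illustration only: two overlapping Gaussian classes.
rng = random.Random(42)
xs = [rng.gauss(0, 1) for _ in range(20)] + [rng.gauss(3, 1) for _ in range(20)]
ys = [0] * 20 + [1] * 20

fold_scores = cross_validate(xs, ys, fit_threshold, accuracy, k=5)
print("per-fold accuracy:", fold_scores)
print("mean accuracy:", mean(fold_scores))
```

Reporting the mean of the per-fold scores, rather than a single number from one split, is what lets a small dataset serve for both training and evaluation without scoring the model on data it was fit to.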
Why each option is right or wrong
A. Train the model once on the full dataset and report that training performance
Training performance is measured on data the model has already seen, so it tends to be optimistic and does not show how the model performs on held-out data.
B. Partition the data into multiple splits and evaluate the model across those splits
Cross-validation evaluates a model by using multiple data splits to assess performance, which matches the goal of estimating generalization with limited data.
C. Increase the number of model features before measuring performance on the same data
Adding features changes the model specification, not the evaluation method, and measuring performance on the same data still gives no estimate of generalization.
D. Use a single random split one time and treat that result as fully representative
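A single split gives one estimate that depends heavily on which observations happen to land in the test set, especially when data is limited. Cross-validation uses multiple splits rather than relying on one split alone.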