Question 38
Content Domain 3: Modeling
A data scientist is tuning model hyperparameters and wants to estimate how well each candidate model will generalize to unseen data while making efficient use of a limited dataset. Which technique should the data scientist use?
Correct answer: B
Explanation
Cross-validation estimates out-of-sample performance by repeatedly splitting the available data into training and validation subsets and averaging the validation scores. Because every observation is used for both training and validation across the splits, it is commonly used during hyperparameter optimization to compare candidate models when data is limited.
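The procedure described above can be sketched in a few lines of plain Python. This is an illustrative example only, not part of the question material: the helper names (`k_fold_indices`, `fit_ridge_1d`, `cross_val_mse`), the one-feature ridge model, the synthetic data, and the candidate values are all assumptions chosen to keep the sketch self-contained.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices once and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge fit for y ~ w * x (one feature, no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(xs, ys, w):
    """Mean squared error of the prediction w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cross_val_mse(xs, ys, lam, k=5, seed=0):
    """Average validation MSE over k folds for one hyperparameter value."""
    scores = []
    for val_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(val_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        # Fit on the training folds only, then score on the held-out fold.
        w = fit_ridge_1d([xs[i] for i in train_idx],
                         [ys[i] for i in train_idx], lam)
        scores.append(mse([xs[i] for i in val_idx],
                          [ys[i] for i in val_idx], w))
    return sum(scores) / len(scores)

# Small synthetic dataset (assumed for illustration): y = 2x + noise.
rng = random.Random(42)
xs = [rng.uniform(-1, 1) for _ in range(30)]
ys = [2.0 * x + rng.gauss(0, 0.3) for x in xs]

# Candidate regularization strengths (hypothetical values).
candidates = [0.0, 0.1, 1.0, 10.0]
cv_scores = {lam: cross_val_mse(xs, ys, lam) for lam in candidates}
best_lam = min(cv_scores, key=cv_scores.get)
print(cv_scores, best_lam)
```

Each candidate is scored only on data it was not fitted on, and all 30 observations serve as validation data exactly once, which is what makes the approach efficient for a small dataset.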
Why each option is right or wrong
A. Train one model on the full dataset and select hyperparameters from that single fit
A single fit on the full dataset leaves no held-out data, so it gives no estimate of performance on unseen data. Cross-validation instead evaluates each candidate on repeated data splits.
B. Use cross-validation to evaluate candidate hyperparameter settings across data splits
The task is to perform hyperparameter optimization and estimate generalization with limited data. Cross-validation does both, and the provided material explicitly lists it under Task 3.4 as the technique to perform for this purpose.
C. Increase the number of model features until training accuracy no longer improves
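Adding features until training accuracy plateaus only improves the fit to the training data and says nothing about performance on unseen data. Cross-validation is an evaluation approach based on data splitting, not feature expansion.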
D. Measure performance only on the training data after each hyperparameter change
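Performance measured only on the training data is an optimistic estimate and cannot reveal overfitting. Cross-validation assesses performance on validation splits that the model was not trained on.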
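To illustrate why training-only measurement fails as a selection criterion, the following sketch scores several regularization strengths on the same data used for fitting. It is a hypothetical example: the one-feature ridge model, the helper names, the synthetic data, and the candidate values are assumptions, not part of the question material.

```python
import random

def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge fit for y ~ w * x (one feature, no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(xs, ys, w):
    """Mean squared error of the prediction w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Small synthetic dataset (assumed for illustration): y = 2x + noise.
rng = random.Random(7)
xs = [rng.uniform(-1, 1) for _ in range(20)]
ys = [2.0 * x + rng.gauss(0, 0.3) for x in xs]

# Candidate regularization strengths (hypothetical values), weakest first.
candidates = [0.0, 0.1, 1.0, 10.0]

# Fit and score on the SAME data: this is the approach in option D.
train_scores = [mse(xs, ys, fit_ridge_1d(xs, ys, lam)) for lam in candidates]
picked = candidates[train_scores.index(min(train_scores))]
print(train_scores, picked)
```

For this model, training error can only grow as regularization increases, so selecting on training error always picks the least regularized candidate regardless of how it would behave on new data.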