Question 17
Domain 2: ML Model Development
An ML engineer is developing a fraud detection model by using the Amazon SageMaker XGBoost algorithm. The model classifies transactions as either fraudulent or legitimate. During testing, the model excels at identifying fraud in the training dataset. However, the model is inefficient at identifying fraud in new and unseen transactions. What should the ML engineer do to improve the fraud detection for new transactions?
Correct answer: D
Explanation
Reducing `max_depth` limits how complex each tree can become, which helps prevent the model from memorizing the training data. This reduces overfitting and improves generalization, so the XGBoost model can identify fraud more effectively in new and unseen transactions.
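The symptom described in the question, strong results on training data and weak results on unseen data, can be checked by comparing the same metric on both sets. The following is a minimal sketch in plain Python. The helper names (`recall`, `looks_overfit`), the 0.10 gap threshold, and the toy labels are all assumptions made for illustration; they are not part of the question or of SageMaker.

```python
def recall(y_true, y_pred):
    """Fraction of actual fraud cases (label 1) that the model flagged."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_pos = sum(1 for t in y_true if t == 1)
    return true_pos / actual_pos if actual_pos else 0.0


def looks_overfit(train_score, val_score, max_gap=0.10):
    """Flag a model whose training score beats its validation score by more
    than max_gap. The threshold is an arbitrary, illustrative choice."""
    return (train_score - val_score) > max_gap


# Toy, made-up labels and predictions (1 = fraud, 0 = legitimate).
train_true = [1, 1, 1, 1, 0, 0, 0, 0]
train_pred = [1, 1, 1, 1, 0, 0, 0, 0]  # every training fraud case is caught
val_true = [1, 1, 1, 1, 0, 0, 0, 0]
val_pred = [1, 0, 0, 1, 0, 1, 0, 0]    # half of the unseen fraud cases are missed

train_recall = recall(train_true, train_pred)
val_recall = recall(val_true, val_pred)
overfit = looks_overfit(train_recall, val_recall)

print(f"train recall={train_recall:.2f}, validation recall={val_recall:.2f}, overfit={overfit}")
```

A large gap between the two scores is what points toward reducing model complexity rather than, for example, training longer.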
Why each option is right or wrong
A. Increase the learning rate.
The learning rate (`eta`) controls how much each new tree contributes to the ensemble. Increasing it does not reduce tree complexity, and larger steps tend to make overfitting worse rather than better.
B. Remove some irrelevant features from the training dataset.
Removing irrelevant features can sometimes reduce overfitting, but nothing in the scenario indicates that irrelevant features are the cause, so this is not the most direct fix.
C. Increase the value of the max_depth hyperparameter.
A greater `max_depth` makes each tree more complex, which typically worsens overfitting and further degrades performance on unseen data.
D. Decrease the value of the max_depth hyperparameter.
Amazon SageMaker’s XGBoost uses the `max_depth` hyperparameter to cap the depth of each decision tree; deeper trees create more splits and can fit noise in the training set. In this scenario, the model performs well on training data but poorly on unseen transactions, which is the classic overfitting pattern, so lowering `max_depth` is the appropriate control to reduce model complexity and improve generalization to new fraud cases.
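As a rough illustration of why depth matters, a binary tree of depth d has at most 2^d leaves, so each step down in `max_depth` halves the upper bound on how finely a single tree can partition the training data. The sketch below builds a hyperparameter dictionary with a reduced `max_depth`. The helper names and every value other than the default depth of 6 are illustrative assumptions, not tuned recommendations.

```python
DEFAULT_MAX_DEPTH = 6  # default max_depth for XGBoost


def max_leaves(max_depth):
    """Upper bound on the number of leaves in one binary tree of this depth."""
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1 for this bound")
    return 2 ** max_depth


def build_hyperparameters(max_depth=3):
    """Return a hyperparameter dict with a shallower tree depth.

    All values here are illustrative assumptions, not tuned settings.
    """
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    return {
        "objective": "binary:logistic",  # fraudulent vs. legitimate
        "eval_metric": "auc",
        "num_round": 200,
        "max_depth": max_depth,          # lowered to limit tree complexity
        "eta": 0.1,
    }


params = build_hyperparameters(max_depth=3)

print(f"leaf bound at default depth: {max_leaves(DEFAULT_MAX_DEPTH)}")
print(f"leaf bound at reduced depth: {max_leaves(params['max_depth'])}")

# With the SageMaker Python SDK, a dict like this would typically be applied
# to an estimator via estimator.set_hyperparameters(**params); the estimator
# itself (image, role, instance type) is omitted here.
```

In practice the new depth would be chosen by comparing validation metrics across a few candidate values, not fixed in advance.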