Question 40
Content Domain 3: Modeling
A retail company intends to use machine learning to categorize new products. A labeled dataset of current products was provided to the Data Science team. The dataset includes 1,200 products. The labeled dataset has 15 features for each product, such as title, dimensions, weight, and price. Each product is labeled as belonging to one of six categories, such as books, games, electronics, and movies. Which model should be used for categorizing new products using the provided dataset for training?
Correct answer: A
Explanation
This is a supervised multiclass classification problem because the dataset is labeled and each product belongs to one of six categories. XGBoost supports multiclass classification with the objective "multi:softmax," which returns the predicted class label for each new product.
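To make "returns the predicted class label" concrete, here is a minimal standard-library sketch of the softmax-then-argmax step that a multiclass objective of this kind implies. The raw scores are hypothetical values for a single product, and the helper names `softmax` and `predict_label` are illustrative, not part of XGBoost. For comparison, XGBoost's `multi:softprob` objective returns the per-class probabilities rather than the single label.

```python
import math

def softmax(scores):
    """Turn raw per-class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(scores):
    """Return the index of the most probable class."""
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)

# Hypothetical raw scores for one product across the six categories.
scores = [0.2, 1.5, -0.3, 0.9, 0.0, -1.1]
probs = softmax(scores)
label = predict_label(scores)
print(label)  # index of the highest-scoring category
```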
Why each option is right or wrong
A. An XGBoost model where the objective parameter is set to multi:softmax.
The dataset is labeled and the target has six discrete classes, so the task is supervised multiclass classification, not regression or clustering. In XGBoost, this is configured with `objective = 'multi:softmax'`, which makes the model output a single class label for each product, together with `num_class = 6` to match the six categories.
B. A deep Convolutional Neural Network (CNN) with a softmax activation function for the last layer.
CNNs are mainly suited to image-like or spatial data, not small tabular feature sets.
C. A regression forest where the number of trees is set equal to the number of product categories.
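Regression forests predict continuous values, while product category is a discrete class label. The number of trees only sets the size of the ensemble and has no connection to the number of categories.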
D. A DeepAR forecasting model based on a Recurrent Neural Network (RNN).
DeepAR is for time-series forecasting, not assigning categories from labeled product attributes.