Question 5
Domain 1: Data Preparation for Machine Learning (ML)
A company wants to predict the success of advertising campaigns by considering the color scheme of each advertisement. An ML engineer is preparing data for a neural network model. The dataset includes color information as categorical data. Which technique for feature engineering should the ML engineer use for the model?
Correct answer: D
Explanation
One-hot encoding is used for categorical features because it converts each category into its own binary indicator, which neural networks can process as numeric input. For color categories, this transforms the color scheme into a binary matrix, avoiding any false ordinal relationship between colors.
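As a rough illustration of the transformation described above, the following plain-Python sketch builds the binary matrix by hand. The color labels and the helper name `one_hot_encode` are hypothetical and not part of the question; in practice a library encoder would normally be used instead.

```python
def one_hot_encode(values):
    """Return (categories, matrix); each row holds exactly one 1."""
    # Sort the distinct labels so the column order is deterministic.
    categories = sorted(set(values))
    column_of = {category: i for i, category in enumerate(categories)}

    matrix = []
    for value in values:
        row = [0] * len(categories)
        row[column_of[value]] = 1  # mark the column for this label
        matrix.append(row)
    return categories, matrix


# Hypothetical color labels for four advertisements (assumed data).
colors = ["red", "blue", "green", "blue"]
categories, matrix = one_hot_encode(colors)

print(categories)
for label, row in zip(colors, matrix):
    print(label, row)
```

Each row has a single 1 in the column of its color, so no color is represented as numerically larger than another.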
Why each option is right or wrong
A. Apply label encoding to the color categories. Automatically assign each color a unique integer.
Label encoding imposes an artificial numeric order on the colors, which can mislead the neural network into treating some colors as larger than, or closer to, others.
B. Implement padding to ensure that all color feature vectors have the same length.
Padding is for making variable-length sequences uniform, not for encoding categorical color values.
C. Perform dimensionality reduction on the color categories.
Dimensionality reduction compresses features that are already numeric; it does not convert raw categorical labels into numeric form.
D. One-hot encode the color categories to transform the color scheme feature into a binary matrix.
Categorical color values must be converted into numeric inputs before a neural network can process them. One-hot encoding is the standard method in this case because each distinct category becomes its own 0/1 indicator column. This avoids imposing the artificial ordering or distance between colors that label encoding would introduce, so the resulting binary matrix is the appropriate feature-engineering choice for this dataset.
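The point about artificial ordering and distance can be checked with a small sketch. It assumes three hypothetical colors and an arbitrary integer assignment for label encoding; the names `label_codes`, `one_hot_codes`, `label_distance`, and `one_hot_distance` are illustrative only.

```python
import math
from itertools import combinations

# Hypothetical color categories (assumed for illustration).
colors = ["red", "green", "blue"]

# Label encoding: each color gets an arbitrary integer (red=0, green=1, blue=2).
label_codes = {color: i for i, color in enumerate(colors)}

# One-hot encoding: each color gets its own indicator position.
one_hot_codes = {
    color: [1 if i == j else 0 for j in range(len(colors))]
    for i, color in enumerate(colors)
}


def label_distance(a, b):
    """Absolute difference between the integer codes of two colors."""
    return abs(label_codes[a] - label_codes[b])


def one_hot_distance(a, b):
    """Euclidean distance between the one-hot vectors of two colors."""
    return math.dist(one_hot_codes[a], one_hot_codes[b])


for a, b in combinations(colors, 2):
    print(a, b, label_distance(a, b), round(one_hot_distance(a, b), 3))
```

Under label encoding, red and blue end up twice as far apart as red and green purely because of the integers chosen. Under one-hot encoding, every pair of distinct colors is the same distance apart (the square root of 2), so the encoding itself implies no ordering.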