Question 8
Which technique is most appropriate for handling a categorical feature with thousands of unique values (high cardinality)?
Correct answer: B
Explanation
High-cardinality categorical features are difficult to one-hot encode. With thousands of unique values, one-hot encoding produces thousands of binary columns and a very sparse matrix. Target encoding with smoothing compresses each category into a single informative numeric value. Feature hashing or learned embeddings map the levels into a fixed, lower-dimensional space, which reduces dimensionality and memory use. Both approaches remain practical even when there are many unique levels.
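Smoothed target encoding can be sketched in plain Python. This sketch assumes the common "m-estimate" form of smoothing: encoded(c) = (n_c * mean_c + m * global_mean) / (n_c + m), where n_c is the number of training rows with category c. The function names and the default m = 10 are illustrative choices, not a prescribed method.

```python
from collections import defaultdict


def smoothed_target_encode(categories, targets, m=10.0):
    """Map each category to a shrunk estimate of its target mean.

    encoded(c) = (n_c * mean_c + m * global_mean) / (n_c + m)
    n_c is the row count for category c; m is the smoothing strength
    (the default here is a hypothetical choice; tune it per dataset).
    """
    if len(categories) != len(targets):
        raise ValueError("categories and targets must have the same length")
    if not targets:
        raise ValueError("need at least one row")
    global_mean = sum(targets) / len(targets)

    # Accumulate per-category target sums and counts in one pass.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1

    # sums[cat] == n_c * mean_c, so this blends the category mean with the
    # global mean; rare categories are pulled strongly toward global_mean.
    mapping = {
        cat: (sums[cat] + m * global_mean) / (counts[cat] + m)
        for cat in counts
    }
    return mapping, global_mean


def apply_encoding(categories, mapping, global_mean):
    # Categories never seen during fitting fall back to the global mean.
    return [mapping.get(cat, global_mean) for cat in categories]


mapping, gm = smoothed_target_encode(["a", "a", "b", "c"], [1, 0, 1, 1], m=2.0)
print(apply_encoding(["a", "b", "z"], mapping, gm))
```

In practice the mapping should be fitted on training data only, ideally out-of-fold, because encoding a row with its own target value leaks the label into the feature.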
Why each option is right or wrong
A. One-hot encoding
B. Target encoding with smoothing or hashing/embedding
One-hot encoding creates one binary column per level, so with thousands of categories it explodes the feature space and produces an extremely sparse matrix. A more suitable approach is target encoding with smoothing, which replaces each category with a regularized estimate of the target mean (shrunk toward the global mean for rare categories). Alternatives are feature hashing, which maps the levels into a fixed number of columns, and learned embeddings, which represent each level as a short dense vector. Both keep the representation a fixed size and avoid the memory burden of one column per level.
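C. Min-max normalization
D. Removing the feature
Min-max normalization rescales numeric values to a fixed range such as [0, 1]. It does not apply to unordered categorical levels and does nothing to reduce their number. Removing the feature avoids the dimensionality problem but throws away whatever predictive signal the categories carry.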
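The hashing option in B can be sketched in a few lines of plain Python. This sketch assumes MD5 as a stable hash function, an arbitrary bucket count, and a dict-per-row sparse representation. The names `hash_bucket`, `hash_encode`, and `n_buckets` are hypothetical and used only for illustration.

```python
import hashlib


def hash_bucket(value, n_buckets):
    # Use a stable hash (MD5) rather than Python's built-in hash(),
    # which is randomized per process for str values.
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % n_buckets


def hash_encode(categories, n_buckets=1024):
    """Return one sparse row per input value as {column_index: 1.0}.

    Any number of distinct levels maps into at most n_buckets columns;
    different levels may collide in the same column.
    """
    if n_buckets <= 0:
        raise ValueError("n_buckets must be positive")
    return [{hash_bucket(cat, n_buckets): 1.0} for cat in categories]


# 5,000 distinct IDs still occupy at most 64 columns.
rows = hash_encode([f"id{i}" for i in range(5000)], n_buckets=64)
print(len({col for row in rows for col in row}))
```

Collisions are the price of the fixed width. scikit-learn's `FeatureHasher` implements the same idea and offers an `alternate_sign` option that reduces the bias collisions introduce.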