ML Associate Practice Q38

A. It creates a very wide sparse representation with limited modeling value

The exam objective explicitly flags one-hot encoding as appropriate only when the categorical domain is small enough to be represented efficiently. A ZIP-code field with tens of thousands of distinct categories expands into tens of thousands of indicator columns, one per code. In practice that produces a mostly zero sparse matrix that increases memory and compute cost without adding much predictive signal. ZIP codes have no meaningful ordinal relationship, and most individual codes appear in too few rows for their own dummy variable to be informative.
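To make the width and sparsity concrete, here is a minimal plain-Python sketch. It is not the Spark ML `OneHotEncoder` API, which emits sparse vectors rather than dense lists. The helper `one_hot` and the synthetic ZIP codes are hypothetical, chosen only for illustration.

```python
from collections import Counter

def one_hot(value, categories):
    """Dense one-hot vector: 1 at the value's category position, 0 elsewhere."""
    return [1 if value == c else 0 for c in categories]

# Low-cardinality column: 3 categories -> 3 indicator columns.
colors = ["red", "green", "blue", "green"]
color_cats = sorted(set(colors))          # ['blue', 'green', 'red']
print(one_hot("green", color_cats))       # [0, 1, 0]

# High-cardinality column: 10,000 synthetic 5-digit ZIP codes, each used twice.
zips = [f"{10000 + (i % 10000):05d}" for i in range(20000)]
zip_cats = sorted(set(zips))
width = len(zip_cats)                     # one indicator column per ZIP code
row = one_hot(zips[0], zip_cats)
density = sum(row) / width                # fraction of non-zero entries in a row
rows_per_zip = Counter(zips).most_common(1)[0][1]

print(width)         # 10000
print(density)       # 0.0001
print(rows_per_zip)  # 2
```

Each ZIP-code row has a single 1 among 10,000 positions, and each indicator column is non-zero in only two of the 20,000 rows. That is the wide, sparse, low-signal pattern described in option A.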

B. It prevents Spark from reading Delta tables

One-hot encoding affects feature representation, not Delta table reading.

C. It converts the column into a continuous feature automatically

One-hot encoding creates binary (0/1) indicator columns, one per category; it does not turn the column into a continuous feature.

D. It makes missing values impossible to detect

Missing-value detection is separate from categorical encoding.
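As a small illustration that the two concerns are independent, the plain-Python sketch below counts missing values on the raw column and then, as one possible strategy (an assumption, not the only option), maps them to a hypothetical `__MISSING__` placeholder category so they stay visible after encoding. The sample values are hypothetical.

```python
# Hypothetical raw column with missing ZIP codes represented as None.
raw = ["94105", None, "10001", "94105", None]

# Missing values are detected on the raw column, before any encoding.
missing_count = sum(v is None for v in raw)

# Assumed strategy: map missing to an explicit placeholder category
# so the missing rows remain identifiable after one-hot encoding.
filled = ["__MISSING__" if v is None else v for v in raw]
categories = sorted(set(filled))
encoded = [[1 if v == c else 0 for c in categories] for v in filled]
missing_col = categories.index("__MISSING__")
still_detectable = sum(r[missing_col] for r in encoded)

print(missing_count)     # 2
print(still_detectable)  # 2
```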
