Question 11
Domain 2: Explore data and run experiments
You are analyzing a numerical dataset that contains missing values in several columns. You need to clean the missing values using an appropriate operation without changing the dimensionality of the feature set, and you want to preserve the full dataset as much as possible. A proposed solution is to remove the entire column that contains the missing data point. Does this solution meet the goal?
Correct answer: A
Explanation
No. Removing an entire column reduces the number of features, which conflicts with the requirement to clean the missing values "without changing the dimensionality of the feature set." It also discards every valid value in that column, so it does not "preserve the full dataset as much as possible."
Why each option is right or wrong
A. No; removing an entire column changes the feature set dimensionality and discards potentially useful data instead of preserving the full dataset.
Removing a whole column reduces the number of features, so it fails the stated requirement to clean missing values without altering the dimensionality of the feature set. It also throws away all non-missing observations in that column instead of retaining them. A more appropriate operation would target only the missing entries, for example with an imputation method, and leave the rest of the dataset intact.
B. Yes; dropping the column is always the best way to handle missing values because it keeps the remaining data unchanged.
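The difference between the two approaches can be illustrated with a minimal sketch. It is not part of the original question: the toy dataset, the function names, and the choice of mean imputation are all assumptions made for illustration, and the question itself only calls for "an appropriate operation." The sketch uses only the Python standard library.

```python
from statistics import mean

# Hypothetical toy dataset: rows are observations, columns are features.
# None marks a missing value.
rows = [
    [1.0, 10.0, 100.0],
    [2.0, None, 200.0],
    [3.0, 30.0, None],
    [4.0, 40.0, 400.0],
]

def drop_columns_with_missing(data):
    """Proposed solution: remove every column that contains a missing value."""
    n_cols = len(data[0])
    keep = [j for j in range(n_cols) if all(row[j] is not None for row in data)]
    return [[row[j] for j in keep] for row in data]

def impute_column_means(data):
    """Alternative (assumed here): replace each missing value with its column mean."""
    n_cols = len(data[0])
    means = [
        mean(row[j] for row in data if row[j] is not None)
        for j in range(n_cols)
    ]
    return [
        [means[j] if row[j] is None else row[j] for j in range(n_cols)]
        for row in data
    ]

dropped = drop_columns_with_missing(rows)
imputed = impute_column_means(rows)

# Compare the number of feature columns before and after each operation.
print("original features:", len(rows[0]))
print("after dropping columns:", len(dropped[0]))
print("after mean imputation:", len(imputed[0]))
```

In this sketch, dropping columns shrinks the feature set and discards the valid values in the removed columns, while imputation keeps the same number of columns and rows. Mean imputation is only one possible choice; median or another substitution rule may suit a given dataset better.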