Question 2
Content Domain 2: Exploratory Data Analysis
A company uses camera images of the tops of items displayed on store shelves to determine which items were removed and which ones still remain. After several hours of data labeling, the company has a total of 1,000 hand-labeled images covering 10 distinct items. The training results were poor. Which machine learning approach fulfills the company's long-term needs?
Correct answer: D
Explanation
With only "1,000 hand-labeled images covering 10 distinct items," the model likely lacks enough varied examples to learn robust shelf-item recognition. Augmenting each item with "image variants like inversions and translations" increases training diversity, which helps the model generalize better and supports an iterative build-and-improve approach for long-term performance.
Why each option is right or wrong
A. Convert the images to grayscale and retrain the model.
Grayscale removes color information and does not solve the lack of diverse training examples.
B. Reduce the number of distinct items from 10 to 2, build the model, and iterate.
Reducing classes changes the business objective instead of improving recognition of all required items.
C. Attach different colored labels to each item, take the images again, and build the model.
Colored labels alter the real-world input and create an artificial shortcut, not robust item recognition.
D. Augment training data for each item using image variants like inversions and translations, build the model, and iterate.
The dataset is too small for a 10-class vision problem: 1,000 labeled images means only about 100 examples per item on average, which is typically insufficient for a robust classifier to learn viewpoint and placement variation from shelf-top images. Data augmentation that preserves the labels (for example, translations and inversions) expands the effective training set without new labeling, and an iterative train-evaluate-retrain cycle is the standard way to improve performance when the initial model performs poorly because of limited data.
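To make the augmentation idea in option D concrete, here is a minimal sketch in plain Python. It is illustrative only and not the company's actual pipeline. It assumes "inversions" means horizontal and vertical flips, represents an image as a small nested list of grayscale pixel values, and uses hypothetical names (`flip_horizontal`, `flip_vertical`, `translate`, `augment`, and the label `"item_3"`). A real project would more likely use the augmentation utilities of an image or deep learning library.

```python
from typing import List, Tuple

Image = List[List[int]]  # toy grayscale image: rows of pixel values


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left-to-right."""
    return [row[::-1] for row in img]


def flip_vertical(img: Image) -> Image:
    """Mirror the image top-to-bottom."""
    return [row[:] for row in img[::-1]]


def translate(img: Image, dx: int, dy: int, fill: int = 0) -> Image:
    """Shift by dx columns and dy rows, padding exposed pixels with `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out


def augment(img: Image, label: str) -> List[Tuple[Image, str]]:
    """Return the original image plus variants that keep the same label."""
    variants = [
        img,
        flip_horizontal(img),
        flip_vertical(img),
        translate(img, 1, 0),   # one pixel right
        translate(img, -1, 0),  # one pixel left
        translate(img, 0, 1),   # one pixel down
        translate(img, 0, -1),  # one pixel up
    ]
    return [(v, label) for v in variants]


# Hypothetical 3x3 image and label, for illustration only.
sample = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
augmented = augment(sample, "item_3")
print(len(augmented))  # 7 training examples from one labeled image
```

With these seven variants per image, the 1,000 hand-labeled images would yield 7,000 training examples with no additional labeling. The variants are correlated with their originals, so the gain in diversity is smaller than the raw count suggests, which is why option D pairs augmentation with iteration.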