Question 3
Content Domain 1: Data Engineering
A machine learning engineer is inventorying inputs for a new model and wants to begin with data that originates directly from the organization’s users rather than from derived or third-party collections. Which data source best fits that requirement?
Correct answer: C
Explanation
Primary sources are data captured first-hand from where the information originates, such as user data, rather than from secondary or derived collections. The source material frames this as identifying data sources, "for example, content and location, primary sources such as user data."
Why each option is right or wrong
A. A dataset organized by storage location within the repository
Location describes where data is found, not whether it originates directly from users.
B. A collection defined by its content category for downstream use
Content describes what the data contains, not whether it is first-hand user-originated data.
C. A set of records captured directly from user interactions
The source material identifies primary sources as including user data. Because these records come directly from user interactions, they match the requirement for data that originates from the original source rather than from derived or third-party collections.
D. A dataset assembled from previously processed internal reports
Primary sources are original data, not data compiled from processed reports.
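Illustrative sketch
The distinction the options turn on can be shown as a small inventory filter. This is only a sketch under stated assumptions: the DataSource class, the origin labels ("user", "derived", "third_party"), and the dataset names and paths are all hypothetical and do not come from the source material. The point is that location and content are descriptive attributes, while origin is what decides whether a source is primary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    name: str
    origin: str    # hypothetical provenance label: "user", "derived", or "third_party"
    location: str  # where the data is stored (option A describes this attribute)
    content: str   # what the data contains (option B describes this attribute)


def primary_sources(sources):
    """Keep only sources captured first-hand from users."""
    return [s for s in sources if s.origin == "user"]


# Hypothetical inventory; names and paths are made up for illustration.
inventory = [
    DataSource("clickstream_events", "user", "raw/events/", "interaction logs"),
    DataSource("report_extract", "derived", "reports/processed/", "aggregated metrics"),
    DataSource("vendor_demographics", "third_party", "vendor/", "demographic attributes"),
]

print([s.name for s in primary_sources(inventory)])
```

Only the record set captured from user interactions passes the filter, which mirrors option C. The processed-report extract (option D) is excluded because it is derived, whatever its location or content.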