Question 6
Domain 1: Data Preparation for Machine Learning (ML)
An ML engineer needs to create data ingestion pipelines and ML model deployment pipelines on AWS. All the raw data is stored in Amazon S3 buckets. Which solution will meet these requirements?
Correct answer: B
Explanation
AWS Glue is designed for ETL and data ingestion from sources like Amazon S3, so it fits the raw-data pipeline requirement. Amazon SageMaker Studio Classic supports building and managing ML workflows, including model deployment pipelines, which matches the deployment requirement.
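To make the ingestion half concrete, here is a minimal sketch. It assumes a hypothetical bucket, prefix, IAM role, ETL script location, crawler name, job name, and Data Catalog database, and it is not a prescribed setup. It builds the keyword arguments for the boto3 Glue calls create_crawler, start_crawler, create_job, and start_job_run. Building the payloads is pure Python. Only run_ingestion actually contacts AWS, and it needs boto3 and credentials.

```python
"""Sketch: request payloads for an S3-to-Glue ingestion pipeline (all names hypothetical)."""


def glue_ingestion_requests(bucket: str, prefix: str, role_arn: str, script_s3_uri: str) -> dict:
    """Build keyword arguments for boto3 Glue calls; nothing is sent to AWS here."""
    raw_path = f"s3://{bucket}/{prefix}"
    crawler = {
        "Name": "raw-data-crawler",          # hypothetical crawler name
        "Role": role_arn,
        "DatabaseName": "raw_data_catalog",  # hypothetical Data Catalog database
        "Targets": {"S3Targets": [{"Path": raw_path}]},  # crawl the raw S3 data
    }
    job = {
        "Name": "raw-data-etl",              # hypothetical job name
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",               # Spark-based Glue ETL job type
            "ScriptLocation": script_s3_uri,
            "PythonVersion": "3",
        },
        "GlueVersion": "4.0",
    }
    return {"crawler": crawler, "job": job}


def run_ingestion(requests: dict) -> str:
    """Create and start the crawler and job; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so building payloads does not need the AWS SDK

    glue = boto3.client("glue")
    glue.create_crawler(**requests["crawler"])
    glue.start_crawler(Name=requests["crawler"]["Name"])
    # The crawler and job are started independently here; a real pipeline would
    # chain them (for example with a Glue trigger or workflow) so the job runs
    # only after the crawler finishes.
    glue.create_job(**requests["job"])
    run = glue.start_job_run(JobName=requests["job"]["Name"])
    return run["JobRunId"]


if __name__ == "__main__":
    reqs = glue_ingestion_requests(
        bucket="example-raw-bucket",
        prefix="raw/",
        role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
        script_s3_uri="s3://example-raw-bucket/scripts/etl.py",
    )
    print(reqs["crawler"]["Targets"])
```

Keeping payload construction separate from the API calls makes the pipeline definition easy to inspect and version before anything is created in AWS.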
Why each option is right or wrong
A. Use Amazon Data Firehose to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.
Amazon Data Firehose delivers streaming data to destinations such as Amazon S3. It is not a general ETL tool for ingesting datasets that are already stored in S3.
B. Use AWS Glue to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.
AWS Glue is the managed, serverless ETL service for building data ingestion pipelines from Amazon S3. Its crawlers catalog the raw data, and its jobs extract, transform, and load it without any servers to provision. On the ML side, Amazon SageMaker Studio Classic is the integrated environment for creating and orchestrating SageMaker Pipelines, including model deployment workflows. Services such as Amazon EMR or AWS Data Pipeline are not the primary fit for end-to-end ML model deployment on AWS.
C. Use Amazon Redshift ML to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.
Amazon Redshift ML lets you create and train models with SQL on data in Amazon Redshift. It does not build ingestion pipelines.
D. Use Amazon Athena to create the data ingestion pipelines. Use an Amazon SageMaker notebook to create the model deployment pipelines.
Amazon Athena runs SQL queries on data in S3. It does not create ingestion pipelines. A SageMaker notebook is an interactive development environment, so on its own it is not a deployment pipeline.
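What separates a deployment pipeline from ad hoc notebook cells is a fixed, repeatable sequence of steps. The sketch below illustrates that idea. It uses plain boto3 SageMaker client calls (create_model, create_endpoint_config, create_endpoint) rather than the SageMaker Pipelines SDK that Studio Classic orchestrates, so treat it as an illustration of the deployment stage, not as how Studio Classic defines pipelines. The name prefix, container image, model artifact location, role, and instance type are hypothetical.

```python
"""Sketch: the deployment stage as an ordered, re-runnable list of SageMaker requests."""


def deployment_requests(name: str, image_uri: str, model_data_url: str, role_arn: str,
                        instance_type: str = "ml.m5.large") -> list:
    """Return (client_method, kwargs) pairs in the order they must run; no AWS calls."""
    model = {
        "ModelName": f"{name}-model",  # hypothetical naming scheme
        "PrimaryContainer": {"Image": image_uri, "ModelDataUrl": model_data_url},
        "ExecutionRoleArn": role_arn,
    }
    endpoint_config = {
        "EndpointConfigName": f"{name}-config",
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model["ModelName"],
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,  # hypothetical default instance type
        }],
    }
    endpoint = {
        "EndpointName": f"{name}-endpoint",
        "EndpointConfigName": endpoint_config["EndpointConfigName"],
    }
    return [
        ("create_model", model),
        ("create_endpoint_config", endpoint_config),
        ("create_endpoint", endpoint),
    ]


def deploy(steps: list) -> str:
    """Execute the steps in order and wait for the endpoint; needs boto3 and credentials."""
    import boto3  # imported lazily so building the plan does not need the AWS SDK

    sm = boto3.client("sagemaker")
    for method, kwargs in steps:
        getattr(sm, method)(**kwargs)  # call the named SageMaker client method
    endpoint_name = steps[-1][1]["EndpointName"]
    sm.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)
    return endpoint_name
```

Because the plan is data, the same steps can be reviewed, versioned, and re-run for each new model artifact. A managed pipeline adds the same property along with tracking and approvals.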