Study Guide

AWS Certified Machine Learning Engineer - Associate Study Guide

Use the official AWS domain outline to connect data preparation, ML model development, deployment and orchestration, monitoring, maintenance, and ML security to scenario-based questions and explanations.

How the Exam Is Structured

AWS Certified Machine Learning Engineer - Associate (MLA-C01) validates skills in data preparation, ML model development, deployment and orchestration, monitoring, maintenance, and ML security. The ExamPal practice bank includes 168 premium questions and 40 free questions mapped across the official blueprint.

Domain	Weight	Focus
Domain 1: Data Preparation for Machine Learning (ML)	28%	Task 1.1: Ingest and store data; Data formats and ingestion mechanisms (CSV, JSON, Parquet, ORC, Avro, RecordIO)
Domain 2: ML Model Development	26%	Task 2.1: Choose a modeling approach; ML problem framing: classification, regression, clustering, anomaly detection, recommendation, forecasting
Domain 3: Deployment and Orchestration of ML Workflows	22%	Task 3.1: Select deployment infrastructure based on existing architecture and requirements; SageMaker endpoint types: real-time, serverless, asynchronous, batch transform
Domain 4: ML Solution Monitoring, Maintenance, and Security	24%	Task 4.1: Monitor model performance and data quality; SageMaker Model Monitor: data quality drift, model quality drift, bias drift, feature attribution drift

28% of exam

Domain 1: Data Preparation for Machine Learning (ML)

Covers the end-to-end preparation of data for ML workloads, including ingestion, storage, transformation, feature engineering, quality checks, bias handling, splitting, and labeling. This domain emphasizes selecting the right AWS services and data-processing patterns to produce reliable training and evaluation datasets.

Task 1.1: Ingest and store data

Data formats and ingestion mechanisms (CSV, JSON, Parquet, ORC, Avro, RecordIO)

AWS storage options for ML workloads: Amazon S3 (Standard, Intelligent-Tiering, and S3 Glacier storage classes), Amazon EBS, Amazon EFS, Amazon FSx for Lustre (for high-throughput training reads)

Task 1.2: Transform data and perform feature engineering

Data cleaning, normalization, encoding (one-hot, target, ordinal), binning, imputation

Feature engineering: aggregations, time-window features, embeddings, derived features
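Three of the transformations listed under Task 1.2 (mean imputation, min-max normalization, and one-hot encoding) can be sketched in standard-library Python. The helper names and toy data below are hypothetical and chosen only for illustration; this is a minimal sketch of the ideas, not how any particular AWS service implements them.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_encode(labels):
    """Map each label to a 0/1 vector with one position per distinct category."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


# Toy data (hypothetical): one numeric column with a gap, one categorical column.
ages = impute_mean([20, None, 40])
scaled = min_max_normalize(ages)
encoded = one_hot_encode(["red", "blue", "red"])  # category order: blue, red
```

In a real workflow the imputation value and the min/max bounds would be computed on the training split only and then reused on validation and test data, so that no information leaks across splits.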

Task 1.3: Ensure data integrity and prepare data for modeling

26% of exam

Domain 2: ML Model Development

Covers selecting modeling approaches, training and refining models, and evaluating performance across common ML and NLP tasks. This domain emphasizes SageMaker training, tuning, transfer learning, and the use of appropriate metrics and analysis tools.

Task 2.1: Choose a modeling approach

ML problem framing: classification, regression, clustering, anomaly detection, recommendation, forecasting

Algorithm selection: linear regression, logistic regression, XGBoost, k-means, Random Cut Forest (RCF), DeepAR, BERT, neural networks

Task 2.2: Train and refine models

SageMaker training jobs (managed spot training, distributed training, Pipe mode vs File mode)

Hyperparameter tuning: SageMaker Automatic Model Tuning (Bayesian, random, grid, Hyperband)
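To make the difference between two of the listed search strategies concrete, the sketch below runs a grid search and a random search over a toy objective. Everything here is an assumption for illustration: the `objective` function, the parameter ranges, and the helper names are hypothetical, and the code does not call SageMaker Automatic Model Tuning. Bayesian and Hyperband strategies are omitted because they need more machinery than fits a short example.

```python
import itertools
import random


def objective(learning_rate, max_depth):
    """Toy validation loss with its minimum at learning_rate=0.1, max_depth=6."""
    return (learning_rate - 0.1) ** 2 + 0.01 * (max_depth - 6) ** 2


def grid_search(learning_rates, max_depths):
    """Evaluate every combination and return the best (loss, params) pair."""
    trials = [
        (objective(lr, d), {"learning_rate": lr, "max_depth": d})
        for lr, d in itertools.product(learning_rates, max_depths)
    ]
    return min(trials, key=lambda t: t[0])


def random_search(n_trials, seed=0):
    """Sample n_trials random configurations and return the best (loss, params) pair."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        params = {
            "learning_rate": rng.uniform(0.01, 0.3),
            "max_depth": rng.randint(2, 10),
        }
        trials.append((objective(**params), params))
    return min(trials, key=lambda t: t[0])


grid_loss, grid_params = grid_search([0.01, 0.1, 0.3], [2, 6, 10])
rand_loss, rand_params = random_search(n_trials=20)
```

Grid search cost grows multiplicatively with every added hyperparameter, while random search keeps a fixed trial budget; that trade-off is a common reason to prefer random or model-guided strategies when the search space is large.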

Task 2.3: Analyze model performance

22% of exam

Domain 3: Deployment and Orchestration of ML Workflows

Covers deployment choices, infrastructure scripting, workflow orchestration, and CI/CD for ML solutions. This domain includes SageMaker endpoint patterns, IaC tools, pipeline orchestration, and release strategies for safe model deployment.

Task 3.1: Select deployment infrastructure based on existing architecture and requirements

SageMaker endpoint types: real-time, serverless, asynchronous, batch transform

Multi-model endpoints, multi-container endpoints, inference pipelines
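A simplified rule of thumb for choosing among the four inference options listed under Task 3.1 can be written as a small decision function. The function name, its three boolean inputs, and the order of the checks are assumptions made for this sketch; real decisions also weigh latency targets, payload limits, and cost, which are not modeled here.

```python
def choose_inference_option(offline_dataset, large_payload_or_long_processing, intermittent_traffic):
    """Return a SageMaker inference option name from three yes/no workload traits."""
    if offline_dataset:
        # Scoring a whole dataset at once needs no persistent endpoint.
        return "batch transform"
    if large_payload_or_long_processing:
        # Requests are queued and processed without holding a connection open.
        return "asynchronous"
    if intermittent_traffic:
        # Capacity scales with demand; cold starts are the usual trade-off.
        return "serverless"
    # Steady traffic with low-latency requirements.
    return "real-time"


# Hypothetical scenario: nightly scoring of an entire table stored in S3.
nightly_scoring = choose_inference_option(True, False, False)
# Hypothetical scenario: a chat feature with steady, latency-sensitive traffic.
chat_feature = choose_inference_option(False, False, False)
```

Scenario questions often hinge on one phrase ("processes the full dataset overnight", "traffic is sporadic", "requests take several minutes"), so mapping each phrase to one of these branches is a useful practice habit.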

Task 3.2: Create and script infrastructure based on existing architecture and requirements

Infrastructure as Code for ML: AWS CloudFormation, AWS CDK, SageMaker Projects

SageMaker Pipelines for ML workflows (preprocessing → training → evaluation → deployment)
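The step order preprocessing → training → evaluation → deployment can be illustrated with plain Python functions chained together. This is a conceptual sketch only: it does not use the SageMaker Python SDK, the "model" is a trivial majority-label predictor, and the accuracy threshold that gates deployment is an assumption added to show why evaluation sits before deployment.

```python
def preprocess(raw_rows):
    """Drop rows that contain a missing value (stand-in for a processing step)."""
    return [row for row in raw_rows if None not in row]


def train(rows):
    """'Train' a trivial model that always predicts the most common label."""
    labels = [label for _, label in rows]
    return max(set(labels), key=labels.count)


def evaluate(model, rows):
    """Return the accuracy of the majority-label model on the given rows."""
    correct = sum(1 for _, label in rows if label == model)
    return correct / len(rows)


def run_pipeline(raw_rows, min_accuracy):
    """Run the steps in order; mark the model as deployed only if it passes the gate."""
    rows = preprocess(raw_rows)
    model = train(rows)
    # A real pipeline would evaluate on a held-out split, not the training rows.
    accuracy = evaluate(model, rows)
    deployed = accuracy >= min_accuracy
    return {"model": model, "accuracy": accuracy, "deployed": deployed}


# Hypothetical toy data: (feature, label) pairs, one row with a missing feature.
data = [(1.0, "a"), (2.0, "a"), (None, "b"), (3.0, "b"), (4.0, "a")]
passed = run_pipeline(data, min_accuracy=0.7)
blocked = run_pipeline(data, min_accuracy=0.9)
```

Each function consumes the output of the previous one, which is the dependency structure a workflow orchestrator tracks when it decides what to run next.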

Task 3.3: Use automated orchestration tools to set up continuous integration and continuous delivery (CI/CD) pipelines

24% of exam

Domain 4: ML Solution Monitoring, Maintenance, and Security

Covers monitoring model and infrastructure health, optimizing cost and resource usage, and securing ML systems on AWS. This domain includes drift detection, endpoint observability, IAM, encryption, network isolation, documentation, secrets management, and compliance logging.

Task 4.1: Monitor model performance and data quality

SageMaker Model Monitor: data quality drift, model quality drift, bias drift, feature attribution drift

Concept drift vs data drift detection patterns
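Data drift means the distribution of input features has shifted relative to a baseline, whereas concept drift means the relationship between inputs and the target has changed. One common way to quantify data drift on a single numeric feature is the population stability index (PSI), sketched below. This is an illustrative technique chosen for the example, not a description of how SageMaker Model Monitor computes its statistics; the function names, bin edges, and sample data are hypothetical.

```python
import math


def bin_fractions(values, edges):
    """Fraction of values in each [edges[i], edges[i+1]) bin; the last bin includes its upper edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    # Values outside the edges are not counted but still appear in the denominator.
    return [c / len(values) for c in counts]


def population_stability_index(baseline, current, edges, eps=1e-6):
    """Sum of (actual - expected) * ln(actual / expected) over bins."""
    expected = bin_fractions(baseline, edges)
    actual = bin_fractions(current, edges)
    psi = 0.0
    for e, a in zip(expected, actual):
        # Clamp empty bins to eps so the logarithm stays defined.
        e, a = max(e, eps), max(a, eps)
        psi += (a - e) * math.log(a / e)
    return psi


baseline = [1, 2, 3, 4, 5, 6, 7, 8]
edges = [0, 4, 8]
psi_unchanged = population_stability_index(baseline, baseline, edges)
psi_shifted = population_stability_index(baseline, [5, 6, 7, 8, 5, 6, 7, 8], edges)
```

A PSI near zero indicates matching distributions, and larger values indicate a bigger shift; any alerting threshold is a convention to be tuned per feature. Detecting concept drift generally requires ground-truth labels, because it shows up as degraded model quality rather than as a change in the inputs alone.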

Task 4.2: Monitor and optimize infrastructure and costs

Cost optimization: managed spot training, Savings Plans, right-sizing instance types

AWS Cost Explorer, AWS Budgets for ML cost tracking
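The savings from managed spot training can be worked out from two values reported for a training job, the total training time and the billable time, as savings = (1 - billable / training) × 100. The sketch below applies that arithmetic; the function name and the sample numbers are hypothetical, and the expression is rearranged into an algebraically equivalent form to keep the floating-point result clean.

```python
def spot_savings_percent(training_seconds, billable_seconds):
    """Percentage saved when only billable_seconds of training_seconds are charged."""
    if training_seconds <= 0:
        raise ValueError("training_seconds must be positive")
    if not 0 <= billable_seconds <= training_seconds:
        raise ValueError("billable_seconds must be between 0 and training_seconds")
    # Equivalent to (1 - billable / training) * 100.
    return 100 * (training_seconds - billable_seconds) / training_seconds


# Hypothetical job: 500 seconds of training, 100 seconds billed.
savings = spot_savings_percent(training_seconds=500, billable_seconds=100)
```

Spot capacity can be interrupted, so this kind of saving is usually paired with checkpointing so that an interrupted job can resume instead of restarting.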

Task 4.3: Secure AWS resources

Key Terms to Know

These terms are loaded from the shared terminology pack and appear across the question explanations.

AWS AI Service Cards: AWS documentation artifacts that provide information about AI services and their intended use.
AWS Budgets: An AWS service used to set and monitor cost budgets and track spending.
AWS CDK: The AWS Cloud Development Kit, an Infrastructure as Code tool.
AWS CloudFormation: An AWS Infrastructure as Code service.
AWS Cost Explorer: An AWS service used to analyze and track cloud costs.
AWS Database Migration Service (AWS DMS): An AWS service for migrating and replicating databases, listed here as a data ingestion option for ML workloads.
AWS Glue: An AWS data ingestion and ETL service used for ML data preparation and transformation.
AWS Glue Data Quality: An AWS service for assessing data quality.
AWS Glue DataBrew: A visual, no-code data preparation tool that uses recipes for data prep.
AWS Glue ETL: An AWS Glue-based extract-transform-load capability used for data transformation.
AWS IoT Greengrass: An AWS service used for edge deployment.
AWS Lambda: An AWS service used for lightweight data transforms.
Amazon AppFlow: A managed integration service that transfers data between SaaS applications and AWS services, listed here as a data ingestion option for ML workloads.
Amazon Bedrock foundation models: Foundation models available through Amazon Bedrock, listed as an option alongside built-in, pre-trained, and custom models.
Amazon CloudWatch: An AWS monitoring service used here to track machine learning endpoint metrics such as latency, error rate, invocations, and model latency.
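Amazon EBS: Amazon Elastic Block Store, a block storage option listed for ML workloads.
Amazon EFS: Amazon Elastic File System, a shared file storage option listed for ML workloads.
Amazon EMR: A managed big data platform (for frameworks such as Apache Spark) listed for distributed data transformation.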

Official Materials and Guidance

This page is built from the official AWS MLA-C01 exam guide, the shared syllabus, topic tree, terminology pack, free pack, and premium pack.

- AWS MLA-C01 Exam Guide
