AWS Certified Machine Learning Engineer - Associate Study Guide

Use the official AWS domain outline to connect data preparation, ML model development, deployment and orchestration, monitoring, maintenance, and ML security to scenario-based questions and explanations.

How the Exam Is Structured

The AWS Certified Machine Learning Engineer - Associate (MLA-C01) exam validates skills in data preparation, ML model development, deployment and orchestration, monitoring, maintenance, and ML security. The ExamPal practice bank includes 168 premium questions and 40 free questions mapped across the official blueprint.

Domain | Weight | Focus
Domain 1: Data Preparation for Machine Learning (ML) | 28% | Task 1.1: Ingest and store data; Data formats and ingestion mechanisms (CSV, JSON, Parquet, ORC, Avro, RecordIO)
Domain 2: ML Model Development | 26% | Task 2.1: Choose a modeling approach; ML problem framing: classification, regression, clustering, anomaly detection, recommendation, forecasting
Domain 3: Deployment and Orchestration of ML Workflows | 22% | Task 3.1: Select deployment infrastructure based on existing architecture and requirements; SageMaker endpoint types: real-time, serverless, asynchronous, batch transform
Domain 4: ML Solution Monitoring, Maintenance, and Security | 24% | Task 4.1: Monitor model performance and data quality; SageMaker Model Monitor: data quality drift, model quality drift, bias drift, feature attribution drift
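One way to put the weights to work is to split study time in proportion to them. The sketch below is only an illustration: the 50-hour budget and the `allocate_hours` helper are hypothetical, and the weights are copied from the domain table.

```python
# Domain weights as listed in the domain table (percent of exam).
DOMAIN_WEIGHTS = {
    "Domain 1: Data Preparation for Machine Learning (ML)": 28,
    "Domain 2: ML Model Development": 26,
    "Domain 3: Deployment and Orchestration of ML Workflows": 22,
    "Domain 4: ML Solution Monitoring, Maintenance, and Security": 24,
}


def allocate_hours(total_hours, weights=DOMAIN_WEIGHTS):
    """Split total_hours across domains in proportion to their weights."""
    total_weight = sum(weights.values())
    return {name: total_hours * w / total_weight for name, w in weights.items()}


if __name__ == "__main__":
    # 50 hours is an arbitrary, hypothetical study budget.
    for name, hours in allocate_hours(50).items():
        print(f"{hours:5.1f} h  {name}")
```

Proportional allocation is a starting point, not a rule; shifting hours toward weaker domains is a reasonable adjustment.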

28% of exam

Domain 1: Data Preparation for Machine Learning (ML)

Covers the end-to-end preparation of data for ML workloads, including ingestion, storage, transformation, feature engineering, quality checks, bias handling, splitting, and labeling. This domain emphasizes selecting the right AWS services and data-processing patterns to produce reliable training and evaluation datasets.

Task 1.1: Ingest and store data
  • Data formats and ingestion mechanisms (CSV, JSON, Parquet, ORC, Avro, RecordIO)
  • AWS storage options for ML workloads: Amazon S3 (Standard, Intelligent-Tiering, Glacier classes), Amazon EBS, Amazon EFS, Amazon FSx for Lustre (for high-throughput training reads)
Task 1.2: Transform data and perform feature engineering
  • Data cleaning, normalization, encoding (one-hot, target, ordinal), binning, imputation
  • Feature engineering: aggregations, time-window features, embeddings, derived features
Task 1.3: Ensure data integrity and prepare data for modeling
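Several of the Task 1.2 transformations are easier to remember once seen on a tiny dataset. The following is a minimal sketch in plain Python (standard library only), not how any AWS service implements them; all function names and sample values are hypothetical.

```python
from bisect import bisect_right
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Encode each category as a 0/1 vector, one position per sorted category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def ordinal(values, order):
    """Map each category to its rank in an explicitly ordered list."""
    rank = {c: i for i, c in enumerate(order)}
    return [rank[v] for v in values]


def bin_values(values, edges):
    """Return, for each value, the index of the bin defined by sorted edges."""
    return [bisect_right(edges, v) for v in values]


if __name__ == "__main__":
    ages = impute_mean([20, None, 40])
    print(ages)
    print(min_max_normalize(ages))
    print(one_hot(["red", "blue", "red"]))
    print(ordinal(["low", "high", "medium"], ["low", "medium", "high"]))
    print(bin_values([10, 18, 30, 70], [18, 65]))
```

In practice, statistics such as the imputation mean or the min and max are usually computed on the training split only and then reused on validation and test data, so that no information leaks across splits.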

26% of exam

Domain 2: ML Model Development

Covers selecting modeling approaches, training and refining models, and evaluating performance across common ML and NLP tasks. This domain emphasizes SageMaker training, tuning, transfer learning, and the use of appropriate metrics and analysis tools.

Task 2.1: Choose a modeling approach
  • ML problem framing: classification, regression, clustering, anomaly detection, recommendation, forecasting
  • Algorithm selection: linear regression, logistic regression, XGBoost, k-means, RCF, DeepAR, BERT, neural networks
Task 2.2: Train and refine models
  • SageMaker training jobs (spot training, distributed training, pipe mode vs file mode)
  • Hyperparameter tuning: SageMaker Automatic Model Tuning (Bayesian, random, grid, Hyperband)
Task 2.3: Analyze model performance
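To make the difference between two of the tuning strategies concrete, here is a small sketch of grid search and random search in plain Python. It does not use the SageMaker SDK, and `toy_validation_loss` is a hypothetical stand-in for "train a model and return its validation loss". Bayesian optimization and Hyperband are not shown.

```python
import itertools
import random


def toy_validation_loss(learning_rate, max_depth):
    """Hypothetical objective; its minimum is at learning_rate=0.1, max_depth=6."""
    return (learning_rate - 0.1) ** 2 + 0.01 * (max_depth - 6) ** 2


def grid_search(grid):
    """Evaluate every combination of the listed values; return (loss, params)."""
    names = list(grid)
    best = None
    for combo in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, combo))
        loss = toy_validation_loss(**params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best


def random_search(ranges, n_trials, seed=0):
    """Sample n_trials random points from the ranges; return (loss, params)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {
            "learning_rate": rng.uniform(*ranges["learning_rate"]),
            "max_depth": rng.randint(*ranges["max_depth"]),
        }
        loss = toy_validation_loss(**params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best


if __name__ == "__main__":
    print(grid_search({"learning_rate": [0.01, 0.1, 0.3], "max_depth": [3, 6, 9]}))
    print(random_search({"learning_rate": (0.01, 0.3), "max_depth": (3, 9)}, n_trials=20))
```

Grid search cost grows multiplicatively with each added hyperparameter, while random search runs a fixed number of trials regardless of how many hyperparameters are tuned.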

22% of exam

Domain 3: Deployment and Orchestration of ML Workflows

Covers deployment choices, infrastructure scripting, workflow orchestration, and CI/CD for ML solutions. This domain includes SageMaker endpoint patterns, IaC tools, pipeline orchestration, and release strategies for safe model deployment.

Task 3.1: Select deployment infrastructure based on existing architecture and requirements
  • SageMaker endpoint types: real-time, serverless, asynchronous, batch transform
  • Multi-model endpoints, multi-container endpoints, inference pipelines
Task 3.2: Create and script infrastructure based on existing architecture and requirements
  • Infrastructure as Code for ML: AWS CloudFormation, AWS CDK, SageMaker Projects
  • SageMaker Pipelines for ML workflows (preprocessing → training → evaluation → deployment)
Task 3.3: Use automated orchestration tools to set up continuous integration and continuous delivery (CI/CD) pipelines
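The preprocessing → training → evaluation → deployment flow can be sketched as a chain of functions with a quality gate before deployment. This is plain Python for illustration only, not the SageMaker Pipelines SDK; the toy majority-label "model", the step functions, and the accuracy threshold are all hypothetical.

```python
from collections import Counter


def preprocess(raw_rows):
    """Drop rows with a missing label; stand-in for a processing step."""
    return [r for r in raw_rows if r["label"] is not None]


def train(rows):
    """Toy 'model' that always predicts the most common training label."""
    majority = Counter(r["label"] for r in rows).most_common(1)[0][0]
    return {"predict": majority}


def evaluate(model, rows):
    """Return accuracy of the toy model on held-out rows."""
    correct = sum(1 for r in rows if r["label"] == model["predict"])
    return correct / len(rows)


def run_pipeline(train_rows, holdout_rows, accuracy_threshold):
    """Run the steps in order; 'deploy' only if the evaluation gate passes."""
    model = train(preprocess(train_rows))
    accuracy = evaluate(model, preprocess(holdout_rows))
    return {"accuracy": accuracy, "deployed": accuracy >= accuracy_threshold}


if __name__ == "__main__":
    train_rows = [{"label": 1}, {"label": 1}, {"label": 0}, {"label": None}]
    holdout_rows = [{"label": 1}, {"label": 1}, {"label": 0}, {"label": 1}]
    print(run_pipeline(train_rows, holdout_rows, accuracy_threshold=0.7))
```

The point of the gate is that a model failing evaluation never reaches the deployment step, which is the behavior scenario questions about automated pipelines tend to probe.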

24% of exam

Domain 4: ML Solution Monitoring, Maintenance, and Security

Covers monitoring model and infrastructure health, optimizing cost and resource usage, and securing ML systems on AWS. This domain includes drift detection, endpoint observability, IAM, encryption, network isolation, documentation, secrets management, and compliance logging.

Task 4.1: Monitor model performance and data quality
  • SageMaker Model Monitor: data quality drift, model quality drift, bias drift, feature attribution drift
  • Concept drift vs data drift detection patterns
Task 4.2: Monitor and optimize infrastructure and costs
  • Cost optimization: spot training, savings plans, right-sizing instance types
  • AWS Cost Explorer, AWS Budgets for ML cost tracking
Task 4.3: Secure AWS resources
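Data drift means the distribution of input features has shifted relative to a baseline. One common way to quantify such a shift is the Population Stability Index (PSI), sketched below in plain Python. PSI is used here purely as an illustration and is not claimed to be the statistic SageMaker Model Monitor computes; the function names, bin edges, and sample values are hypothetical.

```python
import math
from bisect import bisect_right


def bin_fractions(values, edges, eps=1e-6):
    """Fraction of values per bin; eps avoids log(0) for empty bins."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    total = len(values)
    return [max(c / total, eps) for c in counts]


def population_stability_index(baseline, live, edges):
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    expected = bin_fractions(baseline, edges)
    actual = bin_fractions(live, edges)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


if __name__ == "__main__":
    baseline = [1, 2, 3, 4, 5, 6, 7, 8]
    shifted = [5, 6, 7, 8, 7, 8, 6, 5]
    print(population_stability_index(baseline, baseline, edges=[4.5]))
    print(population_stability_index(baseline, shifted, edges=[4.5]))
```

A check like this looks only at inputs, so it can flag data drift but not concept drift, where the relationship between inputs and the target changes; detecting the latter generally requires comparing predictions against ground-truth labels.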

Key Terms to Know

These terms are loaded from the shared terminology pack and appear across the question explanations.

AWS AI Service Cards
AWS documentation artifacts that provide information about AI services and their intended use.
AWS Budgets
An AWS service used to set and monitor cost budgets and track spending.
AWS CDK
The AWS Cloud Development Kit, an Infrastructure as Code tool.
AWS CloudFormation
An AWS Infrastructure as Code service.
AWS Cost Explorer
An AWS service used to analyze and track cloud costs.
AWS Database Migration Service (AWS DMS)
An AWS service for migrating and replicating databases, listed here as a data ingestion option for ML workloads.
AWS Glue
An AWS data ingestion and ETL service used for ML data preparation and transformation.
AWS Glue Data Quality
An AWS Glue capability for defining rules against datasets and assessing data quality.
AWS Glue DataBrew
A visual, no-code data preparation tool that uses recipes for data prep.
AWS Glue ETL
An AWS Glue-based extract-transform-load capability used for data transformation.
AWS IoT Greengrass
An AWS service used for edge deployment.
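AWS Lambda
A serverless compute service, used here for lightweight data transforms.
Amazon AppFlow
An AWS service for transferring data between SaaS applications and AWS services, listed here as a data ingestion option for ML workloads.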
Amazon Bedrock foundation models
Foundation models available through Amazon Bedrock, listed as an option alongside built-in, pre-trained, and custom models.
Amazon CloudWatch
An AWS monitoring service used here to track machine learning endpoint metrics such as latency, error rate, invocations, and model latency.
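Amazon EBS
Block storage volumes attached to compute instances, listed as a storage option for ML workloads.
Amazon EFS
A managed, shared NFS file system, listed as a storage option for ML workloads.
Amazon EMR
A managed big-data platform (for example, Apache Spark and Hadoop), listed for distributed data transformation.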

Official Materials and Guidance

This page is built from the official AWS MLA-C01 exam guide, the shared syllabus, topic tree, terminology pack, free pack, and premium pack.

  • AWS MLA-C01 Exam Guide