Designing and Implementing a Data Science Solution on Azure Study Guide
Use the saved domain outline to connect each exam area (designing and preparing a machine learning solution, exploring data and running experiments, training and evaluating models, and deploying and operationalizing machine learning solutions) to scenario-based questions and explanations.
How the Exam Is Structured
Designing and Implementing a Data Science Solution on Azure (DP-100) validates your ability to design and prepare a machine learning solution, explore data and run experiments, train and evaluate models, and deploy and operationalize machine learning solutions. The ExamPal practice bank includes 175 premium questions and 40 free questions mapped across the official blueprint.
| Domain | Weight | Focus |
|---|---|---|
| Domain 1: Design and prepare a machine learning solution | 20% | Task 1.1: Design an Azure Machine Learning workspace solution; Select workspace architecture |
| Domain 2: Explore data and run experiments | 25% | Task 2.1: Ingest and profile data; Load data into tools |
| Domain 3: Train and evaluate models | 20% | Task 3.1: Select evaluation metrics for model type; Use classification metrics |
| Domain 4: Deploy and operationalize machine learning solutions | 20% | Task 4.1: Prepare models for deployment; Register models and dependencies |
| Domain 5: Monitor, retrain, and manage ML lifecycle | 15% | Task 5.1: Monitor deployed models and endpoints; Track service performance |
20% of exam
Domain 1: Design and prepare a machine learning solution
Covers the foundational Azure Machine Learning workspace, security, compute, environment, and data setup needed to build ML solutions. This domain emphasizes selecting the right workspace architecture and resources, managing access and governance, and preparing reusable development assets for experiments and pipelines.
25% of exam
Domain 2: Explore data and run experiments
Covers data ingestion, preparation, splitting, training, tuning, and experiment tracking. This domain focuses on the practical workflow of preparing data, running models, and comparing results in Azure Machine Learning.
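The splitting step mentioned above can be sketched in plain Python. This is a minimal illustration, not an Azure Machine Learning API: the function name `train_test_split`, the `test_fraction` parameter, and the default seed are all assumptions chosen for the example.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Split rows into (train, test) lists using a seeded shuffle.

    Hypothetical helper for illustration; a fixed seed keeps the split
    reproducible so that experiment runs can be compared fairly.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")

    # Shuffle indices rather than rows so the original order is preserved
    # inside each returned subset.
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)

    n_test = round(len(rows) * test_fraction)
    test_indices = set(indices[:n_test])

    train = [row for i, row in enumerate(rows) if i not in test_indices]
    test = [row for i, row in enumerate(rows) if i in test_indices]
    return train, test


if __name__ == "__main__":
    train_rows, test_rows = train_test_split(list(range(10)))
    print(len(train_rows), len(test_rows))
```

Calling the function twice with the same seed returns the same split, which is the property that matters when two runs are compared in experiment tracking.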
20% of exam
Domain 3: Train and evaluate models
Covers selecting evaluation metrics, diagnosing model fit issues, interpreting model behavior, improving performance, and assessing responsible AI considerations. This domain focuses on evaluating model quality and trustworthiness before deployment.
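To make the metric-selection topic concrete, the sketch below computes common binary classification metrics from scratch. It is a study aid under stated assumptions: the function names and the choice of `1` as the positive label are illustrative, and in practice a library such as scikit-learn would usually be used instead.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 as a dict.

    Each ratio falls back to 0.0 when its denominator is zero.
    """
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

For example, with `y_true = [1, 0, 1, 1, 0, 0, 1, 0]` and `y_pred = [1, 0, 0, 1, 0, 1, 1, 0]` there are 3 true positives, 1 false positive, 1 false negative, and 3 true negatives, so all four metrics come out to 0.75. On imbalanced data the metrics diverge, which is why scenario questions often ask which one fits the business cost of false positives versus false negatives.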
20% of exam
Domain 4: Deploy and operationalize machine learning solutions
Covers preparing models for deployment, serving real-time and batch inference, managing inference environments, and integrating deployed models with applications. This domain emphasizes operational readiness, endpoint configuration, and deployment lifecycle management.
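Real-time inference in Azure Machine Learning commonly relies on a scoring script that exposes an `init()` function (run once when the container starts) and a `run()` function (run per request). The sketch below follows that shape but substitutes a hypothetical `ThresholdModel` for a real trained model, and the `{"data": [...]}` request format is an assumption made for the example.

```python
import json
import os

model = None  # populated once by init()


class ThresholdModel:
    """Hypothetical stand-in for a trained model, for illustration only."""

    def __init__(self, threshold, source_dir):
        self.threshold = threshold
        self.source_dir = source_dir

    def predict(self, rows):
        # Predict class 1 when the sum of a row's features exceeds the threshold.
        return [int(sum(row) > self.threshold) for row in rows]


def init():
    """Load the model once, before any request is served."""
    global model
    # Azure ML sets AZUREML_MODEL_DIR to the folder holding the registered
    # model files; a real script would deserialize a model from there.
    model_dir = os.getenv("AZUREML_MODEL_DIR", ".")
    model = ThresholdModel(threshold=1.0, source_dir=model_dir)


def run(raw_data):
    """Score one JSON request and return a JSON-serializable result."""
    try:
        rows = json.loads(raw_data)["data"]
        return {"predictions": model.predict(rows)}
    except (KeyError, TypeError, ValueError) as exc:
        # Return the error instead of raising so the caller gets a response.
        return {"error": str(exc)}
```

Keeping model loading in `init()` rather than `run()` avoids paying the load cost on every request, which is the main design point to remember for endpoint questions.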
15% of exam
Domain 5: Monitor, retrain, and manage ML lifecycle
Covers monitoring deployed services, detecting drift and degradation, automating retraining, managing versioned assets, and supporting collaboration practices. This domain focuses on sustaining ML solutions in production with governance, reproducibility, and MLOps discipline.
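Drift detection can be illustrated with the Population Stability Index (PSI), a generic statistic that compares how a feature is distributed in baseline data versus current data. The sketch below is not Azure Machine Learning's monitoring implementation; the function names, the equal-width binning, and the small `eps` floor for empty bins are all assumptions made for this example.

```python
import math


def bin_fractions(values, edges, eps=1e-6):
    """Return the fraction of values in each bin defined by edges.

    Values outside the edge range are clamped into the first or last bin,
    and empty bins are floored at eps so the logarithm stays defined.
    """
    if not values:
        raise ValueError("values must not be empty")
    counts = [0] * (len(edges) - 1)
    last_bin = len(counts) - 1
    for value in values:
        for i in range(len(counts)):
            if value < edges[i + 1] or i == last_bin:
                counts[i] += 1
                break
    return [max(count / len(values), eps) for count in counts]


def population_stability_index(baseline, current, n_bins=5):
    """Compute PSI between two samples using equal-width baseline bins."""
    low, high = min(baseline), max(baseline)
    width = (high - low) / n_bins or 1.0  # avoid zero-width bins
    edges = [low + i * width for i in range(n_bins + 1)]
    expected = bin_fractions(baseline, edges)
    actual = bin_fractions(current, edges)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))
```

A PSI of zero means the binned distributions match, and larger values indicate a bigger shift. Any alert threshold that triggers retraining is a project decision, not something fixed by this sketch.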
Key Terms to Know
These terms are loaded from the shared terminology pack and appear across the question explanations.
- Azure ML Python SDK v2: The version 2 Python software development kit used to interact programmatically with Azure Machine Learning resources.
- Azure ML component: A reusable, versionable unit of work in Azure ML pipelines that encapsulates code, environment, inputs, and outputs.
- Azure ML workspace: The central Azure Machine Learning resource that stores assets, runs, compute targets, and configuration for ML projects.
- Azure Machine Learning Designer: A visual interface in Azure ML used to build, configure, and run machine learning pipelines without extensive coding.
- Conda configuration file: A YAML file that specifies Conda packages and dependencies required for an ML environment.
- Environment: An Azure ML SDK v2 class used to define the software environment, including dependencies, for training or inference.
- Import Data: A Designer component used to bring external data, such as a CSV file from a website, into a pipeline.
- MLOps: A set of practices for automating, managing, deploying, monitoring, and retraining machine learning systems.
- MLTable: An Azure Machine Learning data asset format used to define tabular or file-based datasets for ML workflows.
- MLflow: An open-source platform for tracking experiments, logging metrics and artifacts, and managing ML lifecycle tasks.
- ParallelRunStep: An Azure ML pipeline step used for scalable parallel batch inference over large datasets.
- YAML: A human-readable configuration format commonly used to define Azure ML components and pipeline settings.
- artifact: A file or folder produced or used by an ML run, such as images, models, or output datasets.
- autoscaling: The ability of a compute resource to automatically increase or decrease the number of nodes based on workload demand.
- binary classification: A supervised learning task in which a model predicts one of two possible classes.
- compute cluster: An Azure ML compute target made up of multiple nodes that can run training or batch workloads.
- conda_file: A parameter used when creating an Azure ML environment from a Conda YAML specification.
- config.json: A workspace configuration file that stores connection details needed for SDK code to connect to Azure Machine Learning.
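As a sketch of how the `config.json` and SDK v2 terms above fit together, the example below validates a workspace configuration file and then hands it to the SDK. The helper names `load_workspace_config` and `connect` are hypothetical; the three keys checked are the ones a workspace `config.json` normally carries, and `connect` assumes the `azure-ai-ml` and `azure-identity` packages are installed and that you are signed in to Azure.

```python
import json
from pathlib import Path

# Keys expected in a workspace config.json file.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def load_workspace_config(path="config.json"):
    """Read config.json and fail early if a required key is missing."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config.json is missing keys: {missing}")
    return config


def connect(path="config.json"):
    """Return an MLClient for the workspace described by config.json.

    The imports are deferred so the validation helper above can be used
    on machines where the Azure SDK is not installed.
    """
    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential

    load_workspace_config(path)  # surface a clear error before calling Azure
    return MLClient.from_config(credential=DefaultAzureCredential(), path=path)
```

Validating the file first is optional, but it turns a confusing authentication or lookup failure into an immediate message about the missing key.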
Official Materials and Guidance
This page is built from official Microsoft materials and the ExamPal shared release pack: the shared syllabus, topic tree, terminology pack, free pack, and premium pack.
- Guidance: Microsoft Learn study guide, practice assessment, sandbox
- Domain outline: Design/prepare ML solution 20-25%; Explore data/run experiments 20-25%; Train/deploy models 25-30%; Optimize language models for AI apps 25-30%.