Designing and Implementing a Data Science Solution on Azure Study Guide

Use the domain outline below to connect each exam area (designing and preparing a machine learning solution, exploring data and running experiments, training and evaluating models, and deploying and operationalizing machine learning solutions) to scenario-based questions and explanations.

How the Exam Is Structured

Designing and Implementing a Data Science Solution on Azure (DP-100) validates your ability to design and prepare a machine learning solution, explore data and run experiments, train and evaluate models, and deploy and operationalize machine learning solutions. The ExamPal practice bank includes 175 premium questions and 40 free questions mapped across the official blueprint.

Domain	Weight	Focus
Domain 1: Design and prepare a machine learning solution	20%	Task 1.1: Design an Azure Machine Learning workspace solution; Select workspace architecture
Domain 2: Explore data and run experiments	25%	Task 2.1: Ingest and profile data; Load data into tools
Domain 3: Train and evaluate models	20%	Task 3.1: Select evaluation metrics for model type; Use classification metrics
Domain 4: Deploy and operationalize machine learning solutions	20%	Task 4.1: Prepare models for deployment; Register models and dependencies
Domain 5: Monitor, retrain, and manage ML lifecycle	15%	Task 5.1: Monitor deployed models and endpoints; Track service performance

20% of exam

Domain 1: Design and prepare a machine learning solution

Covers the foundational Azure Machine Learning workspace, security, compute, environment, and data setup needed to build ML solutions. This domain emphasizes selecting the right workspace architecture and resources, managing access and governance, and preparing reusable development assets for experiments and pipelines.

Task 1.1: Design an Azure Machine Learning workspace solution

Select workspace architecture

Plan supporting Azure resources

Choose implementation interface

Task 1.2: Configure security, access, and governance

Configure role-based access control

Manage secrets and keys securely

25% of exam

Domain 2: Explore data and run experiments

Covers data ingestion, preparation, splitting, training, tuning, and experiment tracking. This domain focuses on the practical workflow of preparing data, running models, and comparing results in Azure Machine Learning.

Task 2.1: Ingest and profile data

Load data into tools

Examine schema and statistics

Identify data quality issues
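The profiling steps above (load data, examine schema and statistics, identify quality issues) can be sketched with the Python standard library alone. This is a minimal illustration, not an Azure Machine Learning API: the sample CSV, the column names, and the `profile` helper are all hypothetical.

```python
import csv
import io
from statistics import mean

# Hypothetical sample data; empty fields stand for missing values.
SAMPLE_CSV = (
    "age,income,city\n"
    "34,52000,Seattle\n"
    ",61000,Austin\n"
    "50,,Seattle\n"
    "29,48000,\n"
)

def _is_number(text):
    """Return True when the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def profile(csv_text):
    """Report missing count, inferred type, and mean for each column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    report = {}
    for column in rows[0]:
        present = [r[column] for r in rows if r[column] != ""]
        is_numeric = bool(present) and all(_is_number(v) for v in present)
        report[column] = {
            "missing": len(rows) - len(present),
            "type": "numeric" if is_numeric else "text",
            "mean": mean([float(v) for v in present]) if is_numeric else None,
        }
    return report

report = profile(SAMPLE_CSV)
```

A report like this surfaces missing values and columns whose inferred type does not match expectations before any modeling starts.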

Task 2.2: Prepare and transform data for modeling

Clean missing or invalid values

Encode categorical variables
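As a rough sketch of the two preparation steps listed above, the following plain-Python example fills missing numeric values with the column mean and one-hot encodes a categorical column. The records, column names, and helper functions are hypothetical, and mean imputation is only one of several possible strategies.

```python
from statistics import mean

# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 34, "plan": "basic"},
    {"age": None, "plan": "premium"},
    {"age": 50, "plan": "basic"},
]

def impute_mean(records, column):
    """Replace missing values in a numeric column with the column mean."""
    observed = [r[column] for r in records if r[column] is not None]
    fill = mean(observed)
    return [
        {**r, column: fill if r[column] is None else r[column]}
        for r in records
    ]

def one_hot(records, column):
    """Expand a categorical column into one 0/1 indicator per category."""
    categories = sorted({r[column] for r in records})
    encoded = []
    for r in records:
        out = {k: v for k, v in r.items() if k != column}
        for category in categories:
            out[f"{column}_{category}"] = 1 if r[column] == category else 0
        encoded.append(out)
    return encoded

prepared = one_hot(impute_mean(rows, "age"), "plan")
```

In practice the fill value and the category list should be computed on the training split only and then reused on validation and test data, so that no information leaks across splits.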

20% of exam

Domain 3: Train and evaluate models

Covers selecting evaluation metrics, diagnosing model fit issues, interpreting model behavior, improving performance, and assessing responsible AI considerations. This domain focuses on evaluating model quality and trustworthiness before deployment.

Task 3.1: Select evaluation metrics for model type

Use classification metrics

Use regression metrics

Use clustering metrics

Align metrics with business goals
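For the classification case, the common metrics all derive from the four confusion-matrix counts. The sketch below computes them in plain Python; the labels and the `classification_metrics` helper are illustrative assumptions, not part of any Azure ML library.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical ground truth and predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

Which metric to prioritize depends on the business goal: recall matters most when a missed positive is costly, while precision matters most when a false alarm is costly.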

Task 3.2: Diagnose underfitting and overfitting

Compare training and validation results
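Comparing training and validation scores can be reduced to a simple heuristic, sketched below. The function name and both thresholds (`min_score`, `max_gap`) are assumptions chosen for illustration; suitable values depend on the metric and the problem.

```python
def diagnose_fit(train_score, val_score, min_score=0.7, max_gap=0.1):
    """Classify model fit from training and validation scores.

    Scores are assumed to be "higher is better" (for example accuracy).
    The thresholds are illustrative, not standard values.
    """
    if train_score < min_score and val_score < min_score:
        # Poor on both splits: the model is too simple or undertrained.
        return "underfitting"
    if train_score - val_score > max_gap:
        # Strong on training data but much weaker on unseen data.
        return "overfitting"
    return "reasonable fit"

results = {
    "deep tree": diagnose_fit(0.99, 0.72),
    "linear baseline": diagnose_fit(0.60, 0.58),
    "tuned model": diagnose_fit(0.88, 0.85),
}
```

Typical remedies differ by diagnosis: regularization, more data, or a simpler model for overfitting, and richer features or a more expressive model for underfitting.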

20% of exam

Domain 4: Deploy and operationalize machine learning solutions

Covers preparing models for deployment, serving real-time and batch inference, managing inference environments, and integrating deployed models with applications. This domain emphasizes operational readiness, endpoint configuration, and deployment lifecycle management.

Task 4.1: Prepare models for deployment

Create scoring scripts

Package inference assets
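Azure Machine Learning scoring scripts follow an `init()` / `run()` convention: `init()` runs once when the deployment starts, and `run()` handles each request. The sketch below shows that shape with a stand-in model so it can run locally. `ThresholdModel` and the `"data"` key in the request payload are assumptions for illustration; a real script would load the registered model files instead.

```python
import json
import os

model = None

class ThresholdModel:
    """Hypothetical stand-in for a trained model."""

    def predict(self, rows):
        return [1 if sum(row) > 1.0 else 0 for row in rows]

def init():
    """Called once at startup; load the model into a global."""
    global model
    # In a deployment, registered model files live under the directory
    # named by the AZUREML_MODEL_DIR environment variable.
    model_dir = os.getenv("AZUREML_MODEL_DIR", ".")
    # A real script would deserialize the model from model_dir here.
    # This sketch ignores model_dir and builds a stand-in instead.
    model = ThresholdModel()

def run(raw_data):
    """Called per request; parse the JSON payload and return predictions."""
    rows = json.loads(raw_data)["data"]  # "data" key is an assumed schema
    return model.predict(rows)

# Local smoke test of the script.
init()
predictions = run(json.dumps({"data": [[0.2, 0.3], [0.9, 0.8]]}))
```

Testing the script locally like this, before packaging it with its environment, catches payload-parsing and model-loading errors early.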

Task 4.2: Deploy real-time inference endpoints

Deploy to online or Kubernetes targets

Select deployment settings

15% of exam

Domain 5: Monitor, retrain, and manage ML lifecycle

Covers monitoring deployed services, detecting drift and degradation, automating retraining, managing versioned assets, and supporting collaboration practices. This domain focuses on sustaining ML solutions in production with governance, reproducibility, and MLOps discipline.

Task 5.1: Monitor deployed models and endpoints

Track service performance

Collect logs and diagnostics

Emit custom metrics

Task 5.2: Detect data drift and model degradation

Monitor incoming data drift

Compare production and baseline data
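One generic way to compare production data against a baseline is the population stability index (PSI), sketched below in plain Python. This is a general-purpose technique offered as an illustration, not necessarily the metric Azure Machine Learning monitoring computes. The bin count, the floor value, and the sample data are assumptions.

```python
import math

def psi(baseline, production, bins=4):
    """Population stability index of one numeric feature.

    Bins are equal-width over the baseline range; production values
    outside that range are clamped into the first or last bin.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width) if width else 0
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each proportion to avoid log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    b = proportions(baseline)
    p = proportions(production)
    return sum((pi - bi) * math.log(pi / bi) for bi, pi in zip(b, p))

# Hypothetical feature values at training time and in production.
baseline = [1, 2, 3, 4, 5, 6, 7, 8]
production = [5, 6, 7, 8, 7, 8, 8, 8]

no_drift = psi(baseline, baseline)
drift = psi(baseline, production)
```

A common rule of thumb treats a PSI above roughly 0.2 as a significant shift worth investigating, though the right threshold depends on the feature and the use case.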

Key Terms to Know

These terms are loaded from the shared terminology pack and appear across the question explanations.

Azure ML Python SDK v2: The version 2 Python software development kit used to interact programmatically with Azure Machine Learning resources.
Azure ML component: A reusable, versionable unit of work in Azure ML pipelines that encapsulates code, environment, inputs, and outputs.
Azure ML workspace: The central Azure Machine Learning resource that stores assets, runs, compute targets, and configuration for ML projects.
Azure Machine Learning Designer: A visual interface in Azure ML used to build, configure, and run machine learning pipelines without extensive coding.
Conda configuration file: A YAML file that specifies Conda packages and dependencies required for an ML environment.
Environment: An Azure ML SDK v2 class used to define the software environment, including dependencies, for training or inference.
Import Data: A Designer component used to bring external data, such as a CSV file from a website, into a pipeline.
MLOps: A set of practices for automating, managing, deploying, monitoring, and retraining machine learning systems.
MLTable: An Azure Machine Learning data asset format used to define tabular or file-based datasets for ML workflows.
MLflow: An open-source platform for tracking experiments, logging metrics and artifacts, and managing ML lifecycle tasks.
ParallelRunStep: An Azure ML pipeline step (from the SDK v1 pipeline packages) used for scalable parallel batch inference over large datasets.
YAML: A human-readable configuration format commonly used to define Azure ML components and pipeline settings.
artifact: A file or folder produced or used by an ML run, such as images, models, or output datasets.
autoscaling: The ability of a compute resource to automatically increase or decrease the number of nodes based on workload demand.
binary classification: A supervised learning task in which a model predicts one of two possible classes.
compute cluster: An Azure ML compute target made up of multiple nodes that can run training or batch workloads.
conda_file: A parameter used when creating an Azure ML environment from a Conda YAML specification.
config.json: A workspace configuration file that stores connection details needed for SDK code to connect to Azure Machine Learning.
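To make the `config.json` entry concrete, the sketch below writes and validates a workspace configuration file using only the standard library. The three keys shown are the ones the file conventionally carries; the placeholder values and the helper functions are hypothetical.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")

def write_config(directory, subscription_id, resource_group, workspace_name):
    """Write a workspace config.json into the given directory."""
    path = Path(directory) / "config.json"
    payload = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path.write_text(json.dumps(payload, indent=2))
    return path

def load_config(path):
    """Read config.json and fail early if a required key is missing."""
    config = json.loads(Path(path).read_text())
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"config.json is missing keys: {missing}")
    return config

with tempfile.TemporaryDirectory() as tmp:
    cfg_path = write_config(
        tmp, "<subscription-id>", "<resource-group>", "<workspace-name>"
    )
    config = load_config(cfg_path)

# With the azure-ai-ml and azure-identity packages installed, SDK v2 code
# can read such a file via:
#   MLClient.from_config(credential=DefaultAzureCredential())
```

Keeping connection details in this file, rather than hard-coding them, lets the same script run against different workspaces.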

Official Materials and Guidance

This page is built from official Microsoft materials and the ExamPal shared release pack, which comprises the shared syllabus, topic tree, terminology pack, free pack, and premium pack.

- Guidance: Microsoft Learn study guide, practice assessment, sandbox
- Domain outline: Design/prepare ML solution 20-25%; Explore data/run experiments 20-25%; Train/deploy models 25-30%; Optimize language models for AI apps 25-30%. Note that this outline groups the content into four domains, whereas the practice bank table above uses five.
