ML Associate Exam Prep

ML Associate Exam Glossary - 51 Terms

This glossary covers terminology for the Databricks Certified Machine Learning Associate exam. Use these definitions with the study guide and practice questions.

A

alias
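A mutable, named reference to a specific registered model version (for example, champion or challenger); reassigning the alias is how a challenger model is promoted to champion.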
AutoML
A Databricks capability that assists with model and feature selection, intended to speed up the model development process.

B

batch inference
Inference performed in batches rather than one record at a time.
bias-variance tradeoff
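The tradeoff between error from models that are too simple and underfit (bias) and error from models that are too flexible and overfit (variance); balancing the two affects how well a model generalizes.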

C

challenger model
A candidate model evaluated against the champion model in a champion-challenger setup.
champion model
The preferred production model in a champion-challenger setup.
class imbalance
A training-data condition where one class is represented much less frequently than another, potentially biasing the model toward the majority class.
cost-sensitive learning
A training approach that assigns higher misclassification cost to a minority class to mitigate class imbalance.
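One common way to choose the misclassification costs is to weight each class inversely to its frequency. The sketch below is a plain-Python illustration of that heuristic, n_samples / (n_classes * class_count), which is the same formula scikit-learn uses for class_weight="balanced". The function name and the 90/10 label split are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical imbalanced dataset: 90 majority-class rows, 10 minority-class rows.
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)

# The minority class receives the larger weight, so its errors cost more.
print(weights)
```

Weights like these are usually passed to the training algorithm rather than applied by hand.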
cross-validation
A model evaluation technique that fits and scores a model on several different folds of the data; the exam compares it with train-validation split and asks how many models are trained when it is used during tuning.
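To make the fold mechanics of cross-validation concrete, here is a minimal sketch that generates k-fold train and validation indices in plain Python. The helper name is hypothetical, and the folds are contiguous and unshuffled for readability; library implementations typically offer shuffling and stratification as well.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        val_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, val_idx
        start = stop


folds = list(k_fold_indices(n_samples=10, k=3))
for train_idx, val_idx in folds:
    # Each row is used for validation exactly once across the folds.
    print(len(train_idx), val_idx)
```

Each fold requires its own model fit, which is why cross-validation costs roughly k times as much as a single train-validation split.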

D

Databricks
A platform used in the exam context to perform machine learning tasks and work with tools such as AutoML, Unity Catalog, MLflow, and Delta Live Tables.
Databricks Certified Machine Learning Associate
A Databricks certification exam that assesses the ability to use Databricks for basic machine learning tasks, including data exploration, feature engineering, model training, tuning, evaluation, and deployment.
dbutils
A Databricks utilities module; the exam references it for obtaining data summaries (for example, dbutils.data.summarize).
Delta Live Tables
A Databricks data pipeline feature used in the exam for data management and for streaming inference.

F

F1
The harmonic mean of precision and recall, a classification metric used in the exam.
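F1 is the harmonic mean of precision and recall. The sketch below computes it from scratch for a binary problem; the function name and the toy labels are hypothetical, and in practice a library metric would normally be used instead.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Compute F1 for the positive class from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
score = f1_score_binary(y_true, y_pred)
print(round(score, 4))
```

Because the many true negatives do not enter the formula, F1 is often preferred over accuracy when classes are imbalanced.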
feature engineering
The process of creating, transforming, and preparing input features for machine learning models.
feature store
A Databricks feature management capability used to create feature store tables, write data to them, and train or score models using their features.
feature store table
A table in Unity Catalog used to store and govern features for machine learning models; it can be created, written to, and used for training or scoring models.
fmin
A Hyperopt operation used to tune a model's hyperparameters.

G

GridSearchCV
A scikit-learn tool for exhaustive hyperparameter search with cross-validation.
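A frequent point of confusion is how many model fits an exhaustive grid search with cross-validation performs. The sketch below counts them in plain Python for a hypothetical SVM-style grid (the values for C and kernel are assumptions chosen for illustration). The extra fit at the end reflects GridSearchCV's default refit=True behavior, which retrains the best configuration on the full training set.

```python
from itertools import product

# Hypothetical hyperparameter grid: 3 values of C times 2 kernels.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
n_folds = 5

# Exhaustive search tries every combination of the listed values.
combinations = list(product(*param_grid.values()))

# Each combination is fit once per cross-validation fold.
cv_fits = len(combinations) * n_folds

# With refit enabled, the best combination is fit once more on all training data.
total_fits_with_refit = cv_fits + 1

print(len(combinations), cv_fits, total_fits_with_refit)
```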

H

Hyperopt
A hyperparameter tuning library referenced in the exam, including its fmin operation.

I

IQR
Interquartile range, used in the exam as one method for removing outliers from a Spark DataFrame.
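The IQR rule keeps values between Q1 - 1.5 * IQR and Q3 + 1.5 * IQR, where IQR = Q3 - Q1. The sketch below shows the rule on a plain Python list using the standard library; the sample values and the function name are hypothetical. On a Spark DataFrame the quartiles would typically come from a method such as approxQuantile, followed by a filter on the same bounds.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return the (low, high) bounds of the IQR outlier rule."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical measurements with one obvious outlier.
values = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
low, high = iqr_bounds(values)

# Keep only the rows that fall inside the bounds.
filtered = [v for v in values if low <= v <= high]
print(filtered)
```

The multiplier 1.5 is a convention rather than a fixed rule; a larger k removes fewer points.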

L

Log Loss
A classification metric used in the exam that scores predicted probabilities and penalizes confident wrong predictions heavily.
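Log loss is the average negative log-likelihood of the true labels under the predicted probabilities. The sketch below implements the binary case in plain Python; the function name and the clipping constant eps are illustrative choices, not taken from any particular library.

```python
import math


def log_loss_binary(y_true, y_prob, eps=1e-15):
    """Average negative log-likelihood for binary labels in {0, 1}."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        # Clip probabilities away from 0 and 1 so log() stays finite.
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


# Same labels, scored with confident-correct and confident-wrong probabilities.
confident_right = log_loss_binary([1, 0], [0.9, 0.1])
confident_wrong = log_loss_binary([1, 0], [0.1, 0.9])
print(confident_right < confident_wrong)
```

Lower is better, and a confident wrong prediction costs far more than a confident right one gains.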
log scale transformation
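A transformation that replaces values with their logarithms, typically to reduce right skew or compress a wide range of magnitudes; the exam asks when it is appropriate for the data or scenario.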
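As an illustration of the log scale transformation, the sketch below applies log1p (the logarithm of 1 + x, which is defined at zero) to a hypothetical right-skewed list of incomes and then inverts it with expm1. The data and variable names are assumptions.

```python
import math

# Hypothetical right-skewed feature: one value dwarfs the rest.
incomes = [20_000, 35_000, 50_000, 80_000, 5_000_000]

# log1p(x) = log(1 + x), which stays defined when x is 0.
log_incomes = [math.log1p(x) for x in incomes]

# expm1 is the inverse, useful for mapping predictions back to the original scale.
restored = [math.expm1(v) for v in log_incomes]

raw_ratio = max(incomes) / min(incomes)
log_ratio = max(log_incomes) / min(log_incomes)
print(raw_ratio, round(log_ratio, 2))
```

The spread between the largest and smallest value shrinks sharply after the transformation, which is the usual motivation for applying it to skewed features or targets.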

M

MAE
Mean absolute error, a regression metric used in the exam.
ML runtimes
Databricks machine learning runtime environments whose advantages are tested in the exam.
MLflow
A machine learning lifecycle tool referenced in the exam for logging metrics, artifacts, and models, inspecting runs in the UI, and registering models through its client API.
MLflow Client API
The programmatic interface used to identify the best run, log metrics, artifacts, and models, and register models in the Unity Catalog registry.
MLflow Run
A single execution record in MLflow where metrics, artifacts, and models can be logged manually.
MLflow UI
The user interface in MLflow where information about runs and related model-development details can be viewed.
MLOps
A machine learning operations strategy whose best practices are identified in the exam.
model endpoint
A deployed endpoint used to serve a model for inference.

O

offline feature table
A feature table designed for offline use cases; the exam contrasts it with online feature tables.
one-hot encoding
An encoding method that represents each category of a categorical feature as its own binary column; the exam asks when it is appropriate or not appropriate for certain model types or datasets.
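One-hot encoding turns each distinct category into its own 0/1 column. The sketch below does this by hand for a small hypothetical color column; the function name is an assumption, and library encoders add features such as handling unseen categories.

```python
def one_hot_encode(values):
    """Return (categories, rows), where each row has a single 1."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows


categories, encoded = one_hot_encode(["red", "green", "blue", "green"])
print(categories)
for row in encoded:
    print(row)
```

The number of output columns equals the number of distinct categories, so a high-cardinality feature produces a wide, sparse matrix. That is one commonly cited reason the encoding may be a poor fit for some datasets and for tree-based models.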
online feature table
A feature table designed for online use cases; the exam contrasts it with offline feature tables.

P

pandas
A Python data library used in the exam to perform batch inference.

R

R-squared
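The coefficient of determination, a regression metric used in the exam that measures the share of variance in the target explained by the model.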
realtime inference
Inference performed with low latency through a deployed model queried at runtime.
RMSE
Root mean squared error, a regression metric used in the exam.
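The regression metrics in this glossary (MAE, R-squared, and RMSE) can be computed side by side from the same residuals. The sketch below does so in plain Python on hypothetical values; the function name is an assumption.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R-squared for paired true/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1 - ss_res / ss_tot,
    }


# Hypothetical targets and predictions; the last prediction is off by 3.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 12.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

RMSE squares the errors before averaging, so the single large miss raises it above MAE. MAE and RMSE are in the units of the target, while R-squared is unitless.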
ROC/AUC
The area under the receiver operating characteristic (ROC) curve, a classification metric used in the exam that summarizes how well predicted scores rank positives above negatives across all thresholds.
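The area under the ROC curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below uses that pairwise interpretation directly, counting ties as half. It is quadratic in the number of examples and meant only for illustration; the function name and the toy scores are hypothetical.

```python
def roc_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Toy scores: one positive (0.35) is ranked below one negative (0.4).
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, y_score)
print(auc)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.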

S

scikit-learn
A Python machine learning library that candidates are expected to know at a working level.
SimpleImputer
A scikit-learn tool referenced for imputing missing values.
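Conceptually, mean imputation (the default strategy of SimpleImputer) learns the mean of the observed values in a column and substitutes it for every missing entry. The plain-Python sketch below mirrors that idea without calling scikit-learn; the function name and the sample column are hypothetical.

```python
def mean_impute(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


# Hypothetical column with two missing values.
ages = [20.0, None, 40.0, None, 60.0]
imputed = mean_impute(ages)
print(imputed)
```

In a real pipeline the fill value is usually learned from the training set only and then reused on validation and test data, so that no information leaks from held-out rows.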
Spark DataFrame
A Spark data structure used in the exam for computing summary statistics, removing outliers, and other data-processing tasks.
Spark UDF
A Spark user-defined function used in the exam to apply logic within a streaming pipeline.
SparkML
A machine learning library for Spark that candidates are expected to know at a working level.
streaming inference
Inference performed on a stream of incoming data, referenced in the exam with Delta Live Tables.
Structured Streaming
A Spark streaming framework referenced as an alternative to Delta Live Tables for streaming workloads.
SVM
Support Vector Machine, a model type referenced in the exam question about grid search and cross-validation.

T

train-validation split
A data-splitting approach that holds out a single validation set to evaluate a model fit on the remaining training data; the exam compares it against cross-validation.
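A train-validation split can be sketched in a few lines: shuffle once, then hold out a fixed fraction. The function name, the 20% fraction, and the seed below are illustrative assumptions.

```python
import random


def train_validation_split(rows, validation_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and hold out a fraction for validation."""
    rng = random.Random(seed)
    shuffled = rows[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]


rows = list(range(100))
train, val = train_validation_split(rows)
print(len(train), len(val))
```

Only one model is fit per hyperparameter configuration with this approach, which makes it cheaper than k-fold cross-validation but more sensitive to which rows happen to land in the validation set.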

U

Unity Catalog
A Databricks governance and data management layer used in the exam for feature store tables and model registry management at the account level.
Unity Catalog registry
The model registry in Unity Catalog used to register models; the exam contrasts it with the workspace registry.

W

workspace registry
The Databricks model registry at the workspace level, contrasted in the exam with the Unity Catalog registry.
