Question 4

Domain 1: Data Preparation for Machine Learning (ML)

Case study - An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3. The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.

Before the ML engineer trains the model, the ML engineer must resolve the issue of the imbalanced data. Which solution will meet this requirement with the LEAST operational effort?

A. Use Amazon Athena to identify patterns that contribute to the imbalance. Adjust the dataset accordingly.
B. Use Amazon SageMaker Studio Classic built-in algorithms to process the imbalanced dataset.
C. Use AWS Glue DataBrew built-in features to oversample the minority class.
D. Use the Amazon SageMaker Data Wrangler balance data operation to oversample the minority class.

Explanation

Why each option is right or wrong
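
For context, "oversampling the minority class" (the technique named in options C and D) means adding minority-class rows until the classes are closer to balanced. The sketch below illustrates only the general idea of random oversampling in plain Python. It is not how any AWS service implements it, and the function name `random_oversample`, the `is_fraud` label, and the toy transaction rows are all hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, label_key, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group rows by their class label.
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)

    # Every class is grown to the size of the largest one.
    target = max(len(group) for group in by_class.values())

    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap (k may be 0).
        balanced.extend(rng.choices(group, k=target - len(group)))

    rng.shuffle(balanced)
    return balanced


# Toy data (made up for illustration): 95 legitimate rows, 5 fraudulent rows.
transactions = (
    [{"amount": 10 * i, "is_fraud": 0} for i in range(95)]
    + [{"amount": 900 + i, "is_fraud": 1} for i in range(5)]
)

balanced = random_oversample(transactions, "is_fraud")
counts = Counter(row["is_fraud"] for row in balanced)
print(counts[0], counts[1])  # both classes now have 95 rows
```

Random duplication is the simplest form of oversampling. Synthetic approaches such as SMOTE generate new minority-class points instead of copying existing ones, which can reduce overfitting to the duplicated rows.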