Question 26
Domain 3: Deployment and Orchestration of ML Workflows
A company has an existing Amazon SageMaker model (v1) on a production endpoint. The company develops a new model version (v2) and needs to test v2 in production before substituting v2 for v1. The company needs to implement a solution to minimize the risk of v2 generating incorrect output in production. The solution must prevent any disruption of production traffic during the change to v2. Which solution will meet these requirements?
Correct answer: D
Explanation
Amazon SageMaker shadow variants let you test a new model version in production by sending a copy of live inference requests to v2 while v1 continues to serve every response, so production traffic is not disrupted. Sampling 100% of the inference requests and storing the outputs in Amazon S3 gives complete data for comparing v2 with v1 before promoting v2 to production.
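The shadow setup described in option D can be sketched as the request body for the SageMaker `CreateEndpointConfig` API. This is a minimal sketch, not a definitive configuration: the model names, variant names, instance type, and S3 URI are hypothetical placeholders. It also assumes, based on our reading of the SageMaker documentation, that the shadow sampling rate is the shadow variant's weight divided by the production variant's weight (so equal weights mirror 100% of requests) and that data capture records the shadow variant's responses as well. The function only builds the dictionary, so it runs without AWS credentials.

```python
"""Sketch: keep v1 live and mirror requests to v2 as a shadow variant.

Hypothetical: model names, variant names, instance type, S3 URI.
Assumption: shadow sampling rate = shadow weight / production weight.
"""


def build_shadow_endpoint_config(config_name, prod_model, shadow_model,
                                 capture_s3_uri, instance_type="ml.m5.xlarge"):
    def variant(name, model):
        return {
            "VariantName": name,
            "ModelName": model,
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,
            # Equal weights on both variants -> shadow samples 100% (assumption above).
            "InitialVariantWeight": 1.0,
        }

    return {
        "EndpointConfigName": config_name,
        # Only the production variant's responses are returned to callers.
        "ProductionVariants": [variant("v1", prod_model)],
        # The shadow variant receives a copy of each sampled request.
        "ShadowProductionVariants": [variant("v2-shadow", shadow_model)],
        # Capture request and response payloads to S3 for offline comparison.
        "DataCaptureConfig": {
            "EnableCapture": True,
            "InitialSamplingPercentage": 100,
            "DestinationS3Uri": capture_s3_uri,
            "CaptureOptions": [{"CaptureMode": "Input"}, {"CaptureMode": "Output"}],
        },
    }


if __name__ == "__main__":
    cfg = build_shadow_endpoint_config(
        "shadow-test-config", "model-v1", "model-v2", "s3://example-bucket/capture/")
    print(cfg["ShadowProductionVariants"][0]["ModelName"])
    # With AWS credentials configured, the dict could be applied with boto3
    # ("prod-endpoint" is a hypothetical endpoint name):
    #   import boto3
    #   sm = boto3.client("sagemaker")
    #   sm.create_endpoint_config(**cfg)
    #   sm.update_endpoint(EndpointName="prod-endpoint",
    #                      EndpointConfigName=cfg["EndpointConfigName"])
```

Keeping the builder separate from the boto3 calls makes the configuration easy to review and test before anything touches the live endpoint.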
Why each option is right or wrong
A. Create a second production variant for v2. Assign 1% of the traffic to v2 and 99% of the traffic to v1. Collect all the output of v2 in an Amazon S3 bucket. If v2 performs as expected, switch all the traffic to v2.
Sending 1% of live traffic to v2 still exposes real users to unvalidated predictions.
B. Create a second production variant for v2. Assign 10% of the traffic to v2 and 90% of the traffic to v1. Collect all the output of v2 in an Amazon S3 bucket. If v2 performs as expected, switch all the traffic to v2.
Sending 10% of live traffic to v2 exposes even more real users to unvalidated predictions instead of isolating the test.
C. Deploy v2 to a new endpoint. Turn on data capturing for the production endpoint. Write a script to pass 100% of input data to v2. If v2 performs as expected, deactivate the v1 endpoint and direct the traffic to v2.
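Replaying captured data to a separate endpoint through a custom script adds operational overhead and is not true no-impact shadow testing. Cutting over to a new endpoint also requires clients to change the endpoint they call, which risks disrupting production traffic.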
D. Deploy v2 into a shadow variant that samples 100% of the inference requests. Collect all the output in an Amazon S3 bucket. If v2 performs as expected, promote v2 to production.
Amazon SageMaker shadow variants are designed for exactly this use case: they mirror live inference traffic to a candidate model while the existing production variant continues serving all responses, so the endpoint behavior for users is unchanged. In SageMaker, you can route 100% of the requests as shadow traffic to v2 and log the outputs to Amazon S3 for offline comparison; only after validating the results would you shift production traffic, avoiding any interruption during the test phase.
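The "if v2 performs as expected" step can be sketched as a simple promotion gate. This is a hypothetical illustration, not a SageMaker feature: it assumes the captured v1 and v2 responses have already been joined by request into `(v1_label, v2_label)` pairs, and the 0.98 agreement threshold is an arbitrary placeholder. Agreement with v1 is only one proxy; comparing v2 against ground-truth labels, when they are available, is another.

```python
"""Sketch: decide whether to promote v2 from paired shadow-test outputs.

Hypothetical: the join of v1/v2 outputs into pairs, the label format,
and the 0.98 threshold.
"""
from typing import Iterable, Tuple


def agreement_rate(pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of mirrored requests where v2 returned the same label as v1."""
    total = 0
    agree = 0
    for v1_label, v2_label in pairs:
        total += 1
        agree += v1_label == v2_label  # True counts as 1
    if total == 0:
        raise ValueError("no shadow results to evaluate")
    return agree / total


def should_promote(pairs, threshold=0.98):
    """Promote only if v2 agrees with v1 on at least `threshold` of requests."""
    return agreement_rate(pairs) >= threshold


def promotion_config(config_name, model_v2, instance_type="ml.m5.xlarge"):
    """Endpoint config with v2 as the only production variant and no shadow."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "v2",
            "ModelName": model_v2,
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,
            "InitialVariantWeight": 1.0,
        }],
    }


if __name__ == "__main__":
    sample = [("cat", "cat"), ("dog", "dog"), ("cat", "dog"), ("dog", "dog")]
    print(agreement_rate(sample))  # 0.75
    print(should_promote(sample))  # False
```

If the gate passes, one way to promote v2 is to create an endpoint config from `promotion_config(...)` and apply it to the existing endpoint with boto3's `update_endpoint`, so clients keep calling the same endpoint name.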