Question 11
Domain 1: Fundamentals of AI and ML
A company is developing an editorial assistant application that uses generative AI. During the pilot phase, usage is low and application performance is not a concern. The company cannot predict application usage after the application is fully deployed and wants to minimize application costs. Which solution will meet these requirements?
Correct answer: C
Explanation
Amazon Bedrock On-Demand Throughput fits unpredictable demand because it lets you pay only for model usage instead of reserving capacity. Since the company “cannot predict application usage” and wants to “minimize application costs,” this option avoids upfront commitment while meeting low pilot-phase needs.
Why each option is right or wrong
A. Use GPU-powered Amazon EC2 instances.
GPU-powered EC2 instances must be provisioned and managed by the company, and they are billed while they run, so low or idle usage still incurs cost.
B. Use Amazon Bedrock with Provisioned Throughput.
Provisioned Throughput is for predictable, sustained demand when reserving capacity makes sense.
C. Use Amazon Bedrock with On-Demand Throughput.
Amazon Bedrock On-Demand Throughput is the appropriate choice when traffic is uncertain because it has no provisioned-capacity commitment or minimum spend; you are billed only for the model invocations and tokens actually used. By contrast, Provisioned Throughput requires you to reserve model capacity for a fixed term, which adds cost and is unnecessary here because the pilot has low usage and performance is not a concern.
D. Use Amazon SageMaker JumpStart.
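SageMaker JumpStart helps you discover and deploy pre-trained models, but deployed models typically run on endpoint instances that you provision and pay for while they are running, so it is not the lowest-cost fit for unpredictable inference usage.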
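The cost reasoning behind options B and C can be illustrated with a rough sketch. Every price and token volume below is a hypothetical placeholder chosen for illustration, not an actual AWS rate, and the function names are invented for this example. The point is only the shape of the two billing models: on-demand cost scales with tokens processed, while reserved capacity costs the same whether or not it is used.

```python
# All prices below are hypothetical placeholders, not actual AWS rates.
PRICE_IN_PER_1K = 0.003      # assumed USD per 1,000 input tokens (on-demand)
PRICE_OUT_PER_1K = 0.015     # assumed USD per 1,000 output tokens (on-demand)
HOURLY_RATE_PER_UNIT = 20.0  # assumed USD per hour per reserved capacity unit
HOURS_PER_MONTH = 730        # approximate hours in one month


def on_demand_cost(input_tokens: int, output_tokens: int) -> float:
    """Pay-per-use: cost scales with the tokens actually processed."""
    return (
        (input_tokens / 1000) * PRICE_IN_PER_1K
        + (output_tokens / 1000) * PRICE_OUT_PER_1K
    )


def provisioned_cost(units: int = 1, hours: int = HOURS_PER_MONTH) -> float:
    """Reserved capacity: cost is fixed for the term, even when idle."""
    return units * HOURLY_RATE_PER_UNIT * hours


# Assumed low pilot volume: 2M input tokens and 0.5M output tokens per month.
pilot = on_demand_cost(input_tokens=2_000_000, output_tokens=500_000)
reserved = provisioned_cost()

print(f"on-demand (pilot volume): ${pilot:,.2f}")
print(f"provisioned (1 unit, 1 month): ${reserved:,.2f}")
```

Under these assumed numbers, the pilot's on-demand bill is a small fraction of the reserved-capacity bill. Reserved capacity would only start to pay off at a sustained, predictable volume, which the scenario explicitly rules out.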