Question 31
Domain 3: Implement Generative AI SolutionsYour company needs an Azure OpenAI deployment that guarantees maximum of 2-second response times for a high-volume customer-facing application processing 500 requests per minute. Which deployment type should you use?
Correct answer: C
Explanation
Provisioned Throughput Unit (PTU) is the right choice because it provides reserved capacity for Azure OpenAI, which is designed for predictable performance under heavy load. For a customer-facing app needing a maximum 2-second response time at 500 requests per minute, reserved throughput helps ensure consistent latency and avoids shared-capacity contention.
Why each option is right or wrong
A. Standard (S1) — pay-per-call
B. Global Standard — automatic routing
C. Provisioned Throughput Unit (PTU) — reserved capacity
Azure OpenAI’s Provisioned Throughput Unit (PTU) deployment is the reserved-capacity model intended for predictable, low-latency production traffic, whereas standard deployments are subject to shared-capacity variability and throttling. For a workload at 500 requests per minute with a hard 2-second latency target, the exam expects the choice that explicitly reserves model throughput rather than relying on best-effort capacity.
D. Serverless API — consumption-based