Question 31
Domain 4: NVIDIA Platform Implementation and Production Operations
What rate limiting implementation works best?
Correct answer: A
Explanation
Per-API token bucket rate limiters are well suited because they allow bursts while enforcing an average request rate, which is the standard behavior of token bucket algorithms. Adding request queuing and priority queues ensures important requests are served first, and adaptive rate adjustment based on API responses lets the system respond to throttling signals and avoid repeated limit violations.
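To make the burst-versus-average behavior concrete, here is a minimal sketch of a per-API token bucket in Python. It is an illustration under stated assumptions, not a reference implementation: the class name `TokenBucket`, the `limiters` mapping, the API names, and the rate and capacity values are all hypothetical, and the sketch is single-threaded.

```python
import time


class TokenBucket:
    """Minimal token bucket: refills at `rate` tokens/second, holds at most `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # sustained average rate (tokens per second)
        self.capacity = float(capacity)  # maximum burst size
        self.clock = clock               # injectable clock, useful for testing
        self.tokens = float(capacity)    # start full so an initial burst is allowed
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last check, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self, cost=1.0):
        # Return True and spend tokens if the call may proceed now, else False.
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# One bucket per API, so a busy API cannot consume another API's budget.
# The API names and limits below are hypothetical.
limiters = {
    "inference": TokenBucket(rate=5, capacity=10),
    "embeddings": TokenBucket(rate=20, capacity=40),
}
```

A caller would check `limiters["inference"].try_acquire()` before each request and queue the request when it returns `False`. Keeping a separate bucket per API is what distinguishes this from a single global limiter, which would force every API down to the strictest limit.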
Why each option is right or wrong
A. Implement per-API token bucket rate limiters with request queuing, priority queues for important requests, and adaptive rate adjustment based on API responses.
Per-API token bucket control is the appropriate mechanism here because token bucket metering, of the kind defined in RFC 2697 and RFC 2698, permits short bursts while enforcing a sustained average rate, which matches API throttling behavior better than a fixed-window cap. Adding a queue with priority handling is justified where some calls are latency-sensitive. Adaptive backoff based on HTTP 429 (Too Many Requests) responses or Retry-After headers is the standard response to server-side rate limiting signals, and it prevents repeated violations and wasted retries.
B. Add fixed delays between all API calls (1 second per call).
C. Catch rate limit errors and retry after delay.
D. Use a single global rate limiter for all APIs.
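The other two parts of option A, priority queuing and adaptive rate adjustment, can be sketched in a similar way. The Python below is a hedged illustration only. `PriorityRequestQueue` and `AdaptiveRate` are hypothetical names, the halve-on-429 and additive-recovery policy is an assumption chosen for simplicity rather than something the question prescribes, and `Retry-After` is assumed to carry a number of seconds (it can also be an HTTP date, which this sketch does not parse).

```python
import heapq
import itertools


class PriorityRequestQueue:
    """Lower `priority` number is served first; ties keep arrival (FIFO) order."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that preserves insertion order

    def put(self, priority, request):
        heapq.heappush(self._heap, (priority, next(self._seq), request))

    def get(self):
        _, _, request = heapq.heappop(self._heap)
        return request

    def __len__(self):
        return len(self._heap)


class AdaptiveRate:
    """Assumed policy: halve the rate on HTTP 429, recover by `step` on success."""

    def __init__(self, rate, min_rate=0.1, max_rate=None, step=0.5):
        self.rate = float(rate)
        self.min_rate = float(min_rate)
        self.max_rate = float(max_rate if max_rate is not None else rate)
        self.step = float(step)

    def on_response(self, status, retry_after=None):
        """Update the rate and return how many seconds to pause before the next call."""
        if status == 429:
            self.rate = max(self.min_rate, self.rate / 2)
            # Prefer the server's hint; otherwise wait one interval at the new rate.
            if retry_after is not None:
                return float(retry_after)
            return 1.0 / self.rate
        # Any non-429 response is treated as success: recover slowly, up to the cap.
        self.rate = min(self.max_rate, self.rate + self.step)
        return 0.0
```

In use, latency-sensitive calls would be enqueued with a low priority number (for example `put(0, request)`) and batch work with a higher one, while the value returned by `on_response` and the updated `rate` feed back into the per-API limiter. This also shows why option C alone is weaker: retrying after a delay reacts to errors but never lowers the send rate that caused them.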