NCA-GENL Practice Q7

A. Reduces model size

B. Optimizes throughput by combining requests

Triton Inference Server’s dynamic batching feature queues incoming inference requests on the server and combines them into batches at runtime. Executing one larger batch instead of many individual requests improves hardware utilization and raises throughput (requests per second) under load. The Triton documentation gives this as the purpose of dynamic batching: to improve throughput by combining multiple requests into a single batch before execution, rather than processing each request separately.

C. Improves model accuracy

D. Reduces latency for all requests
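
The batching behavior described in the explanation under option B can be illustrated with a small simulation. This is a toy sketch, not Triton's scheduler or API: `Request`, `form_batches`, and `total_execution_us` are hypothetical names, and the cost model (a fixed overhead per execution plus a per-request cost) is an assumption chosen only to show why fewer, larger batches can raise throughput. The `max_batch_size` and `max_queue_delay_us` parameters loosely mirror the idea behind Triton's batch-size and queue-delay settings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    request_id: int
    arrival_us: int  # arrival time in microseconds (illustrative)


def form_batches(
    requests: List[Request], max_batch_size: int, max_queue_delay_us: int
) -> List[List[Request]]:
    """Group requests into batches in arrival order.

    A batch is closed when it is full, or when the next request arrives
    more than max_queue_delay_us after the oldest request in the batch.
    """
    batches: List[List[Request]] = []
    current: List[Request] = []
    for req in sorted(requests, key=lambda r: r.arrival_us):
        batch_full = len(current) >= max_batch_size
        waited_too_long = bool(current) and (
            req.arrival_us - current[0].arrival_us > max_queue_delay_us
        )
        if current and (batch_full or waited_too_long):
            batches.append(current)
            current = []
        current.append(req)
    if current:
        batches.append(current)
    return batches


def total_execution_us(
    batches: List[List[Request]], fixed_overhead_us: int, per_request_us: int
) -> int:
    """Assumed cost model: each execution pays a fixed overhead once."""
    return sum(fixed_overhead_us + per_request_us * len(b) for b in batches)


# Seven requests: a burst of four, a pair, and one straggler.
requests = [Request(i, t) for i, t in enumerate([0, 20, 40, 60, 500, 510, 2000])]

batched = form_batches(requests, max_batch_size=4, max_queue_delay_us=100)
unbatched = form_batches(requests, max_batch_size=1, max_queue_delay_us=100)

print("batch sizes with batching:", [len(b) for b in batched])
print("executions without batching:", len(unbatched))
print("execution time, batched (us):", total_execution_us(batched, 1000, 100))
print("execution time, unbatched (us):", total_execution_us(unbatched, 1000, 100))
```

Under this assumed cost model the same seven requests need fewer executions when batched, so total execution time falls and throughput rises. The sketch also shows that a request can sit in the queue while its batch fills, so batching is a throughput optimization rather than a guarantee of lower latency for every request.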

Question 7

Explanation

Why each option is right or wrong