NCA-GENL Practice Q27

A. Model accuracy only

B. Balancing performance, latency, and computational costs

In production deployments, the governing constraint is not a single metric but the joint optimization of model quality, response time, and infrastructure spend. The practical target is to keep inference within an acceptable latency budget while avoiding unnecessary GPU/CPU utilization. The correct choice is therefore the one that reflects the standard engineering tradeoff among accuracy, latency, and compute cost, rather than maximizing any one dimension in isolation.

C. Using the largest available model

D. Maximum throughput regardless of cost
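
The tradeoff described under option B can be illustrated with a minimal Python sketch. It is not an official method: the `Candidate` type, the `pick_deployment` helper, and all scores, latencies, and prices are hypothetical values chosen for illustration. The idea is to treat latency and cost as budgets, then pick the highest-quality model that fits both.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """A hypothetical deployment option; all figures are illustrative."""
    name: str
    quality: float               # evaluation score in [0, 1]
    p95_latency_ms: float        # 95th-percentile response time
    cost_per_1k_requests: float  # infrastructure spend, arbitrary currency


def pick_deployment(
    candidates: Sequence[Candidate],
    max_latency_ms: float,
    max_cost_per_1k: float,
) -> Optional[Candidate]:
    """Return the best-quality candidate that meets both budgets, or None."""
    feasible = [
        c for c in candidates
        if c.p95_latency_ms <= max_latency_ms
        and c.cost_per_1k_requests <= max_cost_per_1k
    ]
    if not feasible:
        return None
    # Highest quality wins; on a tie, prefer the cheaper option.
    return max(feasible, key=lambda c: (c.quality, -c.cost_per_1k_requests))


# Made-up numbers: the largest model scores best but is slow and expensive.
CANDIDATES = [
    Candidate("small", quality=0.78, p95_latency_ms=120.0, cost_per_1k_requests=0.40),
    Candidate("medium", quality=0.86, p95_latency_ms=350.0, cost_per_1k_requests=1.10),
    Candidate("large", quality=0.90, p95_latency_ms=1400.0, cost_per_1k_requests=4.50),
]

choice = pick_deployment(CANDIDATES, max_latency_ms=500.0, max_cost_per_1k=2.0)
print(choice.name if choice else "no candidate fits the budgets")
```

With these assumed budgets the sketch selects the mid-sized model: the largest model (option C) breaks the latency and cost limits, while ranking on accuracy alone (option A) or throughput alone (option D) ignores the other two constraints.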

Question 27

Explanation

Why each option is right or wrong