Question 2
Domain 6: Evaluation and Monitoring
A RAG app is accurate but slow and expensive because it retrieves too many chunks and uses a very large prompt for every question. What should the engineer tune first?
Correct answer: A
Explanation
The exam guide emphasizes tuning RAG efficiency by choosing the right retrieval and prompt settings: it lists “Apply a chunking strategy,” “Use tools and metrics to evaluate retrieval performance,” and “Use Databricks features to control LLM costs.” Retrieving fewer chunks and sending a smaller prompt lowers latency and cost, and measuring retrieval and answer quality before and after each change confirms that accuracy is preserved.
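To make the link between retrieved chunk count and prompt cost concrete, here is a minimal sketch. It is not a Databricks API: the helper names, the prompt template, and the `price_per_1k_tokens` parameter are hypothetical, and the whitespace word count is only a rough stand-in for a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    # Assumption: whitespace word count as a crude token proxy.
    # A real model tokenizer will report different numbers.
    return len(text.split())


def build_prompt(question: str, chunks: list[str], k: int) -> str:
    # Keep only the top-k retrieved chunks as context (hypothetical template).
    context = "\n\n".join(chunks[:k])
    return f"Context:\n{context}\n\nQuestion: {question}"


def prompt_cost(question: str, chunks: list[str], k: int,
                price_per_1k_tokens: float) -> tuple[int, float]:
    # Return (estimated prompt tokens, estimated input cost) for a given k.
    tokens = estimate_tokens(build_prompt(question, chunks, k))
    return tokens, tokens / 1000 * price_per_1k_tokens


if __name__ == "__main__":
    retrieved = ["alpha beta gamma"] * 10  # placeholder chunks, ranked by relevance
    for k in (2, 5, 10):
        tokens, cost = prompt_cost("What is the policy?", retrieved, k, 0.5)
        print(f"k={k}: ~{tokens} prompt tokens, ~{cost:.4f} cost units")
```

Under these assumptions, estimated prompt size grows roughly linearly with k, which is why retrieval count is a natural first lever when the app is already accurate.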
Why each option is right or wrong
A. Optimize retrieval count, chunk relevance, prompt size, and model choice while measuring quality
The exam guide’s RAG objectives point first to retrieval and context control: it explicitly tests “use tools and metrics to evaluate retrieval performance,” “select chunking strategy based on model & retrieval evaluation,” and “use Databricks features to control LLM costs.” When the app is already accurate but slow and expensive, the first lever is to cut unnecessary retrieved context and prompt tokens, then confirm with retrieval metrics that quality has not regressed, rather than changing the whole system blindly.
B. Add more unrelated chunks to improve coverage
Extra unrelated chunks increase context size, which usually raises latency and cost and adds retrieval noise.
C. Remove all evaluation so the app feels faster
Evaluation is needed to measure retrieval performance and quality; removing it hides regressions.
D. Force every user query through a manual review queue
Manual review is a governance workflow, not the first optimization for RAG retrieval cost or speed.
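The approach in option A, reducing retrieval count while measuring quality, can be sketched as a small sweep over candidate values of k. This is an illustrative sketch only: the function names, the evaluation-set format (ranked document IDs paired with labeled relevant IDs), and the recall target are assumptions, and recall@k stands in for whichever retrieval metric the team actually tracks.

```python
from statistics import mean


def recall_at_k(ranked_ids: list[str], relevant_ids: list[str], k: int) -> float:
    # Fraction of the labeled relevant documents that appear in the top-k results.
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def smallest_k_meeting_target(eval_set, candidate_ks, target_recall):
    # eval_set: list of (ranked_ids, relevant_ids) pairs, one per test question.
    # Returns (k, mean recall) for the smallest k that reaches the target,
    # or None if no candidate does.
    for k in sorted(candidate_ks):
        score = mean(recall_at_k(ranked, relevant, k) for ranked, relevant in eval_set)
        if score >= target_recall:
            return k, score
    return None


if __name__ == "__main__":
    # Hypothetical labeled evaluation set.
    eval_set = [
        (["d1", "d2", "d3", "d4"], ["d1"]),
        (["d5", "d6", "d7", "d8"], ["d6", "d7"]),
    ]
    print(smallest_k_meeting_target(eval_set, [1, 2, 3, 4], target_recall=0.9))
```

Choosing the smallest k that still meets the target keeps the prompt as short as the measured quality allows, instead of guessing at a retrieval count.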