Question 37
Domain 3: Implement Generative AI Solutions
A company evaluates its RAG-based chatbot. Internal testers rate the answers as relevant and grounded, but real users report that the answers don't feel natural to read. Which evaluation metric is most likely to score lowest?
Correct answer: C
Explanation
Coherence/fluency measures whether a response is well-formed and reads naturally, so it is the metric most likely to score lowest when users find the chatbot's responses awkward even though they are relevant and grounded. Relevance and groundedness measure whether the content addresses the question and is supported by the retrieved sources, not readability or naturalness.
Why each option is right or wrong
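A. Groundedness
Groundedness measures whether the answer is supported by the retrieved source content. Internal testers already rate the answers as grounded, so this is not the weak point.
B. Relevance
Relevance measures whether the answer addresses the user's question. Internal testers already rate the answers as relevant, so this is not the weak point either.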
C. Coherence / Fluency
The user complaint targets readability and naturalness, which is what coherence/fluency measures in RAG evaluation: whether the response is well-formed, easy to read, and sounds natural. Since internal testers already found the answers relevant and grounded, those content/support metrics are not the weak point here; the lowest score would be on the fluency dimension, not relevance or factual grounding.
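D. Similarity
Similarity compares a response against a reference (ground-truth) answer. It does not measure how natural the text reads, and the scenario gives no indication of a problem with it.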
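The reasoning above amounts to comparing per-metric scores and picking the weakest one. The following is a minimal sketch of that comparison, not the output of any real evaluation tool: the metric names, the 1-5 scale, and every rating value are invented for illustration, and `weakest_metric` is a hypothetical helper.

```python
from statistics import mean

# Hypothetical 1-5 ratings per metric, invented for illustration only.
# They mirror the scenario: content metrics score well, readability does not.
ratings = {
    "groundedness": [5, 4, 5],
    "relevance": [4, 5, 4],
    "coherence_fluency": [2, 3, 2],
}


def weakest_metric(ratings_by_metric):
    """Return the metric with the lowest average rating, plus all averages."""
    averages = {name: mean(scores) for name, scores in ratings_by_metric.items()}
    lowest = min(averages, key=averages.get)
    return lowest, averages


lowest, averages = weakest_metric(ratings)
print(f"Lowest-scoring metric: {lowest}")
```

With these made-up ratings, the helper reports `coherence_fluency` as the weakest dimension, which matches answer C.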