Generative AI Leader Practice Q11

A. Pick the prompt with the longest answers

Longer answers are not necessarily more accurate, more helpful, or safer for support use.

B. Choose the prompt that uses the most technical words

Technical vocabulary reflects style, not whether the answer actually solves the customer's problem.

C. Use representative test cases, quality criteria, human review, and error tracking

The strongest evaluation design combines a realistic test set with explicit scoring criteria and human-in-the-loop review. Prompt quality is not a single numeric metric; it must be judged against the actual support scenarios the model will face. In practice, this means using representative cases, scoring against defined quality dimensions, and tracking failure modes over time, so the comparison rests on observed performance rather than subjective impressions. No statute governs this choice, so the controlling standard is methodological rigor rather than a legal rule.
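The approach in option C can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed tool: the criteria names, the 1-5 scale, the `Review` record, and the `summarize` and `compare` helpers are all hypothetical, and the ratings are assumed to come from human reviewers scoring each prompt's responses on the same representative test cases.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Hypothetical quality dimensions; a real rubric would come from the support team.
CRITERIA = ("accuracy", "helpfulness", "safety")


@dataclass
class Review:
    """One human reviewer's rating of one response to one test case."""
    case_id: str                        # which representative test case was used
    scores: Dict[str, int]              # criterion -> score on an assumed 1-5 scale
    failure_mode: Optional[str] = None  # e.g. "wrong_policy"; None if acceptable


def summarize(reviews: List[Review]) -> Tuple[Dict[str, float], Counter]:
    """Average each criterion across reviews and count recorded failure modes."""
    averages = {c: mean(r.scores[c] for r in reviews) for c in CRITERIA}
    failures = Counter(r.failure_mode for r in reviews if r.failure_mode)
    return averages, failures


def compare(prompt_reviews: Dict[str, List[Review]]):
    """Build one summary per prompt so candidates can be compared side by side."""
    return {name: summarize(reviews) for name, reviews in prompt_reviews.items()}
```

Calling `compare` with a mapping such as `{"prompt_a": [...], "prompt_b": [...]}` returns, for each prompt, its average score per criterion and a count of its failure modes. Keeping the failure counts separate from the averages is a deliberate choice in this sketch: a prompt with a good average can still fail in a way that matters for support use, and an average alone would hide that.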

D. Avoid evaluation because prompts cannot be tested

Prompts can be evaluated the same way as other AI outputs: with test sets, rubrics, and reviewers.
