Question 38
Domain 2: Core Machine Learning, AI, and Transformer Foundations
Which metric is most appropriate for evaluating machine translation quality?
Correct answer: C
Explanation
BLEU is the standard metric for machine translation. It measures how much a candidate translation overlaps with one or more human reference translations, using modified (clipped) n-gram precision, typically for n-grams of length 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. This makes it the usual automatic check of how closely machine-generated text matches human translations.
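The score described above is commonly written as follows, where $p_n$ is the modified n-gram precision for order $n$, $w_n$ are the weights (usually uniform, $w_n = 1/N$ with $N = 4$), $c$ is the candidate length, and $r$ is the effective reference length:

```latex
\text{BLEU} = \text{BP} \cdot \exp\left( \sum_{n=1}^{N} w_n \log p_n \right),
\qquad
\text{BP} =
\begin{cases}
1 & \text{if } c > r \\
e^{\,1 - r/c} & \text{if } c \le r
\end{cases}
```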
Why each option is right or wrong
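A. ROUGE
ROUGE also compares n-gram overlap with reference texts, but it is recall-oriented and was designed for evaluating summarization, so it is not the standard choice for machine translation.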
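B. F1-score
F1-score is the harmonic mean of precision and recall over discrete predictions. It suits classification and span-extraction tasks, not scoring a generated sentence against reference translations.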
C. BLEU
BLEU is the established automatic evaluation metric for machine translation. Papineni et al. introduced it in 2002, and it scores candidate output against one or more human reference translations using modified n-gram precision. It also applies a brevity penalty when the candidate is shorter than the effective reference length. This matters in translation because precision alone never penalizes omitted content, so overly short outputs could otherwise score artificially well.
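D. Perplexity
Perplexity (the exponentiated average negative log-likelihood per token) measures how well a language model predicts held-out text. It reflects the model's fit to data rather than comparing a generated translation with a reference, so low perplexity does not guarantee an accurate translation.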
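To make option C's description of modified n-gram precision and the brevity penalty concrete, here is a minimal sentence-level BLEU sketch. It assumes whitespace-tokenized input, uniform weights over 1- to 4-grams, and no smoothing. It also assumes the effective reference length is the reference length closest to the candidate, with ties going to the shorter one. The function names are hypothetical. The original metric aggregates counts over a whole test corpus, and standard toolkits also fix tokenization, so this sketch will not reproduce published scores.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it appears in any single reference."""
    cand_counts = ngrams(candidate, n)
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    total = max(sum(cand_counts.values()), 1)  # avoid division by zero
    return clipped / total


def brevity_penalty(candidate, references):
    """1.0 if the candidate is longer than the effective reference length,
    otherwise exp(1 - r/c). Assumption: r is the closest reference length."""
    c = len(candidate)
    if c == 0:
        return 0.0
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    return 1.0 if c > r else math.exp(1 - r / c)


def sentence_bleu(candidate, references, max_n=4):
    """Brevity penalty times the geometric mean of modified precisions."""
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # unsmoothed BLEU is zero if any n-gram order has no match
    log_avg = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, references) * math.exp(log_avg)


if __name__ == "__main__":
    ref = "the cat is on the mat".split()
    degenerate = "the the the the the the the".split()
    print(modified_precision(degenerate, [ref], 1))  # → 2/7 ≈ 0.286
```

Clipping is what stops the degenerate candidate from earning 7/7 on unigrams. The brevity penalty handles the opposite failure: a short candidate such as "the cat sat on", scored against "the cat sat on the mat", matches on every n-gram order yet still receives exp(1 - 6/4), about 0.61.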