Question 29
Domain 2: Core Machine Learning, AI, and Transformer Foundations
Which evaluation metric is most commonly used for text summarization tasks?
Correct answer: B
Explanation
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is the standard automatic metric for text summarization. It measures overlap between a generated summary and one or more reference summaries, emphasizing recall of shared n-grams and word sequences. Because that overlap indicates how much of the reference content the system summary captures, ROUGE is the metric most commonly reported in summarization research.
Why each option is right or wrong
A. METEOR
B. ROUGE
ROUGE is the standard automatic metric used in summarization benchmarks because it scores overlap between a system summary and one or more human reference summaries, typically via ROUGE-1, ROUGE-2, and ROUGE-L. In practice, summarization papers report these scores to measure how much reference content is recovered, with ROUGE-1/2 capturing unigram and bigram overlap and ROUGE-L capturing longest common subsequence alignment.
C. Perplexity
D. F1-Score
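
To make the ROUGE variants named under option B concrete, here is a minimal sketch of recall-only ROUGE-1, ROUGE-2, and ROUGE-L in plain Python. It assumes lowercase whitespace tokenization and a single reference summary; the function names are illustrative and do not come from any library. Published ROUGE implementations typically add stemming, multi-reference handling, and precision/F-measure variants, so scores from this sketch will not necessarily match theirs.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n):
    """Share of reference n-grams that also occur in the candidate (clipped counts)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    return overlap / total


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(candidate, reference):
    """LCS length divided by the reference length (recall form of ROUGE-L)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return lcs_length(cand, ref) / len(ref) if ref else 0.0


if __name__ == "__main__":
    # Hypothetical example sentences, chosen only for illustration.
    reference = "the cat sat on the mat"
    candidate = "the cat lay on the mat"
    print("ROUGE-1 recall:", rouge_n_recall(candidate, reference, 1))
    print("ROUGE-2 recall:", rouge_n_recall(candidate, reference, 2))
    print("ROUGE-L recall:", rouge_l_recall(candidate, reference))
```

In this example the candidate recovers 5 of the 6 reference unigrams, 3 of the 5 reference bigrams, and a longest common subsequence of 5 tokens, so ROUGE-2 drops faster than ROUGE-1 when a single word is changed.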