Question 12
Domain 2 — Data, Machine Learning, and Model Development
A text processing pipeline converts documents for sentiment analysis. The team debates between stemming and lemmatization for word normalization. Which two characteristics differentiate lemmatization from stemming? (Select two.)
Correct answers: A, C
Explanation
Lemmatization differs from stemming because it uses dictionary-based morphological analysis to reduce each word to its base form (the lemma), rather than simply chopping off prefixes or suffixes. Because it consults a lexicon and takes part of speech and context into account, it returns valid words, whereas stemming is a simpler rule-based truncation method that can produce non-words.
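The contrast can be sketched with two toy normalizers. Both functions and the small lemma table are hypothetical and exist only for illustration: the suffix rules are a tiny subset in the spirit of the Porter stemmer's first step, not a real stemmer, and the lemma table stands in for a full lexicon.

```python
# Hypothetical toy normalizers for illustration only; not NLTK, spaCy,
# or any other library's behaviour.

# Stemming: ordered suffix rules applied blindly, with no dictionary check.
STEM_RULES = [("ies", "i"), ("ing", ""), ("ed", ""), ("s", "")]


def crude_stem(word: str) -> str:
    """Rewrite the first matching suffix; the result need not be a word."""
    for suffix, replacement in STEM_RULES:
        # Keep a stem of at least 3 characters so short words survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)] + replacement
    return word


# Lemmatization: look up (token, part of speech) in a lexicon of lemmas.
LEMMA_TABLE = {
    ("studies", "NOUN"): "study",
    ("studies", "VERB"): "study",
    ("running", "VERB"): "run",
    ("better", "ADJ"): "good",
    ("meeting", "NOUN"): "meeting",
    ("meeting", "VERB"): "meet",
}


def toy_lemmatize(word: str, pos: str) -> str:
    """Return the lemma for (word, pos) from the toy table, else the word."""
    return LEMMA_TABLE.get((word, pos), word)


if __name__ == "__main__":
    samples = [("studies", "NOUN"), ("running", "VERB"), ("better", "ADJ"),
               ("meeting", "NOUN"), ("meeting", "VERB")]
    for word, pos in samples:
        print(f"{word:8} {pos:5} stem={crude_stem(word):8} "
              f"lemma={toy_lemmatize(word, pos)}")
```

The stemmer yields non-words such as "studi" and "runn", leaves "better" untouched, and maps both uses of "meeting" to "meet"; the lookup returns "study", "run", and "good", and keeps the noun "meeting" distinct from the verb "meet". In a real pipeline these roles are usually filled by library components such as NLTK's `PorterStemmer` and `WordNetLemmatizer`, or spaCy's lemmatizer, rather than hand-written tables.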
Why each option is right or wrong
A. Lemmatization uses dictionary-based morphological analysis
Lemmatization is the linguistically informed normalization method: it consults a lexicon and morphological rules to map an inflected token to its lemma, often using part-of-speech context, whereas stemming is typically a crude affix-stripping heuristic. In this question’s pipeline, that means the lemmatizer can return a valid base word form rather than merely truncating endings, which is the distinguishing feature being tested.
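The lexicon-plus-rules idea can be sketched as follows, loosely following the approach of WordNet's morphy function: check an exception list for irregular forms, try part-of-speech-specific suffix rules, and accept a candidate only if the lexicon lists it. `LEXICON`, `EXCEPTIONS`, and `RULES` are hypothetical toy data; a production lemmatizer would rely on a full lexical database or a trained model.

```python
"""Toy rule-plus-lexicon lemmatizer (hypothetical data, illustration only)."""

# Known lemmas per part of speech; a real system would use a full lexicon.
LEXICON = {
    "NOUN": {"study", "box", "church", "mouse", "meeting"},
    "VERB": {"study", "run", "meet", "hope"},
}

# Irregular inflections that no suffix rule can recover.
EXCEPTIONS = {
    ("mice", "NOUN"): "mouse",
    ("ran", "VERB"): "run",
}

# Per-POS detachment rules: (suffix to remove, ending to append).
RULES = {
    "NOUN": [("s", ""), ("ses", "s"), ("xes", "x"), ("ches", "ch"), ("ies", "y")],
    "VERB": [("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
             ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")],
}


def lemmatize(word: str, pos: str) -> str:
    """Return the lemma of `word` for part of speech `pos`, or `word` if unknown."""
    known = LEXICON.get(pos, set())
    # 1. Irregular forms come straight from the exception list.
    if (word, pos) in EXCEPTIONS:
        return EXCEPTIONS[(word, pos)]
    # 2. The token may already be a base form.
    if word in known:
        return word
    # 3. Try each suffix rule, keeping only candidates the lexicon recognizes.
    for suffix, ending in RULES.get(pos, []):
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + ending
            if candidate in known:
                return candidate
    # 4. Not in the toy lexicon: return the token unchanged.
    return word


if __name__ == "__main__":
    for word, pos in [("studies", "NOUN"), ("churches", "NOUN"), ("mice", "NOUN"),
                      ("meeting", "NOUN"), ("meeting", "VERB"), ("hoping", "VERB")]:
        print(word, pos, "->", lemmatize(word, pos))
```

Validating every candidate against the lexicon is what separates this from the stemmer: for the verb "meeting", the rule that would produce the non-word "meete" is rejected and "meet" is returned, while the noun "meeting" is already a lemma and stays as-is.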
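B. Lemmatization executes faster than stemming algorithms
Incorrect. Lemmatization is generally slower than stemming, not faster: it needs dictionary lookups and often a part-of-speech tag for each token, whereas a stemmer applies a fixed set of string rules with no external resources.
C. Lemmatization always produces valid dictionary words
Correct. A lemmatizer maps each token to its lemma, which is a dictionary headword, so its output is a real word. A stemmer offers no such guarantee; the Porter stemmer, for example, reduces "studies" to "studi".
D. Lemmatization removes all punctuation and special characters
Incorrect. Stripping punctuation and special characters is a separate cleaning or tokenization step; a lemmatizer only normalizes the word forms it is given.
E. Lemmatization applies only to Romance languages
Incorrect. Lemmatization is not tied to any language family. It is routinely applied to English, a Germanic language, and to many others, provided a lexicon and morphological rules (or a trained model) exist for the target language.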