MLS-C01 Practice Q22

A. Amazon Comprehend syntax analysis and entity detection.

Comprehend syntax analysis and entity detection extract linguistic structure (parts of speech and named entities); they do not produce feature vectors weighted for sentiment classification.

B. Amazon SageMaker BlazingText cbow mode.

BlazingText in cbow mode learns dense word embeddings from surrounding context; it does not directly apply sparse weighting that favors rare terms for classification.

C. Natural Language Toolkit (NLTK) stemming and stop word removal.

Stemming and stop-word removal are useful preprocessing steps, but on their own they do not solve sparse term weighting.

D. Scikit-learn term frequency-inverse document frequency (TF-IDF) vectorizer.

A rich vocabulary with low per-word frequency is exactly the sparse-text scenario that TF-IDF addresses. Each term receives a weight of \(tf \times \log(N/df)\), where \(tf\) is the term's count in the document, \(N\) is the number of documents, and \(df\) is the number of documents containing the term, so rare but discriminative words contribute more than ubiquitous ones. In scikit-learn, `TfidfVectorizer` is the standard tool for this preprocessing step; it is commonly used in sentiment classification to down-weight very common tokens and produce more informative feature vectors.
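The weighting formula above can be illustrated with a minimal pure-Python sketch. The helper name `tfidf`, the whitespace tokenization, and the three toy reviews are assumptions made for illustration; they are not part of the question or of scikit-learn.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    # Naive tokenization (lowercase + whitespace split), assumed for brevity.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # df: number of documents containing each term (counted once per document).
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


# Hypothetical toy reviews, invented for this example.
reviews = [
    "the plot was dazzling",
    "the acting was wooden",
    "the film was fine",
]
weights = tfidf(reviews)

# Terms that appear in every review get weight 0; terms unique to one
# review get weight log(3).
print(weights[0])
```

With this textbook formula, words such as "the" and "was" that occur in every document drop to zero weight, while a word that appears in only one review keeps a positive weight. By default `TfidfVectorizer` uses a smoothed IDF and L2-normalizes each row, so its numbers differ from this sketch even though the ranking idea is the same.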
