Question 26
Domain 2: Core Machine Learning, AI, and Transformer Foundations
In the attention mechanism, what mathematical operation is performed on the Query (Q) and Key (K) matrices to compute attention scores?
Correct answer: B
Explanation
Attention scores are computed by multiplying the Query matrix by the transpose of the Key matrix (QK^T). Each entry of the result is the dot product of one query vector with one key vector, which measures how similar they are. The scores are then scaled by 1/√d_k and passed through a softmax, which turns each row of scores into attention weights that sum to 1. This is the standard scaled dot-product attention formula.
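Written out as equations (following Vaswani et al., 2017), with q_i and k_j denoting the i-th row of Q and the j-th row of K, and d_k the key dimension, the score, weight, and output steps look like this. The symbols S and A for the score and weight matrices are labels chosen here for illustration, not notation from the question.

```latex
\begin{aligned}
S_{ij} &= \frac{q_i \cdot k_j}{\sqrt{d_k}} = \frac{1}{\sqrt{d_k}} \sum_{m=1}^{d_k} Q_{im} K_{jm} \\
A_{ij} &= \frac{\exp(S_{ij})}{\sum_{j'} \exp(S_{ij'})} \\
\mathrm{Attention}(Q, K, V) &= \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V = A V
\end{aligned}
```

The softmax runs over the keys (each row of S), so every row of A is a probability distribution over key positions.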
Why each option is right or wrong
A. Element-wise multiplication of query and key vectors followed by summation across the feature dimension to produce position-wise similarity scores
B. Matrix multiplication (dot product) followed by scaling and softmax
The attention score calculation uses the scaled dot-product attention formula: the Query matrix is multiplied by the transpose of the Key matrix (QK^T) to produce raw similarity scores. Those scores are then divided by √d_k, where d_k is the dimension of the key vectors, before softmax normalization, as defined in Vaswani et al., 2017, "Attention Is All You Need."
C. A convolution operation applied across query and key matrices to capture local positional patterns and short-range dependencies between tokens
D. Concatenation of the query and key vectors for each position pair followed by a learned linear transformation to compute compatibility scores
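To make option B concrete, here is a minimal pure-Python sketch of the score computation, assuming Q and K are given as lists of row vectors (one row per token). The helper names `matmul_transpose`, `softmax`, and `attention_weights` are hypothetical, written only for this illustration; real frameworks provide optimized equivalents. It computes QK^T, divides by √d_k, and applies a row-wise softmax.

```python
import math

# Hypothetical helper: computes a @ b^T, so entry [i][j] is the dot product of a[i] and b[j].
def matmul_transpose(a, b):
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]

# Hypothetical helper: numerically stable softmax over one row of scores
# (subtracting the row max does not change the result but avoids overflow).
def softmax(row):
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical helper: scaled dot-product attention weights, softmax(Q K^T / sqrt(d_k)) row-wise.
def attention_weights(Q, K):
    d_k = len(K[0])                      # key dimension
    scale = math.sqrt(d_k)
    raw = matmul_transpose(Q, K)         # raw similarity scores, shape (n_queries, n_keys)
    scaled = [[s / scale for s in row] for row in raw]
    return [softmax(row) for row in scaled]

# Toy example: 2 queries and 3 keys, each of dimension 2.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = attention_weights(Q, K)
for row in weights:
    # Each row is a distribution over the 3 keys and sums to 1.
    print([round(w, 3) for w in row])
```

Multiplying these weights by a Value matrix V (not shown) would give the attention output. Note that the element-wise multiply-and-sum inside `matmul_transpose` is just how a dot product is computed; what defines attention is taking that dot product for every query-key pair, then scaling and normalizing with softmax.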