NCA-GENL Practice Q26

A. Element-wise multiplication of query and key vectors followed by summation across the feature dimension to produce position-wise similarity scores

B. Matrix multiplication (dot product) followed by scaling and softmax

Attention scores are computed with scaled dot-product attention: the Query matrix is multiplied by the transpose of the Key matrix, QK^T, to produce raw similarity scores. Those scores are then divided by √d_k, where d_k is the key dimension, before softmax normalization, as defined in Vaswani et al., 2017, "Attention Is All You Need."

C. A convolution operation applied across query and key matrices to capture local positional patterns and short-range dependencies between tokens

D. Concatenation of the query and key vectors for each position pair followed by a learned linear transformation to compute compatibility scores
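
To make the computation described under option B concrete, here is a minimal pure-Python sketch of softmax(QK^T / √d_k). The function names (`softmax`, `attention_weights`) and the toy Q and K values are illustrative assumptions, not part of the question; a real implementation would use batched tensor operations and would go on to multiply the weights by the Value matrix.

```python
import math

def softmax(row):
    # Subtract the row maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(Q, K):
    """Return softmax(Q K^T / sqrt(d_k)); Q and K are lists of row vectors."""
    d_k = len(K[0])          # key dimension
    scale = math.sqrt(d_k)   # scaling factor sqrt(d_k)
    weights = []
    for q in Q:
        # One row of Q K^T: dot product of this query with every key, then scaled.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in K]
        weights.append(softmax(scores))
    return weights

# Hypothetical toy inputs: 2 queries, 3 keys, d_k = 2.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = attention_weights(Q, K)
```

Each row of `W` holds one query's attention weights over all keys and sums to 1. In this toy case the first query matches the first and third keys equally, so those two weights are equal and larger than the weight on the second key.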

Question 26

Explanation

Why each option is right or wrong