Question 30
Domain 2: Core Machine Learning, AI, and Transformer Foundations
What is Multi-Query Attention (MQA) and how does it differ from standard multi-head attention? (Select TWO)
Correct answer: A, E
Explanation
Multi-Query Attention (MQA) shares a single set of key and value projections across all attention heads while keeping a separate query projection for each head. Heads can therefore still attend to different patterns, but only one key tensor and one value tensor per layer need to be computed and cached. Compared with standard multi-head attention, this shrinks the KV cache and the memory traffic during inference.
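The sharing scheme described above can be illustrated with a minimal NumPy sketch. The function name, tensor shapes, and random weights are assumptions made for this example only. The causal mask, the output projection, and batching are omitted for brevity, so this is not a production implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_query_attention(x, w_q, w_k, w_v):
    """Illustrative MQA over one sequence (hypothetical helper).

    x:   (seq, d_model)
    w_q: (n_heads, d_model, d_head)  one query projection per head
    w_k: (d_model, d_head)           single key projection shared by all heads
    w_v: (d_model, d_head)           single value projection shared by all heads
    """
    q = np.einsum("sd,hde->hse", x, w_q)   # (n_heads, seq, d_head)
    k = x @ w_k                             # (seq, d_head), no head axis
    v = x @ w_v                             # (seq, d_head), no head axis
    d_head = k.shape[-1]
    # Every head scores its own queries against the same shared keys.
    scores = np.einsum("hse,te->hst", q, k) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)      # (n_heads, seq, seq)
    out = np.einsum("hst,te->hse", weights, v)
    return out, k, v

rng = np.random.default_rng(0)
seq, d_model, n_heads, d_head = 5, 16, 4, 4
x = rng.standard_normal((seq, d_model))
w_q = rng.standard_normal((n_heads, d_model, d_head))
w_k = rng.standard_normal((d_model, d_head))
w_v = rng.standard_normal((d_model, d_head))

out, k, v = multi_query_attention(x, w_q, w_k, w_v)
print(out.shape, k.shape, v.shape)
```

The per-head output keeps a head axis, while the key and value tensors, which are what an inference server would cache, have none.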
Why each option is right or wrong
A. MQA uses a single set of key and value projection matrices shared across all attention heads, while maintaining separate query projections per head
In standard multi-head attention, each head has its own Q, K, and V projections, so the KV cache grows in proportion to the number of heads. MQA changes only the K/V side: all heads reuse one shared key projection and one shared value projection, while queries remain head-specific. This reduces memory bandwidth and inference cost without eliminating per-head query diversity.
B. MQA processes multiple user queries simultaneously in a single forward pass by parallelizing the attention computation across different input sequences in the batch
C. MQA eliminates the query projection matrices entirely and computes attention scores using only the key and value representations, reducing the total parameter count per layer
D. MQA increases the number of attention heads beyond the standard configuration to improve the model's representational capacity and pattern recognition ability
E. MQA significantly reduces the KV cache memory footprint during inference, enabling higher throughput and longer context lengths on the same hardware
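To make the KV cache saving in option E concrete, the following back-of-the-envelope calculation compares the cache size for multi-head attention and MQA. The model dimensions (32 layers, 32 heads, head size 128, 4096 tokens, 2 bytes per value) are hypothetical numbers chosen for illustration and do not describe any specific model. The helper function is likewise an assumption of this sketch.

```python
def kv_cache_bytes(n_layers, n_kv_heads, d_head, seq_len,
                   batch_size=1, bytes_per_value=2):
    # The leading factor of 2 accounts for storing both keys and values.
    return (2 * n_layers * n_kv_heads * d_head
            * seq_len * batch_size * bytes_per_value)

# Hypothetical configuration, chosen only for illustration.
n_layers, n_heads, d_head, seq_len = 32, 32, 128, 4096

mha = kv_cache_bytes(n_layers, n_heads, d_head, seq_len)  # one K/V pair per head
mqa = kv_cache_bytes(n_layers, 1, d_head, seq_len)        # one shared K/V pair

print(f"MHA: {mha / 2**30:.2f} GiB, MQA: {mqa / 2**20:.0f} MiB, "
      f"ratio: {mha // mqa}x")
# prints "MHA: 2.00 GiB, MQA: 64 MiB, ratio: 32x"
```

Under these assumptions the cache shrinks by a factor equal to the number of heads, which is the memory that can instead be spent on larger batches or longer contexts.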