NCA-GENL Practice Q30

A. MQA uses a single set of key and value projection matrices shared across all attention heads, while maintaining separate query projections per head

In standard multi-head attention, each head has its own Q, K, and V projections, so the K/V projection parameters and the KV cache both scale with the number of heads. Multi-Query Attention (MQA) changes only the K/V side: all heads share one key projection and one value projection, while the query projections remain head-specific. Sharing K and V shrinks the KV cache and the memory bandwidth needed to read it at each decoding step, which lowers inference cost without eliminating per-head query diversity.

B. MQA processes multiple user queries simultaneously in a single forward pass by parallelizing the attention computation across different input sequences in the batch

C. MQA eliminates the query projection matrices entirely and computes attention scores using only the key and value representations, reducing the total parameter count per layer

D. MQA increases the number of attention heads beyond the standard configuration to improve the model's representational capacity and pattern recognition ability

E. MQA significantly reduces the KV cache memory footprint during inference, enabling higher throughput and longer context lengths on the same hardware
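
To make the scaling behind options A and E concrete, here is a minimal back-of-the-envelope sketch. The model sizes (32 layers, 32 heads, head dimension 128, a 16-bit cache, a 4096-token context) and the names `AttnConfig`, `kv_heads`, `qkv_params_per_layer`, and `kv_cache_bytes` are hypothetical, chosen only for illustration. It counts Q/K/V projection weights and KV-cache bytes, ignoring biases and the output projection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttnConfig:
    # Hypothetical model sizes, chosen only for illustration.
    n_layers: int = 32
    n_heads: int = 32
    head_dim: int = 128
    bytes_per_elem: int = 2  # assumes a 16-bit cache dtype such as fp16

    @property
    def d_model(self) -> int:
        return self.n_heads * self.head_dim


def kv_heads(cfg: AttnConfig, multi_query: bool) -> int:
    """Distinct K/V heads: one shared head for MQA, n_heads for standard MHA."""
    return 1 if multi_query else cfg.n_heads


def qkv_params_per_layer(cfg: AttnConfig, multi_query: bool) -> int:
    """Weights in one layer's Q, K, V projections (biases and output projection ignored)."""
    q = cfg.d_model * cfg.n_heads * cfg.head_dim  # queries stay per-head in both variants
    kv = 2 * cfg.d_model * kv_heads(cfg, multi_query) * cfg.head_dim
    return q + kv


def kv_cache_bytes(cfg: AttnConfig, multi_query: bool, seq_len: int, batch: int = 1) -> int:
    """Bytes needed to cache K and V for every layer and every cached token."""
    per_token = 2 * cfg.n_layers * kv_heads(cfg, multi_query) * cfg.head_dim * cfg.bytes_per_elem
    return per_token * seq_len * batch


cfg = AttnConfig()
seq_len = 4096  # hypothetical context length

mha_cache = kv_cache_bytes(cfg, multi_query=False, seq_len=seq_len)
mqa_cache = kv_cache_bytes(cfg, multi_query=True, seq_len=seq_len)

print(f"MHA KV cache: {mha_cache / 2**20:.0f} MiB")
print(f"MQA KV cache: {mqa_cache / 2**20:.0f} MiB")
print(f"Reduction factor: {mha_cache // mqa_cache}x")
```

Under these assumed sizes the cache shrinks by a factor equal to the number of heads, while the query projection is the same size in both variants. That matches the claims in A and E: only the K/V side is shared, and the saving shows up mainly in inference memory.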
