NCA-GENL Practice Q14

A. To reduce computational complexity of attention

B. To provide sequence order information to the model

Transformers do not inherently encode token order: self-attention processes all tokens in parallel and, without positional information, treats the input as an unordered set, so an added positional signal is needed to distinguish different orderings of the same words. In the standard Transformer architecture, positional encodings are added to the input embeddings before the first layer, which lets the network learn relationships that depend on order rather than on token identity alone.

C. To compress input token representations

D. To enable variable-length sequence processing
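
The reasoning behind option B can be checked numerically. The sketch below is illustrative only and is not part of the practice question. It assumes the fixed sinusoidal encoding from the original Transformer paper, an even d_model, and a simplified single-head attention with identity projections (no learned weights). The names sinusoidal_positional_encoding and self_attention are hypothetical helpers chosen for this example. It shows that, without positions, shuffling the tokens only shuffles the outputs, whereas adding positional encodings makes the outputs depend on order.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a (seq_len, d_model) array of fixed sinusoidal encodings.

    Assumes d_model is even. Even columns hold sines, odd columns cosines.
    """
    positions = np.arange(seq_len)[:, None]        # shape (seq_len, 1)
    even_dims = np.arange(0, d_model, 2)[None, :]  # shape (1, d_model / 2)
    angles = positions / np.power(10000.0, even_dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def self_attention(x):
    """Simplified single-head self-attention with identity Q/K/V projections."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
tokens = rng.normal(size=(seq_len, d_model))  # stand-in token embeddings
perm = np.array([4, 2, 0, 3, 1])              # a reordering of the sequence

# Without positions: permuting the input merely permutes the output rows,
# so each token's representation ignores where it sits in the sequence.
out = self_attention(tokens)
out_perm = self_attention(tokens[perm])
order_blind = np.allclose(out[perm], out_perm)

# With positions: the same token at a different position gets a different
# input vector, so its output representation changes as well.
pe = sinusoidal_positional_encoding(seq_len, d_model)
out_pe = self_attention(tokens + pe)
out_pe_perm = self_attention(tokens[perm] + pe)
order_aware = not np.allclose(out_pe[perm], out_pe_perm)

print("order-blind without positions:", order_blind)
print("order-aware with positions:", order_aware)
```

Learned positional embeddings, used by many later models, serve the same purpose. The sinusoidal form is used here only because it needs no training.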
