Question 6
Domain 2: Core Machine Learning, AI, and Transformer Foundations
What is the main purpose of layer normalization in transformers?
Correct answer: B
Explanation
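Layer normalization rescales each token's activations to zero mean and unit variance across the feature dimension, then applies a learned scale and shift. This keeps values in a stable numerical range during the forward and backward passes. The effect is often described as reducing internal covariate shift; in practice it makes training more stable, which in turn makes deeper transformers feasible to train.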
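The normalization described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used by any particular framework. The function name `layer_norm`, the tensor shape, and the `eps` value are assumptions chosen for the example.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize each token's feature vector, then apply a learned scale and shift.

    x:     array of shape (batch, tokens, features)
    gamma: per-feature scale, shape (features,)
    beta:  per-feature shift, shape (features,)
    eps:   small constant for numerical stability (assumed value)
    """
    # Statistics are computed over the last axis only, so every token
    # is normalized independently of the other tokens and of the batch.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Hypothetical hidden states with a large offset and scale.
rng = np.random.default_rng(0)
hidden = rng.normal(loc=3.0, scale=50.0, size=(2, 4, 8))

# With gamma = 1 and beta = 0 the output is the plain normalized activations.
gamma = np.ones(8)
beta = np.zeros(8)
out = layer_norm(hidden, gamma, beta)

print("per-token mean:", np.round(out.mean(axis=-1), 6))
print("per-token std: ", np.round(out.std(axis=-1), 6))
```

Whatever the scale of the incoming hidden states, each token's features leave the function with mean close to 0 and standard deviation close to 1, which is the "stable range" the explanation refers to.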
Why each option is right or wrong
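A. To reduce overfitting
Layer normalization is not primarily a regularizer. Overfitting is usually addressed with techniques such as dropout or weight decay.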
B. To stabilize training and enable deeper networks
Layer normalization is applied to the hidden states in each transformer block, typically across the feature dimension of each token independently, so activations stay numerically well-behaved during both forward and backward propagation. This keeps activation and gradient magnitudes from drifting as they pass through many stacked layers. The practical effect is more reliable optimization and the ability to train deeper models without divergence.
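C. To compress model size
Layer normalization does not shrink the model. It adds a small number of learned scale and shift parameters per layer.
D. To speed up inference
Layer normalization adds computation to every forward pass, so it does not make inference faster.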
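The stabilizing effect behind option B can be illustrated with a toy experiment. The sketch below is not a transformer: it is a plain stack of random linear layers with ReLU, and the weights are deliberately scaled too large so that activations grow with depth. The names `normalize` and `run_stack`, the width, the depth, and the weight scale are all assumptions made for the demonstration, and the normalization omits the learned scale and shift for brevity.

```python
import numpy as np

def normalize(x, eps=1e-5):
    """Layer normalization without learned scale/shift (simplified for the demo)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def run_stack(x, weights, use_layer_norm):
    """Pass x through a stack of linear + ReLU layers, optionally normalizing each output."""
    for w in weights:
        x = np.maximum(x @ w, 0.0)
        if use_layer_norm:
            x = normalize(x)
    return x

rng = np.random.default_rng(0)
width, depth = 64, 40

# Hypothetical, deliberately oversized initialization so magnitudes grow per layer.
weights = [
    rng.normal(scale=2.0 / np.sqrt(width), size=(width, width))
    for _ in range(depth)
]
tokens = rng.normal(size=(4, width))

plain = run_stack(tokens, weights, use_layer_norm=False)
normed = run_stack(tokens, weights, use_layer_norm=True)

print("max |activation| without layer norm:", np.abs(plain).max())
print("max |activation| with layer norm:   ", np.abs(normed).max())
```

Without normalization the activation magnitudes compound from layer to layer, while the normalized stack keeps every token at unit variance regardless of depth. Real transformers also rely on residual connections and careful initialization, so this shows only the scale-control part of the picture.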