Question 36 of 40

Domain 2: Core Machine Learning, AI, and Transformer Foundations

In multi-head attention, why are multiple attention heads used?