Question 37

Domain 2: Core Machine Learning, AI, and Transformer Foundations

In multi-head attention, why does the transformer use multiple attention heads instead of a single attention mechanism?