Question 22
Domain 3
In a distributed machine learning system, which technique helps in reducing the communication overhead between nodes?
Correct answer: D
Explanation
Distributed training relies on tools such as Horovod and Reduction Server on Vertex AI to combine gradient updates across workers. Gradient aggregation reduces communication overhead because workers exchange combined (summed or averaged) gradients instead of each node sending its own update separately to every other node, which lowers network traffic during training.
Why each option is right or wrong
A. Data sharding
B. Data replication
C. Model compression
D. Gradient aggregation
Gradient aggregation is the distributed-training mechanism that combines per-worker gradients, typically by summing or averaging them, into a single gradient as part of synchronization, which directly cuts the number and volume of inter-node messages. In distributed GPU/TPU training, this is the communication-saving pattern used by tools such as Horovod and Vertex AI Reduction Server: workers exchange aggregated gradients rather than sending many separate updates across the network.
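To make the saving concrete, here is a minimal Python sketch of the idea. It is a toy model, not the Horovod or Reduction Server API: the function names are hypothetical, and the "central reducer" message count assumes each worker uploads its gradient once and downloads the aggregated result once per training step. Real systems use more refined schemes (Horovod, for example, uses ring all-reduce), but the comparison against a naive all-to-all exchange illustrates the same principle.

```python
from typing import List


def aggregate_gradients(worker_grads: List[List[float]]) -> List[float]:
    """Average per-worker gradients element-wise (the 'reduce' step)."""
    n_workers = len(worker_grads)
    return [sum(values) / n_workers for values in zip(*worker_grads)]


def all_to_all_messages(n_workers: int) -> int:
    """Messages per step if every worker sends its gradient to every other worker."""
    return n_workers * (n_workers - 1)


def reducer_messages(n_workers: int) -> int:
    """Messages per step with an assumed central reducer:
    one upload and one download per worker."""
    return 2 * n_workers


if __name__ == "__main__":
    # Four hypothetical workers, each holding a two-element gradient.
    grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    print(aggregate_gradients(grads))   # [4.0, 5.0]
    print(all_to_all_messages(4))       # 12
    print(reducer_messages(4))          # 8
```

Under these assumptions the naive exchange grows quadratically with the number of workers (56 messages for 8 workers), while the aggregated exchange grows linearly (16 messages for 8 workers).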