How GPUs Communicate: Fundamentals of Distributed Training with PyTorch
The rapid growth of Artificial Intelligence in recent years, especially for language modeling, is rightly credited to the Transformer architecture. The seminal 2017 paper, "Attention Is All You Need," introduced self-attention, which lets each token in a sequence attend contextually to every other token. This made it possible to