A Discussion on Decentralised Training

Summary

In this conversation on decentralised training, Max Ryabinin (Distinguished Research Scientist at Together AI), Bowen Peng (Co-Founder of Nous Research), Harry Grieve (Co-Founder of Gensyn), Johannes Hagemann (Co-Founder of Prime Intellect), and Mohamed Baioumy (Co-Founder of Exo Labs) join Christine Yip (Head of Ecosystem at Gensyn) in an X Space.

The conversation centred on the current capabilities of decentralised deep learning and potential advances in the field, exploring the methodologies and techniques involved and the promising trajectory of future research.

Further Reading

Relevant papers and resources that were mentioned in this conversation:

  • Ryabinin, M., et al. SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient (2023)
  • Douillard, A., et al. DiLoCo: Distributed Low-Communication Training of Language Models (2023)
  • Jaghouar, S., et al. OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training (2024)
  • Peng, B., et al. A Preliminary Report on DisTrO (2024)
  • Ryabinin, M., et al. Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts (2020)
  • Yang, B., et al. PipeMare: Asynchronous Pipeline Parallel DNN Training (2019)
  • Niu, F., et al. Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent (2011)
  • McMahan, H., et al. Communication-Efficient Learning of Deep Networks from Decentralized Data (2017)
  • Li, M., et al. Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models (2022)
  • Zhang, Y., et al. Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark (2024)
  • Jiang, A., et al. Mixtral of Experts (2024)