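This article walks through the implementation of pipeline parallelism (PipelineParallel) in Megatron-LM, focusing mainly on the source code.

The main related paper is:
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM

There are also some classic prerequisite papers:
GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism
PipeDream: Generalized Pipeline Parallelism f…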