A summary of the core design points.

Two formats:

* E4M3, suited to weights and activations
* E5M2, suited to gradients

The recommended use of FP8 encodings is E4M3 for weight and activation tensors, and E5M2 for gradient tensors. This is consistent with findings in [20, 16], where inference and forward pass of training use…
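To make the E4M3 vs. E5M2 trade-off concrete, the sketch below decodes all 256 bit patterns of each format and reports the largest finite value, the smallest positive (subnormal) value, and the spacing just above 1.0. It is a minimal sketch that assumes the encodings from the FP8 formats proposal: E4M3 uses exponent bias 7, has no infinities, and reserves only the all-ones exponent-and-mantissa pattern for NaN; E5M2 uses bias 15 and IEEE 754-style Inf/NaN. The helper names `decode_fp8`, `FORMATS`, and `summarize` are hypothetical and for illustration only.

```python
import math

def decode_fp8(byte, exp_bits, man_bits, bias, ieee_specials):
    """Decode one 8-bit pattern: 1 sign bit, exp_bits exponent bits, man_bits mantissa bits."""
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    exp_max = (1 << exp_bits) - 1
    if ieee_specials and exp == exp_max:
        # E5M2 (assumed IEEE-like): all-ones exponent encodes Inf (mantissa 0) or NaN.
        return sign * math.inf if man == 0 else math.nan
    if not ieee_specials and exp == exp_max and man == (1 << man_bits) - 1:
        # E4M3 (assumed): no Inf; only the all-ones exponent+mantissa pattern is NaN.
        return math.nan
    if exp == 0:
        # Subnormal: no implicit leading 1, exponent fixed at 1 - bias.
        return sign * (man / (1 << man_bits)) * 2.0 ** (1 - bias)
    # Normal: implicit leading 1.
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)

# Hypothetical parameter table for the two encodings discussed above.
FORMATS = {
    "E4M3": dict(exp_bits=4, man_bits=3, bias=7, ieee_specials=False),
    "E5M2": dict(exp_bits=5, man_bits=2, bias=15, ieee_specials=True),
}

def summarize(name):
    """Enumerate every bit pattern and report range and precision figures."""
    values = [decode_fp8(b, **FORMATS[name]) for b in range(256)]
    finite = [v for v in values if math.isfinite(v)]
    positive = [v for v in finite if v > 0]
    next_after_one = min(v for v in finite if v > 1.0)
    return {
        "max": max(finite),              # largest finite magnitude
        "min_subnormal": min(positive),  # smallest positive value
        "step_at_1": next_after_one - 1.0,  # relative precision near 1.0
    }

for name in FORMATS:
    s = summarize(name)
    print(f"{name}: max={s['max']}, min_subnormal={s['min_subnormal']:.3g}, "
          f"step at 1.0={s['step_at_1']}")
```

Under these assumptions the sketch should report a maximum of 448 with a step of 0.125 at 1.0 for E4M3, versus 57344 with a step of 0.25 for E5M2. E4M3 spends its extra mantissa bit on precision, while E5M2 spends it on exponent range, which is the usual rationale for reserving E5M2 for gradients, whose magnitudes tend to span a wider dynamic range.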