ZeRO & Fastest BERT: Increasing the scale and speed of deep learning training in DeepSpeed