Megatron is a large, powerful transformer. For this particular Megatron model, we trained a bidirectional transformer in the style of BERT. The model has 24 transformer layers, a hidden size of 1024, and 16 attention heads, for a total of roughly 345 million parameters.
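To make these dimensions concrete, the minimal sketch below tallies the parameters of a generic BERT-style encoder with the sizes listed above. The padded vocabulary size, maximum sequence length, and segment count are assumptions for illustration (the released cased and uncased checkpoints use different vocabularies), so the total is approximate rather than the exact checkpoint count.

```python
# Rough parameter tally for a BERT-style encoder with the dimensions above.
# Vocabulary size, max positions, and segment count are assumptions here.

hidden = 1024
layers = 24
vocab = 30_592     # assumed padded vocabulary size; the real value differs per checkpoint
max_pos = 512      # assumed maximum sequence length (standard for BERT)
type_vocab = 2     # segment (token type) embeddings

embeddings = (vocab + max_pos + type_vocab) * hidden
per_layer = (
    4 * hidden * hidden + 4 * hidden        # self-attention: Q, K, V, output projections + biases
    + 2 * hidden * 4 * hidden + 5 * hidden  # feed-forward: hidden -> 4*hidden -> hidden + biases
    + 4 * hidden                            # two layer norms (scale + bias each)
)
total = embeddings + layers * per_layer

# Prints roughly 334M with these assumptions; the published ~345M is a rounded
# figure whose exact value depends on vocabulary padding and the output heads.
print(f"~{total / 1e6:.0f}M parameters")
```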
The model was trained on text sourced from Wikipedia, RealNews, OpenWebText, and CC-Stories. We offer versions pretrained with both cased and uncased vocabularies.
Find more information at our repo: https://github.com/NVIDIA/Megatron-LM