Abstract
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers.
Key Contributions
- Bidirectional Pre-training: Pre-trains deep bidirectional representations with a masked language modeling objective, in which randomly masked tokens are predicted from both left and right context (see the masking sketch after this list)
- State-of-the-Art: Achieved new state-of-the-art results on 11 NLP tasks
- Fine-tuning Approach: The pre-trained model is adapted to downstream tasks by adding a single task-specific output layer and fine-tuning all parameters (see the sketch at the end of this note)
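The masking step can be illustrated with a minimal sketch. The 15% masking rate and the 80/10/10 split between [MASK], random, and unchanged tokens follow the paper; the whitespace tokenizer and toy vocabulary here are simplified stand-ins, not the paper's WordPiece setup.

```python
import random

MASK_RATE = 0.15  # fraction of tokens selected for prediction, as in the paper

def mask_tokens(tokens, vocab, mask_token="[MASK]"):
    """Return (masked_tokens, labels); labels are None where no prediction is made."""
    masked, labels = [], []
    for tok in tokens:
        if random.random() < MASK_RATE:
            labels.append(tok)  # the model must recover the original token
            r = random.random()
            if r < 0.8:                      # 80%: replace with [MASK]
                masked.append(mask_token)
            elif r < 0.9:                    # 10%: replace with a random token
                masked.append(random.choice(vocab))
            else:                            # 10%: keep the original token
                masked.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

# Toy example with a whitespace "tokenizer" (the paper uses WordPiece subwords)
sentence = "the man went to the store".split()
vocab = sorted(set(sentence))
print(mask_tokens(sentence, vocab))
```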
Results
BERT obtained new state-of-the-art results on eleven natural language processing tasks.
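These results are obtained with the fine-tuning recipe above: one output layer is added to the pre-trained model and all parameters are updated on task data. The sketch below illustrates this for sentence classification; it assumes the Hugging Face transformers library and PyTorch, which are not part of the original paper, and uses hypothetical toy data. The learning rate is within the fine-tuning range the paper reports.

```python
# A hedged sketch of BERT fine-tuning for sentence classification.
# Assumes the Hugging Face `transformers` library and PyTorch; neither is
# prescribed by the paper, which only describes the general recipe.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # one extra output layer for the task
)

# Toy labeled batch (hypothetical data, for illustration only)
texts = ["a delightful film", "a tedious, overlong mess"]
labels = torch.tensor([1, 0])

inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**inputs, labels=labels)  # all parameters are fine-tuned end to end
outputs.loss.backward()
optimizer.step()
print(float(outputs.loss))
```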