
Abstract

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers.

Key Contributions

  1. Bidirectional Pre-training: Uses a masked language modeling (MLM) objective so the model conditions on both left and right context (see the masking sketch after this list)
  2. State-of-the-Art: Achieved new state-of-the-art results on eleven NLP tasks
  3. Fine-tuning Approach: Adapts to downstream tasks by adding a single task-specific output layer (see the fine-tuning sketch below)
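
The paper's pre-training corrupts roughly 15% of input positions and trains the model to recover the original tokens; of the selected positions, 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged. Below is a minimal sketch of that masking scheme in plain Python; the `mask_tokens` helper, its arguments, and the toy vocabulary are illustrative names, not part of the original implementation.

```python
import random

MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """BERT-style corruption: pick ~15% of positions; of those,
    80% become [MASK], 10% a random token, 10% stay unchanged.
    Returns the corrupted sequence and the positions to predict."""
    rng = random.Random(seed)
    corrupted = list(tokens)
    targets = {}  # position -> original token the model must predict
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok
            r = rng.random()
            if r < 0.8:
                corrupted[i] = MASK_TOKEN          # 80%: mask
            elif r < 0.9:
                corrupted[i] = rng.choice(vocab)   # 10%: random token
            # else: 10% keep the original token
    return corrupted, targets

# Toy usage
tokens = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(tokens))
corrupted, targets = mask_tokens(tokens, vocab, seed=0)
print(corrupted)
print(targets)
```

The loss is computed only over the selected positions, which is why the helper returns them explicitly rather than labeling every token.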
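
For the fine-tuning contribution, a minimal sketch of sequence classification using the Hugging Face transformers library; this is an assumption for illustration (the paper's original release was a TensorFlow implementation), and the checkpoint name and label count are placeholders.

```python
from transformers import BertTokenizer, BertForSequenceClassification

# Load a pre-trained encoder and attach a fresh classification head.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# A single forward pass; in practice the whole model is fine-tuned
# end-to-end on labeled task data.
inputs = tokenizer("BERT adds just one output layer per task.",
                   return_tensors="pt")
outputs = model(**inputs)
predicted_label = outputs.logits.argmax(dim=-1)
print(predicted_label)
```

All pre-trained parameters are updated during fine-tuning; only the small output layer is initialized from scratch, which is what makes the adaptation simple.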

Results

BERT obtained new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7 point absolute improvement), MultiNLI accuracy to 86.7%, SQuAD v1.1 question answering Test F1 to 93.2, and SQuAD v2.0 Test F1 to 83.1.