BERT (language model)

Bidirectional Encoder Representations from Transformers (BERT)
Original author(s)	Google AI
Initial release	October 31, 2018
Repository	github.com/google-research/bert
Type	Large language model; Transformer (deep learning architecture); Foundation model;
License	Apache 2.0
Website	arxiv.org/abs/1810.04805

Bidirectional encoder representations from transformers (BERT) is a language model introduced in October 2018 by researchers at Google.^[1]^[2] It learns to represent text as a sequence of vectors using self-supervised learning. It uses the encoder-only transformer architecture. It is notable for its dramatic improvement over previous state-of-the-art models, and as an early example of a large language model. As of 2020^[update], BERT is a ubiquitous baseline in natural language processing (NLP) experiments.^[3]

BERT is trained by masked token prediction and next sentence prediction. As a result of this training process, BERT learns contextual, latent representations of tokens in their context, similar to ELMo and GPT-2.^[4] It found applications for many natural language processing tasks, such as coreference resolution and polysemy resolution.^[5] It is an evolutionary step over ELMo, and spawned the study of "BERTology", which attempts to interpret what is learned by BERT.^[3]

BERT was originally implemented in the English language at two model sizes, BERT_BASE (110 million parameters) and BERT_LARGE (340 million parameters). Both were trained on the Toronto BookCorpus^[6] (800M words) and English Wikipedia (2,500M words). The weights were released on GitHub.^[7] On March 11, 2020, 24 smaller models were released, the smallest being BERT_TINY with just 4 million parameters.^[7]

^ Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (October 11, 2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805v2 [cs.CL].
^ "Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing". Google AI Blog. November 2, 2018. Retrieved November 27, 2019.
^ ^a ^b Rogers, Anna; Kovaleva, Olga; Rumshisky, Anna (2020). "A Primer in BERTology: What We Know About How BERT Works". Transactions of the Association for Computational Linguistics. 8: 842–866. arXiv:2002.12327. doi:10.1162/tacl_a_00349. S2CID 211532403.
^ Ethayarajh, Kawin (September 1, 2019), How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings, arXiv:1909.00512
^ Anderson, Dawn (November 5, 2019). "A deep dive into BERT: How BERT launched a rocket into natural language understanding". Search Engine Land. Retrieved August 6, 2024.
^ Zhu, Yukun; Kiros, Ryan; Zemel, Rich; Salakhutdinov, Ruslan; Urtasun, Raquel; Torralba, Antonio; Fidler, Sanja (2015). "Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books". pp. 19–27. arXiv:1506.06724 [cs.CV].
^ ^a ^b "BERT". GitHub. Retrieved March 28, 2023.

[:0-1] Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (October 11, 2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805v2 [cs.CL].

[2] "Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing". Google AI Blog. November 2, 2018. Retrieved November 27, 2019.

[:4-3] Rogers, Anna; Kovaleva, Olga; Rumshisky, Anna (2020). "A Primer in BERTology: What We Know About How BERT Works". Transactions of the Association for Computational Linguistics. 8: 842–866. arXiv:2002.12327. doi:10.1162/tacl_a_00349. S2CID 211532403.

[:5-4] Ethayarajh, Kawin (September 1, 2019), How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings, arXiv:1909.00512

[5] Anderson, Dawn (November 5, 2019). "A deep dive into BERT: How BERT launched a rocket into natural language understanding". Search Engine Land. Retrieved August 6, 2024.

[6] Zhu, Yukun; Kiros, Ryan; Zemel, Rich; Salakhutdinov, Ruslan; Urtasun, Raquel; Torralba, Antonio; Fidler, Sanja (2015). "Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books". pp. 19–27. arXiv:1506.06724 [cs.CV].

[:3-7] "BERT". GitHub. Retrieved March 28, 2023.

[1]

[2]

[3]

[4]

[5]

[6]

[7]