BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
In this deep dive into BERT, we explore the powerful NLP model's history, break down its approach and architecture, and review some of the relevant experiments. We then close with a code demo showing how to use BERT, DistilBERT, RoBERTa, and ALBERT in a Gradient Notebook.
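As a taste of what the demo covers, the sketch below loads all four models through the Hugging Face `transformers` library and runs a simple fill-mask prediction with each. This is a minimal illustration under assumptions: it presumes `transformers` and PyTorch are installed, the model names are the standard Hub identifiers, and the masked-sentence example is our own, not necessarily the exact code from the notebook.

```python
# Minimal sketch: load BERT, DistilBERT, RoBERTa, and ALBERT and run a
# fill-mask prediction with each. Assumes `pip install transformers torch`.
from transformers import AutoTokenizer, AutoModelForMaskedLM
import torch

# Standard Hugging Face Hub identifiers for the four models (assumption:
# the base-sized checkpoints, which are the ones most commonly demoed).
MODEL_NAMES = [
    "bert-base-uncased",
    "distilbert-base-uncased",
    "roberta-base",
    "albert-base-v2",
]

for name in MODEL_NAMES:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForMaskedLM.from_pretrained(name)

    # Each family has its own mask token ([MASK] for BERT/DistilBERT/ALBERT,
    # <mask> for RoBERTa), so we insert it via the tokenizer attribute.
    text = f"The capital of France is {tokenizer.mask_token}."
    inputs = tokenizer(text, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits

    # Locate the masked position and decode the highest-scoring token.
    mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
    predicted_id = logits[0, mask_index].argmax().item()
    print(f"{name}: {tokenizer.decode([predicted_id]).strip()}")
```

Because all four checkpoints expose the same `AutoTokenizer`/`AutoModelForMaskedLM` interface, swapping between BERT and its variants is a one-line change, which is exactly what makes the side-by-side comparison in the demo straightforward.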