Natural Language Processing (NLP)
Motivation and Intuition
Query-Key-Value Framework
Attention Weights and Alignment
Additive Attention
Multiplicative Attention
Scaled Dot-Product Attention
Multi-Head Attention
Intra-sequence Dependencies
Positional Information
Computational Complexity
Encoder-Decoder Structure
Positional Encodings
Layer Normalization
Residual Connections
Feed-Forward Networks
Teacher Forcing
Masked Language Modeling
Autoregressive Generation
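As a taste of the topics outlined above, scaled dot-product attention can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not a library implementation: the function and variable names are our own, and the mask convention (True = attend, False = block) is an assumption.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V, mask=None):
    # Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v)
    d_k = Q.shape[-1]
    # similarity scores, scaled by sqrt(d_k) to keep gradients well-behaved
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)       # (n_q, n_k)
    if mask is not None:
        # assumed convention: True = attend, False = block
        scores = np.where(mask, scores, -1e9)
    weights = softmax(scores, axis=-1)                    # rows sum to 1
    return weights @ V, weights                           # (n_q, d_v), (n_q, n_k)

# toy example: 3 queries attending over 4 key-value pairs of dimension 8
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.shape)  # → (3, 8) (3, 4)
```

Multi-head attention repeats this computation in parallel over several learned projections of Q, K, and V, then concatenates the per-head outputs.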