Deep Learning and Neural Networks
1. Foundations of Machine Learning and Neural Networks
2. Training Shallow Neural Networks
3. Deepening the Network
4. Practical Considerations for Training
5. Convolutional Neural Networks (CNNs)
6. Recurrent Neural Networks (RNNs)
7. The Transformer Architecture
8. Generative Models
9. Deep Reinforcement Learning
10. Advanced Topics and Specialized Architectures
11. Deployment and Production
The Transformer Architecture
Limitations of RNNs
Sequential Processing Bottleneck
Difficulty with Long-Range Dependencies
Parallelization Challenges
Computational Inefficiency
The Attention Mechanism
Motivation for Attention
Attention as Soft Lookup
Query, Key, and Value Vectors
Mathematical Representation
Linear Transformations
Attention Score Computation
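For reference, a compact way to write the quantities in this subsection (the symbols X, W^Q, W^K, W^V and the shapes below follow a common convention, not necessarily the course's exact notation): the queries, keys, and values are linear projections of the input embeddings, and the raw attention score between positions i and j is the dot product of query i with key j.

```latex
Q = XW^{Q}, \qquad K = XW^{K}, \qquad V = XW^{V}, \qquad
e_{ij} = q_i \cdot k_j = \big(QK^{\top}\big)_{ij},
```

with $X \in \mathbb{R}^{n \times d_{\text{model}}}$, $W^{Q}, W^{K} \in \mathbb{R}^{d_{\text{model}} \times d_k}$, and $W^{V} \in \mathbb{R}^{d_{\text{model}} \times d_v}$.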
Scaled Dot-Product Attention
Computation Steps
Scaling Factor
Softmax Normalization
Attention Weights Interpretation
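The computation steps above condense into softmax(QKᵀ/√d_k)V. Below is a minimal NumPy sketch (function and variable names are illustrative, not taken from the course materials): it forms the score matrix QKᵀ, scales by √d_k to keep the softmax out of its saturated region, normalizes each row into attention weights, and uses those weights to mix the value vectors.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    # Raw attention scores between every query and every key, scaled by sqrt(d_k).
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    if mask is not None:
        # Positions where mask is False are excluded from attention.
        scores = np.where(mask, scores, -1e9)
    # Softmax over the key axis turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy example: 4 tokens, d_k = d_v = 8.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8)); K = rng.normal(size=(4, 8)); V = rng.normal(size=(4, 8))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.shape)   # (4, 8) (4, 4); each row of w sums to 1
```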
Self-Attention
Mechanism and Benefits
Multi-Token Interactions
Position-Aware Processing
Computational Complexity
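One concrete figure worth attaching to the complexity discussion: because every token attends to every other token, the attention-weight matrix is n × n, so a self-attention layer over a length-n sequence with model dimension d costs roughly

```latex
\underbrace{O(n^{2} \cdot d)}_{\text{self-attention layer}}
\quad\text{vs.}\quad
\underbrace{O(n \cdot d^{2})}_{\text{recurrent layer}}
```

per layer (the per-layer figures reported in Vaswani et al., 2017), with the trade-off that the self-attention cost is fully parallelizable across positions while the recurrent cost is inherently sequential.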
Multi-Head Attention
Parallel Attention Heads
Different Representation Subspaces
Concatenation and Projection
Head Dimensionality
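A rough NumPy sketch of how the pieces above fit together, under the usual convention d_k = d_v = d_model / h; all names and shapes are illustrative, and the per-head attention is the same softmax(QKᵀ/√d_k)V computation sketched earlier.

```python
import numpy as np

def _attention(Q, K, V):
    # softmax(Q K^T / sqrt(d_k)) V, batched over the leading (head) axis.
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ V

def multi_head_attention(X, W_q, W_k, W_v, W_o, h):
    """Split d_model into h heads, attend per head, concatenate, project."""
    n, d_model = X.shape
    d_head = d_model // h                        # per-head dimensionality

    def split(M):
        # (n, d_model) -> (h, n, d_head): one slice per head.
        return M.reshape(n, h, d_head).transpose(1, 0, 2)

    Q, K, V = X @ W_q, X @ W_k, X @ W_v          # (n, d_model) each
    heads = _attention(split(Q), split(K), split(V))   # (h, n, d_head)
    concat = heads.transpose(1, 0, 2).reshape(n, d_model)
    return concat @ W_o                          # final output projection

# Toy example: n = 4 tokens, d_model = 16, h = 4 heads.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 16))
W = [rng.normal(size=(16, 16)) * 0.1 for _ in range(4)]
print(multi_head_attention(X, *W, h=4).shape)    # (4, 16)
```

Because each head works in a d_model / h-dimensional subspace, the total cost stays comparable to a single full-width head while letting different heads specialize in different kinds of relations.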
The Transformer Architecture
Overall Architecture Design
Encoder-Decoder Structure
Positional Encoding
Need for Position Information
Sinusoidal Positional Encoding
Learned Positional Embeddings
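For reference, the sinusoidal scheme of the original paper fills each position pos and dimension pair (2i, 2i+1) with sine and cosine waves of geometrically increasing wavelength:

```latex
PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right),\qquad
PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right)
```

A short NumPy sketch (function name is illustrative; it assumes an even d_model):

```python
import numpy as np

def sinusoidal_positional_encoding(max_len, d_model):
    """Sinusoidal positional encodings, shape (max_len, d_model)."""
    pos = np.arange(max_len)[:, None]                    # (max_len, 1)
    i = np.arange(d_model // 2)[None, :]                 # (1, d_model/2)
    angles = pos / np.power(10000.0, 2 * i / d_model)    # (max_len, d_model/2)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions
    pe[:, 1::2] = np.cos(angles)   # odd dimensions
    return pe

print(sinusoidal_positional_encoding(max_len=50, d_model=16).shape)  # (50, 16)
```

The resulting matrix is added to the token embeddings before the first layer; learned positional embeddings simply replace this fixed matrix with a trainable one of the same shape.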
Encoder Block
Multi-Head Self-Attention
Layer Normalization
Feedforward Networks
Residual Connections
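In the post-norm arrangement of the original paper, each encoder sublayer is wrapped in a residual connection followed by layer normalization; writing x for the block input, one encoder block computes roughly

```latex
\begin{aligned}
z &= \mathrm{LayerNorm}\big(x + \mathrm{MultiHeadSelfAttn}(x)\big),\\
y &= \mathrm{LayerNorm}\big(z + \mathrm{FFN}(z)\big).
\end{aligned}
```

Many later models instead apply the normalization before each sublayer (pre-norm), which tends to make very deep stacks easier to train; the layer-normalization-placement topic in the decoder outline below refers to this same choice.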
Decoder Block
Masked Self-Attention
Cross-Attention
Autoregressive Generation
Layer Normalization Placement
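The masking in the decoder's self-attention is just a triangular pattern that hides future positions, which is what keeps training consistent with autoregressive generation. A minimal sketch (the convention here, True = "may attend", matches the mask argument of the attention sketch earlier):

```python
import numpy as np

def causal_mask(n):
    """Boolean (n, n) mask: position i may attend only to positions <= i."""
    return np.tril(np.ones((n, n), dtype=bool))

print(causal_mask(4).astype(int))
# [[1 0 0 0]
#  [1 1 0 0]
#  [1 1 1 0]
#  [1 1 1 1]]
```

At generation time the model emits one token at a time and feeds it back in, so the mask guarantees that training-time predictions never peek at tokens the model would not yet have produced.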
Feedforward Networks
Position-wise Operations
Activation Functions
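The position-wise feedforward network is the same two-layer MLP applied independently at every position. In the original formulation it uses a ReLU activation (later variants commonly substitute GELU or gated units):

```latex
\mathrm{FFN}(x) = \max(0,\; xW_1 + b_1)\,W_2 + b_2,
\qquad W_1 \in \mathbb{R}^{d_{\text{model}} \times d_{\text{ff}}},\;\;
W_2 \in \mathbb{R}^{d_{\text{ff}} \times d_{\text{model}}}
```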
Training Transformers
Teacher Forcing
Masked Language Modeling
Autoregressive Training
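For the autoregressive, teacher-forced setup, the decoder's input and target sequences are just shifted copies of the same ground-truth sequence: at every step the model conditions on the true prefix and is scored on the next token. A tiny illustration (token IDs are made up):

```python
# Teacher forcing for next-token prediction: feed the ground-truth prefix,
# train against the same sequence shifted one step to the left.
tokens = [101, 7, 42, 13, 99, 102]      # hypothetical token IDs
inputs = tokens[:-1]                     # what the decoder sees
targets = tokens[1:]                     # what it must predict at each step
for x, y in zip(inputs, targets):
    print(f"given ...{x} -> predict {y}")
```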
Transformer Variants
Encoder-Only Models
Decoder-Only Models
Encoder-Decoder Models
Applications and Impact
Machine Translation
Text Summarization
Question Answering
Large Language Models
Pre-training Objectives
Fine-Tuning Strategies
Scaling Laws
Emergent Abilities
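On scaling laws, the commonly cited empirical result (Kaplan et al., 2020) is that, when none of the other factors is the bottleneck, test loss falls off roughly as a power law in parameter count N, dataset size D, and training compute C; schematically,

```latex
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N},\qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D},\qquad
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C},
```

with empirically fitted constants and exponents; later work (e.g. Hoffmann et al., 2022, "Chinchilla") refines how N and D should be scaled together under a fixed compute budget.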