11. Advanced Attention Mechanisms
Relative Position Encoding
  - Relative Position Representations
  - Shaw et al. (2018) Approach
  - T5 Relative Position Bias
  - RoPE (Rotary Position Embedding)
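These schemes inject position in different places: Shaw et al. add learned relative-offset embeddings to the keys inside the attention score, T5 adds a learned scalar bias per bucketed relative distance directly to the attention logits, and RoPE instead rotates each query/key dimension pair by a position-dependent angle so that dot products depend only on relative offsets. Below is a minimal NumPy sketch of the RoPE rotation; the function name and shapes are illustrative, and an even model dimension is assumed.

```python
import numpy as np

def rope(x, base=10000.0):
    """Rotary position embedding for x of shape (seq_len, d), d even.

    Dimension pair (2i, 2i+1) at position m is rotated by the angle
    m * base**(-2i/d); after rotating both queries and keys this way,
    q_m . k_n depends only on the relative offset m - n.
    """
    seq_len, d = x.shape
    half = d // 2
    inv_freq = base ** (-np.arange(half) / half)               # (d/2,)
    angles = np.arange(seq_len)[:, None] * inv_freq[None, :]   # (seq_len, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin                  # 2D rotation
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

q = np.random.default_rng(0).normal(size=(10, 8))
q_rot = rope(q)   # rotate queries (and keys) before taking dot products
```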
Sparse Attention Patterns
  - Local Attention Windows
  - Strided Attention
  - Random Attention
  - Structured Sparsity
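The sketch below shows how these patterns compose into one structured attention mask: a local sliding window, a strided pattern linking tokens a fixed distance apart (as in Sparse Transformers), and a few random links per row (as in BigBird). The dense boolean matrix is for illustration only; practical implementations use blocked kernels so the full O(n^2) matrix is never materialized. Function and parameter names are illustrative.

```python
import numpy as np

def sparse_attention_mask(seq_len, window=4, stride=8, n_random=2, seed=0):
    """Boolean mask (True = may attend) combining three sparse patterns."""
    rng = np.random.default_rng(seed)
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    local = np.abs(i - j) <= window        # sliding window around each token
    strided = (i - j) % stride == 0        # tokens a multiple of `stride` apart
    random = np.zeros((seq_len, seq_len), dtype=bool)
    for row in range(seq_len):             # a few random links per query
        random[row, rng.choice(seq_len, size=n_random, replace=False)] = True
    return local | strided | random

mask = sparse_attention_mask(16)
# Usage: scores[~mask] = -np.inf just before the softmax over keys.
```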
Multi-Scale Attention
  - Hierarchical Attention
  - Multi-Resolution Processing
  - Pyramid Attention
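One way to realize these ideas is to let each query attend over a pyramid of key/value banks pooled at several resolutions: the unpooled level preserves fine local detail, while coarser levels summarize long-range context with few tokens. The following is an illustrative single-head sketch, not a specific published model; keys and values are tied for brevity, and the names and pool sizes are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pyramid_attention(q, kv, pool_sizes=(1, 4, 16)):
    """Attend over key/value banks average-pooled at several scales.

    q: (n_q, d) queries; kv: (n_kv, d) sequence supplying keys/values.
    """
    d = q.shape[-1]
    banks = []
    for p in pool_sizes:
        n = (kv.shape[0] // p) * p                     # drop the ragged tail
        banks.append(kv[:n].reshape(-1, p, d).mean(axis=1))
    keys = np.concatenate(banks, axis=0)               # multi-scale bank
    weights = softmax(q @ keys.T / np.sqrt(d))
    return weights @ keys

rng = np.random.default_rng(0)
out = pyramid_attention(rng.normal(size=(5, 32)), rng.normal(size=(64, 32)))
```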
Cross-Modal Attention
  - Vision-Language Models
  - Audio-Text Alignment
  - Multimodal Fusion
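Cross-modal attention is ordinary attention with queries drawn from one modality and keys/values from another: in many vision-language models, text tokens query image patch features, and the same pattern aligns audio frames with text. A minimal single-head sketch, with all projection matrices and shapes illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text, image, W_q, W_k, W_v):
    """Text tokens (queries) attend to image patches (keys/values).

    Returns image-conditioned text features of shape (n_text, d).
    """
    q = text @ W_q                                     # (n_text, d)
    k, v = image @ W_k, image @ W_v                    # (n_patch, d)
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # text-to-image weights
    return weights @ v

rng = np.random.default_rng(0)
d_text, d_img, d = 64, 32, 48
text = rng.normal(size=(7, d_text))     # 7 text tokens
image = rng.normal(size=(49, d_img))    # 7x7 grid of image patches
fused = cross_attention(
    text, image,
    rng.normal(size=(d_text, d)),
    rng.normal(size=(d_img, d)),
    rng.normal(size=(d_img, d)),
)                                        # (7, 48)
```

Multimodal fusion architectures differ mainly in where such blocks sit, e.g. interleaved with self-attention layers or applied once on top of separate per-modality encoders.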