Transformer deep learning architecture
11. Advanced Attention Mechanisms
11.1. Relative Position Encoding
11.1.1. Relative Position Representations
11.1.2. Shaw et al. Approach
11.1.3. T5 Relative Position Bias
11.1.4. RoPE (Rotary Position Embedding); see the sketch after this list
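Of the schemes above, RoPE is the one most widely adopted in current models, and it is compact enough to sketch: each query/key dimension pair is rotated through a position-dependent angle before the dot product, which makes the score between positions m and n depend only on the offset m - n. Below is a minimal NumPy sketch of the half-split RoPE variant used by several open implementations (the original paper interleaves dimension pairs instead); the base of 10000 follows the paper, while the function name and shapes are illustrative assumptions.

import numpy as np

def rope(x, base=10000.0):
    # Rotary position embedding for x of shape (seq_len, d), d even.
    seq_len, d = x.shape
    half = d // 2
    # One rotation frequency per dimension pair: theta_i = base^(-i / half).
    freqs = base ** (-np.arange(half) / half)               # (half,)
    angles = np.arange(seq_len)[:, None] * freqs[None, :]   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1, x2) pair by its position-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

# Rotating both q and k means the attention score q_m . k_n depends only
# on the offset m - n, which is the relative-position property RoPE targets.
q = rope(np.random.randn(8, 64))
k = rope(np.random.randn(8, 64))
scores = q @ k.T / np.sqrt(64)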
11.2. Sparse Attention Patterns
11.2.1. Local Attention Windows (see the sketch after this list)
11.2.2. Strided Attention
11.2.3. Random Attention
11.2.4. Structured Sparsity
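In practice every pattern in this list comes down to the same move: compute attention scores, then mask out all but an allowed set of key positions before the softmax. The NumPy sketch below shows a symmetric local window; a strided or random pattern would only swap in a different boolean mask. All identifiers are illustrative, and a real implementation would avoid materializing the full score matrix.

import numpy as np

def local_attention(q, k, v, window=2):
    # q, k, v: (seq_len, d). Each query attends only to keys within
    # `window` positions on either side of it.
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    idx = np.arange(seq_len)
    mask = np.abs(idx[:, None] - idx[None, :]) <= window
    scores = np.where(mask, scores, -np.inf)  # disallowed keys get -inf
    # Softmax over the allowed keys only (exp(-inf) contributes zero).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

out = local_attention(np.random.randn(10, 16),
                      np.random.randn(10, 16),
                      np.random.randn(10, 16))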
11.3. Multi-Scale Attention
11.3.1. Hierarchical Attention
11.3.2. Multi-Resolution Processing (see the sketch after this list)
11.3.3. Pyramid Attention
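One straightforward way to get multi-resolution behavior is to let each query attend over a pyramid of average-pooled keys and values, so fine-grained tokens and coarse summaries compete in a single softmax. The sketch below is a hedged illustration of that general idea, not the exact construction of any particular paper; the function names and pooling strides are assumptions.

import numpy as np

def pool(x, stride):
    # Average-pool a (seq_len, d) sequence along time with the given stride.
    seq_len, d = x.shape
    n = seq_len // stride
    return x[:n * stride].reshape(n, stride, d).mean(axis=1)

def multi_scale_attention(q, k, v, strides=(1, 2, 4)):
    d = q.shape[-1]
    # Pyramid of keys/values: full resolution plus coarser summaries.
    ks = np.concatenate([pool(k, s) for s in strides], axis=0)
    vs = np.concatenate([pool(v, s) for s in strides], axis=0)
    scores = q @ ks.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ vs

out = multi_scale_attention(np.random.randn(16, 32),
                            np.random.randn(16, 32),
                            np.random.randn(16, 32))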
11.4. Cross-Modal Attention
11.4.1. Vision-Language Models (see the sketch after this list)
11.4.2. Audio-Text Alignment
11.4.3. Multimodal Fusion
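Cross-modal attention reuses the standard attention equation and changes only where the three inputs come from: queries are drawn from one modality while keys and values come from the other. The sketch below assumes text-token queries attending over image-patch features, as in typical vision-language models; the random matrices stand in for learned projections, and every shape here is an illustrative assumption.

import numpy as np

rng = np.random.default_rng(0)
d = 64
text = rng.standard_normal((12, d))   # 12 text-token embeddings
image = rng.standard_normal((49, d))  # 7x7 grid of image-patch features

# Learned projections in a real model; random stand-ins here.
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

q = text @ Wq    # queries from the text modality
k = image @ Wk   # keys from the image modality
v = image @ Wv   # values from the image modality

scores = q @ k.T / np.sqrt(d)  # (12, 49): every token scores every patch
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
fused = weights @ v  # each text token becomes an image-conditioned mixture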