Transformer (deep learning architecture)
Relative Position Representations
Shaw et al. Approach
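A minimal NumPy sketch of the Shaw et al. (2018) idea: a learned embedding table indexed by the clipped offset j - i is dotted with each query and added to the content logits. Shapes, the clipping distance, and the omission of the value-side term are simplifying assumptions here.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def shaw_relative_attention(q, k, v, rel_emb, max_dist):
    """Single-head attention with relative position embeddings added to
    the logits. rel_emb has shape (2*max_dist + 1, d); row max_dist is
    offset 0, and offsets beyond +/-max_dist are clipped."""
    n, d = q.shape
    content = q @ k.T / np.sqrt(d)                      # q_i . k_j
    offsets = np.clip(np.arange(n)[None, :] - np.arange(n)[:, None],
                      -max_dist, max_dist) + max_dist   # (n, n) table indices
    relative = np.einsum('id,ijd->ij', q, rel_emb[offsets]) / np.sqrt(d)
    return softmax(content + relative) @ v

rng = np.random.default_rng(0)
n, d, max_dist = 6, 8, 3
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
rel_emb = 0.1 * rng.normal(size=(2 * max_dist + 1, d))
print(shaw_relative_attention(q, k, v, rel_emb, max_dist).shape)  # (6, 8)
```

Clipping means positions farther apart than max_dist share one embedding, so the table size stays independent of sequence length.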
T5 Relative Position Bias
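T5 replaces per-dimension embeddings with a learned scalar bias per attention head, indexed by a bucketed relative position: nearby offsets get exact buckets, distant ones share log-spaced buckets. The sketch below is a simplified bidirectional version of that bucketing; the bucket count and max distance are illustrative defaults.

```python
import numpy as np

def relative_bucket(rel_pos, num_buckets=32, max_distance=128):
    """Map signed relative positions to bucket ids, T5-style (simplified):
    half the buckets per sign, exact ids for small offsets, log-spaced
    ids for larger ones."""
    num_buckets //= 2
    sign_half = np.where(rel_pos > 0, num_buckets, 0)
    n = np.abs(rel_pos)
    max_exact = num_buckets // 2
    large = max_exact + (
        np.log(np.maximum(n, 1) / max_exact) / np.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    ).astype(int)
    large = np.minimum(large, num_buckets - 1)
    return sign_half + np.where(n < max_exact, n, large)

n, num_heads = 10, 4
rel = np.arange(n)[None, :] - np.arange(n)[:, None]   # offsets j - i
buckets = relative_bucket(rel)                        # (n, n) bucket ids
bias_table = 0.1 * np.random.default_rng(0).normal(size=(32, num_heads))
bias = bias_table[buckets]                            # (n, n, heads)
# logits[h] += bias[..., h] before the softmax; the parameter cost is
# O(num_buckets * heads), independent of sequence length.
```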
RoPE (Rotary Position Embedding)
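RoPE encodes position by rotating each consecutive pair of query/key dimensions by a position-dependent angle, so the dot product between a rotated query and key depends only on their offset. A minimal sketch, assuming an even head dimension and the common base of 10000:

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary position embedding to x of shape (seq_len, d), d even.
    Each dimension pair is rotated by an angle that grows linearly with
    position, at a frequency that decays across pairs."""
    seq_len, d = x.shape
    pos = np.arange(seq_len)[:, None]           # (seq, 1)
    freqs = base ** (-np.arange(0, d, 2) / d)   # (d/2,) decaying frequencies
    angles = pos * freqs                        # (seq, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Rotating q and k the same way makes q_i . k_j depend only on (i - j):
rng = np.random.default_rng(0)
q, k = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
scores = rope(q) @ rope(k).T   # relative-position-aware logits
```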
Local Attention Windows
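A sliding window restricts each position to a fixed neighborhood, cutting the attended pairs from O(n^2) to O(n * w). A small mask sketch; the window size is an arbitrary choice:

```python
import numpy as np

def local_window_mask(n, window):
    """Boolean mask: position i may attend to j iff |i - j| <= window."""
    idx = np.arange(n)
    return np.abs(idx[:, None] - idx[None, :]) <= window

mask = local_window_mask(n=8, window=2)
# Apply before the softmax: logits[~mask] = -np.inf
```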
Strided Attention
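Strided attention, in the style of Sparse Transformers, lets each position reach every stride-th earlier position, so stacking it with a local pattern connects any pair of tokens in two hops. A sketch of the mask:

```python
import numpy as np

def strided_mask(n, stride):
    """Position i attends to earlier positions j with (i - j) % stride == 0,
    letting information hop across the sequence in fixed steps."""
    rel = np.arange(n)[:, None] - np.arange(n)[None, :]
    return (rel >= 0) & (rel % stride == 0)

mask = strided_mask(n=12, stride=3)
# Apply before the softmax: logits[~mask] = -np.inf
```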
Random Attention
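Random attention, as used in BigBird, adds a few uniformly sampled links per row; random edges keep the expected path length between any two tokens short. A sketch with an illustrative per-row budget:

```python
import numpy as np

def random_mask(n, per_row, seed=0):
    """Each position attends to per_row uniformly sampled positions,
    plus itself. The per-row budget here is an illustrative choice."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        mask[i, rng.choice(n, size=per_row, replace=False)] = True
    return mask | np.eye(n, dtype=bool)   # always attend to self

mask = random_mask(n=8, per_row=2)
# Apply before the softmax: logits[~mask] = -np.inf
```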
Structured Sparsity
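Structured sparsity applies these patterns at block granularity, so a kernel can skip whole tiles rather than individual entries. A BigBird-flavored sketch combining a local band with a few global blocks; block size and counts are assumptions:

```python
import numpy as np

def block_sparse_mask(n, block, window_blocks=1, global_blocks=1):
    """Block-level sparsity: a band of local blocks plus global blocks
    that attend to everything and that everything attends to."""
    nb = n // block
    idx = np.arange(nb)
    bmask = np.abs(idx[:, None] - idx[None, :]) <= window_blocks  # local band
    bmask[:global_blocks, :] = True   # global blocks attend everywhere
    bmask[:, :global_blocks] = True   # everyone attends to global blocks
    # Expand each block-level entry into a (block, block) tile:
    return np.kron(bmask.astype(int), np.ones((block, block), dtype=int)).astype(bool)

mask = block_sparse_mask(n=16, block=4)   # (16, 16) tile-aligned mask
```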
Hierarchical Attention
Multi-Resolution Processing
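One common recipe is to pool the token sequence at several rates, so a layer can attend over coarse summaries alongside fine-grained tokens. A minimal average-pooling sketch; the pooling factors are illustrative:

```python
import numpy as np

def multi_resolution(x, factors=(1, 2, 4)):
    """Average-pool a (seq, d) sequence at several rates, returning one
    view per factor; ragged tails are dropped for simplicity."""
    outs = []
    for f in factors:
        n = (x.shape[0] // f) * f
        outs.append(x[:n].reshape(-1, f, x.shape[1]).mean(axis=1))
    return outs

views = multi_resolution(np.random.default_rng(0).normal(size=(16, 8)))
print([v.shape for v in views])  # [(16, 8), (8, 8), (4, 8)]
```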
Pyramid Attention
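In a pyramid scheme, fine-grained queries attend over progressively pooled keys and values, shrinking the attention matrix from (n, n) to (n, n/p) at each level. A single-level sketch under that assumption, not any one paper's exact formulation:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pyramid_attention(q, x, pool=4):
    """Full-resolution queries attend over average-pooled keys/values."""
    n, d = x.shape
    m = (n // pool) * pool                          # drop any ragged tail
    kv = x[:m].reshape(-1, pool, d).mean(axis=1)    # coarse keys/values
    return softmax(q @ kv.T / np.sqrt(d)) @ kv

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))
print(pyramid_attention(x, x).shape)  # (16, 8): queries stay fine-grained
```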
Vision-Language Models
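Many vision-language models are trained contrastively in the CLIP style: paired image and text embeddings should be more similar than any mismatched pair within a batch. A NumPy sketch of the symmetric loss; the temperature is an illustrative default:

```python
import numpy as np

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired image/text
    embeddings; the matching pairs sit on the diagonal of the logits."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (batch, batch) similarities
    labels = np.arange(len(logits))

    def xent(l):  # cross-entropy with the diagonal as targets
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    return (xent(logits) + xent(logits.T)) / 2  # image->text and text->image

rng = np.random.default_rng(0)
print(clip_style_loss(rng.normal(size=(4, 16)), rng.normal(size=(4, 16))))
```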
Audio-Text Alignment
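A soft alignment between audio frames and text tokens can be read off a cross-attention matrix: each row distributes one frame's attention over the token sequence. A deliberately simple sketch, not any specific model's alignment head:

```python
import numpy as np

def soft_alignment(audio_frames, text_tokens):
    """Cross-attention weights between (frames, d) audio features and
    (tokens, d) text features; each row is a soft alignment distribution."""
    d = audio_frames.shape[1]
    logits = audio_frames @ text_tokens.T / np.sqrt(d)
    logits -= logits.max(axis=1, keepdims=True)
    attn = np.exp(logits)
    return attn / attn.sum(axis=1, keepdims=True)   # (frames, tokens)

rng = np.random.default_rng(0)
align = soft_alignment(rng.normal(size=(20, 16)), rng.normal(size=(5, 16)))
print(align.shape, align.sum(axis=1)[:3])  # rows sum to 1
```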
Multimodal Fusion
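The simplest fusion strategy is early fusion: tag each modality's tokens with a learned type embedding, then concatenate everything into one sequence for a shared transformer to attend over. A sketch with hypothetical shapes:

```python
import numpy as np

def fuse_modalities(streams, type_emb):
    """Early fusion: add a per-modality type embedding to each stream's
    tokens, then concatenate the streams along the sequence axis."""
    return np.concatenate(
        [tokens + type_emb[i] for i, tokens in enumerate(streams)], axis=0)

rng = np.random.default_rng(0)
vision = rng.normal(size=(9, 16))       # e.g. image patch embeddings
text = rng.normal(size=(5, 16))         # e.g. text token embeddings
type_emb = 0.02 * rng.normal(size=(2, 16))
fused = fuse_modalities([vision, text], type_emb)
print(fused.shape)  # (14, 16): one joint sequence for self-attention
```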