Transformer deep learning architecture
Attention Weight Matrices
Head-specific Patterns
Layer-wise Analysis
Attention Rollout
Embedding Space Structure
Layer-wise Representations
Probing Tasks
Geometric Properties
Syntactic Patterns
Semantic Patterns
Positional Patterns
Multi-Head Specialization
Attention Entropy
Attention Distance
Head Importance Scoring
Layer Importance Analysis
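Among the topics listed above, attention rollout is a concrete, named procedure (Abnar & Zuidema, 2020): it estimates token-to-token influence across the whole network by recursively multiplying per-layer attention maps, mixing in the identity to account for residual connections. A minimal NumPy sketch, assuming head-averaged, row-stochastic attention matrices as input (the function name and 0.5/0.5 residual mixing weight follow the common convention, not a prescribed API):

```python
import numpy as np

def attention_rollout(attentions):
    """Attention rollout: estimate cross-layer token-to-token influence.

    attentions: list of (seq_len, seq_len) attention matrices, one per
    layer, already averaged over heads; each row sums to 1.
    Returns a (seq_len, seq_len) row-stochastic influence matrix.
    """
    seq_len = attentions[0].shape[0]
    rollout = np.eye(seq_len)
    for attn in attentions:
        # Mix in the identity to model the residual connection,
        # then renormalize so rows remain probability distributions.
        a = 0.5 * attn + 0.5 * np.eye(seq_len)
        a = a / a.sum(axis=-1, keepdims=True)
        # Compose this layer's map with the accumulated influence.
        rollout = a @ rollout
    return rollout
```

Because each factor is row-stochastic, the accumulated rollout matrix stays row-stochastic, so row `i` can be read as a distribution over which input tokens influence position `i` after all layers.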
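Attention entropy and attention distance, also listed above, are simple per-head diagnostics: entropy measures how diffuse a head's attention distribution is, and mean attention distance measures how far (in token positions) a head typically attends. A sketch under the assumption that one layer's attention is given as a `(num_heads, seq_len, seq_len)` array with rows summing to 1 (the function name is illustrative):

```python
import numpy as np

def head_entropy_and_distance(attn):
    """Per-head attention entropy and mean attention distance for one layer.

    attn: (num_heads, seq_len, seq_len) attention weights; each query
    row sums to 1.
    Returns two (num_heads,) arrays: mean entropy in nats, and mean
    attended distance |query_pos - key_pos| weighted by attention.
    """
    _, seq_len, _ = attn.shape
    eps = 1e-12  # avoid log(0) on exactly-zero weights
    # Entropy of each query's distribution, averaged over query positions.
    entropy = -(attn * np.log(attn + eps)).sum(axis=-1).mean(axis=-1)
    # |i - j| distance between every query and key position.
    pos = np.arange(seq_len)
    dist = np.abs(pos[:, None] - pos[None, :])
    # Attention-weighted distance, averaged over query positions.
    distance = (attn * dist).sum(axis=-1).mean(axis=-1)
    return entropy, distance
```

A uniform head attains the maximum entropy `log(seq_len)`, while a head attending only to its own position has near-zero entropy and zero mean distance; comparing these statistics across heads and layers is one common way to characterize multi-head specialization.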