Distributed Deep Learning Training
1. Introduction to Distributed Deep Learning
2. Data Parallelism
3. Model Parallelism
4. Hybrid Parallelism Strategies
5. Communication in Distributed Training
6. Communication Optimization
7. System and Hardware Considerations
8. Frameworks and Libraries
9. Performance Optimization and Tuning
10. Practical Implementation
11. Advanced Topics and Future Directions
3. Model Parallelism
3.1. Fundamental Concepts
3.1.1. Model Partitioning Strategies
3.1.1.1. Layer-wise Partitioning (see the sketch after this subsection)
3.1.1.2. Tensor-wise Partitioning
3.1.1.3. Operator-level Partitioning
3.1.2. Memory Distribution Benefits
3.1.3. Communication Requirements
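A minimal sketch of 3.1.1.1 and 3.1.2, assuming a hypothetical 12-layer model whose per-layer parameter counts and activation size are invented for illustration: consecutive layers are assigned to devices so that parameter memory is roughly balanced, and the only cross-device traffic is the activation (and its gradient) at each partition boundary, which is the communication requirement named in 3.1.3.

```python
# Layer-wise partitioning sketch (3.1.1.1): assign consecutive layers of a
# model to devices so per-device parameter memory is roughly balanced (3.1.2),
# then note the activation that must cross every partition boundary (3.1.3).
# All sizes below are hypothetical and chosen only for illustration.

def partition_layers(param_counts, num_devices):
    """Greedy contiguous split: keep adding layers to the current device
    until it holds roughly 1/num_devices of the total parameters."""
    target = sum(param_counts) / num_devices
    partitions, current, acc = [], [], 0
    for i, count in enumerate(param_counts):
        current.append(i)
        acc += count
        # Close this partition once the target is reached, unless it is the
        # last device, which must absorb all remaining layers.
        if acc >= target and len(partitions) < num_devices - 1:
            partitions.append(current)
            current, acc = [], 0
    partitions.append(current)
    return partitions

layer_params_m = [7] * 12          # 12 layers, ~7M parameters each (made up)
activation_bytes = 32 * 4096 * 2   # hypothetical fp16 activation per boundary

parts = partition_layers(layer_params_m, num_devices=4)
for device, layer_ids in enumerate(parts):
    memory = sum(layer_params_m[i] for i in layer_ids)
    print(f"device {device}: layers {layer_ids}, ~{memory}M parameters")
print(f"each boundary sends ~{activation_bytes / 1e6:.1f} MB of activations "
      f"forward (and the matching gradient backward) per micro-batch")
```

Unlike data parallelism, which exchanges full gradients every step, this scheme only moves boundary activations and their gradients between neighboring devices.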
3.2. Pipeline Parallelism
3.2.1. Layer Pipelining Concepts
3.2.1.1. Sequential Layer Assignment
3.2.1.2. Forward Pass Scheduling
3.2.1.3. Backward Pass Scheduling
3.2.2. Micro-batching
3.2.2.1. Batch Splitting Strategies
3.2.2.2. Pipeline Efficiency
3.2.2.3. Memory Trade-offs
3.2.3. Pipeline Bubble Problem (see the sketch after this subsection)
3.2.3.1. Idle Time Analysis
3.2.3.2. Throughput Impact
3.2.3.3. Bubble Minimization Techniques
3.2.4. Advanced Pipeline Techniques
3.2.4.1. Interleaved Scheduling
3.2.4.2. Virtual Pipeline Stages
3.2.4.3. Gradient Accumulation in Pipelines
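A back-of-the-envelope sketch for 3.2.2 and 3.2.3, under the usual idealized assumption that stages are perfectly balanced and every micro-batch takes one time unit per stage: in a GPipe-style fill-and-drain schedule with p pipeline stages and m micro-batches, the fraction of time lost to the bubble is (p − 1)/(m + p − 1), so splitting the mini-batch into more micro-batches shrinks the bubble at the cost of more in-flight activation memory.

```python
# Pipeline bubble sketch (3.2.3): with p pipeline stages and m micro-batches,
# each stage sits idle while the pipeline fills and drains. Under the
# idealized balanced-stage assumption, the idle fraction is
#     bubble = (p - 1) / (m + p - 1)
# so micro-batching (3.2.2) directly trades memory for pipeline efficiency.

def bubble_fraction(num_stages: int, num_microbatches: int) -> float:
    return (num_stages - 1) / (num_microbatches + num_stages - 1)

for m in (1, 4, 8, 32):
    print(f"p=4 stages, m={m:>2} micro-batches -> "
          f"bubble = {bubble_fraction(4, m):.2%}")
# p=4, m=1  -> 75% idle (no micro-batching at all)
# p=4, m=32 -> ~8.6% idle, but ~32 micro-batches of activations are in flight
```

Interleaved and virtual-stage schedules (3.2.4.1, 3.2.4.2) push the effective bubble down further by giving each device several smaller pipeline stages instead of one large one.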
3.3. Tensor Parallelism
3.3.1. Tensor Partitioning Methods
3.3.1.1. Row-wise Parallelism
3.3.1.2. Column-wise Parallelism
3.3.1.3. Block-wise Parallelism
3.3.2. Matrix Operation Splitting
3.3.2.1. Matrix Multiplication Partitioning (see the sketch after this subsection)
3.3.2.2. Element-wise Operation Distribution
3.3.3. Transformer Model Applications
3.3.3.1. Self-Attention Parallelization
3.3.3.2. Feed-Forward Network Splitting
3.3.3.3. Embedding Table Sharding
3.3.3.4. Communication Patterns
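A minimal NumPy sketch of 3.3.1.1, 3.3.1.2, and 3.3.2.1, using small made-up matrix sizes and plain arrays in place of per-GPU shards: the first weight matrix is split column-wise so each worker computes an independent slice of the hidden activation, the second is split row-wise so each worker produces a partial output, and summing the partials plays the role of the all-reduce (3.3.3.4). The same pairing is commonly used to split a Transformer feed-forward block (3.3.3.2), since an element-wise nonlinearity between the two matmuls commutes with the column split.

```python
# Tensor-parallel matmul sketch (3.3.2.1), with NumPy arrays standing in for
# the per-GPU shards: weight A is split column-wise (3.3.1.2) and weight B
# row-wise (3.3.1.1); summing the partial outputs is what the all-reduce
# does in a real multi-GPU setup. All shapes here are arbitrary toy sizes.
import numpy as np

rng = np.random.default_rng(0)
workers = 2
X = rng.standard_normal((8, 16))    # (batch, d_model)
A = rng.standard_normal((16, 32))   # first linear:  d_model -> d_hidden
B = rng.standard_normal((32, 16))   # second linear: d_hidden -> d_model

# Column-wise shards of A and the matching row-wise shards of B.
A_shards = np.split(A, workers, axis=1)   # each (16, 16)
B_shards = np.split(B, workers, axis=0)   # each (16, 16)

# Each "worker" computes its slice end to end; no communication is needed
# between the two matmuls, only a reduction of the partial outputs at the end.
partials = [(X @ A_i) @ B_i for A_i, B_i in zip(A_shards, B_shards)]
Y_parallel = sum(partials)                # stands in for the all-reduce

Y_reference = (X @ A) @ B                 # unsharded computation
assert np.allclose(Y_parallel, Y_reference)
print("sharded and unsharded results match")
```

The design point this illustrates is that choosing column-then-row sharding keeps the intermediate activation fully local to each worker, so the only collective per layer is the final reduction.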