9. Parallel Algorithms and Patterns
Fundamental Parallel Patterns
Map Pattern
Element-wise Operations
Embarrassingly Parallel Problems
Implementation Strategies
Reduce Pattern
Parallel Reduction Algorithms
Tree-based Reduction
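A minimal sketch of a tree-based sum, written as sequential C++ rather than CUDA so it runs anywhere. Each pass of the outer loop stands in for one barrier-separated step of a thread block, in which "thread" `i` adds the element `stride` slots away. The name `tree_reduce_sum` and the choice to pad to a power of two with the identity 0 are illustrative assumptions, not a fixed API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential simulation of a tree-based sum reduction.
// Each outer-loop pass corresponds to one synchronized GPU step: "thread" i
// adds the element `stride` positions away, halving the active range until
// one value remains. That is O(log n) steps and O(n) total additions.
long long tree_reduce_sum(std::vector<long long> data) {
    if (data.empty()) return 0;
    // Pad to a power of two with the identity element (0 for addition).
    std::size_t n = 1;
    while (n < data.size()) n <<= 1;
    data.resize(n, 0);
    for (std::size_t stride = n / 2; stride > 0; stride /= 2) {
        // On a GPU these iterations run in parallel, followed by a barrier
        // such as __syncthreads().
        for (std::size_t i = 0; i < stride; ++i) {
            data[i] += data[i + stride];
        }
    }
    return data[0];
}
```

For `{1, 2, 3, 4, 5}` the padded array has 8 slots and three passes (strides 4, 2, 1) produce 15. Halving the stride keeps the active "threads" contiguous (indices `0..stride-1`), which on real hardware avoids the warp divergence of an interleaved-index scheme.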
Warp-level Primitives
Scan Pattern
Prefix Sum Algorithms
Inclusive vs. Exclusive Scan
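The difference between the two scans is easiest to see on a small input. This sketch, again plain C++ standing in for a GPU kernel, computes an inclusive scan with the Hillis-Steele doubling scheme and then derives the exclusive scan by shifting the result right one slot and inserting the identity 0. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inclusive scan (Hillis-Steele): after the pass with offset d, element i holds
// the sum of the up-to-2d inputs ending at i. Double buffering mimics the fact
// that all GPU threads read the old array before any of them writes.
std::vector<int> inclusive_scan_hillis_steele(const std::vector<int>& in) {
    std::vector<int> cur = in, next(in.size());
    for (std::size_t offset = 1; offset < in.size(); offset *= 2) {
        for (std::size_t i = 0; i < in.size(); ++i) {  // one "thread" per element
            next[i] = (i >= offset) ? cur[i] + cur[i - offset] : cur[i];
        }
        cur.swap(next);
    }
    return cur;
}

// Exclusive scan from an inclusive one: shift right by one, fill slot 0 with 0.
std::vector<int> exclusive_from_inclusive(const std::vector<int>& inc) {
    std::vector<int> exc(inc.size(), 0);
    for (std::size_t i = 1; i < inc.size(); ++i) exc[i] = inc[i - 1];
    return exc;
}
```

For `{3, 1, 7, 0, 4, 1, 6, 3}` the inclusive scan is `{3, 4, 11, 11, 15, 16, 22, 25}` and the exclusive scan is `{0, 3, 4, 11, 11, 15, 16, 22}`. Hillis-Steele performs O(n log n) additions; work-efficient alternatives such as the Blelloch up-sweep/down-sweep scan need only O(n).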
Applications and Use Cases
Scatter and Gather Patterns
Irregular Memory Access
Data Reorganization
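Stream compaction is a common data-reorganization step that combines map, scan, and scatter: a map produces 0/1 flags, an exclusive scan of the flags gives each kept element its output index, and a scatter writes it there. The following is a minimal sequential C++ sketch of that idea; `compact` is a hypothetical helper, and it assumes `T` is default-constructible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stream compaction: keep the elements satisfying `keep`, preserving order.
// On a GPU each of the three loops would be its own parallel step.
template <typename T, typename Pred>
std::vector<T> compact(const std::vector<T>& in, Pred keep) {
    const std::size_t n = in.size();
    std::vector<std::size_t> flag(n), pos(n);
    // Step 1 (map): flag each element that should survive.
    for (std::size_t i = 0; i < n; ++i) flag[i] = keep(in[i]) ? 1 : 0;
    // Step 2 (exclusive scan): pos[i] = number of kept elements before i.
    std::size_t running = 0;
    for (std::size_t i = 0; i < n; ++i) {
        pos[i] = running;
        running += flag[i];
    }
    // Step 3 (scatter): each kept element writes to its computed slot.
    std::vector<T> out(running);
    for (std::size_t i = 0; i < n; ++i) {
        if (flag[i]) out[pos[i]] = in[i];
    }
    return out;
}
```

For example, keeping the positive values of `{5, -2, 8, -1, 3}` yields `{5, 8, 3}`. The scatter writes are irregular in general, but because the scan preserves order they stay monotonic, which helps coalescing.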
Performance Considerations
Advanced Algorithmic Patterns
Stencil Computations
Finite Difference Methods
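As an illustration of a finite-difference stencil, here is one explicit (forward-time, centered-space) step of the 1D heat equation u_t = alpha * u_xx in C++. It assumes a uniform grid with ratio r = alpha * dt / dx^2 and simply holds the two endpoints fixed; a different boundary treatment would replace that choice. `heat_step` is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One explicit time step using the 3-point stencil u[i-1], u[i], u[i+1].
// The endpoints are held fixed (a Dirichlet-style boundary); only interior
// points are updated.
std::vector<double> heat_step(const std::vector<double>& u, double r) {
    assert(u.size() >= 2);
    std::vector<double> next(u);  // copying keeps the boundary values
    for (std::size_t i = 1; i + 1 < u.size(); ++i) {
        // Each point reads only the previous step's values, so every i can be
        // computed independently (one GPU thread per grid point).
        next[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1]);
    }
    return next;
}
```

With `r = 0.25`, a unit spike `{0, 0, 1, 0, 0}` spreads to `{0, 0.25, 0.5, 0.25, 0}` after one step. This explicit scheme is stable only for r <= 0.5, which caps the usable time step.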
Boundary Conditions
Optimization Techniques
Graph Algorithms
Breadth-First Search
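A common GPU formulation of BFS is level-synchronous: every vertex in the current frontier is expanded in parallel, and the newly discovered vertices form the next frontier. The sequential C++ sketch below assumes the graph is stored in CSR form (`row_offsets`, `col_indices`); the function name and the use of -1 for unreached vertices are illustrative. Comments mark where a GPU version would need atomics.

```cpp
#include <cassert>
#include <vector>

// Level-synchronous BFS over a directed graph in CSR form.
// row_offsets has n + 1 entries; the neighbours of u are
// col_indices[row_offsets[u] .. row_offsets[u + 1]).
std::vector<int> bfs_levels(const std::vector<int>& row_offsets,
                            const std::vector<int>& col_indices, int source) {
    const int n = static_cast<int>(row_offsets.size()) - 1;
    assert(source >= 0 && source < n);
    std::vector<int> dist(n, -1);  // -1 marks "not yet reached"
    std::vector<int> frontier{source};
    dist[source] = 0;
    for (int level = 0; !frontier.empty(); ++level) {
        std::vector<int> next;
        // On a GPU, one thread (or warp) per frontier vertex runs this body.
        for (int u : frontier) {
            for (int e = row_offsets[u]; e < row_offsets[u + 1]; ++e) {
                int v = col_indices[e];
                if (dist[v] == -1) {  // would be an atomic compare-and-swap
                    dist[v] = level + 1;
                    next.push_back(v);  // would be an atomic queue append
                }
            }
        }
        frontier.swap(next);
    }
    return dist;
}
```

For edges 0->1, 0->2, 1->3, 2->3, 3->4 plus an isolated vertex 5 (`row_offsets = {0, 2, 3, 4, 5, 5, 5}`, `col_indices = {1, 2, 3, 3, 4}`), the levels from vertex 0 are `{0, 1, 1, 2, 3, -1}`.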
Shortest Path Algorithms
Graph Traversal Patterns
Sorting Algorithms
Parallel Sorting Networks
Radix Sort
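A sketch of least-significant-digit radix sort for 32-bit unsigned keys, assuming 8-bit digits (four passes). Each pass is a histogram, an exclusive scan over the 256 bucket counts, and a stable scatter, which are the building blocks GPU radix sorts parallelize. This is sequential C++, and `radix_sort` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort, 8 bits (256 buckets) per pass, 4 passes for 32-bit keys.
void radix_sort(std::vector<std::uint32_t>& keys) {
    std::vector<std::uint32_t> tmp(keys.size());
    for (int shift = 0; shift < 32; shift += 8) {
        // Histogram of the current digit.
        std::array<std::size_t, 256> count{};  // zero-initialized
        for (std::uint32_t k : keys) ++count[(k >> shift) & 0xFF];
        // Exclusive scan turns counts into each bucket's starting offset.
        std::size_t sum = 0;
        for (std::size_t& c : count) {
            std::size_t t = c;
            c = sum;
            sum += t;
        }
        // Stable scatter: keys with equal digits keep their relative order.
        for (std::uint32_t k : keys) tmp[count[(k >> shift) & 0xFF]++] = k;
        keys.swap(tmp);  // keys now holds the data sorted by this digit
    }
}
```

The scatter must be stable: later passes rely on keys with equal digits keeping the order established by earlier, less significant passes.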
Merge Sort
Matrix Operations
Matrix Multiplication
Decomposition Algorithms
Sparse Matrix Operations
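Compressed sparse row (CSR) storage is a common starting point for sparse matrix-vector products. The sketch below, in sequential C++ with a hypothetical `spmv_csr` function, mirrors the simplest GPU mapping of one thread per row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y = A * x for A in CSR form: row_ptr has n_rows + 1 entries, and
// col_idx / vals hold the column index and value of each nonzero.
std::vector<double> spmv_csr(const std::vector<int>& row_ptr,
                             const std::vector<int>& col_idx,
                             const std::vector<double>& vals,
                             const std::vector<double>& x) {
    assert(!row_ptr.empty() && col_idx.size() == vals.size());
    const std::size_t n_rows = row_ptr.size() - 1;
    std::vector<double> y(n_rows, 0.0);
    for (std::size_t r = 0; r < n_rows; ++r) {  // one "thread" per row
        double acc = 0.0;
        for (int k = row_ptr[r]; k < row_ptr[r + 1]; ++k) {
            acc += vals[k] * x[col_idx[k]];  // gather from x via col_idx
        }
        y[r] = acc;
    }
    return y;
}
```

For the 3 x 3 matrix `[[10, 0, 2], [0, 3, 0], [1, 0, 4]]` stored as `row_ptr = {0, 2, 3, 5}`, `col_idx = {0, 2, 1, 0, 2}`, `vals = {10, 2, 3, 1, 4}` and `x = {1, 2, 3}`, the result is `{16, 6, 13}`. One thread per row is simple but load-imbalanced when row lengths vary widely, which is why warp-per-row and other variants exist.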
Optimization Strategies
Load Balancing
Static vs. Dynamic Balancing
Work Stealing
Irregular Workloads
Communication Minimization
Data Locality
Communication-Avoiding Algorithms
Overlapping Communication and Computation
Memory Access Optimization
Tiling Strategies
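Tiling is usually introduced with dense matrix multiplication. The following C++ sketch assumes square row-major n x n matrices and a tile size `tile` that need not divide n. The tile loops change only the order in which blocks of A, B, and C are visited, which is the structure a CUDA kernel maps onto thread blocks that stage tiles in shared memory. `matmul_tiled` is an illustrative name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// C = A * B, computed one TILE x TILE block of C at a time.
// On a GPU, each (ii, jj) block of C maps to a thread block, and each kk step
// loads one tile of A and one tile of B into shared memory before use.
// Per element of C, the k-order of the multiply-adds matches the naive loop.
std::vector<double> matmul_tiled(const std::vector<double>& A,
                                 const std::vector<double>& B,
                                 std::size_t n, std::size_t tile) {
    assert(A.size() == n * n && B.size() == n * n && tile > 0);
    std::vector<double> C(n * n, 0.0);
    for (std::size_t ii = 0; ii < n; ii += tile) {
        for (std::size_t jj = 0; jj < n; jj += tile) {
            for (std::size_t kk = 0; kk < n; kk += tile) {  // march tiles along k
                for (std::size_t i = ii; i < std::min(ii + tile, n); ++i) {
                    for (std::size_t j = jj; j < std::min(jj + tile, n); ++j) {
                        double acc = C[i * n + j];
                        for (std::size_t k = kk; k < std::min(kk + tile, n); ++k) {
                            acc += A[i * n + k] * B[k * n + j];
                        }
                        C[i * n + j] = acc;
                    }
                }
            }
        }
    }
    return C;
}
```

Because each tile of A and B is reused `tile` times once loaded, larger tiles cut memory traffic, up to the limit set by shared-memory (or cache) capacity.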
Cache Blocking
Memory Coalescing