6. Advanced CUDA Programming
6.1. Dynamic Parallelism
6.1.1. Nested Kernel Launches
6.1.1.1. Parent-Child Relationships
6.1.1.2. Synchronization Semantics
6.1.1.3. Memory Visibility
6.1.2. Use Cases and Applications
6.1.2.1. Adaptive Algorithms
6.1.2.2. Tree Traversal
6.1.2.3. Recursive Problems
6.1.3. Performance Considerations
6.1.3.1. Launch Overhead
6.1.3.2. Memory Management
6.1.3.3. Debugging Challenges
6.2. Multi-GPU Programming
6.2.1. Multi-GPU Architectures
6.2.1.1. Peer-to-Peer Access
6.2.1.2. NVLink Technology
6.2.1.3. Memory Topology
6.2.2. Programming Patterns
6.2.2.1. Data Parallelism
6.2.2.2. Model Parallelism
6.2.2.3. Pipeline Parallelism
6.2.3. Communication Strategies
6.2.3.1. Direct Memory Access
6.2.3.2. Unified Memory
6.2.3.3. NCCL Library
6.3. CUDA Libraries Ecosystem
6.3.1. Mathematical Libraries
6.3.1.1. cuBLAS
6.3.1.2. cuSPARSE
6.3.1.3. cuSOLVER
6.3.1.4. cuFFT
6.3.2. Machine Learning Libraries
6.3.2.1. cuDNN
6.3.2.2. TensorRT
6.3.2.3. cuML
6.3.3. Utility Libraries
6.3.3.1. Thrust
6.3.3.2. CUB
6.3.3.3. cuRAND
6.3.3.4. NPP
6.4. Interoperability
6.4.1. Graphics API Integration
6.4.1.1. OpenGL Interop
6.4.1.2. DirectX Interop
6.4.1.3. Vulkan Interop
6.4.2. CPU Library Integration
6.4.2.1. MPI Integration
6.4.2.2. OpenMP Integration
6.4.2.3. Threading Libraries
6.5. Specialized Hardware Features
6.5.1. Tensor Cores
6.5.1.1. Architecture Overview
6.5.1.2. Mixed-Precision Computing
6.5.1.3. Programming Models
6.5.1.4. Performance Optimization
6.5.2. RT Cores
6.5.2.1. Ray Tracing Acceleration
6.5.2.2. OptiX Integration
6.5.2.3. Hybrid Rendering