GPU Programming
1. Introduction to Parallel Computing and GPU Architecture
2. GPU Programming Models and APIs
3. Fundamentals of CUDA Programming
4. Intermediate CUDA Programming
5. Performance Optimization and Profiling
6. Advanced CUDA Programming
7. OpenCL Programming
8. Alternative GPU Programming Frameworks
9. Parallel Algorithms and Patterns
10. Applications and Case Studies
11. Performance Analysis and Optimization
12. Debugging and Testing
Advanced CUDA Programming
Dynamic Parallelism
Nested Kernel Launches
Parent-Child Relationships
Synchronization Semantics
Memory Visibility
Use Cases and Applications
Adaptive Algorithms
Tree Traversal
Recursive Problems
Performance Considerations
Launch Overhead
Memory Management
Debugging Challenges
Multi-GPU Programming
Multi-GPU Architectures
Peer-to-Peer Access
NVLink Technology
Memory Topology
Programming Patterns
Data Parallelism
Model Parallelism
Pipeline Parallelism
Communication Strategies
Direct Memory Access
Unified Memory
NCCL Library
CUDA Libraries Ecosystem
Mathematical Libraries
cuBLAS
cuSPARSE
cuSOLVER
cuFFT
Machine Learning Libraries
cuDNN
TensorRT
cuML
Utility Libraries
Thrust
CUB
cuRAND
NPP
Interoperability
Graphics API Integration
OpenGL Interop
DirectX Interop
Vulkan Interop
CPU Library Integration
MPI Integration
OpenMP Integration
Threading Libraries
Specialized Hardware Features
Tensor Cores
Architecture Overview
Mixed-Precision Computing
Programming Models
Performance Optimization
RT Cores
Ray Tracing Acceleration
OptiX Integration
Hybrid Rendering