Fundamentals of CUDA Programming
CUDA Ecosystem and Setup
  CUDA Toolkit Components
    NVCC Compiler
    Runtime Libraries
    Development Tools
  Development Environment Setup
    Driver Installation
    Toolkit Installation
    IDE Integration
    Verification and Testing
CUDA Programming Model
  Host and Device Concepts
    Host Code (CPU)
    Device Code (GPU)
    Heterogeneous Computing Model
  Kernels and Functions
    Kernel Declaration
    Device Functions
    Host Functions
    Function Qualifiers
  Thread Hierarchy
    Grids
    Blocks
    Threads
    Thread Indexing
  Built-in Variables
    threadIdx
    blockIdx
    blockDim
    gridDim
    warpSize
First CUDA Programs
  Hello World on GPU
    Kernel Definition
    Kernel Launch
    Compilation Process
  Vector Addition Example
    Memory Allocation
    Data Transfer
    Kernel Implementation
    Result Verification
  Error Checking Fundamentals
    CUDA Error Codes
    Error Handling Macros
    Debugging Basics
Memory Management
  Memory Spaces Overview
    Host Memory
    Device Memory
    Memory Hierarchy
  Basic Memory Operations
    cudaMalloc
    cudaFree
    cudaMemcpy
    Memory Transfer Directions
  Unified Memory
    cudaMallocManaged
    Automatic Data Migration
    Page Faulting Mechanism
    Performance Considerations