GPU Programming
1. Introduction to Parallel Computing and GPU Architecture
2. GPU Programming Models and APIs
3. Fundamentals of CUDA Programming
4. Intermediate CUDA Programming
5. Performance Optimization and Profiling
6. Advanced CUDA Programming
7. OpenCL Programming
8. Alternative GPU Programming Frameworks
9. Parallel Algorithms and Patterns
10. Applications and Case Studies
11. Performance Analysis and Optimization
12. Debugging and Testing
3. Fundamentals of CUDA Programming
3.1. CUDA Ecosystem and Setup
3.1.1. CUDA Toolkit Components
3.1.1.1. NVCC Compiler
3.1.1.2. Runtime Libraries
3.1.1.3. Development Tools
3.1.2. Development Environment Setup
3.1.2.1. Driver Installation
3.1.2.2. Toolkit Installation
3.1.2.3. IDE Integration
3.1.2.4. Verification and Testing
3.2. CUDA Programming Model
3.2.1. Host and Device Concepts
3.2.1.1. Host Code (CPU)
3.2.1.2. Device Code (GPU)
3.2.1.3. Heterogeneous Computing Model
3.2.2. Kernels and Functions
3.2.2.1. Kernel Declaration
3.2.2.2. Device Functions
3.2.2.3. Host Functions
3.2.2.4. Function Qualifiers
3.2.3. Thread Hierarchy
3.2.3.1. Grids
3.2.3.2. Blocks
3.2.3.3. Threads
3.2.3.4. Thread Indexing
3.2.4. Built-in Variables
3.2.4.1. threadIdx
3.2.4.2. blockIdx
3.2.4.3. blockDim
3.2.4.4. gridDim
3.2.4.5. warpSize
3.3. First CUDA Programs
3.3.1. Hello World on GPU
3.3.1.1. Kernel Definition
3.3.1.2. Kernel Launch
3.3.1.3. Compilation Process
3.3.2. Vector Addition Example
3.3.2.1. Memory Allocation
3.3.2.2. Data Transfer
3.3.2.3. Kernel Implementation
3.3.2.4. Result Verification
3.3.3. Error Checking Fundamentals
3.3.3.1. CUDA Error Codes
3.3.3.2. Error Handling Macros
3.3.3.3. Debugging Basics
3.4. Memory Management
3.4.1. Memory Spaces Overview
3.4.1.1. Host Memory
3.4.1.2. Device Memory
3.4.1.3. Memory Hierarchy
3.4.2. Basic Memory Operations
3.4.2.1. cudaMalloc
3.4.2.2. cudaFree
3.4.2.3. cudaMemcpy
3.4.2.4. Memory Transfer Directions
3.4.3. Unified Memory
3.4.3.1. cudaMallocManaged
3.4.3.2. Automatic Data Migration
3.4.3.3. Page Faulting Mechanism
3.4.3.4. Performance Considerations