This document provides an overview of CUDA C/C++ basics for processing data in parallel on GPUs. It discusses:
- The CUDA architecture, which exposes GPU parallelism for general-purpose computing while maintaining high performance.
- The CUDA programming model, which organizes work as a grid of thread blocks; the threads within each block execute concurrently and can cooperate through shared memory and synchronization, while blocks execute independently.
- Key CUDA C/C++ concepts such as declaring device kernels with `__global__`, launching kernels with the `<<<blocks, threads>>>` syntax, managing host and device memory with `cudaMalloc`/`cudaMemcpy`/`cudaFree`, and using the built-in thread and block indices (`threadIdx`, `blockIdx`, `blockDim`) to parallelize work across threads and blocks.
- A simple example of vector addition to demonstrate parallel execution using threads and blocks, with indexing to map threads/blocks to problem elements.
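The vector-addition pattern summarized above can be sketched as a complete CUDA C program. This is a minimal illustration, not code from the original document: the element count `N` and `THREADS_PER_BLOCK` are arbitrary illustrative choices, and error checking on the CUDA API calls is omitted for brevity.

```cuda
#include <stdio.h>
#include <stdlib.h>

#define N (1 << 20)            // number of vector elements (illustrative choice)
#define THREADS_PER_BLOCK 256  // threads per block (illustrative choice)

// Kernel: each thread computes one element of the output vector,
// using its block and thread indices to find its global position.
__global__ void add(const int *a, const int *b, int *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                  // guard: last block may have extra threads
        c[i] = a[i] + b[i];
}

int main(void) {
    size_t size = N * sizeof(int);

    // Allocate and initialize host memory.
    int *a = (int *)malloc(size);
    int *b = (int *)malloc(size);
    int *c = (int *)malloc(size);
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }

    // Allocate device memory and copy the inputs to the GPU.
    int *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, size);
    cudaMalloc(&d_b, size);
    cudaMalloc(&d_c, size);
    cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice);

    // Launch enough blocks to cover all N elements.
    int blocks = (N + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
    add<<<blocks, THREADS_PER_BLOCK>>>(d_a, d_b, d_c, N);

    // Copy the result back to the host and spot-check one element.
    cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost);
    printf("c[100] = %d\n", c[100]);  // expect 100 + 200 = 300

    // Clean up device and host memory.
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(a); free(b); free(c);
    return 0;
}
```

The block/thread indexing expression `blockIdx.x * blockDim.x + threadIdx.x` is the standard way to map each thread to a distinct problem element; the bounds check handles the case where N is not a multiple of the block size.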