
A deep technical dive into CUDA kernel execution: from source code to GPU warps

By mezark · 3h ago · 38 min read · Insight

Summary

This article walks through what happens when a CUDA kernel, specifically a vector-add kernel, is executed. It traces the path from the CUDA source code, through compilation with nvcc, down to the hardware level, where warps execute on the GPU. Along the way it covers thread blocks, warps, the memory hierarchy, and the CUDA execution model.
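To make the block-to-warp relationship concrete, here is a minimal sketch of how threads in a one-dimensional block are grouped into warps. It is not taken from the article: the helper names `warp_in_block`, `lane_in_warp`, and `warps_per_block` and the kernel `show_mapping` are hypothetical, and the launch shape of 2 blocks of 96 threads is an arbitrary choice for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helpers (not from the article). For a 1D block, threads with
// consecutive threadIdx.x values are grouped into warps of warp_size threads.
__host__ __device__ inline int warp_in_block(int thread_idx, int warp_size) {
    return thread_idx / warp_size;
}
__host__ __device__ inline int lane_in_warp(int thread_idx, int warp_size) {
    return thread_idx % warp_size;
}
// Number of warps needed for a block; the last warp may be partially filled.
__host__ __device__ inline int warps_per_block(int block_dim, int warp_size) {
    return (block_dim + warp_size - 1) / warp_size;
}

// Each warp's first lane reports which warp it belongs to.
__global__ void show_mapping() {
    int t = threadIdx.x;
    if (lane_in_warp(t, warpSize) == 0) {
        printf("block %d: thread %d is lane 0 of warp %d (of %d)\n",
               blockIdx.x, t, warp_in_block(t, warpSize),
               warps_per_block(blockDim.x, warpSize));
    }
}

int main() {
    show_mapping<<<2, 96>>>();  // assumed launch shape: 2 blocks of 96 threads
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

The order in which the lines are printed is not guaranteed, since the hardware schedules blocks and warps independently.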

Source

Hacker News: A deep technical dive into CUDA kernel execution: from source code to GPU warps (fergusfinn.com)

Key quotes

Here's a simple CUDA program. It adds two vectors.
__global__ void vadd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}
Tracing one vector-add kernel from nvcc all the way down to the warps that execute it.
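The quoted kernel covers only the device side. As a rough sketch of what a complete program around it might look like, the following adds host code that allocates memory, launches the kernel, and checks the result. The helper `blocks_for`, the `CHECK` macro, the vector length of 1000, and the block size of 256 are assumptions made for illustration and do not come from the article.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Kernel as quoted: one thread per element, guarded against overrun.
__global__ void vadd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Hypothetical helper: number of blocks needed to cover n elements.
inline int blocks_for(int n, int threads) { return (n + threads - 1) / threads; }

// Hypothetical error-checking macro for CUDA runtime calls.
#define CHECK(call)                                                      \
    do {                                                                 \
        cudaError_t e_ = (call);                                         \
        if (e_ != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e_)); \
            return 1;                                                    \
        }                                                                \
    } while (0)

int main() {
    const int n = 1000;        // assumed size, not a multiple of the block size
    const int threads = 256;   // assumed block size
    const size_t bytes = n * sizeof(float);

    std::vector<float> ha(n), hb(n), hc(n);
    for (int i = 0; i < n; ++i) { ha[i] = float(i); hb[i] = 2.0f * float(i); }

    float *da = nullptr, *db = nullptr, *dc = nullptr;
    CHECK(cudaMalloc(&da, bytes));
    CHECK(cudaMalloc(&db, bytes));
    CHECK(cudaMalloc(&dc, bytes));
    CHECK(cudaMemcpy(da, ha.data(), bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(db, hb.data(), bytes, cudaMemcpyHostToDevice));

    vadd<<<blocks_for(n, threads), threads>>>(da, db, dc, n);
    CHECK(cudaGetLastError());  // reports launch-configuration errors

    // This blocking copy also waits for the kernel to finish.
    CHECK(cudaMemcpy(hc.data(), dc, bytes, cudaMemcpyDeviceToHost));

    for (int i = 0; i < n; ++i) {
        if (hc[i] != ha[i] + hb[i]) { fprintf(stderr, "mismatch at %d\n", i); return 1; }
    }
    printf("all %d elements match\n", n);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}
```

With these assumed sizes, 4 blocks of 256 threads are launched, so 1024 threads run for 1000 elements; the `if (i < n)` guard in the kernel is what keeps the extra 24 threads from writing out of bounds.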
