GPUPrefixSums: A portable collection of GPU prefix sum algorithms for CUDA, D3D12, Unity, and WGPU
By coffeeaddict1
Summary
GPUPrefixSums is an open-source GitHub project that implements a comprehensive collection of prefix sum algorithms for GPUs, supporting CUDA, D3D12, Unity, and WGPU. It aims to make state-of-the-art GPU prefix sum techniques portable across different compute shader environments. The project introduces a novel technique called "Decoupled Fallback" for Chained Scan with Decoupled Lookback, designed to enable devices without forward thread progress guarantees to perform scans without crashing. The D3D12 implementation includes an extensive survey of GPU prefix sums from warp to device level, all utilizing wave/warp/subgroup level parallelism.
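To make the idea of Chained Scan with Decoupled Lookback more concrete, here is a minimal Python sketch that simulates it sequentially on the CPU. It is not code from GPUPrefixSums: the function name `chained_scan`, the flag names, and the `stalled` parameter are all hypothetical. The fallback branch, in which a tile reduces a silent predecessor's data itself instead of waiting on it, is an assumption about how such a fallback could work, not a description of the project's implementation.

```python
from itertools import accumulate

# Status a tile can publish (illustrative names, not taken from the project).
NOT_READY, REDUCTION, INCLUSIVE = 0, 1, 2


def chained_scan(data, tile_size, stalled=frozenset()):
    """Inclusive prefix sum via a sequential simulation of lookback.

    `stalled` lists tile indices that never publish a status, standing in
    for workgroups that the scheduler fails to advance in time.
    """
    tiles = [data[i:i + tile_size] for i in range(0, len(data), tile_size)]
    flags = [NOT_READY] * len(tiles)
    values = [0] * len(tiles)
    out = []

    for t, tile in enumerate(tiles):
        local = sum(tile)
        if t not in stalled:
            flags[t], values[t] = REDUCTION, local

        # Look back over predecessors, nearest first.
        prefix = 0
        p = t - 1
        while p >= 0:
            if flags[p] == INCLUSIVE:
                # This value already covers every earlier tile, so stop.
                prefix += values[p]
                break
            if flags[p] == REDUCTION:
                prefix += values[p]
            else:
                # Assumed fallback: reduce the silent tile ourselves
                # rather than spinning on its flag.
                prefix += sum(tiles[p])
            p -= 1

        if t not in stalled:
            flags[t], values[t] = INCLUSIVE, prefix + local
        # A stalled tile still writes its output here, as it would once
        # it is eventually scheduled.
        out.extend(prefix + x for x in accumulate(tile))
    return out
```

For example, `chained_scan(list(range(1, 11)), 3, stalled={1})` produces the same result as an ordinary running sum, even though tile 1 never publishes a status. On a real GPU the tiles run concurrently and the flags are read and written atomically; the sequential loop here only illustrates the data flow.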
Key quotes
GPUPrefixSums aims to bring state-of-the-art GPU prefix sum techniques from CUDA and make them available in portable compute shaders.
It contributes 'Decoupled Fallback,' a novel fallback technique for Chained Scan with Decoupled Lookback that should allow devices without forward thread progress guarantees to perform the scan without crashing.
The D3D12 implementation includes an extensive survey of GPU prefix sums, ranging from the warp to the device level.
All included algorithms utilize wave/warp/subgroup (referred to as 'wave' hereon) level parallelism.
Theoretically portable to all wave/warp/subgroup sizes.
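As a rough illustration of wave-level parallelism and of why a scan can stay independent of wave size, the following Python sketch simulates one common wave-level scan pattern (a Kogge-Stone style scan built from shuffle-up steps) and then applies it with several wave sizes. The function names `wave_inclusive_scan` and `scan_with_wave_size` are hypothetical, and the sketch makes no claim about which wave-level algorithm the project actually uses.

```python
def wave_inclusive_scan(lanes):
    """Kogge-Stone style inclusive scan over the lanes of one simulated wave."""
    vals = list(lanes)
    offset = 1
    while offset < len(vals):
        # Every lane reads the value `offset` lanes below it, as a
        # shuffle-up would, before any lane writes its new value.
        shifted = [vals[i - offset] if i >= offset else 0
                   for i in range(len(vals))]
        vals = [v + s for v, s in zip(vals, shifted)]
        offset *= 2
    return vals


def scan_with_wave_size(data, wave_size):
    """Scan `data` in wave-sized chunks, carrying the running total across."""
    out, carry = [], 0
    for start in range(0, len(data), wave_size):
        scanned = wave_inclusive_scan(data[start:start + wave_size])
        out.extend(carry + v for v in scanned)
        carry += scanned[-1]
    return out
```

Calling `scan_with_wave_size(data, w)` with `w` set to 4, 32, or 64 yields the same prefix sums, because the wave size only changes how the work is chunked, not the result.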
