All Topics

Technology

Art

GPU Programming Project: Implementing Parallelizable RNNs with CUDA

omegablues

8mo ago· 8 min readen

85/100

Golden Brown

Bagelometer↗

If you only eat one bagel today, this is the bagel.

Score85Typehow-toSentimentneutral

Summary

A student's final project for CS179: GPU Programming implementing the paper "Were RNNs All We Needed?" by Feng et al. The project focuses on creating simplified versions of GRUs and LSTMs (minGRU and minLSTM) that can be parallelized using the parallel scan algorithm, transforming their training and inference from O(T) sequential to O(log T) parallel processes for GPU acceleration. The implementation is done in CUDA to verify the paper's claims about making recurrent neural networks more efficient on GPUs.

Key quotes

· 3 pulled

The paper's core claim is that by making minor simplifications to LSTMs and GRUs, their recurrence can be expressed in a form amenable to the parallel scan algorithm

This changes their training and inference from an O(T) sequential process into an O(log T) parallel one, which helps with GPU acceleration

My goal was to verify this claim by building both the simplified models (minGRU and minLSTM) and a custom CUDA implementation

Snippet from the RSS feed

An implementation of parallelizable GRUs and LSTMs for CS179 in CUDA.

You might also wanna read

Rotary GPU: Enabling Large Mixture-of-Experts Models on Consumer Laptop GPUs with Limited Memory

This paper presents Rotary GPU, an exploratory approach to running large Mixture-of-Experts (MoE) language models on consumer-grade hardware

arxiv.org·1d ago

Unsloth and NVIDIA Partner to Accelerate LLM Fine-Tuning by 20%

Unsloth has partnered with NVIDIA to optimize fine-tuning of large language models, achieving 20% faster training speeds. The collaboration

unsloth.ai·24d ago

Systalyze's Utilyze Tool Reveals True GPU Compute Utilization in AI Workloads

Systalyze introduces Utilyze, a GPU compute utilization monitoring tool that reveals actual compute throughput versus traditional metrics li

systalyze.com·1mo ago

Eyot: A Programming Language That Makes GPU Programming as Simple as Background Threads

Eyot is a new programming language designed to make GPU programming as simple as spawning background threads. It transparently compiles code

cowleyforniastudios.com·2mo ago

BarraCUDA: Open-Source CUDA Compiler Supports AMD, NVIDIA, and Tenstorrent GPUs

BarraCUDA is an open-source CUDA C++ compiler written from scratch in C99 that can compile .cu files to multiple GPU architectures including

github.com·3mo ago

VectorWare Enables Rust Async/Await Programming on GPUs

VectorWare announces a breakthrough in GPU programming by enabling Rust's async/await and Future trait on GPUs. This represents a significan

vectorware.com·3mo ago