
Reverse-Engineering the RK3588 NPU to Run Vision Transformers 15x Faster

By rcarmo · 5mo ago · 5 min read

Summary

The article describes reverse-engineering the Rockchip RK3588 NPU to work around hardware limitations that prevent it from running modern Vision Transformers such as the one used by SmolVLM. Although the chip promises 6 TOPS of NPU performance, the standard Computer Vision SDK (rknn-toolkit2) fails on Vision Transformers because their large Attention matrices run into memory constraints. The author identified the hardware limits, defeated compiler optimizations, and built a custom sharding runtime that runs SmolVLM 15x faster.
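The summary does not show how the sharding runtime works, so the following is only a generic sketch of one way to keep an Attention score matrix under a size limit: split the queries into row shards and process each shard separately. Because softmax is applied per row, the sharded result matches the unsharded one. The function names and the `max_score_elems` budget are hypothetical and are not taken from the article, rknn-toolkit2, or the RK3588 documentation.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Plain single-head attention: softmax(q k^T / sqrt(d)) v."""
    scale = 1.0 / math.sqrt(len(q[0]))
    # The full score matrix has len(q) * len(k) entries; this is the
    # intermediate that grows quadratically with the token count.
    scores = [[scale * sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    width = len(v[0])
    return [[sum(w * vj[c] for w, vj in zip(row, v)) for c in range(width)]
            for row in weights]


def rows_per_shard(n_keys, max_score_elems):
    """Largest number of query rows whose score matrix fits the budget."""
    return max(1, max_score_elems // n_keys)


def shard_count(n_queries, n_keys, max_score_elems):
    """How many row shards are needed for a given (hypothetical) budget."""
    return math.ceil(n_queries / rows_per_shard(n_keys, max_score_elems))


def sharded_attention(q, k, v, max_score_elems):
    """Run attention shard by shard so no score matrix exceeds the budget."""
    step = rows_per_shard(len(k), max_score_elems)
    out = []
    for start in range(0, len(q), step):
        # Each shard sees all keys and values, but only a slice of queries.
        out.extend(attention(q[start:start + step], k, v))
    return out
```

With 10 queries, 10 keys, and a hypothetical budget of 30 score entries, `shard_count(10, 10, 30)` gives 4 shards of at most 3 query rows each. A real NPU runtime would also have to deal with tiling constraints, data layout, and quantization, none of which this sketch models.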

Key quotes

The standard Computer Vision SDK (rknn-toolkit2) is optimized for older, predictable CNNs (like ResNet). When I fed it the SigLIP Vision Transformer used by SmolVLM, the driver choked.
Even though the model is 'smol,' the massive Attention matrices it generates triggered cry
Reverse-engineering the Rockchip RK3588 NPU to run SmolVLM 15x faster by discovering hardware limits, defeating compiler optimizations, and building a custom sharding runtime
