Cactus: An Open-Source Low-Latency AI Engine for Mobile Devices and Wearables
By HenryNdubuaku
Summary
Cactus is an open-source, low-latency AI engine designed specifically for mobile devices and wearables. It exposes OpenAI-compatible APIs for major programming languages, covering chat, vision, speech-to-text (STT), retrieval-augmented generation (RAG), tool calling, and cloud handoff. The engine is built on a zero-copy computation graph, which the project likens to "PyTorch for mobile", and runs custom models optimized for RAM use and quantization. Its ARM SIMD kernels are tuned for Apple, Snapdragon, Exynos, and other mobile processors, and implement custom attention, KV-cache quantization, and chunked prefill for efficient on-device inference.
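To make "KV-cache quantization" concrete, here is a minimal sketch of the general idea: store each cached key and value vector as 8-bit integer codes plus one float scale, and expand them back to floats when attention needs them. The source does not say which scheme Cactus uses, so the symmetric per-vector int8 scheme and every name below (`quantize_int8`, `dequantize_int8`, `QuantizedKVCache`) are assumptions chosen for illustration, not the project's API.

```python
from typing import List, Tuple


def quantize_int8(vec: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-vector int8 quantization (illustrative assumption).

    Returns integer codes in [-127, 127] and the float scale that maps
    them back to approximate original values.
    """
    max_abs = max((abs(x) for x in vec), default=0.0)
    if max_abs == 0.0:
        # All-zero (or empty) vector: any scale works, use 1.0.
        return [0] * len(vec), 1.0
    scale = max_abs / 127.0
    codes = [max(-127, min(127, round(x / scale))) for x in vec]
    return codes, scale


def dequantize_int8(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float values from int8 codes."""
    return [c * scale for c in codes]


class QuantizedKVCache:
    """Hypothetical cache holding one quantized (key, value) pair per token."""

    def __init__(self) -> None:
        self.keys: List[Tuple[List[int], float]] = []
        self.values: List[Tuple[List[int], float]] = []

    def append(self, key: List[float], value: List[float]) -> None:
        # Quantize on write so the cache never stores full-precision vectors.
        self.keys.append(quantize_int8(key))
        self.values.append(quantize_int8(value))

    def get(self, pos: int) -> Tuple[List[float], List[float]]:
        # Dequantize on read, just before the vectors are used in attention.
        key_codes, key_scale = self.keys[pos]
        value_codes, value_scale = self.values[pos]
        return (
            dequantize_int8(key_codes, key_scale),
            dequantize_int8(value_codes, value_scale),
        )

    def __len__(self) -> int:
        return len(self.keys)
```

With this scheme, each element's round-trip error is at most half the vector's scale, and the cache needs roughly a quarter of the memory of 32-bit floats (one byte per element plus one scale per vector). A real engine would pack the codes into byte buffers and fuse dequantization into the attention kernel rather than using Python lists.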
Key quotes
A low-latency AI engine for mobile devices & wearables.
Zero-copy computation graph (PyTorch for mobile) — Custom models, optimised for RAM & quantisation
ARM SIMD kernels (Apple, Snapdragon, Exynos, etc) — Custom attention, KV-cache quant, chunked prefill
OpenAI-compatible APIs for all major languages — Chat, vision, STT, RAG, tool call, cloud handoff
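"Chunked prefill" in the quotes above refers to processing a long prompt in fixed-size pieces instead of one large forward pass, which caps the peak working memory of the prefill step. The sketch below shows only the control flow. The `process_chunk` callback is a hypothetical stand-in for a model forward pass that attends to everything already cached and appends the chunk's keys and values; the source does not describe how Cactus implements this, and the names and chunk size are assumptions.

```python
from typing import Callable, Iterator, List, Sequence


def chunk_tokens(tokens: Sequence[int], chunk_size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(tokens), chunk_size):
        yield list(tokens[start:start + chunk_size])


def chunked_prefill(
    tokens: Sequence[int],
    chunk_size: int,
    process_chunk: Callable[[List[int], int], None],
) -> int:
    """Feed the prompt to the model one chunk at a time.

    process_chunk(chunk, start_pos) is a hypothetical callback: it should
    run the model on `chunk`, attending to cached positions [0, start_pos),
    and append the chunk's keys and values to the cache.
    Returns the number of chunks processed.
    """
    position = 0
    chunks_done = 0
    for chunk in chunk_tokens(tokens, chunk_size):
        process_chunk(chunk, position)
        position += len(chunk)
        chunks_done += 1
    return chunks_done
```

For example, a 10-token prompt with `chunk_size=4` is processed as three calls at start positions 0, 4, and 8, with the last chunk holding the remaining two tokens. Smaller chunks lower peak memory per pass at the cost of more passes, so the right size depends on the device.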
