KVBoost: A Drop-In Python Library for KV Cache Reuse in LLM Inference
KVBoost is a drop-in Python library for LLM inference that reuses the KV cache at the chunk level, eliminating redundant computation. Developers warm a shared prefix once and reuse its cache across subsequent generation calls, achieving a KV reuse ratio of 80% or more without any code rewrites.
pythongiant.github.io, 10d ago
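
To make the idea of warming a shared prefix and measuring a KV reuse ratio concrete, here is a minimal, library-independent sketch in Python. It is not KVBoost's API: the class `ToyChunkCache`, its methods, the chunk size of 4 tokens, and the string placeholder standing in for real KV tensors are all hypothetical, chosen only for illustration. The sketch assumes causal attention, so each chunk's cache key is derived from the whole token prefix up to and including that chunk.

```python
import hashlib
from typing import Dict, List


class ToyChunkCache:
    """Toy stand-in for a chunk-level KV cache (hypothetical, not KVBoost's API)."""

    def __init__(self, chunk_size: int = 4) -> None:
        self.chunk_size = chunk_size
        # Maps a prefix hash to a placeholder for that chunk's KV tensors.
        self.store: Dict[str, str] = {}

    def _keys(self, tokens: List[int]) -> List[str]:
        """Return one key per full chunk; each key hashes the prefix so far."""
        hasher = hashlib.sha256()
        keys = []
        full_len = len(tokens) - len(tokens) % self.chunk_size
        for start in range(0, full_len, self.chunk_size):
            chunk = tokens[start:start + self.chunk_size]
            hasher.update((",".join(map(str, chunk)) + ";").encode())
            keys.append(hasher.hexdigest())
        return keys

    def warm(self, tokens: List[int]) -> None:
        """Pretend to compute and store KV entries for every full chunk."""
        for key in self._keys(tokens):
            self.store.setdefault(key, "kv-placeholder")

    def reuse_ratio(self, tokens: List[int]) -> float:
        """Fraction of prompt tokens covered by leading cached chunks."""
        if not tokens:
            return 0.0
        reused = 0
        for key in self._keys(tokens):
            if key not in self.store:
                break  # the first miss ends the reusable prefix
            reused += self.chunk_size
        return reused / len(tokens)


# Warm a 16-token shared prefix once, then score two later prompts.
cache = ToyChunkCache(chunk_size=4)
shared_prefix = list(range(16))
cache.warm(shared_prefix)

ratio_shared = cache.reuse_ratio(shared_prefix + [100, 101, 102, 103])
ratio_unrelated = cache.reuse_ratio([9] * 20)
print(ratio_shared, ratio_unrelated)
```

In this toy setup the first prompt shares 16 of its 20 tokens with the warmed prefix, so its reuse ratio is 0.8, while the unrelated prompt scores 0.0. These numbers come from the example's construction and are not a reproduction of the library's reported figures.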