whichllm: An open-source CLI tool that ranks local LLMs by real benchmarks and hardware compatibility
By andyyyy64
Summary
whichllm is an open-source CLI tool that auto-detects your GPU/CPU/RAM specs and ranks the best local LLMs from HuggingFace that actually run on your hardware. Unlike simple "what fits" tools, it ranks models by real, recency-aware benchmark performance rather than just parameter count or VRAM size. The tool emphasizes that a smaller, newer model can outperform a larger older one (e.g., ranking a 27B model above a 32B one due to better benchmarks). It provides one-command instant results with live HuggingFace data.
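The "fit first, then rank by recency-aware benchmarks" idea can be illustrated with a minimal sketch. This is not whichllm's actual algorithm: the model names, benchmark scores, the VRAM estimate (about 4.5 bits per weight plus a fixed overhead), and the half-life recency weight are all assumptions chosen only to show why a newer 27B model can outrank an older 32B one that also fits.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Model:
    name: str          # hypothetical model name, for illustration only
    params_b: float    # parameter count in billions
    benchmark: float   # hypothetical aggregate benchmark score (0-100)
    released: date     # release date, used for the recency weight


def est_vram_gb(m: Model, bits_per_weight: float = 4.5,
                overhead_gb: float = 2.0) -> float:
    """Rough memory estimate (assumption): quantized weights plus fixed overhead."""
    return m.params_b * bits_per_weight / 8 + overhead_gb


def recency_weight(m: Model, today: date, half_life_days: float = 730.0) -> float:
    """Assumed exponential decay: the weight halves every half_life_days."""
    age_days = max((today - m.released).days, 0)
    return 0.5 ** (age_days / half_life_days)


def rank(models: list[Model], vram_gb: float, today: date) -> list[Model]:
    """Keep models that fit in VRAM, then sort by benchmark score times recency."""
    fitting = [m for m in models if est_vram_gb(m) <= vram_gb]
    return sorted(fitting,
                  key=lambda m: m.benchmark * recency_weight(m, today),
                  reverse=True)


# Made-up catalogue; none of these entries are real measurements.
MODELS = [
    Model("older-32b", 32.0, 70.0, date(2024, 1, 1)),
    Model("newer-27b", 27.0, 74.0, date(2025, 1, 1)),
    Model("huge-70b", 70.0, 80.0, date(2025, 3, 1)),
]

if __name__ == "__main__":
    for m in rank(MODELS, vram_gb=24.0, today=date(2025, 7, 1)):
        print(m.name, round(est_vram_gb(m), 1), "GB")
```

With a 24 GB card in this toy data, the 70B model is filtered out, both remaining models fit, and the 27B one sorts first because its score and recency are both higher. A size-only tool would have picked the 32B model instead.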
Key quotes
The 32B model fits your card fine; whichllm still ranks the 27B #1, because it scores higher on real benchmarks and is a newer generation.
A size-only 'what fits?' tool would hand you the bigger one. That gap is the whole point of whichllm.
Find the local LLM that actually runs and performs best on your hardware. Ranked by real, recency-aware benchmarks, not parameter count.
Auto-detects your GPU/CPU/RAM and ranks the top models from HuggingFace that fit your system.
