Show HN: A tool to benchmark LLM APIs (OpenAI, Claude, local/self-hosted)
I recently built a small open-source tool to benchmark different LLM API endpoints, including OpenAI, Claude, and self-hosted models (like llama.cpp). It runs a configurable number of test requests and reports two key metrics:
• First-token latency (ms): How long it takes for the first token to appear
• Output speed (tokens/sec): How quickly tokens are generated overall
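
For anyone curious how these two numbers can be measured, here is a minimal Python sketch. It is not the tool's actual code. The names `measure_stream` and `open_stream` are hypothetical, each streamed item is counted as one token, and output speed is assumed to be total tokens divided by total request time (the tool may define it differently, for example by excluding the wait for the first token).

```python
import time
from typing import Callable, Iterable, Optional


def measure_stream(
    open_stream: Callable[[], Iterable[str]],
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Time one streamed response.

    open_stream: hypothetical callable that starts the request and returns
                 an iterable yielding one item per token.
    clock:       monotonic time source in seconds (injectable for testing).
    """
    start = clock()
    first_token_at: Optional[float] = None
    token_count = 0

    for _token in open_stream():
        if first_token_at is None:
            # Timestamp the moment the first token arrives.
            first_token_at = clock()
        token_count += 1

    elapsed = clock() - start

    return {
        # None if the stream produced no tokens at all.
        "first_token_latency_ms": (
            None if first_token_at is None else (first_token_at - start) * 1000.0
        ),
        # Assumption: throughput is measured over the whole request.
        "tokens_per_sec": token_count / elapsed if elapsed > 0 else 0.0,
        "tokens": token_count,
    }
```

To try it against a real endpoint, `open_stream` would wrap whatever streaming call your client library provides. Averaging the results over several runs gives steadier numbers than a single request.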