
Forge: A Python framework for reliable self-hosted LLM tool-calling and multi-step agent workflows

By zambelli · 12d ago · 7 min read · en · Code

Summary

Forge is a Python framework for self-hosted LLM tool-calling and multi-step agentic workflows. It acts as a reliability layer for local models, including 8B-parameter models, and improves their results through guardrails (rescue parsing, retry nudges, step enforcement) and context management (VRAM-aware budgets, tiered compaction). The current top self-hosted configuration (Ministral-3 8B Instruct Q8 on llama-server) scores 86.5% across Forge's 26-scenario evaluation suite and 76% on the hardest tier. The framework offers three usage modes: WorkflowRunner for structured agent loops, a CLI for interactive sessions, and a server mode for API-based deployments.
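To make two of the guardrails named above more concrete, here is a minimal sketch of what "rescue parsing" and a "retry nudge" could look like in plain Python. This is not Forge's actual API: `rescue_parse`, `call_with_retries`, `NUDGE`, and the `{"name": ..., "arguments": {...}}` tool-call shape are all assumptions made for illustration.

```python
import json
import re
from typing import Callable, List, Optional

# Hypothetical nudge text; not taken from Forge.
NUDGE = 'Reply with a single JSON object: {"name": <tool>, "arguments": {...}}'


def rescue_parse(text: str) -> Optional[dict]:
    """Try to recover a tool call from free-form model output.

    Scans for each '{' and attempts to decode a JSON object starting there,
    so a valid call wrapped in prose or code fences can still be found.
    """
    decoder = json.JSONDecoder()
    for match in re.finditer(r"\{", text):
        try:
            obj, _ = decoder.raw_decode(text, match.start())
        except json.JSONDecodeError:
            continue  # not valid JSON at this brace, try the next one
        if (
            isinstance(obj, dict)
            and "name" in obj
            and isinstance(obj.get("arguments"), dict)
        ):
            return obj
    return None


def call_with_retries(
    generate: Callable[[List[str]], str],
    prompt: str,
    max_retries: int = 2,
) -> Optional[dict]:
    """Ask the model for a tool call; on a miss, append a nudge and retry."""
    messages = [prompt]
    for _ in range(max_retries + 1):
        reply = generate(messages)
        call = rescue_parse(reply)
        if call is not None:
            return call
        # Retry nudge: keep the failed reply in context and restate the format.
        messages.extend([reply, NUDGE])
    return None
```

Here `generate` stands in for whatever function sends the message history to a local model and returns its text reply. The point of the sketch is the control flow: parse leniently first, and only spend another model call when lenient parsing fails.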

Key quotes
Forge lifts an 8B local model to the top of its class on multi-step agentic workflows through guardrails (rescue parsing, retry nudges, step enforcement) and context management (VRAM-aware budgets, tiered compaction).
The current top self-hosted config (Ministral-3 8B Instruct Q8 on llama-server) scores 86.5% across forge's 26-scenario eval suite — and 76% on the hardest tier.
WorkflowRunner — Define tools, pick a backend, run structured agent loops. Forge manages the full lifecycle: system prompts, tool execution, error recovery, and context budgets.
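The context budgets and tiered compaction mentioned in these quotes can be illustrated with a small sketch. It assumes a plain token budget and a crude characters-per-token estimate; `estimate_tokens`, `compact`, and the two tiers shown are hypothetical and are not taken from Forge, whose budgets are described as VRAM-aware.

```python
from typing import Dict, List

Message = Dict[str, str]  # {"role": ..., "content": ...}


def estimate_tokens(msg: Message) -> int:
    """Crude heuristic (assumption): roughly 4 characters per token."""
    return max(1, len(msg["content"]) // 4)


def total_tokens(messages: List[Message]) -> int:
    return sum(estimate_tokens(m) for m in messages)


def compact(messages: List[Message], budget: int, keep_recent: int = 2) -> List[Message]:
    """Shrink a message history to fit a token budget in two tiers.

    Tier 1 truncates long tool outputs outside the recent window.
    Tier 2 drops the oldest non-system messages outside that window.
    """
    msgs = [dict(m) for m in messages]  # copy so the caller's list is untouched
    if total_tokens(msgs) <= budget:
        return msgs

    # Tier 1: truncate old, long tool outputs.
    cutoff = max(0, len(msgs) - keep_recent)
    for m in msgs[:cutoff]:
        if m["role"] == "tool" and len(m["content"]) > 80:
            m["content"] = m["content"][:80] + " [truncated]"
    if total_tokens(msgs) <= budget:
        return msgs

    # Tier 2: drop oldest non-system messages, keeping the recent window.
    while total_tokens(msgs) > budget:
        window_start = max(0, len(msgs) - keep_recent)
        droppable = [
            i for i, m in enumerate(msgs[:window_start]) if m["role"] != "system"
        ]
        if not droppable:
            break  # nothing left to drop without touching protected messages
        del msgs[droppable[0]]
    return msgs
```

The cheaper, less lossy tier runs first, and the system prompt and the most recent turns are never dropped, so the model keeps its instructions and its immediate working context even when the history is cut down.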
Snippet from the RSS feed
A Python framework for self-hosted LLM tool-calling and multi-step agentic workflows - antoinezambelli/forge
