Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model

Hey HN, Henry here from Cactus. We open-sourced Needle, a 26M-parameter function-calling (tool-use) model. It runs at 6000 tok/s prefill and 1200 tok/s decode on consumer devices. We were always…

HenryNdubuaku·2mo ago·en·Code
