RunRL
By
ag8
Hey HN, we’re Andrew and Derik at RunRL (https://runrl.com/). We've built a platform to improve models and agents with reinforcement learning. If you can define a metric, we'll make your model or agent better, without you having to think about managing GPU clusters.
Here's a demo video: https://youtu.be/EtiBjs4jfCg
I (Andrew) was doing a PhD in reinforcement learning on language models, and everyone kept...not using RL because it was too hard to get running. At some point I realized that someone's got to sit down and actually write a good platform for running RL experiments.
Once that platform existed, people started using it for antiviral design, formal verification, browser agents, and a bunch of other cool applications, so we decided to make a startup out of it.
How it works:
- Choose an open-weight base model (weights are necessary for RL updates; Qwen3-4B-Instruct-2507 is a good starting point)
- Upload a set of initial prompts ("Generate an antiviral targeting SARS-CoV-2 protease", "Prove this theorem", "What's the average summer high in Windhoek?")
- Define a reward function, using Python, an LLM-as-a-judge, or both
- For complex settings, you can define an entire multi-turn environment
- Watch the reward go up!
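To make the reward-function step concrete, here is a minimal sketch of what a Python reward might look like for a numeric question such as the Windhoek prompt. The function name `reward`, its arguments, the `Answer:` output convention, and the partial-credit value are all assumptions for illustration; the platform's actual interface may differ.

```python
import re

# Hypothetical signature: the real reward interface may take different arguments.
def reward(completion: str, expected: float, tolerance: float = 1.0) -> float:
    """Score one completion for a question with a numeric answer.

    Returns 1.0 if the parsed answer is within `tolerance` of `expected`,
    0.1 if an answer was given in the requested format but is wrong,
    and 0.0 if no answer could be parsed.
    """
    # Assumed convention: the prompt asks the model to finish with "Answer: <number>".
    match = re.search(r"Answer:\s*(-?\d+(?:\.\d+)?)", completion)
    if match is None:
        return 0.0  # nothing parseable, no reward
    value = float(match.group(1))
    if abs(value - expected) <= tolerance:
        return 1.0  # correct within tolerance
    return 0.1  # small credit for following the output format


# `expected=30.0` is a made-up placeholder, not a real climate figure.
print(reward("Summers are warm. Answer: 30", expected=30.0))
print(reward("I am not sure.", expected=30.0))
```

A graded score like this gives the training run a signal even before the model gets answers right; an LLM-as-a-judge score could be combined with it when correctness is harder to check programmatically.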
For most well-defined problems, a small open model + RunRL outperforms frontier models. (For instance, we've seen Qwen-3B do better than Claude 4.1 Opus on antiviral design.) This is because LLM intelligence is notoriously "spiky"; often models are decent-but-not-great at common-sense knowledge, are randomly good at a few domains, but make mistakes on lots of other tasks. RunRL creates spikes precisely on the tasks where you need them.
Pricing: $80/node-hour. Most models up to 14B parameters fit on one node (0.6-1.2 TB of VRAM). We do full fine-tuning, at the cost of parameter-efficiency (with RL, people seem to care a lot about the last few percent gains in e.g. agent reliability).
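As a back-of-the-envelope illustration of the pricing and the one-node claim, the sketch below estimates run cost and the memory held by model and optimizer state under full fine-tuning. The 6-hour run length is hypothetical, and the 16 bytes per parameter figure is the common rule of thumb for mixed-precision Adam (not a number from RunRL); it ignores activations, rollout generation, and any reference-model copies, so real usage is higher.

```python
PRICE_PER_NODE_HOUR = 80.0  # USD, the quoted price


def run_cost(nodes: int, hours: float, price: float = PRICE_PER_NODE_HOUR) -> float:
    """Total cost in USD for a run billed per node-hour."""
    return nodes * hours * price


def full_finetune_state_gb(params_billions: float, bytes_per_param: float = 16.0) -> float:
    """Rough GB needed for weights, gradients, and optimizer state.

    Assumes mixed-precision Adam: 2 bytes (fp16 weights) + 2 bytes (fp16 grads)
    + 12 bytes (fp32 master weights, momentum, variance) per parameter.
    1e9 parameters times N bytes is N GB, so the billions cancel out.
    """
    return params_billions * bytes_per_param


print(run_cost(nodes=1, hours=6))    # hypothetical 6-hour run: 480.0
print(full_finetune_state_gb(14.0))  # 14B parameters: 224.0
```

Under these assumptions a 14B model needs roughly 224 GB for training state alone, which leaves headroom on a node with 0.6-1.2 TB of VRAM for activations and sampling.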
Next up: continuous learning; tool use. Tool use is currently in private beta, which you can join here: https://forms.gle/D2mSmeQDVCDraPQg8
We'd love to hear any thoughts, questions, or positive or negative reinforcement!
Comments URL: https://news.ycombinator.com/item?id=45277704
