Reinforcement Fine-Tuning for Conversational Reasoning with the OpenAI API
1y ago
Source
OpenAI · openai.com
Cookbook for reinforcement fine-tuning conversational reasoning using HealthBench evaluations.
