Program of Thoughts: Separating Computation from Reasoning in Language Models for Numerical Tasks
By
mkagenius
Summary
The article introduces "Program of Thoughts" (PoT), a prompting approach that separates computation from reasoning in language models on numerical reasoning tasks. Chain-of-Thought (CoT) prompting has the language model carry out both the reasoning and the computation. PoT instead uses a language model (primarily Codex) to generate a program that expresses the reasoning, and delegates the computation to an external computer that executes the program to produce the answer. The method was evaluated on five math word problem datasets and three financial-QA datasets, showing an average performance gain of around 12% over CoT in both few-shot and zero-shot settings. Combined with self-consistency decoding, PoT achieves state-of-the-art performance on all the math datasets and near-state-of-the-art performance on the financial datasets.
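To make the split concrete, here is a minimal Python sketch of the idea, not the authors' implementation. It assumes the model's output is a short Python program that stores its result in a variable named `ans`. That convention, the word problem, and the helper `execute_program` are all hypothetical, and the "generated" program is hard-coded in place of a real model call.

```python
# Hypothetical stand-in for a program a language model might generate for:
# "What is $1000 worth after 3 years at 5% interest, compounded annually?"
# The reasoning lives in the program; no arithmetic is done by the model.
GENERATED_PROGRAM = """
principal = 1000
rate = 0.05
years = 3
ans = principal * (1 + rate) ** years
"""


def execute_program(program: str, answer_var: str = "ans"):
    """Run the program text and return the value bound to `answer_var`.

    Assumption: the program is trusted. `exec` runs arbitrary code, so
    real model output would need a sandbox.
    """
    namespace = {}
    exec(program, namespace)
    return namespace[answer_var]


result = execute_program(GENERATED_PROGRAM)
print(result)
```

The interpreter does the exponentiation, so the numerical answer does not depend on the model getting the arithmetic right, only on the program expressing the right steps.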
Key quotes
To disentangle computation from reasoning, we propose 'Program of Thoughts' (PoT), which uses language models (mainly Codex) to express the reasoning process as a program.
The computation is relegated to an external computer, which executes the generated programs to derive the answer.
Under both few-shot and zero-shot settings, PoT can show an average performance gain over CoT by around 12% across all the evaluated datasets.
By combining PoT with self-consistency decoding, we can achieve SoTA performance on all math problem datasets and near-SoTA performance on financial datasets.
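The last quote pairs PoT with self-consistency decoding. A rough sketch of how that could work: sample several programs, execute each one, and take a majority vote over the executed answers. The sampled programs below are hard-coded and hypothetical, as are the helper names and the `ans` variable convention. Dropping programs that fail to run is an assumption of this sketch, not a detail taken from the article.

```python
from collections import Counter

# Hypothetical samples for: "3 boxes of 12 pens plus 4 loose pens: how many?"
SAMPLED_PROGRAMS = [
    "ans = 12 * 3 + 4",
    "total = 12 * 3\nans = total + 4",
    "ans = 12 * (3 + 4)",   # faulty reasoning, runs but gives a wrong answer
    "ans = 12 * 3 + four",  # fails at run time with a NameError
]


def run(program: str):
    """Execute one program; return its `ans`, or None if it fails."""
    namespace = {}
    try:
        exec(program, namespace)  # assumption: trusted or sandboxed code
    except Exception:
        return None
    return namespace.get("ans")


def majority_answer(programs):
    """Return the most common answer among programs that ran successfully."""
    answers = [a for a in (run(p) for p in programs) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


voted = majority_answer(SAMPLED_PROGRAMS)
print(voted)
```

Voting happens over the executed results, not the program text, so two programs that are written differently but compute the same value count as agreeing.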
