CoT-PoT Ensembling: Efficient LLM Reasoning with Self-Consistency from Just Two Samples
[Submitted on 19 Apr 2026 (v1), last revised 4 Jun 2026 (this version, v2)]
Summary
This paper introduces CoT-PoT, a hybrid ensembling approach that combines Chain-of-Thought (CoT) and Program-of-Thought (PoT) reasoning for self-consistency (SC) in large language models. The method uses the complementary strengths of the two reasoning modes to improve accuracy while sharply reducing computational cost. The authors report that CoT-PoT ensembling cuts the number of samples required for SC by a factor of 9.3, and that 78.6% of tasks can be addressed with only two samples, which they say has not been possible with any prior SC method.
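To make the two-sample idea concrete, here is a minimal sketch of how such an ensemble could be wired up. It is not the authors' implementation: the summary does not spell out the stopping rule, so the sketch assumes that one CoT answer and one PoT answer are drawn first, that agreement between them ends sampling, and that disagreement falls back to a plain majority vote over further alternating samples. The names `normalize`, `cot_pot_ensemble`, `sample_cot`, `sample_pot`, and `max_samples` are all hypothetical.

```python
from collections import Counter
from typing import Callable, Tuple


def normalize(answer: str) -> str:
    """Canonicalize an answer so that '42', ' 42 ' and '42.0' compare equal."""
    text = answer.strip().lower()
    try:
        return f"{float(text):g}"
    except ValueError:
        return text


def cot_pot_ensemble(
    sample_cot: Callable[[], str],
    sample_pot: Callable[[], str],
    max_samples: int = 10,
) -> Tuple[str, int]:
    """Return (answer, samples_used).

    Assumed rule (not taken from the paper): stop after two samples when the
    CoT answer and the PoT answer agree; otherwise keep alternating between
    the two modes up to max_samples and return the majority answer.
    """
    first_cot = normalize(sample_cot())
    first_pot = normalize(sample_pot())
    if first_cot == first_pot:
        # Two different reasoning modes agree: accept with just two samples.
        return first_cot, 2

    votes = Counter([first_cot, first_pot])
    used = 2
    while used < max_samples:
        # Alternate modes so both keep contributing votes.
        sampler = sample_cot if used % 2 == 0 else sample_pot
        votes[normalize(sampler())] += 1
        used += 1

    answer, _ = votes.most_common(1)[0]
    return answer, used
```

In practice `sample_cot` would wrap a model call that returns the final answer of a step-by-step rationale, and `sample_pot` would wrap a call that generates a program, executes it, and returns its output. Both are left as plain callables here so the voting logic can be tried with stubs.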
Key quotes
We introduce a hybrid ensembling approach that leverages the complementary strengths of two distinct modes of reasoning: Chain-of-Thought (CoT) and Program-of-Thought (PoT).
We show that CoT-PoT ensembling not only improves overall accuracy, but also drastically reduces the number of samples required for SC by a factor of 9.3x.
The majority of tasks (78.6%) can be addressed with only two samples, which has not been possible with any prior SC methods.
