Python 3.15's Tail-Calling Interpreter Shows 15% Performance Gain on Windows x86-64
By lumpa
Summary
The article reports performance improvements in Python 3.15's interpreter: the tail-calling interpreter outperforms the computed goto interpreter on two platforms. The author partially retracts an earlier apology, which was issued for communicating performance results without noticing a compiler bug. On the pyperformance benchmark suite, the tail-calling interpreter is about 5% faster on macOS AArch64 (Xcode Clang) and roughly 15% faster on Windows x86-64 (MSVC).
Key quotes
I can proudly say today that I am partially retracting that apology, but only for two platforms—macOS AArch64 (XCode Clang) and Windows x86-64 (MSVC).
In our own experiments, the tail calling interpreter for CPython was found to beat the computed goto interpreter by 5% on pyperformance on AArch64 macOS using XCode Clang, and roughly 15% on pyperformance on Windows x86-64 (MSVC).
Some time ago I posted an apology piece for Python’s tail calling results. I apologized for communicating performance results without noticing a compiler bug had occurred.
