Optimizing the asin() Function: A Technical Follow-up on Performance Improvements
By def-pri-pub
Summary
The author revisits their previous work on optimizing the asin() (arcsine) function in C/C++ after receiving feedback from online communities. They analyze their original implementation and discover a performance optimization opportunity by examining the assembly code generated by the compiler. The article focuses on low-level programming optimization techniques, specifically for mathematical function approximations, with technical details about coefficient usage, assembly analysis, and performance improvements.
Key quotes
I couldn't help wonder, 'Could I have made it even more performant?'
Look at the implementation of the Cg asin() approximation
constexpr double a0 = 1.5707288; constexpr double a1 = -0.2121144; constexpr double a2 = 0.0742610;
After posting that last article, it was fun to read the comments on Reddit and Hacker News as they rolled in.
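For context on the coefficients quoted above, here is a minimal sketch of how a Cg-style asin() approximation is typically put together. It is not the author's code. Only a0, a1 and a2 appear in the quoted text. The fourth coefficient a3, the constant half_pi, and the function name asin_cg_sketch are assumptions added for illustration, with a3 taken from the Abramowitz and Stegun polynomial (formula 4.4.45) that the Cg approximation is generally understood to follow.

```cpp
#include <cmath>

// a0..a2 are the coefficients quoted in the article.
constexpr double a0 = 1.5707288;
constexpr double a1 = -0.2121144;
constexpr double a2 = 0.0742610;
// ASSUMPTION: a3 is not in the quoted text; it is the remaining coefficient
// of Abramowitz and Stegun formula 4.4.45.
constexpr double a3 = -0.0187293;
// ASSUMPTION: helper constant for pi/2, named here for illustration only.
constexpr double half_pi = 1.5707963267948966;

// Hypothetical name. Approximates asin(x) for x in [-1, 1].
inline double asin_cg_sketch(double x) {
    // Work on |x| and restore the sign at the end (asin is an odd function).
    const double ax = std::fabs(x);
    // Evaluate the cubic a0 + a1*ax + a2*ax^2 + a3*ax^3 with Horner's scheme.
    const double p = ((a3 * ax + a2) * ax + a1) * ax + a0;
    // asin(ax) is approximated by pi/2 - sqrt(1 - ax) * p(ax).
    const double r = half_pi - std::sqrt(1.0 - ax) * p;
    return std::copysign(r, x);
}
```

Under these assumptions the result should agree with std::asin to roughly four decimal places across [-1, 1]. Inputs outside that range make the square root's argument negative and produce NaN.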