Understanding CPU Pipelining and Its Evolution into Branch Prediction
By flipacholas
Summary
This article, part of a series on branch prediction, explains CPU pipelining: how processors split instruction execution into stages so that several instructions can be in flight at once. The author traces the path from simple pipelines to branch prediction, including branch delay slots and how they gave way to modern prediction techniques. The article also covers pipeline hazards and the performance techniques used in processors such as MIPS and modern x86 designs.
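The arithmetic behind pipelining can be shown with a minimal sketch. It is not taken from the article. It models an idealized five-stage in-order pipeline using the classic MIPS-style stage names (IF, ID, EX, MEM, WB). The model assumes that one instruction enters per cycle, that every stage takes exactly one cycle, and that the only hazard is a fixed stall after each taken branch. The function names and the `penalty_per_branch` parameter are hypothetical and exist only for illustration.

```python
# Minimal sketch of an idealized 5-stage in-order pipeline (MIPS-style stage names).
# Assumptions (illustrative, not from the article): one instruction enters per cycle,
# each stage takes one cycle, and the only hazard is a fixed penalty per taken branch.

STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def unpipelined_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    """Each instruction runs through all stages before the next one starts."""
    return max(n_instructions, 0) * n_stages


def total_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    """Fill the pipeline once, then one instruction completes per cycle."""
    if n_instructions <= 0:
        return 0
    return n_stages + (n_instructions - 1)


def cycles_with_branch_stalls(n_instructions: int, taken_branches: int,
                              penalty_per_branch: int) -> int:
    """Ideal pipelined cycles plus bubbles after each taken branch.

    penalty_per_branch is a hypothetical knob: how many wrongly fetched
    instructions get squashed because the branch outcome is known too late.
    """
    return total_cycles(n_instructions) + taken_branches * penalty_per_branch


def timing_diagram(n_instructions: int) -> list[str]:
    """One row per instruction showing which stage it occupies in each cycle."""
    width = total_cycles(n_instructions)
    rows = []
    for i in range(n_instructions):
        cells = ["   "] * width
        # Instruction i enters IF in cycle i and advances one stage per cycle.
        for s, name in enumerate(STAGES):
            cells[i + s] = f"{name:<3}"
        rows.append(f"i{i}: " + " ".join(cells).rstrip())
    return rows


if __name__ == "__main__":
    for row in timing_diagram(4):
        print(row)
    print("pipelined:", total_cycles(100), "unpipelined:", unpipelined_cycles(100))
    print("with 10 taken branches, 2-cycle penalty:",
          cycles_with_branch_stalls(100, 10, 2))
```

In this model, 100 instructions take 104 cycles when pipelined and 500 cycles when unpipelined. Ten taken branches with a 2-cycle penalty push the total to 124 cycles, which is the cost that branch delay slots and branch prediction both try to hide. A MIPS-style branch delay slot hides a one-cycle penalty architecturally: the instruction after the branch always executes, so the compiler can fill that slot with useful work.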
Key quotes
I want to share what I've learned about CPU pipelining.
I was motivated to dive into the details after reading Rodrigo Copetti's Playstation MIPS write-up where he talked about branch delay slots and how they evolved into branch prediction.
I quickly found many subtle and fascinating details on CPU pipelining that I had previously overlooked.
This is part of my branch prediction series.
Visualizing CPU Pipelining | Why Branch Prediction Needs Real Data | Hacking LLDB to Evaluate Branch Predictions (coming soon)
