Instruction Pipelines & CPU Execution Stages

The Basic Idea: CPUs Don’t Do One Thing at a Time

If a CPU fully finished one instruction before starting the next, most of its hardware would sit idle at any given moment, and throughput would be terrible.

Instead, CPUs work like an assembly line.

While one instruction is being executed, the next one is being decoded, and another one is already being fetched.

That overlap is called instruction pipelining.

What Is an Instruction Pipeline

An instruction pipeline splits instruction execution into stages.

Each stage does one part of the job.

Different instructions can be in different stages at the same time.

So instead of this:

fetch -> decode -> execute -> writeback
(wait)
fetch -> decode -> execute -> writeback

The CPU does this:

Instr 1: fetch     | decode    | execute   | writeback
Instr 2:             fetch     | decode    | execute   | writeback
Instr 3:                         fetch     | decode    | execute   | writeback

Same stages, but overlapped.
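To make the overlap concrete, here is a minimal Python sketch that works out which instruction sits in which stage on every cycle. It assumes an ideal pipeline: every stage takes exactly one cycle and nothing ever stalls. The function name `pipeline_schedule` is made up for this illustration.

```python
STAGES = ["fetch", "decode", "execute", "writeback"]

def pipeline_schedule(num_instructions, stages=STAGES):
    """Return {cycle: [(instruction, stage), ...]} for an ideal pipeline.

    Instruction i (0-based) enters the first stage in cycle i and then
    advances one stage per cycle. No stalls are modeled.
    """
    schedule = {}
    for instr in range(num_instructions):
        for offset, stage in enumerate(stages):
            cycle = instr + offset
            schedule.setdefault(cycle, []).append((instr + 1, stage))
    return schedule

if __name__ == "__main__":
    for cycle, active in sorted(pipeline_schedule(3).items()):
        work = ", ".join(f"Instr {i}: {s}" for i, s in active)
        print(f"cycle {cycle + 1}: {work}")
```

In the third cycle of this model, Instr 1 is executing, Instr 2 is decoding, and Instr 3 is being fetched, which is exactly the overlap in the diagram above.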

Typical CPU Execution Stages (Simplified)

Exact stages vary by architecture, but conceptually they look like this.

1. Fetch (IF)

  • The CPU reads the instruction from memory
  • It uses the instruction pointer (program counter) to know where to read
  • The instruction usually comes from the L1 instruction cache, not RAM

If the instruction isn’t in the cache, the fetch has to wait and the pipeline can stall.

2. Decode (ID)

  • CPU figures out what the instruction means
  • Determines:
    • Which registers are needed
    • What operation to perform
    • Whether it’s a branch, load, store, etc.

Modern CPUs may decode multiple instructions per cycle.

3. Execute (EX)

  • Actual work happens here
  • ALU operations, comparisons, address calculations
  • For memory instructions, this stage computes the memory address

Some instructions take multiple cycles here.

4. Memory Access (MEM)

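  • Loads read data from memory; stores write data to it
  • Ideally the access hits the L1 data cache
  • A cache miss here is expensive and can stall the pipeline

Not all instructions use this stage. A register-to-register ADD, for example, has nothing to do here.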

5. Write Back (WB)

  • The result is written back to the destination register
  • The instruction is now complete

Once its result is committed, the instruction is said to retire.
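The five stages are easier to remember once you see them do something. Below is a toy Python sketch that pushes instructions through IF, ID, EX, MEM, and WB one at a time (no overlap, just the stage logic). The instruction format `(op, reg, base, operand)` and the three operations `ADD`, `LOAD`, and `STORE` are invented for this example and don’t correspond to any real instruction set.

```python
def run(program, registers, memory):
    """Run a toy program, passing each instruction through five stages."""
    pc = 0
    while pc < len(program):
        # IF: read the instruction the program counter points at.
        raw = program[pc]
        pc += 1

        # ID: split the instruction into its operation and operands.
        op, reg, base, operand = raw

        # EX: do the arithmetic, or compute the address for a load/store.
        if op == "ADD":
            result = registers[base] + registers[operand]
        elif op in ("LOAD", "STORE"):
            address = registers[base] + operand   # base register + offset
        else:
            raise ValueError(f"unknown op {op!r}")

        # MEM: only loads and stores touch memory.
        if op == "LOAD":
            result = memory[address]
        elif op == "STORE":
            memory[address] = registers[reg]

        # WB: ADD and LOAD write their result into a register.
        if op in ("ADD", "LOAD"):
            registers[reg] = result
    return registers

if __name__ == "__main__":
    regs = {"R0": 0, "R1": 0, "R2": 0, "R3": 7}
    mem = [0, 0, 0, 40, 0, 0, 0, 0]
    prog = [
        ("LOAD", "R1", "R0", 3),      # R1 = mem[R0 + 3]
        ("ADD", "R2", "R1", "R3"),    # R2 = R1 + R3
        ("STORE", "R2", "R0", 4),     # mem[R0 + 4] = R2
    ]
    run(prog, regs, mem)
    print(regs, mem)
```

Notice that ADD skips the MEM step entirely and STORE skips WB. A real pipeline still moves those instructions through every stage; the stage simply has nothing to do for them.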

Why Pipelining Makes CPUs Fast

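Once the pipeline is full, a simple pipeline can complete up to one instruction per cycle, even though each individual instruction still takes several cycles from fetch to writeback. Superscalar CPUs can complete more than one.

This is why:

  • Clock speed alone doesn’t define performance
  • Instructions per cycle (IPC) matters
  • Pipelining improves throughput, not the latency of a single instruction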

But this only works when the pipeline flows smoothly.
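A rough way to put numbers on this, assuming every stage takes exactly one cycle and nothing ever stalls: without overlap, n instructions through k stages cost n × k cycles. With pipelining, the first instruction needs k cycles and each later one finishes one cycle after its predecessor. The helper names below are made up for this sketch.

```python
def cycles_unpipelined(n, k):
    """Each of n instructions runs all k stages before the next one starts."""
    return n * k

def cycles_pipelined(n, k):
    """k cycles to fill the pipeline, then one instruction completes per cycle."""
    return k + (n - 1) if n > 0 else 0

def speedup(n, k):
    """Ideal speedup from pipelining under the no-stall assumption."""
    return cycles_unpipelined(n, k) / cycles_pipelined(n, k)

if __name__ == "__main__":
    print(cycles_unpipelined(1000, 5))   # 5000
    print(cycles_pipelined(1000, 5))     # 1004
    print(round(speedup(1000, 5), 2))    # 4.98
```

For long instruction streams the ideal speedup approaches the number of stages but never quite reaches it. Real pipelines fall short of even that because of the hazards described next.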

Pipeline Hazards (Where Things Go Wrong)

Real programs are messy. Pipelines don’t always flow perfectly.

1. Data Hazards

An instruction depends on the result of an earlier instruction that hasn’t finished yet.

Example:

ADD R1, R2, R3
MUL R4, R1, R5

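The second instruction needs R1 before the first has written it back.

The CPU may:

  • Stall until the value is ready
  • Forward the result directly from one pipeline stage to another (bypassing)
  • Reorder work by running other, independent instructions in the meantime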
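Here is a hedged sketch of how stalls from the ADD/MUL pair above could be counted. It assumes the classic textbook 5-stage model, not any specific CPU: without forwarding, a consumer has to issue at least 3 cycles after its producer (the register file is written in the first half of a cycle and read in the second half). With forwarding, an ALU result is usable by the very next instruction, while a LOAD result costs one extra cycle. MUL is treated as a single-cycle ALU operation, and `Instr` and `raw_stalls` are names invented for this example.

```python
from collections import namedtuple

Instr = namedtuple("Instr", "op dest srcs")

def raw_stalls(program, forwarding):
    """Count read-after-write stall cycles in a toy in-order pipeline."""
    last_writer = {}   # register -> index of the latest instruction writing it
    issue = []         # cycle in which each instruction issues
    stalls = 0
    for i, instr in enumerate(program):
        earliest = issue[-1] + 1 if issue else 0
        cycle = earliest
        for src in instr.srcs:
            if src in last_writer:
                j = last_writer[src]
                if not forwarding:
                    gap = 3            # wait for the producer's writeback
                elif program[j].op == "LOAD":
                    gap = 2            # load data arrives after the MEM stage
                else:
                    gap = 1            # ALU result is forwarded from EX
                cycle = max(cycle, issue[j] + gap)
        stalls += cycle - earliest
        issue.append(cycle)
        if instr.dest is not None:
            last_writer[instr.dest] = i
    return stalls

if __name__ == "__main__":
    pair = [
        Instr("ADD", "R1", ("R2", "R3")),
        Instr("MUL", "R4", ("R1", "R5")),
    ]
    print(raw_stalls(pair, forwarding=False))  # 2
    print(raw_stalls(pair, forwarding=True))   # 0
```

Under these assumptions forwarding removes both stall cycles for the ADD/MUL pair. A load followed immediately by a use of its result would still cost one cycle.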

2. Control Hazards (Branches)

Mispredicted branches are pipeline killers.

Example:

if (x > 0) {
   do_something();
}

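The CPU doesn’t know which instructions to fetch next until the branch is resolved, which happens several stages after the fetch.

Solution:

  • Branch prediction: guess the outcome, usually based on how the branch behaved before
  • Speculative execution: start running the predicted path before the guess is confirmed

If the prediction is wrong → pipeline flush → performance hit.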
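To get a feel for why some branches predict well and others don’t, here is a sketch of a 2-bit saturating counter, a classic textbook predictor. Real CPUs use far more sophisticated predictors that also track branch history, so treat this purely as an illustration. The function name `predict_accuracy` is made up.

```python
def predict_accuracy(outcomes, state=2):
    """Fraction of branches a 2-bit saturating counter predicts correctly.

    States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
    The counter moves one step toward each actual outcome.
    """
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken == taken:
            correct += 1
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)

if __name__ == "__main__":
    # A loop branch: taken 9 times, then falls through, repeated 10 times.
    loop_branch = ([True] * 9 + [False]) * 10
    # A branch that flips direction every time.
    alternating = [True, False] * 50
    print(predict_accuracy(loop_branch))   # 0.9
    print(predict_accuracy(alternating))   # 0.5
```

The loop branch is only mispredicted at each loop exit. The alternating branch defeats this simple counter half the time, although a history-based predictor would learn that pattern. Truly data-dependent branches are the ones that stay hard.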

3. Structural Hazards

Two or more instructions need the same hardware resource in the same cycle, and there isn’t enough of it to go around.

Example:

  • Too many instructions needing the same execution unit

Modern CPUs reduce this by duplicating execution units.
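A toy model shows the effect of duplicating units. It assumes in-order issue, a fixed issue width per cycle, and units that are free again on the next cycle. The unit names (`"alu"`, `"mul"`) and the function `issue_cycles` are hypothetical.

```python
def issue_cycles(ops, units, width):
    """Cycles needed to issue `ops` in program order.

    ops:   list of unit names, one per instruction (e.g. "alu", "mul")
    units: dict mapping unit name -> number of copies of that unit
    width: maximum number of instructions issued per cycle
    """
    if width < 1 or any(units.get(op, 0) < 1 for op in ops):
        raise ValueError("every op needs at least one unit and width >= 1")
    cycles = 0
    i = 0
    while i < len(ops):
        free = dict(units)   # all units are available at the start of a cycle
        issued = 0
        # Issue in order until the width runs out or the needed unit is busy.
        while i < len(ops) and issued < width and free[ops[i]] > 0:
            free[ops[i]] -= 1
            issued += 1
            i += 1
        cycles += 1
    return cycles

if __name__ == "__main__":
    muls = ["mul"] * 4
    print(issue_cycles(muls, {"alu": 2, "mul": 1}, width=4))  # 4
    print(issue_cycles(muls, {"alu": 2, "mul": 2}, width=4))  # 2
```

With a single multiplier, four multiplies take four cycles to issue no matter how wide the CPU is. Adding a second multiplier halves that in this model.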

Superscalar and Out-of-Order Execution

Modern CPUs go far beyond simple pipelines.

They can:

  • Execute multiple instructions per cycle
  • Reorder instructions internally
  • Execute instructions speculatively
  • Retire results in correct program order

Your code stays sequential.
Execution does not.

This is why:

  • Instruction order in source code ≠ execution order
  • CPUs feel “smart”, but their timing is hard to predict from source code alone
  • Performance tuning is hard
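The benefit of reordering can be sketched with a small scheduling model. It is heavily idealized: one instruction is fetched per cycle, execution units are unlimited, every destination register is written only once (so no renaming is needed), and in-order retirement is not modeled. The tuple format `(dest, sources, latency)` and the name `finish_time` are invented for this example.

```python
def finish_time(program, out_of_order):
    """Cycle at which the last result is ready in a toy scheduling model.

    In-order: a stalled instruction blocks everything behind it.
    Out-of-order: an instruction starts as soon as it has been fetched
    and its source operands are ready.
    """
    ready = {}          # register -> cycle its value becomes available
    prev_start = -1
    finish = 0
    for i, (dest, sources, latency) in enumerate(program):
        operands_ready = max((ready.get(s, 0) for s in sources), default=0)
        if out_of_order:
            start = max(operands_ready, i)               # fetched in cycle i
        else:
            start = max(operands_ready, prev_start + 1)  # must follow predecessor
        prev_start = start
        ready[dest] = start + latency
        finish = max(finish, ready[dest])
    return finish

if __name__ == "__main__":
    program = [
        ("R1", ("R0",), 4),        # slow operation, e.g. a load
        ("R2", ("R1",), 1),        # needs R1, has to wait
        ("R3", ("R0",), 1),        # independent of the slow operation
        ("R4", ("R3",), 1),
        ("R5", ("R2", "R4"), 1),
    ]
    print(finish_time(program, out_of_order=False))  # 8
    print(finish_time(program, out_of_order=True))   # 6
```

In the in-order run, the independent R3 and R4 instructions sit behind the stalled R2. In the out-of-order run they execute while the slow operation is still in flight.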

Why This Matters to Software Engineers

You don’t see pipelines directly, but you feel them.

They explain:

  • Why tight loops are fast
  • Why unpredictable branches are slow
  • Why branchless code sometimes wins
  • Why CPUs hate dependency chains
  • Why micro-optimizations sometimes work

In hot code paths, pipeline behavior can matter as much as the choice of algorithm.
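The dependency-chain point can be illustrated with a small model of summing values. It assumes one add can start per cycle, each add’s result is ready a fixed number of cycles later, and the final step of combining the accumulators is ignored. The latency of 4 cycles is an arbitrary illustrative value, not a figure for any particular CPU, and `sum_cycles` is a made-up name.

```python
def sum_cycles(n, accumulators, add_latency):
    """Cycles to add n values using round-robin accumulators (toy model)."""
    ready = [0] * accumulators   # cycle at which each accumulator is usable again
    cycle = 0                    # earliest cycle the next add may start
    for i in range(n):
        acc = i % accumulators
        # An add must wait for the previous add into the same accumulator.
        start = max(cycle, ready[acc])
        ready[acc] = start + add_latency
        cycle = start + 1
    return max(ready)

if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        print(k, sum_cycles(16, accumulators=k, add_latency=4))
    # 1 64
    # 2 33
    # 4 19
    # 8 19
```

With one accumulator, every add waits for the one before it. Splitting the work across independent accumulators shortens the chains until the add unit itself becomes the limit, which is why going from 4 to 8 accumulators gains nothing in this model.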

Common Misconception

“Each instruction runs one after another”

False.

Instructions overlap, reorder, speculate, stall, and flush.

Sequential code is an illusion maintained by the CPU.

Final Thought

Instruction pipelines are why CPUs are fast.
Pipeline hazards are why performance is tricky.

Once you understand pipelines, performance stops being mysterious.
You stop guessing and start reasoning.

That’s real engineering.

