Datapath and Control

The Basic Processing Unit is the brain of every computer. It fetches instructions from memory, decodes them, and executes the operations they specify. Whether you're running a simple addition or a complex algorithm, every operation flows through this fundamental unit.

Fundamental Concepts

Before diving into the datapath, let's establish the foundational concepts.

The Instruction Cycle

Every instruction goes through a predictable cycle:

Phase	Description
FETCH	Processor fetches instruction from memory using the Program Counter (PC)
DECODE	Control unit examines the opcode and determines the operation
EXECUTE	Processor performs the actual operation (arithmetic, logic, transfer)
WRITE BACK	Results are written to registers or memory
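
To make the four phases concrete, here is a minimal sketch of the cycle as a loop over a toy machine. The tuple-based instruction format, the `run` function, and the `HALT` opcode are hypothetical, chosen only for illustration; a real processor decodes binary fields rather than Python tuples.

```python
def run(memory, regs, pc=0):
    """Repeat fetch, decode, execute, write back until a HALT is fetched."""
    while True:
        instr = memory[pc]            # FETCH: read the instruction addressed by PC
        pc += 4                       # PC now points at the next instruction
        op, dst, a, b = instr         # DECODE: split into opcode and operand fields
        if op == "HALT":
            return regs
        if op == "ADD":               # EXECUTE: perform the operation
            result = regs[a] + regs[b]
        elif op == "SUB":
            result = regs[a] - regs[b]
        else:
            raise ValueError(f"unknown opcode {op!r}")
        regs[dst] = result            # WRITE BACK: store the result in a register


program = {
    0: ("ADD", "R6", "R4", "R5"),
    4: ("SUB", "R1", "R6", "R4"),
    8: ("HALT", None, None, None),
}
final = run(program, {"R1": 0, "R4": 7, "R5": 5, "R6": 0})
print(final)   # {'R1': 5, 'R4': 7, 'R5': 5, 'R6': 12}
```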

Key Processor Registers

The processor uses several special-purpose registers:

Register	Full Name	Purpose	Visibility
PC	Program Counter	Holds address of the next instruction	Programmer visible
IR	Instruction Register	Holds the currently executing instruction	Internal
MAR	Memory Address Register	Holds address for memory access	Internal
MDR	Memory Data Register	Buffers data to/from memory	Internal
Y	Temporary Register	Holds one ALU input operand	Internal (transparent)
Z	Temporary Register	Holds ALU output result	Internal (transparent)
TEMP	Temporary Register	General temporary storage	Internal (transparent)

💡 Note: Registers marked as "transparent" are invisible to the programmer — you don't need to worry about them when writing assembly code, but they're crucial for the processor's internal operation.

Register Transfers

The basic operation inside a processor is the register transfer — moving data from one register to another, often through the ALU. We express these transfers using a notation called Register Transfer Language (RTL).

Examples:

R1 ← [R2] means "copy the contents of R2 into R1"
R3 ← [R1] + [R2] means "add contents of R1 and R2, store result in R3"
PC ← [PC] + 4 means "increment PC by 4"
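
If the notation feels abstract, it may help to read each RTL line as an assignment: the right-hand side reads register contents, the left-hand side names the register that receives the result. The sketch below models the registers as a plain Python dict; the starting values are arbitrary.

```python
# Registers modelled as a dict; the initial contents are arbitrary examples.
regs = {"R1": 0, "R2": 42, "R3": 0, "PC": 100}

regs["R1"] = regs["R2"]                # R1 ← [R2]
regs["R3"] = regs["R1"] + regs["R2"]   # R3 ← [R1] + [R2]
regs["PC"] = regs["PC"] + 4            # PC ← [PC] + 4

print(regs)   # {'R1': 42, 'R2': 42, 'R3': 84, 'PC': 104}
```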

Single-Bus Datapath Organization

The simplest processor organization uses a single internal bus to connect all components. While slower than multi-bus designs, it's easier to understand and requires less hardware.

Single-Bus Architecture Diagram

┌────────────────────────────────────────────────────────────────────────────┐
│                           SINGLE-BUS DATAPATH                              │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│                        ┌─────────────────────┐                             │
│                        │   Control Signals   │                             │
│                        │   from Control Unit │                             │
│                        └──────────┬──────────┘                             │
│                                   │                                        │
│    ┌──────────────────────────────┼─────────────────────────────────┐      │
│    │                              ▼                                 │      │
│    │  ◄═══════════════════ INTERNAL BUS ═══════════════════════════►│      │
│    │         ▲          ▲         ▲         ▲         ▲             │      │
│    │         │          │         │         │         │             │      │
│    │    ┌────┴────┐ ┌───┴───┐ ┌───┴───┐ ┌───┴───┐ ┌───┴───┐         │      │
│    │    │   PC    │ │  IR   │ │  MAR  │ │  MDR  │ │R0-Rn-1│         │      │
│    │    │         │ │       │ │       │ │       │ │       │         │      │
│    │    └────┬────┘ └───────┘ └───┬───┘ └───┬───┘ └───┬───┘         │      │
│    │         │                    │         │         │             │      │
│    │         │                    ▼         ▼         │             │      │
│    │         │              ┌───────────────────┐     │             │      │
│    │         │              │   Memory Bus      │     │             │      │
│    │         │              │   Interface       │     │             │      │
│    │         │              └─────────┬─────────┘     │             │      │
│    │         │                        │               │             │      │
│    │         │                        ▼               │             │      │
│    │         │              ┌───────────────────┐     │             │      │
│    │         │              │   Main Memory     │     │             │      │
│    │         │              └───────────────────┘     │             │      │
│    │         │                                        │             │      │
│    │         │          ┌─────────────────────────────┘             │      │
│    │         │          │                                           │      │
│    │         │          ▼                                           │      │
│    │         │     ┌─────────┐                                      │      │
│    │         │     │    Y    │◄── Temporary Register                │      │
│    │         │     └────┬────┘    (Holds one ALU input)             │      │
│    │         │          │                                           │      │
│    │         │          ▼                                           │      │
│    │         │   ┌─────────────┐      ┌──────────┐                  │      │
│    │         │   │             │      │ Constant │                  │      │
│    │         └──►│     MUX     │◄─────│    4     │                  │      │
│    │             │   (Select)  │      └──────────┘                  │      │
│    │             └──────┬──────┘                                    │      │
│    │                    │                                           │      │
│    │                    ▼                                           │      │
│    │             ┌─────────────┐                                    │      │
│    │             │             │                                    │      │
│    │             │     ALU     │◄── Add, Sub, AND, OR, etc.         │      │
│    │             │             │                                    │      │
│    │             │   Input A   │◄── From MUX (Y, Constant 4, etc.)  │      │
│    │             │             │                                    │      │
│    │             │   Input B   │◄── ALWAYS from BUS                 │      │
│    │             │             │                                    │      │
│    │             └──────┬──────┘                                    │      │
│    │                    │                                           │      │
│    │                    ▼                                           │      │
│    │             ┌─────────────┐                                    │      │
│    │             │      Z      │◄── ALU Output Register             │      │
│    │             └──────┬──────┘                                    │      │
│    │                    │                                           │      │
│    │                    ▼                                           │      │
│    │  ◄═══════════════════════════════════════════════════════════► │      │
│    │                       INTERNAL BUS                             │      │
│    └────────────────────────────────────────────────────────────────┘      │
│                                                                            │
└────────────────────────────────────────────────────────────────────────────┘

Component Description

Internal Bus: A single shared pathway that connects all registers. Only ONE register can place data on the bus at any given time. The bus feeds directly into ALU Input B.

ALU (Arithmetic Logic Unit): Performs all arithmetic and logical operations. It has TWO inputs:

Input A: Comes from the multiplexer (can be Y register, constant 4, or other sources)
Input B: ALWAYS comes directly from the bus

This is crucial! Since the bus can only carry one value at a time, and the ALU needs two operands:

The first operand is saved in register Y
The second operand comes through the bus to Input B
Y feeds into the MUX, which routes to Input A
The ALU can then perform the operation: [Input A] op [Input B]

Multiplexer (MUX): Selects one of multiple inputs to feed into ALU Input A:

Select4: Choose constant 4 (for incrementing PC: 4 + PC from bus)
SelectY: Choose contents of register Y (for operations like R4 + R5)

Register Y: A temporary register that holds one operand. Since the bus can only carry one value at a time, we need Y to "remember" the first operand while we fetch the second through the bus.

Register Z: Holds the output of the ALU until it can be transferred to its destination via the bus.
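
The interplay between Y, the MUX, and the two ALU inputs can be sketched in a few lines. This is a simplified model, not a hardware description: the function names `mux` and `alu` are hypothetical, and the operand order used for `Sub` (Input A minus Input B) is an assumption the text above does not specify.

```python
def mux(select, y):
    """ALU Input A: constant 4 under Select4, contents of Y under SelectY."""
    if select == "Select4":
        return 4
    if select == "SelectY":
        return y
    raise ValueError(f"unknown select signal {select!r}")


def alu(op, input_a, input_b):
    """Input B is whatever value is currently on the bus."""
    if op == "Add":
        return input_a + input_b
    if op == "Sub":
        return input_a - input_b   # assumed order: A minus B
    raise ValueError(f"unknown ALU operation {op!r}")


regs = {"R4": 30, "R5": 12, "Y": 0, "Z": 0}

bus = regs["R4"]                  # R4out: first operand goes onto the bus
regs["Y"] = bus                   # Yin: Y remembers it

bus = regs["R5"]                  # R5out: second operand now occupies the bus
regs["Z"] = alu("Add", mux("SelectY", regs["Y"]), bus)   # SelectY, Add, Zin

print(regs["Z"])   # 42
```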

Control Signals

The control unit generates signals that orchestrate data movement:

Signal	Meaning
`PCout`	Place PC contents on the bus
`PCin`	Load bus contents into PC
`MARin`	Load bus contents into MAR
`MDRout`	Place MDR contents on the bus
`MDRin`	Load bus contents into MDR
`IRin`	Load bus contents into IR
`Yin`	Load bus contents into Y
`Zin`	Load ALU result into Z
`Zout`	Place Z contents on the bus
`R1out, R2out, ...`	Place register contents on the bus
`R1in, R2in, ...`	Load bus contents into register
`Read`	Initiate memory read operation
`Write`	Initiate memory write operation
`Select4`	Select constant 4 for MUX
`SelectY`	Select Y register for MUX
`Add`	ALU performs addition
`Sub`	ALU performs subtraction
`WMFC`	Wait for Memory Function Complete
`End`	Signals end of instruction execution
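
Because only one register may place data on the bus at a time, a control step is valid only if it asserts at most one "out" signal. The check below is a small sketch of that rule. It relies on the naming convention in the table above (every bus-driving signal ends in `out`), and the helper name `check_step` is hypothetical.

```python
def check_step(signals):
    """Return the signal that drives the bus in this step, or None.

    Raises ValueError if two or more signals try to drive the bus.
    """
    drivers = [s for s in signals if s.endswith("out")]
    if len(drivers) > 1:
        raise ValueError(f"bus conflict: {drivers}")
    return drivers[0] if drivers else None


print(check_step(["PCout", "MARin", "Read", "Select4", "Add", "Zin"]))  # PCout
print(check_step(["WMFC"]))                                             # None

try:
    check_step(["R4out", "R5out", "SelectY", "Add", "Zin"])
except ValueError as err:
    print(err)   # bus conflict: ['R4out', 'R5out']
```

The rejected step is exactly the shortcut one might be tempted to take for an addition, which is why the single-bus design needs register Y.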

How PC + 4 Works in Single-Bus

This is a common point of confusion. Here's exactly what happens:

Step 1 Signals: PCout, MARin, Read, Select4, Add, Zin

PCout → PC value (e.g., 100) goes onto the bus
MARin → MAR captures the PC value from the bus (for instruction fetch)
Simultaneously:
- Select4 → MUX routes constant 4 to ALU Input A
- The bus carries PC to ALU Input B (Input B is always from bus!)
- Add → ALU computes: 4 + PC = 4 + 100 = 104
- Zin → Result (104) is stored in register Z

So we use the same bus value (PC) for two purposes in one clock cycle: sending to MAR for fetch, and feeding into the ALU for incrementing.

Single-Bus: Complete Instruction Examples

Let's trace through complete instructions to see how all components work together.

1. ADD R4, R5, R6 (Single-Bus)

Instruction: Add the contents of R4 and R5, store result in R6

RTL: R6 ← [R4] + [R5]

Step	Control Signals	Explanation	RTL
1	`PCout, MARin, Read, Select4, Add, Zin`	Fetch begins: PC goes to MAR to fetch instruction. Simultaneously, constant 4 (Input A) and PC from bus (Input B) are added in ALU, result goes to Z.	MAR ← [PC], Z ← [PC]+4
2	`Zout, PCin, Yin, WMFC`	Update PC and save it: Z (which contains PC+4) goes onto bus. PC is updated to point to next instruction. Crucially, we ALSO save this PC+4 value in Y just in case the instruction we're fetching is a branch—we'll need this incremented PC value. Wait for memory.	PC ← [Z], Y ← [Z]
3	`MDRout, IRin`	Load instruction: Memory has delivered the instruction to MDR. Move it to IR for decoding.	IR ← [MDR]
4	`R4out, Yin`	Save first operand: R4 goes onto bus and is saved in Y. We need Y to hold R4 because the bus can only carry one value at a time, and we'll need R4 when we add it to R5.	Y ← [R4]
5	`R5out, SelectY, Add, Zin`	Perform addition: R5 goes onto bus (to Input B). Y containing R4 is selected by MUX (to Input A). ALU adds them: R4 + R5. Result goes to Z.	Z ← [Y] + [R5]
6	`Zout, R6in, End`	Store result: Z goes onto bus, R6 captures it. Instruction complete.	R6 ← [Z]

Key Points:

Step 2 stores PC+4 in Y even though this is an ADD instruction. At that point the instruction has not been decoded yet, so the control unit saves the value in case it turns out to be a branch.
Steps 4-5 show why Y is essential: we must save R4 in Y before bringing R5 to the bus, because the ALU needs both simultaneously.
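
The six steps can be replayed with a small interpreter that turns each list of control signals into register updates. This is a sketch under simplifying assumptions: memory answers immediately (so WMFC has nothing to wait for), the control sequence is hard-coded instead of being selected by decoding IR, and the names `run_step` and `ADD_R4_R5_R6` are hypothetical.

```python
def run_step(regs, memory, signals):
    """Apply one single-bus control step to the register dict."""
    active = set(signals)

    # Only one register may drive the bus; its name is the signal minus "out".
    drivers = [s for s in signals if s.endswith("out")]
    if len(drivers) > 1:
        raise ValueError(f"bus conflict: {drivers}")
    bus = regs[drivers[0][:-3]] if drivers else None

    # ALU: Input A comes from the MUX, Input B is always the bus.
    alu_out = None
    if "Add" in active:
        input_a = 4 if "Select4" in active else regs["Y"]
        alu_out = input_a + bus

    # Collect every load first so all destinations see the same bus value.
    updates = {}
    for s in signals:
        if s == "Zin":
            updates["Z"] = alu_out
        elif s.endswith("in"):
            updates[s[:-2]] = bus
    if "Read" in active:
        # Simplification: memory delivers the word to MDR at once.
        updates["MDR"] = memory[updates.get("MAR", regs["MAR"])]
    regs.update(updates)


ADD_R4_R5_R6 = [
    ["PCout", "MARin", "Read", "Select4", "Add", "Zin"],
    ["Zout", "PCin", "Yin", "WMFC"],
    ["MDRout", "IRin"],
    ["R4out", "Yin"],
    ["R5out", "SelectY", "Add", "Zin"],
    ["Zout", "R6in", "End"],
]

memory = {100: "ADD R4, R5, R6"}
regs = {"PC": 100, "IR": None, "MAR": 0, "MDR": 0, "Y": 0, "Z": 0,
        "R4": 30, "R5": 12, "R6": 0}

for number, signals in enumerate(ADD_R4_R5_R6, start=1):
    run_step(regs, memory, signals)
    print(number, {k: regs[k] for k in ("PC", "MAR", "Y", "Z", "R6")})
```

Watching Y in the printed trace shows both of its roles: it holds PC+4 after step 2 and is overwritten with R4 in step 4.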

2. Branch OFFSET (Unconditional) (Single-Bus)

Instruction: Branch to address PC + offset

RTL: PC ← [PC] + offset

Step	Control Signals	Explanation	RTL
1	`PCout, MARin, Read, Select4, Add, Zin`	Fetch begins: Same as all instructions—PC to MAR for fetch, calculate PC+4 in ALU and store in Z.	MAR ← [PC], Z ← [PC]+4
2	`Zout, PCin, Yin, WMFC`	Update and save PC: PC is updated to PC+4 (next sequential instruction). We save PC+4 in Y because we'll need it to calculate the branch target. Wait for memory.	PC ← [Z], Y ← [Z]
3	`MDRout, IRin`	Load branch instruction: Instruction delivered from memory to IR. Now we know it's a branch.	IR ← [MDR]
4	`Offset-field-of-IRout, SelectY, Add, Zin`	Calculate branch target: The offset field from the instruction goes onto the bus (to Input B). Y containing PC+4 is selected (to Input A). ALU calculates: (PC+4) + offset. This is why we saved PC+4 in step 2! Result goes to Z.	Z ← [Y] + [offset]
5	`Zout, PCin, End`	Jump to target: The calculated target address goes from Z onto bus and into PC. Next instruction will be fetched from this new address.	PC ← [Z]

Key Points:

The branch target is calculated as (PC+4) + offset, which is why saving PC+4 in Y during step 2 is crucial.
This is a PC-relative branch—the offset is relative to the already-incremented PC.
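
A quick numeric check of the target calculation, as a sketch: the function name `branch_target` and the example address and offset are made up for illustration.

```python
def branch_target(branch_address, offset):
    """Target of a PC-relative branch located at branch_address."""
    updated_pc = branch_address + 4   # PC (and Y) after step 2 of the fetch
    return updated_pc + offset        # step 4: Z ← [Y] + [offset]


print(branch_target(100, 24))   # 128

# A common mistake is to add the offset to the branch's own address:
print(100 + 24)                 # 124, off by 4
```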

3. Load R1, 20(R2) (Single-Bus)

Instruction: Load from memory address [R2]+20 into R1

RTL: R1 ← [[R2] + 20]

Step	Control Signals	Explanation	RTL
1	`PCout, MARin, Read, Select4, Add, Zin`	Fetch begins: Standard fetch—PC to MAR, calculate PC+4.	MAR ← [PC], Z ← [PC]+4
2	`Zout, PCin, Yin, WMFC`	Update PC: Update PC to point to next instruction, save in Y (in case we need it). Wait for memory to deliver instruction.	PC ← [Z], Y ← [Z]
3	`MDRout, IRin`	Load instruction: Instruction from memory moves to IR. Now we know it's a Load with offset addressing.	IR ← [MDR]
4	`R2out, Yin`	Save base address: R2 (base address) goes onto bus and is saved in Y. We need to save R2 because we'll add the offset to it in the next step.	Y ← [R2]
5	`Offset-field-of-IRout, SelectY, Add, Zin`	Calculate effective address: The offset (20) from the instruction goes onto bus (to Input B). Y containing R2 is selected (to Input A). ALU calculates: R2 + 20. This is the memory address we want to read from. Result goes to Z.	Z ← [Y] + 20
6	`Zout, MARin, Read`	Initiate memory read: The effective address from Z goes to MAR. Start reading from memory at address [R2]+20.	MAR ← [Z]
7	`WMFC`	Wait for memory: Memory read takes time—wait for the data to arrive in MDR.	-
8	`MDRout, R1in, End`	Load data into R1: Data from memory is now in MDR. Move it to R1. Instruction complete.	R1 ← [MDR]

Key Points:

Steps 4-5 calculate the effective address using Y register to hold the base address.
Steps 6-7 perform the actual memory read. This is the second memory access (the first was the instruction fetch).
Total: 8 steps for a load instruction.
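
The two memory accesses are easy to see if we log every address sent to memory while walking through the steps. The sketch below follows the RTL column of the table; the memory contents, register values, and the `read` helper are arbitrary choices for illustration, and memory is assumed to respond immediately.

```python
memory = {100: "Load R1, 20(R2)", 1020: 77}   # arbitrary example contents
regs = {"PC": 100, "R1": 0, "R2": 1000}
reads = []                                     # every address sent to memory


def read(address):
    reads.append(address)
    return memory[address]


# Steps 1-3: instruction fetch (first memory access)
mar = regs["PC"]                 # MAR ← [PC]
regs["PC"] = regs["PC"] + 4      # PC ← [PC] + 4 (via Z)
ir = read(mar)                   # IR ← [MDR]

# Steps 4-5: effective address
y = regs["R2"]                   # Y ← [R2]
z = y + 20                       # Z ← [Y] + 20

# Steps 6-7: data read (second memory access)
mar = z                          # MAR ← [Z]
mdr = read(mar)

# Step 8: deliver the data
regs["R1"] = mdr                 # R1 ← [MDR]

print(regs["R1"], reads)   # 77 [100, 1020]
```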

4. Store R1, 20(R2) (Single-Bus)

Instruction: Store R1 to memory address [R2]+20

RTL: [[R2] + 20] ← [R1]

Step	Control Signals	Explanation	RTL
1	`PCout, MARin, Read, Select4, Add, Zin`	Fetch begins: Standard fetch sequence—PC to MAR for instruction fetch, calculate PC+4.	MAR ← [PC], Z ← [PC]+4
2	`Zout, PCin, Yin, WMFC`	Update PC: Update PC to next instruction, save in Y. Wait for memory to deliver instruction.	PC ← [Z], Y ← [Z]
3	`MDRout, IRin`	Load instruction: Instruction delivered to IR. Now we know it's a Store with offset.	IR ← [MDR]
4	`R2out, Yin`	Save base address: R2 (base address) goes to Y. We need it to calculate the effective address.	Y ← [R2]
5	`Offset-field-of-IRout, SelectY, Add, Zin`	Calculate effective address: Offset (20) on bus (to Input B), Y containing R2 to Input A. ALU calculates: R2 + 20. This is where we'll store the data. Result to Z.	Z ← [Y] + 20
6	`Zout, MARin`	Send address to MAR: Effective address from Z goes to MAR. This is the memory location where we'll write.	MAR ← [Z]
7	`R1out, MDRin, Write`	Initiate memory write: R1 (the data to store) goes onto bus and into MDR. Start the memory write operation.	MDR ← [R1]
8	`WMFC, End`	Wait for write to complete: Memory write takes time. Wait until it's done. Instruction complete.	-

Key Points:

Very similar to Load, but we're writing instead of reading.
Step 7 moves the data from R1 into MDR and starts the write.
The effective address calculation (steps 4-6) is identical to Load.

Three-Bus Datapath Organization

The single-bus design is simple but slow — we can only do one transfer per clock cycle. A three-bus organization allows multiple simultaneous transfers, dramatically improving performance.

Three-Bus Architecture Diagram

┌──────────────────────────────────────────────────────────────────────────────┐
│                         THREE-BUS DATAPATH                                   │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│                                                                              │
│     ══════════════════════ BUS A (Source A) ═══════════════════════          │
│            │                                                │                │
│            │                                                │                │
│       ┌────▼────────────────────────────────────────────────▼─────┐          │
│       │                                                           │          │
│       │              REGISTER FILE (R0 - Rn-1)                    │          │
│       │                                                           │          │
│       │          Read Port A              Read Port B             │          │
│       │              │                         │                  │          │
│       └──────────────┼─────────────────────────┼──────────────────┘          │
│                      │                         │                             │
│                      │                         │                             │
│     ══════════════════▼═══════════ BUS A ═══════▼══════════════              │
│                      │                         │                             │
│                      │                         │                             │
│                      │     ══════════════════════▼══════════ BUS B           │
│                      │     │                    │                            │
│                      │     │                    │                            │
│                      ▼     ▼                    │                            │
│                 ┌─────────────────┐             │                            │
│                 │                 │             │                            │
│                 │      A L U      │             │                            │
│                 │                 │             │                            │
│                 │   Input A  ◄────┼─────────────┘                            │
│                 │            (from Bus A)                                    │
│                 │                 │                                          │
│                 │   Input B  ◄────┼──────────────┐                           │
│                 │            (from Bus B)        │                           │
│                 │                 │              │                           │
│                 └────────┬────────┘              │                           │
│                          │                       │                           │
│                          │  ALU Result           │                           │
│                          │                       │                           │
│     ═════════════════════▼═══════════════════════▼═══════ BUS C (Result) ══  │
│                          │                       │                           │
│                          │                       │                           │
│       ┌──────────────────┼───────────────────────┼──────────────────┐        │
│       │                  │                       │                  │        │
│       │        Write Port (from Bus C)           │                  │        │
│       │                  │                       │                  │        │
│       │              REGISTER FILE               │                  │        │
│       │                                          │                  │        │
│       └──────────────────────────────────────────┼──────────────────┘        │
│                                                  │                           │
│                                                  │                           │
│  ┌─────────────────────────────────────────────────────────────────────┐     │
│  │                     SPECIAL REGISTERS & MEMORY                      │     │
│  │                                                                     │     │
│  │    ┌────┐         ┌───────────┐                                     │     │
│  │    │ PC │◄────────│Incrementer│◄────── Bus C or Bus B               │     │
│  │    └──┬─┘         │   (+4)    │                                     │     │
│  │       │           └───────────┘                                     │     │
│  │       │ Address                                                     │     │
│  │       │                                                             │     │
│  │    ┌──▼─┐    ┌────┐                                                 │     │
│  │    │MAR │    │ IR │◄───────── Bus B (R=B signal)                    │     │
│  │    └──┬─┘    └────┘                                                 │     │
│  │       │                                                             │     │
│  │       │ Memory Address                                              │     │
│  │       │                                                             │     │
│  │       ▼                                                             │     │
│  │  ┌─────────────┐                                                    │     │
│  │  │   MEMORY    │                                                    │     │
│  │  │             │                                                    │     │
│  │  └──────┬──────┘                                                    │     │
│  │         │                                                           │     │
│  │         │ Memory Data                                               │     │
│  │         │                                                           │     │
│  │    ┌────▼────┐                                                      │     │
│  │    │   MDR   │───────────► Bus B (MDRoutB signal)                   │     │
│  │    └─────────┘                                                      │     │
│  │                                                                     │     │
│  └─────────────────────────────────────────────────────────────────────┘     │
│                                                                              │
│                                                                              │
│  DATA FLOW SUMMARY:                                                          │
│  • Bus A: Carries data from register file (Read Port A) to ALU Input A       │
│  • Bus B: Carries data from register file (Read Port B) to ALU Input B       │
│           Also used for special transfers (MDR→IR, PC→MAR, etc.)             │
│  • Bus C: Carries ALU results and other data to register file (Write Port)   │
│           Also feeds PC incrementer and can load special registers           │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘

Three-Bus Architecture: Detailed Explanation

The three-bus organization dramatically improves performance by allowing simultaneous data transfers on three separate buses:

Bus A (Source Bus A)

Purpose: Carries the first source operand from the register file to ALU Input A
Connected to: Register file Read Port A → ALU Input A
Example: In R6 ← R4 + R5, Bus A carries R4 to ALU Input A

Bus B (Source Bus B)

Purpose: Carries the second source operand from the register file to ALU Input B
Also used for: Special register transfers (MDR→IR, values to MAR, etc.)
Connected to:
- Register file Read Port B → ALU Input B
- MDR → Bus B (when MDRoutB is active)
- Bus B → IR, MAR (when R=B signal is active)
Example: In R6 ← R4 + R5, Bus B carries R5 to ALU Input B simultaneously with Bus A carrying R4

Bus C (Result/Destination Bus)

Purpose: Carries results back to the register file
Connected to:
- ALU Output → Bus C → Register file Write Port
- Bus C → PC (for branches and updates)
- Bus C → PC Incrementer (for calculating PC+4)
Example: In R6 ← R4 + R5, Bus C carries the sum directly to R6

Key Components

Register File with Multiple Ports:

Read Port A: Simultaneously reads one register onto Bus A
Read Port B: Simultaneously reads another register onto Bus B
Write Port: Writes data from Bus C into a destination register
All three operations can happen in the same clock cycle!

PC Incrementer:

Dedicated hardware that adds 4 to PC
Can operate in parallel with ALU operations
Connected to Bus C or receives PC value directly

Special Register Transfers:

R=B signal: "Result equals B". It selects the value on Bus B, rather than a computed ALU result, as the data loaded into the destination register
MDRoutB: Places MDR contents on Bus B
Used for transfers like: MDR→IR, PC→MAR, etc.

Why Three Buses Are Faster

Single-Bus Limitation:

Step 1: R4 → Bus → Y        (save first operand)
Step 2: R5 → Bus → ALU      (bring second operand, Y feeds to ALU)
Step 3: ALU result → Bus → R6
Total: 3 steps minimum for any two-operand ALU operation

Three-Bus Advantage:

Step 1: R4 → Bus A → ALU Input A  }
        R5 → Bus B → ALU Input B  } ALL SIMULTANEOUS!
        ALU result → Bus C → R6   }
Total: 1 step for same operation!
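
The same contrast, written as a sketch that counts control steps for the execute phase only (fetch is left out). Both function names are hypothetical, and each Python statement group stands in for one control step.

```python
def add_single_bus(regs, src1, src2, dst):
    """Execute phase on one bus: one transfer per step, so three steps."""
    steps = 0
    regs["Y"] = regs[src1]                 # src1 → Bus → Y
    steps += 1
    regs["Z"] = regs["Y"] + regs[src2]     # src2 → Bus → ALU, Y → ALU, result → Z
    steps += 1
    regs[dst] = regs["Z"]                  # Z → Bus → dst
    steps += 1
    return steps


def add_three_bus(regs, src1, src2, dst):
    """Execute phase on three buses: everything happens in one step."""
    bus_a = regs[src1]                     # src1 → Bus A → ALU Input A
    bus_b = regs[src2]                     # src2 → Bus B → ALU Input B
    bus_c = bus_a + bus_b                  # ALU result → Bus C
    regs[dst] = bus_c                      # Bus C → dst
    return 1


single = {"R4": 30, "R5": 12, "R6": 0}
three = {"R4": 30, "R5": 12, "R6": 0}
print(add_single_bus(single, "R4", "R5", "R6"), single["R6"])   # 3 42
print(add_three_bus(three, "R4", "R5", "R6"), three["R6"])      # 1 42
```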

Control Signals in Three-Bus

Signal	Meaning
`R4outA`	Place R4 on Bus A
`R5outB`	Place R5 on Bus B
`R6in`	Load R6 from Bus C
`R=B`	Load destination register from Bus B (not Bus C)
`MDRoutB`	Place MDR contents on Bus B
`Add`, `Sub`, etc.	ALU operation (result automatically goes to Bus C)
`IncPC`	Increment PC using dedicated incrementer
`Read`, `Write`	Memory operations
`WMFC`	Wait for Memory Function Complete
`End`	Signals end of instruction execution

Advantages of Three-Bus Design

Feature	Single-Bus	Three-Bus
Transfers per cycle	1	Up to 3
ALU operands	Sequential (need Y register)	Simultaneous (both at once)
Need for Y register	Yes (essential)	No (not needed for ALU ops)
Hardware complexity	Low	Higher
Speed	Slower	Much Faster
Register file ports	1 read, 1 write	2 read, 1 write
PC increment	Through ALU	Dedicated incrementer

Three-Bus: Complete Instruction Examples

Now let's see how the same instructions execute much faster on three-bus architecture.

1. ADD R4, R5, R6 (Three-Bus)

Instruction: Add the contents of R4 and R5, store result in R6

RTL: R6 ← [R4] + [R5]

Step	Control Signals	Explanation	RTL
1	`PCout, MARin, Read, IncPC`	Fetch and increment: PC value goes to MAR to fetch instruction. The dedicated PC incrementer calculates PC+4 simultaneously and updates PC. No ALU needed!	MAR ← [PC], PC ← [PC]+4
2	`WMFC`	Wait for memory: Wait for instruction to be delivered from memory to MDR.	-
3	`MDRoutB, R=B, IRin`	Load instruction: MDR places instruction on Bus B, and IR loads it. The R=B signal indicates we're loading from Bus B (not Bus C).	IR ← [MDR]
4	`R4outA, R5outB, Add, R6in, End`	Execute in ONE step! R4 goes on Bus A to ALU Input A. R5 goes on Bus B to ALU Input B. ALU adds them. Result goes on Bus C directly into R6. All simultaneous! Instruction complete.	R6 ← [R4] + [R5]

Key Points:

4 steps total vs 6 steps in single-bus (33% fewer steps, a 1.5× speedup if every step takes the same time)
Step 4 is the magic: Both operands travel simultaneously to the ALU, and the result goes directly to destination
No Y register needed—we don't have to save and retrieve operands
PC incrementer works in parallel, so incrementing PC doesn't consume an ALU cycle

Comparison:

Single-bus ADD: 6 steps (need to save R4 in Y, then fetch R5, then add, then store)
Three-bus ADD: 4 steps (fetch, wait for memory, load IR, then execute in a single step)

2. Branch OFFSET (Unconditional) (Three-Bus)

Instruction: Branch to address PC + offset

RTL: PC ← [PC] + offset

Step	Control Signals	Explanation	RTL
1	`PCout, MARin, Read, IncPC`	Fetch and increment: PC to MAR for instruction fetch. PC incrementer calculates PC+4 and updates PC immediately.	MAR ← [PC], PC ← [PC]+4
2	`WMFC`	Wait for memory: Wait for branch instruction to be delivered to MDR.	-
3	`MDRoutB, R=B, IRin`	Load instruction: Instruction from MDR goes on Bus B to IR. Now we know it's a branch.	IR ← [MDR]
4	`PCoutA, Offset-field-of-IRout-B, Add, PCin, End`	Calculate and load target: Current PC (already incremented to PC+4) goes on Bus A. Offset from instruction goes on Bus B. ALU adds: (PC+4) + offset. Result goes on Bus C directly into PC. Done in one step!	PC ← [PC] + [offset]

Key Points:

4 steps total vs 5 steps in single-bus
Step 4 calculates branch target in one cycle—PC and offset go to ALU simultaneously
No need to save PC+4 in Y register like we did in single-bus
The incremented PC is still available when we need to add the offset

3. Load R1, 20(R2) (Three-Bus)

Instruction: Load from memory address [R2]+20 into R1

RTL: R1 ← [[R2] + 20]

Step	Control Signals	Explanation	RTL
1	`PCout, MARin, Read, IncPC`	Fetch and increment: PC to MAR for instruction fetch. PC increments to PC+4 using dedicated incrementer.	MAR ← [PC], PC ← [PC]+4
2	`WMFC`	Wait for instruction: Wait for memory to deliver instruction to MDR.	-
3	`MDRoutB, R=B, IRin`	Load instruction: Instruction from MDR on Bus B goes to IR. Now we know it's a Load with offset.	IR ← [MDR]
4	`R2outA, Offset-field-of-IRout-B, Add, MARin`	Calculate effective address: R2 (base address) goes on Bus A. Offset (20) goes on Bus B. ALU calculates: R2 + 20. Result goes on Bus C to MAR. All in one step!	MAR ← [R2] + 20
5	`Read, WMFC`	Read from memory: Start memory read from the effective address. Wait for data to arrive in MDR.	-
6	`MDRoutB, R=B, R1in, End`	Load data: Data from MDR goes on Bus B to R1. Instruction complete.	R1 ← [MDR]

Key Points:

6 steps total vs 8 steps in single-bus (25% fewer steps)
Step 4 calculates effective address in one cycle—base and offset added simultaneously
No need to save R2 in Y register first
The address calculation and transfer to MAR happen in the same cycle

4. Store R1, 20(R2) (Three-Bus)

Instruction: Store R1 to memory address [R2]+20

RTL: [[R2] + 20] ← [R1]

Step	Control Signals	Explanation	RTL
1	`PCout, MARin, Read, IncPC`	Fetch and increment: PC to MAR for instruction fetch. PC increments to PC+4.	MAR ← [PC], PC ← [PC]+4
2	`WMFC`	Wait for instruction: Wait for memory to deliver instruction to MDR.	-
3	`MDRoutB, R=B, IRin`	Load instruction: Instruction from MDR goes to IR via Bus B. Now we know it's a Store.	IR ← [MDR]
4	`R2outA, Offset-field-of-IRout-B, Add, MARin`	Calculate effective address: R2 on Bus A, offset (20) on Bus B. ALU calculates: R2 + 20. Result to MAR via Bus C. All simultaneous!	MAR ← [R2] + 20
5	`R1outB, R=B, MDRin, Write`	Write to memory: R1 goes on Bus B (using R=B signal) to MDR. Start memory write operation.	MDR ← [R1]
6	`WMFC, End`	Wait for write: Wait for memory write to complete. Instruction done.	-

Key Points:

6 steps total vs 8 steps in single-bus (25% fewer steps)
Step 4 calculates effective address in one cycle, just like Load
Step 5 moves R1 to MDR and starts write—note we use R=B because we're transferring on Bus B, not Bus C
Address calculation is identical to Load; only the data transfer direction differs

Execution Time Comparison: Single-Bus vs Three-Bus

Instruction	Single-Bus Steps	Three-Bus Steps	Step Reduction
ADD R4, R5, R6	6	4	33% fewer (1.5× speedup)
Branch OFFSET	5	4	20% fewer (1.25× speedup)
Load R1, 20(R2)	8	6	25% fewer (1.33× speedup)
Store R1, 20(R2)	8	6	25% fewer (1.33× speedup)

Why the difference?

Three-bus eliminates the need to save/retrieve operands in Y register
Multiple simultaneous transfers reduce sequential dependencies
Dedicated PC incrementer eliminates ALU usage for PC+4
More parallelism = fewer clock cycles per instruction
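
The percentages follow directly from the step counts. The sketch below recomputes them and also reports the equivalent speedup factor. It assumes every control step takes the same amount of time on both designs, which the step counts alone do not guarantee (a WMFC step, for example, may last longer than one clock cycle). The function name `compare` is hypothetical.

```python
def compare(single_steps, three_steps):
    """Return (percent fewer steps, speedup factor) for one instruction."""
    reduction = 100 * (single_steps - three_steps) / single_steps
    speedup = single_steps / three_steps
    return round(reduction), round(speedup, 2)


step_counts = {
    "ADD R4, R5, R6": (6, 4),
    "Branch OFFSET": (5, 4),
    "Load R1, 20(R2)": (8, 6),
    "Store R1, 20(R2)": (8, 6),
}

for name, (single, three) in step_counts.items():
    reduction, speedup = compare(single, three)
    print(f"{name}: {reduction}% fewer steps, {speedup}x speedup")
```

Note that "33% fewer steps" and "1.5× faster" describe the same change: removing a third of the steps leaves two thirds of the time, and 1 / (2/3) = 1.5.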

Hardwired Control Unit

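The hardwired control unit generates control signals using dedicated logic circuits. It is fast because the signals come straight out of combinational logic, but inflexible because changing the instruction set or a control sequence means redesigning that logic.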
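
One way to picture such logic: each control signal is a Boolean function of the current control step and the decoded instruction. The sketch below writes out two of these functions, `Zin` and `End`, read off the four single-bus control sequences earlier in this document. The function name `control_signals` and the instruction labels (`"ADD"`, `"BR"`, `"LOAD"`, `"STORE"`) are hypothetical, and real hardware would implement the same expressions with gates, not Python.

```python
def control_signals(step, instr):
    """Evaluate Zin and End for a 1-based control step and an instruction label."""
    def T(n):
        return step == n          # output Tn of the step decoder

    is_mem = instr in ("LOAD", "STORE")
    return {
        # Zin = T1 + T5·ADD + T4·BR + T5·(LOAD + STORE)
        "Zin": T(1)
               or (T(5) and instr == "ADD")
               or (T(4) and instr == "BR")
               or (T(5) and is_mem),
        # End = T6·ADD + T5·BR + T8·(LOAD + STORE)
        "End": (T(6) and instr == "ADD")
               or (T(5) and instr == "BR")
               or (T(8) and is_mem),
    }


print(control_signals(1, "ADD"))    # {'Zin': True, 'End': False}
print(control_signals(5, "BR"))     # {'Zin': False, 'End': True}
print(control_signals(8, "LOAD"))   # {'Zin': False, 'End': True}
```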

Block Diagram

┌────────────────────────────────────────────────────────────────────────────┐
│                        HARDWIRED CONTROL UNIT                              │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│    ┌─────────┐                                                             │
│    │  CLK    │──────────────────────────────────┐                          │
│    └─────────┘                                  │                          │
│                                                 ▼                          │
│    ┌─────────┐        ┌─────────────────────────────────────┐              │
│    │  Reset  │───────►│         CONTROL STEP                │              │
│    └─────────┘        │           COUNTER                   │              │
│                       │                                     │              │
│                       │  Counts: T1, T2, T3, ... Tn         │              │
│                       └──────────────┬──────────────────────┘              │
│                                      │                                     │
│                                      ▼                                     │
│                       ┌─────────────────────────────────────┐              │
│                       │         STEP DECODER                │              │
│                       │                                     │              │
│                       │    Outputs: T1  T2  T3 ... Tn       │              │
│                       └──────────────┬──────────────────────┘              │
│                                      │                                     │
│         ┌────────────────────────────┼────────────────────────────┐        │
│         │                            │                            │        │
│         │                            ▼                            │        │
│         │            ┌───────────────────────────────┐            │        │
│         │            │                               │            │        │
│    ┌────┴────┐       │          ENCODER              │       ┌────┴────┐   │
│    │   IR    │──────►│                               │◄──────│Condition│   │
│    │(Opcode) │       │    Combinational Logic        │       │ Codes   │   │
│    └─────────┘       │    (AND, OR, NOT gates)       │       │ (N,Z,C) │   │
│                      │                               │       └─────────┘   │
│    ┌─────────┐       │                               │       ┌─────────┐   │
│    │Instruction      │                               │◄──────│External │   │
│    │ Decoder │──────►│                               │       │ Inputs  │   │
│    └─────────┘       └───────────────┬───────────────┘       └─────────┘   │
│                                      │                                     │
│                                      ▼                                     │
│                      ════════════════════════════════════                  │
│                              CONTROL SIGNALS                               │
│                      ════════════════════════════════════                  │
│                           │    │    │    │    │    │                       │
│                           ▼    ▼    ▼    ▼    ▼    ▼                       │
│                         PCin PCout MARin Read Zin  ...                     │
│                                                                            │
└────────────────────────────────────────────────────────────────────────────┘

How It Works

Control Step Counter: Counts clock cycles (T1, T2, T3...) during instruction execution
Step Decoder: Converts counter value to one-hot encoding (only one Ti active)
Instruction Decoder: Decodes opcode to identify the instruction type
Encoder: Combinational logic that generates appropriate control signals

Generating the Zin Signal

The Zin signal (load ALU result into Z) is needed at different times for different instructions:

Step T1 for all instructions (during fetch, for PC+4)
Step T6 for the ADD instruction (for latching the sum; the step numbers here match the seven-step memory-operand form `Add (R3), R1`)
Step T4 for Branch instruction (for branch target)

                    ┌─────────────────────────────┐
                    │                             │
        T1 ────────►│                             │
                    │                             │
        T6 ─────┬──►│         OR Gate             │────► Zin
                │   │                             │
        ADD ────┴──►│           ┌──┐              │
                    │     ──────│& │──────        │
        T4 ─────┬──►│           └──┘              │
                │   │                             │
        BR ─────┴──►│                             │
                    │                             │
                    └─────────────────────────────┘

Logic Expression: Zin = T1 + (T6 · ADD) + (T4 · BR) + ...
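The same expression can be checked in software. The sketch below covers only the three terms written out above (the trailing "..." is left out), and it represents the one-hot Ti lines by a single step number `t`, which is a modeling shortcut rather than how the hardware is wired.

```python
def zin(t, add=False, br=False):
    """Zin = T1 + (T6 · ADD) + (T4 · BR), limited to the terms in the text.

    t is the active step number (t == 1 means T1 is high); add and br are
    the instruction decoder outputs for the current instruction.
    """
    return bool(t == 1 or (t == 6 and add) or (t == 4 and br))


for t in range(1, 8):
    print(f"T{t}: ADD -> {int(zin(t, add=True))}, BR -> {int(zin(t, br=True))}")
```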

Generating the End Signal

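The End signal terminates instruction execution and resets the control step counter to T1. Different instructions end at different steps (ADD here denotes the seven-step memory-operand form `Add (R3), R1`):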

Logic Expression: End = (T7 · ADD) + (T5 · BR) + (T5 · N + T4 · N̄) · BRN + ...

For Branch<0 (BRN):

End at T5 if N=1 (branch taken)
End at T4 if N=0 (branch not taken)
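A small truth-function makes the conditional term easier to see. As with any software model of gate logic, this is a sketch: only the terms written in the expression are included, and the step number `t` stands in for the one-hot Ti lines.

```python
def end_signal(t, n=False, add=False, br=False, brn=False):
    """End = (T7 · ADD) + (T5 · BR) + (T5 · N + T4 · N̄) · BRN."""
    t4, t5, t7 = (t == 4), (t == 5), (t == 7)
    return bool(
        (t7 and add)
        or (t5 and br)
        or (((t5 and n) or (t4 and not n)) and brn)
    )


# Branch<0 with N=1 (taken) runs through T5; with N=0 it stops at T4.
print(end_signal(4, n=True, brn=True), end_signal(5, n=True, brn=True))
print(end_signal(4, n=False, brn=True))
```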

Advantages and Disadvantages

✅ Advantages:

Very fast operation (minimal delay through gates)
No memory access needed for control signals
Efficient for simple instruction sets (RISC)
Natural fit for pipelining

❌ Disadvantages:

Complex design for large instruction sets
Difficult to modify once manufactured
Higher design and debugging time
Inflexible — adding instructions requires complete redesign
Circuit complexity grows rapidly with instruction set size

Microprogrammed Control Unit

The microprogrammed control unit uses a small program (microprogram) stored in a special memory to generate control signals. Each instruction is implemented by a sequence of microinstructions.

Key Terminology

Term	Definition
Control Word	A word whose bits represent individual control signals
Microinstruction	A single control word in the microprogram
Microroutine	Sequence of microinstructions for one machine instruction
Microprogram	Collection of all microroutines stored in control memory
Control Store (CS)	ROM/RAM that holds the microprogram
μPC	Micro Program Counter — points to current microinstruction

Block Diagram

┌─────────────────────────────────────────────────────────────────────────────┐
│                      MICROPROGRAMMED CONTROL UNIT                           │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│    ┌─────────┐                                                              │
│    │   IR    │                                                              │
│    │(Opcode) │                                                              │
│    └────┬────┘                                                              │
│         │                                                                   │
│         ▼                                                                   │
│    ┌─────────────────────────────────┐                                      │
│    │    STARTING ADDRESS             │                                      │
│    │       GENERATOR                 │                                      │
│    │                                 │                                      │
│    │  Maps opcode to starting        │                                      │
│    │  address of microroutine        │                                      │
│    └────────────┬────────────────────┘                                      │
│                 │                                                           │
│                 │  Starting Address                                         │
│                 ▼                                                           │
│    ┌─────────────────────────────────┐         ┌───────────────────┐        │
│    │                                 │         │                   │        │
│    │           μPC                   │◄────────│    +1             │        │
│    │   (Micro Program Counter)       │         │  (Incrementer)    │        │
│    │                                 │         │                   │        │
│    └────────────┬────────────────────┘         └───────────────────┘        │
│                 │                                       ▲                   │
│                 │  Address                              │                   │
│                 ▼                                       │                   │
│    ┌─────────────────────────────────────────────────────────────────┐      │
│    │                                                                 │      │
│    │                    CONTROL STORE (CS)                           │      │
│    │                                                                 │      │
│    │   ┌─────────────────────────────────────────────────────────┐   │      │
│    │   │ Address │        Microinstruction                       │   │      │
│    │   ├─────────┼───────────────────────────────────────────────┤   │      │
│    │   │   000   │  PCout, MARin, Read, Select4, Add, Zin        │   │      │
│    │   │   001   │  Zout, PCin, Yin, WMFC                        │   │      │
│    │   │   002   │  MDRout, IRin                                 │   │      │
│    │   │   003   │  (Branch to microroutine based on opcode)     │   │      │
│    │   │   ...   │  ...                                          │   │      │
│    │   │   025   │  ADD microroutine starts here                 │   │      │
│    │   │   ...   │  ...                                          │   │      │
│    │   └─────────┴───────────────────────────────────────────────┘   │      │
│    │                                                                 │      │
│    └────────────────────────────┬────────────────────────────────────┘      │
│                                 │                                           │
│                                 │  Microinstruction                         │
│                                 ▼                                           │
│    ┌─────────────────────────────────────────────────────────────────┐      │
│    │                                                                 │      │
│    │                         DECODER                                 │      │
│    │            (if using encoded microinstructions)                 │      │
│    │                                                                 │      │
│    └────────────────────────────┬────────────────────────────────────┘      │
│                                 │                                           │
│                                 ▼                                           │
│                 ════════════════════════════════════                        │
│                         CONTROL SIGNALS                                     │
│                 ════════════════════════════════════                        │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Microinstruction Format

┌────────────────────────────────────────────────────────────────────────────┐
│                       MICROINSTRUCTION FORMAT                              │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│  HORIZONTAL FORMAT (Wide):                                                 │
│  ┌──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┐             │
│  │PC│PC│MA│MD│MD│IR│Y │Z │Z │R1│R1│Se│Se│Ad│Su│Re│Wr│WM│En│  │             │
│  │ou│in│Ri│Ro│Ri│in│in│in│ou│ou│in│l4│lY│d │b │ad│it│FC│d │  │             │
│  │t │  │n │ut│n │  │  │  │t │t │  │  │  │  │  │  │e │  │  │  │             │
│  ├──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┤             │
│  │ 1│ 0│ 1│ 0│ 0│ 0│ 0│ 1│ 0│ 0│ 0│ 1│ 0│ 1│ 0│ 1│ 0│ 0│ 0│  │             │
│  └──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┘             │
│                                                                            │
│  Each bit directly controls one signal - simple but WIDE                   │
│                                                                            │
│  ──────────────────────────────────────────────────────────────────────    │
│                                                                            │
│  VERTICAL FORMAT (Narrow):                                                 │
│  ┌────────────┬────────────┬────────────┬────────────────┐                 │
│  │   F1       │     F2     │     F3     │  Next Address  │                 │
│  │ (4 bits)   │  (3 bits)  │  (3 bits)  │   (8 bits)     │                 │
│  │            │            │            │                │                 │
│  │ Encoded    │  Encoded   │  ALU       │  Branch        │                 │
│  │ Source     │  Dest      │  Function  │  Target        │                 │
│  └────────────┴────────────┴────────────┴────────────────┘                 │
│                                                                            │
│  Requires decoder but uses less memory                                     │
│                                                                            │
└────────────────────────────────────────────────────────────────────────────┘
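As a rough illustration of the horizontal format, the sketch below packs a set of active signals into a bit string using the same column order as the diagram. The 19-signal list is taken from the diagram's column labels (with Sel4 and SelY expanded to Select4 and SelectY); the function names are hypothetical.

```python
# Column order copied from the horizontal-format diagram.
SIGNALS = [
    "PCout", "PCin", "MARin", "MDRout", "MDRin", "IRin", "Yin", "Zin",
    "Zout", "R1out", "R1in", "Select4", "SelectY", "Add", "Sub",
    "Read", "Write", "WMFC", "End",
]


def encode_horizontal(active):
    """Return one bit per signal, '1' where the signal is asserted."""
    unknown = set(active) - set(SIGNALS)
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    return "".join("1" if name in active else "0" for name in SIGNALS)


def decode_horizontal(word):
    """Recover the list of asserted signals from a control word."""
    return [name for name, bit in zip(SIGNALS, word) if bit == "1"]


fetch_step_1 = {"PCout", "MARin", "Read", "Select4", "Add", "Zin"}
word = encode_horizontal(fetch_step_1)
print(word)                     # 1010000100010101000
print(decode_horizontal(word))
```

The printed word matches the bit row in the diagram, which is the first fetch microinstruction.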

Horizontal vs Vertical Microprogramming

Aspect	Horizontal	Vertical
Width	Wide (many bits)	Narrow (fewer bits)
Encoding	None (direct)	Encoded fields
Decoder	Not needed	Required
Speed	Faster	Slower
Control Store Size	Larger	Smaller
Parallelism	High	Limited
Flexibility	Less	More
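The width difference can be estimated with a little arithmetic. In the sketch below, each vertical field encodes one group of mutually exclusive signals plus a "no signal" code; the group sizes are hypothetical and chosen to match the 4-bit, 3-bit, and 3-bit fields in the format diagram. The next-address field is left out of both totals.

```python
def field_bits(n_signals):
    """Bits needed to encode one of n exclusive signals plus a 'none' code."""
    # Codes 0..n must be representable, which takes n.bit_length() bits.
    return n_signals.bit_length()


def vertical_width(groups):
    """Total width of the encoded fields for the given group sizes."""
    return sum(field_bits(n) for n in groups)


groups = [15, 7, 7]             # hypothetical: sources, destinations, ALU functions
horizontal = sum(groups)        # one bit per signal
vertical = vertical_width(groups)
print(horizontal, vertical)     # 29 10
```

The saving only holds when signals within a group are never needed in the same step, which is also why vertical encoding limits parallelism.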

Microprogram Sequencing

The μPC typically increments automatically, but sometimes we need to branch:

When μPC changes:

Normal: Increment by 1
End of instruction: Return to the common fetch routine, which then loads the starting address of the next instruction's microroutine
Microbranch: Load branch target address
Conditional microbranch: Load target only if condition is true
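These cases amount to a small next-address function. The sketch below is one possible way to write it; the parameter names are hypothetical, and End is modeled as a return to the common fetch routine at address 000, from which the opcode dispatch then happens.

```python
FETCH_ADDRESS = 0   # assumed address of the common fetch routine


def next_upc(upc, end=False, branch_target=None, condition=True):
    """Compute the next μPC value from the current one."""
    if end:
        # End of instruction: restart at the fetch routine.
        return FETCH_ADDRESS
    if branch_target is not None and condition:
        # Microbranch (unconditional when condition is left as True).
        return branch_target
    # Normal case: step to the following microinstruction.
    return upc + 1


print(next_upc(0))                                      # 1
print(next_upc(2, branch_target=25))                    # 25
print(next_upc(4, branch_target=10, condition=False))   # 5
print(next_upc(28, end=True))                           # 0
```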

Example Microroutine for `Add (R3), R1`

Address	Microinstruction	Comment
000	PCout, MARin, Read, Select4, Add, Zin	Common fetch - step 1
001	Zout, PCin, Yin, WMFC	Common fetch - step 2
002	MDRout, IRin, BRANCH	Common fetch - step 3, then branch
...	...	...
025	R3out, MARin, Read	ADD microroutine begins
026	R1out, Yin, WMFC	Save R1, wait for memory
027	MDRout, SelectY, Add, Zin	Perform addition
028	Zout, R1in, End	Store result, end
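To see the sequencing in action, the sketch below stores the table's microinstructions in a dictionary and walks through them. It models only the order in which control words are issued, not their effect on the datapath; the addresses are read as decimal numbers, and `STARTING_ADDRESS` plays the role of the starting address generator with a single hypothetical entry.

```python
# Control store contents copied from the table above.
CONTROL_STORE = {
    0: ["PCout", "MARin", "Read", "Select4", "Add", "Zin"],
    1: ["Zout", "PCin", "Yin", "WMFC"],
    2: ["MDRout", "IRin", "BRANCH"],
    25: ["R3out", "MARin", "Read"],
    26: ["R1out", "Yin", "WMFC"],
    27: ["MDRout", "SelectY", "Add", "Zin"],
    28: ["Zout", "R1in", "End"],
}

# Starting address generator: maps an opcode to its microroutine.
STARTING_ADDRESS = {"ADD": 25}


def run_instruction(opcode):
    """Return the (address, signals) pairs issued for one instruction."""
    trace = []
    upc = 0
    while True:
        signals = CONTROL_STORE[upc]
        trace.append((upc, signals))
        if "End" in signals:
            return trace
        if "BRANCH" in signals:
            upc = STARTING_ADDRESS[opcode]
        else:
            upc += 1


for address, signals in run_instruction("ADD"):
    print(f"{address:03d}  {', '.join(signals)}")
```

The trace contains seven microinstructions: three for the common fetch and four for the ADD microroutine.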

Advantages and Disadvantages

✅ Advantages:

Highly flexible and easy to modify
Can fix bugs by changing microcode
Simplifies design of complex instruction sets (CISC)
Can add new instructions without hardware changes
Easier debugging and testing
Supports complex addressing modes naturally

❌ Disadvantages:

Slower than hardwired (memory access overhead)
Requires control store memory
More complex to optimize for speed
Doesn't lend itself well to pipelining
Extra level of indirection adds delay

Hardwired vs Microprogrammed Comparison

Side-by-Side Comparison

Characteristic	Hardwired	Microprogrammed
Implementation	Fixed logic circuits	Stored microprogram
Speed	⚡ Faster	🐢 Slower
Flexibility	❌ Inflexible	✅ Highly flexible
Modification	Requires redesign	Change microcode
Complexity	Increases with instruction set	Manages complexity well
Cost	Lower for simple ISA	Lower for complex ISA
Design Time	Longer	Shorter
Bug Fixes	New chip required	Microcode update
Typical Use	RISC processors	CISC processors
Pipelining	✅ Natural fit	❌ Difficult

Visual Comparison

┌─────────────────────────────────────────────────────────────────────────────┐
│                    CONTROL UNIT COMPARISON                                  │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   HARDWIRED                              MICROPROGRAMMED                    │
│                                                                             │
│   ┌───────────────────┐                  ┌───────────────────┐              │
│   │                   │                  │                   │              │
│   │  ┌─────┐  ┌─────┐ │                  │   ┌───────────┐   │              │
│   │  │ AND │  │ OR  │ │                  │   │  Control  │   │              │
│   │  └──┬──┘  └──┬──┘ │                  │   │   Store   │   │              │
│   │     │        │    │                  │   │   (ROM)   │   │              │
│   │  ┌──┴────────┴──┐ │                  │   └─────┬─────┘   │              │
│   │  │     NOT      │ │                  │         │         │              │
│   │  └──────┬───────┘ │                  │   ┌─────▼─────┐   │              │
│   │         │         │                  │   │   μPC     │   │              │
│   │  Fixed Logic      │                  │   └───────────┘   │              │
│   │  Circuits         │                  │                   │              │
│   │                   │                  │   Stored Program  │              │
│   └─────────┬─────────┘                  └─────────┬─────────┘              │
│             │                                      │                        │
│             ▼                                      ▼                        │
│   ┌─────────────────┐                    ┌─────────────────┐                │
│   │ Control Signals │                    │ Control Signals │                │
│   └─────────────────┘                    └─────────────────┘                │
│                                                                             │
│   Speed: ████████████░░                  Speed: ████████░░░░░░              │
│   Flexibility: ███░░░░░░░               Flexibility: ███████████░           │
│   Complexity handling: ███░░░░          Complexity handling: ██████████░    │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

When to Use Which?

Summary and Key Takeaways

The Big Picture

Key Concepts to Remember

Datapath Components: The processor contains registers (PC, IR, MAR, MDR, general-purpose), ALU, and buses that connect them.
Control Signals: Every data movement and operation is controlled by signals like PCout, MARin, Read, Add, etc.
Single-Bus Architecture:
- Only one data transfer per clock cycle
- ALU Input A: From MUX (can select Y, constant 4, etc.)
- ALU Input B: ALWAYS from the bus
- Y register essential for saving first operand
- Z register holds ALU result
- PC+4 calculation: constant 4 (Input A) + PC from bus (Input B)
Three-Bus Advantage:
- Bus A and Bus B carry operands simultaneously to ALU
- Bus C carries results back to registers
- Reduces step counts by 20-33% in the four example instructions
- No Y register needed for basic ALU operations
- Dedicated PC incrementer works in parallel
Hardwired Control: Uses combinational logic to generate control signals. Fast but inflexible — suited for RISC.
Microprogrammed Control: Uses a stored microprogram. Slower but flexible — suited for CISC.
WMFC (Wait for Memory Function Complete): Synchronizes processor with slower memory operations.
Y Register Purpose: In single-bus, Y holds the first operand while the second operand comes through the bus, allowing the ALU to have both inputs simultaneously.

Practice Problems

Try working through these to test your understanding:

Write the control sequence for Sub R1, R2 (subtract R2 from R1, store in R1) using single-bus organization.
How many clock cycles does Add (R3), R1 take on single-bus vs three-bus?
Design the logic circuit to generate the Read signal for fetch and memory-reference instructions.
Explain why microprogrammed control makes it easier to implement complex addressing modes.
If a processor has 32 control signals, how wide would a horizontal microinstruction be? How could vertical encoding reduce this?
In single-bus, why can't we do R6 ← R4 + R5 in one step? Trace through what would go wrong.
For three-bus architecture, explain how the register file supports reading two registers simultaneously.
Calculate the speedup if a program has 40% ALU instructions, 30% loads, 20% stores, and 10% branches when moving from single-bus to three-bus.
