Datapath and Control
The Basic Processing Unit is the brain of every computer. It fetches instructions from memory, decodes them, and executes the operations they specify. Whether you're running a simple addition or a complex algorithm, every operation flows through this fundamental unit.
Fundamental Concepts
Before diving into the datapath, let's establish the foundational concepts.
The Instruction Cycle
Every instruction goes through a predictable cycle:
| Phase | Description |
| FETCH | Processor fetches instruction from memory using the Program Counter (PC) |
| DECODE | Control unit examines the opcode and determines the operation |
| EXECUTE | Processor performs the actual operation (arithmetic, logic, transfer) |
| WRITE BACK | Results are written to registers or memory |
Key Processor Registers
The processor uses several special-purpose registers:
| Register | Full Name | Purpose | Visibility |
| PC | Program Counter | Holds address of the next instruction | Programmer visible |
| IR | Instruction Register | Holds the currently executing instruction | Internal |
| MAR | Memory Address Register | Holds address for memory access | Internal |
| MDR | Memory Data Register | Buffers data to/from memory | Internal |
| Y | Temporary Register | Holds one ALU input operand | Internal (transparent) |
| Z | Temporary Register | Holds ALU output result | Internal (transparent) |
| TEMP | Temporary Register | General temporary storage | Internal (transparent) |
Note: Registers marked as "transparent" are invisible to the programmer. You don't need to worry about them when writing assembly code, but they're crucial for the processor's internal operation.
Register Transfers
The basic operation inside a processor is the register transfer: moving data from one register to another, often through the ALU. We express these transfers using a notation called Register Transfer Language (RTL).
Examples:
R1 ← [R2] means "copy the contents of R2 into R1"
R3 ← [R1] + [R2] means "add the contents of R1 and R2, store the result in R3"
PC ← [PC] + 4 means "increment PC by 4"
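The three transfers above can be mimicked in a few lines of Python. This is only a sketch: the dictionary `regs` and its starting values are illustrative assumptions, not part of any real register file.

```python
# Registers modelled as a plain dict; names and initial values are illustrative.
regs = {"R1": 0, "R2": 7, "R3": 0, "PC": 100}

regs["R1"] = regs["R2"]               # R1 <- [R2]
regs["R3"] = regs["R1"] + regs["R2"]  # R3 <- [R1] + [R2]
regs["PC"] = regs["PC"] + 4           # PC <- [PC] + 4
```

The right-hand side of each assignment reads register contents (the square brackets in RTL), and the left-hand side names the destination.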
Single-Bus Datapath Organization
The simplest processor organization uses a single internal bus to connect all components. While slower than multi-bus designs, it's easier to understand and requires less hardware.
Single-Bus Architecture Diagram
[Diagram: single-bus datapath, summarized as text]
Control signals from the control unit drive every component.
PC, IR, MAR, MDR, and the general-purpose registers R0 to Rn-1 all connect to the internal bus.
MAR and MDR connect through the memory bus interface to main memory.
Y (temporary register, holds one ALU input) feeds the MUX.
The MUX selects between Y and the constant 4 and drives ALU Input A.
ALU Input B ALWAYS comes from the bus. The ALU performs Add, Sub, AND, OR, etc.
Z (ALU output register) captures the ALU result and places it back on the internal bus.
Component Description
Internal Bus: A single shared pathway that connects all registers. Only ONE register can place data on the bus at any given time. The bus feeds directly into ALU Input B.
ALU (Arithmetic Logic Unit): Performs all arithmetic and logical operations. It has TWO inputs:
Input A: Comes from the multiplexer (can be Y register, constant 4, or other sources)
Input B: ALWAYS comes directly from the bus
This is crucial! Since the bus can only carry one value at a time, and the ALU needs two operands:
The first operand is saved in register Y
The second operand comes through the bus to Input B
Y feeds into the MUX, which routes to Input A
The ALU can then perform the operation: [Input A] op [Input B]
Multiplexer (MUX): Selects one of multiple inputs to feed into ALU Input A:
Select4: Choose constant 4 (for incrementing PC: 4 + PC from bus)
SelectY: Choose contents of register Y (for operations like R4 + R5)
Register Y: A temporary register that holds one operand. Since the bus can only carry one value at a time, we need Y to "remember" the first operand while we fetch the second through the bus.
Register Z: Holds the output of the ALU until it can be transferred to its destination via the bus.
Control Signals
The control unit generates signals that orchestrate data movement:
| Signal | Meaning |
| PCout | Place PC contents on the bus |
| PCin | Load bus contents into PC |
| MARin | Load bus contents into MAR |
| MDRout | Place MDR contents on the bus |
| MDRin | Load bus contents into MDR |
| IRin | Load bus contents into IR |
| Yin | Load bus contents into Y |
| Zin | Load ALU result into Z |
| Zout | Place Z contents on the bus |
| R1out, R2out, ... | Place register contents on the bus |
| R1in, R2in, ... | Load bus contents into register |
| Read | Initiate memory read operation |
| Write | Initiate memory write operation |
| Select4 | Select constant 4 for MUX |
| SelectY | Select Y register for MUX |
| Add | ALU performs addition |
| Sub | ALU performs subtraction |
| WMFC | Wait for Memory Function Complete |
| End | Signals end of instruction execution |
How PC + 4 Works in Single-Bus
This is a common point of confusion. Here's exactly what happens:
Step 1 Signals: PCout, MARin, Read, Select4, Add, Zin
PCout: PC value (e.g., 100) goes onto the bus
MARin: MAR captures the PC value from the bus (for instruction fetch)
Simultaneously:
Select4: MUX routes constant 4 to ALU Input A
The bus carries PC to ALU Input B (Input B is always from the bus!)
Add: ALU computes 4 + PC = 4 + 100 = 104
Zin: Result (104) is stored in register Z
So we use the same bus value (PC) for two purposes in one clock cycle: sending to MAR for fetch, and feeding into the ALU for incrementing.
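To make the dual use of the bus concrete, here is a minimal sketch of that first step. The function name `fetch_step_1` and the returned dictionary are assumptions made for illustration; the memory read itself is not modelled.

```python
def fetch_step_1(pc: int) -> dict:
    """Model PCout, MARin, Select4, Add, Zin within a single clock cycle."""
    bus = pc                # PCout: PC drives the bus
    mar = bus               # MARin: MAR latches the bus value
    input_a = 4             # Select4: MUX routes the constant 4 to Input A
    input_b = bus           # Input B is always wired to the bus
    z = input_a + input_b   # Add, Zin: Z latches the ALU result
    return {"bus": bus, "MAR": mar, "Z": z}


state = fetch_step_1(100)
```

With PC = 100, the same bus value ends up in MAR and is also one of the two addends that produce Z.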
Single-Bus: Complete Instruction Examples
Let's trace through complete instructions to see how all components work together.
1. ADD R4, R5, R6 (Single-Bus)
Instruction: Add the contents of R4 and R5, store result in R6
RTL: R6 ← [R4] + [R5]
| Step | Control Signals | Explanation | RTL |
| 1 | PCout, MARin, Read, Select4, Add, Zin | Fetch begins: PC goes to MAR to fetch the instruction. Simultaneously, constant 4 (Input A) and PC from the bus (Input B) are added in the ALU, and the result goes to Z. | MAR ← [PC], Z ← [PC]+4 |
| 2 | Zout, PCin, Yin, WMFC | Update PC and save it: Z (which contains PC+4) goes onto the bus. PC is updated to point to the next instruction. We ALSO save this PC+4 value in Y in case the instruction being fetched is a branch, which needs the incremented PC. Wait for memory. | PC ← [Z], Y ← [Z] |
| 3 | MDRout, IRin | Load instruction: Memory has delivered the instruction to MDR. Move it to IR for decoding. | IR ← [MDR] |
| 4 | R4out, Yin | Save first operand: R4 goes onto the bus and is saved in Y. We need Y to hold R4 because the bus can only carry one value at a time, and we'll need R4 when we add it to R5. | Y ← [R4] |
| 5 | R5out, SelectY, Add, Zin | Perform addition: R5 goes onto the bus (to Input B). Y containing R4 is selected by the MUX (to Input A). ALU adds them: R4 + R5. Result goes to Z. | Z ← [Y] + [R5] |
| 6 | Zout, R6in, End | Store result: Z goes onto the bus, R6 captures it. Instruction complete. | R6 ← [Z] |
Key Points:
Step 2 stores PC+4 in Y even though this is an ADD instruction. This is standard practice: the instruction has not been decoded yet, and if it turned out to be a branch, we'd need that value.
Steps 4-5 show why Y is essential: we must save R4 in Y before bringing R5 to the bus, because the ALU needs both simultaneously.
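The six steps traced above can be replayed with a small interpreter for control signals. This is a sketch under simplifying assumptions: memory responds instantly (so WMFC is a no-op), Add is the only ALU operation, and the instruction word is stored as an opaque string because decoding is not modelled. The names `run_step` and `ADD_R4_R5_R6` are made up for this example.

```python
# Control sequence for ADD R4, R5, R6, copied from the table above.
ADD_R4_R5_R6 = [
    ["PCout", "MARin", "Read", "Select4", "Add", "Zin"],
    ["Zout", "PCin", "Yin", "WMFC"],
    ["MDRout", "IRin"],
    ["R4out", "Yin"],
    ["R5out", "SelectY", "Add", "Zin"],
    ["Zout", "R6in", "End"],
]


def run_step(regs, memory, signals):
    """Apply one control step and return the new register state."""
    drivers = [s[:-3] for s in signals if s.endswith("out")]
    assert len(drivers) <= 1, "only one register may drive the bus per step"
    bus = regs[drivers[0]] if drivers else None

    alu = None
    if "Add" in signals:
        input_a = 4 if "Select4" in signals else regs["Y"]  # MUX
        alu = input_a + bus                                 # Input B = bus

    new = dict(regs)  # every load sees the values from the start of the step
    for s in signals:
        if s.endswith("in"):
            name = s[:-2]
            new[name] = alu if name == "Z" else bus  # Z loads from the ALU
    if "Read" in signals:
        new["MDR"] = memory[new["MAR"]]  # assumption: zero-latency memory
    return new


regs = {"PC": 100, "IR": None, "MAR": 0, "MDR": 0, "Y": 0, "Z": 0,
        "R4": 10, "R5": 32, "R6": 0}
memory = {100: "ADD R4, R5, R6"}
for signals in ADD_R4_R5_R6:
    regs = run_step(regs, memory, signals)
```

After the loop, R6 holds the sum of R4 and R5 and PC has advanced by 4. The assertion inside `run_step` enforces the rule that only one register may place data on the bus in a given step.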
2. Branch OFFSET (Unconditional) (Single-Bus)
Instruction: Branch to address PC + offset
RTL: PC ← [PC] + offset
| Step | Control Signals | Explanation | RTL |
| 1 | PCout, MARin, Read, Select4, Add, Zin | Fetch begins: Same as all instructions. PC goes to MAR for the fetch, and PC+4 is calculated in the ALU and stored in Z. | MAR ← [PC], Z ← [PC]+4 |
| 2 | Zout, PCin, Yin, WMFC | Update and save PC: PC is updated to PC+4 (next sequential instruction). We save PC+4 in Y because we'll need it to calculate the branch target. Wait for memory. | PC ← [Z], Y ← [Z] |
| 3 | MDRout, IRin | Load branch instruction: Instruction delivered from memory to IR. Now we know it's a branch. | IR ← [MDR] |
| 4 | Offset-field-of-IRout, SelectY, Add, Zin | Calculate branch target: The offset field from the instruction goes onto the bus (to Input B). Y containing PC+4 is selected (to Input A). ALU calculates: (PC+4) + offset. This is why we saved PC+4 in step 2! Result goes to Z. | Z ← [Y] + [offset] |
| 5 | Zout, PCin, End | Jump to target: The calculated target address goes from Z onto the bus and into PC. The next instruction will be fetched from this new address. | PC ← [Z] |
Key Points:
The branch target is calculated as (PC+4) + offset, which is why saving PC+4 in Y during step 2 is crucial.
This is a PC-relative branch: the offset is relative to the already-incremented PC.
3. Load R1, 20(R2) (Single-Bus)
Instruction: Load from memory address [R2]+20 into R1
RTL: R1 ← [[R2] + 20]
| Step | Control Signals | Explanation | RTL |
| 1 | PCout, MARin, Read, Select4, Add, Zin | Fetch begins: Standard fetch. PC goes to MAR, and PC+4 is calculated. | MAR ← [PC], Z ← [PC]+4 |
| 2 | Zout, PCin, Yin, WMFC | Update PC: Update PC to point to the next instruction, save it in Y (in case we need it). Wait for memory to deliver the instruction. | PC ← [Z], Y ← [Z] |
| 3 | MDRout, IRin | Load instruction: Instruction from memory moves to IR. Now we know it's a Load with offset addressing. | IR ← [MDR] |
| 4 | R2out, Yin | Save base address: R2 (base address) goes onto the bus and is saved in Y. We need to save R2 because we'll add the offset to it in the next step. | Y ← [R2] |
| 5 | Offset-field-of-IRout, SelectY, Add, Zin | Calculate effective address: The offset (20) from the instruction goes onto the bus (to Input B). Y containing R2 is selected (to Input A). ALU calculates: R2 + 20. This is the memory address we want to read from. Result goes to Z. | Z ← [Y] + 20 |
| 6 | Zout, MARin, Read | Initiate memory read: The effective address from Z goes to MAR. Start reading from memory at address [R2]+20. | MAR ← [Z] |
| 7 | WMFC | Wait for memory: A memory read takes time, so wait for the data to arrive in MDR. | - |
| 8 | MDRout, R1in, End | Load data into R1: Data from memory is now in MDR. Move it to R1. Instruction complete. | R1 ← [MDR] |
Key Points:
Steps 4-5 calculate the effective address using Y register to hold the base address.
Steps 6-7 perform the actual memory read. This is the second memory access (the first was the instruction fetch).
Total: 8 steps for a load instruction.
4. Store R1, 20(R2) (Single-Bus)
Instruction: Store R1 to memory address [R2]+20
RTL: [[R2] + 20] ← [R1]
| Step | Control Signals | Explanation | RTL |
| 1 | PCout, MARin, Read, Select4, Add, Zin | Fetch begins: Standard fetch sequence. PC goes to MAR for the instruction fetch, and PC+4 is calculated. | MAR ← [PC], Z ← [PC]+4 |
| 2 | Zout, PCin, Yin, WMFC | Update PC: Update PC to the next instruction, save it in Y. Wait for memory to deliver the instruction. | PC ← [Z], Y ← [Z] |
| 3 | MDRout, IRin | Load instruction: Instruction delivered to IR. Now we know it's a Store with offset. | IR ← [MDR] |
| 4 | R2out, Yin | Save base address: R2 (base address) goes to Y. We need it to calculate the effective address. | Y ← [R2] |
| 5 | Offset-field-of-IRout, SelectY, Add, Zin | Calculate effective address: Offset (20) on the bus (to Input B), Y containing R2 to Input A. ALU calculates: R2 + 20. This is where we'll store the data. Result to Z. | Z ← [Y] + 20 |
| 6 | Zout, MARin | Send address to MAR: Effective address from Z goes to MAR. This is the memory location where we'll write. | MAR ← [Z] |
| 7 | R1out, MDRin, Write | Initiate memory write: R1 (the data to store) goes onto the bus and into MDR. Start the memory write operation. | MDR ← [R1] |
| 8 | WMFC, End | Wait for write to complete: A memory write takes time. Wait until it's done. Instruction complete. | - |
Key Points:
Very similar to Load, but we're writing instead of reading.
Step 7 moves the data from R1 into MDR and starts the write.
The effective address calculation (steps 4-6) is identical to Load.
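The shared address calculation can be seen by writing both instructions against one helper. This is a rough sketch: `effective_address`, `load`, and `store` are hypothetical names, memory is a plain dictionary, and memory latency (WMFC) is ignored. The step numbers in the comments refer to the single-bus tables above.

```python
def effective_address(regs, base, offset):
    """Steps 4-5 of both Load and Store: Y <- [base], Z <- [Y] + offset."""
    y = regs[base]
    return y + offset


def load(regs, memory, dest, base, offset):
    mar = effective_address(regs, base, offset)  # step 6: MAR <- [Z]
    mdr = memory[mar]                            # steps 6-7: Read, WMFC
    regs[dest] = mdr                             # step 8: dest <- [MDR]


def store(regs, memory, src, base, offset):
    mar = effective_address(regs, base, offset)  # step 6: MAR <- [Z]
    mdr = regs[src]                              # step 7: MDR <- [src]
    memory[mar] = mdr                            # steps 7-8: Write, WMFC


regs = {"R1": 0, "R2": 1000}
memory = {1020: 55}
load(regs, memory, "R1", "R2", 20)    # Load R1, 20(R2)
regs["R1"] += 1
store(regs, memory, "R1", "R2", 20)   # Store R1, 20(R2)
```

Only the direction of the final data transfer differs between the two functions.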
Three-Bus Datapath Organization
The single-bus design is simple but slow: we can only do one transfer per clock cycle. A three-bus organization allows multiple simultaneous transfers, dramatically improving performance.
Three-Bus Architecture Diagram
[Diagram: three-bus datapath, summarized as text]
Register file (R0 to Rn-1): Read Port A drives Bus A, Read Port B drives Bus B, and the Write Port loads from Bus C.
ALU: Input A comes from Bus A, Input B comes from Bus B, and the ALU result goes onto Bus C.
PC: paired with a dedicated incrementer (+4); the diagram labels its input as Bus C or Bus B. PC supplies the address to MAR.
IR: loaded from Bus B (R=B signal).
MAR: supplies the memory address to memory.
MDR: holds memory data and drives Bus B (MDRoutB signal).
Data flow summary:
Bus A: Carries data from register file (Read Port A) to ALU Input A
Bus B: Carries data from register file (Read Port B) to ALU Input B. Also used for special transfers (MDR→IR, PC→MAR, etc.)
Bus C: Carries ALU results and other data to register file (Write Port). Also feeds PC incrementer and can load special registers
Three-Bus Architecture: Detailed Explanation
The three-bus organization dramatically improves performance by allowing simultaneous data transfers on three separate buses:
Bus A (Source Bus A)
Purpose: Carries the first source operand from the register file to ALU Input A
Connected to: Register file Read Port A → ALU Input A
Example: In R6 ← R4 + R5, Bus A carries R4 to ALU Input A
Bus B (Source Bus B)
Purpose: Carries the second source operand from the register file to ALU Input B
Also used for: Special register transfers (MDR→IR, values to MAR, etc.)
Connected to:
Register file Read Port B → ALU Input B
MDR → Bus B (when MDRoutB is active)
Bus B → IR, MAR (when the R=B signal is active)
Example: In R6 ← R4 + R5, Bus B carries R5 to ALU Input B simultaneously with Bus A carrying R4
Bus C (Result/Destination Bus)
Purpose: Carries results back to the register file
Connected to:
ALU Output → Bus C → Register file Write Port
Bus C → PC (for branches and updates)
Bus C → PC Incrementer (for calculating PC+4)
Example: In R6 ← R4 + R5, Bus C carries the sum directly to R6
Key Components
Register File with Multiple Ports:
Read Port A: Simultaneously reads one register onto Bus A
Read Port B: Simultaneously reads another register onto Bus B
Write Port: Writes data from Bus C into a destination register
All three operations can happen in the same clock cycle!
PC Incrementer:
Dedicated hardware that adds 4 to PC
Can operate in parallel with ALU operations
Connected to Bus C or receives PC value directly
Special Register Transfers:
R=B signal: "Register equals Bus B", loads a register from Bus B
MDRoutB: Places MDR contents on Bus B
Used for transfers like: MDR→IR, PC→MAR, etc.
Why Three Buses Are Faster
Single-Bus Limitation:
Step 1: R4 → Bus → Y (save first operand)
Step 2: R5 → Bus → ALU (bring second operand, Y feeds to ALU)
Step 3: ALU result → Bus → R6
Total: 3 steps minimum for any ALU operation
Three-Bus Advantage:
Step 1 (all three transfers happen simultaneously):
R4 → Bus A → ALU Input A
R5 → Bus B → ALU Input B
ALU result → Bus C → R6
Total: 1 step for same operation!
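In code, the three-bus version collapses to a single function call. The sketch below is an assumption-laden model: `three_bus_alu_step` is a hypothetical name, and the ALU operation is passed in as an ordinary Python function.

```python
import operator


def three_bus_alu_step(regs, src_a, src_b, dest, op):
    """One clock step: both operands read, ALU computes, result written."""
    bus_a = regs[src_a]        # e.g. R4outA: Read Port A -> Bus A
    bus_b = regs[src_b]        # e.g. R5outB: Read Port B -> Bus B
    bus_c = op(bus_a, bus_b)   # ALU result appears on Bus C
    regs[dest] = bus_c         # e.g. R6in: Write Port <- Bus C


regs = {"R4": 10, "R5": 32, "R6": 0}
three_bus_alu_step(regs, "R4", "R5", "R6", operator.add)
```

No temporary Y or Z value appears anywhere in the function, which mirrors the hardware difference between the two organizations.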
Control Signals in Three-Bus
| Signal | Meaning |
| R4outA | Place R4 on Bus A |
| R5outB | Place R5 on Bus B |
| R6in | Load R6 from Bus C |
| R=B | Load destination register from Bus B (not Bus C) |
| MDRoutB | Place MDR contents on Bus B |
| Add, Sub, etc. | ALU operation (result automatically goes to Bus C) |
| IncPC | Increment PC using dedicated incrementer |
| Read, Write | Memory operations |
| WMFC | Wait for Memory Function Complete |
Advantages of Three-Bus Design
| Feature | Single-Bus | Three-Bus |
| Transfers per cycle | 1 | Up to 3 |
| ALU operands | Sequential (need Y register) | Simultaneous (both at once) |
| Need for Y register | Yes (essential) | No (not needed for ALU ops) |
| Hardware complexity | Low | Higher |
| Speed | Slower | Much Faster |
| Register file ports | 1 read, 1 write | 2 read, 1 write |
| PC increment | Through ALU | Dedicated incrementer |
Three-Bus: Complete Instruction Examples
Now let's see how the same instructions execute much faster on three-bus architecture.
1. ADD R4, R5, R6 (Three-Bus)
Instruction: Add the contents of R4 and R5, store result in R6
RTL: R6 ← [R4] + [R5]
| Step | Control Signals | Explanation | RTL |
| 1 | PCout, MARin, Read, IncPC | Fetch and increment: PC value goes to MAR to fetch the instruction. The dedicated PC incrementer calculates PC+4 simultaneously and updates PC. No ALU needed! | MAR ← [PC], PC ← [PC]+4 |
| 2 | WMFC | Wait for memory: Wait for the instruction to be delivered from memory to MDR. | - |
| 3 | MDRoutB, R=B, IRin | Load instruction: MDR places the instruction on Bus B, and IR loads it. The R=B signal indicates we're loading from Bus B (not Bus C). | IR ← [MDR] |
| 4 | R4outA, R5outB, Add, R6in | Execute in ONE step! R4 goes on Bus A to ALU Input A. R5 goes on Bus B to ALU Input B. ALU adds them. Result goes on Bus C directly into R6. All simultaneous! | R6 ← [R4] + [R5] |
Key Points:
4 steps total vs 6 steps in single-bus (33% fewer steps, a 1.5x speedup if every step takes one equal-length clock cycle)
Step 4 is the magic: Both operands travel simultaneously to the ALU, and the result goes directly to the destination
No Y register needed: we don't have to save and retrieve operands
PC incrementer works in parallel, so incrementing PC doesn't consume an ALU cycle
Comparison:
Single-bus ADD: 6 steps (after the fetch: save R4 in Y, add R5 from the bus, then store the result)
Three-bus ADD: 4 steps (fetch, wait, load IR, then execute in a single step)
2. Branch OFFSET (Unconditional) (Three-Bus)
Instruction: Branch to address PC + offset
RTL: PC ← [PC] + offset
| Step | Control Signals | Explanation | RTL |
| 1 | PCout, MARin, Read, IncPC | Fetch and increment: PC to MAR for instruction fetch. PC incrementer calculates PC+4 and updates PC immediately. | MAR ← [PC], PC ← [PC]+4 |
| 2 | WMFC | Wait for memory: Wait for the branch instruction to be delivered to MDR. | - |
| 3 | MDRoutB, R=B, IRin | Load instruction: Instruction from MDR goes on Bus B to IR. Now we know it's a branch. | IR ← [MDR] |
| 4 | PCoutA, Offset-field-of-IRout-B, Add, PCin | Calculate and load target: Current PC (already incremented to PC+4) goes on Bus A. Offset from the instruction goes on Bus B. ALU adds: (PC+4) + offset. Result goes on Bus C directly into PC. Done in one step! | PC ← [PC] + [offset] |
Key Points:
4 steps total vs 5 steps in single-bus
Step 4 calculates the branch target in one cycle: PC and offset go to the ALU simultaneously
No need to save PC+4 in Y register like we did in single-bus
The incremented PC is still available when we need to add the offset
3. Load R1, 20(R2) (Three-Bus)
Instruction: Load from memory address [R2]+20 into R1
RTL: R1 ← [[R2] + 20]
| Step | Control Signals | Explanation | RTL |
| 1 | PCout, MARin, Read, IncPC | Fetch and increment: PC to MAR for instruction fetch. PC increments to PC+4 using the dedicated incrementer. | MAR ← [PC], PC ← [PC]+4 |
| 2 | WMFC | Wait for instruction: Wait for memory to deliver the instruction to MDR. | - |
| 3 | MDRoutB, R=B, IRin | Load instruction: Instruction from MDR on Bus B goes to IR. Now we know it's a Load with offset. | IR ← [MDR] |
| 4 | R2outA, Offset-field-of-IRout-B, Add, MARin | Calculate effective address: R2 (base address) goes on Bus A. Offset (20) goes on Bus B. ALU calculates: R2 + 20. Result goes on Bus C to MAR. All in one step! | MAR ← [R2] + 20 |
| 5 | Read, WMFC | Read from memory: Start memory read from the effective address. Wait for data to arrive in MDR. | - |
| 6 | MDRoutB, R=B, R1in | Load data: Data from MDR goes on Bus B to R1. Instruction complete. | R1 ← [MDR] |
Key Points:
6 steps total vs 8 steps in single-bus (25% fewer steps, about a 1.33x speedup if every step takes one equal-length clock cycle)
Step 4 calculates the effective address in one cycle: base and offset are added simultaneously
No need to save R2 in Y register first
The address calculation and transfer to MAR happen in the same cycle
4. Store R1, 20(R2) (Three-Bus)
Instruction: Store R1 to memory address [R2]+20
RTL: [[R2] + 20] ← [R1]
| Step | Control Signals | Explanation | RTL |
| 1 | PCout, MARin, Read, IncPC | Fetch and increment: PC to MAR for instruction fetch. PC increments to PC+4. | MAR ← [PC], PC ← [PC]+4 |
| 2 | WMFC | Wait for instruction: Wait for memory to deliver the instruction to MDR. | - |
| 3 | MDRoutB, R=B, IRin | Load instruction: Instruction from MDR goes to IR via Bus B. Now we know it's a Store. | IR ← [MDR] |
| 4 | R2outA, Offset-field-of-IRout-B, Add, MARin | Calculate effective address: R2 on Bus A, offset (20) on Bus B. ALU calculates: R2 + 20. Result to MAR via Bus C. All simultaneous! | MAR ← [R2] + 20 |
| 5 | R1outB, R=B, MDRin, Write | Write to memory: R1 goes on Bus B (using R=B signal) to MDR. Start memory write operation. | MDR ← [R1] |
| 6 | WMFC | Wait for write: Wait for the memory write to complete. Instruction done. | - |
Key Points:
6 steps total vs 8 steps in single-bus (25% fewer steps, about a 1.33x speedup if every step takes one equal-length clock cycle)
Step 4 calculates effective address in one cycle, just like Load
Step 5 moves R1 to MDR and starts the write. Note that we use R=B because we're transferring on Bus B, not Bus C
Address calculation is identical to Load; only the data transfer direction differs
Execution Time Comparison: Single-Bus vs Three-Bus
| Instruction | Single-Bus Steps | Three-Bus Steps | Step Reduction (Speedup) |
| ADD R4, R5, R6 | 6 | 4 | 33% fewer (1.5x) |
| Branch OFFSET | 5 | 4 | 20% fewer (1.25x) |
| Load R1, 20(R2) | 8 | 6 | 25% fewer (1.33x) |
| Store R1, 20(R2) | 8 | 6 | 25% fewer (1.33x) |
Why the difference?
Three-bus eliminates the need to save/retrieve operands in Y register
Multiple simultaneous transfers reduce sequential dependencies
Dedicated PC incrementer eliminates ALU usage for PC+4
More parallelism = fewer clock cycles per instruction
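The step counts can be turned into two related figures: the fraction of steps eliminated and the resulting speedup ratio. The short calculation below assumes, for simplicity, that every step takes one clock cycle of the same length in both designs; the step counts come from the traces in this section, while the variable names are illustrative.

```python
# (single-bus steps, three-bus steps) taken from the instruction traces.
steps = {
    "ADD R4, R5, R6": (6, 4),
    "Branch OFFSET": (5, 4),
    "Load R1, 20(R2)": (8, 6),
    "Store R1, 20(R2)": (8, 6),
}

summary = {}
for name, (single, three) in steps.items():
    reduction = (single - three) / single  # fraction of steps eliminated
    speedup = single / three               # old time divided by new time
    summary[name] = (round(reduction * 100), round(speedup, 2))

for name, (pct, ratio) in summary.items():
    print(f"{name}: {pct}% fewer steps, {ratio}x speedup")
```

Note that a 33% reduction in steps corresponds to a 1.5x speedup, not a 1.33x one, because the two figures use different denominators.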
Hardwired Control Unit
The hardwired control unit generates control signals using dedicated logic circuits. It's fast but inflexible.
Block Diagram
[Diagram: hardwired control unit, summarized as text]
CLK and Reset drive the control step counter, which counts T1, T2, T3, ... Tn.
The step decoder converts the counter value into the outputs T1, T2, T3, ... Tn.
The encoder (combinational logic built from AND, OR, NOT gates) receives the step decoder outputs, the opcode from IR via the instruction decoder, the condition codes (N, Z, C), and external inputs.
The encoder outputs the control signals: PCin, PCout, MARin, Read, Zin, ...
How It Works
Control Step Counter: Counts clock cycles (T1, T2, T3...) during instruction execution
Step Decoder: Converts counter value to one-hot encoding (only one Ti active)
Instruction Decoder: Decodes opcode to identify the instruction type
Encoder: Combinational logic that generates appropriate control signals
Generating the Zin Signal
The Zin signal (load ALU result into Z) is needed at different times for different instructions:
Step T1 for all instructions (during fetch, for PC+4)
Step T6 for the ADD instruction (to latch the sum). This step number matches the seven-step memory-operand form Add (R3), R1; the register-only ADD R4, R5, R6 traced earlier asserts Zin at step 5.
Step T4 for Branch instruction (for branch target)
In gate form, T6 and ADD feed one AND gate, T4 and BR feed a second AND gate, and an OR gate combines T1 with the two AND outputs to produce Zin.
Logic Expression: Zin = T1 + (T6 · ADD) + (T4 · BR) + ...
Generating the End Signal
The End signal terminates instruction execution. Different instructions end at different steps:
Logic Expression: End = (T7 · ADD) + (T5 · BR) + (T5 · N + T4 · N̄) · BRN + ...
For Branch<0 (BRN):
End at T5 if N=1 (branch taken)
End at T4 if N=0 (branch not taken)
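The two logic expressions translate directly into Boolean functions. This is a sketch that covers only the terms written out above (the trailing "..." terms for other instructions are omitted), and it uses the step numbers exactly as they appear in those expressions. The function names and keyword arguments are assumptions made for this example.

```python
def zin(t, add=False, br=False):
    """Zin = T1 + (T6 . ADD) + (T4 . BR), with t as the active step number."""
    return t == 1 or (t == 6 and add) or (t == 4 and br)


def end(t, n=False, add=False, br=False, brn=False):
    """End = (T7 . ADD) + (T5 . BR) + (T5 . N + T4 . not N) . BRN."""
    return ((t == 7 and add)
            or (t == 5 and br)
            or (((t == 5 and n) or (t == 4 and not n)) and brn))


# Branch<0 ends one step earlier when the branch is not taken (N = 0).
taken_ends_at = [t for t in range(1, 8) if end(t, n=True, brn=True)]
not_taken_ends_at = [t for t in range(1, 8) if end(t, n=False, brn=True)]
```

Because the step decoder is one-hot, passing a single integer `t` stands in for the T1...Tn lines: exactly one of them is active at a time.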
Advantages and Disadvantages
Advantages:
Very fast operation (minimal delay through gates)
No memory access needed for control signals
Efficient for simple instruction sets (RISC)
Natural fit for pipelining
Disadvantages:
Complex design for large instruction sets
Difficult to modify once manufactured
Higher design and debugging time
Inflexible: adding instructions requires complete redesign
Circuit complexity grows rapidly with instruction set size
Microprogrammed Control Unit
The microprogrammed control unit uses a small program (microprogram) stored in a special memory to generate control signals. Each instruction is implemented by a sequence of microinstructions.
Key Terminology
| Term | Definition |
| Control Word | A word whose bits represent individual control signals |
| Microinstruction | A single control word in the microprogram |
| Microroutine | Sequence of microinstructions for one machine instruction |
| Microprogram | Collection of all microroutines stored in control memory |
| Control Store (CS) | ROM/RAM that holds the microprogram |
| μPC | Microprogram Counter: points to the current microinstruction |
Block Diagram
[Diagram: microprogrammed control unit, summarized as text]
IR (opcode) feeds the starting address generator, which maps the opcode to the starting address of its microroutine.
The starting address is loaded into the μPC (Microprogram Counter); a +1 incrementer otherwise advances the μPC.
The μPC supplies the address to the control store (CS), which holds entries such as:
| Address | Microinstruction |
| 000 | PCout, MARin, Read, Select4, Add, Zin |
| 001 | Zout, PCin, Yin, WMFC |
| 002 | MDRout, IRin |
| 003 | (Branch to microroutine based on opcode) |
| ... | ... |
| 025 | ADD microroutine starts here |
| ... | ... |
The microinstruction read from the control store passes through a decoder (if using encoded microinstructions) and becomes the control signals.
Microinstruction Format
Horizontal format (wide): one bit per control signal. The example word below encodes the first fetch step.
| PCout | PCin | MARin | MDRout | MDRin | IRin | Yin | Zin | Zout | R1out | R1in | Select4 | SelectY | Add | Sub | Read | Write | WMFC | End |
| 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 |
Each bit directly controls one signal: simple but WIDE.
Vertical format (narrow): encoded fields.
| Field | Width | Contents |
| F1 | 4 bits | Encoded source |
| F2 | 3 bits | Encoded destination |
| F3 | 3 bits | ALU function |
| Next Address | 8 bits | Branch target |
Requires a decoder but uses less memory.
Horizontal vs Vertical Microprogramming
| Aspect | Horizontal | Vertical |
| Width | Wide (many bits) | Narrow (fewer bits) |
| Encoding | None (direct) | Encoded fields |
| Decoder | Not needed | Required |
| Speed | Faster | Slower |
| Control Store Size | Larger | Smaller |
| Parallelism | High | Limited |
| Flexibility | Less | More |
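The width trade-off in the table can be made concrete with a small sketch. The signal order below follows the horizontal-format diagram; the helper names `horizontal_word` and `encoded_field_width`, and the choice to reserve code 0 for "no signal active", are assumptions made for illustration rather than part of any particular processor.

```python
# Signal order taken from the horizontal-format diagram (19 control signals).
SIGNALS = ["PCout", "PCin", "MARin", "MDRout", "MDRin", "IRin", "Yin", "Zin",
           "Zout", "R1out", "R1in", "Select4", "SelectY", "Add", "Sub",
           "Read", "Write", "WMFC", "End"]


def horizontal_word(active):
    """Return one bit per control signal: '1' if asserted, '0' otherwise."""
    unknown = set(active) - set(SIGNALS)
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    return "".join("1" if name in active else "0" for name in SIGNALS)


def encoded_field_width(n_choices):
    """Bits needed to encode one of n mutually exclusive signals.

    Assumption: code 0 is reserved for "none active", so the field
    must represent the values 0..n_choices.
    """
    return n_choices.bit_length()


# Fetch step 1 from the example microroutine.
fetch_step_1 = {"PCout", "MARin", "Read", "Select4", "Add", "Zin"}
word = horizontal_word(fetch_step_1)
print(word, len(word))          # 1010000100010101000 19

# Only one register may drive the single bus, so the four "out" signals
# are mutually exclusive and can share one encoded field.
bus_sources = ["PCout", "MDRout", "Zout", "R1out"]
print(encoded_field_width(len(bus_sources)))   # 3 bits instead of 4
```

The saving grows with the number of registers: a group of 15 mutually exclusive signals fits in a 4-bit field under this scheme, at the cost of a decoder between the control store and the datapath.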
Microprogram Sequencing
The μPC typically increments automatically, but sometimes we need to branch:
When μPC changes:
Normal: Increment by 1
End of instruction: Load starting address of next instruction's microroutine
Microbranch: Load branch target address
Conditional microbranch: Load target only if condition is true
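The four cases above can be sketched as a single next-address function. The function name `next_upc`, its parameters, and the assumption that the end of an instruction returns to a common fetch routine at address 0 (as in the example microroutine table) are illustrative choices, not a description of a specific machine.

```python
def next_upc(upc, end=False, target=None, condition=True, start=0):
    """Return the next μPC value.

    upc       -- current microprogram counter
    end       -- True when the microinstruction asserts End
    target    -- branch target address, or None for no branch
    condition -- outcome of the branch condition (always True for an
                 unconditional microbranch)
    start     -- address loaded at the end of an instruction
                 (assumed here to be the common fetch routine)
    """
    if end:
        return start                  # begin the next instruction
    if target is not None and condition:
        return target                 # microbranch taken
    return upc + 1                    # normal sequential flow


print(next_upc(0))                               # 1
print(next_upc(2, target=25))                    # 25
print(next_upc(10, target=40, condition=False))  # 11
print(next_upc(28, end=True))                    # 0
```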
Example Microroutine for Add (R3), R1
| Address | Microinstruction | Comment |
| 000 | PCout, MARin, Read, Select4, Add, Zin | Common fetch - step 1 |
| 001 | Zout, PCin, Yin, WMFC | Common fetch - step 2 |
| 002 | MDRout, IRin, BRANCH | Common fetch - step 3, then branch |
| ... | ... | ... |
| 025 | R3out, MARin, Read | ADD microroutine begins |
| 026 | R1out, Yin, WMFC | Save R1, wait for memory |
| 027 | MDRout, SelectY, Add, Zin | Perform addition |
| 028 | Zout, R1in, End | Store result, end |
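To see the microroutine move data, the sequence in the table can be run through a toy single-bus model. This is a minimal sketch under simplifying assumptions: each microinstruction takes one step, a memory read completes at the step that asserts WMFC, BRANCH and End are ignored because the steps are simply listed in order, and the register and memory contents are made-up values.

```python
def run_microroutine(steps, regs, mem):
    """Execute microinstructions, each given as a set of signal names."""
    pending = None                      # address of the memory read in flight
    for signals in steps:
        # At most one register may drive the single bus.
        outs = [s[:-3] for s in signals if s.endswith("out")]
        if len(outs) > 1:
            raise ValueError(f"bus conflict: {outs}")
        bus = regs[outs[0]] if outs else None

        # ALU: input A comes from the MUX (constant 4 or Y), input B from the bus.
        alu = None
        if "Add" in signals:
            a = 4 if "Select4" in signals else regs["Y"]
            alu = a + bus

        # Latch destinations: Z takes the ALU output, all others take the bus.
        for s in signals:
            if s.endswith("in"):
                name = s[:-2]
                regs[name] = alu if name == "Z" else bus

        if "Read" in signals:
            pending = regs["MAR"]       # start a read at the new MAR value
        if "WMFC" in signals:
            regs["MDR"] = mem[pending]  # read completes, data lands in MDR
    return regs


FETCH = [
    {"PCout", "MARin", "Read", "Select4", "Add", "Zin"},
    {"Zout", "PCin", "Yin", "WMFC"},
    {"MDRout", "IRin", "BRANCH"},
]
ADD_R3_R1 = [
    {"R3out", "MARin", "Read"},
    {"R1out", "Yin", "WMFC"},
    {"MDRout", "SelectY", "Add", "Zin"},
    {"Zout", "R1in", "End"},
]

# Hypothetical initial state: 0xADD stands in for the encoded instruction.
regs = dict(PC=0, MAR=0, MDR=0, IR=0, Y=0, Z=0, R1=10, R3=0x100)
mem = {0: 0xADD, 0x100: 32}

run_microroutine(FETCH + ADD_R3_R1, regs, mem)
print(regs["PC"], hex(regs["IR"]), regs["R1"])   # 4 0xadd 42
```

After the run, PC has advanced by 4, IR holds the fetched word, and R1 holds its old value plus the memory operand addressed by R3.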
Advantages and Disadvantages
✅ Advantages:
Highly flexible and easy to modify
Can fix bugs by changing microcode
Simplifies design of complex instruction sets (CISC)
Can add new instructions without hardware changes
Easier debugging and testing
Supports complex addressing modes naturally
❌ Disadvantages:
Slower than hardwired (memory access overhead)
Requires control store memory
More complex to optimize for speed
Does not lend itself well to pipelining
Extra level of indirection adds delay
Hardwired vs Microprogrammed Comparison
Side-by-Side Comparison
| Characteristic | Hardwired | Microprogrammed |
| Implementation | Fixed logic circuits | Stored microprogram |
| Speed | ⚡ Faster | 🐢 Slower |
| Flexibility | ❌ Inflexible | ✅ Highly flexible |
| Modification | Requires redesign | Change microcode |
| Complexity | Increases with instruction set | Manages complexity well |
| Cost | Lower for simple ISA | Lower for complex ISA |
| Design Time | Longer | Shorter |
| Bug Fixes | New chip required | Microcode update |
| Typical Use | RISC processors | CISC processors |
| Pipelining | ✅ Natural fit | ❌ Difficult |
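The "Implementation" and "Modification" rows can be illustrated by generating the same signal both ways. The sketch below uses Zin for the Add (R3), R1 sequence from the example microroutine; the step numbering T1 to T7 (three fetch steps followed by four execute steps) and the function names are assumptions for illustration.

```python
def zin_hardwired(step, is_add):
    """Hardwired style: Zin = T1 + T6.ADD as a fixed Boolean expression."""
    return step == 1 or (step == 6 and is_add)


# Microprogrammed style: the same information lives in a table.
CONTROL_STORE = {
    0: {"PCout", "MARin", "Read", "Select4", "Add", "Zin"},
    1: {"Zout", "PCin", "Yin", "WMFC"},
    2: {"MDRout", "IRin", "BRANCH"},
    25: {"R3out", "MARin", "Read"},
    26: {"R1out", "Yin", "WMFC"},
    27: {"MDRout", "SelectY", "Add", "Zin"},
    28: {"Zout", "R1in", "End"},
}


def zin_microprogrammed(upc):
    """Zin is asserted if the addressed microinstruction contains it."""
    return "Zin" in CONTROL_STORE[upc]


# Assumed mapping from time step to control-store address for this instruction.
STEP_TO_ADDRESS = {1: 0, 2: 1, 3: 2, 4: 25, 5: 26, 6: 27, 7: 28}

agree = all(
    zin_hardwired(step, True) == zin_microprogrammed(addr)
    for step, addr in STEP_TO_ADDRESS.items()
)
print(agree)   # True
```

Changing when Zin fires means rewriting the Boolean expression (new logic) in the first style, but only editing a table entry in the second, which is the flexibility difference the table describes.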
Visual Comparison
┌─────────────────────────────────────────────────────────────────────────────┐
│                           CONTROL UNIT COMPARISON                           │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│              HARDWIRED                          MICROPROGRAMMED             │
│                                                                             │
│        ┌───────────────────┐                 ┌───────────────────┐          │
│        │                   │                 │                   │          │
│        │  ┌─────┐ ┌─────┐  │                 │   ┌───────────┐   │          │
│        │  │ AND │ │ OR  │  │                 │   │  Control  │   │          │
│        │  └──┬──┘ └──┬──┘  │                 │   │   Store   │   │          │
│        │     │       │     │                 │   │   (ROM)   │   │          │
│        │  ┌──┴───────┴──┐  │                 │   └─────┬─────┘   │          │
│        │  │     NOT     │  │                 │         │         │          │
│        │  └──────┬──────┘  │                 │   ┌─────▼─────┐   │          │
│        │         │         │                 │   │    μPC    │   │          │
│        │    Fixed Logic    │                 │   └───────────┘   │          │
│        │     Circuits      │                 │                   │          │
│        │                   │                 │  Stored Program   │          │
│        └─────────┬─────────┘                 └─────────┬─────────┘          │
│                  │                                     │                    │
│                  ▼                                     ▼                    │
│         ┌─────────────────┐                   ┌─────────────────┐           │
│         │ Control Signals │                   │ Control Signals │           │
│         └─────────────────┘                   └─────────────────┘           │
│                                                                             │
│    Speed: faster                         Speed: slower                      │
│    Flexibility: low                      Flexibility: high                  │
│    Complexity handling: weaker           Complexity handling: stronger      │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
When to Use Which?
Summary and Key Takeaways
The Big Picture
Key Concepts to Remember
Datapath Components: The processor contains registers (PC, IR, MAR, MDR, general-purpose), ALU, and buses that connect them.
Control Signals: Every data movement and operation is controlled by signals like PCout, MARin, Read, Add, etc.
Single-Bus Architecture:
Only one data transfer per clock cycle
ALU Input A: From MUX (can select Y, constant 4, etc.)
ALU Input B: ALWAYS from the bus
Y register essential for saving first operand
Z register holds ALU result
PC+4 calculation: constant 4 (Input A) + PC from bus (Input B)
Three-Bus Advantage:
Bus A and Bus B carry operands simultaneously to ALU
Bus C carries results back to registers
Dramatically reduces execution time (25-33% faster)
No Y register needed for basic ALU operations
Dedicated PC incrementer works in parallel
Hardwired Control: Uses combinational logic to generate control signals. Fast but inflexible; suited for RISC.
Microprogrammed Control: Uses a stored microprogram. Slower but flexible; suited for CISC.
WMFC (Wait for Memory Function Complete): Synchronizes processor with slower memory operations.
Y Register Purpose: In single-bus, Y holds the first operand while the second operand comes through the bus, allowing the ALU to have both inputs simultaneously.
Practice Problems
Try working through these to test your understanding:
Write the control sequence for Sub R1, R2 (subtract R2 from R1, store in R1) using single-bus organization.
How many clock cycles does Add (R3), R1 take on single-bus vs three-bus?
Design the logic circuit to generate the Read signal for fetch and memory-reference instructions.
Explain why microprogrammed control makes it easier to implement complex addressing modes.
If a processor has 32 control signals, how wide would a horizontal microinstruction be? How could vertical encoding reduce this?
In single-bus, why can't we do R6 ← R4 + R5 in one step? Trace through what would go wrong.
For three-bus architecture, explain how the register file supports reading two registers simultaneously.
Calculate the speedup if a program has 40% ALU instructions, 30% loads, 20% stores, and 10% branches when moving from single-bus to three-bus.