# Chapter 5: The Processor: Datapath and Control

## Overview

- Logic Design Conventions
- Building a Datapath and Control Unit
  - Different Implementations of MIPS instruction set
- A simple implementation of a processor
- Multicycle Implementation
- Exceptions

## Review Logic Design Conventions

- Almost ready to move into chapter 5 and start building a processor
- First, let's review Boolean Logic and build the ALU
  - Build a 1-bit adder (1-ADD) to support the MIPS instruction set, then combine 32 of them



Review: Boolean Logic (summary)

- Boolean operations
  - a AND b
    - True only when a is true and b is true
  - a OR b
    - True when either a is true or b is true, or both are true
  - NOT a
    - True when a is false, and vice versa

Review: Boolean Logic (continued)

- Boolean expressions
  - Constructed by combining together Boolean operations
    - Example: (a AND b) OR ((NOT b) AND (NOT a))
- Truth tables capture the output/value of a Boolean expression
  - A column for each input plus the output
  - A row for each combination of input values

## Boolean Logic (continued)

#### • Example:

#### (a AND b) OR ((NOT b) AND (NOT a))

Inputs: a, b. Output: value.

| a | b | $\overline{a}$ | $\overline{b}$ | $a.b$ | $\overline{b}.\overline{a}$ | $(a.b) + (\overline{b}.\overline{a})$ | value |
|---|---|---|---|-----|-----|---------------|-------|
| 0 | 0 | 1 | 1 | 0   | 1   | 1             | 1     |
| 0 | 1 | 1 | 0 | 0   | 0   | 0             | 0     |
| 1 | 0 | 0 | 1 | 0   | 0   | 0             | 0     |
| 1 | 1 | 0 | 0 | 1   | 0   | 1             | 1     |
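The table above can be reproduced mechanically. The following is a minimal Python sketch; the helper names `expr` and `truth_table` are illustrative and not part of the course material.

```python
from itertools import product

def expr(a: bool, b: bool) -> bool:
    """(a AND b) OR ((NOT b) AND (NOT a))"""
    return (a and b) or ((not b) and (not a))

def truth_table(fn, n_inputs):
    """Return one row per input combination: (inputs..., output) as 0/1."""
    rows = []
    for bits in product([0, 1], repeat=n_inputs):
        out = fn(*(bool(x) for x in bits))
        rows.append(bits + (int(out),))
    return rows

# One row per combination of a and b, in the same order as the table.
for row in truth_table(expr, 2):
    print(*row)
```

Each printed row lists a, b, and the value column; the intermediate columns of the table are the sub-expressions evaluated inside `expr`.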

## Gates


- Hardware devices built from transistors to mimic Boolean logic
- AND gate
  - Two input lines, one output line
  - Outputs a 1 when both inputs are 1
- NOT gate
  - One input line, one output line

Gates (continued)

- OR gate
  - Two input lines, one output line
  - Outputs a 1 when either input is 1



Gates (continued)

- Abstraction in hardware design
  - Map hardware devices to Boolean logic
  - Design more complex devices in terms of logic, not electronics

Building Computer Circuits

• A circuit is a collection of logic gates:

- Transforms a set of binary inputs into a set of binary outputs
- Values of the outputs depend only on the current values of the inputs
- Combinational circuits have no cycles in them (no outputs feed back into their own inputs)

Circuit Diagram (Summary)



- Every output in a circuit diagram can be represented as a Boolean expression

## A Circuit Construction Algorithm ALU

Sum-of-products algorithm is one way to design circuits:

- Truth table $\rightarrow$ Boolean expression $\rightarrow$ gate layout

A Circuit Construction Algorithm (continued)

- Sum-of-products algorithm
  - Truth table captures every input/output possible for circuit
  - Repeat process for each output line
    - Build a Boolean expression using AND and NOT for each 1 of the output line
    - Combine together all the expressions with ORs
    - Build circuit from whole Boolean expression
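The sum-of-products steps can be sketched in a few lines of Python. This is only an illustration: the function name `sum_of_products` and the convention of listing the output column in truth-table order are assumptions made for the example.

```python
from itertools import product

def sum_of_products(names, output_column):
    """Build a sum-of-products expression string for one output line.

    names: input names, e.g. ["a", "b"]
    output_column: output bits in truth-table order (00, 01, 10, 11, ...)
    """
    terms = []
    for bits, out in zip(product([0, 1], repeat=len(names)), output_column):
        if out == 1:
            # AND together every input, negating the ones that are 0 in this row.
            literals = [n if bit else f"(NOT {n})" for n, bit in zip(names, bits)]
            terms.append("(" + " AND ".join(literals) + ")")
    # OR together one term per row whose output is 1.
    return " OR ".join(terms) if terms else "0"

print(sum_of_products(["a", "b"], [1, 0, 0, 1]))
# prints: ((NOT a) AND (NOT b)) OR (a AND b)
```

For a circuit with several output lines, the same function would be called once per output column.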

## Construction of Addition Circuit 1-ADD Algorithm

$$\begin{array}{rrl} & 011000 & \leftarrow \text{carry bits } c_i \\ 13 \rightarrow & 001101 & a_i \\ +\ 14 \rightarrow & 001110 & b_i \\ \hline 27 \rightarrow & 011011 & s_i \end{array}$$

An Addition Circuit add \$\$1, \$\$2, \$\$3

- Addition circuit
  - Adds two unsigned binary integers, setting output bits and an overflow
  - Built from 1-bit adders (1-ADD)
  - Starting with rightmost bits, each pair produces
    - A value for that order
    - A carry bit for next place to the left

## An Addition Circuit (continued)

- 1-ADD truth table
  - Input
    - One bit from each input integer: a<sub>i</sub>, b<sub>i</sub>
    - One carry bit (always zero for rightmost bit): c<sub>i</sub>
  - Output
    - One bit for output place value: s<sub>i</sub>
    - One "carry" bit: c<sub>i+1</sub>

## 1-ADD Circuit

Sub-expression construction (AND & NOT Gates)

Inputs: a<sub>i</sub>, b<sub>i</sub>, c<sub>i</sub>. Outputs: s<sub>i</sub>, c<sub>i+1</sub>.

| a<sub>i</sub> | b<sub>i</sub> | c<sub>i</sub> | s<sub>i</sub> | c<sub>i+1</sub> | Term for s<sub>i</sub> |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | |
| 0 | 0 | 1 | 1 | 0 | $\overline{a}.\overline{b}.c$ |
| 0 | 1 | 0 | 1 | 0 | $\overline{a}.b.\overline{c}$ |
| 0 | 1 | 1 | 0 | 1 | |
| 1 | 0 | 0 | 1 | 0 | $a.\overline{b}.\overline{c}$ |
| 1 | 0 | 1 | 0 | 1 | |
| 1 | 1 | 0 | 0 | 1 | |
| 1 | 1 | 1 | 1 | 1 | $a.b.c$ |

1-ADD Circuit Sub-expression construction (AND & NOT Gates) OUTPUT: S<sub>i</sub>

$$(\overline{a}.\overline{b}.c) + (\overline{a}.b.\overline{c}) + (a.\overline{b}.\overline{c}) + (a.b.c)$$

Construct the circuit diagram for the sub-expression

## 1-ADD Circuit

Sub-expression construction (AND & NOT Gates)



1-ADD Circuit Sub-expression construction (AND & NOT Gates) OUTPUT: C<sub>i+1</sub>

$$(\overline{a}.b.c) + (a.\overline{b}.c) + (a.b.\overline{c}) + (a.b.c)$$

Construct the circuit diagram for the sub-expression
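As a sanity check, the two sum-of-products expressions read off the 1-ADD truth table (one term per row whose output is 1) can be evaluated for all eight input combinations. The sketch below models gates with Python bit operators; the function name `one_add` is illustrative.

```python
from itertools import product

def one_add(a, b, c):
    """1-bit adder built from sum-of-products terms; inputs are 0 or 1."""
    na, nb, nc = 1 - a, 1 - b, 1 - c  # NOT gates
    # Sum bit: rows 001, 010, 100, 111 of the truth table.
    s = (na & nb & c) | (na & b & nc) | (a & nb & nc) | (a & b & c)
    # Carry-out bit: rows 011, 101, 110, 111 of the truth table.
    carry = (na & b & c) | (a & nb & c) | (a & b & nc) | (a & b & c)
    return s, carry

# The two output bits together must equal the arithmetic sum a + b + c.
for a, b, c in product([0, 1], repeat=3):
    s, carry = one_add(a, b, c)
    assert a + b + c == 2 * carry + s
```

The loop passing without an assertion error confirms that the expressions agree with ordinary addition on every row.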

An Addition Circuit (continued)

Building the full adder

- Put rightmost bits into 1-ADD, with zero for the input carry
- Send 1-ADD's output value to output, and put its carry value as input to 1-ADD for next bits to left
- Repeat process for all bits
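The chaining procedure above can be sketched as follows. For brevity, `one_add` here computes its outputs arithmetically rather than with gates, and the `width` parameter and function names are assumptions made for the illustration.

```python
def one_add(a, b, c):
    """1-bit adder: returns (sum bit, carry-out bit)."""
    total = a + b + c
    return total & 1, total >> 1

def ripple_add(x, y, width=32):
    """Add two unsigned integers bit by bit, rightmost bit first."""
    result, carry = 0, 0  # the carry into the rightmost bit is zero
    for i in range(width):
        s, carry = one_add((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i  # place the sum bit at position i
    return result, carry  # a final carry of 1 signals unsigned overflow

print(ripple_add(13, 14, width=6))  # prints (27, 0)
```

With `width=32` the same loop corresponds to 32 chained 1-ADD units, one per bit of a MIPS word.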

## Control Circuits

- Do not perform computations
- Choose order of operations or select among data values
- Major types of control circuits
  - Multiplexors
    - Select one of inputs to send to output
  - Decoders
    - Send a 1 on exactly one output line, chosen by the value on the input lines

## Control Circuits (continued)

Multiplexor form

- 2<sup>N</sup> regular input lines
- N selector input lines
- 1 output line
- Multiplexor purpose
  - Given a code number for some input, selects that input to pass along to its output
  - Used to choose the right input value to send to a circuit (ALU, Registers..)
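A behavioral sketch of the two control circuits may help. The functions below model what the circuits compute, not how they are wired; the names and the most-significant-bit-first selector order are assumptions.

```python
def _index(select_bits):
    """Interpret selector bits (most significant first) as a number."""
    index = 0
    for bit in select_bits:
        index = (index << 1) | bit
    return index

def multiplexor(inputs, select_bits):
    """Pass the selected one of 2**N inputs to the single output."""
    if len(inputs) != 2 ** len(select_bits):
        raise ValueError("need 2**N inputs for N selector lines")
    return inputs[_index(select_bits)]

def decoder(select_bits):
    """Drive a 1 on exactly one of 2**N output lines."""
    chosen = _index(select_bits)
    return [1 if i == chosen else 0 for i in range(2 ** len(select_bits))]

print(multiplexor([0, 1, 1, 0], [1, 0]))  # selects input number 2, prints 1
print(decoder([1, 0]))                    # prints [0, 0, 1, 0]
```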

## Practice

Problem: Consider a logic function with three inputs: A, B, and C.

- Output D is true if at least one input is true
- Output E is true if exactly two inputs are true
- Output F is true only if all three inputs are true

Show the truth table for these three functions.

Show the Boolean equations for these three functions.

Show an implementation consisting of AND, OR and NOT gates.

## The Processor: *Datapath and Control* Intro

**Recall CPU Performance equation:** 

$$\text{CPU time (for a program)} = \text{Instruction Count} \times \text{CPI} \times \text{Clock cycle time}$$

Best Performance  $\rightarrow$  Optimal (minimum) CPU time

The ISA and Compiler determine the Instruction Count

#### **The Processor determines:**

- Clock cycles Per Instruction (CPI)
- Seconds per clock cycle (*Clock cycle time*)
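To make the performance equation concrete, here is a small worked calculation. The numbers are hypothetical and chosen only to show how the three factors multiply.

```python
def cpu_time_seconds(instruction_count, cpi, clock_cycle_time_s):
    """CPU time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * clock_cycle_time_s

# Hypothetical program: 1 million instructions, 2 cycles each, 1 ns clock.
t = cpu_time_seconds(1_000_000, 2, 1e-9)
print(f"{t * 1e3:.3f} ms")  # prints 2.000 ms
```

Halving either the CPI or the clock cycle time halves the CPU time, which is why the processor design targets both.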

Build a processor that exploits the MIPS ISA

The Processor: Datapath and Control

**Basic MIPS Implementation** 

Let's look at a subset of core MIPS ISA:

- The memory-reference instructions: load word (lw), store word (sw)
- The arithmetic-logical instructions: add, sub, and, or, slt
- The control flow instructions: branch equal (beq) and jump (j)

Create Datapath and design the Control for the three instruction classes

The Processor: *Datapath and Control* **Basic MIPS Implementation flow** 

 Let's examine the steps involved to implement the instruction classes

#### **1. Generic Steps**

[assume the memory unit stores instructions and supplies the instruction at a given address]

- use the program counter (PC) to supply instruction address
- get the instruction from memory
- read registers
- use the instruction to decide exactly what to do

The Processor: *Datapath and Control* **Basic MIPS Implementation flow** 

#### 2. Use ALU after reading the register(s)

- The memory-reference instructions: address calculation
- The arithmetic-logical instructions: execute operation
- Branch instruction: Comparison
#### 3. Complete the implementation based on instruction class

- The memory-reference instructions: access memory?
- The arithmetic-logical instructions: access register?
- Branch: *access PC?*

## Abstract High-level View

Basic (our subset) MIPS Implementation flow



What is wrong with this flow?

#### Logic Design Conventions MIPS Implementation

- Two types of logic elements:
  - elements that operate on data values (Combinational)
    - No memory
    - output state depends only on current input state
    - ALU: and-gate, or-gate and not-gate are combinational
  - elements that contain state (State Elements)
    - Store the state of the bit
    - Output depends on the stored bit
    - State-elements (*flip-flops, latches*)
    - Instruction, Data Memories, Registers

#### State Element



A State Element has a minimum of two inputs and one output

State Elements Clocking Methodology

 Defines when signals can be read and when they can be written



Clocking methodology avoids the scenario

## Clocks Synchronous Logic

When should an element that contains state be updated ?



## State Elements

Synchronous Digital Systems

#### Clocks

- Free-running signal with a fixed cycle time
- Decide when an element that contains state should be updated



Clock Period: High and Low

## Sequential Logic Design

- Logic Circuit representation:
  - Block of State Elements  $\rightarrow$  Combinational Logic  $\rightarrow$  State Element
  - Signals written into state elements must be valid (Stable)



After clock edge: SE-1 output changes $\rightarrow$ input to combinational logic $\rightarrow$ (stable signal) $\rightarrow$ SE-2

The clock period must be long enough for the output of the combinational logic to become stable; this sets a lower bound on the clock period.

#### **Clocking Methodology**

**Edge-triggered clocking** 

#### Edge-triggered Clocking Methodology

- Either Rising edge or Falling Edge of clock is active
- All State Changes occur on the active clock edge
  - Sampling of signals is almost instantaneous
- Choice of an active clock edge depends on implementation



## Sequential Logic

How does a Sequential Logic circuit operate?

1. When the clock issues a pulse:
   - Each SE examines the outputs of the combinational logic, flips any affected 0/1 state to the opposite state (this takes a few nanoseconds), and settles into a stable state
2. The change in state of some of the stored bits triggers changes in some combinational logic, and subsequently changes in other SEs (unstable state)
   - It takes a few nanoseconds for every output of the combinational logic block to reach a stable state
3. When all outputs of the combinational logic blocks are stable, the clock issues another pulse
   - Repeat steps 1 and 2

The clock period must be long enough to ensure that the output of the combinational logic is stable before the next pulse.
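The interplay between state elements and combinational logic can be sketched in software. In this illustrative model (the class `Register` and the 4-bit `increment` function are assumptions, not part of the slides), the stored value changes only when `clock_edge` is called, mirroring edge-triggered updating.

```python
class Register:
    """Edge-triggered state element: output changes only on the clock edge."""

    def __init__(self, value=0):
        self.value = value       # stored state, visible on the output
        self.next_value = value  # input that will be sampled at the next edge

    def clock_edge(self):
        self.value = self.next_value

def increment(x):
    """Combinational logic: output depends only on the current input."""
    return (x + 1) & 0xF

counter = Register(0)
history = []
for _ in range(3):
    counter.next_value = increment(counter.value)  # logic settles during the cycle
    counter.clock_edge()                           # state updates on the edge
    history.append(counter.value)
print(history)  # prints [1, 2, 3]
```

Because the register samples its input only at the edge, the same register can feed the logic and receive its result within one cycle.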

### Building a Datapath

Store and Access Instructions



Datapath Elements to store and Access Instructions

Instruction memory is a state element, but because it has no write function it can be treated as combinational logic. The PC is a state element.

### Building a Datapath **R-Format ALU Operations**



Units needed to implement R-format ALU operations

Register file contains $2^5$ registers (read/write) $\rightarrow$ SE



lw and sw operations



a. Data memory unit



#### Additional Units needed to implement lw and sw

Memory Unit is a SE:

- Inputs: Address and Write Data
- Output: Read Result

Sign Extension Unit: Converts a 16-bit input to 32-bit output
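The sign-extension unit copies bit 15 of its input into the upper 16 bits of the output. A minimal sketch, with an illustrative function name:

```python
def sign_extend_16_to_32(value16):
    """Extend a 16-bit two's-complement value to a 32-bit bit pattern."""
    value16 &= 0xFFFF
    if value16 & 0x8000:                # sign bit set: fill the upper half with 1s
        return value16 | 0xFFFF0000
    return value16                      # sign bit clear: upper half stays 0

print(hex(sign_extend_16_to_32(100)))     # prints 0x64
print(hex(sign_extend_16_to_32(0xFFFC)))  # prints 0xfffffffc (that is, -4)
```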

### Building Datapath

Use multiplexors to combine the components



### A Simple Implementation Scheme ALU Control

### Let's look at a subset of core MIPS ISA:

- The memory-reference instructions: load word (lw), store word (sw)
- The arithmetic-logical instructions: add, sub, and, or, set on less than (slt)
- The control flow instructions: branch equal (beq) and jump (j)

A Simple Implementation Scheme
ALU Control

- ALU has 4 control Inputs (2<sup>4</sup> combinations)
  - Examine Subset (6 combinations)

| <b>ALU Control Input</b> | Function         |
|--------------------------|------------------|
| 0000                     | AND              |
| 0001                     | OR               |
| 0010                     | add              |
| 0110                     | subtract         |
| 0111                     | set on less than |
| 1100                     | NOR              |

A Simple Implementation Scheme ALU Control

Selecting the operations to perform (ALU, read/write, etc.)

Controlling the flow of data (multiplexor inputs)

Information comes from the 32 bits of the instruction

What should the ALU do with this instruction ?

add \$8, \$17, \$18

Instruction Format:

| op     | rs    | rt    | rd    | shamt | funct  |
|--------|-------|-------|-------|-------|--------|
| 000000 | 10001 | 10010 | 01000 | 00000 | 100000 |

Compute AND, OR, subtract, add or slt depending on 6-bit funct
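The field boundaries of the R-format can be checked by slicing the 32-bit word for `add $8, $17, $18`. The sketch below uses shifts and masks; the function name `decode_r_format` is illustrative.

```python
def decode_r_format(word):
    """Split a 32-bit R-format instruction word into its six fields."""
    return {
        "op": (word >> 26) & 0x3F,     # bits 31:26
        "rs": (word >> 21) & 0x1F,     # bits 25:21
        "rt": (word >> 16) & 0x1F,     # bits 20:16
        "rd": (word >> 11) & 0x1F,     # bits 15:11
        "shamt": (word >> 6) & 0x1F,   # bits 10:6
        "funct": word & 0x3F,          # bits 5:0
    }

word = 0b000000_10001_10010_01000_00000_100000  # add $8, $17, $18
fields = decode_r_format(word)
print(hex(word), fields)
```

Here `rs` and `rt` decode to 17 and 18, `rd` to 8, and `funct` to 32 (binary 100000), which selects add.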

**ALU Control** 

#### What should the ALU do with this instruction?

lw \$1, 100(\$2)

| op | rs | rt | 16-bit offset |
|----|----|----|---------------|
| 35 | 2  | 1  | 100           |

Compute the memory address...... But how?
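One way to picture the answer: the ALU adds the base register named by rs to the sign-extended 16-bit offset. The sketch below assumes a hypothetical register-file content (`registers[2] = 0x1000`) purely for illustration.

```python
def effective_address(registers, rs, offset16):
    """Memory address for lw/sw: base register plus sign-extended offset."""
    offset = offset16 & 0xFFFF
    if offset & 0x8000:          # negative 16-bit offset
        offset -= 0x10000
    return (registers[rs] + offset) & 0xFFFFFFFF  # ALU performs an add

registers = [0] * 32
registers[2] = 0x1000            # hypothetical value held in $2
print(hex(effective_address(registers, 2, 100)))  # lw $1, 100($2) -> 0x1064
```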

Computer Architecture CS 35101-002

### **ALU Control**

Need a mapping for hardware to compute the 4-bit ALU control input signals

#### **Instruction Type:**

| opcode     | ALUop (is set to) |
|------------|-------------------|
| lw, sw     | 00                |
| beq        | 01                |
| arithmetic | 10                |

Function Code for Arithmetic.....

## **ALU Control Bit Settings**

| Instruction<br>opcode | ALUop | Instruction<br>Operation | Funct<br>Field | Desired ALU<br>action | ALU<br>control<br>input |
|-----------------------|-------|--------------------------|----------------|-----------------------|-------------------------|
| lw                    | 00    | load word                | XXXXXX         | add                   | 0010                    |
| sw                    | 00    | store word               | XXXXXX         | add                   | 0010                    |
| Branch<br>equal       | 01    | branch equal             | XXXXXX         | subtract              | 0110                    |
| R-type                | 10    | add                      | 100000         | add                   | 0010                    |
| R-type                | 10    | subtract                 | 100010         | subtract              | 0110                    |
| R-type                | 10    | AND                      | 100100         | and                   | 0000                    |
| R-type                | 10    | OR                       | 100101         | or                    | 0001                    |
| R-type                | 10    | set on less than         | 101010         | set on less than      | 0111                    |

X ~ don't care bit

## **ALU Control Bit Settings**

- Control bit settings depend on:
  - ALUop bits
  - Function field codes for R-type instruction

| Instruction<br>opcode | ALUop | Funct Field | ALU control<br>input |
|-----------------------|-------|-------------|----------------------|
| lw                    | 00    | XXXXXX      | 0010                 |
| sw                    | 00    | XXXXXX      | 0010                 |
| Branch equal          | 01    | XXXXXX      | 0110                 |
| R-type                | 10    | 100000      | 0010                 |
| R-type                | 10    | 100010      | 0110                 |
| R-type                | 10    | 100100      | 0000                 |
| R-type                | 10    | 100101      | 0001                 |
| R-type                | 10    | 101010      | 0111                 |

### ALU Control Logic Truth Table (the 3-ALU control bits)

| ALUOp1 | ALUOp0 | F5 | F4 | F3 | F2 | F1 | F0 | Operation (Output) |
|--------|--------|----|----|----|----|----|----|--------------------|
| 0      | 0      | X  | X  | X  | X  | X  | X  | 0010               |
| X      | 1      | X  | X  | X  | X  | X  | X  | 0110               |
| 1      | X      | X  | X  | 0  | 0  | 0  | 0  | 0010               |
| 1      | X      | X  | X  | 0  | 0  | 1  | 0  | 0110               |
| 1      | X      | X  | X  | 0  | 1  | 0  | 0  | 0000               |
| 1      | X      | X  | X  | 0  | 1  | 0  | 1  | 0001               |
| 1      | X      | X  | X  | 1  | 0  | 1  | 0  | 0111               |

ALUOp1 and ALUOp0 form the ALUOp input; F5 to F0 are the bits of the funct field.

ALU Control function has 3-distinct outputs: operation2 operation1 operation0
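The ALU control truth table can be read as a small function of ALUOp and the funct field. The sketch below is a behavioral model under that reading; the function name and the choice to return the control input as a 4-character string are assumptions.

```python
# Low four funct bits (F3..F0) -> ALU control input, for R-type instructions.
R_TYPE_CONTROL = {
    0b0000: "0010",  # add
    0b0010: "0110",  # subtract
    0b0100: "0000",  # AND
    0b0101: "0001",  # OR
    0b1010: "0111",  # set on less than
}

def alu_control(alu_op, funct):
    """Map 2-bit ALUOp and 6-bit funct to the 4-bit ALU control input."""
    if alu_op == 0b00:       # lw, sw: add to form the address
        return "0010"
    if alu_op & 0b01:        # beq (ALUOp0 = 1): subtract to compare
        return "0110"
    # ALUOp1 = 1: R-type, F5 and F4 are don't-care bits.
    return R_TYPE_CONTROL[funct & 0xF]

print(alu_control(0b10, 0b100000))  # add -> 0010
print(alu_control(0b10, 0b101010))  # slt -> 0111
```

A funct value outside the five listed rows raises a `KeyError` in this sketch, since the table does not define an output for it.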

### **ALU Control Logic**

#### Mapping of ALU Control Function to Gates



What is the mapping for the ALU control functions?



| ALUOp1 | ALUOp0 | F5 | F4 | F3 | F2 | F1 | F0 | Operation (Output) |
|--------|--------|----|----|----|----|----|----|--------------------|
| 0      | 0      | X  | X  | X  | X  | X  | X  | 0010               |
| X      | 1      | X  | X  | X  | X  | X  | X  | 0110               |
| 1      | X      | X  | X  | 0  | 0  | 0  | 0  | 0010               |
| 1      | X      | X  | X  | 0  | 0  | 1  | 0  | 0110               |
| 1      | X      | X  | X  | 0  | 1  | 0  | 0  | 0000               |
| 1      | X      | X  | X  | 0  | 1  | 0  | 1  | 0001               |
| 1      | X      | X  | X  | 1  | 0  | 1  | 0  | 0111               |

| ALU Control Input | Function         |
|-------------------|------------------|
| 0000              | AND              |
| 0001              | OR               |
| 0010              | add              |
| 0110              | subtract         |
| 0111              | set on less than |

### Designing the Main Control Unit Datapath

Let's examine MIPS Instruction Fields & Control Lines:

| Field           | op     | rs     | rt     | rd     | shamt  | funct  |
|-----------------|--------|--------|--------|--------|--------|--------|
| Bit<br>Position | 31:26  | 25:21  | 20:16  | 15:11  | 10:6   | 5:0    |
| # of bits       | 6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits |

#### **R-Type Instruction**

#### **Observations:**

- The op field in bits 31:26 is always set to 000000
- Source registers: identified in bits 25:21 and bits 20:16, respectively
- **Destination** register: identified in bits 15:11
- The shamt field is never used (ignore)
- The funct field is coded as per the ALU design

### Designing the Main Control Unit Datapath

Load word, store word and branch-on-equal:

| Field           | ор     | rs     | rt     | address |
|-----------------|--------|--------|--------|---------|
| Bit<br>Position | 31:26  | 25:21  | 20:16  | 15:0    |
| # of bits       | 6 bits | 5 bits | 5 bits | 16 bits |

#### I-Type Instruction

**Observations:** 

- The op field is in the same position as in the R-type format
- The base register (rs) and rt fields are in the same positions as in the R-type format
- The address field is in bits 15:0

### Designing the Main Control Unit Observations

- The Op field is always in bits 31:26
  - Labeled as Op5, Op4, Op3, Op2, Op1, Op0
- Every instruction reads register specified by rs field
- Every instruction, except load word, reads the register specified by rt field
  - The load word writes to the **rt** field
- The base register for load and store word is always specified by rs field
- The destination register is one of two places:
  - For *load word*, it is in the **rt** field (bits 20:16)
  - For an R-type instruction, it is in the **rd** field (bits 15:11)
  - Thus a multiplexor needs to be added to the datapath to select the correct field for the write register input

Designing the Main Control Unit Effects of control signals

#### Fig 5.17 shows the datapath with 7 control lines

| Signal Name | Effect if 0                                                | Effect if 1                                            |
|-------------|------------------------------------------------------------|--------------------------------------------------------|
| RegDst      | Destination number of write register is in <b>rt</b> field | Destination write register is in <b>rd</b> field       |
| RegWrite    | None                                                       | Store Write data input into<br>Destination register    |
| ALUSrc      | Send rt register to ALU                                    | Send sign-extended lower 16 bits of instruction to ALU |
| PCSrc       | Send PC+4 to PC                                            | Send branch target to PC                               |
| MemRead     | None                                                       | Read contents of addressed memory word                 |
| MemWrite    | None                                                       | Store Write data into addressed memory word            |
| MemtoReg    | Send ALU result to register file                           | Send data memory word to register file                 |


Designing the Main Control Unit Effects of control signals

- Fig 5.17 shows two additional control lines: ALUOp1 and ALUOp0 (ALUOp)
- The states of all control lines (except PCSrc) are determined by Op code field
- The state of PCSrc is determined by
  - AND gate with inputs: Zero output of ALU and Branch control line
  - The Branch control line is asserted if Op field equals 000100 (beq)

### Designing the Main Control Unit Control unit computes settings of control lines

| Instruction | RegDst | ALUSrc | Memto-<br>Reg | Reg<br>Write | Mem<br>Read | Mem<br>Write | Branch | ALUOp1 | ALUOp0 |
|-------------|--------|--------|---------------|--------------|-------------|--------------|--------|--------|--------|
| R-format    | 1      | 0      | 0             | 1            | 0           | 0            | 0      | 1      | 0      |
| lw          | 0      | 1      | 1             | 1            | 1           | 0            | 0      | 0      | 0      |
| sw          | X      | 1      | X             | 0            | 0           | 1            | 0      | 0      | 0      |
| beq         | X      | 0      | X             | 0            | 0           | 0            | 1      | 0      | 1      |
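The control settings can be modeled as a lookup keyed by the 6-bit opcode, with PCSrc derived from Branch and the ALU Zero output. In this sketch the dictionary layout is an assumption, don't-care entries (X) are stored as `None`, and the two ALUOp bits are combined into one 2-bit value.

```python
# Opcode -> control line settings (None marks a don't-care entry).
CONTROL = {
    0b000000: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,       # R-format
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    0b100011: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,       # lw
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    0b101011: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,  # sw
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    0b000100: dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,  # beq
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}

def pc_src(opcode, alu_zero):
    """PCSrc = Branch AND Zero: take the branch target only for a taken beq."""
    return CONTROL[opcode]["Branch"] & alu_zero

print(pc_src(0b000100, 1))  # beq with equal operands -> 1
print(pc_src(0b100011, 1))  # lw never redirects the PC -> 0
```

Only `RegWrite`, `MemRead`, and `MemWrite` change stored state, which is why the don't-care entries for sw and beq are safe.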

- R-format instructions: add, sub, slt
- Source register fields ~ rs and rt
- Destination register field ~ rd

add \$t1, \$t2, \$s1

### The datapath in operation

| ALUOp1 | ALUOp0 | F5 | F4 | F3 | F2 | F1 | F0 | Operation (Output) |
|--------|--------|----|----|----|----|----|----|--------------------|
| 0      | 0      | X  | X  | X  | X  | X  | X  | 0010               |
| X      | 1      | X  | X  | X  | X  | X  | X  | 0110               |
| 1      | X      | X  | X  | 0  | 0  | 0  | 0  | 0010               |
| 1      | X      | X  | X  | 0  | 0  | 1  | 0  | 0110               |
| 1      | X      | X  | X  | 0  | 1  | 0  | 0  | 0000               |
| 1      | X      | X  | X  | 0  | 1  | 0  | 1  | 0001               |
| 1      | X      | X  | X  | 1  | 0  | 1  | 0  | 0111               |

| Signal Name | Effect if 0                                                | Effect if 1                                            |
|-------------|------------------------------------------------------------|--------------------------------------------------------|
| RegDst      | Destination number of write register is in <b>rt</b> field | Destination write register is in <b>rd</b> field       |
| RegWrite    | None                                                       | Store Write data input into Destination register       |
| ALUSrc      | Send rt register to ALU                                    | Send sign-extended lower 16 bits of instruction to ALU |
| PCSrc       | Send PC+4 to PC                                            | Send branch target to PC                               |
| MemRead     | None                                                       | Read contents of addressed memory word                 |
| MemWrite    | None                                                       | Store Write data into addressed memory word            |
| MemtoReg    | Send ALU result to register file                           | Send data memory word to register file                 |

| Instruction | RegDst | ALUSrc | MemtoReg | RegWrite | MemRead | MemWrite | Branch | ALUOp1 | ALUOp0 |
|-------------|--------|--------|----------|----------|---------|----------|--------|--------|--------|
| R-format    | 1      | 0      | 0        | 1        | 0       | 0        | 0      | 1      | 0      |
| lw          | 0      | 1      | 1        | 1        | 1       | 0        | 0      | 0      | 0      |
| sw          | X      | 1      | X        | 0        | 0       | 1        | 0      | 0      | 0      |
| beq         | X      | 0      | X        | 0        | 0       | 0        | 1      | 0      | 1      |

### The datapath in operation Truth Table

| Input/<br>Output | Signal<br>Name | R-format | lw | sw | beq |
|------------------|----------------|----------|----|----|-----|
| Inputs           | Op5            | 0        | 1  | 1  | 0   |
|                  | Op4            | 0        | 0  | 0  | 0   |
|                  | Op3            | 0        | 0  | 1  | 0   |
|                  | Op2            | 0        | 0  | 0  | 1   |
|                  | Op1            | 0        | 1  | 1  | 0   |
|                  | Op0            | 0        | 1  | 1  | 0   |
| Outputs          | RegDst         | 1        | 0  | X  | Х   |
|                  | ALUSrc         | 0        | 1  | 1  | 0   |
|                  | MemtoReg       | 0        | 1  | X  | X   |
|                  | RegWrite       | 1        | 1  | 0  | 0   |
|                  | MemRead        | 0        | 1  | 0  | 0   |
|                  | MemWrite       | 0        | 0  | 1  | 0   |
|                  | Branch         | 0        | 0  | 0  | 1   |
|                  | ALUOp1         | 1        | 0  | 0  | 0   |
|                  | ALUOp0         | 0        | 0  | 0  | 1   |



### Single-cycle implementation: Let's summarize

Major functional units used by each instruction class in a single clock cycle:

| Instruction Class | Instruction Memory | Register Read   | ALU Operation | Data Memory   | Register Write  |
|-------------------|--------------------|-----------------|---------------|---------------|-----------------|
| R-type            | Fetch instruction  | Access register | ALU           |               | Register access |
| Load word         | Fetch instruction  | Access register | ALU           | Memory access | Register access |
| Store word        | Fetch instruction  | Access register | ALU           | Memory access |                 |
| Branch            | Fetch instruction  | Access register | ALU           |               |                 |

#### Other Units: PC, Control, Multiplexors

### Single-cycle implementation: Let's summarize (continued)

Assume the following critical operation times: memory units ~200 ps, ALU and adders ~100 ps, register file (read or write) ~50 ps. Estimate the clock cycle for the machine.

| Instruction Class | Instruction Memory (ps) | Register Read (ps) | ALU Operation (ps) | Data Memory (ps) | Register Write (ps) | Period |
|-------------------|-------------------------|--------------------|--------------------|------------------|---------------------|--------|
| R-type            | 200                     | 50                 | 100                | 0                | 50                  | 400 ps |
| Load word         | 200                     | 50                 | 100                | 200              | 50                  | 600 ps |
| Store word        | 200                     | 50                 | 100                | 200              | 0                   | 550 ps |
| Branch            | 200                     | 50                 | 100                | 0                | 0                   | 350 ps |

The single clock cycle is determined by the longest instruction period: **600 ps** (load word).
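The estimate above can be reproduced with a short calculation. This is a minimal sketch in Python, assuming the latencies stated on this slide; the dictionary and function names are hypothetical and chosen only for illustration.

```python
# Unit latencies in picoseconds, as assumed on the slide.
UNIT_LATENCY_PS = {
    "instruction_memory": 200,
    "register_read": 50,
    "alu": 100,
    "data_memory": 200,
    "register_write": 50,
}

# Functional units each instruction class passes through (illustrative encoding).
UNITS_USED = {
    "R-type": ["instruction_memory", "register_read", "alu", "register_write"],
    "Load word": ["instruction_memory", "register_read", "alu",
                  "data_memory", "register_write"],
    "Store word": ["instruction_memory", "register_read", "alu", "data_memory"],
    "Branch": ["instruction_memory", "register_read", "alu"],
}


def instruction_period_ps(instruction_class):
    """Sum the latencies of the units used by one instruction class."""
    return sum(UNIT_LATENCY_PS[unit] for unit in UNITS_USED[instruction_class])


def clock_cycle_ps():
    """A single-cycle clock must accommodate the slowest instruction class."""
    return max(instruction_period_ps(c) for c in UNITS_USED)


periods = {c: instruction_period_ps(c) for c in UNITS_USED}
print(periods)
print("clock cycle:", clock_cycle_ps(), "ps")
```

Every instruction pays the load-word period in this design, even a branch that needs only 350 ps; that inefficiency motivates the multicycle approach.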

### A Multicycle Implementation Datapath: Basic Idea ("Divide and Conquer")

- Break MIPS Instructions into independent and "manageable" steps
  - Balance workload across steps
- Each step takes one clock cycle
- Re-use functional units in different steps
  - Pack as much work as possible into each step
  - At most one ALU operation, one register file access or one memory access per step

### A Multicycle Datapath Implementation Approach

- A single memory replaces the separate instruction memory and data memory of the single-cycle approach
- A single ALU performs all additions (thereby eliminating the two adders of the single-cycle datapath)
- A 32-bit Instruction Register (IR) is added to store the instruction fetched from memory
- A 32-bit Memory Data Register (MDR) is added to hold data fetched from memory
- Two 32-bit registers, A and B, are added to store the values of source registers *rs* and *rt*, respectively
- A 32-bit register, ALUOut, is added to temporarily hold the result of the ALU
- A finite state machine (FSM) determines the control signals (instead of the instruction alone)

### A Multicycle Datapath Implementation: Actions of the 1-bit Control Signals

| Signal Name | Effect if 0                                                | Effect if 1                                      |
|-------------|------------------------------------------------------------|--------------------------------------------------|
| RegDst      | Write register number comes from the <b>rt</b> field       | Write register number comes from the <b>rd</b> field |
| RegWrite    | None                                                       | Store Write data input into Destination register |
| ALUSrcA     | First ALU Operand = PC                                     | First ALU Operand = A register                   |
| MemRead     | None                                                       | Read contents of addressed memory word           |
| MemWrite    | None                                                       | Store Write data into addressed memory word      |
| MemtoReg    | Send ALUOut result to register file                        | Send MDR to register file                        |
| IorD        | Memory address is PC                                       | Memory address is ALUOut                         |
| IRWrite     | None                                                       | Store memory word into the IR                    |
| PCWrite     | None                                                       | Store a new value into the PC                    |
| PCWriteCond | None                                                       | Store a new value into PC if ALU result is 0     |

### A Multicycle Datapath Implementation: Actions of the 2-bit Control Signals

| Signal Name | Value | Effect                                                                            |
|-------------|-------|-----------------------------------------------------------------------------------|
| ALUOp       | 00    | ALU performs an ADD operation                                                     |
|             | 01    | ALU performs a subtract operation                                                 |
|             | 10    | ALU performs operation specified by <i>funct</i> field of the IR                  |
| ALUSrcB     | 00    | Second ALU Operand is the B register                                              |
|             | 01    | Second ALU Operand is the constant 4                                              |
|             | 10    | Second ALU Operand is sign-extended lower 16 bits of the IR                       |
|             | 11    | Second ALU Operand is sign-extended lower 16 bits of the IR shifted left 2 places |
| PCSource    | 00    | New PC value comes from ALU (PC+4)                                                |
|             | 01    | New PC value comes from branch target address in ALUOut                           |
|             | 10    | New PC value is jump target address                                               |
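The operand selection described by ALUSrcA and ALUSrcB can be modeled as two multiplexors. The following is a minimal sketch in Python, not a hardware description; the function names are hypothetical, and values are treated as plain Python integers rather than 32-bit wires.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def alu_operand_a(alu_src_a, pc, a):
    """ALUSrcA = 0 selects the PC, ALUSrcA = 1 selects the A register."""
    return a if alu_src_a else pc


def alu_operand_b(alu_src_b, b, ir):
    """Select the second ALU operand from the 2-bit ALUSrcB value."""
    immediate = sign_extend16(ir)
    if alu_src_b == 0b00:
        return b                # B register
    if alu_src_b == 0b01:
        return 4                # constant 4 (used for PC + 4)
    if alu_src_b == 0b10:
        return immediate        # sign-extended lower 16 bits of the IR
    return immediate << 2       # 0b11: sign-extended and shifted left 2 places


# Example: an IR whose lower 16 bits are 0xFFFE encodes the offset -2.
print(alu_operand_b(0b10, 7, 0x0000FFFE))   # offset in bytes for lw/sw
print(alu_operand_b(0b11, 7, 0x0000FFFE))   # offset in words, scaled to bytes
```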


### Breaking the Instructions Example: add \$t0, \$t1, \$t2

- The add instruction changes the value of a register (Reg).
  - The register is specified by bits 15:11 of the instruction.
- The instruction is specified by the PC.
- The new value is the sum ("op") of two registers.
  - The registers are specified by bits 25:21 and 20:16 of the instruction.

Could break down to:

- IR <= Memory[PC]
- A <= Reg[IR[25:21]]
- B <= Reg[IR[20:16]]
- ALUOut <= A op B
- Reg[IR[15:11]] <= ALUOut
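The five register transfers above can be traced in software. This is a minimal sketch in Python, assuming memory is a dictionary keyed by address and the register file is a list of 32 integers; `field` and `run_add` are hypothetical helper names. The encoding uses the standard MIPS register numbers (\$t0 = 8, \$t1 = 9, \$t2 = 10) and the `add` funct code 0x20.

```python
def field(word, hi, lo):
    """Return bits hi..lo (inclusive) of a 32-bit word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def run_add(memory, reg, pc):
    """Trace the register transfers for one add instruction."""
    ir = memory[pc]                       # IR <= Memory[PC]
    a = reg[field(ir, 25, 21)]            # A <= Reg[IR[25:21]]
    b = reg[field(ir, 20, 16)]            # B <= Reg[IR[20:16]]
    alu_out = (a + b) & 0xFFFFFFFF        # ALUOut <= A op B (op is add here)
    reg[field(ir, 15, 11)] = alu_out      # Reg[IR[15:11]] <= ALUOut
    return reg


# add $t0, $t1, $t2  ->  op=0, rs=9, rt=10, rd=8, shamt=0, funct=0x20
ADD_T0_T1_T2 = (0 << 26) | (9 << 21) | (10 << 16) | (8 << 11) | 0x20

memory = {0: ADD_T0_T1_T2}
reg = [0] * 32
reg[9], reg[10] = 5, 7
run_add(memory, reg, pc=0)
print(hex(ADD_T0_T1_T2), reg[8])
```

The sketch leaves out the PC update, which the multicycle datapath performs during instruction fetch.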

### A Multicycle Datapath Implementation: Five Execution Steps

1. Instruction Fetch
2. Instruction Decode and Register Fetch
3. Execution, Memory Address Computation, or Branch Completion
4. Memory Access or R-type Instruction Completion
5. Memory Read Completion

### 1. Instruction Fetch

- Use PC to get instruction and put it in the Instruction Register.
- Increment the PC by 4 and put the result back in the PC.

```
IR <= Memory[PC];
PC <= PC + 4;
```

Why update the PC now?

Signal settings:

| Signal Name | Value | Effect                                 |
|-------------|-------|----------------------------------------|
| ALUSrcA     | 0     | First ALU operand = PC                 |
| MemRead     | 1     | Read contents of addressed memory word |
| IorD        | 0     | Memory address is PC                   |
| IRWrite     | 1     | Store memory word into the IR          |
| PCWrite     | 1     | Store a new value into the PC          |
| ALUSrcB     | 01    | Second ALU operand is the constant 4   |
| ALUOp       | 00    | ALU performs an ADD operation          |
| PCSource    | 00    | New PC value comes from ALU (PC+4)     |

### 2. Instruction Decode and Register Fetch

- Read registers *rs* and *rt* in case we need them
- Compute the branch address in case the instruction is a branch
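The two actions above can be sketched as one function. This is a minimal illustration in Python, assuming the register file is a list of 32 integers and that `pc` already holds PC + 4 from the fetch step; `decode_and_register_fetch` is a hypothetical name. It follows the register transfers A <= Reg[IR[25:21]], B <= Reg[IR[20:16]], and ALUOut <= PC + (sign-extend(IR[15:0]) << 2).

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode_and_register_fetch(ir, pc, reg):
    """Read rs and rt, and speculatively compute the branch target."""
    a = reg[(ir >> 21) & 0x1F]                               # A <= Reg[IR[25:21]]
    b = reg[(ir >> 16) & 0x1F]                               # B <= Reg[IR[20:16]]
    alu_out = (pc + (sign_extend16(ir) << 2)) & 0xFFFFFFFF   # branch target
    return a, b, alu_out


# beq $s2, $s3, +3 words: opcode 4, rs = 18, rt = 19, offset = 3
beq = (4 << 26) | (18 << 21) | (19 << 16) | 3
regs = list(range(32))            # register i holds the value i, for illustration
print(decode_and_register_fetch(beq, pc=8, reg=regs))
```

Both results are computed before the instruction type is known. The work is harmless if the instruction turns out not to need it, and it saves a cycle when it does.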

What is the signal setting for step 2?

### 3. Execution, Memory Address Computation, or Branch Completion

- The ALU performs one of three functions, based on instruction type
  - Memory reference: ALUOut <= A + sign-extend(IR[15:0]);
  - R-type: ALUOut <= A op B;
  - Branch: if (A == B) PC <= ALUOut;

Determine the signal settings for each instruction type

### 4. Memory Access or R-type Instruction Completion

- Loads and stores access memory
  - MDR <= Memory[ALUOut]; or
  - Memory[ALUOut] <= B;
- R-type instructions finish
  - Reg[IR[15:11]] <= ALUOut;

### 5. Memory Read Completion

Reg[IR[20:16]] <= MDR;

Only the *load word* instruction executes this step

### A Multicycle Datapath Implementation: Let's summarize

| Step Name | Action for R-type instructions | Action for memory-reference instructions | Action for branches |
|-----------|--------------------------------|------------------------------------------|---------------------|
| Instruction fetch | IR <= Memory[PC]<br>PC <= PC + 4<br>(all instruction classes) | | |
| Instruction decode/register fetch | A <= Reg[IR[25:21]]<br>B <= Reg[IR[20:16]]<br>ALUOut <= PC + (sign-extend(IR[15:0]) << 2)<br>(all instruction classes) | | |
| Execution, address computation, branch/jump completion | ALUOut <= A op B | ALUOut <= A + sign-extend(IR[15:0]) | If (A == B) PC <= ALUOut |
| Memory access or R-type completion | Reg[IR[15:11]] <= ALUOut | Load: MDR <= Memory[ALUOut] or<br>Store: Memory[ALUOut] <= B | |
| Memory read completion | | Load: Reg[IR[20:16]] <= MDR | |

### Multicycle Datapath Example

How many cycles will it take to execute the following MIPS code?

    lw  $s2, 0($s3)
    lw  $s3, 4($s3)
    beq $s2, $s3, Label    # assume branch not taken
    add $s5, $s2, $s3
    sw  $s5, 8($s3)
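One way to work out the answer is to count the steps each instruction needs according to the summary table: loads use all five steps, stores and R-type instructions finish in four, and branches finish in three. The sketch below is in Python and assumes the branch is not taken, so the `add` and `sw` that follow it execute; the names `CYCLES` and `program` are hypothetical.

```python
# Cycles per instruction in the multicycle datapath, from the summary table.
CYCLES = {
    "lw": 5,    # fetch, decode, address computation, memory access, read completion
    "sw": 4,    # fetch, decode, address computation, memory access
    "add": 4,   # fetch, decode, execute, R-type completion
    "beq": 3,   # fetch, decode, branch completion
}

# The instruction sequence from the example, in execution order.
program = ["lw", "lw", "beq", "add", "sw"]

total_cycles = sum(CYCLES[op] for op in program)
print(total_cycles)   # 5 + 5 + 3 + 4 + 4 = 21
```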

### Finite State Machine (FSM)

Overview

- Subsystems characterized by:
  - A set of **well-defined** states (States: 1, 2, 3, 4, 5)
  - A state function, typically determined by
    - Current state
    - Input values
  - An output function, typically determined by
    - Current state
    - Input values
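As an illustration of a state function, the five execution steps of the multicycle datapath can be treated as states 1 to 5, with the instruction class as the input. This is a minimal sketch in Python under that assumption; the state names, class labels, and helper functions are hypothetical, and the output function (the control signals for each state) is left out.

```python
FETCH, DECODE, EXECUTE, MEMORY, READ_COMPLETION = 1, 2, 3, 4, 5


def next_state(state, instruction_class):
    """State function: the next state depends on the current state and the input."""
    if state == FETCH:
        return DECODE
    if state == DECODE:
        return EXECUTE
    if state == EXECUTE:
        # Branches complete in step 3; everything else continues to step 4.
        return FETCH if instruction_class == "branch" else MEMORY
    if state == MEMORY:
        # Only loads need the memory read completion step.
        return READ_COMPLETION if instruction_class == "load" else FETCH
    return FETCH  # state 5 always returns to instruction fetch


def trace(instruction_class):
    """List the states visited while executing one instruction."""
    states = [FETCH]
    while True:
        following = next_state(states[-1], instruction_class)
        if following == FETCH:
            return states
        states.append(following)


for kind in ("rtype", "load", "store", "branch"):
    print(kind, trace(kind))
```

The length of each trace equals the number of clock cycles the instruction class takes.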