Ch1: Computer Abstractions and Technology

先复习重点,考古题上的两个考点

考古题

1. Performance and CPU Frequency

Power Trends
Power = Capacitive load Voltage(电压)^2 Frequency(频率)

BUT
the power wll

  • we cannot reduce voltage further
  • we cannot remove more heat
  1. how can we improve performance?

NCTU_CO - 图1

  1. how can measure execution time?

image.png

  1. CPU Time

NCTU_CO - 图3

  • Performance improved by
    • Reduce number of clock cycles
    • increase clock rate
    • hardware designer must often trade off clock rate against cycle count
  1. Different numbers of cycles for different instructions

image.png

  • Multiplication takes more time than addition
  • Floating point operations take longer than integer ones
  • Access memory takes more time than access registers
  • important point: changing the cycle time often changes the number of cycles required for various instructions(more later)

2. Performance and CPI

  1. Instruction Count and CPI

NCTU_CO - 图5
NCTU_CO - 图6

  • Instruction Count for a program
    • determined by program, ISA and compiler
  • Average cycles per instruction
    • Determined by CPU hardware
    • if different instructions have different CPI
      • Average CPI affected by instruction mix

image.png

3. Performance Summary

NCTU_CO - 图8

  • Performance depends on

    • Algorithm: affects IC, possibly CPI
    • Programming language: affects IC, CPI
    • Compiler: affects IC, CPI
    • ISA(instruction set architecture): affects IC, CPI, T

      4. Pitfall: MIPS as a Performance Metric

      MIPS: Millions of Instructions Per Second
  • Does not account for

    • differences in ISAs between computers
    • differences in complexity between instructions

NCTU_CO - 图9

  • CPI varies between programs on a given CPU

Ch2: Instructions:Language of the Computer

复习重点:

考古题

1. Design Principle

  1. Simplicity favours regularity
  2. Small is faster
  3. Make the common case fast
  4. Good Design demands good compromises

2. Procedure call

3. Instruction Format and Addressing Mode

4. Synchronization and Linkage

Instruction set:

  • the repertoire of the instructions of a computer
  • Different computers have different instruction sets(But with many aspects in common)
  • Early computers had very simple Instruction sets(and simplified implementation)
  • Many modern computers also have simple instruction sets

MIPS Instruction Set

MIPS Instructions

Arithmetic Operations

  • Add and subtract three operands

    • Two sources and one destination
      1. add a,b,c # a gets b+c
  • All arithmetic operations have this form

  • Design Principle 1: Simplicity favours regularity
    • regularity makes implementation simpler
    • Simplicity enables higher performance at low cost

C Code

  1. f = (g + h) - (i + j);

Compiled MIPS Code

  1. add t0, g, h # temp t0 = g + h
  2. add t1, i, j # temp t1 = i + j
  3. sub f ,t0,t1 # f = t0 - t1

Register Operands

  • Arithmetic Operation use register Operands
  • MIPS has a 32 * 32-bit register file
    • Use for frequently accessed data
    • Numbered 0 to 31
    • 32-bit data called a “word”
  • Assembler names
    • $t0, $t1, …, $t9 for temporary values
    • $s0,$s1, …, $s7 for saved variables
  • Design Principle 2: Small is faster

C Code

  1. f = (g + h) - (i + j);

Compiled MIPS Code

  1. add $t0, $s1, $s2
  2. add $t1, $s3, $s4
  3. sub $s0, $t0, $t1

Memory Operands

  • Main memory used for composite data
    • Arrays, Structures, dynamic data
  • To apple arithmetic operations
    • load values from memory into registers(lw $t0, ,,,)
    • store values from register to memory(sw $t0,…)
  • Memory is byte addressed
    • each address identifies an 8-bit byte
  • Words are aligned in memory
    • Address must be multiple of 4
  • MIPS is Big Endian
    • Most-significant byte at least address of a word
    • Little Endian: Least-significant byte at least address

例如有数据0x01234567 (注意:左边是高字节,右边是低字节)
Big Endian:
存放的地址依次为:0x0001 0x0002 0x0003 0x0004
对应的数据依次为:0x01 0x23 0x45 0x67
示意图如下:
image.png
Little Endian:
存放的地址依次为:0x0001 0x0002 0x0003 0x0004
对应的数据依次为:0x67 0x45 0x23 0x01
示意图如下:
image.png
Registers vs. Memory

  • registers are faster to access than memory
  • Operating on memory data requires loads and stores
    • More instruction to be executed
  • Compiler must use registers for variables as much as possible

    • only spill to memory for less frequently used variables
    • register optimization is important

      Immediate Operands

  • Constant data specified in an instruction(addi $s3, $s3, 4)

  • No subtract immediate instruction(addi $s2, $s1, -1)
    • just use a negative constant
  • Design Principle 3: Make the common case fast
    • Small constants are common
    • Immediate operand avoids a load instruction

Constant Zero

  • MIPS register 0($zero) is the constant 0
    • cannot be overwritten
  • Useful for common operations
    • E.g., move between registers(add $t2, $s1, $zero)

Unsigned Binary Integers
image.png

2s-Complement Signed Integers
image.png

  • Bit 31 is sign bit
    • 1 for negative numbers
    • 0 for non-negative numbers
  • Signed Negation
    • Complement and add 1

NCTU_CO - 图14

Sign Extension

(在不同的 bit number 中做一个转换,比如一个负数可以用 4-bit 来表示,也可以用 8-bit 来表示,需要中间去做一个转换才行)

  • Representing a number using more bits(Preserve the numeric value)
  • In MIPS instruction set
    • addi: extend immediate value
    • lb, lh: extend loaded byte/halfword
    • beq, bne: extend the displacement
  • Replicate the sign bit to the left
    • unsigned values: extend with 0s
  • Examples: 8-bit to 16-bit

    • +2: 0000 0010 => 0000 0000 0000 0010
    • -2: 1111 1110 => 1111 1111 1111 1110

      Representing Instructions

  • instructions are encoded in binary(Called machine code)

  • MIPS Instructions
    • Encoded as 32-bit instruction words
    • Small number of formats encoding operation code(opcode), register numbers,…
    • Regularity
  • Register numbers
    • $t0 - $t7 are reg’s 8 - 15
    • $t8 - $t9 are reg’s 24 - 25
    • $s0 - $s7 are reg’s 16 - 23

      MIPS R-format Instructions

      | op | rs | rt | rd | shamt | funct | | —- | —- | —- | —- | —- | —- | | 6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits |

Instruction fields

  • op: operation code(opcode)
  • rs: first source register number
  • rt: second source register number
  • rd: destination register number
  • shamt: shift amount (00000 for now)
  • funct: function code(extends opcode)

For Example
image.png

Hexadecimal(十六进制)

  • base 16
    • compact representation of bit strings
    • 4 bits per hex digit

image.png

  • Example: eca8 6420
    • 1110 1100 1010 1000 0110 0100 0010 0000

MIPS I-format instructions

op rs rt constant or address
6 bits 5 bits 5 bits 16 bits
  • Immediate arithmetic and load/store instructions
    • Constant: -2^15to +2^15 - 1
  • Design Principle 4: Good Design demands good compromises
    • different formats complicate decoding, but allow 32-bit instructions uniformly
    • keep formats as similar as possible

Logical Operations

Instructions for bitwise manipulation
image.png

  1. and/or $t0, $t1, $t2

NOT Operations
MIPS has NOR 3-operand instruction

  • a NOR b == NOT( a OR b )

    1. nor $t0, $t1, $zero

    Shift Operations

    | op | rs | rt | rd | shamt | funct | | —- | —- | —- | —- | —- | —- | | 6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits |

  • shamt: how many positions to shift

  • shift left logical
    • shift left and fill with 0 bits
    • sll by i bits multiples by 2^i
  • shift right logical
    • shift right and fill with 0 bits
    • srl by i bits divides by 2^i (unsigned only)

Conditional Operations

branch to a labeled instruction if a condition is true( otherwise, continue sequentially)

  • beq rs, rt, L1
    • if (rs == rt) branch to instruction labeled L1;
  • bne rs, rt, L1
    • if (rs != rt) branch to instruction labeled L1;
  • j L1
    • unconditional jump to instruction labeled L1;

image.png
image.png

Basic Blocks

  • A basic blocks is a sequence of instructions with
    • No embedded branches(except at end)
    • No branch targets(except at beginning)
  • A compiler identifies basic blocks for optimization
  • An advanced processor can accelerate execution of basic blocks

More Conditional Operations
image.png


Branch Instruction Design
image.png**

Signed vs. Unsigned
image.png

Procedure Calling

  • Steps required
    1. place parameters in registers
    2. transfer control to procedure
    3. acquire storage for procedure
    4. perform procedure’s operations
    5. place result in register for caller
    6. return to place of call

      Register Usage

      Procedure Call Instructions


ch3: Arithmetic for Computers

复习重点

考古题

1. Addition Overflow

2. Multiplier hardware

3. Arithmetic Division

4. Floating point operation

5. Carry lookahead adder

ALU

Basic Arithmetic Logic Unit
One-bit ALU that performs AND, OR, and addition
image.png image.png
Enhanced Arithmetic Logic Unit
ALU that have NAND/NOR operation
image.pngimage.png
32-bit ALU

image.png

One-bit ALUs with Set Less Than

  • Set less than instruction requires a subtraction and then sets all but the least significant bit to 0, with the lsb set to 1 if a < b
  • Less signal line
    • lsb(least significant bit) - signed bit
    • all but the lsb - 0

Arithmetic for Computers

  • Operations on integers
    • addition and subtraction
    • multiplication and division
    • dealing with overflow
  • Floating-point real numbers
    • Representation and operations

Operations on integers

addition and subtraction

Integer Addition
image.png

Integer Subtraction
image.png


Dealing with Overflow
image.png**

Overflow Detection for Signed & Unsigned Addition

  • Signed addition

    1. addu $t0, $t1, $t2
    2. xor $t3, $t1, $t2(异或)
    3. slt $t3, $t3, $zero
    4. bne $t3, $zero, No_overflow
    5. xor $t3, $t0, $t1
    6. slt $t3, $t3, $zero
    7. bne $t3, $zero, Overflow
  • Unsigned addition

    1. addu $t0, $t1, $t2
    2. nor $t3, $t1, $zero
    3. sltu $t3, $t3, $t2
    4. bne $t3, $zero, Overflow

    Multiplication


ch4:The Processor

Introduction

  • CPU performance factors
    • instruction count
      • Determined by ISA and compiler
    • CPI and Cycle time
      • Determined by CPU hardware
  • We will examine two MIPS implementations
    • a simplified version
    • a more realistic pipelined version
  • Simple subset, shows most aspects

    • Memory reference: lw, sw
    • Arithmetic/logical: add, sub, and, or slt
    • Control transfer: beq, j

      Instruction Execution

      PC -> instruction memory, fetch instruction
      Register numbers -> register file, read registers
      Depending on instruction class
  • Use ALU to calculate

    • Arithmetic result
    • Memory address for load/store
    • Branch target address
  • Access data memory for load/store
  • PC <- target address or PC + 4

image.png

Control
image.png

Logic Design Basics

  • information encoded in

Sequential Elements
Register: stores data in a circuit

  • Uses a clock signal to determine when to update the stored value
  • Edge-triggered: update when Clk changes from 0 to 1

image.png
Register with write control

  • only updates on clock edge when write control input is 1
  • used when stored value is required later

image.png

Clocking Methodology
combinational logic transforms data during clock cycles

  • between clock edges
  • input from state elements, output to state element
  • longest delay determines clock period

Building a Datapath

Performance Issues of Single Cycle Design

  • longest delay determines clock period
    • critical path: load instruction
    • Instruction memory -> register file -> ALU -> data memory -> register file
  • Not feasible to vary period for different instructions
  • Violates design principle
    • Making the common case fast
  • We will improve performance by pipelining

MIPS Pipeline

five stages, one step per stage

  1. IF: instruction fetch from memory
  2. ID: instruction decode & register read
  3. EX: Execute operation or calculate address
  4. MEM: Access memory operand
  5. WB: Write result back to register

Hazards
Situations that prevent starting the next instruction in the next cycle

  • Structure hazards(a required resource is busy)
  • Data hazard(Need to wait for previous instruction to complete its data read/write)
  • Control hazard(Deciding on control action depends on previous instruction)

    Structure hazards

    conflict for use of a resource
    In MIPS pipeline with a single memory

  • load