Computation structures

Pierre Wolper

Email: Pierre.Wolper@ulg.ac.be
URL: http://www.montefiore.ulg.ac.be/~pw/
     http://www.montefiore.ulg.ac.be/~pw/cours/struct.html

References

Course objectives


   (a) To study in detail how a simple programmable machine can be built.

   (b) To introduce processor performance enhancement techniques.

   (c) To overview the concepts used in order to make the task of programming a computer more manageable: interrupts and supervisor (privileged) mode, virtual memory, ...
2. **Operating systems and Concurrent programming**

(a) To introduce programming with processes and the organization of an operating system kernel.

(b) To develop the concepts pertaining to programming with processes and interprocess communication.

(c) To study classical parallel programming problems.

(d) To give a brief introduction to questions concerning parallel machines and how they are programmed.
Exercises and programming assignments

- Exercises on processor design and microcode.

- Exercises and a programming assignment in assembly language on low level programming and operating system kernels.

- Exercises and a programming assignment on using parallel processes and interprocess communication (using C as programming language).
Digital electronic circuits:
review

We will use basic facts about digital circuits.

- In digital circuits, an electrical signal is used to represent a Boolean (binary) piece of information \((0, 1)\), for example \([0, 1]\) volts represents 0 and \([2, 3]\) volts represents 1.

- Boolean circuits can perform operations on the represented Boolean values.

- We will use three types of circuits: combinatorial circuits, memory elements and synchronous sequential circuits.
Digital circuits
combinatorial circuits

- Each $f_i$ is a Boolean function ($\wedge, \lor, \neg$) of $x_1, \ldots, x_n$. If the inputs ($x_i$) change, there is a certain delay before the outputs ($f_i$) stabilize at the correct value.
Digital circuits
memory elements

CLK

LD

\( x_1 \ x_2 \ x_3 \ \ldots \ \ x_n \)

\( a_1 \ a_2 \ a_3 \ \ldots \ a_k \)

\( y_1 \ y_2 \ y_3 \ \ldots \ \ y_n \)

- CLK is a clock signal:
• When LD is active (value 1), the values of $x_1, \ldots, x_n$ are stored at the address given by $a_1, \ldots, a_k$ at the next ($0 \rightarrow 1$) clock transition. The values of the outputs $y_1, \ldots, y_n$ are equal to the values stored at the address $a_1, \ldots, a_k$.

• A register is a one location memory (no address)
At each clock transition, a Boolean function of the inputs and of the previous register contents is loaded in the registers.

The outputs are a Boolean function of the inputs and of the register contents.
From synchronous circuits to processors

A processor is a synchronous sequential circuit with a few peculiarities.

1. The circuit is divided into a “data” and a “control” part. The “data” part performs operations on memorized data; the “control” determines which operations are performed.
2. A large capacity memory is included in the “data” part (the machine’s main memory).

3. The behavior of the control part is influenced by the data part:

- control influenced -by the results of operations on the data;
- control determined by a program stored in the data path memory.
A simple data path

In the data path shown below, one finds memory elements and a computation element (a combinatorial circuit) interconnected by a bus.
A bus is a collection of shared interconnection lines.

- Each element with an output connected to the bus can be active (0 or 1) or inactive (disconnected – this is called 3 state logic).

- The bus presented is a simple bus with a centralized control managed by the control unit.

The signals needed for the data path to operate will be generated by the control unit.

To design the control unit one must first define the machine that is to be built.
A machine architecture:
The $\beta$ machine

- A 32 bit architecture: the registers contain 32 bits and the memory addresses are 32 bits long, which makes it possible to address $2^{32}$ bytes of memory.

- The machine has 32 (0 to 31) 32-bit registers and a 32-bit program counter whose value is always a multiple of 4. Register 31 always contains the value 0.

- All instructions are 32-bit long.
\( \beta \) Machine: instruction formats

All operations of the \( \beta \) Machine (except those accessing the main memory) are performed on registers.

There are two possible instruction formats:

- The format without a literal (\textit{valeur constante})

\[
\begin{array}{ccccccc}
31 & 26 & 25 & 21 & 20 & 16 & 15 & 11 & 10 & 0 \\
\hline
\text{Opcode} & \text{Rc} & \text{Ra} & \text{Rb} & \text{unused}
\end{array}
\]

- The format with a literal

\[
\begin{array}{cccccc}
31 & 26 & 25 & 21 & 20 & 16 & 15 & 0 \\
\hline
\text{Opcode} & \text{Rc} & \text{Ra} & \text{value (2’s complement)}
\end{array}
\]
\beta \text{ Machine : arithmetic instructions}

<table>
<thead>
<tr>
<th>Opcode</th>
<th>Without a literal</th>
<th>Opcode</th>
<th>With a literal</th>
</tr>
</thead>
<tbody>
<tr>
<td>0x20</td>
<td>ADD</td>
<td>0x30</td>
<td>ADDC</td>
</tr>
<tr>
<td>0x21</td>
<td>SUB</td>
<td>0x31</td>
<td>SUBC</td>
</tr>
<tr>
<td>0x22</td>
<td>MUL</td>
<td>0x32</td>
<td>MULC</td>
</tr>
<tr>
<td>0x23</td>
<td>DIV</td>
<td>0x33</td>
<td>DIVC</td>
</tr>
</tbody>
</table>

\[
\text{ADD}(Ra,Rb,Rc) : \quad \text{PC} \leftarrow \text{PC} + 4 \\
\quad \text{Reg}[Rc] \leftarrow \text{Reg}[Ra] + \text{Reg}[Rb]
\]

\[
\text{ADDC}(Ra,\text{literal},Rc) : \quad \text{PC} \leftarrow \text{PC} + 4 \\
\quad \text{Reg}[Rc] \leftarrow \text{Reg}[Ra] + \text{SEXT}(\text{literal})
\]

\text{SEXT}(\text{literal}) \text{ represents the value contained in the instruction extended from 16 to 32 bits, the sign being preserved.}

\text{The definitions of SUB, MUL and DIV are similar.}
β Machine: comparison instructions

<table>
<thead>
<tr>
<th>Without a literal</th>
<th>With a literal</th>
</tr>
</thead>
<tbody>
<tr>
<td>Opcode</td>
<td>name</td>
</tr>
<tr>
<td>0x24</td>
<td>COMPEQ</td>
</tr>
<tr>
<td>0x25</td>
<td>COMPLT</td>
</tr>
<tr>
<td>0x26</td>
<td>COMPLE</td>
</tr>
</tbody>
</table>

COMPEQ(Ra,Rb,Rc) : 
PC ← PC + 4
if Reg[Ra] = Reg[Rb] then Reg[Rc] ← 1
else Reg[Rc] ← 0

COMPEQC(Ra,literal,Rc) :
PC ← PC + 4
if Reg[Ra] = SEXT(literal) then Reg[Rc] ← 1
else Reg[Rc] ← 0

CMPLT and CMPLTC compare according to the < relation, CMPLE and CMPLEC according to ≤.
\[ \beta \text{ Machine: logical instructions} \]

<table>
<thead>
<tr>
<th>Opcode</th>
<th>Opcode name</th>
<th>Opcode</th>
<th>Opcode name</th>
</tr>
</thead>
<tbody>
<tr>
<td>0x28</td>
<td>AND</td>
<td>0x38</td>
<td>ANDC</td>
</tr>
<tr>
<td>0x29</td>
<td>OR</td>
<td>0x39</td>
<td>ORC</td>
</tr>
<tr>
<td>0x2A</td>
<td>XOR</td>
<td>0x3A</td>
<td>XORC</td>
</tr>
</tbody>
</table>

\[ \text{AND}(Ra,Rb,Rc) : \quad PC \leftarrow PC + 4 \]
\[ \text{Reg}[Rc] \leftarrow \text{Reg}[Ra] \& \text{Reg}[Rb] \]

\[ \text{ANDC}(Ra,\text{literal},Rc) : \quad PC \leftarrow PC + 4 \]
\[ \text{Reg}[Rc] \leftarrow \text{Reg}[Ra] \& \text{SEXT(literal)} \]

\text{OR and ORC compute “or”, XOR and XORC exclusive “or”}.
**β Machine : shift instructions**

<table>
<thead>
<tr>
<th>Opcode</th>
<th>Opcode</th>
<th>name</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<td>0x2C</td>
<td>0x3C</td>
<td>SHL</td>
<td>SHLC</td>
</tr>
<tr>
<td>0x2D</td>
<td>0x3D</td>
<td>SHR</td>
<td>SHRC</td>
</tr>
<tr>
<td>0x2E</td>
<td>0x3E</td>
<td>SRA</td>
<td>SRAC</td>
</tr>
</tbody>
</table>

SHL(Ra,Rb,Rc) : PC ← PC + 4
Reg[Rc] ← Reg[Ra] ≪ Reg[Rb]4:0

SHLC(Ra,literal,Rc) : PC ← PC + 4
Reg[Rc] ← Reg[Ra] ≪ literal4:0

The content of Ra is shifted by a number of positions given by the 5 least significant bits of Rb and the result is placed in Rc. 0 bits are introduced in the freed positions.

SHR and SHRC shift to the right (0 bits are introduced in the freed positions).

SRA and SRAC also shift to the right, but the sign bit (Reg[Ra]31) is introduced in the freed positions.
**β Machine: memory access instructions**

<table>
<thead>
<tr>
<th>Opcode</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<td>0x18</td>
<td>LD</td>
</tr>
<tr>
<td>0x19</td>
<td>ST</td>
</tr>
</tbody>
</table>

LD(Ra, literal, Rc) : PC ← PC + 4  
EA ← Reg[Ra] + SEXT(literal)  
Reg[Rc] ← Mem[EA]

ST(Rc, literal, Ra) : PC ← PC + 4  
EA ← Reg[Ra] + SEXT(literal)  
Mem[EA] ← Reg[Rc]

EA is called the “effective address”. 
β Machine: memory access instructions II

<table>
<thead>
<tr>
<th>Opcode</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<td>0x1F</td>
<td>LDR</td>
</tr>
</tbody>
</table>

LDR(label,Rc) : 
PC ← PC + 4
EA ← PC + 4 × SEXT(literal)
Reg[Rc] ← Mem[EA]

In LDR, the instruction given to the assembler contains a label, (label). In the assembled instruction, the literal is computed from the label as follows:

\[ \text{literal} = \left( \left( \text{OFFSET}(\text{label}) - \text{OFFSET}(\text{current inst.}) \right) \div 4 \right) - 1 \]
β Machine: branch instructions

<table>
<thead>
<tr>
<th>Opcode</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<td>0x1B</td>
<td>JMP</td>
</tr>
</tbody>
</table>

JMP(Ra,Rc) :  
PC ← PC + 4  
EA ← Reg[Ra] & 0xFFFFFFFFFC  
Reg[Rc] ← PC  
PC ← EA

The 2 least significant bits of Ra are set to 0 to force the address to be a word address. The address of the instruction following the JMP is saved in Rc.
**β Machine : branch instructions II**

<table>
<thead>
<tr>
<th>Opcode</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<td>0x1D</td>
<td>BEQ/BF</td>
</tr>
<tr>
<td>0x1E</td>
<td>BNE/BT</td>
</tr>
</tbody>
</table>

BEQ(Ra,label,Rc) :  
PC ← PC + 4  
EA ← PC + 4 × SEXT(literal)  
TMP ← Reg[Ra]  
Reg[Rc] ← PC  
if TMP = 0 then PC ← EA

In BEQ, the instruction given to the assembler contains a label. In the assembled instruction, the literal is computed as follows

\[
\text{literal} = (\text{OFFSET}(\text{label}) - \text{OFFSET}(<\text{current inst}.>)) \div 4 - 1
\]

BNE is similar to BEQ except that the test is “different from 0”

TMP is used because Ra and Rc could be the same register.
An implementation of the $\beta$ Machine: The machine ULg01

We start from the data path described previously.

- The static RAM will be used to implement the registers.

- The dynamic RAM will be used to implement the main memory.

- On still has to introduce a program counter (PC) and a control unit.
A first sketch of ULg01
A detailed view of the PC of ULg01

Only 20 bits are used because the machine being built will not have more than 4 megabytes (1 megaword) of memory.
A detailed view of the SRAM of ULg01

The circuit is designed to produce the value 0 when register 31 is read.
The DRAM module must contain refresh circuitry.

The clock speed is chosen so that the DRAM can operate at the same speed as the rest of the machine. In an optimized implementation, the DRAM operates at a lower speed.
A detailed view of the control unit of ULg01

The control unit is microprogrammed and allows up to 16 phases by instruction. The microprogram is stored in a ROM.
The ALU flags are used as address bits into the ROM. This allows the implementation of conditional instructions. The flags are

- **E** whose value is 1 is the output of the ALU is equal to $0xFFFFFFFF$,
- **C** The complemented carry bit,
- **N** the sign bit (bit 31).

The left-hand part of the control unit is used to separate the different parts of an instruction and connect them to the appropriate lines of the bus.

All that needs to be done to finish the implementation of the $\beta$ Machine, is to write the microcode that has to be placed in the ROM.
The microcode of ULg01

The instruction $\text{OR}(Ra, Rb, Rc)$

<table>
<thead>
<tr>
<th>Opcode</th>
<th>Phase</th>
<th>Flags</th>
<th>Latch</th>
<th>ALU</th>
<th>LD SEL</th>
<th>DR SEL</th>
<th>PC+</th>
<th>Action</th>
</tr>
</thead>
<tbody>
<tr>
<td>101001</td>
<td>0000</td>
<td>*</td>
<td>1</td>
<td>00000000</td>
<td>011</td>
<td>001</td>
<td>0</td>
<td>SMAR ← Ra</td>
</tr>
<tr>
<td>101001</td>
<td>0001</td>
<td>*</td>
<td>1</td>
<td>00000000</td>
<td>001</td>
<td>100</td>
<td>0</td>
<td>A ← SRAM</td>
</tr>
<tr>
<td>101001</td>
<td>0010</td>
<td>*</td>
<td>1</td>
<td>00000000</td>
<td>011</td>
<td>010</td>
<td>0</td>
<td>SMAR ← Rb</td>
</tr>
<tr>
<td>101001</td>
<td>0011</td>
<td>*</td>
<td>1</td>
<td>00000000</td>
<td>010</td>
<td>100</td>
<td>0</td>
<td>B ← SRAM</td>
</tr>
<tr>
<td>101001</td>
<td>0100</td>
<td>*</td>
<td>1</td>
<td>00000000</td>
<td>011</td>
<td>000</td>
<td>0</td>
<td>SMAR ← Rc</td>
</tr>
<tr>
<td>101001</td>
<td>0101</td>
<td>*</td>
<td>1</td>
<td>111001</td>
<td>101</td>
<td>011</td>
<td>0</td>
<td>SRAM ← A</td>
</tr>
<tr>
<td>101001</td>
<td>0110</td>
<td>*</td>
<td>1</td>
<td>00000000</td>
<td>100</td>
<td>110</td>
<td>1</td>
<td>DMAR ← PC; PC+</td>
</tr>
<tr>
<td>101001</td>
<td>0111</td>
<td>*</td>
<td>1</td>
<td>00000000</td>
<td>000</td>
<td>101</td>
<td>0</td>
<td>INSTREG ← DRAM</td>
</tr>
</tbody>
</table>

32
The instruction **JMP(Ra, Rc)**

<table>
<thead>
<tr>
<th>Opcode</th>
<th>Phase</th>
<th>Flags</th>
<th>Latch flags</th>
<th>ALU F, C&lt;sub&gt;in&lt;/sub&gt;, Mode</th>
<th>LD SEL</th>
<th>DR SEL</th>
<th>PC+</th>
<th>Action</th>
</tr>
</thead>
<tbody>
<tr>
<td>011011</td>
<td>0000</td>
<td>*</td>
<td>1</td>
<td>0000000</td>
<td>011</td>
<td>001</td>
<td>0</td>
<td>SMAR ← Ra</td>
</tr>
<tr>
<td>011011</td>
<td>0001</td>
<td>*</td>
<td>1</td>
<td>0000000</td>
<td>011</td>
<td>000</td>
<td>0</td>
<td>A ← SRAM</td>
</tr>
<tr>
<td>011011</td>
<td>0010</td>
<td>*</td>
<td>1</td>
<td>0000000</td>
<td>101</td>
<td>110</td>
<td>0</td>
<td>SMAR ← Rc</td>
</tr>
<tr>
<td>011011</td>
<td>0011</td>
<td>*</td>
<td>1</td>
<td>0000000</td>
<td>111</td>
<td>011</td>
<td>0</td>
<td>SRAM ← PC</td>
</tr>
<tr>
<td>011011</td>
<td>0100</td>
<td>*</td>
<td>1</td>
<td>111111</td>
<td>111</td>
<td>011</td>
<td>0</td>
<td>PC ← A</td>
</tr>
<tr>
<td>011011</td>
<td>0101</td>
<td>*</td>
<td>1</td>
<td>0000000</td>
<td>100</td>
<td>110</td>
<td>1</td>
<td>DMAR ← PC; PC+</td>
</tr>
<tr>
<td>011011</td>
<td>0110</td>
<td>*</td>
<td>1</td>
<td>0000000</td>
<td>000</td>
<td>101</td>
<td>0</td>
<td>INSTREG ← DRAM</td>
</tr>
</tbody>
</table>
### The instruction \textbf{BEQ(Ra, label, Rc)}

<table>
<thead>
<tr>
<th>Opcode</th>
<th>Phase</th>
<th>Flags</th>
<th>Latch flags</th>
<th>ALU F, C_in, Mode</th>
<th>LD SEL</th>
<th>DR SEL</th>
<th>PC+</th>
</tr>
</thead>
<tbody>
<tr>
<td>011101</td>
<td>0000</td>
<td>*</td>
<td>1</td>
<td>0000000</td>
<td>011</td>
<td>001</td>
<td>0</td>
</tr>
<tr>
<td>011101</td>
<td>0001</td>
<td>*</td>
<td>1</td>
<td>0000000</td>
<td>001</td>
<td>100</td>
<td>0</td>
</tr>
<tr>
<td>011101</td>
<td>0010</td>
<td>*</td>
<td>0</td>
<td>1111110</td>
<td>001</td>
<td>011</td>
<td>0</td>
</tr>
<tr>
<td>011101</td>
<td>0011</td>
<td>*</td>
<td>1</td>
<td>0000000</td>
<td>011</td>
<td>000</td>
<td>0</td>
</tr>
<tr>
<td>011101</td>
<td>0100</td>
<td>*</td>
<td>1</td>
<td>0000000</td>
<td>101</td>
<td>110</td>
<td>0</td>
</tr>
<tr>
<td>011101</td>
<td>0101</td>
<td>E=0</td>
<td>1</td>
<td>0000000</td>
<td>100</td>
<td>110</td>
<td>1</td>
</tr>
<tr>
<td>011101</td>
<td>0110</td>
<td>E=0</td>
<td>1</td>
<td>0000000</td>
<td>000</td>
<td>101</td>
<td>0</td>
</tr>
<tr>
<td>011101</td>
<td>0101</td>
<td>E=1</td>
<td>1</td>
<td>0000000</td>
<td>001</td>
<td>010</td>
<td>0</td>
</tr>
<tr>
<td>011101</td>
<td>0110</td>
<td>E=1</td>
<td>1</td>
<td>1100100</td>
<td>001</td>
<td>011</td>
<td>0</td>
</tr>
<tr>
<td>011101</td>
<td>0111</td>
<td>E=1</td>
<td>1</td>
<td>1100100</td>
<td>001</td>
<td>011</td>
<td>0</td>
</tr>
<tr>
<td>011101</td>
<td>1000</td>
<td>E=1</td>
<td>1</td>
<td>0000000</td>
<td>010</td>
<td>110</td>
<td>0</td>
</tr>
<tr>
<td>011101</td>
<td>1001</td>
<td>E=1</td>
<td>1</td>
<td>1001110</td>
<td>011</td>
<td>101</td>
<td>0</td>
</tr>
<tr>
<td>011101</td>
<td>1010</td>
<td>E=1</td>
<td>1</td>
<td>0000000</td>
<td>100</td>
<td>110</td>
<td>1</td>
</tr>
<tr>
<td>011101</td>
<td>1011</td>
<td>E=1</td>
<td>1</td>
<td>0000000</td>
<td>000</td>
<td>101</td>
<td>0</td>
</tr>
</tbody>
</table>

Reminder: The literal that is found in the instruction interpreted by the microcode is computed (by the assembler) from the label found in the symbolically written instruction as follows

\[
\text{literal} = ((\text{OFFSET(label)} - \text{OFFSET(current inst.})) \div 4) - 1
\]
L’instruction **LD(Ra, literal, Rc)**

<table>
<thead>
<tr>
<th>Opcode</th>
<th>Phase</th>
<th>Flags</th>
<th>Latch flags</th>
<th>ALU F, C&lt;sub&gt;in&lt;/sub&gt;, Mode</th>
<th>LD SEL</th>
<th>DR SEL</th>
<th>PC+</th>
</tr>
</thead>
<tbody>
<tr>
<td>011000</td>
<td>0000</td>
<td>*</td>
<td>1</td>
<td>000000</td>
<td>011</td>
<td>001</td>
<td>0</td>
</tr>
<tr>
<td>011000</td>
<td>0001</td>
<td>*</td>
<td>1</td>
<td>000000</td>
<td>001</td>
<td>100</td>
<td>0</td>
</tr>
<tr>
<td>011000</td>
<td>0010</td>
<td>*</td>
<td>1</td>
<td>000000</td>
<td>010</td>
<td>010</td>
<td>0</td>
</tr>
<tr>
<td>011000</td>
<td>0011</td>
<td>*</td>
<td>1</td>
<td>000000</td>
<td>011</td>
<td>000</td>
<td>0</td>
</tr>
<tr>
<td>011000</td>
<td>0100</td>
<td>*</td>
<td>1</td>
<td>100110</td>
<td>100</td>
<td>011</td>
<td>0</td>
</tr>
<tr>
<td>011000</td>
<td>0101</td>
<td>*</td>
<td>1</td>
<td>000000</td>
<td>101</td>
<td>101</td>
<td>0</td>
</tr>
<tr>
<td>011000</td>
<td>0110</td>
<td>*</td>
<td>1</td>
<td>000000</td>
<td>100</td>
<td>110</td>
<td>1</td>
</tr>
<tr>
<td>011000</td>
<td>0111</td>
<td>*</td>
<td>1</td>
<td>000000</td>
<td>000</td>
<td>101</td>
<td>0</td>
</tr>
</tbody>
</table>

- **SMAR ← Ra**
- **A ← SRAM**
- **B ← Lit**
- **SMAR ← Rc**
- **DMAR ← A+B**
- **SRAM ← DRAM**
- **DMAR ← PC; PC+**
- **INSTREG ← DRAM**