Posts Tagged pipeline
Talking about CPU architectures: Then it came assembly chain
Posted by Alessandro Pignotti in Bar discussions on October 22, 2009
As we’ve seen in the previous essay in a simple processor design each instruction needs several cycles to be completed. Each instruction passes several steps: first it’s loaded from memory (FETCH0 state), then it’s decoded (FETCH1 state) and the appropriate code path is selected to compute the operation results. In the my example I’ve not included other two essential steps: operands fetching and results storing on memory. This sequential approach has a big drawback, as for most of the time the CPU circuits (such as the memory interface or the ALU) are not doing useful work. If on average each instruction takes five steps to complete, then the processor is only executing one instruction every five cycles. To solve this problem we can introduce the pipeline.
The pipeline concept is borrowed from the industrial assembly line. The basic idea is to decouple the sequential steps of each instruction by inserting buffer registers in between. Each stage of the pipeline is isolated, and on each clock cycle it can take the input from the register upstream and output the results into the register downstream. Of course not every instruction needs processing by all stages (for example it may or may not access memory) and so each stage has to have some kind of No Operation mode to pass trough the unmodified input register to the output.
A simple pseudo-verilog description of a pipelined processor follows. We are assuming that the instruction and data memory are physically separate: an Havard like architecture. This is not true in current mainstream architectures, as both instruction and data are stored in RAM, but may be considered true to a large extend if there is a fast instruction cache. Moreover it has to be noted that although the access_memory module is referenced both in the input fetching and output storing stages, it can only correctly serve one request at a time, and so some kind of arbitration has to be used. We can arbitrarily decide that reads are prioritized over write, but in case of a collision the operation will be for sure not correct, though safe for the electronics. The register_file module is just a multiplexer for the registers, it will output the value of the registers for the provided indexes. The module it’s used both in the operands fetching and writeback stages, but it’s safe because it allows two register reads and one write and the same time (three way access).
This toy processor is inspired from MIPS architecture and as you can see it uses fixed length instructions. But I’ve also tried to design something more similar to an x86 processor. The instruction set (which I’m too lazy to define extensively) could allow both register-register and register-memory operation, and output to either memory or registers. No indirect memory access is allowed, sorry. As the instructions are four byte long probably only a 16 or 24 bit flat address space could be accessed.
always @(posedge clock)
begin
//
//Instructrion fetch stage
FETCHED_INSTRUCTION <= get_instruction_memory(PC); PC <= PC+4;
//
//Instruction decoding stage
{OPCODE_0, OPERAND1_TYPE, REG1_INDEX, REG2_INDEX, MEM_ADDRESS_0, WRITE_TO_MEMORY_0, DEST_REG_0} <= decode_instruction(FETCHED_INSTRUCTION);
//
//Operands fecthing stage, some register will be forwarded to the next stages
OPERAND1 <= (OPERAND1_TYPE==REG) ?
read_register_file(REG1_INDEX, REG2_INDEX)[31:0] :
access_memory(MEM_ADDRESS_0,READ,UNSPECIFIED);
OPERAND2 <= register_file(REG1_INDEX,REG2_INDEX)[63:32];
MEM_ADDRESS_1 <= MEM_ADDRESS_0;
WRITE_TO_MEMORY_1 <= WRITE_TO_MEMORY_0;
DEST_REG_1 <= DEST_REG_0;
OPCODE_1 <= OPCODE_0;
//
//Execution stage
RESULT <= alu(OPCODE_1,OPERAND1,OPERAND2);
MEM_ADDRESS_2 <= MEM_ADDRESS_1;
WRITE_TO_MEMORY_2 <= WRITE_TO_MEMORY_1;
DEST_REG_2 <= DEST_REG_1;
//
//Writeback stage
(WRITE_TO_MEMORY_2) ? access_memory(MEM_ADDRESS_2,WRITE,RESULT) : write_register_file(DEST_REG_2, RESULT);
end
Several considerations can be made on this design. First of all extreme care is needed from the compiler (or the user, if it’s crazy enough to write assembly manually), because there are several hazards, not only as said previously on concurrent read/write memory access, but also on registers. As the result of an instruction is written back to the register file only at the end of the pipeline it’s not legal to access a result before five more instruction has been executed. To guarantee memory access validity it’s enough to avoid issuing a memory write two instruction after a memory load. If no useful instruction can be issued during this forced waiting, a NOP should be inserted.
Real world architecture employs interlocking techniques which takes care of stopping (stalling) the pipeline when an hazard could occur. I’ll describe stalling in more detail in another essay. Anyway if a pipeline stage, such us the ALU for complicated computation (for example integer division), needs more than a cycle to complete it will stall the provious stages of the pipeline, and insert so called bubbles into the later stage. Bubbles are basically safe No Operation input states, the NOP opcode in any architecture is just an explicit pipeline bubble. Even in modern interlocked architecture correct instruction scheduling is extremely useful to gain performance as pipeline bubbles actually waste execution time.
That’s all for now, in the previous post I forgot to mention the ever failing, but very nice Very Long Instruction Word (VLIW) approach. I’m of course going to talk about that to sometime.
assembly, Cpu architectir, CPU architectures, CPU de, CPU design, mips, performance, pipeline, Verilog, x86
Blogroll
- 0x1BADFEED
- 0x1BADFEED
- Antonio Gulli’s coding playground
- Antonio Gulli’s coding playground
- Blog degli Allievi della Scuola Sant'Anna
- Blog degli Allievi della Scuola Sant'Anna
- Forum degli Allievi
- Forum degli Allievi
- Giuseppe Lipari's Scacciamennule
- Giuseppe Lipari's Scacciamennule
- Sant’Anna School of Advanced Studies
- Sant’Anna School of Advanced Studies
Archives
- April 2014
- March 2014
- January 2014
- December 2013
- November 2013
- October 2013
- July 2013
- May 2013
- April 2013
- March 2013
- January 2013
- October 2012
- June 2012
- May 2012
- April 2012
- December 2011
- July 2011
- May 2011
- April 2011
- March 2011
- January 2011
- December 2010
- November 2010
- October 2010
- September 2010
- August 2010
- July 2010
- June 2010
- May 2010
- March 2010
- February 2010
- January 2010
- October 2009
- September 2009
- June 2009
- May 2009
- April 2009
- March 2009
- February 2009