NIOS II Processor.ppt
Outline
• What is a “Soft” Processor?
• What is the NIOS II?
• Architecture for NIOS II: what are the implications?
  – TigerSHARC vs. NIOS II
  – Pipeline issues
  – Issues related to FIR
• Hardware acceleration using FPGA logic
What is a “Soft” Processor?
• A processor implemented in VHDL, Verilog, etc., and downloaded onto FPGA hardware
• Many parallel processors can be implemented on one FPGA
• Can use additional FPGA resources on the same chip that are not part of the processor core
• NIOS II is a “Soft” processor
Why a “Soft” Processor?
• Higher level of design reuse
• Reduced obsolescence risk
• Simplified design update or change
• Increased design implementation options
• Lower latency between processor and FPGA components
What is NIOS II?
• Software-defined processor
  – The processor core is loaded onto the FPGA
• Programmed using ‘normal’ programming tools (C, asm), not hardware description languages
• Can use the rest of the FPGA hardware for accelerating parts of the code
How Is NIOS II Implemented?
• The custom FPGA logic that interacts with the processor is implemented in Altera Quartus II
• The Avalon Interface bus (common instruction/data bus) is implemented in Quartus II
• The architecture is generated in Quartus II and then used for programming in the Eclipse IDE
NIOS II IDE
• Coding is done in Eclipse rather than VisualDSP.
The Different NIOS II Cores
• There are 3 cores available from Altera
  – NIOS II/e: Economical Core
  – NIOS II/s: Standard Core
  – NIOS II/f: Fast Core
What’s the Difference between the Cores?
An LE is equivalent to an 8-1 NAND gate + 1 D flip-flop
An ALM is equivalent to 2 LEs
Comparison of TigerSHARC and
NIOS II architecture
TigerSHARC Architecture
NIOS II Architecture
-thirty-two 32-bit general-purpose registers, six 32-bit control registers
-variable cache size, depending on how much FPGA space you have
-ALU: 32-bit, two inputs to one output; does shifts, logic and arithmetic. The shifter is not separate as in TigerSHARC
Avalon Interface
-separate address, data and control lines
-data width of up to 1024 bits; can be set to any width (need not be a power of 2)
-one transfer per clock cycle
NIOS II/f pipeline
• Six stages
• One instruction can be dispatched and/or retired per cycle
• Dynamic branch prediction: 2-bit branch history table (no BTB like in TigerSHARC)
NIOS II/f pipeline
The pipeline stalls for:
• Multi-cycle instructions
• Cache misses
• Data dependencies (2 cycles between
calculating and using result)
Mispredicted branch penalty: 3 cycles
Hardware multiply
• Different multiplier options can be chosen at the processor design stage
  – No h/w multiply (saves FPGA gates)
    ○ Speed depends on algorithm
  – Use embedded multipliers (if the FPGA has them)
    ○ 1-5 cycles (depends on FPGA)
  – Implement multipliers in FPGA gates
    ○ 11 cycles
  – Division: 4-66 cycles in hardware
Compare to TigerSHARC
• No support for parallel instructions
• No support for SIMD operations
• Multicycle instructions stall the pipeline
All of the above limitations can be overcome by using FPGA space not occupied by the processor itself
Comparison of NIOS II and
TigerSHARC on an FIR Algorithm
Integer FIR algorithm
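#include <stdio.h>

int main(void)
{
    int coeff[] = {1, 2, 3, 4, 5, 6, 7, 8};
    int data1[] = {1, 0, 0, 0, 0, 0, 0, 0};
    int output[8];
    int i, j, k;

    /* clear the accumulators */
    for (k = 0; k < 8; k++)
        output[k] = 0;

    /* each output is the dot product of the data with the reversed coefficients */
    for (j = 0; j < 8; j++) {
        for (i = 0; i < 8; i++) {
            output[j] += data1[i] * coeff[7 - i];
        }
    }

    for (k = 0; k < 8; k++)
        printf("%d\n", output[k]);
    return 0;
}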
Speed analysis
0        movi r4,8               i = 8
1  Loop: ldw  r2,0(r6)           load data
2        ldw  r3,0(r7)           load coefficient
3        addi r4,r4,-1           i--
4        addi r6,r6,4            dataPt++
5        mul  r2,r2,r3           data = data * coeff
6        addi r7,r7,-4           coeffPt--
7        stall                   data stall: waiting for the multiplication result
8        add  r5,r5,r2           output += data
9        bne  r4,zero,0x10002a0  mispredicted twice at the start of the loop and once at the end (3 cycles wasted each time)
Speed analysis
• 9 cycles per iteration, except the first two (branch predicted not taken) and the last (branch predicted taken), which take 9+3=12 cycles
• 1 data stall, which can be removed by moving the instruction from line 4 to line 7
• Speed: 8 cycles * (N-3) + 11 cycles * 3 = 8*(N-3)+33 cycles
• For a 1024-tap FIR: 8201 cycles
• The NIOS II clock cycle is 3 times longer (200 MHz vs. 600 MHz)
Speed comparison
• 8201 NIOS II cycles are equivalent to 24603 TigerSHARC cycles
• Lab3 timing:
– 56000 cycles in Debug mode
– 13000 cycles for unoptimized ASM
– 4000 cycles for optimized ASM
This is worse than unoptimized assembly, but no hardware acceleration was used, so it is not that bad
Hardware Acceleration
• The profiling tool in Eclipse can show how long each function takes
• If a function takes too long, it can be sped up by:
  – Custom instructions
  – Hardware acceleration
• Hardware acceleration means taking the function and transforming it into FPGA circuitry
Hardware Acceleration
• Can be done using the C2H compiler from Altera
• Trades off logic size for speed-up
Table 1. User Application Results Example

Algorithm                     Speed Increase       System fMAX   System Resource
                              (vs. Nios II CPU)    (MHz)         Increase (1)
Autocorrelation               41.0x                115           124%
Bit Allocation                42.3x                110           152%
Convolution Encoder           13.3x                95            133%
Fast Fourier Transform (FFT)  15.0x                85            208%
High Pass Filter              42.9x                110           181%
Matrix Rotate                 73.6x                95            106%
RGB to CMYK                   41.5x                120           84%
RGB to YIQ                    39.9x                110           158%
Conclusion
• “Soft” processors such as the NIOS II offer another alternative in the embedded systems scene.
• The NIOS II offers the advantage of added configurability and customization, blurring the line between FPGAs and DSPs.

Editor's Notes

  • #2: Intro: Traditionally we have a DSP that interacts with other modules, usually other ASICs. Then we have SoCs, which integrate other logic to improve latency. Now we have FPGAs, which add reconfigurability, and we want to integrate that too. SOPC: system on a programmable chip. This is what the NIOS II is supposed to do. What happens when we want to integrate a DSP on an SOPC system? (There is also a thing called a hard processor.)
  • #3: Yay, outline! Basically: the concept, and what it looks like in software.
  • #4: Similar to how a Verilog circuit can be put on an FPGA to allow for high configurability, a soft processor is a processor implemented on an FPGA. This is different from a hard processor, which is a processor implemented in hardware. A soft processor is a logical schematic (software) that can be loaded onto any FPGA. So a soft processor isn’t really a processor, just a schematic (or code, like software). This gives it all the advantages of software, such as delivering updates and improving the development cycle. Well, why would you want to do this? Isn’t an FPGA clocked slower, with higher power consumption…
  • #5: No, it is not more power hungry, because it can be better customized for the application. A slower clock doesn’t mean slower; it means more has to be done per cycle, and an FPGA allows the developer to customize the design so that instructions finish in one cycle. Plus you get all the other advantages.
  • #6: It is a special schematic designed by Altera that interacts very well with other Altera IP mega blocks.
  • #7: Well, if the processor is in software, how do you write programs for it? Are you basically writing software for software? Doesn’t this seem somewhat redundant? Yes, exactly, it does seem a bit redundant. But it is the current model for soft processors; perhaps there will be a better programming environment later. What you need to do is write the processor (bus and FPGA logic) in software first using Quartus, make an emulation file, and use that to write your DSP program in Eclipse. (There is no hardware optimizer comparable to an assembler optimizer.)
  • #8: Here is what it looks like in Quartus. You need to define the schematic. At the top you have your clock source, in the middle your Avalon interface, and at the bottom your FPGA logic.
  • #9: Here is your NIOS II IDE environment. Now you take your emulated file and program for it like in VDSP. So if the processor is in software, does that mean you can only do simulation analysis, and not run on hardware like in the labs? No… you can run the generated processor on an FPGA and have the IDE connect to the FPGA when it runs.
  • #10: So what exactly does Altera give you as the basic architecture to customize? 3 cores with different features. Here are the specs…
  • #11: Notice it is very similar to the MIPS processor we learned about in other classes.
  • #13: Print off a sheet listing the architecture features.
  • #14: Print a sheet listing the architecture. All the ports on the right actually share one bus, the Avalon architecture.
  • #15: -Separate address, data and control lines: no need to decode data for the address. -Data width of up to 1024 bits; can be set to any width (need not be a power of 2). -Synchronous operation. -Dynamic bus sizing: no special design consideration is needed when addressing items that have different bus widths. -One transfer per clock cycle. -The Avalon Interface basically creates one common interface out of the different interfaces of all the memory and peripheral components of the system. Are there bus issues because it’s one common interface? No… it’s a special interface, with dedicated memory ports.
  • #28: Cost vs. performance: the NIOS II package is $495 for a year + $150 for a Cyclone II FPGA; C2H is $3000/computer. For TigerSHARC, VDSP is $3500/computer + $750 for a TigerSHARC evaluation board.