Performance and Flexibility
for Multiple-Processor
SoC Design
Yalagoud A.Patil
OUTLINE
• Introduction
• The limitations of traditional ASIC design
• Extensible processors as an alternative to RTL
• Toward multiple-processor SoCs
• Processors and disruptive technology
• Conclusions
Introduction
• The rapid evolution of silicon technology is bringing a new crisis to
system-on-chip (SoC) design.
• One way to speed up the development of mega-gate SoCs is the use of
multiple microprocessor cores to perform much of the processing currently
relegated to RTL.
• A few characteristics of typical deep-sub-micron integrated circuit (IC)
design illustrate the challenges facing SoC design teams:
  - In a generic 130-nm standard-cell foundry process, silicon density routinely exceeds
    100K usable gates per mm².
  - In the past, silicon capacity and design-automation tools limited the practical size of a
    block of RTL to fewer than 100K gates.
  - The design complexity of a typical logic block grows much more rapidly than does its
    gate count, and system complexity increases much more rapidly than the number of
    constituent blocks.
  - The cost of a design bug is going up. Much is made of the rising cost of deep-sub-micron
    IC masks: the cost of a full 130-nm mask set is approaching $1M, and 90-nm masks
    may reach $2M.
  - All embedded systems now contain significant amounts of software.
  - Standard communication protocols are growing rapidly in complexity.
 All embedded systems now contain significant amounts of software.
 Standard communication protocols are growing rapidly in complexity.
• In most markets, competitive forces drive the ever-increasing need to
embrace new technologies.
• Just one CMOS process step, say from 180 nm to 130 nm, roughly doubles the
available number of gates for a given die size and cost.
• The International Technology Roadmap for Semiconductors forecasts a
slight slowing in the pace of density increases, but exponential capacity
increases are expected to continue for at least the next decade, as shown in
Figure.
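The "roughly doubles" figure quoted above for one process step can be checked with a short calculation. The sketch below assumes ideal scaling, where gate area shrinks with the square of the feature size; real processes deviate from this. The function names and the 50 mm² die size are hypothetical and chosen only for illustration.

```python
def gate_density_gain(old_nm: float, new_nm: float) -> float:
    """Ideal gain in gates per unit area when the feature size shrinks.

    Assumption: gate area scales with the square of the linear feature size.
    """
    return (old_nm / new_nm) ** 2


def usable_gates(die_area_mm2: float, gates_per_mm2: float) -> float:
    """Usable gates on a die of the given area at the given density."""
    return die_area_mm2 * gates_per_mm2


# One process step, 180 nm to 130 nm, under the ideal-scaling assumption.
gain = gate_density_gain(180, 130)
print(f"Ideal density gain, 180 nm -> 130 nm: {gain:.2f}x")

# Hypothetical 50 mm^2 die at the 100K gates/mm^2 density quoted for 130 nm.
print(f"Usable gates on a 50 mm^2 die: {usable_gates(50, 100_000):,.0f}")
```

Under this idealisation the gain comes out a little below 2, which is consistent with the "roughly doubles" wording.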
• The trend toward the use of large
numbers of RTL-based logic blocks
and the mixing together of control
processors and digital signal
processors on the same chip is
illustrated in Figure.
• This ceaseless growth in IC
complexity is a central dilemma for
SoC design.
• Unfortunately, general-purpose
processors fall far short of the mark
with respect to application
throughput, cost, and power
efficiency for the most
computationally demanding
problems.
• Designing custom RTL logic for
these new, complex functions or
emerging standards takes too long
and produces designs that are too
rigid to change easily.
• A closer look at the makeup of the
typical RTL block in Figure gives
insights into this paradox.
• In most RTL designs, the datapath consumes the vast majority of the gates
in the logic block.
• For example, a packet-processing block will probably employ a datapath
that closely corresponds to the packet header’s structure.
• The state machine that controls this datapath may consume only a few
percent of the block’s gate count, but it embodies most of the design and
verification risk due to its complexity.
• One way to understand the risks associated with hardware state machines is
to examine the combinatorial complexity of verification.
• A state machine with $N$ states and $I$ inputs may have as many as $N^2$
next-state equations, and each of these equations will be some function of the $I$
inputs, or $2^I$ possible input combinations. Taken together, at least $N^2 \cdot 2^I$
input combinations must be tried to test all the state transitions of this state
machine exhaustively.
• Configurable, extensible processors, a fundamentally new form of
microprocessor, provide a way of reducing the risk of state-machine design
by replacing hard-to-design, hard-to-verify state-machine logic blocks with
pre-designed, pre-verified processor cores and application firmware.
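To get a feel for how quickly the exhaustive-verification bound for a state machine grows (N squared times 2 to the power I, for N states and I inputs), the sketch below evaluates it for a few sizes. The function name and the sample sizes are illustrative assumptions, not figures from the slides.

```python
def exhaustive_transition_tests(num_states: int, num_inputs: int) -> int:
    """Input combinations needed to exercise every state transition.

    Implements the bound N^2 * 2^I for N states and I inputs.
    """
    return num_states ** 2 * 2 ** num_inputs


# Hypothetical state-machine sizes, chosen only to show the growth rate.
for n_states, n_inputs in [(4, 4), (16, 8), (32, 16)]:
    tests = exhaustive_transition_tests(n_states, n_inputs)
    print(f"N={n_states:>2}, I={n_inputs:>2}: {tests:,} input combinations")
```

Even the modest 32-state, 16-input case needs tens of millions of combinations, which is why exhaustive testing of hand-built state machines is rarely practical.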
THE LIMITATIONS OF
TRADITIONAL ASIC DESIGN
• New chips are characterized by rapidly increasing logic complexity.
Moore’s-law scaling of silicon density makes multi-million-gate designs
feasible.
• When requirements change, however, especially when new modes and
features must be added, RTL-level designs may not scale well, particularly
if the original design and verification team is not available to do the
redesign.
• The conventional SoC-design model closely follows the tradition of its
board-level predecessor: combining a standard microprocessor, standard
memory, and RTL-built logic into an application-specific integrated circuit
(ASIC).
• Most commonly, the processors used for these board-level designs are
general-purpose reduced instruction set computing (RISC) processors
originally designed in the 1980s for general-purpose UNIX desktops and
servers.
• When all system components are combined on a single piece of silicon,
clock frequency increases and power dissipation decreases relative to the
equivalent board-level design.
• SoC architectures that are cloned from board-level designs are often
organized around one or two 32-bit busses (often a fast memory bus plus a
slow peripheral bus) because this approach saves pins, an expensive
commodity in a board-level design but much less relevant to an SoC’s
potential on-chip connections.
The Impact of SoC Integration
• Ironically, bus bottlenecks commonly disappear in SoC designs.
• Wide busses are efficient and appropriate to use between adjoining SoC
logic blocks. The communications bandwidth between a processor and
surrounding logic can exceed 1 GB per second on an SoC using these wider
busses.
• Although few practical SoC designs will even approach this limit, wide
on-chip busses create tremendous architectural headroom and invite a new,
more effective approach to system architecture.
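The bandwidth figure above follows from simple arithmetic: peak bandwidth is bus width in bytes times transfer rate. The sketch below assumes one transfer per clock cycle and ignores arbitration and wait states. The bus widths and clock frequencies are hypothetical examples, not values from the slides.

```python
def peak_bus_bandwidth(width_bits: int, clock_hz: float) -> float:
    """Peak bandwidth in bytes per second.

    Assumption: one full-width transfer per clock cycle, no stalls.
    """
    return (width_bits / 8) * clock_hz


# Hypothetical configurations: a narrow board-style bus and a wide on-chip bus.
for width_bits, clock_hz in [(32, 100e6), (128, 200e6)]:
    gb_per_s = peak_bus_bandwidth(width_bits, clock_hz) / 1e9
    print(f"{width_bits:>3}-bit bus at {clock_hz / 1e6:.0f} MHz: {gb_per_s:.1f} GB/s peak")
```

With these assumed numbers the 32-bit bus stays well under 1 GB per second, while the 128-bit bus exceeds it several times over.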
The Limitations of General-Purpose
Processors
• The traditional approach to SoC design is further constrained by the origins
and evolution of microprocessors.
• These processors were designed to serve general-purpose applications and
were structured for implementation as stand-alone integrated circuits.
• The general-purpose nature of these processors makes them well suited to
the extremely diverse mix of applications run on computer systems.
• Even the most silicon-intensive, deeply pipelined, super-scalar, general-
purpose processors can rarely sustain much more than two instructions per
cycle (IPC), and the harder processor designers push against this IPC limit,
the higher the cost and power per unit of useful performance extracted from
the microprocessor architecture.
• A digital camera may perform a variety of complex image-processing tasks,
but it never executes Structured Query Language (SQL) database queries.
• The specialized nature of individual embedded applications creates two
issues for general-purpose processors in data-intensive embedded
applications.
• First, there is a poor match between the critical functions of many
embedded applications (e.g., image, audio, protocol processing) and a
RISC processor’s basic integer instruction set and register file.
• Second, the more focused embedded devices cannot take full advantage of
all of a general-purpose processor’s broad capabilities.
• Instead, designers have traditionally turned to hard-wired circuits to
perform these data-intensive functions such as image manipulation,
protocol processing, signal compression, encryption, and so on.
DSP as Application-Specific Processor
• DSPs are often used in tandem with RISC controllers on SoCs, especially when
the end application calls for a mix of control and signal processing.
• The emergence of complex very long instruction word (VLIW) DSPs, such as
the Texas Instruments C6000 family and the StarCore architecture, reflects this
“quest for generality.”
• In many cases a programmable DSP would be attractive, but only if it could be
sufficiently fast in the application to rival RTL performance.
• In the past 10 years, the wide availability of logic synthesis and ASIC design
tools has made RTL design the standard for hardware developers.
• Because they are not attempts to solve application-arbitrary sequential
problems, RTL designs avoid the general-purpose, single-processor
performance bottlenecks.
Extensible processors as an alternative
to RTL
• Hardwired RTL design has many attractive characteristics: small area, low
power, and high throughput.
• Application-specific processors as a replacement for complex RTL fit this
need.
• The Origins of Configurable Processors:
• A processor had to be “a jack of all trades, master of none.”
• Research in application-specific instruction processors (ASIPs), especially
in Europe (code generation at IMEC, processor specification at the
University of Dortmund, micro-code engines [“transport-triggered
architectures”] at the Technical University of Delft, and fast simulation at
the University of Aachen), all confirmed the possibility of developing a fully
automated system for designing processors.
Configurable, Extensible Processors
• Like RTL-based design using logic synthesis, extensible-processor technology
allows the design of high-speed logic blocks tailored to the assigned task.
• All these software-development tools are built for exactly the same architecture
by the processor generator from the same definition used to build the processor
itself.
• By generating the processor from a high-level description, the system designer
controls all the relevant cost, performance, and functional attributes of the
processor subsystem without having to become a microprocessor design expert.
• The four key questions for the use of configurable and extensible processors in
SoCs are these:
1. What target characteristics of the processor can be configured and extended?
2. How does the system designer capture the target characteristics?
3. What are the deliverables—the hardware and software components—to the
system designer?
4. What are typical results for building new platforms to address emerging
communications and consumer applications?
• To be useful for practical SoC development, configuration of the processor
must meet two important criteria:
1. The configuration mechanism must accelerate and simplify the creation of
useful configurations.
2. The generated processor must include complete hardware descriptions,
software development tools, and verification aids.
• A range of extensible or configurable processors is now widely available.
Configurable products can be roughly categorized into five groups:
• Non-architectural processor configuration
• Fixed menu of processor architecture configurations
• User-modifiable processor RTL
• Processor extension using an instruction-set description language
• Fully automated processor synthesis.
• The logical equivalent of the RTL datapaths is implemented using the integer
pipeline of the base processor and additional execution units, registers, and
other functions added by the chip architect for a specific application.
• This design migration from hardwired state machine to firmware program
control has important implications:
1. Flexibility
2. Software-based development
3. Faster, more complete system modeling
4. Unification of control and data
5. Time-to-market
• Configurable and Extensible Processor Features
• Extending a Processor
• Exploiting Extensibility
• The Impact of Extensibility on Performance
• Extensibility and Energy Efficiency
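As a rough illustration of how an added execution unit can take over work that would otherwise be RTL datapath logic, the sketch below compares operation counts for a sum of absolute differences (a common image-processing kernel) under a toy cost model. The fused four-lane operation, the function names, and the cost of one operation per instruction are all hypothetical assumptions. This is not a measurement of any real processor.

```python
from typing import Sequence, Tuple


def sad_base_isa(a: Sequence[int], b: Sequence[int]) -> Tuple[int, int]:
    """Return (result, op_count) charging subtract, abs, and add per pixel."""
    total = 0
    ops = 0
    for x, y in zip(a, b):
        diff = x - y       # one subtract
        mag = abs(diff)    # one absolute value
        total += mag       # one accumulate
        ops += 3
    return total, ops


def sad_extended_isa(a: Sequence[int], b: Sequence[int], lanes: int = 4) -> Tuple[int, int]:
    """Same result, charging one hypothetical fused op per group of `lanes` pixels."""
    total = 0
    ops = 0
    for start in range(0, len(a), lanes):
        pairs = zip(a[start:start + lanes], b[start:start + lanes])
        total += sum(abs(x - y) for x, y in pairs)
        ops += 1           # one fused multi-lane operation
    return total, ops


# Made-up pixel rows, used only to exercise both versions.
row_a = [10, 20, 30, 40, 50, 60, 70, 80]
row_b = [12, 18, 33, 40, 45, 66, 70, 79]
print("base ISA:    ", sad_base_isa(row_a, row_b))
print("extended ISA:", sad_extended_isa(row_a, row_b))
```

Both versions compute the same value. Only the number of operations charged differs, which is the effect an application-specific instruction is meant to capture.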