2/11/2020 Priorities Shift In IC Design
https://semiengineering.com/higher-performance-plus-low-power/ 1/12
Priorities Shift In IC Design
AI, edge applications are driving design teams to find new ways to achieve the best performance per watt.
The rush to the edge and new applications around AI are causing a shift in design strategies toward the highest
performance per watt, rather than the highest performance or lowest power.
This may sound like hair-splitting, but it has set a scramble in motion around how to process more data more quickly
without just relying on faster processors and accelerators. Several factors are driving these changes, including the
slowdown in Moore’s Law (https://semiengineering.com/knowledge_centers/standards-laws/laws/moores-law/),
which limits the number of traditional options, the rollout of AI
(https://semiengineering.com/knowledge_centers/artificial-intelligence/) everywhere, and a surge in data from more
sensors, cameras and images with higher resolutions. In addition, more data is being run through convolutional
neural networks (https://semiengineering.com/knowledge_centers/artificial-intelligence/neural-
networks/convolutional-neural-network/) or deep learning
(https://semiengineering.com/knowledge_centers/artificial-intelligence/deep-learning/) inferencing systems, which
bring huge data processing loads.
“As semiconductor scaling slows, but processing demands increase, designers are going to need to start working
harder for those performance and efficiency gains,” said Russell Klein, HLS platform director at Mentor, a Siemens
Business (https://semiengineering.com/entities/mentor-a-siemens-business/). “When optimizing any system, you
need to focus on the biggest inefficiencies first. For data processing on embedded systems, that will usually be
software.”
When Moore’s Law was in its prime, processor designers had so many gates they didn’t know what to do with them
all, Klein said. “One answer was to plop down more cores, but programmers were reluctant to adopt multi-core
programming paradigms. Another answer was to make the processor go as fast as possible without regard to area. A
feature that would add 10% to the speed of a processor was considered a win, even if it doubled the size of that
processor. Over time, high-end processors picked up a lot of bloat, but no one really noticed or cared. The
JANUARY 16TH, 2020 - BY: ANN STEFFORA MUTSCHLER (HTTPS://SEMIENGINEERING.COM/AUTHOR/ANN/)
processors were being stamped out on increasingly efficient and dense silicon. MIPS was the only metric that
mattered. But if you start to care about system-level efficiency, that bloated processor, and especially the software
running on it, might warrant some scrutiny.”
Software has a lot of very desirable characteristics, Klein pointed out, but even well-written software is neither fast
nor efficient when compared to the same function implemented in hardware. “Moving algorithms from software on
the processor into hardware can improve both performance and power consumption because software alone is not
going to deliver the performance needed to meet the demands of inferencing, high-resolution video processing, or
5G.”
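A back-of-the-envelope way to see the payoff of moving a hot algorithm into hardware is Amdahl's law. The sketch below is illustrative only; the offloaded fraction and accelerator speedup are hypothetical numbers, not figures from Klein.

```python
def amdahl_speedup(offload_fraction, accel_speedup):
    """Overall speedup when a fraction of the runtime moves to hardware
    that runs it accel_speedup times faster than the software version."""
    return 1.0 / ((1.0 - offload_fraction) + offload_fraction / accel_speedup)

# Hypothetical: an inferencing kernel is 80% of runtime, and a hardware
# implementation runs it 20x faster than software.
print(round(amdahl_speedup(0.80, 20.0), 2))  # 4.17x overall
```

The remaining 20% of un-offloaded software quickly dominates, which is why the biggest inefficiency has to be attacked first.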
The need for speed
At the same time, traffic data speeds are increasing, and there are new demands on high-speed interfaces to access
that data. “High-speed interfaces and SerDes (https://semiengineering.com/knowledge_centers/communications-
io/off-chip-communications/i-o-enabling-technology/serializer-deserializer-serdes/) are an integral part of the
networking chain, and these speed increases are required to support the latest technology demands of artificial
intelligence (AI), Internet of Things (IoT), virtual reality (VR) and many more technologies that have yet to be
envisioned,” noted Suresh Andani, senior director of IP cores at Rambus
(https://semiengineering.com/entities/rambus-inc/).
Best design practices for high-performance devices include defining and analyzing the solution space through
accurate full-system modeling; utilizing system design and concurrent engineering to maximize first-time-right
silicon; ensuring tight correlation between models and silicon results; leveraging a system-aware design
methodology; and including built-in test features to support bring-up, characterization and debug, he said.
There are many ways to improve performance per watt, and not just in hardware or software. Kunle Olukotun,
Cadence Design Systems Professor of electrical engineering and computer science at Stanford University, said that
relaxing precision, synchronization and cache coherence can reduce the amount of data that needs to be sent back
and forth. That can be reduced even further by domain-specific languages, which do not require translation.
“You can have restricted expressiveness for a particular domain,” said Olukotun in a recent presentation. “You also
can utilize parallel patterns and put functional data into parallel patterns based on representation. And you can
optimize for locality and exploit parallelism.”
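What a "parallel pattern" looks like can be sketched in a few lines. This is an illustration in plain Python, not code from an actual domain-specific language: a dot product written as a map followed by a reduce, so a DSL compiler could see the parallelism in the map stage and fuse the two stages for locality.

```python
from functools import reduce

def dot(xs, ys):
    """Dot product expressed as parallel patterns rather than an explicit loop."""
    products = map(lambda pair: pair[0] * pair[1], zip(xs, ys))  # map pattern
    return reduce(lambda acc, p: acc + p, products, 0.0)         # reduce pattern

print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```

The restricted structure is the point: because the pattern says nothing about execution order, the compiler is free to parallelize and tile it.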
He noted that flexible mapping of data is much more efficient. That can take advantage of data parallelism, model
parallelism, and dynamic precision as needed. In addition, the dataflow can be made hierarchical using a wider
interface between the algorithms and the hardware, allowing for parallel patterns, explicit memory hierarchies,
hierarchical control and explicit parameters, all of which are very useful in boosting performance per watt in
extremely performance-centric applications.
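One common way to relax precision and cut data movement is symmetric int8 quantization of a weight tensor. The sketch below uses hypothetical random weights purely to show the 4x reduction in bytes that must cross the memory interface.

```python
from array import array
import random

random.seed(0)
# Hypothetical weight tensor: 100,000 float32 values.
weights = array("f", (random.gauss(0.0, 1.0) for _ in range(100_000)))

# Symmetric int8 quantization: one shared scale factor for the tensor.
scale = max(abs(w) for w in weights) / 127.0
quantized = array("b", (max(-127, min(127, round(w / scale)))
                        for w in weights))

fp32_bytes = len(weights) * weights.itemsize      # 4 bytes per value
int8_bytes = len(quantized) * quantized.itemsize  # 1 byte per value
print(fp32_bytes // int8_bytes)  # 4: a quarter of the data to move
```

Every byte that stays on-chip is energy not spent on the memory interface, which is where much of the power in an inference accelerator goes.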
Flexibility in designs has been one of the tradeoffs in optimizing performance per watt, and many of the new AI chips
under development have been struggling to combine optimally tuned hardware and software into designs while still
leaving enough room for ongoing changes in algorithms and different compute tasks.
“You may spend 6 to 9 months mapping how to cut up work, and that provides a big impediment to embracing new
markets quickly,” said Stuart Biles, a fellow and director of research architecture at Arm
(https://semiengineering.com/entities/arm/) Research. “For large OSes, there is a set of functionality in the system
where a particular domain is likely to execute on a general-purpose core. But you can add in flexibility for how you
partition that and make the loop quicker. That basically comes down to how well you use an SoC’s
(https://semiengineering.com/knowledge_centers/integrated-circuit/ic-types/system-on-chip/) resources.”
2/11/2020 Priorities Shift In IC Design
https://semiengineering.com/higher-performance-plus-low-power/ 3/12
Biles noted that once a common subset is identified, then certain functions can be specialized with an eFPGA
(https://semiengineering.com/knowledge_centers/integrated-circuit/ic-types/fpga/embedded-fpga-efpga/) or using
3D integration. “We’ve moved from the initial 3D integration to the microarchitecture, where you can cut out cycles
and branch prediction. What you’re looking at is the time it takes to get from load/store to processor versus doing
that vertically, and you can change the microarchitectural assumptions based on specific assumptions in 3D. That
results in different delays.”
A different take on the same problem is to limit the amount of data that needs to be processed in the first place. This
is particularly important in edge systems such as cars, where performance per watt is critical due to limited battery
power and the need for real-time results. One way to change that equation is to sharply limit the amount of data
being sent to centralized processing systems in the vehicle by pre-screening it at the sensor level. So while not
actually speeding up the processing per watt, it achieves faster results using less power.
“You can provide a reasonable amount of compute power at the sensor, and you can reduce the amount of data that
the sensor identifies through pre-selection,” said Benjamin Prautsch, group manager for advanced mixed-signal
automation at Fraunhofer IIS’ (https://semiengineering.com/entities/fraunhofer-iis-eas/) Engineering of Adaptive
Systems Division. “So if you’re looking at what is happening in a room, the first layer can identify if there are people in
there. The same can be used on a manufacturing line. You also can run DNN calculations in a parallel way to be more
efficient.”
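The pre-selection idea can be sketched as a simple frame filter at the sensor: only frames that differ meaningfully from the last forwarded frame are sent upstream for full DNN processing. The frames, threshold, and scene below are hypothetical.

```python
def mean_abs_diff(a, b):
    """Average per-pixel absolute difference between two flat frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def prescreen(frames, threshold=10.0):
    """Yield only frames that differ enough from the last forwarded one."""
    ref = None
    for frame in frames:
        if ref is None or mean_abs_diff(frame, ref) > threshold:
            ref = frame
            yield frame  # forward upstream for full processing

# Hypothetical feed: 99 frames of a static room, then one with activity.
static = [50] * 64
active = [200] * 64
feed = [static] * 99 + [active]
forwarded = list(prescreen(feed))
print(len(forwarded))  # 2: the first frame plus the one with activity
```

The central compute cluster now processes 2 frames instead of 100; the result arrives faster and the system burns less power, even though nothing got faster per watt.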
Further, AI chips, like many high-performance devices, have a tendency to develop hotspots, noted Richard
McPartland, technical marketing manager at Moortec (https://semiengineering.com/entities/moortec-semiconductor-ltd/).
“AI chips are designed to tackle immense processing tasks for training and inference,” he said. “They are
typically very large in silicon area, with hundreds or even thousands of cores on advanced finFET
(https://semiengineering.com/knowledge_centers/integrated-circuit/transistors/3d/finfet-3/) processes consuming
high current – 100 amperes or more at supply voltages below 1 volt. With AI chip power consumption at a minimum
in the tens of watts, but often well over 100 watts, it should be no surprise that best design practices include in-chip
temperature monitoring. And it’s not just one sensor, but typically tens of temperature sensors distributed
throughout the clusters of processors and other blocks. In-chip monitoring should be considered early in the design
flow and included up front in floor planning, not added as an afterthought. At a minimum, temperature
monitoring can provide protection from thermal runaway. But accurate temperature monitoring also supports
maximizing data throughput by minimizing throttling of the compute elements.”
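The throttling loop such monitoring drives can be sketched as a hysteretic controller keyed to the hottest sensor. The temperature thresholds, clock limits, and step size below are hypothetical, chosen only to show the mechanism.

```python
def next_clock_mhz(sensor_temps_c, clock_mhz,
                   t_max=105.0, t_safe=95.0, step=50):
    """Hysteretic throttle driven by the hottest on-die sensor reading."""
    hottest = max(sensor_temps_c)
    if hottest > t_max:
        return max(clock_mhz - step, 400)   # back off toward a floor clock
    if hottest < t_safe:
        return min(clock_mhz + step, 1600)  # cool enough: recover headroom
    return clock_mhz                        # in the hysteresis band: hold

print(next_clock_mhz([88, 92, 107, 90], 1600))  # 1550: one cluster is hot
print(next_clock_mhz([80, 82, 85, 79], 1200))   # 1250: cool, raise clock
```

Accurate sensors narrow the band between `t_safe` and `t_max`, which is exactly the "minimizing throttling" benefit McPartland describes: the less margin you need for sensor error, the longer the cores stay at full clock.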
In-chip voltage monitoring with multiple sense points is also recommended for high-performance devices such as AI
chips, he continued. “Again, this should be included early in the design flow to monitor the supply voltages at critical
circuits, such as the processor clusters, as well as supply drops between the supply pins and the circuit blocks.
Voltage droops occur when the AI chips start operating under load, and because the load is software-driven, this can be
difficult to predict in the chip design phase, with the software written later by another team. Including voltage sense
points gives visibility into what is going on with the internal chip supplies, and is invaluable in the chip bring-up phase,
as well as for reducing power consumption through minimizing guard bands.”
Process detectors are also a must-have on high-performance devices such as AI chips, McPartland said. “These
enable a quick and independent verification of process performance and variation, not just die-to-die but across
large individual die on advanced nodes. Further, they can be used for power optimization
(https://semiengineering.com/power-optimization-strategies-widen/), such as to reduce power consumption
(https://semiengineering.com/knowledge_centers/low-power/low-power-design/power-consumption/) through
voltage scaling schemes where the voltage guard bands are minimized on a per-die basis based on process speed.
Lower power equates to higher processing performance in the AI world, where processing power is often
constrained by thermal and power issues.”
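The payoff of trimming per-die guard bands follows from dynamic CMOS power scaling roughly with the square of supply voltage at a fixed clock. The voltages below are hypothetical, just to put a number on the effect.

```python
def dynamic_power_ratio(v_new, v_old):
    """Dynamic CMOS power scales roughly as V^2 at a fixed clock (P = C*V^2*f)."""
    return (v_new / v_old) ** 2

# Hypothetical fast die: process monitors show it meets timing at 0.72 V
# instead of the worst-case guard-banded 0.80 V supply.
ratio = dynamic_power_ratio(0.72, 0.80)
print(f"{(1 - ratio) * 100:.0f}% dynamic power saved")  # 19%
```

Under a fixed thermal budget, those saved watts can go straight back into compute, which is the sense in which lower power equates to higher processing performance.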
AI algorithm performance challenges
An important consideration of AI and other high-performance devices is the fact that actual performance is not
known until the end application is run. This raises questions for many AI processor startups that insist they can build
a better hardware accelerator for matrix math and other AI algorithms than the next guy.
“That’s their key di erentiation,” said Ron Lowman, strategic marketing manager for IoT at Synopsys
(https://semiengineering.com/entities/synopsys-inc/). “Some of those companies may be in their second or third
designs, whereas the bigger players are in their third or fourth designs, and they’re learning something every time.
The math is changing on them just as rapidly as they can get a chip out, which is helping the situation, but it’s a game
for who can get the highest performance in the data center. That’s now moving down to edge computing
(https://semiengineering.com/knowledge_centers/compute-architectures/edge-computing/). Those AI accelerators
are being built on local and on-premise servers now, and they want to find their niche in performance per watt and
for specific applications. But in that space, they still have to accommodate many different types of AI functions, be it
for voice or audio or database extraction or vision. That’s a lot of different things. Then there’s the guys building the
applications, like for ADAS (https://semiengineering.com/knowledge_centers/automotive/adas-advanced-driver-
assistance-systems/). That’s a very specific use case, and they can be more specific to what they’re building, so they
know exactly the model they may want, although that too changes pretty rapidly.”
If the design team has a better handle on the end application and the intended use cases, they can look at each
specific space, whether it’s for mobile or edge computing, or for automotive. “You can see that the TOPS,
just the pure performance, has grown orders of magnitude over the last couple of years,” Lowman said. “The initial
mobile devices that were going to handle AI had under a TOPS (tera operations per second). Now you’re seeing up to
16 TOPS in those mobile devices. That’s how they start, by saying, ‘This is the general direction because we have to
handle many different types of AI functions in the mobile phone.’ You look at ADAS, and those guys were even ahead
of the mobile phones. Now you’re seeing up to 35 TOPS for a single instantiation for ADAS, and that continues to grow.
In edge computing, they’re basically scaling down the data center devices to be more power-efficient, and those
applications can range from 50 to hundreds of TOPS. That’s where you start.”
However, a first-generation AI architecture often is very inefficient for what its designers want to accomplish, because
they’re trying to do too much. If the actual application could be run, the architecture could be tuned significantly, because it’s
not just a processor or the ability to just do the MAC. It’s a function of accessing the coefficients from memory, then
processing them very effectively. Nor is it just a matter of adding a bunch of on-chip SRAM
(https://semiengineering.com/knowledge_centers/memory/volatile-memory/static-random-access-memory/) to
solve the problem. By modeling the IP, such as DDR instantiations, different bitwidths with different access
capabilities, different types of DRAM (https://semiengineering.com/knowledge_centers/memory/volatile-
memory/dynamic-random-access-memory/) configurations, or LPDDR versus DDR, optimal architectures can be found before
system development is complete using prototyping tools and systems exploration tools.
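A rough bandwidth estimate shows why coefficient access, not the MACs, is often the bottleneck. The model size, inference rate, and the worst-case assumption that every coefficient is fetched from DRAM on every inference are all hypothetical, for illustration only.

```python
def required_bw_gbs(n_coeffs, bytes_per_coeff, inferences_per_s):
    """DRAM bandwidth needed if every coefficient is fetched once per
    inference, i.e. the model does not fit in on-chip SRAM."""
    return n_coeffs * bytes_per_coeff * inferences_per_s / 1e9

# Hypothetical: a 500M-coefficient model at int8, run 30 times per second.
need = required_bw_gbs(500_000_000, 1, 30)
print(f"{need:.1f} GB/s needed")  # 15.0 GB/s
```

Whether 15 GB/s is cheap or ruinous depends entirely on the LPDDR versus DDR choice, the bitwidth, and how much coefficient reuse the on-chip memory hierarchy can extract, which is exactly what the prototyping tools are modeling.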
“If the development team has the real algorithm, it’s much more effective,” Lowman said. “A lot of people use ResNet-50
as a benchmark because that’s better than TOPS. But people are well beyond that. You see voice applications for
natural language understanding. ResNet-50 has maybe a few million coefficients, but some of these are in the billions
of coefficients now, so it’s not even representative. And the more representative you can get of the application, the
more accurately you can define your SoC architecture to handle those types of things.”
There are so many moving pieces here that the more modeling you can do up front with the actual IP, the better off you
are. “This is where some traction is happening, and it shows up in many aspects: the memory pieces that are so important,
the processing pieces that are so important, even the interfaces for the sensor inputs, like MIPI, or audio interfaces.
All of that architecture can be optimized based on the algorithm, and it’s no different than it always has been. If you run
the actual software, you can go ahead and optimize much more effectively. But there’s a constant need to grow the
performance per watt. If the estimates are to be believed, with some saying that 20% to 50% of all electricity will be
consumed by AI, that’s a huge problem. That is spurring the trend to move to more localized computing, and to
compress these things into the application itself. All of those require different types of architectures to handle the
different functions and features that you’re trying to accomplish,” Lowman said.
Power does play a role here because of the amount of memory capacity needed, the changing number of coefficients,
and the number of math blocks.
“You can throw on tons of multiply/accumulates, put them all on chip, but you also have to have all the other things
that are done afterward,” he said. “That includes the input of the data and conditioning of that input data. For
instance, for audio, you need to make sure there are no bottlenecks. How much cache is needed for each of these
data movements? There are all kinds of different architectural tradeoffs, so the more modeling you can do up front,
the better your system will be if you know the application. If you create a generic architecture, and then run the workload
you actually deploy in the system, you may not get the accuracy that you thought you had. There’s a lot of work being done
to improve that over time, and to make corrections to get the accuracy and power footprint that teams need. You
can start with some general features, but every generation I’ve seen is moving very quickly on more performance,
less power, more optimized math, more optimized architectures, and the ability to do not just a standard SRAM but a
multi-port SRAM. This means you’re doing two accesses at once, so you may have as many multiply/accumulates as
you want. But if you can go ahead and do several reads and writes in a single cycle, that saves on power. You can
optimize what that looks like when you’re accessing, and the number of multiply/accumulates you need to do for that
particular stage in the pipeline.”
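The multi-port SRAM point can be put in numbers. The sketch below uses hypothetical figures (64 MACs, two operands each) and ignores banking and wide-word tricks; it only shows how port count limits how fast operands can be delivered.

```python
def cycles_to_feed(n_macs, operands_per_mac, sram_ports):
    """Cycles to fetch one round of operands when the SRAM services
    only sram_ports accesses per cycle."""
    accesses = n_macs * operands_per_mac
    return -(-accesses // sram_ports)  # ceiling division

# 64 MACs each need a weight and an activation per compute cycle.
print(cycles_to_feed(64, 2, 1))  # 128 cycles with single-port SRAM
print(cycles_to_feed(64, 2, 2))  # 64 with dual-port: half the stall time
```

Doubling the ports halves the cycles (and wasted clock energy) spent waiting on memory, which is why a dual-port array can save power overall even though each access costs slightly more.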
Conclusion
With so much activity in the high-performance and AI space, it’s an exciting time for the semiconductor ecosystem
around these applications. There is a tremendous amount of startup activity, with the thinking evolving from a more
generic mindset of, “We can do the math for neural networks,” to one in which everybody can do the math for
specific neural networks in different fields, Lowman said. “You can do it for voice, you can do it for vision, you can do
it for data mining, and there are specific types of vision, voice or sound where you can optimize for certain things.”
This only makes the AI market opportunity more exciting as the technology branches out into many different fields
that are extensions of current ones or new areas altogether, and the development technologies and tool ecosystem
discover new ways to make it all a reality.
—Ed Sperling contributed to this report.
Ann Steffora Mutschler (all posts) (https://semiengineering.com/author/ann/)
Ann Steffora Mutschler is executive editor at Semiconductor Engineering.
Leave a Reply
Comment
Name*
(Note: This name will be displayed publicly)
Email*
(This will not be displayed publicly)
Post Comment
SPONSORS
(http://www.mentor.com/) (http://www.rambus.com/)
(http://www.synopsys.com) (http://www.ansys.com/)
(http://www.arm.com/) (http://www.cadence.com)
(http://moortec.com/) (https://www.adestotech.com/)

More Related Content

PPTX
Digital twins - Technology that is Changing Industry
PDF
The Enterprise Internet of Things: Think Security First
PDF
AIOps: Anomalous Span Detection in Distributed Traces Using Deep Learning
PPTX
Green Compute and Storage - Why does it Matter and What is in Scope
PDF
Phoenix Data Conference - Big Data Analytics for IoT 11/4/17
PDF
SIMA AZ: Emerging Information Technology Innovations & Trends 11/15/17
PDF
IEEE CS Phoenix - Internet of Things Innovations & Megatrends Update
PDF
SeGW Whitepaper from Radisys
Digital twins - Technology that is Changing Industry
The Enterprise Internet of Things: Think Security First
AIOps: Anomalous Span Detection in Distributed Traces Using Deep Learning
Green Compute and Storage - Why does it Matter and What is in Scope
Phoenix Data Conference - Big Data Analytics for IoT 11/4/17
SIMA AZ: Emerging Information Technology Innovations & Trends 11/15/17
IEEE CS Phoenix - Internet of Things Innovations & Megatrends Update
SeGW Whitepaper from Radisys

What's hot (20)

PDF
The Value of Enterprise GIS
PDF
Accelerating IT Velocity: Agile Transformation at Dell
PDF
[Case study] DONG Energy: Improving the bottom line and getting better data q...
PPTX
Big Data for Big Power: How smart is the grid if the infrastructure is stupid?
PDF
Build the network of the future on your terms today
PPTX
Edge AI Framework for Healthcare Applications
PDF
AI in Healh Care using IBM POWER systems
PDF
Architecting the Enterprise Internet of Things
PDF
Vertex perspectives ai optimized chipsets (part i)
PDF
Edge optimized architecture for fabric defect detection in real-time
PPTX
Deep learning for smart manufacturing
PDF
Edge Computing for the Industry
PDF
8. 9590 1-pb
PPT
Apc by Schneider - 27mai2011
PDF
IT OT Integration_Vishnu_Murali_05262016_UPDATED
PDF
Vertex Perspectives | AI Optimized Chipsets | Part IV
PDF
Vertex Perspectives | AI Optimized Chipsets | Part II
PDF
Tiarrah Computing: The Next Generation of Computing
PPTX
Digital_Twin_GUC_IE _AvinashMisra_&_AvinashNeema
PDF
Edge computing and its role in architecting IoT
The Value of Enterprise GIS
Accelerating IT Velocity: Agile Transformation at Dell
[Case study] DONG Energy: Improving the bottom line and getting better data q...
Big Data for Big Power: How smart is the grid if the infrastructure is stupid?
Build the network of the future on your terms today
Edge AI Framework for Healthcare Applications
AI in Healh Care using IBM POWER systems
Architecting the Enterprise Internet of Things
Vertex perspectives ai optimized chipsets (part i)
Edge optimized architecture for fabric defect detection in real-time
Deep learning for smart manufacturing
Edge Computing for the Industry
8. 9590 1-pb
Apc by Schneider - 27mai2011
IT OT Integration_Vishnu_Murali_05262016_UPDATED
Vertex Perspectives | AI Optimized Chipsets | Part IV
Vertex Perspectives | AI Optimized Chipsets | Part II
Tiarrah Computing: The Next Generation of Computing
Digital_Twin_GUC_IE _AvinashMisra_&_AvinashNeema
Edge computing and its role in architecting IoT
Ad

Similar to Priorities Shift In IC Design (20)

PDF
Keynote Speech - Low Power Seminar, Jain College, October 5th 2012
PPT
Conferencia
PPT
Conferencia
PDF
Artificial Intelligence has become a driving force across various industries,...
PPTX
High performance energy efficient multicore embedded computing
PDF
ChipEx 2019 keynote
PDF
Implementing AI: Running AI at the Edge
 
PPTX
CAQA5e_ch1 (3).pptx
PDF
Lecture 1 Advanced Computer Architecture
PDF
Implementing AI: Hardware Challenges
 
PDF
The Art of Applied Engineering - An Overview
PDF
lec01.pdf
PPTX
Caqa5e ch1 with_review_and_examples
PPTX
SYSTEM approach in system on chip architecture
PPTX
STUDY Introduction to advanced VLSI Design
PPTX
Education of basic VLSI design and its processor
PDF
Chip design with AI inside—designed by AI
PDF
Heterogeneous Computing : The Future of Systems
PDF
1.1. SOC AND MULTICORE ARCHITECTURES FOR EMBEDDED SYSTEMS (2).pdf
PDF
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
Keynote Speech - Low Power Seminar, Jain College, October 5th 2012
Conferencia
Conferencia
Artificial Intelligence has become a driving force across various industries,...
High performance energy efficient multicore embedded computing
ChipEx 2019 keynote
Implementing AI: Running AI at the Edge
 
CAQA5e_ch1 (3).pptx
Lecture 1 Advanced Computer Architecture
Implementing AI: Hardware Challenges
 
The Art of Applied Engineering - An Overview
lec01.pdf
Caqa5e ch1 with_review_and_examples
SYSTEM approach in system on chip architecture
STUDY Introduction to advanced VLSI Design
Education of basic VLSI design and its processor
Chip design with AI inside—designed by AI
Heterogeneous Computing : The Future of Systems
1.1. SOC AND MULTICORE ARCHITECTURES FOR EMBEDDED SYSTEMS (2).pdf
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
Ad

More from Abacus Technologies (20)

PDF
Cloud Technology Is the Underdog Of The Tech World
PDF
Small Business Owners: Eight Impactful Reasons To Leverage Cloud Technology
PDF
How to Improve Your Cloud and Container Security
PDF
Controlling cloud infrastructure costs: Tips & tricks
PDF
Does AI-driven cloud computing need ethics guidelines?
PDF
How the hybrid cloud is key to enterprise AI infrastructure strategies
PDF
Cloud Computing in Defence: Defence Trends
PDF
Remote Work Trends: How Cloud Computing Security Changed
PDF
Overcoming Digital Transformation Challenges With The Cloud
PDF
Why is Cloud Computing Important for Companies that Want to Deploy IoT Soluti...
PDF
5 best cloud computing certification courses in the U.S.
PDF
The 9 Best Cloud Computing Events and Conferences to Attend in 2021
PDF
Top 7 security mistakes when migrating to cloud-based apps
PDF
5 programming languages cloud engineers should learn
PDF
10 Fastest-growing cybersecurity skills to learn in 2021
PDF
Cybersecurity Is Not (Just) a Tech Problem
PDF
9 Tips to Prepare for the Future of Cloud & Network Security
PDF
Hybrid cloud strategy: 5 expert tips
PDF
14 Pro Tips For Efficiently Tracking Tech Bugs And Issues
PDF
The way a team functions and communicates
Cloud Technology Is the Underdog Of The Tech World
Small Business Owners: Eight Impactful Reasons To Leverage Cloud Technology
How to Improve Your Cloud and Container Security
Controlling cloud infrastructure costs: Tips & tricks
Does AI-driven cloud computing need ethics guidelines?
How the hybrid cloud is key to enterprise AI infrastructure strategies
Cloud Computing in Defence: Defence Trends
Remote Work Trends: How Cloud Computing Security Changed
Overcoming Digital Transformation Challenges With The Cloud
Why is Cloud Computing Important for Companies that Want to Deploy IoT Soluti...
5 best cloud computing certification courses in the U.S.
The 9 Best Cloud Computing Events and Conferences to Attend in 2021
Top 7 security mistakes when migrating to cloud-based apps
5 programming languages cloud engineers should learn
10 Fastest-growing cybersecurity skills to learn in 2021
Cybersecurity Is Not (Just) a Tech Problem
9 Tips to Prepare for the Future of Cloud & Network Security
Hybrid cloud strategy: 5 expert tips
14 Pro Tips For Efficiently Tracking Tech Bugs And Issues
The way a team functions and communicates

Recently uploaded (20)

PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
Network Security Unit 5.pdf for BCA BBA.
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PDF
Encapsulation theory and applications.pdf
PDF
Shreyas Phanse Resume: Experienced Backend Engineer | Java • Spring Boot • Ka...
PDF
Agricultural_Statistics_at_a_Glance_2022_0.pdf
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PPTX
Understanding_Digital_Forensics_Presentation.pptx
PPTX
Cloud computing and distributed systems.
PDF
Spectral efficient network and resource selection model in 5G networks
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PDF
Approach and Philosophy of On baking technology
PDF
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
DOCX
The AUB Centre for AI in Media Proposal.docx
PPTX
MYSQL Presentation for SQL database connectivity
PDF
KodekX | Application Modernization Development
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
Diabetes mellitus diagnosis method based random forest with bat algorithm
Network Security Unit 5.pdf for BCA BBA.
“AI and Expert System Decision Support & Business Intelligence Systems”
Encapsulation theory and applications.pdf
Shreyas Phanse Resume: Experienced Backend Engineer | Java • Spring Boot • Ka...
Agricultural_Statistics_at_a_Glance_2022_0.pdf
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Understanding_Digital_Forensics_Presentation.pptx
Cloud computing and distributed systems.
Spectral efficient network and resource selection model in 5G networks
20250228 LYD VKU AI Blended-Learning.pptx
Approach and Philosophy of On baking technology
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
Chapter 3 Spatial Domain Image Processing.pdf
Advanced methodologies resolving dimensionality complications for autism neur...
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
The AUB Centre for AI in Media Proposal.docx
MYSQL Presentation for SQL database connectivity
KodekX | Application Modernization Development
Build a system with the filesystem maintained by OSTree @ COSCUP 2025

Priorities Shift In IC Design

AI, edge applications are driving design teams to find new ways to achieve the best performance per watt.

JANUARY 16TH, 2020 - BY: ANN STEFFORA MUTSCHLER

The rush to the edge and new applications around AI are causing a shift in design strategies toward the highest performance per watt, rather than the highest performance or lowest power.

This may sound like hair-splitting, but it has set a scramble in motion around how to process more data more quickly without just relying on faster processors and accelerators. Several factors are driving these changes, including the slowdown in Moore’s Law, which limits the number of traditional options, the rollout of AI everywhere, and a surge in data from more sensors, cameras and images with higher resolutions. In addition, more data is being run through convolutional neural networks or deep learning inferencing systems, which bring huge data processing loads.

“As semiconductor scaling slows, but processing demands increase, designers are going to need to start working harder for those performance and efficiency gains,” said Russell Klein, HLS platform director at Mentor, a Siemens Business. “When optimizing any system, you need to focus on the biggest inefficiencies first. For data processing on embedded systems, that will usually be software.”

When Moore’s Law was in its prime, processor designers had so many gates they didn’t know what to do with them all, Klein said. “One answer was to plop down more cores, but programmers were reluctant to adopt multi-core programming paradigms. Another answer was to make the processor go as fast as possible without regard to area. A feature that would add 10% to the speed of a processor was considered a win, even if it doubled the size of that processor. Over time, high-end processors picked up a lot of bloat, but no one really noticed or cared. The processors were being stamped out on increasingly efficient and dense silicon. MIPS was the only metric that mattered, but if you start to care about system-level efficiency, that bloated processor, and especially the software running on it, might warrant some scrutiny.”

Software has a lot of very desirable characteristics, Klein pointed out, but even well-written software is neither fast nor efficient when compared to the same function implemented in hardware. “Moving algorithms from software on the processor into hardware can improve both performance and power consumption, because software alone is not going to deliver the performance needed to meet the demands of inferencing, high-resolution video processing, or 5G.”

The need for speed

At the same time, traffic data speeds are increasing, and there are new demands on high-speed interfaces to access that data. “High-speed interfaces and SerDes are an integral part of the networking chain, and these speed increases are required to support the latest technology demands of artificial intelligence (AI), the Internet of Things (IoT), virtual reality (VR) and many more technologies that have yet to be envisioned,” noted Suresh Andani, senior director of IP cores at Rambus.
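The shift Klein describes, ranking candidate designs by performance per watt rather than by raw performance or raw power alone, can be illustrated with a toy comparison. Every design point and number below is hypothetical, chosen only to show how the two rankings diverge:

```python
# Illustrative only: compares hypothetical design points by performance per
# watt rather than by raw throughput. All names and figures are made up.
designs = {
    "fast_cpu":    {"gops": 200.0, "watts": 40.0},   # fastest, power-hungry
    "multicore":   {"gops": 150.0, "watts": 15.0},
    "accelerator": {"gops": 120.0, "watts": 3.0},    # slowest, most efficient
}

def perf_per_watt(d):
    """Throughput delivered per watt consumed (GOPS/W)."""
    return d["gops"] / d["watts"]

best_raw = max(designs, key=lambda k: designs[k]["gops"])
best_ppw = max(designs, key=lambda k: perf_per_watt(designs[k]))

# The raw-performance winner and the perf/watt winner differ:
print(best_raw, best_ppw)  # fast_cpu accelerator
```

The gap between those two winners is exactly what edge and AI applications, constrained by batteries and thermal budgets, care about.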
Best design practices for high-performance devices include defining and analyzing the solution space through accurate full-system modeling; utilizing system design and concurrent engineering to maximize first-time-right silicon; ensuring tight correlation between models and silicon results; leveraging a system-aware design methodology; and including built-in test features to support bring-up, characterization and debug, he said.

There are many ways to improve performance per watt, and not just in hardware or software. Kunle Olukotun, Cadence Design Systems Professor of electrical engineering and computer science at Stanford University, said that relaxing precision, synchronization and cache coherence can reduce the amount of data that needs to be sent back and forth. That can be reduced even further by domain-specific languages, which do not require translation. “You can have restricted expressiveness for a particular domain,” said Olukotun in a recent presentation. “You also can utilize parallel patterns and put functional data into parallel patterns based on representation. And you can optimize for locality and exploit parallelism.”

He noted that flexible mapping of data is much more efficient. That can take advantage of data parallelism, model parallelism, and dynamic precision as needed. In addition, the dataflow can be made hierarchical using a wider interface between the algorithms and the hardware, allowing for parallel patterns, explicit memory hierarchies, hierarchical control and explicit parameters, all of which are very useful in boosting performance per watt in extremely performance-centric applications.

Flexibility in designs has been one of the tradeoffs in optimizing performance per watt, and many of the new AI chips under development have been struggling to combine optimally tuned hardware and software into designs while still leaving enough room for ongoing changes in algorithms and different compute tasks.
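Olukotun’s first point, that relaxing precision shrinks the data that must be sent back and forth, is easy to see with back-of-envelope arithmetic. The parameter count below is hypothetical, used only to make the ratio concrete:

```python
# Sketch of how relaxing precision reduces data movement. The model size is
# a hypothetical figure, not taken from any specific network.
weights = 25_000_000            # parameters in a hypothetical model
bytes_fp32 = weights * 4        # 32-bit floating point: 4 bytes each
bytes_int8 = weights * 1        # 8-bit integer after quantization: 1 byte each

reduction = bytes_fp32 / bytes_int8   # traffic shrinks by this factor
print(f"fp32 traffic: {bytes_fp32/1e6:.0f} MB, "
      f"int8 traffic: {bytes_int8/1e6:.0f} MB, "
      f"{reduction:.0f}x less data moved")
```

A 4x cut in bytes moved translates directly into less memory bandwidth and less energy spent on data transfer, which is usually where inference power goes.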
“You may spend 6 to 9 months mapping how to cut up work, and that provides a big impediment to embracing new markets quickly,” said Stuart Biles, a fellow and director of research architecture at Arm Research. “For large OSes, there is a set of functionality in the system where a particular domain is likely to execute on a general-purpose core. But you can add in flexibility for how you partition that and make the loop quicker. That basically comes down to how well you use an SoC’s resources.”
Biles noted that once a common subset is identified, then certain functions can be specialized with an eFPGA or using 3D integration. “We’ve moved from the initial 3D integration to the microarchitecture, where you can cut out cycles and branch prediction. What you’re looking at is the time it takes to get from load/store to processor versus doing that vertically, and you can change the microarchitectural assumptions based on specific assumptions in 3D. That results in different delays.”

A different take on the same problem is to limit the amount of data that needs to be processed in the first place. This is particularly important in edge systems such as cars, where performance per watt is critical due to limited battery power and the need for real-time results. One way to change that equation is to sharply limit the amount of data being sent to centralized processing systems in the vehicle by pre-screening it at the sensor level. So while not actually speeding up the processing per watt, it achieves faster results using less power.

“You can provide a reasonable amount of compute power at the sensor, and you can reduce the amount of data that the sensor identifies through pre-selection,” said Benjamin Prautsch, group manager for advanced mixed-signal automation at Fraunhofer IIS’ Engineering of Adaptive Systems Division. “So if you’re looking at what is happening in a room, the first layer can identify if there are people in there. The same can be used on a manufacturing line.
You also can run DNN calculations in a parallel way to be more efficient.”

Further, AI chips, like many high-performance devices, have a tendency to develop hotspots, noted Richard McPartland, technical marketing manager at Moortec. “AI chips are designed to tackle immense processing tasks for training and inference,” he said. “They are typically very large in silicon area, with hundreds or even thousands of cores on advanced finFET processes consuming high current, 100 amperes or more at supply voltages below 1 volt. With AI chip power consumptions at a minimum in the tens of watts, but often well over 100 watts, it should be no surprise that best design practices include in-chip temperature monitoring. And it’s not just one sensor, but typically tens of temperature sensors distributed throughout the clusters of processors and other blocks. In-chip monitoring should be considered early in the design flow and included up front in floor planning, not added as an afterthought. At a minimum, temperature monitoring can provide protection from thermal runaway. But accurate temperature monitoring also supports maximizing data throughput by minimizing throttling of the compute elements.”

In-chip voltage monitoring with multiple sense points is also recommended for high-performance devices such as AI chips, he continued. “Again, this should be included early in the design flow to monitor the supply voltages at critical circuits, such as the processor clusters, as well as supply drops between the supply pins and the circuit blocks. Voltage droops occur when the AI chips start operating under load, and being software-driven, this can be difficult to predict in the chip design phase with the software written later by another team.
Including voltage sense points gives visibility into what is going on with the internal chip supplies, and is invaluable in the chip bring-up phase, as well as for reducing power consumption through minimizing guard bands.”

Process detectors are also a must-have on high-performance devices such as AI chips, McPartland said. “These enable a quick and independent verification of process performance and variation, not just die-to-die but across large individual die on advanced nodes. Further, they can be used for power optimization, such as to reduce power consumption through voltage scaling schemes where the voltage guard bands are minimized on a per-die basis based on process speed.” Lower power equates to higher processing performance in the AI world, where processing power is often constrained by thermal and power issues.

AI algorithm performance challenges

An important consideration for AI and other high-performance devices is the fact that actual performance is not known until the end application is run. This raises questions for many AI processor startups that insist they can build a better hardware accelerator for matrix math and other AI algorithms than the next guy.

“That’s their key differentiation,” said Ron Lowman, strategic marketing manager for IoT at Synopsys. “Some of those companies may be in their second or third designs, whereas the bigger players are in their third or fourth designs, and they’re learning something every time. The math is changing on them just as rapidly as they can get a chip out, which is helping the situation, but it’s a game for who can get the highest performance in the data center. That’s now moving down to edge computing. Those AI accelerators are being built on local and on-premise servers now, and they want to find their niche in performance per watt and for specific applications. But in that space, they still have to accommodate many different types of AI functions, be it for voice or audio or database extraction or vision. That’s a lot of different things. Then there’s the guys building the applications, like for ADAS.
That’s a very specific use case, and they can be more specific to what they’re building, so they know exactly the model they may want, although that too changes pretty rapidly.”

If the design team has a better handle on the end application and the intended use cases, it can look at each specific space, whether it’s for mobile or edge computing, or for automotive. “You can see that the TOPS, just the pure performance, has grown orders of magnitude over the last couple of years,” Lowman said. “The initial mobile devices that were going to handle AI had under a TOPS (tera operations per second). Now you’re seeing up to 16 TOPS in those mobile devices. That’s how they start, by saying, ‘This is the general direction, because we have to handle many different types of AI functions in the mobile phone.’ You look at ADAS, and those guys were even ahead of the mobile phones. Now you’re seeing up to 35 TOPS for a single instantiation for ADAS, and that continues to grow. In edge computing, they’re basically scaling down the data center devices to be more power-efficient, and those applications can range between 50 and hundreds of TOPS. That’s where you start.”

However, a first-generation AI architecture often is very inefficient for what its designers want to accomplish, because they’re trying to do too much. If the actual application could be run, the architecture could be tuned significantly, because it’s not just a processor or the ability to just do the MAC. It’s a function of accessing the coefficients from memory, then processing them very effectively. It’s also not just adding a bunch of on-chip SRAM that solves the problem.
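The point that it is not just the MAC throughput but the fetching of coefficients from memory that limits performance can be made concrete with a rough bound check. All figures below are assumptions for illustration, not specifications of any real device:

```python
# Back-of-envelope check of whether a hypothetical accelerator layer is
# limited by MAC throughput or by fetching coefficients from DRAM.
# Every figure here is an assumption chosen for illustration.
macs_per_s   = 16e12      # 16 TOPS of multiply-accumulate capability
dram_bytes_s = 25.6e9     # assumed ~25.6 GB/s of DRAM bandwidth

layer_macs   = 2.0e9      # MACs required by one layer of a made-up network
layer_coeffs = 1.0e9      # coefficient bytes fetched for that layer (int8)

t_compute = layer_macs / macs_per_s       # time if only MACs mattered
t_memory  = layer_coeffs / dram_bytes_s   # time if only fetches mattered

bound = "memory" if t_memory > t_compute else "compute"
print(f"compute: {t_compute*1e3:.3f} ms, memory: {t_memory*1e3:.3f} ms "
      f"-> {bound}-bound")
```

Under these assumed numbers the MACs sit idle waiting on DRAM, which is why adding more on-chip multiply/accumulate units, or even more SRAM, does not by itself solve the problem.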
By modeling the IP, such as DDR instantiations, different bitwidths with different access capabilities, different types of DRAM configurations, or LPDDR versus DDR, optimal approaches can be found before system development is complete, using prototyping tools and systems-exploration tools. “If the development team has the real algorithm, it’s much more effective,” Lowman said. “A lot of people use ResNet-50 as a benchmark because that’s better than TOPS. But people are well beyond that. You see voice applications for natural language understanding. ResNet-50 has maybe a few million coefficients, but some of these are in the billions of coefficients now, so it’s not even representative. And the more representative you can get of the application, the more accurately you can define your SoC architecture to handle those types of things.”
With so many moving pieces, the more modeling you can do up front with the actual IP, the better off you are. “This is where some traction is happening, seen in many aspects. The memory pieces that are so important, the processing pieces that are so important. Just even the interfaces for the sensor inputs, like MIPI, or audio interfaces. All that architecture can be optimized based on the algorithm, and it’s no different than it always has been. If you run the actual software, you can go ahead and optimize much more effectively. But there’s a constant need to grow the performance per watt. If the estimates are to be believed, with some saying that 20% to 50% of all electricity will be consumed by AI, that’s a huge problem. That is spurring the trend to move to more localized computing, and trying to compress these things into the application itself. All of those require different types of architectures to handle the different functions and features that you’re trying to accomplish,” Lowman said.

Power does play a role here because of the memory capacity needed, the changing number of coefficients, and the number of math blocks. “You can throw on tons of multiply/accumulates, put them all on chip, but you also have to have all the other things that are done afterward,” he said. “That includes the input of the data and conditioning of that input data. For instance, for audio, you need to make sure there are no bottlenecks. How much cache is needed for each of these data movements? There are all kinds of different architectural tradeoffs, so the more modeling you can do up front, the better your system will be if you know the application. If you create a generic one, and then run the one that you actually run in the system, you may not get the accuracy that you thought you had.
There’s a lot of work being done to improve that over time, and make corrections for that to get the accuracy and power footprint that they need. You can start with some general features, but every generation I’ve seen is moving very quickly on more performance, less power, more optimized math, more optimized architectures, and the ability to do not just a standard SRAM but a multi-port SRAM. This means you’re doing two accesses at once, so you may have as many multiply/accumulates as you want. But if you can go ahead and do several reads and writes in a single cycle, that saves on power. You can optimize what that looks like when you’re accessing, and the number of multiply/accumulates you need to do for that particular stage in the pipeline.”

Conclusion

With so much activity in the high-performance and AI space, it’s an exciting time for the semiconductor ecosystem around these applications. There is a tremendous amount of startup activity, with the thinking evolving from a generic mindset of, “We can do the math for neural networks,” to one in which everybody can do the math for specific neural networks in different fields, Lowman said. “You can do it for voice, you can do it for vision, you can do it for data mining, and there are specific types of vision, voice or sound where you can optimize for certain things.”

This only makes the AI market opportunity more exciting as the technology branches out into many different fields that are extensions of current ones or new areas altogether, and the development technologies and tool ecosystem discover new ways to make it all a reality.

—Ed Sperling contributed to this report.
Ann Steffora Mutschler is executive editor at Semiconductor Engineering.