Accelerating EDA workloads on Azure
Best practices and benchmarks on Intel Emerald Rapids (EMR) CPUs
Meng-Ru Tsai
Principal Technical Program Manager, Microsoft
Jennifer Zickel
Director, Xeon Product Line Management, Intel
Abstract/Agenda
• This session introduces best practices for running EDA on Azure, including the recommended architecture.
• We present benchmark results for EDA tools on Azure, including Synopsys VCS and Cadence Spectre-X, highlighting the capabilities of the latest Azure VMs with the new 5th Gen Intel® Xeon® Platinum 8537C (Emerald Rapids) processor.
Context
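| Category | Task | Scope | Iterations in full design cycle (e.g., 9 mo) | Parallel jobs (distributed) | Peak mem across all jobs (GB) | Avg mem per job (GB) | Cores per job (multithreading) | Data I/O per iteration (GB) | Avg runtime per job (hrs) |
|---|---|---|---|---|---|---|---|---|---|
| AMS/IP Design | Circuit Layout | Full Chip | 50 | 1 | 10 | 10 | 8 | 10 | 8 |
| AMS/IP Design | Circuit Simulation - Cells | Block | 50 | 1,000 | 1 | 0.1 | 1 | 100 | 24 |
| AMS/IP Design | Circuit Simulation - MEM/IP | Block | 50 | 100 | 60 | 16 | 1 | 100 | 24 |
| Chip Design (Front End) | High Level Synth (HLS) | Block | 10 | 20 | 50 | 50 | 8 | 10 | 12 |
| Chip Design (Front End) | Functional Simulation (RTL) | Block | 810 | 1,000 | 8 | 4 | 1 | 3 | 0.20 |
| Chip Design (Front End) | Functional Simulation (RTL) | Full Chip | 270 | 500 | 64 | 16 | 1 | 10 | 0.75 |
| Chip Design (Front End) | Functional Simulation (Gate Level) | Block | 20 | 2 | 384 | 128 | 1 | 10 | 12 |
| Chip Design (Front End) | Functional Simulation (Gate Level) | Full Chip | 5 | 1 | 1,500 | 1,500 | 1 | 100 | 72 |
| Chip Design (Front End) | RTL Synthesis | Block | 90 | 50 | 64 | 32 | 8 | 50 | 8 |
| Chip Design (Front End) | RTL Synthesis | Full Chip | 20 | 4 | 768 | 768 | 16 | 100 | 24 |
| Chip Design (Front End) | CDC (Clock Domain Crossing) | Block | 10 | 8 | 30 | 30 | 16 | 50 | 4 |
| Chip Design (Front End) | Formal Verification | Block | 90 | 40 | 50 | 50 | 16 | 50 | 8 |
| Chip Design (Front End) | DFT (Scan/BIST/ATPG) | Block | 30 | 4 | 384 | 384 | 16 | 50 | 4 |
| Chip Design (Front End) | RTL Power Analysis | Block | 90 | 4 | 64 | 64 | 16 | 50 | 4 |
| Chip Design (Back End) | APR (P&R) | Block | 30 | 50 | 384 | 128 | 16 | 200 | 72 |
| Chip Design (Back End) | APR (P&R) | Full Chip | 20 | 4 | 768 | 768 | 16 | 500 | 72 |
| Chip Design (Back End) | Signoff Timing | Block | 90 | 250 | 128 | 80 | 16 | 100 | 6 |
| Chip Design (Back End) | Signoff Timing | Full Chip | 60 | 50 | 800 | 800 | 16 | 700 | 12 |
| Chip Design (Back End) | Extraction | Block | 90 | 30 | 100 | 50 | 32 | 200 | 6 |
| Chip Design (Back End) | Extraction | Full Chip | 30 | 256 | 300 | 300 | 32 | 1,000 | 6 |
| Chip Design (Back End) | Signoff DRC/LVS | Block | 90 | 16 | 384 | 200 | 200 | 200 | 8 |
| Chip Design (Back End) | Signoff DRC/LVS | Full Chip | 20 | 10 | 2,000 | 2,000 | 244 | 1,000 | 12 |
| Chip Design (Back End) | IR Drop | Full Chip | 30 | 700 | 128 | 128 | 64 | 200 | 12 |
| Chip Design (Back End) | ECO (e.g., Tweaker) | Full Chip | 10 | 10 | 500 | 500 | 16 | 200 | 12 |

Examples of silicon design workloads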
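To put the table in compute terms, a rough rule is: core-hours per iteration ≈ parallel jobs × cores per job × average runtime per job. The Python sketch below applies that rule to a few rows. It assumes every job holds all of its cores for the full average runtime and reads "number of parallel jobs" as the job count per iteration; both are simplifications not stated on the slide, and the helper names are illustrative.

```python
# Selected rows from the workload table:
# (task, scope, iterations, parallel_jobs, cores_per_job, avg_runtime_hrs)
WORKLOADS = [
    ("Functional Simulation (RTL)", "Block", 810, 1000, 1, 0.20),
    ("Functional Simulation (RTL)", "Full Chip", 270, 500, 1, 0.75),
    ("APR (P&R)", "Block", 30, 50, 16, 72),
    ("Signoff DRC/LVS", "Full Chip", 20, 10, 244, 12),
]


def core_hours_per_iteration(parallel_jobs, cores_per_job, runtime_hrs):
    """Core-hours for one iteration, assuming every job holds all of its
    cores for the full average runtime."""
    return parallel_jobs * cores_per_job * runtime_hrs


def core_hours_per_cycle(iterations, parallel_jobs, cores_per_job, runtime_hrs):
    """Core-hours over the full design cycle (iterations x per-iteration cost)."""
    return iterations * core_hours_per_iteration(parallel_jobs, cores_per_job, runtime_hrs)


for task, scope, iters, jobs, cores, hrs in WORKLOADS:
    per_iter = core_hours_per_iteration(jobs, cores, hrs)
    per_cycle = core_hours_per_cycle(iters, jobs, cores, hrs)
    print(f"{task} ({scope}): {per_iter:,.0f} core-h/iteration, {per_cycle:,.0f} core-h/cycle")
```

For APR block runs, for example, this gives 50 × 16 × 72 = 57,600 core-hours per iteration, or about 1.7 million over 30 iterations.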
Chip Design productivity
The development cycle is dominated by alternating phases of EDA tool simulation time and designer debug time.
[Figure: development time split between EDA simulation and designer productivity]
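One way to see why faster simulation matters is a simple model in which each iteration is one simulation phase followed by one debug phase. The sketch below illustrates that assumption only; the iteration count and phase durations in the example lines are hypothetical, not measured.

```python
def cycle_time_hours(iterations, sim_hours, debug_hours, sim_speedup=1.0):
    """Total development time when every iteration runs one simulation phase
    and then one designer debug phase. Only the simulation phase benefits
    from faster hardware, so debug time bounds the achievable gain."""
    if sim_speedup <= 0:
        raise ValueError("sim_speedup must be positive")
    return iterations * (sim_hours / sim_speedup + debug_hours)


# Hypothetical example: 50 iterations, 24 h of simulation and 8 h of debug each.
baseline = cycle_time_hours(50, 24, 8)                  # 1600.0 hours
faster = cycle_time_hours(50, 24, 8, sim_speedup=2.0)   # 1000.0 hours
```

In this model, doubling simulation speed cuts the cycle by 37.5%, not 50%, because debug time is unchanged.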
EDA Tools/ISV landscape
| Stage | EDA flow | Synopsys | Mentor | Cadence | Empyrean | Ansys |
|---|---|---|---|---|---|---|
| IP | Circuit Layout | Custom Compiler | Tanner | Virtuoso | Aether | x |
| IP | Circuit Simulation - Cells | HSPICE | Eldo/AFS | Spectre | Qualib | x |
| IP | Circuit Simulation - MEM/IP | HSPICE | Eldo/AFS | Spectre | ALPS | x |
| Front-End | High Level Synth (HLS) | x | Catapult | Stratus | x | x |
| Front-End | Functional Simulation (RTL) | VCS | Questa | Xcelium / NCSim | x | x |
| Front-End | Functional Simulation (Gates) | VCS | Questa | Xcelium / NCSim | x | x |
| Front-End | RTL Synthesis | Design Compiler | Oasys-RTL | Genus | x | x |
| Front-End | CDC (Clock Domain Crossing) | SpyGlass CDC | Questa CDC | Conformal CDC | x | x |
| Front-End | Formal Verification | VC Formal / Formality | FormalPro | JasperGold / Conformal | x | x |
| Front-End | DFT (Scan/BIST/ATPG) | DFTMAX/TetraMAX | Tessent | Modus | x | x |
| Front-End | RTL Power Analysis | PrimePower | PowerPro | Joules | x | PowerArtist |
| Back-End | APR (P&R) | ICC II | Nitro | Innovus | Argus | x |
| Back-End | Signoff Timing | PrimeTime | Optimus | Tempus | x | x |
| Back-End | Signoff Extraction | StarRC | Xact-RC | Quantus/QRC | RCExplorer Extraction | x |
| Back-End | Signoff DRC/LVS | ICV | Calibre | Pegasus/Assura | Argus | x |
| Back-End | Signoff EM/IR Drop/Power | PrimePower | BlueWave | Voltus | x | RedHawk / RedHawk-SC |
| Back-End | Programmable ERC | ICV | Calibre PERC | Pegasus | x | x |
| Back-End | ECO (Tweaker) | PrimeTime ECO | Optimus | Tempus ECO | x | x |
| Post-Tapeout | Computational Lithography (OPC, RET) | Proteus | Calibre | x | x | x |
Intel, AMD, Qualcomm, MediaTek (5 nm & 7 nm), TSMC (DTP), etc.
Why Cloud?
Source: TSMC eNewsletter
• Accelerate design and characterization.
• Eliminate the need to buy in-house CPUs that would sit idle during off-peak times.
• Improve quality through higher simulation coverage.
• Enable designers around the world to collaborate.
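The idle-CPU point can be made concrete: an on-premises cluster must be sized for peak demand and is paid for every hour, while an elastic cloud pool is, ideally, paid only for the core-hours actually used. The sketch below assumes perfect elasticity and uses a hypothetical hourly demand profile.

```python
def provisioned_vs_used(core_demand_by_hour):
    """Return (owned_core_hours, used_core_hours, utilization).

    owned: capacity sized for the peak hour, held for every hour.
    used:  core-hours actually consumed (what a perfectly elastic pool bills).
    """
    if not core_demand_by_hour:
        raise ValueError("demand profile must not be empty")
    peak = max(core_demand_by_hour)
    owned = peak * len(core_demand_by_hour)
    used = sum(core_demand_by_hour)
    return owned, used, used / owned if owned else 0.0


# Hypothetical demand profile with one spike to 8,000 cores.
owned, used, util = provisioned_vs_used([1000, 1000, 8000, 2000])
# owned = 32000, used = 12000, util = 0.375
```

Here peak-sized capacity sits at 37.5% average utilization; the remaining 62.5% is the idle capacity the slide refers to.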
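A simple pipe-cleaning setup
• On-premises environment connected to Azure over VPN/ExpressRoute (VPN/ER)
• License server and scheduler (shown both on-premises and in Azure)
• Managed NFS service: Azure NetApp Files, holding EDA data from on-prem with read/write access
• Compute: VM scale set (VMSS) with local /tmp and accelerated networking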
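The local /tmp item reflects a common pattern: stage job inputs from shared NFS to node-local disk, run the job there, and copy results back, so heavy scratch I/O does not hit the shared filesystem. The Python sketch below is one hypothetical way to wrap a job like that. `run_with_local_scratch` and its arguments are illustrative, it assumes `scratch_root` is on node-local storage, and it copies the whole work directory back, whereas a real flow would copy only outputs.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_with_local_scratch(nfs_input_dir, nfs_output_dir, command, scratch_root="/tmp"):
    """Copy inputs from shared NFS to node-local scratch, run `command` there,
    then copy the work directory back to NFS. Returns the job's exit code."""
    nfs_input_dir = Path(nfs_input_dir)
    nfs_output_dir = Path(nfs_output_dir)
    with tempfile.TemporaryDirectory(dir=scratch_root) as scratch:
        work = Path(scratch) / "work"
        shutil.copytree(nfs_input_dir, work)          # one bulk read from NFS
        result = subprocess.run(command, cwd=work)    # job I/O stays on local disk
        nfs_output_dir.mkdir(parents=True, exist_ok=True)
        shutil.copytree(work, nfs_output_dir, dirs_exist_ok=True)  # one bulk write back
        return result.returncode
```

The temporary directory is removed when the `with` block exits, so the node's scratch space is reclaimed even if the job fails.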
A 200-job cluster with CycleCloud for orchestration
• On-premises environment connected to Azure over VPN/ExpressRoute (VPN/ER)
• License server and scheduler (shown both on-premises and in Azure)
• Managed NFS service: Azure NetApp Files, holding EDA data from on-prem with read/write access
• Compute: VM scale set (VMSS) with local /tmp and accelerated networking
• Log Analytics for monitoring
• CycleCloud: dynamic scale-up and scale-down, parallel VM provisioning
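CycleCloud's dynamic scaling is driven by scheduler demand. The sketch below is not CycleCloud's algorithm; it is a minimal illustration, assuming a fixed number of job slots per node, of how a target node count can be derived from queue depth and clamped to the cluster's limits. `target_node_count` is a hypothetical helper.

```python
import math


def target_node_count(pending_jobs, running_jobs, slots_per_node, max_nodes, min_nodes=0):
    """Nodes needed so every pending or running job has a slot,
    clamped to [min_nodes, max_nodes]."""
    if slots_per_node <= 0:
        raise ValueError("slots_per_node must be positive")
    needed = math.ceil((pending_jobs + running_jobs) / slots_per_node)
    return max(min_nodes, min(max_nodes, needed))


# Hypothetical: 200 single-slot jobs on nodes with 64 slots each -> 4 nodes.
print(target_node_count(pending_jobs=200, running_jobs=0, slots_per_node=64, max_nodes=10))
```

When the queue drains, the same calculation returns `min_nodes`, which is what allows idle nodes to be released.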
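A full-production cluster with 50,000+ cores
• On-premises environment connected to Azure over VPN/ExpressRoute (VPN/ER), with EDA data from on-prem
• License server and scheduler (shown both on-premises and in Azure)
• Compute: VM scale set (VMSS) with local /tmp and accelerated networking
• Log Analytics for monitoring
• CycleCloud: dynamic scale-up and scale-down, parallel VM provisioning
• Managed NFS services: separate Azure NetApp Files (ANF) volumes for tools (read), /scratch (read/write), and output (write)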
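Splitting ANF into separate volumes for tools, scratch, and output lets each volume be mounted with the access mode it needs. The sketch below generates Linux `/etc/fstab` entries for such a layout. The IP addresses, export paths, and the NFSv3 mount options are assumptions for illustration; check current Azure NetApp Files mount-option guidance before using them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NfsVolume:
    server: str        # ANF mount-target IP (hypothetical)
    export: str        # volume export path (hypothetical)
    mount_point: str   # where the volume appears on compute nodes
    read_only: bool


# Assumed NFSv3 options; confirm against current ANF guidance.
COMMON_OPTS = "hard,vers=3,tcp,rsize=262144,wsize=262144"

VOLUMES = [
    NfsVolume("10.0.1.4", "/tools", "/tools", read_only=True),       # EDA tool installs
    NfsVolume("10.0.1.5", "/scratch", "/scratch", read_only=False),  # shared read/write data
    NfsVolume("10.0.1.6", "/output", "/output", read_only=False),    # job results
]


def fstab_line(vol: NfsVolume) -> str:
    """Render one /etc/fstab entry for an NFS volume."""
    mode = "ro" if vol.read_only else "rw"
    return f"{vol.server}:{vol.export} {vol.mount_point} nfs {mode},{COMMON_OPTS} 0 0"


for v in VOLUMES:
    print(fstab_line(v))
```

Mounting the tools volume read-only protects shared installations from accidental writes by jobs, and keeping output separate lets that volume be sized for its own throughput.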
Testing environment
• Azure NetApp Files (ANF) provides the NFS storage, using a 4 TiB volume at the Premium service level.
• To minimize network latency, the compute VMs, the license server VM, and the storage are all located in the same proximity placement group.
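With ANF auto QoS, a volume's throughput limit scales with its provisioned size and service level. The sketch below assumes the per-TiB figures Microsoft documents for the Standard, Premium, and Ultra levels (16, 64, and 128 MiB/s per TiB); verify them against current documentation. Under those figures, the 4 TiB Premium test volume is capped at about 256 MiB/s. The helper names are illustrative.

```python
import math

# Assumed ANF auto-QoS throughput per provisioned TiB, by service level (MiB/s).
MIBPS_PER_TIB = {"Standard": 16, "Premium": 64, "Ultra": 128}


def volume_throughput_mibps(size_tib, service_level):
    """Throughput limit of an auto-QoS volume of the given size."""
    return size_tib * MIBPS_PER_TIB[service_level]


def min_size_tib(target_mibps, service_level):
    """Smallest whole-TiB volume size whose limit meets the target throughput."""
    return math.ceil(target_mibps / MIBPS_PER_TIB[service_level])


print(volume_throughput_mibps(4, "Premium"))  # 256
print(min_size_tib(1024, "Ultra"))            # 8
```

Because the limit is tied to capacity, a volume is sometimes provisioned larger than its data footprint purely to reach the throughput a job mix needs.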
©Microsoft Corporation
Azure
Shared under NDA
[Intel] Dlv6, Dv6, and Ev6 VMs based on Intel Emerald Rapids CPUs
VM size changes:
o Mem:vCPU ratios of 2:1 (Dlv6), 4:1 (Dv6), and 8:1 (Ev6)
o Dv6 sizes from 2 to 128 vCPUs, up to 512 GiB RAM (D192 size under evaluation)
o Ev6 sizes from 2 to 192 vCPUs, up to 1,832 GiB RAM
Expected improvement vs. the previous v5 VMs (varies by size):
o >15-20% higher CPU performance on average, measured by SPECint; >3x L3 cache
o Max remote storage IOPS up from 80K to 260K with Premium SSD v1 and 400K with Premium SSD v2
o Max remote storage throughput up from 2.6 GB/s to 6.8 GB/s (D128) or 12 GB/s (E192i)
o 4x faster local NVMe SSD read IOPS; +50% local SSD capacity
o Up to 200 Gbps network bandwidth
Public preview plan (subject to change)
o Preview from July 2024 in US East and US West regions
o Join the preview by filling out this survey
o VM specifications
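The three mem:vCPU ratios make family selection mostly arithmetic: divide a job's memory by its vCPUs and pick the leanest family whose ratio covers it. The sketch below applies that to rows from the workload table. It assumes one vCPU per requested core and treats GB as GiB, both simplifications, and returns `None` when a job's memory per vCPU exceeds every ratio; such jobs are better sized by total memory. `leanest_family` is a hypothetical helper.

```python
# Memory (GiB) per vCPU for each v6 family, as listed on the slide.
FAMILY_RATIO = {"Dlv6": 2, "Dv6": 4, "Ev6": 8}


def leanest_family(mem_gib_per_job, vcpus_per_job):
    """Return the family with the lowest mem:vCPU ratio that still covers
    the job's memory per vCPU, or None if no ratio is high enough."""
    need = mem_gib_per_job / vcpus_per_job
    for family, ratio in sorted(FAMILY_RATIO.items(), key=lambda kv: kv[1]):
        if ratio >= need:
            return family
    return None


# Average memory and cores per job taken from the workload table.
examples = {
    "Circuit Simulation - Cells": (0.1, 1),                      # Dlv6
    "Functional Simulation (RTL), Block": (4, 1),                # Dv6
    "Signoff Timing, Block": (80, 16),                           # 5 GiB/vCPU -> Ev6
    "Functional Simulation (Gate Level), Full Chip": (1500, 1),  # None
}
for name, (mem, vcpus) in examples.items():
    print(f"{name}: {leanest_family(mem, vcpus)}")
```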
Preview: Azure FXv2-series VMs
• Preview: compute-optimized FXmdsv2 and FXmsv2
• Processor: 5th Generation Intel® Xeon® Platinum (Emerald Rapids) processor in a hyperthreaded configuration
• Workloads: ideal for large databases, data analytics, SQL, and EDA workloads
• Regions: West US 3 and Southeast Asia (expansion planned beyond 2024)
Learn more and get started: aka.ms/FXv2-series-Preview-Blog
Limited-time offer on Linux VMs: aka.ms/LinuxPromoOffer
Compared to the previous-generation FXv1-based VMs, FXv2 offers up to:
• 96 vCPUs
• 1,832 GiB of memory, with memory-to-vCPU ratios up to 21:1
• 50% higher CPU performance
• 100% higher local storage read IOPS
• 100% higher IOPS and 400% higher remote storage throughput with Premium SSD v1 remote storage
• 400K IOPS and 11 GBps throughput with Premium SSD v2/Ultra Disk
Synopsys VCS
Jennifer Zickel
Director, Xeon Product Line Management, Intel
Azure NetApp Files
Intel RTL Design: 1 to 32 simulations
FX64v2 (Emerald Rapids) vs. D64dsv5 (Ice Lake) Azure instances
Intel RTL design:
• VCS RTL simulation test design
• Complex RTL design (>10M gates)
• SVTB (SystemVerilog testbench) simulation test for 100K cycles
• Resident memory footprint per simulation instance: 7 GB
VCS is a Synopsys functional verification solution in the EDA space.
[Chart: completion time in seconds (lower is better) for 1, 2, 4, 8, 16, 24, and 32 simultaneous simulations; FX64v2 (Emerald Rapids, all-core turbo frequency up to 4.0 GHz) vs. D64dsv5 (Ice Lake, all-core turbo frequency up to 3.5 GHz)]
The Emerald Rapids instance is 17% to 43% faster than the Ice Lake instance across the range of simultaneous simulations shown in the chart above. Contributing factors for Emerald Rapids include a newer-generation architecture with higher IPC, a higher all-core turbo frequency, higher bandwidth, larger L2/L3 caches, faster UPI NUMA links, and PCIe 5.0 support (vs. PCIe 4.0).
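The VCS test measures how completion time grows as more copies run at once on one VM. A sweep like that can be scripted as below; this is a hypothetical harness, not the one used for these results. `./simv` is the default name of a compiled VCS simulation executable, but the command is a placeholder, and in practice each copy would need its own working directory so logs and outputs do not collide.

```python
import subprocess
import time


def run_concurrent(command, copies):
    """Start `copies` identical processes together and return the wall-clock
    seconds until the last one exits. Raises if any copy fails."""
    start = time.perf_counter()
    procs = [subprocess.Popen(command) for _ in range(copies)]
    codes = [p.wait() for p in procs]
    elapsed = time.perf_counter() - start
    failed = sum(1 for c in codes if c != 0)
    if failed:
        raise RuntimeError(f"{failed} of {copies} run(s) failed")
    return elapsed


# Hypothetical sweep (placeholder command):
#   for n in (1, 2, 4, 8, 16, 24, 32):
#       print(n, round(run_concurrent(["./simv"], n), 1))
```

Measuring until the last copy exits matches the "completion time" view in the chart: it is the time to finish the whole batch, not the average per copy.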
Intel RTL Design
FX64v2 (Emerald Rapids) vs. D64dsv5 (Ice Lake) Azure instances
Seconds to completion by number of simultaneous simulations (lower is better). The first table lists raw completion times; the second lists the speedup of the Emerald Rapids instance over the Ice Lake instance, expressed as the ratio of D64dsv5 time to FX64v2 time.

| Simultaneous simulations | 1 | 2 | 4 | 8 | 16 | 24 | 32 |
|---|---|---|---|---|---|---|---|
| FX64v2 (Emerald Rapids), s | 879 | 876 | 910 | 989 | 1088 | 1140 | 1196 |
| D64dsv5 (Ice Lake), s | 1196 | 1249 | 1262 | 1295 | 1316 | 1348 | 1403 |

| Simultaneous simulations | 1 | 2 | 4 | 8 | 16 | 24 | 32 |
|---|---|---|---|---|---|---|---|
| Speedup (D64dsv5 / FX64v2) | 1.36 | 1.43 | 1.39 | 1.31 | 1.21 | 1.18 | 1.17 |
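The speedup row can be reproduced from the raw times, and the same data also yields throughput (simulations completed per hour), which is often the more useful metric for a regression farm. The sketch below uses only the numbers in the completion-time table; the helper names are illustrative.

```python
# Completion time in seconds, keyed by number of simultaneous simulations.
FX64V2 = {1: 879, 2: 876, 4: 910, 8: 989, 16: 1088, 24: 1140, 32: 1196}
D64DSV5 = {1: 1196, 2: 1249, 4: 1262, 8: 1295, 16: 1316, 24: 1348, 32: 1403}


def speedup(baseline_s, candidate_s):
    """Baseline time divided by candidate time (>1 means the candidate is faster)."""
    return baseline_s / candidate_s


def sims_per_hour(n_sims, seconds):
    """Throughput when n_sims simultaneous simulations all finish in `seconds`."""
    return n_sims * 3600 / seconds


for n in FX64V2:
    s = speedup(D64DSV5[n], FX64V2[n])
    print(f"{n:>2} sims: speedup {s:.2f} ({(s - 1) * 100:.0f}% faster), "
          f"{sims_per_hour(n, FX64V2[n]):.1f} vs {sims_per_hour(n, D64DSV5[n]):.1f} sims/h")
```

At 32 simultaneous simulations, for example, this gives about 96.3 simulations per hour on FX64v2 versus 82.1 on D64dsv5.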
Notices and Disclaimers
• Performance varies by use, configuration and other factors. Learn more on the Performance Index site.
• Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.
• Your costs and results may vary.
• Intel technologies may require enabled hardware, software or service activation.
• © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
Cadence Spectre-X
Meng-Ru Tsai
Principal Technical Program Manager, Microsoft
Observation
• Test design: a post-layout DSPF design with a circuit inventory of 100K+ elements.
• Average CPU utilization stayed above 95% throughout the run; the workload is highly compute-intensive and CPU-bound.
Simulation time and scalability
• Total elapsed time in seconds (lower is better):

| # of threads | D64dsv5 (Ice Lake, all-core turbo frequency up to 3.5 GHz) | D64dsv6 (Emerald Rapids, all-core turbo frequency up to 3.6 GHz) | FX64v2 (Emerald Rapids, all-core turbo frequency up to 4.0 GHz) |
|---|---|---|---|
| 1 | 7010 | 5740 | 4990 |
| 2 | 3590 | 3070 | 2690 |
| 4 | 1970 | 1740 | 1500 |
| 8 | 1190 | 1050 | 925 |

• Elapsed time relative to Ice Lake (D64dsv5 = 100%):

| # of threads | D64dsv5 | D64dsv6 | FX64v2 |
|---|---|---|---|
| 1 | 100% | 82% | 71% |
| 2 | 100% | 86% | 75% |
| 4 | 100% | 88% | 76% |
| 8 | 100% | 88% | 78% |

[Chart: performance improvement of multithreaded Spectre-X jobs; total elapsed time (s) vs. number of threads (1 to 8) for D64dsv5 (Ice Lake, all-core turbo frequency up to 3.5 GHz), D64dsv6 (Emerald Rapids, all-core turbo frequency up to 3.6 GHz), and FX64v2 (Emerald Rapids, all-core turbo frequency up to 4.0 GHz)]
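The relative-time table can be derived from the elapsed times, and the same numbers give parallel efficiency (1-thread time divided by n × n-thread time), which shows how well each VM scales with threads. The sketch below uses only the values in the elapsed-time table; treating the 1-thread run as the serial baseline is an assumption.

```python
THREADS = [1, 2, 4, 8]
ELAPSED_S = {  # total elapsed time in seconds, from the table
    "D64dsv5": [7010, 3590, 1970, 1190],
    "D64dsv6": [5740, 3070, 1740, 1050],
    "FX64v2": [4990, 2690, 1500, 925],
}


def relative_pct(times, baseline):
    """Elapsed time as a rounded percentage of the baseline VM's time."""
    return [round(100 * t / b) for t, b in zip(times, baseline)]


def parallel_efficiency(times, threads):
    """Speedup over the 1-thread run divided by the thread count."""
    t1 = times[0]
    return [t1 / (t * n) for t, n in zip(times, threads)]


for vm, times in ELAPSED_S.items():
    rel = relative_pct(times, ELAPSED_S["D64dsv5"])
    eff = parallel_efficiency(times, THREADS)
    print(vm, rel, [f"{e:.0%}" for e in eff])
```

At 8 threads, parallel efficiency works out to roughly 74% on D64dsv5 and 67% on FX64v2: the faster cores finish sooner in absolute terms but gain slightly less from each additional thread.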
Cost-effectiveness estimate
• Estimated total time and VM cost for running 500 single-threaded Spectre-X jobs:
o D64ldsv6 has the lowest cost.
o FX64mdsv2 has the shortest total time.
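The slide reports the conclusion but not the model behind it. A minimal way to estimate total time and cost for a batch like the 500 single-threaded Spectre-X jobs is to run them in waves on a fixed pool of VMs and bill every VM for the whole wall-clock time. The sketch below does that; `batch_estimate`, the jobs-per-VM packing, the pool size, and any price passed in are assumptions, and D64ldsv6 per-job times would have to be measured, since the scalability table covers D64dsv6 rather than D64ldsv6.

```python
import math


def batch_estimate(jobs, job_seconds, jobs_per_vm, num_vms, price_per_vm_hour):
    """Return (wall_clock_hours, total_cost) for running `jobs` identical jobs
    in waves on `num_vms` VMs, each running `jobs_per_vm` jobs at a time.
    Every VM is billed for the full wall-clock time."""
    slots = jobs_per_vm * num_vms
    if slots <= 0:
        raise ValueError("need at least one job slot")
    waves = math.ceil(jobs / slots)
    wall_hours = waves * job_seconds / 3600
    return wall_hours, num_vms * wall_hours * price_per_vm_hour


# Hypothetical usage: 500 jobs at the FX64v2 1-thread time (4990 s), 32 jobs
# per VM, 4 VMs. The price is a placeholder, not an Azure list price.
hours, cost = batch_estimate(500, 4990, 32, 4, price_per_vm_hour=1.0)
```

Because cost scales with price × wall-clock time, a cheaper VM can win on cost even when a faster one finishes the batch sooner, which is consistent with the split between D64ldsv6 (lowest cost) and FX64mdsv2 (shortest time).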
Thank you!
