Providentia Worldwide
S. Ryan Quick @phaedo, Providentia Worldwide. April 2020
HPC Impact
EDA Telemetry Neural Networks
Systems Intelligence
Ecosystem Management
Systems Intelligence Principles
Methodology for leveraging multiple data domains through complex data processing
Diagram labels: Disparate / Unlike Domains, Messaging Middleware, Insight
• Aggregation

• Event Statistics

• Atomic Pattern Recognition
• Simple example shown as “waterfalling” for
illustration — the operations are parallel and
stateless

• Pattern is an example of the type and method
of telemetry we use for EDA environmental and
in-workload collection to feed AI and neural
networks inline

• There are literally thousands of metrics for a
single operation, millions per job
Figure: Multiple-Domain Simple Data Access (Metrics Calculator)
Event sources: App Login, CPU, DB Access
Available source fields:
App: login r/sec, successful login r/sec, failed login r/sec
CPU: 1m load avg, 5m load avg, 15m load avg, blocked proc cnt, running proc cnt, waiting proc cnt, user %, idle %, system %, io wait %
DB: active queries, slow queries, selects, updates, deletes, rows fetched, table locks held, row locks held
Checks (drawn as a waterfall):
app failed login / app successful login * 100 > 3?
AVG(cpu waiting / cpu running) / cpu 1m load avg * 100 > 0.5?
DB slow queries > 4?
All three yes: Anomaly Detected: Potential Login Attack
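The login-attack pattern in this figure can be sketched in a few lines of Python. This is an illustration only, not the deck's implementation: the `Sample` type, its field names, and the pairing of each threshold (3, 0.5, 4) with its check are assumptions read off the flattened diagram. The three checks are independent and stateless, so they are evaluated side by side rather than as a waterfall.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    """One telemetry snapshot; field names are hypothetical stand-ins for the slide's fields."""
    failed_logins_per_sec: float
    successful_logins_per_sec: float
    cpu_waiting: float
    cpu_running: float
    cpu_load_1m: float
    db_slow_queries: int


def failed_login_pct(s: Sample) -> float:
    # app failed login / app successful login * 100
    if s.successful_logins_per_sec == 0:
        return float("inf") if s.failed_logins_per_sec > 0 else 0.0
    return s.failed_logins_per_sec / s.successful_logins_per_sec * 100


def cpu_wait_pressure(window: Sequence[Sample]) -> float:
    # AVG(cpu waiting / cpu running) / cpu 1m load avg * 100, averaged over the window
    ratios = [s.cpu_waiting / s.cpu_running for s in window if s.cpu_running > 0]
    latest = window[-1]
    if not ratios or latest.cpu_load_1m == 0:
        return 0.0
    return mean(ratios) / latest.cpu_load_1m * 100


def potential_login_attack(window: Sequence[Sample]) -> bool:
    """True when all three checks answer yes; expects a non-empty window."""
    latest = window[-1]
    checks = (
        failed_login_pct(latest) > 3,     # assumed threshold pairing
        cpu_wait_pressure(window) > 0.5,  # assumed threshold pairing
        latest.db_slow_queries > 4,       # assumed threshold pairing
    )
    return all(checks)
```

For example, `potential_login_attack([Sample(5, 100, 2, 4, 2.0, 6)])` answers yes to all three checks, while dropping the failed-login rate to 1 per 100 successes clears the first check and the anomaly.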
• Affinity + Simple Case

• Stream + Augmented Datasource

• Parallel Stream
• Frequency-Shifted Stream

• “Correlative/Normalized View”: Similar to a SQL “join”
concept, we relate data fields in disparate stream sources

• Many examples — for other talks :)

• This illustrates the mechanisms by which we can combine
and augment data types for complex events in AI/neural
networks and utilize inline training and active models.

• Also allows us to introduce the notion of insight, which is
crucial to an incremental improvement model, especially
for “slight touch ecosystems” like coral reefs
Figure: Multiple-Domain Complex Event Processing Approaches
Sources feeding the Complex Event Processor: CPU, Zookeeper, RabbitMQ, Application Event Source
Diagram labels: Parallel Source, Disparate Normalization, Correlative/Normalized View (x3)
Zookeeper fields: approx-data-sz, avg-latency, ephemeral-count, followers, max-fd-cnt, max-latency, min-latency, open-fd-cnt, num-alive-connections, outstanding-requests, packets-received, packets-sent, pending-syncs, synced-followers, watch-cnt, znode-cnt
RabbitMQ fields: message total, message ready, message unacked, rate.publish, rate.deliver, rate.redeliver, rate.confirm, rate.ack, connection.total, connection.idle, channel.total, channel.publisher, channel.consumer, channel.duplex, channel.inactive, exchange.rate.phaedo, q.total, q.idle, q.messages.phaedo, q.consumers.phaedo, q.memory.phaedo, q.ingress.phaedo, q.egress.phaedo, binding.total
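One way to picture a “Correlative/Normalized View” is a join on time buckets. The sketch below is an assumption about how such a view could be built, not the CEP used in the talk: the `(timestamp, source, field, value)` event shape, the `normalized_view` name, and the 10-second bucket are all hypothetical. Averaging each field within a bucket also lines up streams that report at different frequencies.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (timestamp_s, source, field, value); a hypothetical event shape
Event = Tuple[float, str, str, float]


def normalized_view(events: Iterable[Event], bucket_s: float = 10.0) -> List[Dict[str, float]]:
    """Relate fields from disparate streams by placing them in shared time buckets."""
    buckets: Dict[int, Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for ts, source, field, value in events:
        # Streams with different reporting rates land in the same bucket
        buckets[int(ts // bucket_s)][f"{source}.{field}"].append(value)

    rows: List[Dict[str, float]] = []
    for bucket in sorted(buckets):
        row: Dict[str, float] = {"bucket_start_s": bucket * bucket_s}
        for key, values in buckets[bucket].items():
            row[key] = sum(values) / len(values)  # average within the bucket
        rows.append(row)
    return rows
```

Each row behaves like one record of an outer join: a bucket keeps whatever fields arrived in it, so a source that is silent in a given interval simply leaves its columns out of that row.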
Semiconductor EDA
Designing the Digital Future
HPC HTC
• “High Throughput Computing”

• Very predictable, common engineering pipeline

• Toolset is geared to repeat the steps in the pattern
hundreds or thousands of times per iteration, per engineer,
constantly. Each adjustment cascades into hundreds or
thousands of small jobs.

• Jobs are very short-lived. Avg time on a single core is
under 3s. The job scheduler itself is often a
bottleneck on large, shared systems.

• EDA requires multiple phases of HDL synthesizers
and HLL compilers, so different phases of the
pipeline can hit different sorts of computational
bottlenecks, and bottlenecks can also result from
different design choices in the engineering
decisions.
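A back-of-envelope sketch of why the scheduler becomes the bottleneck with such short jobs, using Little's law (jobs in flight = arrival rate x time in system). Only the 3 s runtime comes from the slide; the function name, the dispatch rate of 100 jobs/s, and the 1,000-core system are made-up numbers for illustration.

```python
def cores_kept_busy(dispatch_rate_per_s: float, avg_runtime_s: float, total_cores: int) -> float:
    """Little's law: jobs in flight = dispatch rate * average runtime, capped by the core count."""
    return min(float(total_cores), dispatch_rate_per_s * avg_runtime_s)


# Hypothetical system: scheduler dispatches 100 jobs/s, jobs average 3 s, 1000 cores available
busy = cores_kept_busy(dispatch_rate_per_s=100.0, avg_runtime_s=3.0, total_cores=1000)
utilization = busy / 1000
```

Under these assumed numbers only 300 of the 1,000 cores can ever be busy at once, so adding cores does nothing until the dispatch rate rises.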
EDA Characteristics
Well-established Sector
• Traditional enterprise storage (NFS3)

• 10-100M small (<=1M) files/dir

• user and group based access controls

• POSIX, locking not required

• OS scheduler is often sufficient. Sometimes
job submission is separated onto a login node.

• License model is well understood and generally
core-based or time-based. Codes are generally
proprietary.

• Turnkey deployment is up and running in
minutes on nearly any sized system. Very little
motivation to alter the status quo.
EDA Characteristics
What Would It Take to Try Something New?
• All on-prem, w/ cloud tests successful
but not adopted:

• too costly

• intellectual property concerns

• ROI delayed

• data management difficulties

• Storage enhancements show
improvements, and large shops adopt
those, but NFS3 performs well for
most small-medium practitioners.
EDA Environments
What Would It Take to Try Something New?
• EDA process is well known, easy to
hire for, and well understood in the
industry. Why rock the boat?

• Any perturbations to the system
would need to overcome the cost of
change, which in semiconductor
fabrication can be immense.

• Even where bottlenecks are known
(storage, compute, scheduling), they
are understood and manageable.
New is new and unpredictable with
unknown value…
EDA Pipelines at Scale?
For valuable and motivational change in
semiconductor EDA, we need disruption both
in behavior and environment simultaneously.
External focus for HTC/Systems Intelligence
• Two primary mechanisms for
augmenting the EDA process:

Internally (inside the EDA
pipeline).

Externally (augmenting and
enhancing the pipelining
environment). 

This project focuses on the external
mechanism, but the usual neural
network caveats apply.
Neural Networks for EDA Pipelines
Diagram blocks:
Semiconductor Electronic Design Automation (precondition: API to workflow data): Chip Specification, Design entry/Functional verification, RTL synthesis, Partitioning of chip, Design for test (DFT) insertion, Floor planning, Placement stage, Clock tree synthesis (CTS), Routing stage, Final verification, GDS II
Infrastructure Automation (preconditions: API to all components; API backwards compatible): Systems Provisioning, Network Provisioning, Application Deployment, Configuration Management, Platform Management, Change Orchestration
Capabilities (connectors X, Y): User/group file CRUD, Workflow scheduling, Job management, License management
Systems Intelligence / EDA Messaging Substrate: Data Analytics, Command & Control
Region labels: Internal, External
Semiconductor EDA
Designing the Digital Future
“When we think of sensing technologies as devices
that order the world, rather than devices that describe
it, then alternative relationships between the social and
the technical are strikingly brought to light.”
— Genevieve Bell (Intel) @feraldata
EDA Workflow and Supporting Infrastructure SI Messaging
Diagram blocks:
Semiconductor Electronic Design Automation (precondition: API to workflow data): Chip Specification, Design entry/Functional verification, RTL synthesis, Partitioning of chip, Design for test (DFT) insertion, Floor planning, Placement stage, Clock tree synthesis (CTS), Routing stage, Final verification, GDS II
Infrastructure Automation (preconditions: API to all components; API backwards compatible): Systems Provisioning, Network Provisioning, Application Deployment, Configuration Management, Platform Management, Change Orchestration
Capabilities (connectors X, Y): User/group file CRUD, Workflow scheduling, Job management, License management
Systems Intelligence / EDA Messaging Substrate:
CEP Ingest
Data Analytics: inline models, offline models, Atomic Pattern Recognition, Parallel Stream, Stream Augmentation, Frequency-Shifted Streams, Affinity Streams, Aggregation/Statistics
Command & Control: data/scores/metrics, decisioning, orchestration, validation, feedback
Region labels: External Capabilities and Infrastructure, EDA SI Messaging Substrate, Insight (x2)
EDA Workflow and AI/NN Frameworks
Diagram blocks:
Semiconductor Electronic Design Automation (precondition: API to workflow data): Chip Specification, Design entry/Functional verification, RTL synthesis, Partitioning of chip, Design for test (DFT) insertion, Floor planning, Placement stage, Clock tree synthesis (CTS), Routing stage, Final verification, GDS II
Infrastructure Automation (preconditions: API to all components; API backwards compatible): Systems Provisioning, Network Provisioning, Application Deployment, Configuration Management, Platform Management, Change Orchestration
Capabilities (connectors X, Y): User/group file CRUD, Workflow scheduling, Job management, License management
Systems Intelligence / EDA Messaging Substrate:
CEP Ingest
Data Analytics: inline models, offline models, Atomic Pattern Recognition, Stream Augmentation
Command & Control: data/scores/metrics, decisioning, orchestration
Neural Networks / Messaging-Based Machine Learning / AI / Neural Networks Workflow (connectors X, Y):
CEP/Ingest from Existing Datasources
Data Analytics and Normalization
Reactive Systems: scoring/metrics, decisioning, orchestration, validation, feedback
Inline learning models: Clustering, Classification, Decision Trees
Offline / replay learning models
Model Run, Model Training
Pattern Enhancements
Insight Consumers
Ecosystem Insight and KPI Enhancements
Ecosystem Messaging Platform
Region labels: External Capabilities and Infrastructure, EDA ML / AI / NN Workflow, SI Messaging Substrate, Insight (x3)
Unique position for AI and NN
Why Artificial Intelligence/Neural Networks for this Problem?
• Small, incremental human-driven changes are not cost-effective in
today’s DevOps systems

• With continuous observation for “minority report” style changes, it is difficult
to design sprints and test efficacy, and even harder to measure ROI

• Command and control systems can be designed to allow incremental
change directly from NNs based on deployments — e.g. allow each
“reef” to tune itself based on its own ecosystem

• The “show your work”/“show your rationale” problems carry less weight in
EDA, relative to delivering results, than in other domains
Insight: “looking inward”
Insight provides a mechanism for self-tuning behavior of the running system at all
levels:

• algorithms, models, data access, expert systems, KPIs, behaviors, reports,
accuracy, efficiency, even insight itself

• In-built feedback mechanism for capturing behavior and performance

• Mechanism to ensure that changes over time are accounted for and noticed if not
understood

• Allows for inline and ongoing training without having to maintain offline (and
outdated) training datasets

• Allows for locale-specific NN training (the NN-locale problem).
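As a rough picture of inline, ongoing training without a stored dataset, the sketch below keeps a running baseline for one metric and flags values that drift from it. It swaps in a deliberately simple technique (Welford's online mean and variance with a z-score test) for whatever models the program actually uses; the `RunningBaseline` class, its method names, and the 3-sigma default are assumptions for illustration. One instance per deployment would give each locale its own baseline.

```python
import math


class RunningBaseline:
    """Online mean/variance (Welford's algorithm); no historical samples are stored."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Fold one new observation into the baseline
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def stddev(self) -> float:
        # Sample standard deviation; zero until at least two observations exist
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_unusual(self, x: float, z: float = 3.0) -> bool:
        # Flag values more than z standard deviations from the running mean
        sd = self.stddev
        return self.n > 1 and sd > 0 and abs(x - self.mean) > z * sd


baseline = RunningBaseline()
for latency_ms in [10, 12, 11, 9, 10, 11, 12, 9]:  # made-up telemetry values
    baseline.update(latency_ms)
```

After these eight made-up samples, `baseline.is_unusual(50)` is true and `baseline.is_unusual(11)` is false; feeding each new observation back through `update` is what lets the threshold follow the system as it changes.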
Program Status
Where are we now?
• Telemetry data from workload systems feeding messaging platform

• Synthetic workload (provided by a partner benchmarking suite) being modified for
user emulation

• NN-specific topology choice and models are under discussion with the wider team, since
we will need to utilize simultaneous learning, model promotion, results propagation, etc.

• Insight mechanisms are developed in the messaging substrate automatically, with
common APIs available to higher level structures. Common reporting in dashboards etc.

• Always looking for helpers to take things farther — will report more later as we
(un)shelter…
