Mirabilis Design- NoC Webinar- 15th-Oct 2024

Webinar:
ARM corelink, Arteris NoC, UCIe, Bunch-of-wires, CXL and PCIe-
Designing the interconnect is not for the weak-hearted!
Host:
Deepak Shankar, Vice President Technology
Mirabilis Design Inc.
Email: dshankar@mirabilisdesign.com

Agenda
Challenges
VisualSim Solution
Extending System Modeling Methodology
Experiments
The future

Explore and Measure using System-Level Exploration
Consider this SoC Architecture
[Figure: SoC block diagram with NoC/UCIe, AI Engine Tiles, Warp Scheduler, PEs with Local Mem, GPU, Memory Chiplet, ADC, DDR5, and a Processor subsystem (Core, L1, Bus, SLC)]
• Round-Trip Latency
• Which one is it? Neoverse/A720/RISC-V/Tensilica Lx8
• Number and type of GPU and TPU Cores
• What is the AI Clock Speed?
• Optimal Mesh size
• Peak power
• Thermal heat and temp Management
• Number of Ports & Modules
• Interface Buffer
• Interconnect Speed
• Scheduling and assignment
• Throughput
• Use benchmarks, traffic, traces and workloads
• Buffer Usage

Types of Experiments to be Conducted for an SoC Design
• Select interconnect- AXI vs NoC vs Crossbar vs mesh?
• Assign NoC, AHB or ACE to each level of Hierarchy?
• Commercial or custom NoC development?
• Optimize MOESI coherence on a custom mesh to maximize cache hit-ratio?
• Deciding on monolithic vs multi-die chiplets?
• Impact of new power management on peak and total power?
• Flat vs hierarchical topology to ensure maximum memory bandwidth?
• Integrate with SoC generation configuration tools?

Enabling Customers with Full-Coverage Experiment
• Block Diagram
• Model using System-level IP
• Parameters & constraints
• Regression Sweep
• Generate statistics and specification

BLOCK | METRICS | CONSTRAINT | CONSTRAINT VALUE | STATISTIC TYPE
Cache_I_1 | A_Hit_Ratio | >= | 0.7 | All
Cache_d_1 | A_Miss_Ratio | < | 0.2 | All
Cache_I_2 | A_Number_Entered | >= | 175 | All
Cache_SLC | Buffer_Occupancy | < | 6 | All
AXI_Top_Master_1 | Read_Data_Bytes | >= | 1.00E+07 | All
CMN_XP | Buffer_Overflow | >= | 10 | All
Task_1 | Latency | < | 1.23E-06 | Mean
Task_2 | Latency | < | 4.60E-03 | Max
Task_3 | Latency | < | 6.00E-05 | Min
Cache_SLC | Read_Hit_Ratio | >= | 0.9 | All
Read_MBs_per_Se
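The constraint table above can be read as a set of pass/fail checks applied to simulation statistics. The following is a minimal Python sketch of that idea, assuming the statistics are available as a plain dictionary. The `check_constraints` helper and the sample measured values are hypothetical and are not part of VisualSim.

```python
import operator

# Comparison operators that appear in the constraint table.
OPS = {">=": operator.ge, "<": operator.lt}

# (block, metric, operator, limit) rows copied from the table above.
CONSTRAINTS = [
    ("Cache_I_1", "A_Hit_Ratio", ">=", 0.7),
    ("Cache_d_1", "A_Miss_Ratio", "<", 0.2),
    ("Cache_SLC", "Buffer_Occupancy", "<", 6),
    ("Task_1", "Latency", "<", 1.23e-06),
]

def check_constraints(stats, constraints=CONSTRAINTS):
    """Return {(block, metric): bool}; a missing statistic counts as a failure."""
    results = {}
    for block, metric, op, limit in constraints:
        value = stats.get((block, metric))
        results[(block, metric)] = value is not None and OPS[op](value, limit)
    return results

# Hypothetical measured values, for illustration only.
sample_stats = {
    ("Cache_I_1", "A_Hit_Ratio"): 0.82,
    ("Cache_d_1", "A_Miss_Ratio"): 0.25,
    ("Cache_SLC", "Buffer_Occupancy"): 3,
}

report = check_constraints(sample_stats)
for key, ok in report.items():
    print(key, "PASS" if ok else "FAIL")
```

Treating a missing statistic as a failure is a design choice of this sketch: it keeps a run from passing silently when a block was never exercised.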

Case Study: Data Center SoC
Design Challenges
1. What is the buffer size to prevent overflow on interconnect?
2. What is the memory throughput required to meet the goals?
3. How many Cores are required to meet 33 ms response time?
4. Should power management be threshold-based, time-based, DVFS or utilization-based?
Project Goals
1. Data Center SoC for Neural Network applications
2. Handle 30 million vertices/ second
3. Power consumption < 40W
4. ResNet-50 workload inference time < 3.2 seconds
VisualSim Solution
1. Library:
ARM A720, Cache, HBM3, DMA, CMN Cyprus,
Arteris NoC, UCIe
GPU, Sensor, Power State Machine
2. Custom model for AI braking and proximity test
3. Workload generator
For data center and automotive applications
4. Flow control and scheduling algorithms
5. Performance and power report generators
Evaluation of Constraints
Project Outcome
1. Generated component list, clock speed, bus width, buffer size and flow control
2. Expected statistics for performance, correctness and power
3. Executable specification for customer Architects to conduct trade-offs
Suggested Block Diagram
Statistics and Reports
BLOCK | METRICS | MEASURED | STATISTIC TYPE | RESULT
AMBA-AXI | GPU_Read_Data_Bytes | 3,392,408,192 | Max | TRUE
AMBA-AXI | DDR4_Bandwidth_Utilization | 28% | Std Deviation | FALSE
NoC-CMN | System-Level Cache_Read_Data_Bytes | 3,392,128,256 | Mean | TRUE
NoC-Arteris | Read Buffer Channel Usage | 32 | Min | TRUE
Data Cache | Hit_Ratio | 89.148 | Mean | TRUE
Data Cache | Latency | 4.79E-08 | Mean | FALSE
Data Cache | GB/Second | 1.776 | Min | TRUE
Processor | Context_Switch_Time | 16.83 | Max | TRUE
Processor | Application Processing Delay | 3.86E-06 | Min | FALSE
Page_Table | Memory_Used_By_TLB | 128K | Min | FALSE
Cache | Bus Request Buffer_Occupanc | 440 | Min | FALSE
Processor | Processor_Utilization | 50% | Max | TRUE
Thermal | Temperature | 65C | Mean | FALSE
Power | Peak Power for Chiplet Die 1 | 51W | Max | FALSE
Regression varying Parameters and Workloads
-Process_Node_nm 7 -Bus2_Clk_Speed 2000.0 -Core_Clk_Speed 2500.0
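A regression sweep of this kind enumerates every combination of parameter values. The sketch below is a minimal illustration, assuming the `-Name value` argument format shown in the line above. The parameter names and the values 7, 2000.0 and 2500.0 come from the slide. The additional values are hypothetical, and how the strings are handed to the simulator is not shown.

```python
from itertools import product

# Parameter names come from the slide; the extra values are hypothetical.
SWEEP = {
    "Process_Node_nm": [7, 5],
    "Bus2_Clk_Speed": [2000.0],
    "Core_Clk_Speed": [2500.0, 3000.0],
}

def sweep_args(sweep):
    """Yield one '-Name value' argument string per parameter combination."""
    names = list(sweep)
    for values in product(*(sweep[n] for n in names)):
        yield " ".join(f"-{n} {v}" for n, v in zip(names, values))

runs = list(sweep_args(SWEEP))
for line in runs:
    print(line)
```

The number of runs is the product of the list lengths, so each added parameter value multiplies the regression time.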

VisualSim Solution
VisualSim with libraries
Quickstart Training
Modeling services
Analysis and insight
Integration
The Product
The Offerings

VisualSim System-Level IP Library
Stochastic and Software; SoC Compute, Interconnect and Hardware; Systems and Networks
System Resources: Quantity and Time Queue, Scheduler, RTOS Builder, ARINC 653, AUTOSAR, Task Graph, Workload Builder
Traffic: Custom Builder, Distribution- and Trace-based, Sensors, VCD, Network, Sequence
Scripting language: RegEx, C/C++/Java/Python Wrapper
Statistics: Latency, Throughput, Utilization, hit-ratio, Ave/peak power (instant, ave), Heat, Temp
TSN, AVB, 10BaseT1S, Switched Ethernet, Resilient Packet Ring, RP3, WiFi 802.11, Bluetooth, PAN, SpaceWire, SpaceFibre, IEEE802.1Q, Time-Triggered Ethernet, AFDX, 5G
VME, PCI/PCI-X/PCIe 6.0, CXL, SPI 3.0, 1553B, FlexRay, CAN-FD/XL, AFDX, TTEthernet, OpenVPX
AMBA (AHB/APB/AXI/CHI), TileLink, CoreLink (600, 700), NoC (Generic, Arteris), Virtual Channel, DMA, Crossbar, Serial Switch, Bridge, UCIe
CPU, DSP, GPU, TPU, MCU: ARM (M0-55), R5, Cortex (A8, A72, A53, A76, A77, A65, A78, A720, Neoverse V and X), Nvidia- Pascal to Ampere, Leon, Power, X86, DSP & ADI- TI, Tensilica- Lx8, Renesas, AI
RISC-V: SiFive, In-Order/Out-of-Order
Storage and Memory: Flash, NVMe, Disk, SSD, NAS, Fibre Channel, FireWire, HBM3.0, HMC, Memory Controller, Disk, SDR DRAM 2-5, LPDDR 2-5-X, SSD, QDR, RDRAM, MPMC, Cache, Coherent cache
FPGA: Xilinx- Versal, Zynq, UltraScale, Kintex; Altera- Stratix, Arria; Microsemi- SmartFusion; Programmable logic generator
Power: Power States, Allocation, Transition, Loss, Battery, Consumption, Management, Generation, Distribution and Thermal
Communication: RF Tx/Rx, Baseband, Channels, Analog, A/D transceivers, Antenna, Signal/audio/Image algorithms

Complete SoC System Model Solution in VisualSim
Reuse IP
Define Hierarchy
Parameters
Power & Thermal
Metrics
Capture
Custom IP Builder
Debugging & Profiling
Plotting

Three Levels of NoC Modeling
• Stochastic or queuing theory-based
  • Focus on overall latency and throughput without specific implementation
• Hybrid, which is cycle-accurate but not fully pipelined
  • Vendor-specific products but without the detailed underlying registers and algorithms
  • Most times, combines micro-arch for processors, cache and memory with a slightly more abstract interconnect
• Micro-architecture
  • Detailed implementation of a vendor-specific or custom product
  • All modeling devices are functionally accurate
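At the stochastic level, a single link or router port is often approximated as a queue. The sketch below assumes an M/M/1 queue (Poisson arrivals, exponential service times). This is a textbook simplification chosen for illustration and is not necessarily the model VisualSim uses. The arrival and service rates are hypothetical.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Mean utilization, latency (seconds) and occupancy of an M/M/1 queue."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival rate must be below service rate")
    rho = arrival_rate / service_rate                # utilization
    latency = 1.0 / (service_rate - arrival_rate)    # mean time in system, W
    occupancy = rho / (1.0 - rho)                    # mean number in system, L
    return rho, latency, occupancy

# Hypothetical link: 1 flit served per ns, 0.8 flits arriving per ns.
ARRIVAL = 0.8e9
SERVICE = 1.0e9
rho, latency, occupancy = mm1_metrics(ARRIVAL, SERVICE)
print(f"utilization={rho:.2f} latency={latency:.2e}s occupancy={occupancy:.1f}")
```

Latency and occupancy grow without bound as utilization approaches 1, which is why buffer sizing and interconnect speed are explored together.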

Stochastic NoC- Flow Control Modeling

Hybrid NoC Modeling- Vendor-specific Arteris NoC

Micro-Architecture Modeling of the Custom NoC
Statistics
NIU_INIU_00_Flits_Initiated = 100,
NIU_INIU_01_Request_Throughput_MBps = 800.0008,
NIU_INIU_02_Response_Throughput_MBps = 692.000,
NIU_INIU_03_Read_Request_Initiated = 100,
NIU_INIU_04_Read_Response_received = 43,
NIU_INIU_05_Write_Request_Initiated = 0,
NIU_INIU_06_Write_Response_received = 0,
NIU_INIU_07_Total_Read_Request_Bytes = 800,
NIU_INIU_08_Total_Write_Request_Bytes = 0,
NIU_INIU_09_Total_Request_Bytes = 800,
NIU_INIU_10_Total_Response_Bytes = 692,
NIU_INIU_13_Request_Buffer_overflow = 0,
NIU_INIU_14_Packets_Waiting_in_Request_Buffer = 0,
NIU_INIU_15_Packets_Waiting_in_ROB_Buffer = 0}
{NIU_TNIU_00_Flits_Responded = 178,
NIU_TNIU_01_Response_Throughput_MBps = 712.000,
NIU_TNIU_02_Request_Throughput_MBps = 1600.00,
NIU_TNIU_03_Read_Response_completed = 92.0,
NIU_TNIU_04_Read_Request_received = 200.0,
NIU_TNIU_05_Write_Response_completed = 0.0,
NIU_TNIU_06_Write_Request_received = 0.0,
NIU_TNIU_07_Total_Request_Bytes = 1600,
NIU_TNIU_08_Total_Response_Bytes = 712,
NIU_TNIU_09_Response_Buffer_overflow = 0,
NIU_TNIU_10_Packets_Waiting_in_Response_Buffer = 6,
NIU_TNIU_11_Packets_Waiting_in_ROB_Buffer = 0}
Debugging
Master Flow control block (Master_1) ::: adding packet to the Queue
Packet details ::: ID = 99, A_Address = 5568517352L,
Master Flow control block (Master_1) ::: sending packet out
Tracing the Activity
Time_Array = {4.305E-7, 4.33E-7, 4.34E-7, 4.34E-7, 4.34E-7,
4.35E-7, 4.35E-7, 4.35E-7, 4.36E-7, 4.36E-7, 4.36E-7, 4.36E-7, 4.36E-7,
4.36E-7, 4.36E-7, 4.38E-7, 4.38E-7, 4.39E-7, 4.39E-7, 4.39E-7, 4.4E-7,
4.4E-7, 4.4E-7, 4.41E-7, 4.41E-7, 4.41E-7, 4.41E-7, 4.41E-7, 4.41E-7,
9.7286E-7, 9.7286E-7, 9.86923E-7, 9.86923E-7, 9.86923E-7, 9.89E-7,
9.9E-7, 9.91E-7, 9.92E-7, 9.92E-7, 9.92E-7, 9.92E-7, 9.94E-7, 9.95E-7,
9.96E-7},
Trace_Array = {"INIU2_Request_Queue_in",
"MUX_1_Port_2_in", "MUX_1_out", "Buffer_1_in",
"Buffer_Buffer_1_in", "Buffer_Buffer_1_out", "Buffer_1_out",
"DEMUX_1_in", "DEMUX_1_out", "TNIU_Req_in",
"TNIU_ROB_Queue_in", "TNIU_ROB_Queue_out", "TNIU_Req_out",
"INIU3_Req_in", "INIU3_Request_Queue_in", "INIU3_Req_out",
"MUX_3_Port_1_in", "MUX_3_out", "Buffer_3_in",
"Buffer_Buffer_3_in", "Buffer_Buffer_3_out", "Buffer_3_out",
"DEMUX_3_in", "DEMUX_3_out", "TNIU3_Req_in",
"TNIU3_ROB_Queue_in", "TNIU3_ROB_Queue_out",
"TNIU3_Req_out", "DRAM_in", "LPDDR_Scheduler_in",
"LPDDR_Scheduler_out", "DRAM_out", "TNIU3_Resp_in",
"TNIU3_Response_Queue_in", "TNIU3_Resp_out",
"Buffer_Buffer_4_in", "Buffer_Buffer_4_out", "INIU3_Resp_in",
"INIU3_Resp_out", "TNIU_Resp_in", "TNIU_Response_Queue_in",
"TNIU_Resp_out", "Buffer_Buffer_2_in", "Buffer_Buffer_2_out"}}
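One way to use such a trace is to pair each timestamp with its trace point and compute the delay between consecutive points. The sketch below assumes the two arrays are index-aligned and uses only the first five entries of each array from the trace above. The `hop_delays` helper is hypothetical.

```python
# First five entries of Time_Array and Trace_Array from the trace above.
time_array = [4.305e-7, 4.33e-7, 4.34e-7, 4.34e-7, 4.34e-7]
trace_array = [
    "INIU2_Request_Queue_in",
    "MUX_1_Port_2_in",
    "MUX_1_out",
    "Buffer_1_in",
    "Buffer_Buffer_1_in",
]

def hop_delays(times, points):
    """Return (from_point, to_point, delay_seconds) for each consecutive pair."""
    if len(times) != len(points):
        raise ValueError("time and trace arrays must have the same length")
    return [
        (points[i], points[i + 1], times[i + 1] - times[i])
        for i in range(len(times) - 1)
    ]

delays = hop_delays(time_array, trace_array)
for src, dst, dt in delays:
    print(f"{src} -> {dst}: {dt * 1e9:.2f} ns")
```

Sorting the result by delay points to the stage where a packet spends most of its time.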

VisualSim Model- Imported using Configuration files

Model Statistics
Cache_SLC_1_A_Hit_Ratio = 0.0,
Cache_SLC_1_A_Miss_Ratio = 100.0,
Cache_SLC_1_A_Number_Entered = 1386L,
Cache_SLC_1_A_Number_Returned = 1386L,
Cache_SLC_1_A_Prefetch_Completed = 1L,
Cache_SLC_1_A_Prefetch_Issued = 1L,
Cache_SLC_1_A_Prefetch_Useful = 0L,
Cache_SLC_1_Buffer_Occupancy = 0,
Cache_SLC_1_Buffer_Overflow = 0L,
Cache_SLC_1_Latency_Avg = 1.6392367965368E-7,
Cache_SLC_1_Latency_Max = 4.3999899999999E-7,
Cache_SLC_1_Latency_Min = 5.7111999999999E-8,
Cache_SLC_1_Read_MBs_per_Second = 837.1201674240335,
Cache_SLC_1_Total_Cache_Lines_Evicted = 0L,
Cache_SLC_1_Total_Cache_Lines_Write_Backed = 0L,
Cache_SLC_1_Total_MBs = 0.177472,
Cache_SLC_1_Total_MBs_per_Second = 3549.440709888142,
Cache_SLC_1_Utilization = 2.7733338880003
MC_DRAM_DRAM_1_00_Total_Requests = 136,
MC_DRAM_DRAM_1_01_Completed_Requests = 136,
MC_DRAM_DRAM_1_02_Total_MB_per_Second = 87.1179217539336,
MC_DRAM_DRAM_1_03_Total_Bytes = 8704,
MC_DRAM_DRAM_1_04_Read_Bytes = 8704,
MC_DRAM_DRAM_1_06_Read_MB_per_Second = 87.1179217539336,
MC_DRAM_DRAM_1_10_Max_Queue_Usage = 2,
MC_DRAM_DRAM_1_12_Queue_Removal_Position = {136, 0, 0, 0},
MC_DRAM_DRAM_1_21_Total_Activates = 148,
MC_DRAM_DRAM_1_22_Total_Precharges = 147,
MC_DRAM_DRAM_1_23_Total_RRD_L_S = {{0, 0}, {12, 0}},
MC_DRAM_DRAM_1_24_Total_CCD_L_S = {{135, 135}, {0, 0}},
MC_DRAM_DRAM_1_25_Total_WTR_L_S = {{0, 0}, {0, 0}},
MC_DRAM_DRAM_1_26_Total_RTP_WR_RAS_RTW = {{135, 132}, {0, 0}, {135, 122}, {0, 0}},
MC_DRAM_DRAM_1_27_Refresh_Percent = 1.920016,
MC_DRAM_DRAM_1_28_DRAM_Delay_Min = 1.4416999999991E-8,
MC_DRAM_DRAM_1_29_DRAM_Delay_Max = 1.4417000000004E-8,
MC_DRAM_DRAM_1_30_DRAM_Delay_Mean = 1.4417E-8,
MC_DRAM_DRAM_1_31_DRAM_Delay_StDev = 6.9311872118497E-16}
PCIe_Switch_PCIe_Switch_1_Port_1_Rx_MBps = 870.2400087024001,
PCIe_Switch_PCIe_Switch_1_Port_1_Total_MBps = 1740.4800174048003,
PCIe_Switch_PCIe_Switch_1_Port_1_Tx_MBps = 870.2400087024001,
PCIe_Switch_PCIe_Switch_1_Port_1_to_Port_7_Max_Latency = 8.6170000000051E-9,
PCIe_Switch_PCIe_Switch_1_Port_1_to_Port_7_Mean_Latency = 4.8814421768711E-9,
PCIe_Switch_PCIe_Switch_1_Port_1_to_Port_7_Min_Latency = 4.7219999999969E-9,
{CMN600_RND_1_Max_End_to_End_Latency = 6.55668E-7,
CMN600_RND_1_Max_Network_Latency = 8.4334000000002E-8,
CMN600_RND_1_Mean_End_to_End_Latency = 3.6446105555556E-7,
CMN600_RND_1_Mean_Network_Latency = 5.6137656565657E-8,
CMN600_RND_1_Min_End_to_End_Latency = 2.37999E-7,
CMN600_RND_1_Min_Network_Latency = 4.9999999999994E-8,
CMN600_RND_2_Max_End_to_End_Latency = 4.7366499999999E-7,
CMN600_R_0_0_EAST_In_Buffer_Max_Buffer_Occupancy = 18.0,
CMN600_R_0_0_SOUTH_In_Buffer_Max_Buffer_Occupancy = 8.0,
CMN600_R_0_1_EAST_In_Buffer_Max_Buffer_Occupancy = 11.0,
CMN600_R_0_1_SOUTH_In_Buffer_Max_Buffer_Occupancy = 21.0,
CMN600_R_0_1_WEST_In_Buffer_Max_Buffer_Occupancy = 5.0,
DS_Name = "CMN600_Stats"}
{ACE_Bus_1_Master_1_Read_Data_Bytes = 7936,
ACE_Bus_1_Master_1_Read_Data_MBps = 79.3600007936,
ACE_Bus_1_Slave_1_BW_Utilization_Prct = 0.6613333399467,
ACE_Bus_1_Slave_1_Read_Data_Bytes = 7936,
ACE_Bus_1_Slave_1_Read_Data_MBps = 79.3600007936}
{ACE_Bus_1_Slave_1_Rd_Threshold_Usage = 2.0,
ACE_Bus_1_Slave_1_Rd_Transactions = 124}
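Statistics dumps in this `Name = value,` form are easy to post-process. The sketch below is a minimal parser, assuming the format is exactly as printed above: scalar numbers, an optional trailing `L` on integers, and nested `{...}` arrays, which the parser skips. The regular expression and the `parse_stats` helper are hypothetical and not a VisualSim API.

```python
import re

# Matches 'Name = value' where value is a plain number, optionally ending in L.
STAT_RE = re.compile(
    r"(\w+)\s*=\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)L?\s*(?:,|\}|$)",
    re.MULTILINE,
)

def parse_stats(text):
    """Return {name: number} for every scalar statistic found in the text."""
    stats = {}
    for name, raw in STAT_RE.findall(text):
        is_float = any(c in raw for c in ".eE")
        stats[name] = float(raw) if is_float else int(raw)
    return stats

# Lines copied from the dump above.
sample = """
Cache_SLC_1_A_Number_Entered = 1386L,
Cache_SLC_1_Latency_Avg = 1.6392367965368E-7,
MC_DRAM_DRAM_1_00_Total_Requests = 136,
MC_DRAM_DRAM_1_03_Total_Bytes = 8704,
MC_DRAM_DRAM_1_12_Queue_Removal_Position = {136, 0, 0, 0},
"""

stats = parse_stats(sample)
bytes_per_request = (
    stats["MC_DRAM_DRAM_1_03_Total_Bytes"] / stats["MC_DRAM_DRAM_1_00_Total_Requests"]
)
print(stats)
print(bytes_per_request)
```

Derived values such as bytes per DRAM request give a quick sanity check on the reported numbers before comparing runs.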

Reference Data:
High-Performance Multi-Die Exploration

Block Diagram and VisualSim Model

Experiments
Exp | Description | Mean Latency | Mean RNF Latency | DRAM MBps | LLC Cache MBps | L2 Cache Buffer Overflow | Maximum UCIe MBps | Maximum Power (Watts)
1 | Traffic rate= 1.0e-7, 8 Cores per Cluster, Sequential Addresses, AXI and L2 Cache= 4200 MHz, DRAM and Memory Controller= 2400 MHz, CMN frequency= 1200 MHz, CMN Buffer capacity= 1000, AXI read and write threshold= 250 | 4.105e-7 | 1.355e-8 | 728.903 | 3200.006 | 461L | 12481.600 | 22
2 | Traffic rate= 1.0e-7, 8 Cores per Cluster, Random | 9.8718e-7 | 1.72e-8 | 663.212 | 2432.004 | 367L | 12816.001 | 17
 | | 6.136e-7 | 1.250e-8 | 663.212 | 3200.006 | 293L | 12475.200 | 17
 | | 5.06e-7 | 1.249e-8 | 663.212 | 2688.005 | 464L | 12558.400 | 17
 | | 7.395e-7 | 1.28e-8 | 663.212 | 2688.005 | 0L | 12176.001 | 17
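Results like these are typically filtered against limits before a configuration is chosen. The sketch below copies three columns from the table above and picks the lowest-latency run that meets a power limit and an overflow limit. Numbering the last three rows as experiments 3 to 5 by position is an assumption, and both limits are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Experiment:
    exp: int
    mean_latency_s: float
    l2_buffer_overflow: int
    max_power_w: float

# Values copied from the table above; rows 3 to 5 are numbered by position (assumption).
EXPERIMENTS = [
    Experiment(1, 4.105e-7, 461, 22),
    Experiment(2, 9.8718e-7, 367, 17),
    Experiment(3, 6.136e-7, 293, 17),
    Experiment(4, 5.06e-7, 464, 17),
    Experiment(5, 7.395e-7, 0, 17),
]

def best_candidate(experiments, max_power_w, max_overflow):
    """Lowest mean latency among runs that meet both limits, or None."""
    ok = [
        e for e in experiments
        if e.max_power_w <= max_power_w and e.l2_buffer_overflow <= max_overflow
    ]
    return min(ok, key=lambda e: e.mean_latency_s, default=None)

# Hypothetical limits: at most 20 W and no L2 buffer overflow.
choice = best_candidate(EXPERIMENTS, max_power_w=20, max_overflow=0)
print(choice)
```

With these limits only the last row qualifies, even though it does not have the lowest latency in the table.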

RNF Latencies
Exp Description RNF Latency (Maximum End to End Latency)
1 Traffic rate= 1.0e-7, 8 Cores per Cluster, Sequential Addresses, AXI and L2 Cache= 4200 MHz, DRAM and Memory Controller= 2400 MHz, CMN frequency= 1200 MHz, CMN Buffer capacity= 1000, AXI read and write threshold= 250
[ RNF_1 = 4.9998000000001E-9, RNF_2 = 6.6664000000001E-9, RNF_3 = 4.9998000000001E-9, RNF_4 = 1.04988E-8,
RNF_5 = 1.16662E-8, RNF_6 = 9.9996E-9, RNF_7 = 1.49994E-8, RNF_8 = 1.33919E-8, RNF_9 = 1.6786E-8, RNF_10 =
1.33328E-8, RNF_11 = 9.9996E-9, RNF_12 = 1.40361E-8, RNF_13 = 7.36696E-8, RNF_14 = 6.95799E-8, RNF_15 =
6.94706E-8, RNF_16 = 6.76522E-8 ]
2 Traffic rate= 1.0e-7, 8 Cores per Cluster, Random Addresses, AXI and L2
threshold= 250
[ RNF_1 = 4.9998000000001E-9, RNF_2 = 1.33328E-8, RNF_3 = 5.6658E-9, RNF_4 = 7.582E-9, RNF_5 = 1.16662E-8,
RNF_6 = 9.9996E-9, RNF_7 = 1.49994E-8, RNF_8 = 1.39988E-8, RNF_9 = 2.12506E-8, RNF_10 = 1.33328E-8, RNF_11 =
8.58516E-8 ]
threshold= 250
[ RNF_1 = 4.9998000000001E-9, RNF_2 = 6.6664000000001E-9, RNF_3 = 4.9998000000001E-9, RNF_4 =
7.32668E-8, RNF_15 = 8.03479E-8, RNF_16 = 6.83493E-8 ]
threshold= 250
[ RNF_1 = 4.9998e-09, RNF_2 = 6.6664e-09, RNF_3 = 4.9998e-09, RNF_4 = 6.6664e-09, RNF_5 = 1.21021e-08, RNF_6 =
1.07689e-08, RNF_7 = 1.49994e-08, RNF_8 = 1.41021e-08, RNF_9 = 1.70836e-08, RNF_10 = 1.36022e-08, RNF_11 =
6.84963e-08 ]
threshold= 100
[ RNF_1 = 4.9998e-09, RNF_2 = 6.6664e-09, RNF_3 = 4.9998e-09, RNF_4 = 6.6664e-09, RNF_5 = 1.21021e-08, RNF_6 =
6.84963e-08 ]

Latency of Processor Request Per Cluster
[Plots: Latency_Cluster_1 for Experiments 1 to 4; Latency_Cluster_6 for Experiment 5]

64 Tile SOC
• The 64-tile SoC has 8 clusters, each containing 8 cores.
• Addresses are generated either randomly or sequentially, and the resulting packets go to the cache over the AXI bus.

Interconnect Architecture
• The 64-core SoC die is connected via 4 UCIe ports to two other dies
• Dies 2 and 3 have an SLC cache and the DRAMs

Power Exploration
[Plots: Power_DIE_1 for Experiments 1 to 5]

System-Level Verification of Automotive and Defense Systems
System Integration of SoC using Chiplet

System Overview
Gateway
Transfers messages between different CAN and TSN networks
CAN Bus
The CAN bus is the network that connects sensors and ECUs
TSN Switch
The TSN bus is the network that connects high-performance cameras, lidars and servers
[Figure: block diagram with Wheel 1 to Wheel 4, Gateway, CAN Bus, Engine, Proximity Sensor, Brake Pedal, Gyro Sensor, Road condition sensor, TSN Bus and ECU]

Embedding the SoC in an Automotive Application for Testing

Evaluation of Chiplet Performance in Automotive Application

Power Modeling and Future Innovation

1. Power Generation
• 4 Types of Power Generators in VisualSim
• Constant, variable, motor, solar charge
• Charge sent to battery
2. Power Storage
• Different charging schemes
• Impact of surge and shocks
• Battery Lifecycle
• Battery Consumption
• Statistics
3. Power Consumption
• State based power consumption of electronics (controller, SOC) and Mechanical (brakes, wheels)
• Average, instant and Cumulative
• Power per device and application
4. Power Management
• Change in power state controlled by time, utilization, temperature and expected activity
5. Thermal Management
• Heat and temperature
• Impact of cooling strategy
• Add impact of power spikes
6. Verification and Debugging
• Optimize and test the power management algorithms
• Sizing of power generators and battery
• Estimate power consumed by the software application
7. Downstream Integration
• Integrate with physical hardware
• Generate UPF file with power domains
• Generate SystemVerilog power testbench
Generate Power and Thermal Characteristic
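A power management policy driven by time, utilization and temperature can be expressed as a small decision function. The sketch below is illustrative only: the state names, thresholds and priority order are hypothetical, and "expected activity" is not modeled.

```python
from enum import Enum

class PowerState(Enum):
    OFF = "off"
    SLEEP = "sleep"
    ACTIVE = "active"
    THROTTLED = "throttled"

def next_power_state(utilization, temperature_c, idle_time_s,
                     max_temp_c=85.0, idle_timeout_s=1e-3, wake_utilization=0.05):
    """Pick a power state from utilization (0..1), temperature and idle time.

    All thresholds are hypothetical defaults chosen for illustration.
    """
    if temperature_c >= max_temp_c:
        return PowerState.THROTTLED   # thermal limit overrides everything else
    if utilization >= wake_utilization:
        return PowerState.ACTIVE      # there is work to do
    if idle_time_s >= idle_timeout_s:
        return PowerState.OFF         # idle long enough to power down
    return PowerState.SLEEP           # briefly idle: stay ready to wake

print(next_power_state(utilization=0.6, temperature_c=60.0, idle_time_s=0.0))
print(next_power_state(utilization=0.0, temperature_c=60.0, idle_time_s=5e-3))
```

Sweeping the thresholds in a system model shows how each policy trades peak power against wake-up latency.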

Behavior Task Graph
Power Table
Power management Unit
SystemVerilog Output for Power System Test
VCD Waveform for Verification
create_power_domain PD_Top -include_scope
create_power_domain -name PD_1_2.0 -elements {"CLKMUX"}
create_power_domain -name PD_1_1.0 -elements {"PLL","G2","G3"}
create_power_domain -name PD_1_3.0 -elements {"PROC"}
create_supply_port -port VDD_1.0 -direction in -domain PD_Top
create_supply_port -port VSS_0.0 -direction in -domain PD_Top
create_supply_net VDD_1.0 -domain PD_Top
create_supply_net VSS_0.0 -domain PD_Top
connect_supply_net VDD_1.0 -ports VDD_1.0
connect_supply_net VSS_0.0 -ports VSS_0.0
add_power_state PD_1_2.0 -state Active {-supply_expr (VDD_2.0 == {ON, 2.0}) && (VSS_0.0 == {ON, 0.0})}
add_power_state PD_1_2.0 -state OFF {-supply_expr (VDD_2.0 == {OFF, 0.0}) && (VSS_0.0 == {ON, 0.0})}
add_power_state PD_1_1.0 -state OFF {-supply_expr (VDD_1.0 == {OFF, 0.0}) && (VSS_0.0 == {ON, 0.0})}
add_power_state PD_1_3.0 -state OFF {-supply_expr (VDD_3.0 == {OFF, 0.0}) && (VSS_0.0 == {ON, 0.0})}
Power Modeling Integration
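Power-domain lines of this shape can be produced from a simple mapping of domain names to elements. The sketch below reproduces only the textual form of the `create_power_domain` lines in the listing above. It has not been checked against the IEEE 1801 grammar or any tool, and the `upf_power_domains` helper is hypothetical.

```python
def upf_power_domains(top, domains):
    """Return UPF-style lines: one top-level domain plus one line per sub-domain."""
    lines = [f"create_power_domain {top} -include_scope"]
    for name, elements in domains.items():
        quoted = ",".join(f'"{e}"' for e in elements)
        lines.append(f"create_power_domain -name {name} -elements {{{quoted}}}")
    return lines

# Domain names and elements copied from the listing above.
DOMAINS = {
    "PD_1_2.0": ["CLKMUX"],
    "PD_1_1.0": ["PLL", "G2", "G3"],
    "PD_1_3.0": ["PROC"],
}

upf_lines = upf_power_domains("PD_Top", DOMAINS)
print("\n".join(upf_lines))
```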

AI-based Simulation for Rapid System Exploration
• Run number 19, with the clock frequency at 1000 MHz, satisfied the performance requirements we had set.
• Raising the frequency from 600 MHz to 1000 MHz increased the total power consumption.
• If the power thresholds are stringent, the architect can evaluate different processing resources: DSP vs Xeon cores vs Power cores.
Requirements being evaluated for each simulation run in the parameter sweep
Overall Results: We can identify the simulation runs which meet the requirements and select the right configuration after considering cost vs performance trade-offs
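The rise in power with clock frequency follows from the standard CMOS dynamic power relation P = a * C * V^2 * f. The sketch below applies it to the two frequencies mentioned above, assuming a fixed supply voltage and ignoring static power. The capacitance and voltage values are hypothetical.

```python
def dynamic_power_w(switched_capacitance_f, voltage_v, frequency_hz, activity=1.0):
    """CMOS dynamic power estimate: P = activity * C * V^2 * f."""
    return activity * switched_capacitance_f * voltage_v ** 2 * frequency_hz

# Hypothetical capacitance and voltage; only the two frequencies come from the slide.
C_EFF = 1.0e-9   # farads
VDD = 0.9        # volts

p_600 = dynamic_power_w(C_EFF, VDD, 600e6)
p_1000 = dynamic_power_w(C_EFF, VDD, 1000e6)
print(f"600 MHz: {p_600:.3f} W, 1000 MHz: {p_1000:.3f} W, ratio: {p_1000 / p_600:.3f}")
```

At a fixed voltage the ratio equals the frequency ratio. If the higher frequency also needs a higher voltage, the increase is steeper because voltage enters squared.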

System Verification
• Generate test cases and compare RTL
• Performance, Power and Functionality
• Validate product not just HW/SW
• Application relevant test vectors
• Link to board, emulators and instruments
[Figure: Golden Reference, Architecture model of IP, Verilog/C/emulation, Comparator, Match Tag]
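A comparator of this kind matches results from the golden reference and the implementation by a shared tag. The sketch below is a minimal illustration using dictionaries keyed by tag. The tag names, latency values and 5% tolerance are hypothetical, and the `compare_by_tag` helper is not a VisualSim API.

```python
def compare_by_tag(golden, dut, rel_tol=0.0):
    """Match records by tag and report mismatched, missing and unexpected tags.

    golden, dut: dicts mapping a transaction tag to a numeric result.
    """
    mismatched = []
    for tag, expected in golden.items():
        if tag not in dut:
            mismatched.append((tag, expected, None))   # missing in the DUT output
            continue
        actual = dut[tag]
        limit = rel_tol * abs(expected)
        if abs(actual - expected) > limit:
            mismatched.append((tag, expected, actual))
    unexpected = sorted(set(dut) - set(golden))
    return mismatched, unexpected

# Hypothetical latencies (seconds) from an architecture model and from RTL.
golden = {"txn_1": 4.0e-8, "txn_2": 5.0e-8, "txn_3": 6.0e-8}
dut = {"txn_1": 4.0e-8, "txn_2": 5.6e-8, "txn_4": 1.0e-8}

mismatched, unexpected = compare_by_tag(golden, dut, rel_tol=0.05)
print(mismatched)
print(unexpected)
```

Keying by tag rather than by arrival order lets the comparison tolerate reordering between the model and the implementation.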

Enable Better Products
• One Product- One Model for power, performance and functionality
• Complete System-level IP library and IP Builders for software, systems, networks and missions
• AI-based Regression tool linked to Requirements
• Proven to eliminate 90% of the bottlenecks prior to development
• Demonstrated over 40% schedule savings
