NPU IP Hardware Shaped Through
Software and Use-Case Analysis
Yair Siegel
Senior Director Wireless and Emerging Markets
Ceva
40-50 licensing deals annually
>70 royalty paying customers
>100 active customers
>200 registered patents
NASDAQ:CEVA
>$100m annual revenue
$164m cash, no debt
~450 employees (~75% R&D)
HQ in Maryland, main R&D Centres:
U.S., France, Israel, Greece, Serbia
Trusted partner for over 2 decades
>19bn Ceva-powered devices shipped
>2bn Ceva-powered chips shipped annually
#1 worldwide wireless connectivity IP,
67% market share*
Edge AI NPUs portfolio, scalable from
Embedded ML up to GenAI
Company Overview
www.ceva-ip.com 2
*Source: IPNest’s latest Design IP report – 2023 (published May 2024)
©2025 Ceva Inc.
Embedded ML Applications for Consumer & Industrial IoT
©2025 Ceva Inc. 3
Voice: keyword spotting (KWS), voice biometrics,
sound detection & classification, environmental
noise cancellation (ENC)
Vision: object detection, image classification,
always-on human presence detection or similar
contactless recognition
Predictive Maintenance: vibration, temperature,
humidity and sound sensing
Health & Fitness Sensing: physical activity
tracking, heart rate monitoring, sleep pattern
analysis
End-User Devices: Embedded ML applications
typically consume <1 W and support tens of
GOPS of compute
Typical Technical Requirements for Embedded ML Deployment
Memory Footprint
• < 10 MB Flash/ROM/RAM size
• < 500 KB code + dynamic data memories
Model Size
• 0.01 MB to 10 MB memory required for
the model weights (aka parameters)
Key requirement: easily deployable on battery-powered, resource-limited devices,
to reduce deployment costs and maximize the value of Edge AI
Power Consumption
• Optimized for Low Power < 10 mW
• Enable battery-powered devices
• Minimize device recharges
Computational Requirements
• Typical computation: 10s of GOPS or more
• Deployable on resource constrained
hardware (e.g., MCU)
©2025 Ceva Inc. 4
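The budgets above can be encoded as a small gating function for quick sanity checks of a candidate model. The figures passed in below are illustrative assumptions, not measurements of any real core or model:

```python
# Hypothetical sanity check of a candidate model against the deployment
# budgets listed on this slide. All inputs are illustrative assumptions.

def fits_embedded_ml_budget(weights_mb, code_data_kb, avg_power_mw):
    """Return True if the candidate stays inside the slide's envelopes."""
    return (
        weights_mb <= 10.0          # model weights: 0.01 MB to 10 MB
        and code_data_kb <= 500.0   # code + dynamic data: < 500 KB
        and avg_power_mw <= 10.0    # optimized for low power: < 10 mW
    )

# An int8 keyword-spotting model with ~250k parameters (~0.25 MB of weights)
print(fits_embedded_ml_budget(weights_mb=0.25, code_data_kb=180, avg_power_mw=2.5))
```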
Embedded ML solutions require flexible and scalable architectures delivering an optimal
balance of performance, size and power efficiency, together with a complete AI SDK
Embedded ML Implementation Challenges
Key Challenges
• Rapid Technology Evolution: new use cases, networks and data types
• Ultra-Low-Power Requirements: always-on, battery-powered devices
• Low-Cost Expectations: small memory size & die size needed for proliferation
• Complex Software Infrastructure: AI frameworks, proprietary silicon, and varied networks
Existing Solutions
• Fully hardwired NPUs: can't cope well with new networks or data types; made for very specific tasks with no upgrade path
• MCUs or DSPs plus a separate NPU: multi-core solution yields sub-optimal area & cost; MCUs/DSPs are not ML-optimized, resulting in poor power consumption and performance; complex integration, software and memory management
©2025 Ceva Inc. 5
Ceva-NeuPro-Nano Embedded ML NPU: Design Guidelines
• Shaped by deep analysis of user perspectives, recognizing the need for a powerful yet
user-friendly solution
• Design philosophy guided by application-level challenges rather than neural-net layer-level challenges
• The approach ensures three major workloads are handled efficiently and seamlessly:
• Neural network workloads
• DSP workloads
• Control workloads
©2025 Ceva Inc. 6
Ceva-NeuPro-Nano: Software First Approach,
Designed to Address Embedded ML Market Needs
©2025 Ceva Inc. 7
1. Single Core
• End-to-end AI application on a single core
2. Efficient & Flexible Compute
• Efficiently executes various NN architectures, operators, feature extraction, DSP and control code
• DSP built into the NPU architecture: NeuPro-Nano is a single NPU, not a DSP plus hardware-accelerator (DSP+HWA) combination
• Mixed math precision: supports 4/8/16/32-bit integer as well as floating-point math
• Inherent support for Transformers & SLMs
• On-the-fly weight de-compression
• Sparsity compression
3. Programmable & Extensible
• Enables support for new AI frameworks, NN architectures and operators
• Supports future DSP- and NN-enabled applications
• All in software, no hardware changes
4. Fast TTM to Deploy
• Robust AI SDK solution, ready for 3rd-party IDE/SDK integration
• No-friction business model for deployment
• Strong ecosystem of AI software and development companies
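The mixed-precision point above can be illustrated with symmetric per-tensor int8 weight quantization, a generic textbook scheme (this sketch is not Ceva's actual compression format, and all values are illustrative):

```python
# Illustrative symmetric int8 weight quantization: the kind of reduced-precision
# representation an embedded NPU relies on to shrink weight storage 4x vs fp32.
import numpy as np

def quantize_int8(w):
    """Map float weights to int8 plus a single per-tensor scale."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.01, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
err = np.max(np.abs(dequantize(q, s) - w))
print(q.tolist(), err <= s)  # reconstruction error bounded by one quantization step
```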
Complete End-to-End AI Application on a Single Core
• Typical Embedded ML applications are constructed from feature extraction & NN layers
• Each block consumes substantial resources
• A single-core Edge NPU runs complete Embedded ML applications
• Handles control code, NN layers and feature extraction (signal processing, e.g. MFCC) on the same
processor
[Block diagram: a sensor feeds sensor data through a sensor interface into the Ceva-NeuPro-Nano, which sits alongside the host MCU and runs two example applications:
• App A: Feature Extraction → Classifier NN Model → wake-up signal (when an event is detected)
• App B: Feature Extraction → Mask Generation NN Model → Signal Filtering → filtered signal (application dependent)]
©2025 Ceva Inc. 8
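The App A path above (sensor data → feature extraction → NN classifier → wake-up signal) can be sketched end to end in a few lines. Log-spectrum energies stand in for MFCC features and an untrained dense layer stands in for a real model, so every name and number here is illustrative:

```python
# Minimal single-loop sketch of the App A pipeline: one window of sensor
# samples -> feature extraction -> NN classifier -> wake-up decision,
# the way a single-core NPU would run all three stages back to back.
import numpy as np

rng = np.random.default_rng(0)

def extract_features(frame):
    # Log magnitude spectrum: a cheap stand-in for MFCC coefficients.
    return np.log1p(np.abs(np.fft.rfft(frame)))

def classify(feat, weights, bias):
    # One dense layer + sigmoid as a stand-in for a trained classifier.
    return 1.0 / (1.0 + np.exp(-(feat @ weights + bias)))

frame = rng.standard_normal(256)       # one window of sensor data
feat = extract_features(frame)         # 129 feature bins for a 256-sample frame
w = rng.standard_normal(feat.shape[0]) * 0.1
score = classify(feat, w, bias=0.0)
wake_up = score > 0.5                  # wake-up signal when event detected
print(bool(wake_up))
```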
Complete AI Application on a Single Core
• Architecture minimizes die size and memory utilization by efficiently processing all
application workloads on a single core
• NN layers include Fully Connected, RNN, Attention
• Signal Processing: pre & post-processing (e.g., vision networks), feature extraction (STFT, iSTFT and
MFCC)
A single-core, future-compatible NPU ensures high efficiency
on NN-layer and feature-extraction workloads
[Pie charts: cycle partition between Signal Processing and NN Layers,
Ceva-ClearVox Control: 36% / 64%; Ceva-ClearVox ENC: 68% / 32%]
©2025 Ceva Inc. 9
Ceva’s complete AI based applications:
• Ceva-ClearVox™ Control - Wake Word
and Commands (Amazon AVS qualified)
• Ceva-ClearVox™ ENC - Environmental
Noise Cancellation for crisp calls in any
conditions
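Among the signal-processing workloads listed above, STFT/iSTFT is dense FFT math that a programmable core can run in software. A minimal non-overlapping round trip shows the idea (illustrative only, not Ceva's library):

```python
# Tiny non-overlapping STFT/iSTFT pair: chop the signal into frames,
# transform each with an FFT, then invert and reassemble. Real feature
# extractors add windowing and overlap; this is the bare skeleton.
import numpy as np

def stft(x, frame=64):
    frames = x.reshape(-1, frame)          # assumes len(x) % frame == 0
    return np.fft.rfft(frames, axis=1)

def istft(spec, frame=64):
    return np.fft.irfft(spec, n=frame, axis=1).reshape(-1)

x = np.sin(2 * np.pi * 5 * np.arange(512) / 512)
rec = istft(stft(x))
print(np.allclose(rec, x))  # perfect reconstruction for this simple scheme
```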
Software First Approach Principles
Main principles followed:
1. Software requirements drive hardware
architecture
2. Prioritize hardware flexibility and
programmability
3. Design for end-to-end system efficiency
• Prioritize application-level performance
• Consider data transfers and memory hierarchies
©2025 Ceva Inc. 10
Software First Approach Requirements
Driving hardware architecture decisions through software requirements
[Flow diagram, software requirements driving hardware decisions:
• Define Application Domain: Embedded ML (resource constrained, battery powered); ML model workloads: AI, DSP, Control
• Identify Critical Components: compute patterns, memory access, model deployment requirements, app-level performance, optimizer & profiler, extensible layer support
• User Experience Requirements: short TTM, inference framework support, easy debug
• Design Hardware: advanced caches, non-linear activations, fully programmable]
©2025 Ceva Inc. 11
Software First Approach
Prioritize hardware flexibility and programmability over pure performance
Software Capabilities
• Optimized NN layers
• Software-centric coding
• Minimal run-time and application code
• Minimal memory-management code
Hardware Features
• Fully programmable processor
• Non-linear activations
• Advanced cache mechanisms
©2025 Ceva Inc. 12
Application-Level vs Layer-Level Optimization
Application-Level (typical for processors)
• Minimize total application compute
• Control and DSP workloads are major compute consumers, handled within the NPU
• Easy & efficient support for new operators via software
• Unsupported operators are not a bottleneck
Layer-Level (typical for accelerators)
• Minimize layer-level compute
• Control and DSP workloads are major compute consumers, handled outside the NPU (adding latency)
• Supporting new operators requires hardware modification or MCU offload
• Unsupported operators become a bottleneck (the MCU may incur a severe compute penalty and memory-transfer latency)
©2025 Ceva Inc. 13
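The latency argument in the layer-level column can be made concrete with a toy cost model. All cycle counts, the bandwidth and the slowdown factor are made-up illustrative numbers, not measurements:

```python
# Toy latency model: running an unsupported operator in software on the
# NPU vs. offloading it to an MCU, which pays both a slower-execution
# penalty and two memory transfers (activations out, results back).

def npu_software_op(cycles_compute):
    # Runs in place on the NPU: no data movement.
    return cycles_compute

def mcu_offload_op(cycles_compute, xfer_bytes, bytes_per_cycle=4, mcu_slowdown=8):
    # Copy activations out and results back, then compute on the slower MCU.
    transfer = 2 * xfer_bytes / bytes_per_cycle
    return cycles_compute * mcu_slowdown + transfer

on_npu = npu_software_op(10_000)
offloaded = mcu_offload_op(10_000, xfer_bytes=64 * 1024)
print(offloaded / on_npu)  # offload penalty factor under these assumptions
```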
Ceva-NeuPro Studio Overview
Comprehensive AI SDK uniquely accelerating OEM and semiconductor ML product design & deployment
Interfaces Leading Industry AI Tools
• BYOM
• MATLAB connection
• Edge Impulse & NVIDIA TAO
Complete Development Tools
• Eclipse / Visual Studio IDE
• AI Model Viewer - Netron
• C/C++ toolchain
• Extendable tools
Model Optimizations & Deployment
• Graph compiler & runtime
• AI model & app profiling
• Model optimization & services
• Bring your own operator
Pre-Optimized Models
• Model zoo
• Ecosystem optimized models
• NN libraries
• Domain specific libs & algos
©2025 Ceva Inc. 14
Ceva-NeuPro Studio AI Model Deployment Flow
[Flow diagram, NeuPro Studio:
• Inputs: NN model/s (from the Model Zoo or BYOM), C/C++ user code, application runtime libs, and Ceva-optimized libraries (DSP, NN, domain specific)
• Compile: Micro Arch Planner, NN Graph Compiler and C/C++ compiler, driven by the data set and HW config
• Profile: execute, debug and profile
• Infer: the resulting AI application
• Target domains: image recognition, classification, segmentation, object detection, voice analysis, anomaly detection, GenAI, multimodal]
©2025 Ceva Inc. 15
Summary
• Hardware design balancing power, performance, and ease of use achieved through deep
internalization of software requirements:
• Real world applications and use cases
• Emerging technologies and trends
• Programmer pain points
©2025 Ceva Inc. 16
Ceva-NeuPro-Nano: Software First NPU Design
Resources
Ceva-NeuPro-Nano Information - https://www.ceva-ip.com/product/ceva-neupro-nano/
Ceva-NeuPro Studio SDK - https://www.ceva-ip.com/product/ceva-neupro-studio/
Tech Insights, Microprocessor Report by Dylan McGrath - https://www.ceva-ip.com/wp-content/uploads/Ceva-NPU-Core-Targets-TinyML-Workloads.pdf
Google LiteRT for Microcontrollers - https://ai.google.dev/edge/litert/microcontrollers/
Edge AI Foundation - https://www.edgeaifoundation.org/
Alliance Dev Tools - https://www.edge-ai-vision.com/resources/technologies/development-tools/
©2025 Ceva Inc. 17
Ceva-NeuPro-Nano NPU Already Won Industry Awards
• 2024 IoT Edge Computing Excellence Award
• Best IP/Processor of the Year 2024 award at the EE Awards Asia event
©2025 Ceva Inc. 18
Q&A
©2025 Ceva Inc. 19
Thank You!
For more info: yairs@ceva-ip.com