NPU IP Hardware Shaped Through
Software and Use-Case Analysis
Yair Siegel
Senior Director Wireless and Emerging Markets
Ceva
40-50 licensing deals annually
>70 royalty paying customers
>100 active customers
>200 registered patents
NASDAQ:CEVA
>$100m annual revenue
$164m cash, no debt
~450 employees (~75% R&D)
HQ in Maryland, main R&D Centres:
U.S., France, Israel, Greece, Serbia
Trusted partner for over 2 decades
>19bn Ceva-powered devices shipped
>2bn Ceva-powered chips shipped annually
#1 worldwide wireless connectivity IP,
67% market share*
Edge AI NPUs portfolio, scalable from
Embedded ML up to GenAI
Company Overview
www.ceva-ip.com 2
*Source: IPNest’s latest Design IP report – 2023 (published May 2024)
©2025 Ceva Inc.
Embedded ML Applications for Consumer & Industrial IoT
©2025 Ceva Inc. 3
Voice: keyword spotting (KWS), voice biometrics,
sound detection & classification, environmental
noise cancellation (ENC)
Vision: object detection, image classification,
always-on human presence detection or similar
contactless recognition
Predictive Maintenance: vibration, temperature,
humidity and sound sensing
Health & Fitness Sensing: physical activity
tracking, heart rate monitoring, sleep pattern
analysis
End-User Devices: Embedded ML applications
typically consume <1 W and support tens of
GOPS of compute
Typical Technical Requirements for Embedded ML Deployment
Memory Footprint
• < 10 MB Flash/ROM/RAM size
• < 500 KB code + dynamic data memories
Model Size
• 0.01 MB to 10 MB memory required for
the model weights (aka parameters)
Key requirement: easily deployable on battery-powered, resource-limited devices,
to reduce deployment costs and maximize the value of Edge AI
Power Consumption
• Optimized for Low Power < 10 mW
• Enable battery-powered devices
• Minimize device recharges
Computational Requirements
• Typical computation: 10s of GOPS or more
• Deployable on resource constrained
hardware (e.g., MCU)
©2025 Ceva Inc. 4
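The budgets above can be encoded as a small gating function for quick sanity checks of a candidate model. The figures passed in below are illustrative assumptions, not measurements of any real core or model:

```python
# Hypothetical sanity check of a candidate model against the deployment
# budgets listed on this slide. All inputs are illustrative assumptions.

def fits_embedded_ml_budget(weights_mb, code_data_kb, avg_power_mw):
    """Return True if the candidate stays inside the slide's envelopes."""
    return (
        weights_mb <= 10.0          # model weights: 0.01 MB to 10 MB
        and code_data_kb <= 500.0   # code + dynamic data: < 500 KB
        and avg_power_mw <= 10.0    # optimized for low power: < 10 mW
    )

# An int8 keyword-spotting model with ~250k parameters (~0.25 MB of weights)
print(fits_embedded_ml_budget(weights_mb=0.25, code_data_kb=180, avg_power_mw=2.5))
```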
Embedded ML solutions require flexible and scalable architectures delivering an optimal
balance of performance, size and power efficiency, together with a complete AI SDK
Embedded ML Implementation Challenges
Key Challenges
• Rapid Technology Evolution: new use cases, networks and data types
• Ultra-Low-Power Requirements: always-on, battery-powered devices
• Low-Cost Expectations: small memory size & die size needed for proliferation
• Complex Software Infrastructure: AI frameworks, proprietary silicon, and varied networks
Existing Solutions
• Fully hardwired NPUs: can't cope well with new networks or data types; made for very specific tasks with no upgrade path
• MCUs or DSPs plus a separate NPU: multi-core solution yields sub-optimal area & cost; MCUs/DSPs are not ML-optimized, resulting in poor power consumption and performance; complex integration, software and memory management
©2025 Ceva Inc. 5
Ceva-NeuPro-Nano Embedded ML NPU: Design Guidelines
• Shaped by deep analysis of user perspectives, recognizing the need for a powerful yet
user-friendly solution
• Design philosophy guided by application-level challenges rather than neural-net layer-level challenges
• The approach ensures three major workloads are handled efficiently and seamlessly:
• Neural network workloads
• DSP workloads
• Control workloads
©2025 Ceva Inc. 6
Ceva-NeuPro-Nano: Software First Approach,
Designed to Address Embedded ML Market Needs
©2025 Ceva Inc. 7
1. Single Core
• End-to-end AI application on a single core
2. Efficient & Flexible Compute
• Efficiently executes various NN architectures, operators, feature extraction, DSP and control code
• DSP built into the NPU architecture: NeuPro-Nano is a single NPU, not a DSP plus hardware-accelerator (DSP+HWA) combination
• Mixed math precision: supports 4/8/16/32-bit integer as well as floating-point math
• Inherent support for Transformers & SLMs
• On-the-fly weight de-compression
• Sparsity compression
3. Programmable & Extensible
• Enables support for new AI frameworks, NN architectures and operators
• Supports future DSP- and NN-enabled applications
• All in software, no hardware changes
4. Fast TTM to Deploy
• Robust AI SDK solution, ready for 3rd-party IDE/SDK integration
• No-friction business model for deployment
• Strong ecosystem of AI software and development companies
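The mixed-precision point above can be illustrated with symmetric per-tensor int8 weight quantization, a generic textbook scheme (this sketch is not Ceva's actual compression format, and all values are illustrative):

```python
# Illustrative symmetric int8 weight quantization: the kind of reduced-precision
# representation an embedded NPU relies on to shrink weight storage 4x vs fp32.
import numpy as np

def quantize_int8(w):
    """Map float weights to int8 plus a single per-tensor scale."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.01, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
err = np.max(np.abs(dequantize(q, s) - w))
print(q.tolist(), err <= s)  # reconstruction error bounded by one quantization step
```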
Complete End-to-End AI Application on a Single Core
• Typical Embedded ML applications are constructed from feature extraction & NN layers
• Each block consumes substantial resources
• A single-core Edge NPU runs complete Embedded ML applications
• Handles control code, NN layers and feature extraction (signal processing, e.g. MFCC) on the same
processor
[Block diagram: a sensor feeds sensor data through a sensor interface into the Ceva-NeuPro-Nano, which sits alongside the host MCU and runs two example applications:
• App A: Feature Extraction → Classifier NN Model → wake-up signal (when an event is detected)
• App B: Feature Extraction → Mask Generation NN Model → Signal Filtering → filtered signal (application dependent)]
©2025 Ceva Inc. 8
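The App A path above (sensor data → feature extraction → NN classifier → wake-up signal) can be sketched end to end in a few lines. Log-spectrum energies stand in for MFCC features and an untrained dense layer stands in for a real model, so every name and number here is illustrative:

```python
# Minimal single-loop sketch of the App A pipeline: one window of sensor
# samples -> feature extraction -> NN classifier -> wake-up decision,
# the way a single-core NPU would run all three stages back to back.
import numpy as np

rng = np.random.default_rng(0)

def extract_features(frame):
    # Log magnitude spectrum: a cheap stand-in for MFCC coefficients.
    return np.log1p(np.abs(np.fft.rfft(frame)))

def classify(feat, weights, bias):
    # One dense layer + sigmoid as a stand-in for a trained classifier.
    return 1.0 / (1.0 + np.exp(-(feat @ weights + bias)))

frame = rng.standard_normal(256)       # one window of sensor data
feat = extract_features(frame)         # 129 feature bins for a 256-sample frame
w = rng.standard_normal(feat.shape[0]) * 0.1
score = classify(feat, w, bias=0.0)
wake_up = score > 0.5                  # wake-up signal when event detected
print(bool(wake_up))
```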
Complete AI Application on a Single Core
• Architecture minimizes die size and memory utilization by efficiently processing all
application workloads on a single core
• NN layers include Fully Connected, RNN, Attention
• Signal Processing: pre & post-processing (e.g., vision networks), feature extraction (STFT, iSTFT and
MFCC)
A single-core, future-compatible NPU ensures high efficiency
on NN-layer and feature-extraction workloads
[Pie charts: cycle partition between Signal Processing and NN Layers,
Ceva-ClearVox Control: 36% / 64%; Ceva-ClearVox ENC: 68% / 32%]
©2025 Ceva Inc. 9
Ceva’s complete AI based applications:
• Ceva-ClearVox™ Control - Wake Word
and Commands (Amazon AVS qualified)
• Ceva-ClearVox™ ENC - Environmental
Noise Cancellation for crisp calls in any
conditions
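Among the signal-processing workloads listed above, STFT/iSTFT is dense FFT math that a programmable core can run in software. A minimal non-overlapping round trip shows the idea (illustrative only, not Ceva's library):

```python
# Tiny non-overlapping STFT/iSTFT pair: chop the signal into frames,
# transform each with an FFT, then invert and reassemble. Real feature
# extractors add windowing and overlap; this is the bare skeleton.
import numpy as np

def stft(x, frame=64):
    frames = x.reshape(-1, frame)          # assumes len(x) % frame == 0
    return np.fft.rfft(frames, axis=1)

def istft(spec, frame=64):
    return np.fft.irfft(spec, n=frame, axis=1).reshape(-1)

x = np.sin(2 * np.pi * 5 * np.arange(512) / 512)
rec = istft(stft(x))
print(np.allclose(rec, x))  # perfect reconstruction for this simple scheme
```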
Software First Approach Principles
Main principles followed:
1. Software requirements drive hardware
architecture
2. Prioritize hardware flexibility and
programmability
3. Design for end-to-end system efficiency
• Prioritize application-level performance
• Consider data transfers and memory hierarchies
©2025 Ceva Inc. 10
Software First Approach Requirements
Driving hardware architecture decisions through software requirements
[Flow diagram, software requirements driving hardware decisions:
• Define Application Domain: Embedded ML (resource constrained, battery powered); ML model workloads: AI, DSP, Control
• Identify Critical Components: compute patterns, memory access, model deployment requirements, app-level performance, optimizer & profiler, extensible layer support
• User Experience Requirements: short TTM, inference framework support, easy debug
• Design Hardware: advanced caches, non-linear activations, fully programmable]
©2025 Ceva Inc. 11
Software First Approach
Prioritize hardware flexibility and programmability over pure performance
Software Capabilities
• Optimized NN layers
• Software-centric coding
• Minimal run-time and application code
• Minimal memory-management code
Hardware Features
• Fully programmable processor
• Non-linear activations
• Advanced cache mechanisms
©2025 Ceva Inc. 12
Application-Level vs Layer-Level Optimization
Application-Level (typical for processors)
• Minimize total application compute
• Control and DSP workloads are major compute consumers, handled within the NPU
• Easy & efficient support for new operators via software
• Unsupported operators are not a bottleneck
Layer-Level (typical for accelerators)
• Minimize layer-level compute
• Control and DSP workloads are major compute consumers, handled outside the NPU (adding latency)
• Supporting new operators requires hardware modification or MCU offload
• Unsupported operators become a bottleneck (the MCU may incur a severe compute penalty and memory-transfer latency)
©2025 Ceva Inc. 13
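The latency argument in the layer-level column can be made concrete with a toy cost model. All cycle counts, the bandwidth and the slowdown factor are made-up illustrative numbers, not measurements:

```python
# Toy latency model: running an unsupported operator in software on the
# NPU vs. offloading it to an MCU, which pays both a slower-execution
# penalty and two memory transfers (activations out, results back).

def npu_software_op(cycles_compute):
    # Runs in place on the NPU: no data movement.
    return cycles_compute

def mcu_offload_op(cycles_compute, xfer_bytes, bytes_per_cycle=4, mcu_slowdown=8):
    # Copy activations out and results back, then compute on the slower MCU.
    transfer = 2 * xfer_bytes / bytes_per_cycle
    return cycles_compute * mcu_slowdown + transfer

on_npu = npu_software_op(10_000)
offloaded = mcu_offload_op(10_000, xfer_bytes=64 * 1024)
print(offloaded / on_npu)  # offload penalty factor under these assumptions
```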
Ceva-NeuPro Studio Overview
Comprehensive AI SDK uniquely accelerating OEM and semiconductor ML product design & deployment
Interfaces Leading Industry AI Tools
• BYOM
• MATLAB connection
• Edge Impulse & NVIDIA TAO
Complete Development Tools
• Eclipse / Visual Studio IDE
• AI Model Viewer - Netron
• C/C++ toolchain
• Extendable tools
Model Optimizations & Deployment
• Graph compiler & runtime
• AI model & app profiling
• Model optimization & services
• Bring your own operator
Pre-Optimized Models
• Model zoo
• Ecosystem optimized models
• NN libraries
• Domain specific libs & algos
©2025 Ceva Inc. 14
Ceva-NeuPro Studio AI Model Deployment Flow
[Flow diagram, NeuPro Studio:
• Inputs: NN model/s (from the Model Zoo or BYOM), C/C++ user code, application runtime libs, and Ceva-optimized libraries (DSP, NN, domain specific)
• Compile: Micro Arch Planner, NN Graph Compiler and C/C++ compiler, driven by the data set and HW config
• Profile: execute, debug and profile
• Infer: the resulting AI application
• Target domains: image recognition, classification, segmentation, object detection, voice analysis, anomaly detection, GenAI, multimodal]
©2025 Ceva Inc. 15
Summary
• Hardware design balancing power, performance, and ease of use achieved through deep
internalization of software requirements:
• Real world applications and use cases
• Emerging technologies and trends
• Programmer pain points
©2025 Ceva Inc. 16
Ceva-NeuPro-Nano: Software First NPU Design
Resources
Ceva-NeuPro-Nano Information - https://www.ceva-ip.com/product/ceva-neupro-nano/
Ceva-NeuPro Studio SDK - https://www.ceva-ip.com/product/ceva-neupro-studio/
Tech Insights, Microprocessor Report by Dylan McGrath - https://www.ceva-ip.com/wp-content/uploads/Ceva-NPU-Core-Targets-TinyML-Workloads.pdf
Google LiteRT for Microcontrollers - https://ai.google.dev/edge/litert/microcontrollers/
Edge AI Foundation - https://www.edgeaifoundation.org/
Alliance Dev Tools - https://www.edge-ai-vision.com/resources/technologies/development-tools/
©2025 Ceva Inc. 17
Ceva-NeuPro-Nano NPU Already Won Industry Awards
• 2024 IoT Edge Computing Excellence Award
• Best IP/Processor of the Year 2024 award at the EE Awards Asia event
©2025 Ceva Inc. 18
Q&A
©2025 Ceva Inc. 19
Thank You!
For more info: yairs@ceva-ip.com