Accelerating and Enhancing SPICE Simulations with Neural Network-Based Models: Part 1.
Introduction
Circuit designers often face discrepancies between simulated and measured results. The problem is especially acute for engineers working on high-frequency circuits, space applications, photonics, and cryogenic systems operating at very low temperatures (such as those used in quantum computing). In these demanding conditions, conventional SPICE (Simulation Program with Integrated Circuit Emphasis) models frequently struggle to capture the full complexity of device behavior.
While SPICE models aim to approximate device behavior, they face a fundamental trade-off: higher accuracy requires complex, physics-based models with higher-order polynomials, which in turn significantly slow down simulations due to numerical convergence challenges—particularly with iterative solvers such as Newton-Raphson and differential-equation integration methods such as trapezoidal, backward Euler, and Gear.
With recent advancements in computing power and accessibility, deep neural networks (DNNs) have emerged as powerful universal function approximators, offering a paradigm shift in device modeling. By training on empirical data, DNNs capture the full complexity of device behavior, enabling high-fidelity modeling without restrictive assumptions. Once trained, these models operate in inference mode: backpropagation is disabled and only the forward pass is used, eliminating iterative equation solving and convergence checks and yielding significantly faster, more efficient simulations.
Using a common 2N2222 transistor as our example, we demonstrate a complete workflow that includes:
Training a neural network model using empirical data from a transistor curve tracer
Integrating the AI model into SPICE simulations via PySpice
Validating its performance against traditional simulation approaches
This approach bridges the gap between simulation and reality through data-driven modeling, allowing engineers to base their simulations directly on measured performance data rather than idealized approximations.
The Limitations of Traditional SPICE Models
SPICE has been the industry standard for circuit simulation since its development in the early 1970s. While tremendously useful, SPICE models rely on mathematical approximations of device physics that become increasingly inaccurate under extreme or non-standard operating conditions. The conventional approach to improving accuracy has been to add more physics-based equations and higher-order terms, but this creates two significant problems:
Convergence issues: More complex equations make numerical solvers less stable and more prone to convergence failures
Simulation slowdown: Higher-order models dramatically increase computation time, especially for large circuits
Neural Networks as Device Models
Neural networks offer a fundamentally different approach to device modeling. Rather than relying on physics-based equations, they learn directly from measured data, capturing the actual behavior of devices including real-world non-idealities. Key advantages include:
Universal Approximation: Deep neural networks can approximate any continuous function to arbitrary accuracy
Inference Speed: Once trained, they perform simple matrix multiplications without iterative solving
Adaptability: Models can be trained on specific device batches to capture manufacturing variations
Improved accuracy: By learning from actual measurements, neural networks can represent behaviors that physics-based models might miss
By leveraging neural networks to represent device transfer characteristics, we can capture subtle nonlinearities more precisely while simultaneously reducing simulation time [2-5]. In this article, we provide a simple example of using a neural network to model a transistor.
Methodology Overview
We present a step-by-step approach to creating neural network-based device models:
Data Collection and Preprocessing: Obtain comprehensive measurement data from real devices across their operating range. Clean, normalize, and prepare the data for neural network training
Building a Neural Network Model: Design and train a neural network to accurately predict device behavior
Model Inference, Validation, and Visualization: Verify model accuracy against measured data
SPICE Integration: Incorporate the neural model into circuit simulations
Our example uses the widely available 2N2222 BJT transistor, demonstrating complete Python/Jupyter notebook workflows and comparing conventional and neural model results.
Step 1: Data Collection and Preprocessing
Data Collection
A comprehensive dataset was acquired for a 2N2222 transistor using an HP 4145A semiconductor parameter analyzer operated as a curve tracer, recording collector current (IC) and base voltage (VB) across a sweep of collector voltages for a range of base currents (IB):
High-resolution data captured the transistor's I-V characteristics across the operating range
Multiple operating points were recorded to capture temperature variation effects
Data collection focused on regions where traditional models typically show inaccuracies
Data Preprocessing
Once collected, the measurement data must be prepared for neural network training. This preprocessing ensures our neural network will learn the underlying patterns in transistor behavior rather than being influenced by the different scales of the input variables.
This Python code prepares our transistor measurement data for neural network training. It performs several critical functions:
Data Splitting: We divide our dataset into training (80%), validation (10%), and test (10%) sets. This separation ensures our model generalizes well to unseen data.
Standardization: A critical step is standardizing the data using StandardScaler, which transforms each measurement to have zero mean and unit variance according to the formula z = (x − μ) / σ, where μ and σ are the mean and standard deviation of the training data. This helps the neural network train more efficiently. We fit the scalers on the training data only and apply the same transformation to the validation and test sets to prevent data leakage.
Tensor Conversion: We convert NumPy arrays to PyTorch tensors, preparing them for use with PyTorch's neural network framework.
Scaler Preservation: We save our fitted scalers to a file, which will be essential later when we want to transform new input data or convert the model's output back to real-world values.
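Below is a minimal sketch of this preprocessing, assuming the curve-tracer measurements have been exported to a CSV file with columns VC, VB, IB, and IC (the filename and column names are illustrative, not the actual dataset):

```python
# Sketch of the preprocessing step. The CSV filename and column names
# (VC, VB, IB, IC) are illustrative assumptions.
import joblib
import pandas as pd
import torch
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("bjt_2n2222_measurements.csv")   # hypothetical filename
X = df[["VC", "VB", "IB"]].values                 # inputs
y = df[["IC"]].values                             # target: collector current

# 80/10/10 split: carve off 20%, then halve it into validation and test.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.2, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)

# Fit scalers on the training data only, then reuse them (no data leakage).
x_scaler = StandardScaler().fit(X_train)
y_scaler = StandardScaler().fit(y_train)

def to_tensor(a):
    return torch.tensor(a, dtype=torch.float32)

X_train_t, y_train_t = to_tensor(x_scaler.transform(X_train)), to_tensor(y_scaler.transform(y_train))
X_val_t, y_val_t = to_tensor(x_scaler.transform(X_val)), to_tensor(y_scaler.transform(y_val))
X_test_t, y_test_t = to_tensor(x_scaler.transform(X_test)), to_tensor(y_scaler.transform(y_test))

# Persist the fitted scalers for inference and SPICE integration later.
joblib.dump({"x": x_scaler, "y": y_scaler}, "scalers.pkl")
```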
Step 2: Building a Neural Network Model
Network Architecture Design
The code below implements a feed-forward neural network with multiple hidden layers to learn the relationship between transistor inputs and outputs. Let's break down how this works:
Our architecture consists of:
Input layer: 3 neurons for our three input parameters (VC, VB, IB)
Hidden layers: Three fully-connected layers with 256, 128, and 64 neurons respectively
Activation functions: ReLU (Rectified Linear Unit) to introduce non-linearity
Output layer: A single neuron that predicts the collector current (IC)
This architecture balances complexity with efficiency, providing enough parameters to capture the non-linear relationships in transistor behavior while remaining computationally efficient.
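A minimal PyTorch sketch of this architecture (the class name BJTNet is our own label):

```python
import torch.nn as nn

class BJTNet(nn.Module):
    """Feed-forward model: 3 inputs (VC, VB, IB) -> 256 -> 128 -> 64 -> 1 output (IC)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 1),   # linear output for a regression target
        )

    def forward(self, x):
        return self.net(x)

model = BJTNet()
```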
Training Process
The training process uses mini-batch gradient descent, which is more efficient than processing the entire dataset at once. We've set a batch size of 64 measurements, which offers a good balance between training speed and gradient accuracy.
For optimization, we use the Adam optimizer with a learning rate of 0.001. Adam adapts the learning rate during training, which generally leads to faster convergence compared to standard gradient descent.
We train for 100 epochs*, which means the model will see the entire training dataset 100 times. During each epoch, we:
Train the model on batches of training data
Evaluate the model on the validation set
Print the validation loss to monitor progress
The Mean Squared Error (MSE) loss function measures how close our predictions are to the actual measured values. Lower MSE values indicate better model performance.
*Please note: to improve readability, the early-stopping code has been omitted here; it is included in the final codebase.
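Below is a condensed sketch of such a training loop, reusing the model and tensors defined in the earlier steps (early stopping omitted as noted):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

train_loader = DataLoader(TensorDataset(X_train_t, y_train_t), batch_size=64, shuffle=True)
val_loader = DataLoader(TensorDataset(X_val_t, y_val_t), batch_size=64)

criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for epoch in range(100):
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()

    # Validation pass: forward only, no gradients.
    model.eval()
    with torch.no_grad():
        val_loss = sum(criterion(model(xb), yb).item() for xb, yb in val_loader) / len(val_loader)
    print(f"epoch {epoch + 1:3d}  validation MSE: {val_loss:.6f}")

# Persist the learned parameters for later inference.
torch.save(model.state_dict(), "bjt_model_.pth")
```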
In summary, key aspects of the training process include:
Mini-batch Processing: We use batches of 64 samples, which offers a good balance between training speed and gradient accuracy.
Adam Optimizer: This adaptive optimization algorithm adjusts learning rates dynamically, typically leading to faster convergence than standard gradient descent.
Loss Function: Mean Squared Error (MSE) measures the average squared difference between predicted and actual collector currents.
Validation Monitoring: We evaluate the model on a separate validation set after each epoch to monitor its generalization performance.
Model Persistence: After training, we save the model's learned parameters to a file (bjt_model_.pth). This allows us to later load the model for inference without having to retrain it.
The training process shows a steadily decreasing validation loss, indicating that our model is successfully learning to predict transistor behavior without overfitting.
Step 3: Model Inference, Validation, and Visualization
1. Model Inference and Validation
After training our neural network model, the next critical step is to validate its performance against real-world test data. The following sections demonstrate how to implement and validate the model:
Loading the Trained Model
This code restores our previously trained model and prepares the test dataset that the model has never seen before.
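A minimal sketch of this step, assuming the BJTNet class, test tensors, and scaler file from the earlier steps are available:

```python
import joblib
import torch
from torch.utils.data import DataLoader, TensorDataset

model = BJTNet()
model.load_state_dict(torch.load("bjt_model_.pth"))
model.eval()  # inference mode: forward pass only

scalers = joblib.load("scalers.pkl")
x_scaler, y_scaler = scalers["x"], scalers["y"]

# Held-out test tensors from the preprocessing step.
test_loader = DataLoader(TensorDataset(X_test_t, y_test_t), batch_size=64)
```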
Creating an Inference Function
The predict_ic() function takes the three transistor inputs (VC, VB, IB) and returns the predicted collector current. This function handles all the necessary scaling transformations behind the scenes, making it easy to use in a SPICE simulation environment.
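A sketch of what such a function can look like, given the model and scalers loaded above (the notebook's exact implementation may differ):

```python
import numpy as np
import torch

def predict_ic(vc, vb, ib):
    """Predict collector current (A) from raw, unscaled VC, VB, IB values."""
    x = x_scaler.transform(np.array([[vc, vb, ib]]))      # scale inputs
    with torch.no_grad():                                 # forward pass only
        y = model(torch.tensor(x, dtype=torch.float32))
    return y_scaler.inverse_transform(y.numpy())[0, 0]    # back to real units

# Example: VC = 5 V, VB = 0.65 V, IB = 50 µA
ic = predict_ic(5.0, 0.65, 50e-6)
```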
Evaluating on Test Data
The code below evaluates the model's performance on the entire test dataset, collecting both actual and predicted values for comprehensive analysis. For each batch in the test loader, it extracts the input parameters (VC, VB, IB), makes predictions using the predict_ic() function, and stores both the measured and predicted collector currents. Finally, it organizes all results into a pandas DataFrame for easy analysis and comparison.
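A sketch of this evaluation loop, with summary accuracy metrics added for illustration (assuming IC is recorded in amperes):

```python
import pandas as pd
from sklearn.metrics import mean_squared_error, r2_score

rows = []
for xb, yb in test_loader:
    # Undo the input/output scaling so predict_ic() sees raw values.
    x_raw = x_scaler.inverse_transform(xb.numpy())
    y_raw = y_scaler.inverse_transform(yb.numpy())
    for (vc, vb, ib), ic_meas in zip(x_raw, y_raw[:, 0]):
        rows.append({"VC": vc, "VB": vb, "IB": ib,
                     "IC_measured": ic_meas,
                     "IC_predicted": predict_ic(vc, vb, ib)})

results = pd.DataFrame(rows)

# Summary metrics on the test set.
r2 = r2_score(results["IC_measured"], results["IC_predicted"])
rmse = mean_squared_error(results["IC_measured"], results["IC_predicted"]) ** 0.5
print(f"R^2 = {r2:.3f}, RMSE = {rmse * 1e3:.2f} mA")
```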
2. Results Visualization
To understand how well our model performs, we'll create two key visualizations:
2.1 Scatter Plot Comparing Predicted vs. Actual Values
This scatter plot shows how closely our model's predictions match the actual measured values for a subset of test points. The close overlap between red points (measured data) and green squares (model predictions) demonstrates the model's accuracy.
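A minimal matplotlib sketch of this comparison, assuming the results DataFrame built above:

```python
import matplotlib.pyplot as plt

# Plot a readable subset of test points.
subset = results.sample(min(200, len(results)), random_state=0).reset_index(drop=True)
plt.scatter(subset.index, subset["IC_measured"], c="red", s=14, label="Measured")
plt.scatter(subset.index, subset["IC_predicted"], c="green", marker="s", s=14,
            label="NN prediction")
plt.xlabel("Test sample")
plt.ylabel("Collector current IC (A)")
plt.title("Predicted vs. measured IC on test samples")
plt.legend()
plt.show()
```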
2.2 Visualizing BJT IV Characteristic Curves - Measured vs NN Model
To thoroughly validate our neural network model, we need to examine how well it captures the entire family of characteristic curves for our BJT transistor:
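A sketch of such a plot; the VCE sweep range and the nominal VB of 0.65 V used for the predictions are illustrative assumptions:

```python
import matplotlib.pyplot as plt
import numpy as np

vce = np.linspace(0, 10, 100)              # illustrative sweep range
for ib in np.arange(0, 101e-6, 20e-6):     # IB = 0, 20, ..., 100 µA
    # Measured curve for this base current (from the results DataFrame).
    meas = results[np.isclose(results["IB"], ib, atol=1e-6)].sort_values("VC")
    if not meas.empty:
        plt.plot(meas["VC"], meas["IC_measured"], "-",
                 label=f"measured, IB = {ib * 1e6:.0f} µA")
    # NN prediction over the sweep; VB held at an assumed nominal 0.65 V.
    plt.plot(vce, [predict_ic(v, 0.65, ib) for v in vce], "--")
plt.xlabel("VCE (V)")
plt.ylabel("IC (A)")
plt.title("2N2222 output characteristics: measured (solid) vs. NN (dashed)")
plt.legend(fontsize=8)
plt.show()
```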
This comprehensive visualization plots both the measured data (solid lines) and our neural network predictions (dashed lines) for base currents ranging from 0 to 100 µA. Notice how the dashed curves closely follow the solid lines, indicating minimal approximation error. This visualization is particularly valuable because:
It shows how the model performs across the transistor's entire operating range
It demonstrates the model's ability to capture the non-linear relationship between voltage and current
It provides visual confirmation that our neural approach correctly models the Early effect (the slight upward slope in the active region)
Key Model Characteristics
Looking at these curves, we can observe how closely the neural network predictions match the actual measured data. The slight differences between the solid and dashed lines represent the model's approximation error, which is minimal across most operating regions.
The family of curves also reveals important transistor characteristics that our model has successfully learned:
Threshold behavior: The point at which collector current begins to flow
Active region: The relatively flat portion of each curve, where the transistor operates as an amplifier
Saturation region: The steeply rising portion at low collector-emitter voltage, before each curve flattens into the active region
Early effect: The finite output resistance in the active region, visible as a slight upward slope
Advantages of the Neural Network Approach
This inference process demonstrates how our neural network serves as a drop-in replacement for traditional SPICE models. Once trained, it can predict transistor behavior with high accuracy while avoiding the convergence issues that plague physics-based models.
Our approach offers several significant advantages for circuit simulation:
Improved accuracy: More accurate representation of device behavior across all operating regions
Enhanced performance: Faster simulation times since neural network inference doesn't require iterative convergence
Data-driven modeling: Direct incorporation of real-world measurement data into the simulation
Adaptability: Ability to model new devices by simply retraining on different datasets
Conclusion
In this part, we explored the limitations of traditional SPICE models and introduced a data-driven approach using neural networks. We collected IV characteristic curves for a 2N2222 bipolar transistor using precision measurement equipment, processed this data, and developed a neural network architecture capable of accurately modeling the transistor's behavior. Our network was trained to predict collector current (IC) based on collector-emitter voltage (VCE), base-emitter voltage (VBE), and base current (IB). After training with careful hyperparameter tuning, our model achieved excellent accuracy with an R² value of 0.998 and RMSE of 1.34 mA on the test dataset.
Having validated our neural network model against real-world measurements, we now turn our attention to the practical implementation and comparative evaluation of this approach. Part 2 will demonstrate how to integrate our neural network model into a circuit simulation workflow, compare its predictions against traditional SPICE simulations, and analyze the benefits and challenges of this novel approach.
We will begin by examining the computational mechanics of traditional SPICE simulations to understand where and how neural network models can provide advantages. Then, we'll walk through a step-by-step integration of our trained neural network into a simulation loop, followed by a detailed comparison of results and performance metrics. Finally, we'll discuss the implications of these findings for electronic design automation and outline promising directions for future research.
Special thanks to Dr. Daniel Dobkin (https://www.linkedin.com/in/daniel-dobkin-25aa432/) for reviewing our post and for providing valuable references.
References
[1] L. W. Nagel, SPICE2: A Computer Program to Simulate Semiconductor Circuits, EECS Department, University of California, Berkeley.
[2] A. Zaabab et al., Device and Circuit-Level Modeling Using Neural Networks with Faster Training Based on Network Sparsity, IEEE Trans. Microwave Theory Tech. 45, p. 1696 (1997)
[3] H. Jeong et al., Fast and Expandable ANN-Based Compact Model and Parameter Extraction for Emerging Transistors, IEEE J. Electron Devices Soc. 11, p. 153 (2023)
[4] H. Kang et al., Research on Device Modeling Technique Based on MLP Neural Network for Model Parameter Extraction, Applied Sciences 12, p. 1357 (2022)
[5] Y. Lee et al., Simplified Silicon Carbide MOSFET Model Based on Neural Network, Materials Science Forum 954, p. 163 (2019)
[6] M. Sullivan, Statistics: Informed Decisions Using Data, 5th ed.
[7] R. Nag, Stanford SCI 52: Introduction to AI and Deep Learning.
[8] L. T. Pillage, et al., Electronic Circuit and System Simulation Methods.
[9] A. Karpathy, The Spelled-Out Intro to Neural Networks.
[10] X. Cao, et al., Comparison of VBIC and Gummel-Poon Bipolar Models.
[11] PyTorch Documentation, Neural Network Training Best Practices.
[12] D. Bourke, PyTorch for Deep Learning & Machine Learning, FreeCodeCamp.org, https://youtu.be/V_xro1bcAuA, https://www.learnpytorch.io/
[13] GitHub Code, https://github.com/maheshbrahmi/NeuralSPICE.git.