Today we’re excited to announce the launch of Soda Metrics Observability, the first Soda feature powered by NannyML. Built for data teams who move fast and don’t have time for mystery failures, it brings real-time anomaly detection to the core of data workflows, with no manual rules or long training times. It’s designed to help engineers catch what they didn’t know to look for, before it hits production. Learn more at: https://launch.soda.io/ — One Last Thing 🔥 We’re giving away a $1,000+ mechanical keyboard! Want a chance to win it? Check out the links in the comments and just request access to the new Soda Cloud.
NannyML (acq. by Soda Data Quality)
Software Development
Estimate your model performance in production without Ground Truth. Open Source. For Data Scientists.
About us
NannyML is an open-source Python library for estimating post-deployment model performance (without access to targets), detecting data drift, and intelligently linking data drift alerts back to changes in model performance. Built for data scientists, NannyML has an easy-to-use interface and interactive visualizations, is completely model-agnostic, and currently supports all tabular use cases: classification and regression. Key features: - Performance Estimation (without access to targets) and Calculation - Business Value Estimation and Calculation - Data Quality - Multivariate Drift Detection. To get started, check out our GitHub ⚡️
- Website: https://www.nannyml.com/
- Industry
- Software Development
- Company size
- 2-10 employees
- Headquarters
- Leuven
- Type
- Privately Held
- Specialties
- Artificial Intelligence, Monitoring, Machine Learning, and Data Science
Locations
- Primary: Leuven, BE
Updates
NannyML is joining Soda. We are merging two incredible teams to work on a shared mission: building the most intelligent, context-aware data quality platform on the market. Over the past 4.5 years, the NannyML team tackled one of the hardest problems in modern AI systems: how do you monitor model performance in production when there’s no ground truth yet? Our software became the go-to toolkit for teams running models where labels are delayed, sparse, or unavailable. But we saw what was coming. Models don’t fail in isolation. They fail when data pipelines degrade, when user behavior shifts, when upstream assumptions break. We realized that solving this meant closing the loop between data quality and AI behavior. By bringing our teams and platforms together, we’re unifying those layers and delivering a product that can monitor your entire system, not just pieces of it. We are very excited about what is coming. This is the first drop of Soda’s Launch Week. Over the next five days, you’ll see the first results of our unification. Learn more about the acquisition at: https://lnkd.in/duzs3nmD ------------------------------ One Last Thing 🔥 We’re giving away a $1,000+ mechanical keyboard! Want a chance to win it? Check out the links in the comments and just request access to the new Soda Cloud.
---
Retraining is not all you need. Even if you regularly retrain your models in production, ML monitoring is still necessary. Here’s why: 1. If your model performance decreased due to covariate shift, retraining won't fix it. This is especially true if your model input drifted into regions where there's less information about the target, for example near the class boundary. 2. If the root cause of the performance drop is data quality, retraining is likely to cause a further drop in performance. We've seen it over and over again with our users. 3. Monitoring is cheaper than retraining.
---
All the effort in training an ML model only counts if it delivers good performance in production. Deploying an AI model is easier than ever; what’s hard is tracking, monitoring, and maintaining performance in production. Why is that? Because AI models age. Studies show that 91% of models degrade over time due to factors like covariate shift and concept drift, meaning prediction errors grow as time passes. Waiting until performance declines is not an option; businesses can't afford that risk. That’s why proactive monitoring matters. Algorithms like CBPE let you estimate performance even without ground truth, so you can act before it’s too late. Monitoring and maintenance are not things to think about only after your model is deployed. You need them from day one.
---
How to quantify the impact of data drift on model performance? 📏 If you are looking for ways to check whether a deployed model is making the right predictions on new/unseen data, you will likely come across data drift monitoring. Data drift monitoring tells us how the model's feature distributions have changed over time. 📊⏳ 🚨 But one thing it won't tell us is how those changes are affecting the model's performance! There are three possible scenarios when feature distributions change: 1️⃣ Model performance stays the same. 2️⃣ Model performance improves. 📈 3️⃣ Model performance degrades. 📉 Most people and tools assume that data drift always correlates with performance degradation, but just by looking at drift results, it is impossible to tell in which direction those changes will affect the model. ✍️ There is a quote from Shreya Shankar that I always like to share: "(...) many organizations fall into this trap of monitoring fine-grained metrics first, as if some change on the median value of an obscure feature will give any actionable insight." 🔙 So, returning to the topic of the post: what is the best way to quantify the impact of data drift on model performance? 💡 Our best answer so far is estimating the model's performance given the previous and current feature distributions. How do we do it? One way is with an algorithm called Multicalibrated Confidence-based Performance Estimation (M-CBPE). M-CBPE takes the model’s historical inputs and outputs (predictions and probability scores) and learns to estimate the elements of a confusion matrix on new, unseen data. Since M-CBPE calibrates predicted probabilities according to the latest data distribution, it can return a value for the expected model performance under the current data drift. We have found this to be the best way to determine how data drift is impacting the model's performance. And funnily enough, it is not always in the decreasing direction :) --------- this post was 100% stolen from Santiago Viquez
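To make the idea concrete, here is a toy sketch of confidence-based estimation with a calibration step, in the spirit of (M-)CBPE. This is not NannyML's actual implementation: the histogram-binning calibrator and the synthetic reference/analysis data are invented purely for illustration.

```python
import numpy as np

def fit_bin_calibrator(ref_probs, ref_labels, n_bins=20):
    """Learn the empirical positive rate per probability bin on reference data."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bins = np.clip(np.digitize(ref_probs, edges) - 1, 0, n_bins - 1)
    rates = np.full(n_bins, 0.5)
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            rates[b] = ref_labels[mask].mean()
    return edges, rates

def estimate_accuracy(probs, edges, rates, threshold=0.5):
    """Expected accuracy = mean probability that each prediction is correct."""
    bins = np.clip(np.digitize(probs, edges) - 1, 0, len(rates) - 1)
    p_pos = rates[bins]                                   # calibrated P(y = 1)
    p_correct = np.where(probs >= threshold, p_pos, 1.0 - p_pos)
    return float(p_correct.mean())

rng = np.random.default_rng(7)

# Reference period: scores are well calibrated and labels are available.
ref_probs = rng.uniform(size=20_000)
ref_labels = (rng.uniform(size=20_000) < ref_probs).astype(int)
edges, rates = fit_bin_calibrator(ref_probs, ref_labels)

# "Drifted" analysis period: inputs now cluster near the decision boundary.
ana_probs = rng.beta(4, 4, size=20_000)
ana_labels = (rng.uniform(size=20_000) < ana_probs).astype(int)  # unseen in practice

estimated = estimate_accuracy(ana_probs, edges, rates)
actual = float(((ana_probs >= 0.5).astype(int) == ana_labels).mean())
print(f"estimated accuracy: {estimated:.3f}, actual: {actual:.3f}")
```

The estimate tracks the realized accuracy on the drifted set even though the analysis labels were never used, which is the whole point of the approach.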
---
Drift detection is only half the story—measuring its impact is what truly matters. In production, labels aren’t available, so we often focus on detecting data drift. But have you thought about whether that drift actually affects your model’s performance? Here’s why this distinction is important: Not all drift is harmful—some has no impact on performance. No impact = no need for alerts, saving your team from unnecessary distractions. Drift without measurable consequences is just noise in your monitoring pipeline. So, what can you do? Track expected performance, even without labels. By measuring impact rather than just detecting drift, you shift monitoring from reactive to proactive—streamlining your workflow and focusing on what really matters. What’s your take on measuring drift impact? Let’s discuss!
---
Why Data Scientists (not ML Engineers) should lead ML monitoring 🔥 Here are three reasons why Data Scientists are best suited to oversee ML models in production: 1️⃣ Post-deployment data science work requires context: Monitoring a model isn’t just about keeping it running—it’s about: Assessing the business impact of predictions. Detecting concept drift or covariate shift. Diagnosing and resolving performance issues. These tasks demand a deep understanding of the model’s data, metrics, and purpose—something only a Data Scientist can bring. 📊🔍 2️⃣ Data Scientists know the model inside out: Who better to monitor the model than the person who built it? They understand the problem context and the model’s inner workings. They know the business case and the nuances of the dataset. This knowledge is invaluable for diagnosing issues and implementing solutions effectively. 🧠💡 3️⃣ Ownership and responsibility lie with Data Scientists: Models will inevitably encounter real-world challenges like: Covariate shifts and concept drifts. Data quality issues. It’s the Data Scientist’s responsibility to address these problems, ensuring the model continues to deliver value over time. 🤝💼 While ML Engineers play a crucial role in deploying and maintaining system health (e.g., uptime, latency, requests), their focus is on operational metrics rather than the nuanced data science aspects of model performance. ⚙️💻 Let’s not start a DS vs. MLE debate here! But what’s your take? Should Data Scientists own ML monitoring? Let’s discuss below—civilly, of course. 😂
---
Did you know your data can drift even when individual feature distributions look unchanged? 👀 This happens when the correlation between variables shifts, a phenomenon known as multivariate data drift. Unlike univariate drift, detecting multivariate drift is more complex because it involves relationships between features rather than changes in individual distributions. One powerful method to detect multivariate drift is PCA reconstruction error: 🔹 PCA (Principal Component Analysis) reduces data to a smaller set of uncorrelated variables, called principal components. 🔹 By reversing the transformation, you compare the original dataset to its reconstruction and measure how much information has been lost. 🔹 This reconstruction error reflects changes in variable correlations—if it exceeds a threshold, multivariate drift is likely. For example: A scatter plot reveals changes in feature correlations over time, even when single-variable distributions remain stable. The PCA reconstruction error line plot shows these correlation shifts as spikes, making it easier to identify hidden drift. 📖 Want to learn more about spotting univariate and multivariate data drift? Check out the blog by Magdalena Kowalczuk—https://lnkd.in/gun8tEUj 💡 How does your team detect multivariate data drift? Have you explored PCA reconstruction error? Let’s discuss below!
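Here is a minimal numpy sketch of the PCA reconstruction error idea (not NannyML's implementation; the two-feature toy dataset is invented for illustration):

```python
import numpy as np

def fit_pca(X, n_components):
    """Standardize reference data and extract its principal directions via SVD."""
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    _, _, Vt = np.linalg.svd((X - mu) / sigma, full_matrices=False)
    return mu, sigma, Vt[:n_components]

def reconstruction_error(X, mu, sigma, components):
    """Project onto the reference components, reconstruct, and measure the loss."""
    Z = (X - mu) / sigma
    Z_hat = Z @ components.T @ components
    return float(np.mean(np.linalg.norm(Z - Z_hat, axis=1)))

rng = np.random.default_rng(0)
n = 5_000
x = rng.normal(size=n)

# Reference: two strongly correlated features.
reference = np.column_stack([x, x + 0.1 * rng.normal(size=n)])
# Analysis: (roughly) the same marginals, but the correlation is gone.
analysis = np.column_stack([rng.normal(size=n), rng.normal(size=n)])

mu, sigma, comps = fit_pca(reference, n_components=1)
err_ref = reconstruction_error(reference, mu, sigma, comps)
err_ana = reconstruction_error(analysis, mu, sigma, comps)
print(f"reference error: {err_ref:.3f}, analysis error: {err_ana:.3f}")
```

The reconstruction error on the analysis window jumps well above the reference baseline even though each feature, viewed alone, still looks standard normal; in practice you would alert when the error exceeds a threshold derived from the reference period.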
---
Univariate drift detection misses the bigger picture when relationships between features change. Imagine this scenario: You’re working with a dataset that tracks age and income weekly. 🔹 In Week 10, there’s a positive correlation: as age increases, income rises—reflecting career growth and promotions. 🔹 By Week 16, the marginal distributions of age and income remain unchanged (standard normal), but the relationship flips. Now, age and income show a negative correlation, likely due to a focus on retirees with lower incomes. What’s the problem? Univariate drift detection won’t catch this shift because it only looks at individual features. But a model trained on Week 10 data would struggle on Week 16, where the joint distribution has shifted dramatically. To detect changes like these, you need multivariate drift detection methods that monitor feature relationships, not just individual distributions. 💡 How does your team handle joint distribution shifts? Have you explored multivariate drift detection? Let’s discuss below!
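The age/income story above can be reproduced with a few lines of numpy: identical standard-normal marginals in both weeks, but the sign of the correlation flips. The week numbers and the 0.8 correlation value are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

def sample_week(rho):
    """Draw (age, income) pairs with standard-normal marginals and correlation rho."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    return rng.multivariate_normal([0.0, 0.0], cov, size=n)

week10 = sample_week(0.8)    # age and income rise together
week16 = sample_week(-0.8)   # the relationship flips

corr10 = float(np.corrcoef(week10[:, 0], week10[:, 1])[0, 1])
corr16 = float(np.corrcoef(week16[:, 0], week16[:, 1])[0, 1])

# Univariate checks see nothing: means and stds match across weeks.
print("marginal stds:", week10.std(axis=0).round(2), week16.std(axis=0).round(2))
print(f"correlation week 10: {corr10:.2f}, week 16: {corr16:.2f}")
```

Any univariate drift test comparing the two weeks column by column would pass, while the joint distribution the model was trained on has inverted.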
---
🎯 Did you know your model’s probability scores can help you estimate performance—even without true labels? In classification, models don’t just predict classes—they also provide probability scores that reflect their confidence in those predictions. Our Confidence-Based Performance Estimation (CBPE) algorithm leverages these probability scores to estimate performance metrics without needing the true target classes. Here’s how it works: Let’s say your model outputs probabilities for three predictions: [0.90, 0.63, 0.21]. 1️⃣ For 0.90, it’s classified as positive. There’s a 90% chance it’s correct (true positive) and a 10% chance it’s wrong (false positive). 2️⃣ These fractions—0.9 and 0.1—are added to the corresponding cells in the confusion matrix. 3️⃣ Repeat this for all predictions (e.g., 0.63 for positive, 0.21 for negative), and update the matrix. At the end, you calculate performance metrics: ✅ Accuracy: (sum of estimated true positives and true negatives) ÷ total predictions. In this example, it’s (1.53 + 0.79) / 3 ≈ 0.77. This method allows you to: 🔹 Continuously estimate performance. 🔹 Monitor your model in production—even without ground truth labels. 💡 What’s your take on CBPE? Have you tried using probability scores to monitor model performance? Let’s discuss below!
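The three-prediction walkthrough above, in code. This is a toy sketch of the CBPE bookkeeping, not NannyML's implementation; the probability scores are the ones from the post.

```python
def expected_confusion_matrix(probs, threshold=0.5):
    """Accumulate expected TP/FP/TN/FN fractions from probability scores."""
    tp = fp = tn = fn = 0.0
    for p in probs:
        if p >= threshold:      # predicted positive
            tp += p             # chance the prediction is a true positive
            fp += 1 - p         # chance it is a false positive
        else:                   # predicted negative
            tn += 1 - p         # chance it is a true negative
            fn += p             # chance it is a false negative
    return tp, fp, tn, fn

probs = [0.90, 0.63, 0.21]
tp, fp, tn, fn = expected_confusion_matrix(probs)
accuracy = (tp + tn) / len(probs)   # (1.53 + 0.79) / 3
print(f"TP={tp:.2f} FP={fp:.2f} TN={tn:.2f} FN={fn:.2f} -> accuracy: {accuracy:.2f}")
```

Every estimated metric (precision, recall, F1, ...) then follows from this expected confusion matrix exactly as it would from a real one.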