From the course: Foundations of Responsible AI
Problems in ML that differ from software engineering
- Machine learning is unique for various reasons. Although many ML models are built in popular software engineering languages, you can create traditional ML, like logistic regression, in Excel. Technically, you don't need high-powered GPUs and transformers to build the kinds of AI that companies often release. What you do need, first of all, is a lot of data. It can be argued that some methods don't require as much training data as algorithms like neural networks, but every AI system is trained on some body of data. How that data is collected, manipulated, and stored is a major concern for customers and users, yet these aspects are rarely considered critically when teams build AI systems.

Unlike software engineering, where you can build tools from scratch with few dependencies, ML performance is driven mainly by the available training data, alongside well-chosen parameters and settings. In traditional software engineering, the quality of the software depends on the quality of the code; in AI, this reliance on data adds layers of complexity. Was informed consent given to use the data for training? Have there been data leaks, or other issues with anonymizing sensitive data? It's understandable that many software applications aren't created under the kind of scrutiny ML applications are.

While both rely on computation, software typically takes a program and an input and outputs results. Machine learning, in contrast, takes data and desired results as inputs and outputs a program, or a probability. Software engineers use human ingenuity to come up with a solution and formulate it as a precise program a computer can execute. Data scientists instead collect input data, such as readings from a vehicle's sensors, and desired target values, such as the throttle level and the angle of the steering wheel. Then they instruct a model to identify a program that computes an output for each input value: in other words, what actions the car should take based on its sensor inputs.

Machine learning also operates at scale in AI systems. While software can handle the bandwidth of millions of users, making millions of decisions well is more difficult for a variety of reasons. When humans make bad decisions, or decisions heavily influenced by our cognitive biases, we do so at a far slower rate than machine learning models. And while software engineers are concerned with correctness for every edge case, ML involves far more uncertainty. Because ML is, at its root, pattern recognition, we know some detail will be lost whenever we build a model on any data.

AI therefore requires us to pay attention to different metrics and tests than software engineering does. Metrics that are important when measuring ML include accuracy, recall, precision, AUC, and F1. These metrics aim to test the reliability of ML models, but we must also be aware of the issues that arise when we deploy models to production.
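To make those evaluation metrics concrete, here is a minimal sketch using scikit-learn. The library choice, labels, predictions, and scores are illustrative assumptions, not something taken from the course.

```python
# Minimal sketch: computing the evaluation metrics named above with scikit-learn.
# The labels, predictions, and scores are made-up illustrative values.
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    roc_auc_score,
)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]                    # ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]                    # model's hard predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1]   # model's predicted probabilities

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("AUC      :", roc_auc_score(y_true, y_score))
```

Accuracy alone can hide problems on imbalanced data, which is one reason recall, precision, F1, and AUC are typically tracked alongside it.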
A major concern in production is model drift. When we put models into production, their predictions often degrade over time; when metrics like the false positive rate spike and accuracy drops, model drift is typically to blame. We can think of model drift as having four major types.

When the nature of the variable you're predicting changes, we call that concept drift. If we're predicting weather patterns and the real world is experiencing weather anomalies, our model is likely to experience concept drift during those periods. Data drift happens when the nature of the features changes. Maybe you're building a crypto price predictor, but a recent major crypto scam, or other world events, interfered with normally stable features. Then we can say our data has drifted from the statistical distributions we used to train the model, rendering it less effective. Fairness drift occurs when teams, after checking fairness metrics, observe that a model has become more unfair over time. It can be caused by various factors, but the result is a once-fair model skewing to favor one group over another. Accuracy drift happens when concepts don't change, but the model no longer performs with the same level of accuracy it had when initially deployed. This is often a signal that the training data has become stale.

Sources of drift include uncontrollable real-world events, temporal features or seasonality, and poorly automated training workflows. Drift can be detected by checking for anomalies and by checking for stationarity with statistical tests. The tests needed to understand ML models cannot be as simple as the unit tests used to check software behavior. Instead, we must implement comprehensive observability tooling, since ML models are susceptible to concept, data, fairness, and accuracy drift. Various startups offer paid platforms for production model monitoring, but there are also open source tools available, such as MLflow. These tools allow teams to automate model monitoring, receive alerts when model drift occurs, and more easily track experiments and manage version control for models already in production. This is a key aspect of AI development, and few teams have found long-term success in AI endeavors without proper production monitoring and observability.
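As one way to run the kind of statistical drift check described above, here is a minimal sketch that compares a feature's training-time distribution against its production distribution with a two-sample Kolmogorov-Smirnov test from SciPy. The synthetic data and the 0.05 threshold are assumptions for illustration, not the course's own method.

```python
# Minimal sketch: flagging possible data drift by comparing a feature's
# training-time distribution against its production distribution with a
# two-sample Kolmogorov-Smirnov test. Data and threshold are illustrative.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Pretend these are values of one feature at training time vs. in production.
train_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)
prod_feature = rng.normal(loc=0.4, scale=1.2, size=5_000)  # shifted distribution

statistic, p_value = ks_2samp(train_feature, prod_feature)

if p_value < 0.05:
    print(f"Possible data drift (KS statistic={statistic:.3f}, p={p_value:.3g})")
else:
    print("No significant distribution shift detected for this feature")
```

In practice a check like this would run per feature on a schedule, with alerts wired to whatever threshold the team decides is meaningful.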
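And here is a minimal sketch of using the open source MLflow library mentioned above to log metrics over time, so that drops in accuracy or spikes in the false positive rate can be reviewed later. The experiment name, model version, and metric values are made-up assumptions, not the course's workflow.

```python
# Minimal sketch: logging production-style metrics with MLflow so that
# accuracy or fairness drift can be reviewed in the MLflow UI.
# The experiment name and metric values are made-up illustrative values.
import mlflow

mlflow.set_experiment("production-model-monitoring")

with mlflow.start_run(run_name="weekly-drift-check"):
    mlflow.log_param("model_version", "v1.3")

    # In a real workflow these numbers would come from evaluating the live
    # model on fresh, labeled production data each week.
    weekly_accuracy = [0.93, 0.92, 0.90, 0.85]
    weekly_false_positive_rate = [0.04, 0.05, 0.07, 0.11]

    for week, (acc, fpr) in enumerate(zip(weekly_accuracy, weekly_false_positive_rate)):
        mlflow.log_metric("accuracy", acc, step=week)
        mlflow.log_metric("false_positive_rate", fpr, step=week)
```

Running `mlflow ui` afterwards opens a local dashboard where the logged runs and metric trends can be inspected.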