From the course: Foundations of Responsible AI

Causal reasoning and fairness

- Often, machine learning models find spurious correlations rather than robust ones. When models are exposed to data, they assume they have all the information they need to make predictions. This is a problem, because models perform poorly when the data they're deployed on differs from the data they were trained on, and in ML our deployment environments rarely match our training data perfectly. We're not only making decisions based on available training data; through active decision making we change the environment, which often breaks the patterns we have identified.

We can look to causal inference to better understand the process by which the data used to train machine learning models is generated. Causal inference considers the assumptions, study designs, and estimation strategies needed to draw conclusions about events in the world.

We can think about causal learning as a three-layer hierarchy: first association, then intervention, and finally counterfactuals. Association seeks to find statistical relationships between variables. By contrast, intervention and counterfactuals demand causal information. An intervention is a change to the data-generating process: it describes the distribution of a dataset if we change a feature's values while keeping the rest of the dataset the same. A counterfactual describes a causal situation of the form: if X had not occurred, Y would not have occurred. For example, if I had not taken the train to work today, I would not have arrived at work today.

Causal learning can address several aspects of responsible AI, including bias mitigation, fairness, transparency, and generalizability. As I go through some of the most common tools within causal learning, take a moment and write these down: causal assumptions, do-calculus, counterfactual analysis, and adaptability. When researchers encode causal assumptions explicitly, it allows for greater transparency and testability in AI systems.
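The gap between association and intervention can be sketched with simulated data. In this hypothetical data-generating process (all variable names are invented for illustration), a confounder Z drives both X and Y while X has no causal effect on Y: observationally X and Y are correlated, but under an intervention that sets X directly, the correlation vanishes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process: a confounder Z drives both X and Y;
# X itself has no causal effect on Y.
z = rng.normal(size=n)
x_obs = z + 0.5 * rng.normal(size=n)
y_obs = z + 0.5 * rng.normal(size=n)

# Association (layer 1): X and Y are strongly correlated in observational data.
assoc = np.corrcoef(x_obs, y_obs)[0, 1]

# Intervention (layer 2), do(X := x): we assign X ourselves, cutting the
# Z -> X edge. Y is unaffected, because X never caused Y.
x_int = rng.normal(size=n)
y_int = z + 0.5 * rng.normal(size=n)
interv = np.corrcoef(x_int, y_int)[0, 1]

print(f"observational corr(X, Y):  {assoc:.2f}")   # ~0.80
print(f"interventional corr(X, Y): {interv:.2f}")  # ~0.00
```

A model trained only on the observational data would happily use X to predict Y, and then fail as soon as deployment changes how X is set, which is exactly the pattern-breaking problem described above.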
It also allows us to discern whether those assumptions are plausible and compatible with the available data, giving us a better understanding of how a system works.

Confounding factors, or factors that influence both a model's inputs and the outcome we want to predict, are a major cause of many socially undesirable behaviors in AI systems. Do-calculus enables us to exclude spurious correlations by controlling for confounding factors using graphical criteria such as the back-door criterion.

Counterfactual analysis allows us to ask what would have happened if some condition were different. This is a popular framework for assessing fairness in AI systems, specifically because data can be manipulated to simulate outcomes in a what-if scenario. If we have a dataset of job applicants, we can change the gender in the dataset and see whether our model would make a different decision. We can also add data to our dataset for better results: counterfactual data augmentation is a technique that augments training data with counterfactually revised copies to eliminate spurious correlations.

Adaptability is a model's capability to generalize across different environments. AI algorithms typically can't perform well when the environment changes, so we strive to create adaptable models, and causal reasoning helps us do that by better understanding the conditions of the world around us. Using causal reasoning, we can uniquely identify the underlying mechanism responsible for a change, whether it comes from the data, from production, from real-world events, or from the way we deployed our models.
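The job-applicant example above can be sketched in a few lines. This is a minimal illustration, not a real hiring system: the column names and weights are invented, and the fixed linear score stands in for a trained classifier's predict function. Flipping only the gender field and re-scoring is the counterfactual test; pairing each row with its flipped copy is counterfactual data augmentation.

```python
# Hypothetical scoring model: a fixed linear score standing in for a
# trained classifier. All feature names and weights are invented.
weights = {"experience": 0.4, "education": 0.4, "gender": 0.6}

def predict(row):
    score = sum(weights[k] * row[k] for k in weights)
    return int(score > 1.0)  # 1 = invite to interview

applicant = {"experience": 1.0, "education": 1.0, "gender": 0.0}

# Counterfactual: the same applicant with only the gender field flipped.
counterfactual = dict(applicant, gender=1.0)

original = predict(applicant)        # 0: rejected
flipped = predict(counterfactual)    # 1: invited

# If the decision changes when only gender changes, the model's decision
# causally depends on gender -- a counterfactual fairness violation.
print("decision changed:", original != flipped)  # True

# Counterfactual data augmentation: pair each training row with its
# gender-flipped copy, so the augmented set carries no gender signal.
train = [applicant, {"experience": 0.2, "education": 0.9, "gender": 1.0}]
augmented = train + [dict(row, gender=1.0 - row["gender"]) for row in train]
```

Retraining on the augmented set pushes the model to assign gender zero weight, since every combination of the other features now appears with both gender values.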
