From the course: Foundations of Responsible AI

Visualization and comparing data

- Let's look at an example from healthcare. For an insurance company, predicting healthcare costs to influence insurance premiums is a common use case of machine learning. There are a few factors to consider, like which groups have lacked access to healthcare historically. Many in Native American, Black, and Latin communities don't have adequate access to medical services like primary care physicians, urgent care facilities, and properly staffed ERs. Some of the variables that impact how someone interacts with a healthcare system include their socioeconomic status, gender, whether they're an immigrant, whether they're a native English speaker, and plenty more attributes, such as whether they're part of a historically marginalized group, like Black, Latin, Native, and lower income Asian communities. These groups also tend to deal with language or cultural barriers when interacting with healthcare systems. In many marginalized communities, there's a hesitance to try new or experimental medical treatments due to systemic marginalization of non-white people within medical communities. This is demonstrated by the treatment of Henrietta Lacks and the increased maternal mortality rate amongst Black women.

Let's start investigating the dispersion of medical facilities in each neighborhood in San Francisco. If we layer this data with historical data around low income and high crime neighborhoods, we can see that there are fewer services in neighborhoods considered risky. We can build some charts that more visibly highlight this inequality. These are the kinds of insights we can use to push for more time spent on algorithmic impact assessments.

Always consider that a single dataset is hardly descriptive of the real world. It's common to have millions of rows about a single snapshot in time, but rarely the full story. Despite the amount of data we have, we hardly have a complete look at an individual person's situation. This is especially relevant for healthcare, but it applies even in industries like finance. A person's financial situation at the time of applying for a credit card or home loan doesn't accurately encapsulate their successes and hardships. The data doesn't show you who has generational wealth, who has been a victim of identity theft or a subprime loan, or who belongs to the various other groups with unequal access to financial institutions.

Now, let's consider how we can visualize data in ways that responsibly represent real life. There are some common pitfalls we run into when attempting to compare groups of data. If we don't ensure that we begin from zero and use the same scale, it's easy to misrepresent what our data means in each chart. We can also easily make the mistake of inappropriately leaving out data. We must be aware that the ways in which we display data can also encode our perspectives about the world. Producing good data comparisons using data visualization best practices is one of the best ways to help teams gain an in-depth understanding of their data. In the next video, we'll talk about turning data into good data stories.
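The facility-dispersion comparison and the "begin from zero, use the same scale" advice can be made concrete with a small plotting sketch. The snippet below is a minimal illustration, not material from the course: the neighborhood names, income groupings, and facility counts are hypothetical, and pandas and matplotlib are assumed only because they are common tools for this kind of chart.

```python
# Minimal sketch: compare medical facility counts across neighborhood groups
# using side-by-side bar charts that start at zero and share a y-axis scale.
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical, illustrative numbers -- a real analysis would join actual
# facility locations with neighborhood-level income and public-safety data.
facilities = pd.DataFrame(
    {
        "neighborhood": ["Marina", "Noe Valley", "Bayview", "Tenderloin"],
        "income_group": ["higher", "higher", "lower", "lower"],
        "facility_count": [14, 11, 3, 4],
    }
)

# sharey=True forces both panels onto the same scale so the comparison is fair.
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True, constrained_layout=True)
for ax, (group, subset) in zip(axes, facilities.groupby("income_group")):
    ax.bar(subset["neighborhood"], subset["facility_count"])
    ax.set_title(f"{group}-income neighborhoods")
    ax.set_ylim(bottom=0)  # start from zero so bar heights aren't exaggerated

axes[0].set_ylabel("Medical facilities")
fig.suptitle("Facility access by neighborhood (hypothetical data)")
plt.show()
```

Because both panels start at zero and share one scale, the gap in facility counts between the two groups reads at its true proportion instead of being visually inflated or flattened.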
