From the course: Applied Machine Learning: Ensemble Learning (2022)

Cleaning up categorical features

- [Instructor] In this video, we'll continue the cleaning we started in the last video, but now we'll focus on the categorical features. As a reminder, if you're picking this up in a new notebook, you'll need to rerun the prior cells so you have the data and packages that the code we'll be covering depends on. Let's start by creating an indicator for the cabin feature. As a quick reminder, running isnull().sum() to count the missing values shows that cabin is missing for 687 people in this data set. Recall that for age, we simply replaced the missing values with the average age. We could take that approach because age was missing at random, so the fact that it was missing didn't carry any information we could bake into the feature. That's not the case here, so let's see why. Let's take a quick look at the survival rate based on whether or not cabin is missing in this data set. We'll…
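The steps described above can be sketched in pandas. This is a minimal sketch, not the course's own notebook code. It makes several assumptions. The column names `Survived`, `Age`, and `Cabin` follow the Kaggle Titanic schema. The eight toy rows are hypothetical stand-ins for the real data. The indicator column name `Cabin_ind` is also hypothetical. The sketch does three things: it counts missing values, mean-imputes age, and compares survival rates by cabin missingness before encoding that missingness as a 0/1 feature.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for the Titanic data (assumed Kaggle
# column names: Survived, Age, Cabin). Not real passenger records.
titanic = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0, 0, 0, 1],
    "Age": [22.0, 38.0, np.nan, 35.0, np.nan, 27.0, 54.0, 4.0],
    "Cabin": [np.nan, "C85", np.nan, "C123", np.nan, np.nan, "E46", "G6"],
})

# Count missing values per column.
missing_counts = titanic.isnull().sum()
print(missing_counts)

# Age is treated as missing at random, so fill gaps with the mean age.
titanic["Age"] = titanic["Age"].fillna(titanic["Age"].mean())

# For Cabin, check whether missingness relates to the target:
# mean of Survived for rows where Cabin is missing (True) vs. present (False).
survival_by_missing = titanic.groupby(titanic["Cabin"].isnull())["Survived"].mean()
print(survival_by_missing)

# Keep that signal as a binary indicator (hypothetical name):
# 1 = cabin recorded, 0 = cabin missing.
titanic["Cabin_ind"] = np.where(titanic["Cabin"].isnull(), 0, 1)
print(titanic[["Cabin", "Cabin_ind"]])
```

On the full Kaggle training file, the same `isnull().sum()` call should report the 687 missing cabins mentioned in the video. If the two survival rates differ noticeably, the missingness carries information, and mean-style imputation would throw it away. That is the case for an indicator over a simple fill.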