From the course: Executive Guide to Predictive Modeling Strategy at Scale

Why you might have too little data

- I want to revisit one of the recurring themes of the course. Sometimes you'll actually discover that you have too little data. Years ago in the IBM SPSS Modeler and CRISP-DM communities, we'd hear the tale of the vanishing terabyte. It was a popular story of a client engagement from many years ago, way back in the 90s in fact. In some ways, it's a big data story from long before that phrase became popular. The name alone, the vanishing terabyte, communicates the basic idea. The data mining consultant in the story was told that the client was very concerned that their systems couldn't handle the volume. But as the consultant worked through the data understanding phase and began actually choosing the relevant data, they discovered that they only had a few hundred instances of fraud. The client may have had a very large number of total transactions, but as is often the case, the critical data points, the fraudulent transactions, were actually quite rare. Overall scale has gone up in the years since…
