This document summarizes a talk by Sarah Guido at PyCon 2017 on the challenges of data wrangling, illustrated with examples from her work as a senior data scientist at Mashable. It covers predicting building sales with logistic regression while addressing class imbalance, clustering user interactions with k-means, and understanding audience composition despite data limitations. The talk emphasizes the importance of problem identification, data quality, and creative solutions in data science.