This document discusses best practices for writing code in data science projects. It recommends starting with scripts to explore and understand a problem, then refactoring the code into well-named functions and modules that can be reused and tested. Caching intermediate results and using online algorithms, which process data incrementally instead of loading it all at once, can help with large datasets. The document also advocates adopting software engineering practices such as version control and testing as a project matures. Overall, it offers guidance on structuring data science code for productivity, quality, and scalability.
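As an illustration of what "online algorithm" means here, the following is a minimal sketch, assuming the data arrives as a stream of floats (for example, one value per line of a large file). It uses Welford's method, a standard one-pass algorithm chosen for this example rather than one named in the source, to compute the mean and sample variance in constant memory. The function names `running_mean_var` and `stream_values` are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def running_mean_var(values: Iterable[float]) -> Tuple[int, float, float]:
    """Welford's online algorithm: one pass over the data, O(1) memory.

    Returns (count, mean, sample variance).
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses the updated mean
    variance = m2 / (n - 1) if n > 1 else 0.0  # sample (n - 1) variance
    return n, mean, variance


def stream_values(path: str) -> Iterator[float]:
    """Hypothetical reader: yields one float per non-empty line,
    so the whole file is never held in memory."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield float(line)


if __name__ == "__main__":
    # Any iterable works; a file stream would be running_mean_var(stream_values("data.txt")).
    n, mean, var = running_mean_var([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
    print(n, mean, var)
```

Because `running_mean_var` consumes any iterable, the same function works on an in-memory list during early exploration and on a file or database stream once the dataset outgrows memory, which fits the script-to-reusable-function progression described above.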