This document discusses setting up an environment for agile data science and analytics applications. It recommends:
- Publishing atomic records such as emails or log entries to a document store like MongoDB, so that the data is accessible to designers, developers and product managers.
- Wrapping the records with tools like Pig, Avro and Bootstrap to enable viewing, sorting and linking the records in a browser.
- Taking an iterative approach: refining the data model and publishing insights step by step, so that the application grows out of what exploring the data reveals rather than from insights designed upfront.
- Emphasizing simplicity, self-service tools, and minimizing impedance between layers to facilitate rapid iteration and collaboration across roles.
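
The first two recommendations (publish atomic records, then make them viewable and linkable in a browser) can be illustrated with a minimal sketch. It is not the document's implementation: the in-memory `STORE` dict is a hypothetical stand-in for a document store such as MongoDB, and the field names (`message_id`, `in_reply_to`), the `/email/<id>` URL scheme, and the `publish` and `render` helpers are assumptions made for illustration only.

```python
import html

# Hypothetical stand-in for a document store such as MongoDB:
# a plain dict keyed by record id.
STORE = {}


def publish(record):
    """Store one atomic record (here, an email) under its id."""
    STORE[record["message_id"]] = record


def render(record_id):
    """Render one record as an HTML table, linking to related records."""
    record = STORE[record_id]
    rows = []
    for key, value in sorted(record.items()):
        if key == "in_reply_to" and value in STORE:
            # Link to the parent record so a viewer can click through.
            cell = '<a href="/email/{0}">{0}</a>'.format(html.escape(value))
        else:
            cell = html.escape(str(value))
        rows.append(
            "<tr><th>{}</th><td>{}</td></tr>".format(html.escape(key), cell)
        )
    # "table" is the Bootstrap table class; the styling itself is not shown.
    return '<table class="table">' + "".join(rows) + "</table>"


publish({"message_id": "m1", "from": "a@example.com", "subject": "Hello"})
publish({
    "message_id": "m2",
    "from": "b@example.com",
    "subject": "Re: Hello",
    "in_reply_to": "m1",
})
page = render("m2")
```

Keeping each record self-contained and rendering it generically means new fields show up in the browser without template changes, which fits the emphasis on rapid iteration.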