This document explains how to set up a Python, Spark, and Jupyter Notebook environment for data science. It covers installing Miniconda to manage Python packages, downloading and configuring Spark, and setting up the PySpark kernel for Jupyter. PixieDust is also installed to visualize data in the notebooks. Examples of demographic analyses using open data sources are provided.