This document provides an overview of Apache Pig, a platform in the Apache Hadoop ecosystem whose scripting language, Pig Latin, is used to explore large datasets. It discusses how Pig can process terabytes of data with just a few lines of code, and how every part of the processing path can be customized, including storing, filtering, grouping, and joining data. The document then works through a sample problem on the publicly available Million Song Dataset to demonstrate loading and storing the data with Pig, computing each song's density, and filtering the results. Finally, it analyzes the input and output, showing that the input dataset contained one million records, which were filtered down to the 50 songs with the highest densities.
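The load, compute, and filter steps summarized above can be sketched in plain Python to make the data flow concrete. This is only an illustrative analogue, not the Pig script from the document: the column layout (`track_id`, `title`, `duration`, `segments`), the sample rows, and the definition of density as segments per second of duration are all assumptions made here for the example.

```python
import csv
import heapq
import io

# Hypothetical sample rows: track_id, title, duration in seconds, segment count.
# The real Million Song Dataset has many more fields; this layout is assumed.
SAMPLE_TSV = (
    "TR001\tSong A\t200.0\t800\n"
    "TR002\tSong B\t100.0\t900\n"
    "TR003\tSong C\t0.0\t50\n"
    "TR004\tSong D\t250.0\t500\n"
)


def load_songs(handle):
    """Rough analogue of Pig's LOAD: parse tab-separated rows into dicts."""
    reader = csv.reader(handle, delimiter="\t")
    for track_id, title, duration, segments in reader:
        yield {
            "track_id": track_id,
            "title": title,
            "duration": float(duration),
            "segments": int(segments),
        }


def with_density(songs):
    """Rough analogue of FILTER plus FOREACH ... GENERATE.

    Density is assumed here to mean segments per second; rows with a
    non-positive duration are dropped to avoid dividing by zero.
    """
    for song in songs:
        if song["duration"] > 0:
            yield {**song, "density": song["segments"] / song["duration"]}


def top_n(songs, n=50):
    """Rough analogue of ORDER ... BY density DESC followed by LIMIT n."""
    return heapq.nlargest(n, songs, key=lambda s: s["density"])


# Keep only the two densest songs from the tiny sample.
top = top_n(with_density(load_songs(io.StringIO(SAMPLE_TSV))), n=2)
for song in top:
    print(song["track_id"], song["density"])
```

In Pig itself, each of these functions would correspond to roughly one statement (`LOAD`, `FILTER`, `FOREACH ... GENERATE`, `ORDER ... BY`, `LIMIT`, `STORE`), which is where the "few lines of code" claim comes from; Pig additionally compiles those statements into jobs that run across the Hadoop cluster, which this single-process sketch does not attempt to model.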