This document gives an overview of big data and how to approach it. It explains that data has become larger and more distributed, which calls for new tools and approaches. It introduces data-parallel processing, which distributes the data, moves computation close to the data, and aggregates the results, so that a workload can scale beyond a single machine. It also discusses the challenges this raises, such as data distribution, fault tolerance, and abstraction. The document then outlines the major components of a big data ecosystem: distributed data storage, processing systems such as Hadoop and Spark that handle distribution and fault tolerance, and coordination systems. Finally, it gives examples of using big data for recommendations and inventory predictions.
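
The three steps of data-parallel processing named above (distribute the data, compute close to it, aggregate the results) can be illustrated with a small word count. This is only a sketch under stated assumptions, not how Hadoop or Spark implement it: the function names `partition`, `count_words`, and `data_parallel_word_count` are hypothetical, threads on one machine stand in for separate worker nodes, and fault tolerance is left out entirely.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_partitions):
    """Split records into roughly equal chunks, one per (hypothetical) worker."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_words(chunk):
    """Local computation: process one chunk and return a small partial result."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def data_parallel_word_count(records, num_partitions=3):
    # 1. Distribute the data across workers.
    chunks = partition(records, num_partitions)
    # 2. Run the computation on each chunk (threads stand in for machines).
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(count_words, chunks))
    # 3. Aggregate the partial results into one answer.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


lines = [
    "big data needs new tools",
    "data moves less when computation moves to data",
    "aggregate partial results",
]
result = data_parallel_word_count(lines)
print(result["data"])  # prints 3
```

Only the small per-chunk counts travel to the aggregation step, not the raw lines. Keeping large data in place and moving only small results is the idea behind moving computation close to the data.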