This document provides an agenda and slides for a presentation on introducing big data concepts using open source tools. The presentation covers ingesting and analyzing sample data using Spark SQL, including joining datasets to count the number of books by author. It also demonstrates basic machine learning by loading sample revenue data, applying data quality rules to correct anomalies, and using linear regression to predict revenue for a party of 40 guests. The goal is to make big data concepts accessible to audiences of all experience levels.
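The regression step mentioned above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the presentation's own code. The sample figures, the rule "discard records with non-positive revenue", and the names `clean_records`, `fit_line`, and `predict` are all hypothetical. The deck itself uses Spark, whereas this sketch uses ordinary least squares written by hand so that it runs without any dependencies.

```python
# Hypothetical (guests, revenue) records; the values are made up for illustration.
# The fourth record carries an obviously invalid revenue figure.
SAMPLE = [(10, 250.0), (20, 500.0), (30, 750.0), (25, -1.0), (50, 1250.0)]


def clean_records(records):
    """Apply an assumed data quality rule: keep only rows with positive revenue."""
    return [(guests, revenue) for guests, revenue in records if revenue > 0]


def fit_line(records):
    """Fit revenue = slope * guests + intercept by ordinary least squares."""
    n = len(records)
    mean_x = sum(x for x, _ in records) / n
    mean_y = sum(y for _, y in records) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in records)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in records)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, guests):
    """Evaluate the fitted line for a given number of guests."""
    slope, intercept = model
    return slope * guests + intercept


model = fit_line(clean_records(SAMPLE))
prediction_for_40 = predict(model, 40)
print(prediction_for_40)
```

With the made-up data above, the cleaned records lie on a straight line, so the prediction for 40 guests follows directly from the fitted slope and intercept. Real data would not be this tidy, and the choice of quality rule (dropping versus correcting an anomalous row) changes the fitted model.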