This document summarizes a presentation on using Hadoop for online content optimization at Yahoo. The presentation describes how machine learning models, trained on large-scale user interaction data, learn users' interests and content attributes in order to deliver personalized recommendations. Key points include:
- Collecting hundreds of GB of user interaction data daily to build user profiles and content models
- Storing models and metadata in HBase for fast lookup, and refreshing the models every 5 to 30 minutes
- Using Pig and Hadoop jobs to generate features, build recommendation models, and analyze results
- A service architecture that combines HBase, Pig, Hive, and edge services to power large-scale personalized recommendations
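
The update-and-lookup loop outlined in the key points can be illustrated with a minimal sketch. It is not the presentation's implementation: `ModelStore` is a hypothetical in-memory stand-in for a key-value store such as HBase, the `Event` fields are assumed, and a smoothed click-through rate is swapped in for the actual models, which the summary does not describe. The sketch also ignores per-user personalization and simply ranks items by their estimated click-through rate.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One user interaction; the field names are hypothetical."""
    user_id: str
    item_id: str
    clicked: bool


class ModelStore:
    """In-memory stand-in for a key-value model store (assumption, not HBase)."""

    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = value

    def get(self, key, default=None):
        return self._rows.get(key, default)


def update_models(store, events, prior_views=10.0, prior_ctr=0.1):
    """Fold one batch of events into smoothed per-item click-through rates."""
    views = defaultdict(int)
    clicks = defaultdict(int)
    for event in events:
        views[event.item_id] += 1
        clicks[event.item_id] += int(event.clicked)

    for item_id, batch_views in views.items():
        # Merge the new batch with the running counts already in the store.
        old_views, old_clicks = store.get(("counts", item_id), (0, 0))
        total_views = old_views + batch_views
        total_clicks = old_clicks + clicks[item_id]
        store.put(("counts", item_id), (total_views, total_clicks))
        # Smoothing pulls items with few views toward the prior rate.
        ctr = (total_clicks + prior_ctr * prior_views) / (total_views + prior_views)
        store.put(("ctr", item_id), ctr)


def recommend(store, candidates, k=2, default_ctr=0.1):
    """Return the k candidates with the highest stored CTR estimate."""
    ranked = sorted(
        candidates,
        key=lambda item_id: store.get(("ctr", item_id), default_ctr),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    store = ModelStore()
    batch = [
        Event("u1", "a", True),
        Event("u2", "a", True),
        Event("u3", "b", False),
        Event("u1", "b", False),
        Event("u2", "c", True),
    ]
    update_models(store, batch)  # would run once per refresh interval
    print(recommend(store, ["a", "b", "c"]))  # prints ['a', 'c']
```

In a deployment along the lines the summary describes, `update_models` would correspond to a batch job run on each refresh cycle, and `recommend` to a lookup made by an edge service at request time.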