This document describes Series-O-Rama, a system that lets users search for TV series and get recommendations on them using SQL. It mines subtitles to extract terms for each series. These terms are indexed and weighted with TF-IDF, so that each series is modeled as a vector of term weights. Similarity between series is computed from the terms they share. Queries retrieve matching series based on term weights, and series can be recommended based on a user's interests. The system offers search, browsing, and recommendation through its GUI, and uses a database to store the subtitle data and indexes.
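To make the pipeline above concrete, here is a minimal Python sketch of TF-IDF weighting, similarity between series, and term-based search. It is an illustration under stated assumptions, not the Series-O-Rama implementation: the function names (`tf_idf_vectors`, `cosine`, `search`) and the toy subtitle data are hypothetical, the weighting uses one common TF-IDF variant (relative term frequency times the natural log of inverse document frequency), and cosine similarity is assumed as the measure over shared terms, which the description does not specify.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Map each series name to a {term: tf-idf weight} dictionary.

    docs: dict of series name -> list of terms extracted from its subtitles.
    """
    n = len(docs)
    # Document frequency: in how many series does each term occur?
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))
    vectors = {}
    for name, terms in docs.items():
        counts = Counter(terms)
        total = len(terms)
        vectors[name] = {
            term: (count / total) * math.log(n / df[term])
            for term, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(vectors, query_terms):
    """Return series with a positive summed weight for the query, best first."""
    scores = {
        name: sum(vec.get(t, 0.0) for t in query_terms)
        for name, vec in vectors.items()
    }
    hits = [name for name, score in scores.items() if score > 0]
    return sorted(hits, key=lambda name: -scores[name])


# Toy data (hypothetical): terms already extracted from subtitles.
subtitles = {
    "Hospital Drama": ["doctor", "surgery", "patient", "doctor"],
    "Medical Comedy": ["doctor", "patient", "joke"],
    "Space Opera": ["ship", "captain", "planet", "ship"],
}
vectors = tf_idf_vectors(subtitles)
```

With this toy data, `search(vectors, ["doctor"])` lists the two medical series, and `cosine(vectors["Hospital Drama"], vectors["Space Opera"])` is zero because the two series share no terms. In a database-backed system the same weights could be stored in a term-weight table and queried with SQL, but that schema is not described here.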