This document summarizes two tasks for a final project using Yelp data:
1. Predicting business categories using an information retrieval approach. The dataset was divided into training and test sets. Categories were predicted for businesses in the test set based on features extracted from reviews in the training set. Precision and recall were calculated by comparing the predictions to the ground-truth categories.
2. Predicting the most discussed attributes for each city, such as "good for kids" or "music". An attribute map was created using WordNet. Attributes were ranked for cities in the test and training sets using BM25. Precision and recall were calculated by comparing test set predictions to those from the training set.
Challenges included data cleaning, feature extraction, and evaluation.
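The ranking and evaluation steps of the second task can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the toy review text, the hand-written attribute-to-term map (standing in for the WordNet-derived one), the function names, the choice to treat each city's reviews as one document and each attribute's terms as a query, and the BM25 parameters k1 = 1.5 and b = 0.75 are all hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against query_terms with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Smoothed IDF variant that stays non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rank_attributes(city_docs, attribute_terms, top_k=2):
    """Return {city: up to top_k attributes with a positive BM25 score}."""
    cities = list(city_docs)
    docs = [city_docs[c] for c in cities]
    per_city = {c: [] for c in cities}
    for attr, terms in attribute_terms.items():
        for city, score in zip(cities, bm25_scores(terms, docs)):
            if score > 0:
                per_city[city].append((score, attr))
    # Highest score first; ties broken alphabetically by attribute name.
    return {
        c: [a for _, a in sorted(pairs, key=lambda p: (-p[0], p[1]))[:top_k]]
        for c, pairs in per_city.items()
    }


def precision_recall(predicted, reference):
    """Set-based precision and recall of predicted against reference."""
    predicted, reference = set(predicted), set(reference)
    hits = len(predicted & reference)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical attribute map: attribute name -> related terms.
ATTRIBUTE_TERMS = {
    "good for kids": ["kids", "children", "family"],
    "music": ["music", "band", "live"],
    "parking": ["parking", "garage"],
}

# Toy stand-ins for the concatenated, tokenized reviews of each city.
train_docs = {
    "Phoenix": "great place for kids and family live music on weekends".split(),
    "Las Vegas": "loud band live music all night easy parking garage".split(),
}
test_docs = {
    "Phoenix": "children loved it family friendly parking was easy".split(),
    "Las Vegas": "the band played music late".split(),
}

train_rank = rank_attributes(train_docs, ATTRIBUTE_TERMS)
test_rank = rank_attributes(test_docs, ATTRIBUTE_TERMS)

for city in test_rank:
    p, r = precision_recall(test_rank[city], train_rank[city])
    print(city, test_rank[city], f"precision={p:.2f}", f"recall={r:.2f}")
```

In this sketch the training-set ranking plays the role of the reference, mirroring the comparison described for the second task. For the first task, the same `precision_recall` helper could be applied with the ground-truth categories as the reference set.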