The document addresses language variety identification, an author profiling task, using distributed representations of words and documents to classify texts by language variety. It presents several models and approaches, including the continuous bag-of-words and skip-gram models, which capture linguistic regularities and improve classification accuracy. The research introduces a new dataset, HispaBlogs, and reports competitive results in identifying language varieties in social media, along with plans to investigate related profiling tasks in future work.