The document outlines a methodology for the semi-automatic analysis of large-scale User-Generated Content (UGC), focusing on travel blogs and online reviews about tourism in Catalonia. It describes the web mining process used to collect, clean, and analyze travel diaries, and highlights both the growth of UGC and its value for understanding tourist experiences. The findings emphasize the importance of an organized data collection structure and of cleaning steps that ensure quality information for subsequent content analysis.
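
The cleaning and analysis stages mentioned above could look roughly like the following minimal Python sketch. It is an illustration only, not the document's actual pipeline: the function names (`clean_entry`, `clean_corpus`, `word_frequencies`), the `min_words` threshold, and the specific cleaning rules (tag stripping, entity decoding, whitespace normalization, duplicate and short-entry removal) are all assumptions chosen for the example, and it uses only the Python standard library.

```python
import html
import re
from collections import Counter

# Hypothetical patterns for this sketch; a real pipeline would likely
# use a proper HTML parser rather than a regular expression.
TAG_RE = re.compile(r"<[^>]+>")
WS_RE = re.compile(r"\s+")
WORD_RE = re.compile(r"[^\W\d_]+")  # runs of Unicode letters


def clean_entry(raw: str) -> str:
    """Strip HTML tags, decode entities, and collapse whitespace."""
    text = TAG_RE.sub(" ", raw)   # remove tags first
    text = html.unescape(text)    # then decode entities such as &iacute;
    return WS_RE.sub(" ", text).strip()


def clean_corpus(raw_entries, min_words=5):
    """Return cleaned entries, dropping duplicates and very short texts.

    min_words is an assumed threshold, not a value from the document.
    """
    seen = set()
    cleaned = []
    for raw in raw_entries:
        text = clean_entry(raw)
        key = text.lower()
        if len(text.split()) < min_words or key in seen:
            continue
        seen.add(key)
        cleaned.append(text)
    return cleaned


def word_frequencies(entries):
    """Count lowercase word occurrences across all entries."""
    counts = Counter()
    for text in entries:
        counts.update(WORD_RE.findall(text.lower()))
    return counts
```

As a usage example, passing a list of raw blog entries to `clean_corpus` and then the result to `word_frequencies` yields a `Counter` that can serve as a starting point for content analysis. Deduplicating on the lowercased, whitespace-normalized text is a deliberately simple choice; near-duplicate detection would need a more elaborate comparison.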