The document discusses extracting structured data from web forums. It proposes leveraging richer site-level knowledge, such as the forum sitemap and page layouts, to improve data extraction. Forum pages are first grouped by template-based clustering, so that pages sharing a layout are handled together. Relationships between elements, both within a page and across linked pages, are then analyzed. Structured data such as posts and threads are extracted using formulas in a Markov logic network model that exploit the site structure and these element relationships. Experiments demonstrate that the approach can accurately extract the author, title, time, and content of forum posts and threads.
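The template-based clustering step can be illustrated with a minimal sketch. The summary does not say which layout features the method compares, so this example assumes a common stand-in: each page is represented by the set of root-to-element DOM tag paths, and pages join a cluster when their Jaccard similarity to its representative reaches a threshold. The names `template_signature`, `cluster_pages`, the 0.8 threshold, and the sample pages are all hypothetical.

```python
from html.parser import HTMLParser


class TagPathCollector(HTMLParser):
    """Collects root-to-element tag paths such as 'html/body/div/table'."""

    VOID = {"br", "img", "hr", "input", "meta", "link", "area",
            "base", "col", "embed", "source", "track", "wbr"}

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        self.paths.add("/".join(self.stack))
        if tag in self.VOID:  # void elements never get a closing tag
            self.stack.pop()

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy forum HTML.
        if tag in self.stack:
            while self.stack:
                if self.stack.pop() == tag:
                    break


def template_signature(html):
    """Hypothetical layout signature: the set of distinct tag paths."""
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    return frozenset(collector.paths)


def jaccard(a, b):
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_pages(pages, threshold=0.8):
    """Greedy single pass: a page joins the first cluster whose representative
    signature is similar enough; otherwise it starts a new cluster."""
    clusters = []  # list of (representative signature, [page ids])
    for page_id, html in pages.items():
        sig = template_signature(html)
        for rep, members in clusters:
            if jaccard(sig, rep) >= threshold:
                members.append(page_id)
                break
        else:
            clusters.append((sig, [page_id]))
    return [members for _, members in clusters]


# Hypothetical pages: two thread-list pages sharing a layout, one post page.
PAGES = {
    "list1": "<html><body><div><table><tr><td><a>Thread A</a></td></tr></table></div></body></html>",
    "list2": "<html><body><div><table><tr><td><a>Thread B</a></td></tr></table></div></body></html>",
    "post1": "<html><body><div><span>alice</span><p>hello</p></div></body></html>",
}

print(cluster_pages(PAGES))  # [['list1', 'list2'], ['post1']]
```

Tag-path sets ignore text, so pages generated from the same template collapse to the same signature regardless of their content. A production system would likely use a richer signature and a proper clustering algorithm rather than this greedy pass.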
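For readers unfamiliar with Markov logic networks, the following toy sketch shows how weighted formulas score a labeling. In an MLN, a possible world $x$ has probability $P(x) = \frac{1}{Z}\exp\big(\sum_i w_i\, n_i(x)\big)$, where $n_i(x)$ counts the true groundings of formula $i$ and $w_i$ is its weight. The predicates, formulas, weights, and evidence below are hypothetical and are not taken from the paper. The one cross-page cue assumed here is that an author name typically links to a user profile page. Marginals are computed by enumerating every world, which is only feasible for a handful of atoms. Practical MLN systems learn the weights from labeled data and use approximate inference instead.

```python
import itertools
import math

# Hypothetical evidence for three candidate elements inside one post record.
ELEMENTS = ["e1", "e2", "e3"]
EVIDENCE = {
    "e1": {"has_profile_link": True, "short_text": True},    # e.g. "alice"
    "e2": {"has_profile_link": False, "short_text": True},   # e.g. a timestamp
    "e3": {"has_profile_link": False, "short_text": False},  # e.g. the post body
}


def implies(a, b):
    return (not a) or b


# Each hypothetical formula returns its number of true groundings n_i(world).
def f_link_implies_author(world):
    # HasProfileLink(e) => IsAuthor(e)
    return sum(implies(EVIDENCE[e]["has_profile_link"], world[e]) for e in ELEMENTS)


def f_author_implies_short(world):
    # IsAuthor(e) => ShortText(e)
    return sum(implies(world[e], EVIDENCE[e]["short_text"]) for e in ELEMENTS)


def f_single_author(world):
    # !(IsAuthor(a) ^ IsAuthor(b)) for every unordered pair a != b
    return sum(not (world[a] and world[b])
               for a, b in itertools.combinations(ELEMENTS, 2))


FORMULAS = [(2.0, f_link_implies_author),
            (1.0, f_author_implies_short),
            (1.5, f_single_author)]


def score(world):
    """Unnormalized log-probability: sum_i w_i * n_i(world)."""
    return sum(w * n(world) for w, n in FORMULAS)


def marginals():
    """Exact P(IsAuthor(e)) by enumerating all 2^|ELEMENTS| worlds."""
    worlds = [dict(zip(ELEMENTS, values))
              for values in itertools.product([False, True], repeat=len(ELEMENTS))]
    weights = [math.exp(score(w)) for w in worlds]
    z = sum(weights)
    return {e: sum(p for w, p in zip(worlds, weights) if w[e]) / z for e in ELEMENTS}


m = marginals()
for e in ELEMENTS:
    print(e, round(m[e], 3))  # e1, the element linking to a profile, ranks highest
```

Because the formulas are soft, a labeling that violates one (for example, marking two elements as authors) is penalized rather than ruled out. This is how an MLN can combine weak in-page cues with site-level ones.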