The presentation discusses information extraction techniques for distilling structured data from unstructured text. As an example, it describes building a website for finding continuing education opportunities by extracting structured data from unstructured web pages. It covers machine learning approaches to information extraction, including wrappers that allow unstructured sources to be queried as if they were databases. It also addresses challenges such as verifying extracted data and automatically repairing wrappers when extractions change or fail.
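To make the wrapper idea concrete, here is a minimal sketch in Python. It is not the presentation's method: the wrapper is a hand-written regular expression rather than a learned one, and the page fragment, field names, and the functions `extract_courses`, `verify`, and `query` are all hypothetical, chosen only for illustration. It shows a wrapper turning page markup into records, a simple verification step, and a database-style query over the result.

```python
import re
from typing import Dict, List

# Hypothetical page fragment; the markup and field names are invented for illustration.
PAGE = """
<li><b>Intro to Statistics</b> - Starts: 2024-09-03 - Fee: $250</li>
<li><b>Technical Writing</b> - Starts: 2024-10-15 - Fee: $180</li>
"""

# The "wrapper": a pattern tied to this page's layout that maps each entry to a record.
COURSE_PATTERN = re.compile(
    r"<li><b>(?P<title>[^<]+)</b> - Starts: (?P<start>.+?) - Fee: \$(?P<fee>[^<]+)</li>"
)


def extract_courses(page: str) -> List[Dict[str, str]]:
    """Apply the wrapper and return one dict per matched course entry."""
    return [m.groupdict() for m in COURSE_PATTERN.finditer(page)]


def verify(record: Dict[str, str]) -> bool:
    """Check that each field has the expected shape (non-empty title, ISO date, numeric fee)."""
    return (
        bool(record["title"].strip())
        and re.fullmatch(r"\d{4}-\d{2}-\d{2}", record["start"]) is not None
        and re.fullmatch(r"\d+(\.\d{2})?", record["fee"]) is not None
    )


def query(page: str, max_fee: float) -> List[Dict[str, str]]:
    """Database-style query: verified courses costing at most max_fee."""
    return [
        r for r in extract_courses(page)
        if verify(r) and float(r["fee"]) <= max_fee
    ]


if __name__ == "__main__":
    for course in query(PAGE, max_fee=200):
        print(course["title"], course["start"], course["fee"])
```

A record that fails `verify` (for example, a start date that no longer looks like `YYYY-MM-DD` after a site redesign) is one possible signal that the wrapper needs repair. Repairing it automatically is beyond this sketch.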