The document discusses entity matching for semistructured data in the cloud. It presents ChuQL, an entity matching architecture, and MAXIM, an entity matching system implemented on a Hadoop cluster. MAXIM uses a three-stage process: preparation, blocking, and matching. The preparation stage extracts and indexes the data. The blocking stage generates candidate record pairs. The matching stage applies similarity functions to the candidate pairs.
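The three stages can be illustrated with a small, single-process Python sketch. This is not MAXIM's implementation, and it does not use Hadoop or ChuQL. The record layout (`id` and `name` fields), the blocking key (the first three characters of the first name token), the Jaccard similarity function, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def prepare(records):
    """Preparation: extract the compared field and index it by record id.

    Assumes each record is a dict with hypothetical 'id' and 'name' keys.
    """
    return {rec["id"]: rec["name"].lower().split() for rec in records}


def block(index, key_len=3):
    """Blocking: group records by a cheap key and emit candidate pairs.

    The key (prefix of the first token) is an illustrative assumption.
    """
    blocks = defaultdict(list)
    for rec_id, tokens in index.items():
        key = tokens[0][:key_len] if tokens else ""
        blocks[key].append(rec_id)
    pairs = []
    for ids in blocks.values():
        # Only records that share a block are ever compared.
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)


def jaccard(a, b):
    """Token-set Jaccard similarity, used here as a stand-in similarity function."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def match(index, pairs, threshold=0.5):
    """Matching: keep the candidate pairs whose similarity reaches the threshold."""
    return [(a, b) for a, b in pairs if jaccard(index[a], index[b]) >= threshold]


def run_pipeline(records, threshold=0.5):
    """Run preparation, blocking, and matching in sequence."""
    index = prepare(records)
    return match(index, block(index), threshold)


if __name__ == "__main__":
    sample = [
        {"id": 1, "name": "Acme Corp"},
        {"id": 2, "name": "Acme Corp Inc"},
        {"id": 3, "name": "Zenith Ltd"},
    ]
    print(run_pipeline(sample))
```

Blocking is what keeps the matching stage tractable: similarity is computed only within each block, not across all record pairs. In a MapReduce setting the blocking key would plausibly serve as the shuffle key, though that mapping is an assumption here and is not stated in the summary.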