This document summarizes a system for extracting and curating data from tables in clinical trial publications. It describes why extracting information from such tables is difficult: their structures are complex and their content is dense and often ambiguous. The system decomposes each table according to its structural type: list, matrix, super-row, or multi-table. It then annotates the decomposed tables with metadata and semantics by linking data cells to existing knowledge bases. In evaluation, table decomposition achieved 85% accuracy. Future work will add further semantic annotations and extend linking to more knowledge bases.
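To make "decomposition" concrete, here is a minimal sketch of how a matrix-type table could be broken into cell-level records, each pairing a value with its row header and column header so the cell can later be annotated or linked. This is not the system's own method. The function name `decompose_matrix`, the assumption that row 0 holds column headers and column 0 holds row headers, and the toy table contents are all hypothetical and chosen for illustration only.

```python
from typing import List, Tuple


def decompose_matrix(table: List[List[str]]) -> List[Tuple[str, str, str]]:
    """Turn a matrix table into (row header, column header, value) triples.

    Hypothetical layout assumption: row 0 holds the column headers and
    column 0 holds the row headers; the top-left corner cell is ignored.
    """
    column_headers = table[0][1:]
    triples = []
    for row in table[1:]:
        row_header, values = row[0], row[1:]
        # Pair each data cell with the column header above it.
        for col_header, value in zip(column_headers, values):
            triples.append((row_header, col_header, value))
    return triples


# Invented toy table (not data from any trial).
example = [
    ["", "Arm 1", "Arm 2"],
    ["Patients (n)", "10", "12"],
    ["Mean age (years)", "50", "52"],
]

for triple in decompose_matrix(example):
    print(triple)
```

Once each value carries its own headers, a later annotation step can attach metadata or knowledge-base links to the individual cell rather than to the table as a whole. Other table types (super-row, multi-table) would need their own layout rules.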