Supporting clinical trial data
curation and integration
with table mining
Nikola Milosevic1, Cassie Gregson3, Robert Hernandez3, Goran Nenadic1,2
1School of Computer Science, University of Manchester
2 The Farr Institute @HeRC
3AstraZeneca
Clinical trial publications
• Around 800 000 clinical trials in PubMed
• Difficult to digest/search
• Text mining approaches
• But tables and figures are often not processed
Tables in publications
• Present factual information
• Usually:
• Experimental settings (e.g. demographics)
• Findings and results (e.g. drug-drug interactions, side effects, adverse events…)
• Background information (previous research, datasets, etc.)
• Examples
• Important information about trials
Extraction and curation of table data
Challenges
• Complex structure
• Table dimensionality (1, 2, multi-dimensional)
• Visual relationships
• Dense content
• Ambiguous short text
• Lack of context
• Acronyms and abbreviations
• Incomplete information
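Abbreviations in table cells are often defined in the table's footer, for example "BMI, body mass index; SD, standard deviation". One way to recover some of that missing context is to parse the footer legend into a dictionary and expand cell text token by token. This is a minimal sketch assuming footers use "ABBR, expansion" entries separated by semicolons and that abbreviations are single word-like tokens (so something like "LDL-C" would not be handled). `parse_abbreviations` and `expand` are hypothetical helpers, not part of the presented system.

```python
import re
from typing import Dict

# Hypothetical footer format: "ABBR, expansion" (or "=" / ":") entries separated by ";".
ENTRY = re.compile(r"^\s*(\w+)\s*[,=:]\s*(.+?)\s*\.?\s*$")
# Word-like tokens in a cell; only these are candidates for expansion.
TOKEN = re.compile(r"\w+")


def parse_abbreviations(footer: str) -> Dict[str, str]:
    """Build an abbreviation -> expansion map from a table footer legend."""
    result: Dict[str, str] = {}
    for chunk in footer.split(";"):
        match = ENTRY.match(chunk)
        if match:
            result[match.group(1)] = match.group(2)
    return result


def expand(cell: str, abbreviations: Dict[str, str]) -> str:
    """Replace each token that is a known abbreviation by its expansion."""
    return TOKEN.sub(lambda m: abbreviations.get(m.group(0), m.group(0)), cell)
```

For instance, with the footer above, `expand("Mean BMI", ...)` yields "Mean body mass index", while tokens without a footer definition are left untouched.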
Table analysis overview
Table types (1)
• 4 types: list, matrix, super-row and multi-tables
• List table:
Table types (2)
• Matrix table
Table types (3)
• Super-row table
Table types (4)
• Multi-table
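The four table types can be told apart, at least roughly, from the shape of the cell grid. The sketch below is a hypothetical heuristic, not the classifier used in the presented system: it assumes a rectangular grid whose first row is the header, treats a single-column table as a list, a header row that reappears in the body as a sign of stacked sub-tables (multi-table), a body row with only its first cell filled as a spanning group label (super-row), and anything else as a matrix indexed by a row label and a column header.

```python
from typing import List

Grid = List[List[str]]  # rows of cell strings; "" marks an empty cell


def is_super_row(row: List[str]) -> bool:
    """True if only the first cell is filled, as in a spanning group label."""
    return bool(row) and row[0].strip() != "" and all(not c.strip() for c in row[1:])


def classify_table(grid: Grid) -> str:
    """Return "list", "matrix", "super-row" or "multi-table".

    A rough heuristic, assuming grid[0] is the header row and all rows
    have the same length.
    """
    if not grid or not grid[0]:
        raise ValueError("empty table")
    header = [c.strip() for c in grid[0]]
    body = grid[1:]
    # One column: values indexed along a single dimension.
    if len(header) == 1:
        return "list"
    # The header row reappearing in the body suggests stacked sub-tables.
    if any([c.strip() for c in row] == header for row in body):
        return "multi-table"
    # Group-label rows that span the table indicate a super-row table.
    if any(is_super_row(row) for row in body):
        return "super-row"
    # Otherwise: cells indexed by a row label (stub) and a column header.
    return "matrix"
```

Real tables are messier (merged cells, missing values, multi-row headers), so a rule set like this would mainly serve as a baseline.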
Example of decomposition
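Decomposition can be pictured as flattening a two-dimensional table into records that pair each data cell with the labels that give it meaning. A minimal sketch, assuming a clean rectangular grid whose first row holds column headers and whose first column holds row labels (the stub); `decompose` and its record keys are hypothetical names, not the presented system's code.

```python
from typing import Dict, List, Optional


def decompose(grid: List[List[str]]) -> List[Dict[str, Optional[str]]]:
    """Flatten a matrix or super-row table into one record per data cell.

    Assumes grid[0] is the column-header row and column 0 is the stub.
    A row whose only non-empty cell is in column 0 is treated as a
    super-row that groups the rows beneath it (so a data row with every
    value missing would be misread as a super-row).
    """
    header = grid[0]
    records: List[Dict[str, Optional[str]]] = []
    super_row: Optional[str] = None
    for row in grid[1:]:
        stub, values = row[0].strip(), row[1:]
        if stub and all(not v.strip() for v in values):
            super_row = stub  # applies until the next super-row
            continue
        for col_header, value in zip(header[1:], values):
            if value.strip():  # skip empty data cells
                records.append({
                    "super_row": super_row,
                    "stub": stub,
                    "header": col_header.strip(),
                    "value": value.strip(),
                })
    return records
```

For a baseline-characteristics table with a "Demographics" super-row, the "45" in the "Age" row and "Placebo" column becomes `{"super_row": "Demographics", "stub": "Age", "header": "Placebo", "value": "45"}`, which is a convenient shape for later annotation.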
Results
Next steps
• Add semantic annotations
• Link patterns in data cells with their meaning
• Build/Expand knowledge bases
• Relate to existing knowledge on the semantic web
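Linking patterns in data cells to their meaning could start from a catalogue of the notations that recur in clinical-trial tables. The sketch below assumes a small, hypothetical set of regular expressions (mean ± SD, count with percentage, numeric range, plain number) and returns the first matching label together with the numeric parts; the pattern names and output keys are illustrative, not taken from the presented system.

```python
import re
from typing import Dict, Optional

NUM = r"\d+(?:\.\d+)?"  # unsigned integer or decimal

# Hypothetical catalogue of value notations; the first matching pattern wins.
PATTERNS = [
    ("mean ± SD", re.compile(rf"^(?P<mean>-?{NUM})\s*±\s*(?P<sd>{NUM})$")),
    ("count (percentage)", re.compile(rf"^(?P<count>\d+)\s*\(\s*(?P<percent>{NUM})\s*%?\s*\)$")),
    ("range", re.compile(rf"^(?P<low>{NUM})\s*-\s*(?P<high>{NUM})$")),
    ("number", re.compile(rf"^(?P<value>-?{NUM})$")),
]


def interpret(cell: str) -> Optional[Dict[str, object]]:
    """Map a data-cell string to a pattern label plus its numeric parts."""
    text = cell.strip()
    for label, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            parts = {name: float(v) for name, v in match.groupdict().items()}
            return {"pattern": label, **parts}
    return None  # unrecognised notation, e.g. free text or "n/a"
```

Ordering matters: more specific notations are tried before the catch-all "number" pattern, so "12 (34.5%)" is read as a count with a percentage rather than rejected.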
Annotation schema
• Meta-data
• Paper (name, abstract, authors, publisher)
• Authors (names, emails, affiliations)
• Table (caption, footers)
• Cells (content, role)
• Inter-cell relationships
• Semantics (links to ontologies, dictionaries, knowledge bases)
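The annotation schema above maps naturally onto a nested data model. A minimal sketch using Python dataclasses, assuming the fields listed on the slide; the class names, the role labels ("header", "stub", "super-row", "data") and the relationship kinds are hypothetical choices for illustration, not the system's actual storage format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Tuple


@dataclass
class Author:
    name: str
    email: str = ""
    affiliations: List[str] = field(default_factory=list)


@dataclass
class SemanticLink:
    resource: str    # name of an ontology, dictionary or knowledge base
    identifier: str  # concept identifier within that resource


@dataclass
class Cell:
    row: int
    col: int
    content: str
    role: str  # hypothetical labels: "header", "stub", "super-row", "data"
    links: List[SemanticLink] = field(default_factory=list)


@dataclass
class Relationship:
    cell: Tuple[int, int]          # (row, col) of the described cell
    described_by: Tuple[int, int]  # (row, col) of the cell that describes it
    kind: str                      # hypothetical, e.g. "has-header", "has-stub"


@dataclass
class Table:
    caption: str
    footers: List[str] = field(default_factory=list)
    cells: List[Cell] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)


@dataclass
class Paper:
    name: str
    abstract: str
    publisher: str
    authors: List[Author] = field(default_factory=list)
    tables: List[Table] = field(default_factory=list)


def to_json(paper: Paper) -> str:
    """Serialise the whole annotation tree; tuples become JSON arrays."""
    return json.dumps(asdict(paper), indent=2)
```

Keeping inter-cell relationships as explicit records, rather than implying them from cell positions, makes it straightforward to export them later as triples for the semantic web.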
Summary
• Tables contain valuable information such as settings or results
• System for extraction and curation of table data
• Decomposition and annotation of the tables
• Accuracy of 85%
• Semantic analysis and information extraction
nikola.milosevic@manchester.ac.uk