Isabel Segura-Bedmar, Paloma Martínez, María Herrero-Zazo
Universidad Carlos III de Madrid, SPAIN
SemEval-2013 Task 9:
Extraction of Drug-Drug Interactions
from Biomedical Texts
Outline
• Motivation
• Previous Work: DDIExtraction 2011
• New in DDIExtraction 2013
• The DDI corpus
• Tasks
  • Task 9.1: Drug Name Recognition and Classification
  • Task 9.2: Drug-Drug Interaction Extraction
• Conclusions
Motivation
What is a Drug-Drug Interaction (DDI)?
• A DDI occurs when one drug influences the level or the activity of another drug.
• A DDI can be beneficial, but most DDIs are dangerous for patients and can increase healthcare costs.
• The medical literature is the most effective source for detecting DDIs.
Information Extraction
We thank the team at the Humboldt-Universitaet zu Berlin for making available a visualization of the DDI corpus using Stav:
http://corpora.informatik.hu-berlin.de/, https://github.com/TsujiiLaboratory/stav
Previous Work: DDIExtraction 2011
• Automatic extraction of drug-drug interactions from texts.
• Dataset: a collection of 579 documents from DrugBank.
  • DDIs annotated by a pharmacist.
  • Drugs annotated automatically.
• F1 ranged between 0.16 and 0.66.
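For reference, the F1 scores quoted throughout are the harmonic mean of precision and recall. The standard definition is sketched below; the symbols TP, FP and FN (true positives, false positives, false negatives) are introduced here for illustration and do not appear on the slides.

```latex
\[
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2\,P\,R}{P + R}
\]
```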
New in SemEval Task 9
• Task 9.1: Drug Name Recognition and Classification.
• Task 9.2: DDI Detection and Classification.
• The DDI corpus:
  • Almost double the size: 1,025 annotated documents, 18,502 pharmacological substances and 5,028 DDIs.
  • Drugs and DDIs were manually annotated by two pharmacists.
  • Annotation guidelines and inter-annotator agreement are available.
  • Two different text sources: MedLine and DrugBank.
Tasks
• Task 9.1: Drug Name Recognition and Classification.
• Task 9.2: Drug-Drug Interaction Extraction.
Task 9.1 - Drug Classification
• drug type for generic drugs (e.g. heparin, ibuprofen, methotrexate).
• brand type for trade names (e.g. Espidifen, aspirin).
• group type for groups of drugs (e.g. analgesics, anticoagulants).
• drug_n type for active substances not approved for human use (e.g. picrotoxin, heroin).
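As a minimal sketch, the four entity types can be represented as a small enumeration. The class name `DrugEntityType`, the helper `parse_type` and the dictionary `SLIDE_EXAMPLES` are hypothetical names chosen for illustration; only the four labels and the example mentions come from the slide.

```python
from enum import Enum


class DrugEntityType(Enum):
    """The four Task 9.1 entity types, as named on the slide."""
    DRUG = "drug"      # generic drug names
    BRAND = "brand"    # trade names
    GROUP = "group"    # groups of drugs
    DRUG_N = "drug_n"  # active substances not approved for human use


# Example mentions copied from the slide (lower-cased for lookup).
# This mapping is illustrative only, not an excerpt from the DDI corpus.
SLIDE_EXAMPLES = {
    "heparin": DrugEntityType.DRUG,
    "ibuprofen": DrugEntityType.DRUG,
    "methotrexate": DrugEntityType.DRUG,
    "espidifen": DrugEntityType.BRAND,
    "aspirin": DrugEntityType.BRAND,
    "analgesics": DrugEntityType.GROUP,
    "anticoagulants": DrugEntityType.GROUP,
    "picrotoxin": DrugEntityType.DRUG_N,
    "heroin": DrugEntityType.DRUG_N,
}


def parse_type(label: str) -> DrugEntityType:
    """Convert a label string into an enum member; unknown labels raise ValueError."""
    return DrugEntityType(label.strip().lower())
```

Using an enumeration rather than free strings means a misspelled label fails loudly instead of silently creating a fifth class.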
Task 9.1 - Teams
Team | Affiliation | Approach
LASIGE | Lisbon University | Conditional Random Fields
UEM_UC3M | European University, Carlos III University of Madrid | Ontology-based approach
UMCC_DLSI | Matanzas University, Alicante University | J48 classifier
Uturku | Turku University | SVM classifier (TEES system)
WBI | Humboldt University of Berlin | Conditional Random Fields
Task 9.1 Evaluation
• Recognition (regardless of the type):
  • Exact-boundary matching (EXACT).
  • Partial-boundary matching (PARTIAL).
• Recognition and classification:
  • Exact-boundary + type matching (STRICT).
  • Partial-boundary + type matching (TYPE).
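The four matching criteria can be sketched as predicates over character spans. This is an illustrative reading of the slide, not the official scorer: the `Mention` class, the half-open offset convention and the interpretation of "partial boundary" as any character overlap are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    start: int  # offset of the first character (inclusive)
    end: int    # offset just past the last character (exclusive)
    type: str   # "drug", "brand", "group" or "drug_n"


def same_span(gold: Mention, pred: Mention) -> bool:
    """True when both boundaries coincide."""
    return gold.start == pred.start and gold.end == pred.end


def overlaps(gold: Mention, pred: Mention) -> bool:
    """True when the two spans share at least one character (assumed meaning of 'partial')."""
    return gold.start < pred.end and pred.start < gold.end


def is_match(gold: Mention, pred: Mention, criterion: str) -> bool:
    """Apply one of the four criteria named on the slide."""
    same_type = gold.type == pred.type
    if criterion == "EXACT":
        return same_span(gold, pred)
    if criterion == "PARTIAL":
        return overlaps(gold, pred)
    if criterion == "STRICT":
        return same_span(gold, pred) and same_type
    if criterion == "TYPE":
        return overlaps(gold, pred) and same_type
    raise ValueError(f"unknown criterion: {criterion}")


# Gold mention "oral anticoagulants" versus a prediction covering only "anticoagulants".
gold = Mention(0, 19, "group")
pred = Mention(5, 19, "group")
results = {c: is_match(gold, pred, c) for c in ("EXACT", "PARTIAL", "STRICT", "TYPE")}
```

In this example the prediction has the right type but a shorter span, so it counts under PARTIAL and TYPE but not under EXACT or STRICT.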
Task 9.1 - Overview of the results
• Groups and substances not approved for human use are more difficult than drugs and brands:
  • brand names: short and unique.
  • generic names: unambiguous because they are simplified chemical names.
  • group names: can be ambiguous (e.g. anticoagulant, anti-retroviral).
  • group names: many variants and abbreviations.
Task 9.1 - Overview of the results
• The drug_n type was the most difficult type:
  • very scarce in DrugBank (less than 1%).
  • less clearly defined in the guidelines.
  • systems are able to identify these substances, but fail to classify them.
Task 9.2: Drug-Drug Interaction (DDI) Extraction
• Gold annotations for drugs are provided to teams for both the training and test datasets.
• Detect DDIs and classify them.
[Figure: example sentences with annotated interactions labelled EFFECT, EFFECT and MECHANISM]
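Because the drug mentions are given, one common way to frame detection is to treat every pair of mentions in a sentence as a candidate interaction and decide each pair separately. The slides do not prescribe this framing; the sketch below assumes it, and `candidate_pairs` is a hypothetical helper name.

```python
from itertools import combinations


def candidate_pairs(mentions):
    """Return every unordered pair of drug mentions found in one sentence."""
    return list(combinations(mentions, 2))


# Mentions from the slide's example sentence:
# "Additive CNS depression may occur when antihistamines are administered with barbiturates."
slide_pairs = candidate_pairs(["antihistamines", "barbiturates"])

# A sentence with three mentions (placeholder names) yields three candidates.
three_way = candidate_pairs(["drug_A", "drug_B", "drug_C"])
```

A sentence with n mentions produces n(n-1)/2 candidates, so long sentences with many drugs generate many pairs, most of which are typically not interactions.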
DDI Classification
• mechanism type for interactions describing the way the interaction occurs.
  Example: Lansoprazole may decrease the absorption of enoxacin.
• effect type for interactions describing the consequence of the interaction.
  Example: Additive CNS depression may occur when antihistamines are administered with barbiturates.
• advice type for interactions describing a recommendation or advice.
  Example: Patients taking isoniazid and disulfiram concomitantly should be closely monitored.
• int type for mentions of interactions without any additional information.
  Example: Clopidogrel interacts with omeprazole.
Task 9.2 Teams
Team | Affiliation | Approach
FBK-irst | FBK-irst, Italy | Hybrid kernel + scope of negations and semantic roles
NIL_UCM | Complutense University of Madrid, Spain | SVM classifier
SCAI | Fraunhofer SCAI, Germany | SVM classifier
UC3M | Carlos III University of Madrid, Spain | Shallow Linguistic Kernel
UCOLORADO_SOM | University of Colorado, School of Medicine, USA | SVM classifier
Uturku | Turku University, Finland | SVM classifier (TEES system)
UWM_TRIADS | University of Wisconsin, USA | Two-stage SVM
WBI_DDI | Humboldt University of Berlin | Ensemble of kernels
Task 9.2 - Results
[Figure: bar chart of the best F1 per dataset. DrugBank: 0.827 (detection) and 0.676 (detection + classification). MedLine: 0.53 (detection) and 0.42 (detection + classification).]
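The gap between the two bars for each dataset follows from how a prediction is counted. The sketch below is a simplified scorer, not the official evaluation script: it assumes each candidate pair has a gold label that is either a DDI type or `None`, and that a wrongly typed interaction counts as both a false positive and a false negative when the type is required.

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts (0.0 when a ratio is undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def score(gold, pred, require_type):
    """Score predictions keyed by pair id; a label of None means 'no interaction'."""
    tp = fp = fn = 0
    for pair_id, g in gold.items():
        p = pred.get(pair_id)
        if p is not None:
            # Detection only needs the pair to be a real DDI; classification also needs the type.
            if g is not None and (not require_type or p == g):
                tp += 1
            else:
                fp += 1
        if g is not None and (p is None or (require_type and p != g)):
            fn += 1
    return prf(tp, fp, fn)


# Toy data (hypothetical pair ids and labels).
gold = {"p0": "mechanism", "p1": "effect", "p2": None, "p3": "advice"}
pred = {"p0": "mechanism", "p1": "mechanism", "p2": "int", "p3": None}

detection = score(gold, pred, require_type=False)
classification = score(gold, pred, require_type=True)
```

On the toy data, pair `p1` is detected but given the wrong type, so it helps the detection score and hurts the classification score, which is why the second figure can never exceed the first.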
Task 9.2 - Overview of the results
• Detection: significant improvement over 2011: 66% F1 (2011) vs. 82% F1 (2013).
• In DrugBank:
  • The int DDI type is the most difficult (54% F1).
  • Mechanism, effect and advice types show similar F1 (70%).
• In MedLine, results for the effect and mechanism types are considerably lower due to the complexity of the sentences describing these DDIs.
• Non-linear kernel-based methods outperform linear SVMs.
Conclusion
• 13 teams from 7 different countries.
• In both tasks, the results on DrugBank are considerably better than the ones on MedLine.
• Best F1:
Dataset | Task 9.1 Drug NERC: Recognition | Task 9.1 Drug NERC: Recognition + Classification | Task 9.2 Extraction of DDIs: Detection | Task 9.2 Extraction of DDIs: Detection + Classification
DrugBank | 90% | 87% | 82% | 67%
MedLine | 80% | 58% | 53% | 42%
Conclusion
• Task 9.1:
  • Best system (WBI): conditional random field + the training dataset extended with the test dataset for Task 9.2.
  • Most difficult: groups and drug_n.
• Task 9.2:
  • There is much room for improvement.
Future of the task
• Include new types of texts:
  • prescription drug documents,
  • health records,
  • texts from social media about DDIs and adverse drug events.
• No plans for annotating new documents.
• Goal of the next DDIExtraction:
  • Create a silver standard DDI corpus.
  • Annotate effect, mechanism, drug dosages, etc.
  • Similar to the CALBC challenge.
Acknowledgments
• This work was supported by the Regional Government of Madrid under the Research Network MA2VICMR [S2009/TIC-1542] and by the Spanish Ministry of Education under the project MULTIMEDICA [TIN2010-20644-C03-01].
• To all participants, for their efforts and their interesting work.
• To the Uturku team, who provided TEES analyses for the training and test datasets.
• To the WBI team, who made available a visualization of the DDI corpus using Stav.
Thanks!!!


Editor's Notes

  • #2: Hello, my name is Isabel Segura-Bedmar, from the University Carlos III of Madrid, and I am here with Paloma Martínez. We organized Task 9, on the extraction of drug-drug interactions from biomedical texts.
  • #3: Now I am going to describe the task, the participating systems and their results.
  • #4: I will start by explaining what a drug-drug interaction is. An interaction occurs when a drug influences the level or the activity of another drug. Unfortunately, many interactions are very dangerous. Clinicians use drug databases to avoid DDIs; however, these databases are not comprehensive, because many interactions are only described in medical journals or in drug safety reports.
  • #5: We think that Information Extraction can help to improve the early detection of drug interactions and to reduce the time healthcare professionals spend reading all the published information about DDIs. For example, in these sentences, an IE system could identify the drugs (marked in color) and then extract their interactions (the pairs of drugs connected with the DDI label).
  • #6: In 2011, we organized a first challenge to promote research on this topic. With this goal, we created a corpus annotated with DDIs by a pharmacist. However, drugs were annotated by the MetaMap tool without any manual review. Ten teams participated in the task, and the F1-measures ranged between 0.16 and 0.66.
  • #7: So, what is new? In this new challenge, we propose a new task, the recognition and classification of pharmacological substances. The initial task, detection of DDIs, is extended to include the classification of DDIs. Moreover, we have improved the DDI corpus. In particular, we have increased its size to almost double. Also, in this new version, both drugs and DDIs are manually annotated by two different pharmacists. We created annotation guidelines and also measured the inter-annotator agreement. The new corpus contains MedLine abstracts in addition to the DrugBank documents.
  • #8: Now I am going to describe the first task. In this task, systems should be able to identify the drugs and then classify them with their correct types.
  • #9: In particular, we propose four different types of pharmacological substances: the drug type for generic drugs (for example, heparin); the brand type for commercial names (for example, aspirin); the group type for groups of drugs (for example, analgesics); and the drug_n type for active substances not approved for human use (for example, heroin).
  • #10: Five teams from five different countries participated. Most teams used supervised methods such as CRF, SVM or J48 classifiers. Only one team developed a system based on dictionaries and rules to classify the pharmacological substances. The best team was the Humboldt University of Berlin, whose system was based on CRF.
  • #11: We proposed four different evaluations. The first two, exact and partial boundary matching, evaluate the drug recognition task, that is, without considering the types. The latter two evaluate the overall task, recognition and classification. Unfortunately, we do not have enough time to discuss each evaluation, so in this presentation we will only present the results for the overall task.
  • #12: We also calculated the F-measure for each entity type to find out which types of substances are more difficult than others. The results showed that groups and substances not approved for human use are more difficult than drugs and brands. This may be because brand names are usually short and unique and, in general, generic drugs are not ambiguous because they are simplified chemical names and are therefore unique. On the other hand, group names have many variants and abbreviations, and can also be ambiguous (for example, anticoagulant is both a group and its effect).
  • #13: Finally, substances not approved for human use are the most difficult type. This is because this type is very scarce in DrugBank, and therefore there are few examples for training. We also think that the definition of this type may be somewhat unclear. Indeed, the systems were able to identify these substances, but failed to classify them.
  • #14: Now I am going to describe the second task, the extraction of DDIs.
  • #15: In this task, the gold annotations for pharmacological substances are provided for the training and test datasets.
  • #16: Then, systems should be able to detect which pairs of drugs are interacting.
  • #17: And then to classify them.
  • #18: We propose four types to classify DDIs: the mechanism type for DDIs describing the way the interaction occurs; the effect type for DDIs describing the consequence of the interaction; the advice type for DDIs providing a recommendation to avoid a possible interaction; and the int type for sentences describing an interaction without providing any additional information.
  • #19: Eight teams participated in the task. Most systems were built on SVMs. Three of them used non-linear kernel methods, and the rest used linear SVMs. In DrugBank, the two best teams were FBK-irst, from Italy, and the team from the Humboldt University of Berlin. These two teams were also the two best teams in 2011. In MedLine, the best team was the Fraunhofer SCAI team.
  • #20: Here you can see the results for the detection task (bars in dark blue) and for the overall task, detection and classification of DDIs (bars in light blue). The overall task is more difficult than detection alone. For example, in DrugBank, the best F1 was 82% for detection and only 67% for the overall task, a difference of 15 points between the two tasks. The best team was FBK-irst. In MedLine, the best F1 was 53% for detection and only 42% for the overall task. We can conclude that the extraction of DDIs is more difficult on MedLine than on DrugBank.
  • #21: Let me highlight some of the results. First, there is an important improvement in the results of the detection task over 2011: the best F1 in 2011 was only 66% and now it is 82%, an increase of about 16 points. In DrugBank, the most difficult type is int because it is the least frequent type, and therefore there are few examples for training. The rest of the types show similar performance, around 70% F1. We think the mechanism and effect types reach this level because they are the most common types, and therefore have more examples for training. Advice, on the other hand, is less frequent; however, sentences describing advice usually follow very similar text patterns. In MedLine, the results are lower mainly due to the complexity of the sentences. As in 2011, kernel methods outperform linear classifiers.
  • #22: The main conclusions of the task are the following. A total of 13 teams from 7 different countries participated in the two tasks: 5 in drug name recognition and 8 in the extraction of DDIs. Both tasks are more difficult for MedLine texts than for DrugBank texts. In Task 9.1, the difference in F1 between the two datasets is almost 30 points. We think that this is because DrugBank contains very few instances of the most difficult type, substances not approved for human use. In the detection and classification of DDIs, the difference is about 25 points (67% versus 42%). We think that this is because DrugBank sentences are shorter and simpler, and therefore less difficult to process than MedLine sentences, which are usually long and complex.
  • #23: Finally, let me remind you that, in the first task, the best system was based on CRF and used the training dataset extended with the test dataset for the DDI extraction task. In general, the systems obtain good performance on DrugBank. In the second task, however, there is much room for improvement, particularly on MedLine, because the best F1 was only 42% for detection and classification of DDIs.
  • #24: We would like to improve the task in several ways, for example by including new types of documents such as health records or prescription drug documents. However, we have no plans to annotate more documents because annotation is very expensive. For this reason, we plan to organize a third challenge in which the participating systems collaborate to create a silver corpus. In this new challenge, we would also like to annotate additional features (such as the effect, the mechanism or the drug dosages) because they are very important for determining the clinical significance of a DDI. We think that this new challenge can be similar to the CALBC challenge.
  • #25: The task was funded by the MULTIMEDICA and MAVIR projects. We would like to thank all teams and, in particular, two of them: the Uturku team, who provided TEES analyses for the datasets, and the team from the Humboldt University of Berlin, who provided a visualization of the DDI corpus using the Stav tool.
  • #26: Thank you for your attention. We are happy to take any questions or comments.