IEEE International Conference on Intelligence and Security Informatics: Public Safety and Security, ISI 2010
23-26 May, Vancouver, BC, Canada
Computational Knowledge & Information Management in Veterinary Epidemiology
Svitlana Volkova and William H. Hsu
Laboratory for Knowledge Discovery in Databases, Department of Computing and Information Sciences, Kansas State University
Sponsors: K-State National Agricultural Biosecurity Center (NABC); US Department of Defense
Agenda
Overview
Animal Disease Monitoring Systems
Manually Supported Web Interfaces
Automated Web Services
Framework for Epidemiological Analytics
Web Crawling & Search
Domain-specific Entity Extraction
Animal Disease-related Event Recognition
Summary
Animal Infectious Disease Outbreaks
influence travel and trade
cause economic crises and political instability
can cause loss of human life when the disease is zoonotic
Agenda
Overview
Animal Disease Monitoring Systems
Manually Supported Web Interfaces
Automated Web Services
Framework for Epidemiological Analytics
System Functionality
Web Crawling
Domain-specific Entity Extraction
Animal Disease-related Event Recognition
Summary
Animal Disease Monitoring Systems: Manually Supported Web Interfaces (1)
International:
World Animal Health Information Database (WAHID) Interface - http://www.oie.int/wahis/public.php?page=home
WHO Global Atlas of Infectious Diseases - http://diseasemaps.usgs.gov/index.htm
Emergency Prevention System (EMPRES) for Transboundary Animal and Plant Pests and Diseases - http://www.fao.org/EMPRES/default.html
Animal Disease Monitoring Systems: Manually Supported Web Interfaces (2)
USA:
Centers for Disease Control and Prevention (CDC) - http://www.cdc.gov
U.S. Department of Agriculture (USDA) - http://www.usda.gov/wps/portal/usdahome
U.S. Geological Survey (USGS) and its National Wildlife Health Center (NWHC) - http://www.nwhc.usgs.gov
Iowa State University Center for Food Security and Public Health (CFSPH) - http://www.cfsph.iastate.edu
BioPortal - http://biocomputingcorp.com/bpsystem.html
FMD BioPortal - https://fmdbioportal.ucdavis.edu
United Kingdom:
Department for Environment, Food and Rural Affairs (DEFRA) - http://www.defra.gov.uk
Agenda
Overview
Animal Disease Monitoring Systems
Manually Supported Web Interfaces
Automated Web Services
Framework for Epidemiological Analytics
System Functionality
Web Crawling
Domain-specific Entity Extraction
Animal Disease-related Event Recognition
Summary
Animal Disease Monitoring Systems: Automated Web Services (1)
BioCaster - http://biocaster.nii.ac.jp/
follows 1500 RSS feeds hourly
classifies documents as topically relevant or not
taxonomy of 4300 named entities (50 disease names, 243 country names, 4025 province/city names, latitudes and longitudes)
identifies 40 diseases at up to 25-30 locations per day
multilingual information extraction for English, French, Spanish, Chinese, Thai, Vietnamese, and Japanese
uses ontology pattern matching approaches to recognize disease-location-verb pairs
plots events on a Google Map
does not classify events into categories and does not report past outbreaks
no timeline visualization
BioCaster - http://biocaster.nii.ac.jp/
Animal Disease Monitoring Systems: Automated Web Services (2)
Information retrieval system MedISys - http://medusa.jrc.it/medisys/homeedition/all/home.html
Pattern-based Understanding and Learning System (PULS) - http://sysdb.cs.helsinki.fi/puls/jrc/all
allows automated recognition of metadata and structured facts related to disease outbreaks
collects on average 50000 news articles per day from about 1400 news portals and about 150 specialized public health sites
43 languages
current ontology contains 2400 disease names, 400 organisms, 1500 political entities, and over 70000 location names, including towns, cities, and provinces
real-time news clustering and filtering by matching 3000 patterns
does not classify events and does not report past outbreaks
MedISys - http://medusa.jrc.it/medisys/homeedition/all/home.html
*part of the Europe Media Monitor (EMM) product family - http://emm.jrc.it/overview.html
Pattern-based Understanding and Learning System (PULS)
Animal Disease Monitoring Systems: Automated Web Services (3)
HealthMap - http://healthmap.org/en
aggregates articles from Google News and the ProMED-Mail portal
2300 locations and 1100 disease names
identifies 20-30 outbreaks per day
multiple languages: English, Russian, Arabic, French, Portuguese, Spanish, Chinese
manually supported system
EpiSpider - http://www.epispider.org/
combines emerging infectious disease data from:
ProMED-Mail - www.promedmail.org
The Global Disaster Alert and Coordination System (GDACS) - www.gdacs.org
Central Intelligence Agency (CIA) World Factbook - https://www.cia.gov/library/publications/the-world-factbook/
The United Nations Human Development Report sites - http://hdr.undp.org/en
HealthMap - http://healthmap.org/en
ProMED-Mail - www.promedmail.org
EpiSpider - http://www.epispider.org/
Animal Disease-related Data Online
Structured data: official reports by different organizations (state and federal laboratories, bioportals; health care providers; governmental agricultural or environmental agencies)
Unstructured data: web pages, news, e-mails (e.g., ProMED-Mail), blogs, medical literature (e.g., books), scientific papers (e.g., PubMed)
Research Challenges (1)
Large amount of information from multiple sources
Extract facts/structured information from unstructured text
Manage the specific characteristics of each content type (e.g., blogosphere, biomedical literature, news, official reports)
Aggregate data from multiple sources
Estimate event confidence
Solution: assess source reliability by majority voting
Multiple locations, different case statuses, and different victim types lead to multiple database entries
Solution: spurious event detection and event disambiguation, e.g., source 1 reports 10 victims vs. source 2 reports 15 victims
Research Challenges (2)
Disambiguate locations: “Rabies in Isle of Wight”
Which geo-tag: Virginia, USA, or the UK?
Solution: use the geo-tag of the original source of information

Deal with unknown or undiagnosed diseases: “The deadly outbreak has so far killed 16 people in Gabon”
Solution: look into the context and recent outbreaks in this location
“FMD outbreak was reported last week/today…”
Solution: use a set of regular expressions and a date/time ontology
Existing Systems vs. Designed System (1)
[comparison table: Existing Systems | Designed System]
Existing Systems vs. Designed System (2)
[comparison table: Existing Systems | Designed System]
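Research Challenges (1) and (2) propose majority voting across sources and using the original source's geo-tag to disambiguate locations. A minimal Python sketch of both ideas follows, assuming each source reports one value per event attribute; `majority_vote`, `disambiguate_location`, and the toy `GAZETTEER` (standing in for a real gazetteer such as GNS) are hypothetical illustrations, not the system's code.

```python
from collections import Counter

def majority_vote(reports):
    """Resolve one event attribute reported by several sources.

    `reports` maps a source name to the value it reported. Returns the most
    common value and the fraction of sources that agree with it. On a tie,
    Counter.most_common keeps the value encountered first.
    """
    if not reports:
        raise ValueError("no reports to vote on")
    value, votes = Counter(reports.values()).most_common(1)[0]
    return value, votes / len(reports)

# Hypothetical toy gazetteer: place name -> (canonical name, ISO 3166 country code).
GAZETTEER = {
    "Isle of Wight": [
        ("Isle of Wight, England", "GB"),
        ("Isle of Wight County, Virginia", "US"),
    ],
}

def disambiguate_location(name, source_country):
    """Choose the candidate whose country matches the reporting source's geo-tag."""
    candidates = GAZETTEER.get(name, [])
    if len(candidates) == 1:
        return candidates[0][0]
    for canonical, country in candidates:
        if country == source_country:
            return canonical
    return None  # unknown or still ambiguous: leave for manual review

if __name__ == "__main__":
    print(majority_vote({"source 1": 10, "source 2": 15, "source 3": 15}))
    # -> (15, 0.6666666666666666)
    print(disambiguate_location("Isle of Wight", "US"))
    # -> Isle of Wight County, Virginia
```

With only the two disagreeing sources from the slide (10 vs. 15 victims) the vote is a tie, which this sketch breaks in favor of the first source; a real system would need a better tie-breaker, such as per-source reliability weights.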
Target Audience
Research and public health communities
Health care providers (e.g., hospitals)
Governmental agencies (e.g., Centers for Disease Control and Prevention)
Laboratories
1. Managing the specificity of the blogosphere
2. Dealing with biomedical literature
3. Processing news content and official reports
4. Capturing all possible breakdowns in communication channels between levels of animal disease management: state, national, international
Agenda
Overview
Animal Disease Monitoring Systems
Manually Supported Web Interfaces
Automated Web Services
Framework for Epidemiological Analytics
System Functionality
Web Crawling
Domain-specific Entity Extraction
Animal Disease-related Event Recognition
Summary
Framework for Epidemiological Analytics
1. Data Collection (1)
Periodically crawl the web using the Heritrix crawler - http://crawler.archive.org/
set of seeds (ProMED-Mail, DEFRA, etc.)
set of terms (animal disease names from the ontology)
Text-to-tag ratio-based method for content extraction from web pages
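The text-to-tag ratio (TTR) idea is that lines of a page carrying much text relative to their markup are likely to be article content. The sketch below computes a per-line ratio and keeps lines above the page mean; that threshold rule is an assumption, and the smoothing and clustering steps of published TTR methods are omitted.

```python
import re
import statistics

TAG_RE = re.compile(r"<[^>]*>")
SCRIPT_STYLE_RE = re.compile(r"<(script|style)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL)

def line_ratios(html):
    """Return (text-to-tag ratio, visible text) for every line of an HTML page."""
    html = SCRIPT_STYLE_RE.sub("", html)  # script/style bodies are never content
    rows = []
    for line in html.splitlines():
        tags = len(TAG_RE.findall(line))
        text = TAG_RE.sub("", line).strip()
        # A line with no tags gets its raw text length as its ratio.
        rows.append((len(text) if tags == 0 else len(text) / tags, text))
    return rows

def extract_content(html):
    """Keep non-empty lines whose ratio exceeds the page mean (assumed threshold)."""
    rows = line_ratios(html)
    if not rows:
        return []
    threshold = statistics.mean(ratio for ratio, _ in rows)
    return [text for ratio, text in rows if ratio > threshold and text]
```

Navigation lines such as a row of links score low because each short anchor text is surrounded by two tags, while a news paragraph wrapped in a single `<p>` scores high.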
1. Data Collection (2)
[diagram components: WWW, Email, Literature, Crawler, DB, Document Collection, Query]
2. Data Sharing
Document relevance classification (relevant vs. non-relevant) using the Naive Bayes classifier from Mallet - http://mallet.cs.umass.edu
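The relevance filter above uses Mallet's Naive Bayes classifier, which is a Java toolkit. Purely as an illustration of the technique, here is a from-scratch multinomial Naive Bayes with add-one smoothing in Python; it is not Mallet's API, and the tiny training set and the `relevant`/`non-relevant` labels are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {label: sum(c.values()) for label, c in self.word_counts.items()}
        return self

    def log_posterior(self, doc, label):
        """Unnormalized log P(label | doc)."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.totals[label] + len(self.vocab)
        for w in tokenize(doc):
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[label][w] + 1) / denom)
        return score

    def predict(self, doc):
        return max(self.class_counts, key=lambda label: self.log_posterior(doc, label))

if __name__ == "__main__":
    docs = [
        "FMD outbreak confirmed on cattle farm",
        "Avian influenza outbreak kills poultry",
        "Rabies cases reported in foxes",
        "Stock market closes higher on tech rally",
        "Local team wins championship game",
        "New smartphone released with better camera",
    ]
    labels = ["relevant"] * 3 + ["non-relevant"] * 3
    clf = NaiveBayes().fit(docs, labels)
    print(clf.predict("Anthrax outbreak confirmed in sheep"))  # -> relevant
```

Summing log-probabilities instead of multiplying raw probabilities avoids floating-point underflow on long documents.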
3. Search
Lucene-based* ranking
Query-based keyword search
Search by animal disease name and/or location
*Lucene - http://lucene.apache.org
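Lucene's own scoring formula is not reproduced here. The sketch below only illustrates the general idea of ranking keyword matches by tf-idf with optional disease and location filters; the document schema (`text`, `disease`, `location` keys) and the smoothed idf `log(1 + N/df)` are assumptions.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_search(docs, query, disease=None, location=None):
    """Return texts of matching docs, best tf-idf score first.

    docs: list of dicts with hypothetical "text", "disease", "location" keys.
    """
    n = len(docs)
    term_freqs = [Counter(tokenize(d["text"])) for d in docs]
    doc_freq = Counter(term for tf in term_freqs for term in tf)
    results = []
    for doc, tf in zip(docs, term_freqs):
        if disease and doc["disease"] != disease:
            continue  # structured filter on the extracted disease name
        if location and doc["location"] != location:
            continue  # structured filter on the extracted location
        score = sum(tf[t] * math.log(1 + n / doc_freq[t])
                    for t in tokenize(query) if t in tf)
        if score > 0:
            results.append((score, doc["text"]))
    results.sort(key=lambda r: r[0], reverse=True)
    return [text for _, text in results]
```

Filtering on structured fields before scoring mirrors the slide's "search by animal disease name and/or location" alongside free keyword search.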
4. Data Analysis
Event example: “On 12 September 2007, a new foot-and-mouth disease outbreak was confirmed in Egham, Surrey”
Domain Metadata
Domain-independent knowledge:
Location hierarchy: names of countries, states, cities
Time hierarchy: canonical dates
Domain-specific knowledge:
Medical ontology: diseases, serotypes, and viruses
Event Recognition Methodology
Step 1. Recognize entities in raw text.
Step 2. Classify each sentence from which entities were extracted as event-related or not; classify event-related sentences as confirmed or suspected.
Step 3. Combine the entities within each event sentence into structured tuples, and aggregate the tuples related to the same event into one comprehensive tuple.
Step 1. Entity Recognition
Locate and classify atomic elements into predefined categories:
Disease names: “foot and mouth disease”, “rift valley fever”; viruses: “picornavirus”; serotypes: “Asia-1”;
Species: “sheep”, “pigs”, “cattle”, and “livestock”;
Locations of events, specified at different levels of geo-granularity: “United Kingdom”, “eastern provinces of Shandong and Jiangsu, China”;
Dates in different formats: “last Tuesday”, “two months ago”.
Entity Recognition Tools
Animal Disease Extractor*: relies on a medical ontology automatically enriched with synonyms and causative viruses.
Species Extractor*: pattern matching on a stemmed dictionary of animal names from Wikipedia.
Location Extractor: Stanford NER Tool** (uses conditional random fields); NGA GEOnet Names Database (GNS)*** for location disambiguation and retrieving latitude/longitude.
Date/Time Extractor: a set of regular expressions.
*KDD KSU DSEx - http://fingolfin.user.cis.ksu.edu:8080/diseaseextractor/
**Stanford NER - http://nlp.stanford.edu/ner/index.shtml
***GNS - http://earth-info.nga.mil/gns/html/
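The Species Extractor and Date/Time Extractor are described as dictionary matching on stems and a set of regular expressions. A minimal sketch of both follows, assuming a small hand-picked species list (the real extractor uses animal names from Wikipedia) and a crude plural-stripping rule standing in for a real stemmer such as Porter's; the date patterns cover only the formats quoted on these slides.

```python
import re

# Hypothetical species dictionary of stems; not the Wikipedia-derived list.
SPECIES = {"hog", "pig", "sheep", "cattle", "goat", "livestock", "poultry"}

def crude_stem(word):
    """Strip a plural -s (crude stand-in for a real stemmer)."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def extract_species(text):
    """Return species stems found in text, in order of first mention."""
    found = []
    for token in re.findall(r"[A-Za-z]+", text):
        stem = crude_stem(token)
        if stem in SPECIES and stem not in found:
            found.append(stem)
    return found

MONTHS = (r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|"
          r"Aug(?:ust)?|Sep(?:t(?:ember)?)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)")
DATE_PATTERNS = [
    rf"\b\d{{1,2}} {MONTHS} \d{{4}}\b",  # absolute: "9 Jun 2009", "12 September 2007"
    rf"\b{MONTHS} \d{{4}}\b",            # month and year: "September 2000"
    r"\b(?:today|yesterday|last (?:week|month|year|monday|tuesday|wednesday"
    r"|thursday|friday|saturday|sunday))\b",  # relative: "last week", "last Tuesday"
    r"\b(?:\d+|one|two|three|four|five|six|seven|eight|nine|ten) "
    r"(?:day|week|month|year)s? ago\b",       # relative: "two months ago"
]
DATE_RE = re.compile("|".join(f"(?:{p})" for p in DATE_PATTERNS), re.IGNORECASE)

def extract_dates(text):
    """Return date/time expressions in order of appearance."""
    return [m.group(0) for m in DATE_RE.finditer(text)]
```

Relative expressions such as "last week" still need to be resolved against the article's publication date before they can fill the date attribute of an event tuple.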
Step 2. Event Sentence Classification
Constraint: true event sentences must include a disease name together with a status verb from Google Sets* and WordNet**; this eliminates non-event sentences such as “Foot and mouth disease is[V] a highly pathogenic animal disease”.
Confirmed status: verbs such as “happened” and verb phrases such as “strike out”, e.g., “On 9 Jun 2009, the farm's owner reported[V] symptoms of FMD in more than 30 hogs”.
Suspected status: verbs such as “catch” and verb phrases such as “be taken in”, e.g., “RVF is suspected[V] in Saudi Arabia in September 2000”.
*Google Sets - http://labs.google.com/sets
**WordNet - http://wordnet.princeton.edu/
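A minimal sketch of the Step 2 constraint: a sentence counts as an event only if it contains both a disease name and a status verb, and the verb list decides between confirmed and suspected. The short disease and verb lists are hypothetical stand-ins for the ontology and the Google Sets/WordNet lists; verbs are matched only in the inflected forms listed (no stemming), and checking suspected verbs first is an assumed tie-break.

```python
import re

# Hypothetical stand-ins for the medical ontology and the status-verb lists.
DISEASES = ["foot-and-mouth disease", "foot and mouth disease", "fmd",
            "rift valley fever", "rvf"]
CONFIRMED_VERBS = ["confirmed", "reported", "happened", "struck out"]
SUSPECTED_VERBS = ["suspected", "caught", "taken in"]

def _mentions(sentence, phrases):
    """True if any phrase occurs in the lower-cased sentence as whole words."""
    return any(re.search(rf"\b{re.escape(p)}\b", sentence) for p in phrases)

def classify_sentence(sentence):
    """Return "confirmed", "suspected", or None for a non-event sentence."""
    s = sentence.lower()
    if not _mentions(s, DISEASES):
        return None           # no disease name: not an event sentence
    if _mentions(s, SUSPECTED_VERBS):
        return "suspected"    # assumed tie-break: suspicion wins over a report
    if _mentions(s, CONFIRMED_VERBS):
        return "confirmed"
    return None               # disease but no status verb, e.g. a definition
```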
Step 3. Event Tuple Generation
Event attributes: disease, date, location, species, confirmation status
Event tuple: Event_i = <disease; date; location; species; status> = <FMD, 9 Jun 2009, Taoyuan, hog, confirmed>
Event tuple with missing attributes: Event_j = <FMD, ?, ?, ?, confirmed>
Event Recognition Workflow
Step 1: Entity Recognition
Foot-and-mouth disease[DIS] on hog[SP] farm in Taoyuan[LOC]. Taiwan's TVBS television station reports that agricultural authorities confirmed foot-and-mouth disease[DIS] on a hog[SP] farm in Taoyuan[LOC]. On 9 Jun 2009[DT], the farm's owner reported symptoms of FMD[DIS] in more than 30 hogs[SP]. Subsequent testing confirmed FMD[DIS]. Agricultural authorities asked the farmer to strengthen immunization. The outbreak has not affected other farms. Authorities stipulated that the affected hog[SP] farm may not sell pork for 2 weeks.
Step 2: Sentence Classification
YES 1. Foot-and-mouth disease[DIS] on hog[SP] farm in Taoyuan[LOC].
YES 2. Taiwan's TVBS television station reports that agricultural authorities confirmed foot-and-mouth disease[DIS] on a hog[SP] farm in Taoyuan[LOC].
YES 3. On 9 Jun 2009[DT], the farm's owner reported symptoms of FMD[DIS] in more than 30 hogs[SP].
YES 4. Subsequent testing confirmed FMD[DIS].
NO 5. Agricultural authorities asked the farmer to strengthen immunization.
NO 6. The outbreak has not affected other farms.
NO 7. Authorities stipulated that the affected hog[SP] farm may not sell pork for 2 weeks.
Step 3a: Tuple Generation
E1 = <Foot-and-mouth disease, ?, Taoyuan, hog, ?>
E2 = <Foot-and-mouth disease, ?, Taoyuan, hog, confirmed>
E3 = <FMD, 9 Jun 2009, ?, hog, reported>
E4 = <FMD, ?, ?, ?, confirmed>
Step 3b: Tuple Aggregation
E = <disease, date, location, species, status> = <Foot-and-mouth disease, 9 Jun 2009, Taoyuan, hog, confirmed>
The First International Workshop on Web Science and Information Exchange in the Medical Web (MedEx 2010)
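Step 3b merges tuples that describe the same event. One plausible merge rule is sketched below as an assumption, not the system's actual logic: for each attribute keep the first non-missing value, except status, where confirmed outranks reported and reported outranks suspected. Grouping tuples into the same event is assumed to have happened already, and `Event`, `STATUS_RANK`, and `aggregate` are hypothetical names.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass(frozen=True)
class Event:
    """Event tuple <disease; date; location; species; status>; None means '?'."""
    disease: Optional[str] = None
    date: Optional[str] = None
    location: Optional[str] = None
    species: Optional[str] = None
    status: Optional[str] = None

# Assumed precedence when tuples disagree on status.
STATUS_RANK = {"confirmed": 3, "reported": 2, "suspected": 1}

def aggregate(events):
    """Merge tuples already known to describe the same event into one tuple."""
    merged = {}
    for f in fields(Event):
        values = [getattr(e, f.name) for e in events if getattr(e, f.name) is not None]
        if not values:
            merged[f.name] = None
        elif f.name == "status":
            merged[f.name] = max(values, key=lambda v: STATUS_RANK.get(v, 0))
        else:
            merged[f.name] = values[0]  # first mention wins, e.g. the full disease name
    return Event(**merged)

if __name__ == "__main__":
    e1 = Event("Foot-and-mouth disease", None, "Taoyuan", "hog", None)
    e2 = Event("Foot-and-mouth disease", None, "Taoyuan", "hog", "confirmed")
    e3 = Event("FMD", "9 Jun 2009", None, "hog", "reported")
    e4 = Event("FMD", None, None, None, "confirmed")
    print(aggregate([e1, e2, e3, e4]))
    # -> Event(disease='Foot-and-mouth disease', date='9 Jun 2009', location='Taoyuan', species='hog', status='confirmed')
```

Applied to E1-E4 from the workflow, this rule reproduces the aggregated tuple E.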
Animal Disease Extraction Results (1)
Synonymic relationships, “E1 is a kind of E2”: E1 = “swine influenza” is a kind of E2 = “swine fever”
Hyponymic relationships, “E1 and E2 are diseases”: E1 = “anthrax”, E2 = “yellow fever” are diseases
Causal relationships, “E1 is caused by E2”: E1 = “Ovine epididymitis” is caused by E2 = “Brucella ovis”
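The three relation types above can be found with lexico-syntactic surface patterns. The regular expressions below are a simplified, hypothetical rendering of the slide's patterns; they do no noun-phrase chunking, so E1 absorbs any words that precede it in a longer sentence.

```python
import re

# Hypothetical surface patterns mirroring the three relation types on the slide.
PATTERNS = {
    "synonymic": re.compile(r"(?P<e1>[A-Za-z][\w -]*?) is a kind of (?P<e2>[A-Za-z][\w -]*)"),
    "hyponymic": re.compile(r"(?P<e1>[A-Za-z][\w -]*?) and (?P<e2>[A-Za-z][\w -]*?) are diseases"),
    "causal": re.compile(r"(?P<e1>[A-Za-z][\w -]*?) is caused by (?P<e2>[A-Za-z][\w -]*)"),
}

def extract_relations(sentence):
    """Return (relation, E1, E2) triples matched in one sentence."""
    sentence = sentence.strip().rstrip(".")
    relations = []
    for name, pattern in PATTERNS.items():
        m = pattern.search(sentence)
        if m:
            relations.append((name, m.group("e1").strip(), m.group("e2").strip()))
    return relations
```

Entities found this way (e.g., "Brucella ovis" as a cause) are the kind of candidates that can be added to the ontology to expand the disease extractor's vocabulary.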
Animal Disease Extraction Results (2)
Animal Disease Extraction Results (3)
Event Recognition Results
Score_i = <w_d·disease; w_t·date; w_l·location; w_s·species; w_c·status; …>, subject to disease + status = 2
Pyramid score values (http://www1.cs.columbia.edu/~becky/DUC2006/2006-pyramid-guidelines.html_ducview) are interpreted as event extraction accuracy.
Verb lists from Google Sets and WordNet are applied in NS (unstemmed) and S (stemmed) versions.
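The scoring line can be read as a pyramid-style weighted recall over event attributes: the weight of matched attributes divided by the weight attainable from the gold event. The sketch below assumes equal weights (the slide gives no values) and reads “subject to disease + status = 2” as requiring both the disease and the status to match; both are assumptions.

```python
# Hypothetical equal weights w_d, w_t, w_l, w_s, w_c; the slide gives no values.
WEIGHTS = {"disease": 1.0, "date": 1.0, "location": 1.0, "species": 1.0, "status": 1.0}

def event_score(extracted, gold, weights=WEIGHTS):
    """Pyramid-style score: matched attribute weight / weight attainable from gold.

    Reads "subject to disease + status = 2" as: both the disease and the
    status must match, otherwise the extracted event scores 0.
    """
    match = {a: int(gold.get(a) is not None and extracted.get(a) == gold.get(a))
             for a in weights}
    if match["disease"] + match["status"] != 2:
        return 0.0
    attainable = sum(w for a, w in weights.items() if gold.get(a) is not None)
    achieved = sum(weights[a] * match[a] for a in weights)
    return achieved / attainable

if __name__ == "__main__":
    gold = {"disease": "FMD", "date": "9 Jun 2009", "location": "Taoyuan",
            "species": "hog", "status": "confirmed"}
    extracted = {"disease": "FMD", "location": "Taoyuan", "species": "hog",
                 "status": "confirmed"}
    print(event_score(extracted, gold))  # -> 0.8 (date missing)
```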
5. Visualization
Map view: Google Maps API - http://code.google.com/apis/maps/
Timeline view: SIMILE API - http://www.simile-widgets.org/timeline/
Event Representation by Date/Time
Timeline view - http://fingolfin.user.cis.ksu.edu/timemap.1.4/FMD_2007_UK_Viz/FMD_Viz.htm
Event Representation by Location
Map view - http://fingolfin.user.cis.ksu.edu/timemap.1.4/FMD_2007_UK_Viz/FMD_Viz.htm
Summary
perform focused crawling of different sources (books, research papers, blogs, governmental sources, etc.)
use a semantic relationship learning approach (synonymic, hyponymic, and causal relationships) for automated ontology expansion to support domain-specific entity extraction (e.g., diseases, viruses)
recognize geo-entities using a CRF approach and disambiguate them using GNServer
extract animal disease-related events with more descriptive attributes, such as species, dates, and event confirmation status, in contrast to “disease-location” pairs
support a timeline representation of extracted events in SIMILE in addition to visualizing events on Google Maps
Acknowledgments
KDD Lab alumni: Tim Weninger (crawler deployment) and Jing Xia (rule-based event extraction)
KDD Lab assistants:
Information Extraction Team (John Drouhard, Landon Fowles, Swathi Bujuru)
Spatial Data Mining Team (Wesam Elshamy, Andrew Berggren)
Topic Detection & Tracking Team (Surya Kallumadi, Danny Jones, Srinivas Reddy)
Faculty at the University of Illinois at Urbana-Champaign (2009 Data Sciences Summer Institute): ChengXiang Zhai, Dan Roth, Jiawei Han, and Kevin Chang
References
S. Volkova, W. H. Hsu, and D. Caragea, “Named entity recognition and tagging in the domain of epizootics,” in Proc. of the Women in Machine Learning Workshop (WiML'09).
S. Volkova, D. Caragea, W. H. Hsu, and S. Bujuru, “Animal disease event recognition and classification,” in Proc. of the First International Workshop on Web Science and Information Exchange in the Medical Web, 19th World Wide Web Conference (WWW-2010).
S. Volkova, D. Caragea, W. H. Hsu, J. Drouhard, and L. Fowles, “Boosting biomedical entity extraction by using syntactic patterns for semantic relation discovery,” ACM Web Intelligence Conference, 2010 (to appear).
Thank you!
Svitlana Volkova, svitlana.volkova@gmail.com, http://people.cis.ksu.edu/~svitlana
William H. Hsu, bhsu@cis.ksu.edu, http://people.cis.ksu.edu/~bhsu