Integrated Data
Platform at Bayer
Turning bits into insights
Wolfgang Thielemann
Agenda
What platform did we build?
What does it look like?
Why did we build it?
Architecture and data enrichment
Challenges
Plans for the future
2 /// AI-SDV 2022 // Integrated Data Platform at Bayer
Section 1: What platform did we build?
Our platform semantically integrates terabytes of external scientific textual data to support insight generation along the R&D value chain
Big data platform
This platform is…
• A semantically integrated and harmonized big data hub containing major external, text-rich, life-science-related data sources
• Enriched with FAIR meta-data generated by extracting the key information (e.g., molecular targets, medical conditions, active ingredients, technologies, etc.) using NLP
• An analysis-ready platform for end-users (GUI access) and data scientists (API access)
The users
• Scientific end users
• Data scientists
• Developers of digital products
End-user GUIs: more power & precision for scientific search
• For: Project leaders, R&D scientists, Tech scouts & Co
• Find relevant information: Alerts, Analysis, Filter & Review
Expert APIs: provide structured data for insight generation
• For: Data scientists, Computational scientists, Information professionals, Bioinformaticians
• Generate insights: Find new targets & treatments, Support pipeline decisions, Build predictive models
Section 2: What does it look like?
Example: Liver cancer
• Google-like search interface
• Interactive analysis and filtering
• Result overview
• Record view
Section 3: Why did we build it?
Big Data Platform
6 reasons why building it made and makes sense
• Richness of data sources
• Flexibility
• Costs
• Scalability
• FAIR meta-data
• Full transparency and control
1. Bandwidth and richness of data sources
[Figure: scientific sources in our platform compared with platforms limited to publicly available data]
2. Maximum flexibility to analyze the data and to integrate it into our Bayer data ecosystem
Existing platforms often come with limited/pre-defined analysis options and limited integrability
3. Full scalability
Our platform is built on a scalable cloud infrastructure for big data analysis and allows you to analyze millions of records in one go.
4. Costs
This platform allowed us to save money and reduce complexity by replacing various proprietary legacy platforms
5. One terminology across the entire content, with the option to adjust it to our needs
Individual sources / platforms typically have their own standards and terminologies
[Figure: one terminology for the entire platform]
6. Comprehensiveness and quality of meta-data
Since we built on 20 years of thesauri and NLP algorithms optimized to Bayer’s needs, our terminologies cover real-life scientific usage much better than established terminologies
[Figure: MeSH compared with the proprietary disease thesaurus]
Section 4: Architecture & data enrichment
Platform architecture
DATA
• Conference Abstracts
• Literature Abstracts
• Literature Fulltexts
• Patents
• Patent Chemistry
• Clinical Trials
• Pipeline Information
• Market reports
• Company Websites
• Industry News
• Research Grants
• Tech Transfer Offers
PROCESS
• Automated Data Acquisition (Kafka Technology)
• Data Engineering: Normalization, Deduplication, Classification, etc. (Kafka Streams)
• Semantic Enrichment: Targets, Organisms, Sequences, Drugs, Active Ingredients, Companies/Organizations, Analytics, etc.
• Index, Search, and API Services (Elastic)
DELIVER
• End User Products: Cross-search GUI, Advanced literature GUI, Advanced patent GUI
• APIs & Data Science
• System/Application Integrations: other proprietary platforms and workflows use this platform as source
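The processing stages named on this slide (data engineering, semantic enrichment, indexing) can be illustrated with a small in-memory sketch. This is not Bayer's implementation: the slide names Kafka and Elastic, while the sketch uses plain Python lists and dictionaries, assumes the stages run in the order engineering, enrichment, indexing, and uses hypothetical names throughout (`THESAURUS`, `normalize`, `enrich`, `build_index`, and the record fields `id`, `source`, `title`, `entities`). The two sample records are invented for illustration.

```python
from collections import defaultdict

# Hypothetical mini-thesaurus: surface form -> (entity class, preferred term).
THESAURUS = {
    "liver cancer": ("disease", "Liver cancer"),
    "sorafenib": ("drug", "Sorafenib"),
}

def normalize(record):
    """Data engineering step: trim whitespace and collapse repeated spaces."""
    return {key: " ".join(str(value).split()) for key, value in record.items()}

def enrich(record):
    """Semantic enrichment step: tag thesaurus terms found in the title."""
    text = record["title"].lower()
    entities = sorted(
        preferred
        for surface, (_entity_class, preferred) in THESAURUS.items()
        if surface in text
    )
    return {**record, "entities": entities}

def build_index(records):
    """Index step: map each preferred term to the ids of matching records."""
    index = defaultdict(list)
    for record in records:
        for entity in record["entities"]:
            index[entity].append(record["id"])
    return dict(index)

def run_pipeline(raw_records):
    """Chain the steps: engineering, then enrichment, then indexing."""
    return build_index(enrich(normalize(r)) for r in raw_records)

# Invented sample records standing in for acquired source data.
raw = [
    {"id": "rec-1", "source": "patents", "title": "  Sorafenib  in liver cancer "},
    {"id": "rec-2", "source": "clinical trials", "title": "Liver cancer screening"},
]
index = run_pipeline(raw)
```

In a streaming setup each function would correspond to one processing step between topics, and the final dictionary would be replaced by a search index.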
Big Data Platform
Semantic data integration at large
Resolve all flavours of heterogeneity to make textual data FAIR
• Structural heterogeneity: same facts expressed in different schemata; missing / additional attributes
• Technical heterogeneity: data formats (JSON vs. XML), communication protocols (REST vs. ODBC), query languages (SQL vs. SPARQL)
• Data model heterogeneity: relational vs. semi-structured, tuples vs. graphs, …
• Syntactic heterogeneity: different presentation of the same fact (Unicode or ASCII, EUR or €, …)
• Semantic heterogeneity:
➢ Same concepts are named differently (e.g., Lung cancer, Pulmonary carcinoma, Neoplasm of the lung, …)
➢ Different concepts share the same name (e.g., GSK)
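Two of these heterogeneity types lend themselves to a short illustration. The sketch below is an assumption-laden example, not the platform's code: `normalize_syntax` handles the syntactic case (Unicode compatibility forms, EUR vs. €) and `preferred_term` handles the "same concept, different names" case with a lookup table. The table contents beyond the synonyms shown on the slide, and all function names, are hypothetical.

```python
import unicodedata

# Hypothetical synonym table; the synonyms are the ones listed on the slide.
SYNONYMS = {
    "pulmonary carcinoma": "Lung cancer",
    "neoplasm of the lung": "Lung cancer",
    "lung cancer": "Lung cancer",
}

# Hypothetical mapping from currency symbols to currency codes.
CURRENCY = {"€": "EUR"}

def normalize_syntax(text):
    """Resolve syntactic heterogeneity: Unicode forms, currency symbols, spacing."""
    text = unicodedata.normalize("NFKC", text)
    for symbol, code in CURRENCY.items():
        text = text.replace(symbol, code)
    return " ".join(text.split())

def preferred_term(name):
    """Resolve semantic heterogeneity: map a synonym to one preferred term."""
    key = normalize_syntax(name).lower()
    return SYNONYMS.get(key)  # None when the name is not in the table
```

A lookup table like this covers synonyms only. A name that stands for several different concepts cannot be resolved by the string alone and needs the surrounding context.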
Section 5: Challenges
Challenges: Data ingestion
• Heterogeneous formats
• Heterogeneous update schedules: hourly, daily, weekly, monthly
• Changes in record structure
• Changes in volume over time
• De-duplication
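One common way to approach de-duplication across sources is to compute a fingerprint per record and keep a single copy per fingerprint. The sketch below illustrates that general idea only; the slides do not describe the method actually used. The field names (`title`, `year`, `source`) and the choice of title plus year as the key are assumptions, and the sample records are invented.

```python
import hashlib
import re

def fingerprint(record):
    """Key built from the lowercased title without punctuation or spaces, plus the year."""
    title = re.sub(r"[^a-z0-9]+", "", record["title"].lower())
    raw = f"{title}|{record.get('year', '')}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

def deduplicate(records):
    """Keep the first record per fingerprint and remember which sources reported it."""
    unique = {}
    for record in records:
        key = fingerprint(record)
        if key not in unique:
            unique[key] = {**record, "sources": [record["source"]]}
        elif record["source"] not in unique[key]["sources"]:
            unique[key]["sources"].append(record["source"])
    return list(unique.values())

# Invented sample records: the first two differ only in case and punctuation.
records = [
    {"title": "Liver cancer: a review", "year": 2021, "source": "literature abstracts"},
    {"title": "Liver Cancer - A Review", "year": 2021, "source": "conference abstracts"},
    {"title": "Lung cancer screening", "year": 2020, "source": "literature abstracts"},
]
unique_records = deduplicate(records)
```

Exact fingerprints miss near-duplicates such as translated or truncated titles, so a real pipeline would likely need fuzzier matching on top of this.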
Challenges: Semantic enrichment
Lack of a universally accepted identifier for an entity class
• Human gene: NCBI Gene ID
• Chemical compound: INN name, IUPAC, CAS-Nr, PubChem CID, Canonical SMILES
• Disease: MeSH ID, UMLS ID, Snomed ID, NCIT ID, Orphanet ID, Mondo ID, ICD-10 ID, MedDRA ID, DO ID, …
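When an entity class has many competing identifier systems, one possible approach is to keep a single internal concept and attach cross-references to each external system. The sketch below shows that data structure under stated assumptions: `Concept`, `CrossReferenceIndex`, and the internal id format are hypothetical, and the external identifiers are placeholders, not real codes.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    """One internal concept with cross-references to external identifier systems."""
    internal_id: str
    preferred_name: str
    xrefs: dict = field(default_factory=dict)  # system name -> external id

class CrossReferenceIndex:
    """Resolve an (identifier system, external id) pair to the internal concept."""

    def __init__(self):
        self._by_external = {}

    def add(self, concept):
        for system, external_id in concept.xrefs.items():
            self._by_external[(system, external_id)] = concept

    def resolve(self, system, external_id):
        return self._by_external.get((system, external_id))

# Placeholder identifiers only; these are NOT real MeSH or ICD-10 codes.
lung_cancer = Concept(
    internal_id="concept-0001",
    preferred_name="Lung cancer",
    xrefs={"MeSH": "mesh-placeholder", "ICD-10": "icd10-placeholder"},
)
xref_index = CrossReferenceIndex()
xref_index.add(lung_cancer)
```

With this layout, records annotated by different sources with different identifier systems resolve to the same internal concept.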
Challenges: Semantic enrichment
Identification of different entities requires different technologies:
➢ Terminology-based NLP (e.g., disease names)
➢ ML-based NLP (e.g., for ambiguous acronyms like cell lines, gene acronyms, etc.)
➢ Rule/pattern-based extraction (e.g., IUPAC chemical names, gene mutations)
   Example: “A lamp-snp assay detecting c580y mutation in pfkelch13 gene from clinically dried blood spot samples”
➢ Image/graph processing (e.g., image2mol)
   Example: C1=CC=C(C(=C1)CC(=O)[O-])NC2=C(C=CC=C2Cl)Cl.[Na+]
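Rule/pattern-based extraction can be illustrated on the example sentence with a regular expression. The sketch assumes that "c580y" denotes a single amino-acid substitution written as letter, position, letter; the pattern and the function name `find_substitutions` are illustrative and cover only that notation, not the platform's actual rules.

```python
import re

# One-letter amino-acid codes; the pattern matches <letter><position><letter>
# as a whole token, case-insensitively (so "c580y" is found as well as "C580Y").
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SUBSTITUTION = re.compile(
    rf"\b([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])\b", re.IGNORECASE
)

def find_substitutions(text):
    """Return (reference residue, position, new residue) for each match."""
    return [
        (ref.upper(), int(position), alt.upper())
        for ref, position, alt in SUBSTITUTION.findall(text)
    ]

sentence = (
    "A lamp-snp assay detecting c580y mutation in pfkelch13 gene "
    "from clinically dried blood spot samples"
)
mutations = find_substitutions(sentence)
```

The word boundaries keep the pattern from firing inside longer tokens such as "pfkelch13". A pattern this simple would still produce false positives on other letter-number-letter tokens, which is one reason different entity types need different techniques.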
Section 6: Status quo & plans for the future
Are we now living in a fairytale where everything is perfect?
There is still a lot to do…
➢ Terminology is constantly evolving (new companies, new technologies, etc.)
➢ Development of scalable algorithms for complex entities
➢ Finding the most relevant information in the ocean of data
➢ Advanced visualization and analytics
➢ Further standardization
➢ …
What can you do to help us in our endeavour?
Vendors / Publishers / Database producers
• Data quality
• FAIRification
• Using generally available standards & IDs
• Consistency
• Collecting scattered data
• Harmonization
Big Data Platform
Plans for the future
• Sources (e.g., drug labels, guidelines)
• Usability
• Thesauri
• Automatization (e.g., alerting)
• Chemistry features
• Analyses
Thank you!
Special thanks to my colleagues on the team


AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insights Wolfgang Thielemann (Bayer, Germany )
