AI Drug Discovery in Patent Space
Hanjo Kim
Principal Scientist at Standigm Inc.
hanjo.kim@standigm.com
business@standigm.com
apply@standigm.com
www.standigm.com
Disclaimer
• Statements of fact and opinions expressed in this presentation
and on the following slides are solely those of the presenter and
not necessarily those of Standigm Inc.
Standigm Inc.
2015
Founded by three researchers at Samsung Advanced Institute of Technology
Jinhan Kim, PhD Artificial Intelligence (The University of Edinburgh)
Sang Ok Song, PhD Chemical Engineering (Seoul National University)
So Jeong Yun, PhD Systems Biology (POSTECH)
$23M
Funding raised
SK Holdings, Mirae Asset Capital, Mirae Asset Venture Investment, DSC
Investment, Wonik Investment, Atinum Investment, LB Investment, Kakao
Ventures
Seoul, Korea (33); Ann Arbor, Michigan (2); Cambridge, UK (1)
Standigm = a drug discovery company that generates and optimizes therapeutic lead compounds using advanced artificial intelligence, with the aim of licensing them out
AI, 16; Biology, 6; Chemistry, 8; Systems Biology, 4; Advisor, 3
PhD: 20/37*
* Excluding Operation (5) and Patent attorney (1)
The AI solution
The Standigm AI solution is industrializing drug discovery: Discovery at Scale
Pipeline: Disease, Target, Hit, Lead, Preclinical, Clinical, Drug; Drug repositioning
Platforms: BEST™, ASK™, Insight™, FIRST* (* developing)
Standigm ASK™ is freely available at https://icluenask.standigm.com
Standigm BEST Platform
Standigm BEST
Standigm ASK: knowledge-based biology platform for novel targets, pathways, and MoA discovery
Standigm FIRST: hit generation platform for novel and/or undruggable targets
Generative Models
Graph-based VAE
Scaffold-based
conditional enumerator
Novel Molecular
Representation
Scoring Functions
Simulations
AI rescoring models
Machine learning models
Compound Database
Known Molecules
Seed Molecules
Novel Virtual Structures
Commercial Library
Privileged Standigm Library
Target Database
Public data (gene, protein, function)
BEST Feasibility
Public Library
Strategy setup Hit Generation Hit-2-Lead
Predictive Models
ADME/Tox predictors
Novelty (patentability)
Synthetic accessibility
Filters/Ranking models
External CROs: organic synthesis, in vitro/in vivo assays
Novel/Commercial Hits Lead Series
Graph-based VAE
Chemical space → Encoder (E) → Latent space (Z) → Decoder (D) → Chemical space
Learning chemical space
• Training DB: ~4M
• X: original chemical space
• Z: latent space
• Y: property/target information (property 1, property 2, property 3)
• Encoder q(z|x), decoder p(x|z), predictor q(y|z)
• Contextualizing: substructures, topology, shape, etc.
From seed molecules, through the decoder p(x|z)
• Analogue structure generation: functionally similar but novel scaffolds/molecules
• Lead optimization: novel molecules with better desired properties
• Smart library expansion
• IP generation & expansion
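The encoder, predictor, and seed-molecule roles on this slide can be illustrated with a small sketch. This is not Standigm's model: `encode` and `predict` are hypothetical toy stand-ins for trained networks q(z|x) and q(y|z), the two-dimensional latent space and the seed featurization are invented for illustration, and the decoder p(x|z) is left out (each ranked latent point would be passed to it to obtain a structure).

```python
import math
import random

def encode(x):
    """Hypothetical stand-in for q(z|x): returns (mean, log-variance) per latent dimension."""
    mu = [sum(x) / len(x), max(x) - min(x)]
    log_var = [-2.0, -2.0]
    return mu, log_var

def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps drawn from N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def predict(z):
    """Hypothetical stand-in for q(y|z): a property score computed from a latent point."""
    return -sum(v * v for v in z)

def kl_to_standard_normal(mu, log_var):
    """KL(q(z|x) || N(0, I)) for a diagonal Gaussian, the usual VAE regularizer."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def propose_around_seed(x_seed, n, rng):
    """Sample n latent points near a seed molecule and rank them by predicted property."""
    mu, log_var = encode(x_seed)
    points = [sample_latent(mu, log_var, rng) for _ in range(n)]
    return sorted(points, key=predict, reverse=True)

rng = random.Random(0)
seed_features = [0.2, 0.4, 0.9]  # placeholder featurization of a seed molecule
ranked = propose_around_seed(seed_features, n=5, rng=rng)
```

Sampling near the encoded seed corresponds to analogue generation; ranking by the predictor corresponds to steering toward better properties, as in lead optimization.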
Patent Space
Target A Compounds in latent space
Competitor 1
Competitor 2
Competitor 3
Interesting Area
Potency scale: potent to weak
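One way to read the "Interesting Area" on this slide is as a region of latent space that is predicted to be potent yet lies away from every competitor's compounds. The sketch below illustrates that reading under stated assumptions: the coordinates, potency values, and thresholds are invented, and Euclidean distance in latent space is only a rough proxy for separation in patent space, not a substitute for reading the claims.

```python
import math

def min_distance(point, others):
    """Distance from a latent point to the closest point in `others`."""
    return min(math.dist(point, o) for o in others)

def interesting_candidates(candidates, competitor_points, min_gap, min_potency):
    """Keep candidates that are potent enough and at least min_gap from every competitor compound."""
    kept = []
    for name, z, potency in candidates:
        gap = min_distance(z, competitor_points)
        if potency >= min_potency and gap >= min_gap:
            kept.append((name, gap, potency))
    # Most potent candidates first.
    return sorted(kept, key=lambda item: item[2], reverse=True)

# Hypothetical latent coordinates of competitor compounds for Target A.
competitor_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
# Hypothetical candidates: (name, latent coordinates, predicted potency).
candidates = [
    ("cand-1", (0.1, 0.1), 8.0),  # potent, but inside the competitor cluster
    ("cand-2", (3.0, 3.0), 7.5),  # potent and well separated
    ("cand-3", (4.0, 0.0), 5.0),  # separated, but weak
]
selected = interesting_candidates(candidates, competitor_points, min_gap=1.5, min_potency=7.0)
```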
Chemical Space Navigation
• Chemical Space ~ Map
• Known scaffolds ~ POIs
• Information-rich space (ChEMBL, PubChem Bioassays, etc.)
• Novel scaffold ~ New POI
• El Dorado
• Patent
• Markush structure: how to protect as wide an area as possible
• Exemplified compounds: boundary stones
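If exemplified compounds act as boundary stones, a simple way to place a new scaffold on the map is to measure its similarity to the nearest exemplified compound. The sketch below uses Tanimoto similarity over sets of fingerprint bits; the bit sets and compound IDs are hypothetical, and in practice the fingerprints would come from a cheminformatics toolkit.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def nearest_exemplified(candidate_bits, exemplified):
    """Return (compound_id, similarity) for the most similar exemplified compound."""
    scored = ((cid, tanimoto(candidate_bits, bits)) for cid, bits in exemplified.items())
    return max(scored, key=lambda item: item[1])

# Hypothetical fingerprint bits for two exemplified compounds from a patent.
exemplified = {"ex-1": {1, 2, 3, 4}, "ex-2": {3, 4, 5, 6}}
candidate = {1, 2, 3, 9}
closest = nearest_exemplified(candidate, exemplified)
```

A low maximum similarity suggests the candidate sits far from the known boundary stones; whether it falls outside the Markush claim still has to be checked against the claim itself.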
Using ChemCurator
• Project types
• Google Patents (most cases)
• PDF files (do not use PDF files!)
• Text files (when Google's OCR is not good)
Using ChemCurator: Google Patents
Using ChemCurator: text files
OCR (and chemical OCR)
• Lessons
• Google Patents is reliable in most cases
• It even provides a compound table, though a very primitive one
• Professional OCR software can give better results
• Convert the PDF file to plain text with chemical names
• Complex tables
• Image (not OCRed) tables (next 3 slides)
• A chemical OCR engine helps a lot
• Text-image comparison
• Chemical OCR engines
• CLiDE (recommended, proprietary)
• OSRA (open source, recommended on Linux machines)
• Imago (I have no experience)
• Unsupported engines (like ChemGrapher, https://pubs.acs.org/doi/10.1021/acs.jcim.0c00459)
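The text-image comparison mentioned above can be sketched as a cross-check between two sources for the same example compound: the structure derived from the chemical name in the text and the structure recognized from the drawing by a chemical OCR engine. The function and the example IDs below are hypothetical, and the sketch assumes both sources were already converted to canonical SMILES by the same toolkit, so that equal strings mean equal structures.

```python
def compare_text_and_image(text_structures, image_structures):
    """Split example IDs into agreeing, conflicting, and single-source groups."""
    report = {"agree": [], "conflict": [], "text_only": [], "image_only": []}
    for example_id in sorted(set(text_structures) | set(image_structures)):
        from_text = text_structures.get(example_id)
        from_image = image_structures.get(example_id)
        if from_text is None:
            report["image_only"].append(example_id)
        elif from_image is None:
            report["text_only"].append(example_id)
        elif from_text == from_image:
            report["agree"].append(example_id)
        else:
            report["conflict"].append(example_id)
    return report

# Hypothetical canonical SMILES per example compound from each source.
from_names = {"Ex1": "CCO", "Ex2": "c1ccccc1", "Ex3": "CC(=O)O"}
from_images = {"Ex1": "CCO", "Ex2": "c1ccncc1", "Ex4": "CN"}
report = compare_text_and_image(from_names, from_images)
```

Entries under `conflict` and the single-source groups are the ones worth checking by hand against the original patent page.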
Chemical structures in patents
Chemical structures in patents
Chemical structures in patents
Better OCR result
Markush Structures
• Very expressive
• The same set of compounds can be written in very different forms
• Not well-validated
• ChemCurator helps
• Extracting example compounds
• Matching them to the Markush structure
• Requires manual correction
• Sentence to chemical groups
• Ambiguous/incomplete R-group definitions
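To make the matching step concrete, the sketch below enumerates a toy Markush structure from a scaffold template and R-group lists, then checks which example compounds it covers. This is an illustration only, not how ChemCurator works internally: the template, R-groups, and example IDs are invented, and matching by plain string equality is a simplification that a real workflow would replace with canonicalization or substructure search.

```python
from itertools import product

def enumerate_markush(template, r_groups):
    """Expand a scaffold template with named R-group slots into a set of SMILES strings."""
    names = sorted(r_groups)
    return {
        template.format(**dict(zip(names, choice)))
        for choice in product(*(r_groups[name] for name in names))
    }

# Hypothetical Markush definition: a benzene scaffold with two variable positions.
template = "c1ccc({R1})cc1{R2}"
r_groups = {"R1": ["F", "Cl"], "R2": ["C", "OC"]}
covered = enumerate_markush(template, r_groups)

# Hypothetical example compounds extracted from the patent text.
examples = {"Ex1": "c1ccc(F)cc1C", "Ex2": "c1ccc(Br)cc1C"}
unmatched = [eid for eid, smiles in examples.items() if smiles not in covered]
```

An unmatched example such as `Ex2` usually points to an incomplete or ambiguous R-group definition, which is where manual correction comes in.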
AI can help
• Reduction of frequent text OCR errors
• NLP techniques can correct frequent OCR errors
• The availability of a large training set is important
• Extraction of relevant data
• Biological activities
• Analytical data
• Chemical OCR can be improved
• AI can do image recognition very well
• Different drawing styles can be managed
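As a rough illustration of correcting frequent text OCR errors, the sketch below applies a short list of common character confusions and accepts a fix only when the result is a known chemical term. It is a rule-based stand-in for the trained NLP models the slide has in mind; the confusion list and the vocabulary are assumptions chosen for the example.

```python
# Assumed list of frequent OCR confusions: (misread text, intended text).
CONFUSIONS = [("rn", "m"), ("1", "l"), ("0", "o"), ("vv", "w")]

def candidate_fixes(token):
    """Yield every variant of token with one confusion reversed at one position."""
    for wrong, right in CONFUSIONS:
        start = token.find(wrong)
        while start != -1:
            yield token[:start] + right + token[start + len(wrong):]
            start = token.find(wrong, start + 1)

def correct_token(token, vocabulary):
    """Return a vocabulary term reachable by one confusion fix, else the token unchanged."""
    if token.lower() in vocabulary:
        return token
    for fixed in candidate_fixes(token.lower()):
        if fixed in vocabulary:
            return fixed
    return token

def correct_text(text, vocabulary):
    """Correct each whitespace-separated token independently."""
    return " ".join(correct_token(token, vocabulary) for token in text.split())

# Hypothetical vocabulary of chemical name fragments.
vocabulary = {"methyl", "phenyl", "chloro", "amine"}
cleaned = correct_text("rnethyl pheny1 amine ch1oro xyz", vocabulary)
```

A learned model would replace both the fixed confusion list and the vocabulary lookup, which is why a large training set matters.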
Acknowledgement
• Standigm Inc.
• Sanghyung JIN, Minkyu HA, Soyeon Kim, Sangok SONG
• T&J Tech. (Korean distributor)
• Jung-A HAN

Patent Data for Artificial Intelligence based Drug Discovery
