Kairntech & vocabularies: AI support for creating and maintaining vocabularies
AI-SDV, Oct 4+5, 2021
Stefan Geißler
www.kairntech.com
Introducing Kairntech
• Software & service company focused on NLP & AI for industrial use cases
• Focus on making powerful ML approaches accessible to domain experts (not just programmers and data scientists)
• Founded in December 2018, headquartered in Grenoble, France
• Team with 20+ years of experience in the field (Xerox, IBM, TEMIS, …)
• We've been attending the SDV for many years; it is a pleasure to be ‘here’ again ☺
Mont Blanc, the highest mountain in the Alps, is visible from many places around Grenoble (~100 km away)
Kairntech: Different Approaches to Content Analysis
Create NLP models by importing annotated data or adding manual annotation
• Entities, categories, relations
• Users add their domain expertise
• “Active Learning” reduces the required manual effort
• Immediate feedback
• Annotation as teamwork: people cooperate on projects
Import existing vocabularies and thesauri; the system learns the relevant concepts
• Integrate your knowledge sources (company- or domain-specific)
• Quickly create corresponding annotation models
• New similar terms? Variants?
• Import from many different formats
Benefit from public world knowledge: more than 90 million concepts, multilingual, disambiguated, linked
• Based on Wikidata
• Regularly updated knowledge source
• “Tesla”: inventor or electric car? Kairntech disambiguates this and countless other ambiguous cases.
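As a rough illustration of what disambiguating “Tesla” involves, the sketch below picks a sense by counting context words shared with a short description of each candidate. This is a simplified Lesk-style overlap heuristic chosen for illustration, not Kairntech's method; the mini knowledge base `SENSES` and its word lists are hypothetical stand-ins for real Wikidata entries.

```python
import re
from typing import Optional

# Hypothetical mini knowledge base: each sense of an ambiguous mention
# is described by a few context words (illustration only).
SENSES = {
    "Tesla": {
        "Nikola Tesla (inventor)": {"inventor", "alternating", "current", "coil", "engineer", "patent"},
        "Tesla, Inc. (car maker)": {"car", "electric", "vehicle", "battery", "model", "autopilot"},
    }
}


def tokenize(text: str) -> set[str]:
    """Lower-cased word tokens of the text, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(mention: str, sentence: str) -> Optional[str]:
    """Return the sense whose context words overlap most with the sentence."""
    senses = SENSES.get(mention)
    if not senses:
        return None  # mention unknown to the knowledge base
    context = tokenize(sentence)
    # Score = number of shared words; on a tie, max() keeps the first sense.
    return max(senses, key=lambda sense: len(senses[sense] & context))
```

Production systems use far richer signals (embeddings, entity popularity, link structure), but the principle of letting the surrounding words choose between candidate concepts is the same.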
Use case today: AI support in vocabulary management
• Thesaurus: structured vocabulary of terms
• Often domain-specific
• Important in information retrieval and content analysis
• Non-trivial thesauri are often very large (well over 10,000 terms)
• … require considerable effort to build
• … and to keep up to date as a field evolves
• This can be a challenge, especially when working on several subjects at once
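To make “structured vocabulary” concrete, here is a minimal in-memory thesaurus, loosely modelled on the SKOS notions of preferred label, alternative labels and broader concept. The class and method names are hypothetical and not tied to any particular product; the point is that each concept groups its variants and sits in a hierarchy, which is exactly what has to be maintained as a field evolves.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """A thesaurus entry, loosely modelled on SKOS (prefLabel / altLabel / broader)."""
    pref_label: str
    alt_labels: list[str] = field(default_factory=list)
    broader: Optional[str] = None  # pref_label of the parent concept, if any


class Thesaurus:
    """Minimal in-memory thesaurus: concepts, variant lookup, broader hierarchy."""

    def __init__(self) -> None:
        self.concepts: dict[str, Concept] = {}
        self._label_index: dict[str, str] = {}  # lower-cased label -> pref_label

    def add(self, concept: Concept) -> None:
        self.concepts[concept.pref_label] = concept
        for label in [concept.pref_label, *concept.alt_labels]:
            self._label_index[label.lower()] = concept.pref_label

    def lookup(self, label: str) -> Optional[Concept]:
        """Map any known label (preferred or variant, case-insensitive) to its concept."""
        key = self._label_index.get(label.lower())
        return self.concepts[key] if key is not None else None

    def path_to_root(self, pref_label: str) -> list[str]:
        """Follow 'broader' links upwards; assumes every parent has been added."""
        path = [pref_label]
        while (parent := self.concepts[path[-1]].broader) is not None:
            path.append(parent)
        return path
```

For example, after adding “battery”, “flow battery” (broader: battery) and “vanadium redox flow battery” (variant “VRFB”, broader: flow battery), a lookup of “vrfb” resolves to the full concept, and its path to the root runs through “flow battery” to “battery”.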
Case study: Kairntech client TecIntelli
• https://www.tecintelli.de
• Technology and innovation intelligence
• Based in Stuttgart, Germany
• Analysis of large volumes of text content: web sources, technical documents, scientific literature
• Technology scouting, technology monitoring, coaching and consulting
• Which technologies exist, which are on the rise, what solutions exist for a given problem? Which markets exist for a given solution?
Example: Technology scouting for a tech SME
• Client specializes in building switches/actuators
• Realized that their switches are quite fast, in fact faster than competitors’ products
• “What else can be done with these? Who else needs faster switches than what is typically sold?”
• SMEs often have no large research department of their own
• Technology watch project: Which technological fields need our fast switches?
• A literature/market review identified markets and potential clients
• An important part of this literature analysis is the identification of key concepts and actors
Kairntech: AI support for vocabulary maintenance
Domain-specific seed vocabulary + raw documents
→ Automatic annotation: enriching documents with imported terms
→ Train ML model: broad range of ML algorithms
→ Model application: automatically created suggestions of new terms
Wrap AI/NLP/ML into an easy-to-use GUI
• Powerful approaches supporting this use case exist (Deep Learning-based entity recognition)
• Productive use requires coding and data science expertise
• Make ML model creation, optimization and application available to domain users without coding experience
Point-and-click AI
Sample domain: battery technology
• Technology field with fast-growing economic potential
• Projected worldwide growth of more than 12% per year, reaching US$279 bn by 2027 (source: researchandmarkets.com)
• Key component in e-mobility, home batteries, portable devices and more
• Area of intense research and industrial innovation
Batteries: relentless innovation
A vocabulary maintenance workflow
• Seed vocabulary of domain-specific terms
• Apply the vocabulary to the document corpus
• Configure a Machine Learning experiment
• System suggests new candidates (here: new, as yet “unknown” types of batteries)
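Applying the seed vocabulary to the corpus is what turns a term list into training data for the ML step. A minimal sketch, assuming a simple regex tokenizer, case-insensitive greedy longest-match lookup, and BIO token labels with a hypothetical entity label `TERM`; the slides do not describe how the platform's own annotator works internally.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split into word tokens (keeping hyphenated words whole) and punctuation marks."""
    return re.findall(r"\w+(?:-\w+)*|[^\w\s]", text)


def annotate(tokens: list[str], vocabulary: set[str], label: str = "TERM") -> list[str]:
    """Label tokens with BIO tags via greedy longest match against the vocabulary."""
    # Pre-tokenize each vocabulary entry (lower-cased) for case-insensitive matching.
    entries = {tuple(t.lower() for t in tokenize(term)) for term in vocabulary}
    max_len = max((len(e) for e in entries), default=0)
    lowered = [t.lower() for t in tokens]
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest span first so "lithium-ion battery" beats a shorter entry.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in entries:
                labels[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    labels[j] = f"I-{label}"
                i += n
                break
        else:
            i += 1  # no vocabulary entry starts at this token
    return labels
```

Run on “The lithium-ion battery replaced the lead-acid battery in phones.” with a two-term seed vocabulary, both battery names come back as B/I spans; everything else is `O`. These labelled tokens are what a model can then learn to generalize from.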
Searching for new terms
• Deep Learning-based models take various types of clues into account
• Internal structure of the candidate term: “… ion …”, “… Li …”, “… cell …”, “… redox …”
• Context: “… electrodes of XYZ batteries are often built from …”
• The model architecture allows both types of clues to be taken into consideration
• Available ML approaches range from fast and relatively simple Conditional Random Fields (CRF) to powerful but computation-intensive Deep Learning
• No manual rule-writing required
• Large-scale pre-trained embeddings and transformer models (such as BERT) are key ingredients
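At the feature-based end of that spectrum (e.g. a CRF), both kinds of clue have to be spelled out explicitly; deep learning models learn comparable signals from embeddings instead. A minimal sketch of per-token features, assuming a window of two words on either side; the feature names are illustrative choices, not the platform's configuration. Lists of dictionaries of this shape, one list per sentence, are the input format that CRF toolkits such as sklearn-crfsuite accept.

```python
def token_features(tokens: list[str], i: int, window: int = 2) -> dict[str, object]:
    """Features for token i: its internal structure plus surrounding context words."""
    tok = tokens[i]
    feats: dict[str, object] = {
        # Internal structure of the candidate term
        "lower": tok.lower(),
        "prefix3": tok[:3].lower(),
        "suffix3": tok[-3:].lower(),
        "is_title": tok.istitle(),
        "has_digit": any(c.isdigit() for c in tok),
        "has_hyphen": "-" in tok,
        "has_upper_inside": any(c.isupper() for c in tok[1:]),  # e.g. "LiFePO4"
    }
    # Context: neighbouring words, keyed by relative position ("<PAD>" beyond edges)
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        feats[f"w[{offset:+d}]"] = tokens[j].lower() if 0 <= j < len(tokens) else "<PAD>"
    return feats
```

For the placeholder “XYZ” in “electrodes of XYZ batteries are”, the context features record “of” and “batteries” as neighbours, which is exactly the kind of clue that lets a model flag an unseen name as a battery type.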
Full workflow still requires (or benefits from) expert input
• Import of the seed vocabulary, definition and application of the annotator on document content: fully automatic
• Review of annotation results and possible curation of seed vocabulary and annotations (ambiguities): expert input, manual
• Consistency is king: “Alzheimer’s” or “Alzheimer’s Disease”? Be consistent in your vocabulary and in your annotations
• Definition and application of Machine Learning model training: fully automatic
• Review of newly found terms: expert decision, manual
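Part of the consistency review can be prepared automatically before the expert looks at it. A minimal sketch, assuming two simple heuristics chosen for illustration: terms that become identical after case and punctuation normalization, and terms where one is the other plus extra trailing words. It only flags pairs; deciding which form to keep stays a manual expert decision.

```python
import re
from itertools import combinations


def normalize(term: str) -> str:
    """Lower-case, unify apostrophes, drop punctuation except hyphens, collapse spaces."""
    term = term.lower().replace("\u2019", "'")
    term = re.sub(r"[^\w\s'-]", " ", term)
    return " ".join(term.split())


def possible_variants(vocabulary: list[str]) -> list[tuple[str, str, str]]:
    """Return (term_a, term_b, reason) pairs that an expert should review."""
    issues = []
    for a, b in combinations(vocabulary, 2):
        na, nb = normalize(a), normalize(b)
        if na == nb:
            issues.append((a, b, "same after normalization"))
        elif nb.startswith(na + " ") or na.startswith(nb + " "):
            issues.append((a, b, "one term extends the other"))
    return issues
```

With “Alzheimer's” and “Alzheimer’s Disease” in the same list, the pair is flagged despite the different apostrophes; “lithium-ion battery” and “Lithium-Ion Battery” are flagged as plain case variants. The pairwise loop is quadratic, which is fine for review batches but would need an index for very large thesauri.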
Outcome
• The application returned “MVI2 flow battery” and “lithium organosulfur battery” as potential new battery technologies
• Both are in fact relatively new approaches in the field (not contained in the seed vocabulary)
• The setup allows regular, large-scale scanning of domain-specific content
• Time and effort for thesaurus maintenance are reduced
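The “not contained in the seed vocabulary” check is the step that turns raw model output into a short review list. A sketch, assuming the model's suggestions arrive as a plain list of extracted strings (a hypothetical input format) and that corpus frequency is a reasonable first ranking signal; the example terms in the note below are illustrative, not results from the case study.

```python
from collections import Counter


def new_candidates(suggestions: list[str], seed_vocabulary: set[str],
                   min_count: int = 1) -> list[tuple[str, int]]:
    """Keep model suggestions absent from the seed vocabulary, most frequent first."""
    known = {term.lower() for term in seed_vocabulary}
    # Case-insensitive comparison; counts show how often each new term was found.
    counts = Counter(s.lower() for s in suggestions if s.lower() not in known)
    # most_common() orders by count; ties keep first-seen order.
    return [(term, n) for term, n in counts.most_common() if n >= min_count]
```

Given suggestions such as “sodium-ion battery” (twice), “Lithium-ion battery” and “solid-state battery” against a seed vocabulary that already contains “lithium-ion battery”, only the two unseen terms remain, ranked by frequency. Raising `min_count` trades recall for a shorter list when scanning large corpora regularly.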
Conclusion
• Kairntech: AI/NLP solutions for non-programmers, too
• Wide range of use cases and languages
• Consulting, on-premise packaged software or cloud-based
• We would love to hear about your use cases
Thank you!
info@kairntech.com
www.kairntech.com
