SlideShare a Scribd company logo
VerifAI: Biomedical Generative
Question Answering Engine
with Verifiable Answers
Nikola Milošević
Nikola.milosevic@bayer.com
Bayer A.G.
The Institute for Artificial Intelligence Research and
Development of Serbia
How reliable are LLMs?
LLMs and
hallucinations
• Generative LLMs can produce answers that appear coherent, confident and
articulate
• However, the information conveyed may not be correct or verifiable
• the limited internal knowledge of generative LLMs can hinder their ability to
deliver factually accurate answers, particularly within specialized field
• notably concerning in biomedicine, where accurate and factual answers are
critical
• privacy, sovereignty and security concerns in pharma and biomedicine often
necessitate building systems where all components are controllable (e.g.,
deployed in-house), to avoid reliance on third-party APIs
• In some domains hallucinations (creative) are good, while in others, such as
biomedicine they are not acceptable
Ways to
address
hallucination
s
Retrieval Augmented
Generation (RAG)
Fine-tune the model
Provide sources
Check the sources
[DSC Europe 24] Nikola Milosevic - VerifAI: Biomedical Generative Question-Answering engine with verifiable answers
IR System
Retrieves
relevant articles
for generating
claims.
PubMed
title + abstract
concatenation
Excluding
articles missing
abstracts
remaining
~25,500,000
abstracts
Texts
Segmentation
max_input_lengt
h
512
IR – Lexical Search (LS)
Sparse vectors
BM25
• authors’ names
• publication dates
• journal names
Metadata for filtering:
OpenSearch
IR – Semantic Search (SS)
Dense vectors
Embedding generation – bi-
encoder for Assymetric Semantic
Search
sentence-transformers/msmarco-distilbert-base-tas-b
MS MARCO Dataset
Vector comparison:
Hierarchical Navigable Small World (HNSW) indexing technique
Approximate Nearest Neighbours clustering
Dot product metrics
Qdrant Vector Database 8-bit quantized embeddings
IR – Hybrid Search
Combination of LS and SS
identification of direct matches
ability to discover semantically related
phrases
Prerequisite – normalizing LS and
SS scores
Importance weights
Evaluation: BioASQ Dataset
Best combination: 0.7 LS and 0.3 SS
Generative Component
– PQAref Dataset
• Randomly selected ~ 9000 questions from
PubMedQA dataset (PMIDs)
• For each question, 10 relevant abstracts
from the PubMed repository
• GPT-4 Turbo for creating the answers
based on the retrieved abstracts
• currently the number one model on
the Chatbot Arena leaderboard
• The prompt to instruct GPT-4 Turbo to
use references (PMIDs)
Generative Component – LLM training
Mistral 7B Instruct (v0.1 and v0.2 )
• PQAref Dataset (80 : 10 : 10)
• QLoRA methodology
• Rank = 64
• Alpha = 16
• LoRA dropout = 0.1
Fine-tuned for the Referenced QA Task, using:
• ~ 27M trainable parameters
• ~ 32 hours of training over 2 epochs, using a batch size of 1
single DGX NVIDIA A100-40GB GPU
Verification Engine
•Cross-checks the generated claim
and the abstract from which the
claim was derived
•Textual Entailment Task
• Each sequence pair is
categorized into 3 classes
•SciFact Dataset Transformation
• 80 : 10 : 10 train : valid : test split
Results – IR
System
Results – Generative
Component
Results – Verification Engine
User interface
[DSC Europe 24] Nikola Milosevic - VerifAI: Biomedical Generative Question-Answering engine with verifiable answers
17.02.2025 18
Value – on example of Perplexity.ai
No programmatic access
No possibility of granular control
Does not have access to internal
information (and loss of IP if
allowed)
Evaluation: $8-9B
18.June 2024 24.Oct 2024
You can have your deployment (VerifAI
core/enterprize)
→ We have modified VerifAI to be general
Open-Source Generative Search with or
without verification
→ You can connect it to your
OpenAI/Azure/vLLM/oLlama deployment
→ You can index documents in MS Word, MS
PowerPoint, PDF, txt, md formats (and
expanding)
→ Easy installation (download, create venv,
install requirements, run installation
script, run indexing, run back-end and
front-end)
→ Nothing goes away from your deployment
(private and secure)
→ We can consult on deployment and add
feature on roadmap
Future Work
• Semantic tagging
• Long term hosting solution
• Document authoring
• Incorporate feedback from the
scientific community, and
continuously improve
generative search engine
• foster trust in generative AI
across various scientific
domains
PUBLICATIONS
• Adela Ljajić, Miloš Košprdić, Bojana Bašaragin, Darija Medvecki, Lorenzo Cassano, Nikola
Milošević, “Scientific QA System with Verifiable Answers”, The 6th International Open Search
Symposium 2024
• Košprdić, M., Ljajić, A., Bašaragin, B., Medvecki, D., & Milošević, N. "Verif. ai: Towards an Open-
Source Scientific Generative Question-Answering System with Referenced and Verifiable
Answers." The Sixteenth International Conference on Evolving Internet INTERNET 2024 (2024).
• Bojana Bašaragin, Adela Ljajić, Darija Medvecki, Lorenzo Cassano, Miloš Košprdić, Nikola
Milošević "How do you know that? Teaching Generative Language Models to Reference Answers to
Biomedical Questions", Accepted at BioNLP 2024, Colocated with ACL 2024
• Adela Ljajić, Lorenzo Cassano, Miloš Košprdić, Bojana Bašaragin, Darija Medvecki, Nikola
Milošević "Enhancing Biomedical Information Retrieval with Semantic Search: A Comparative
Analysis Using PubMed Data", Belgrade Bioinformatics Conference 2024, BelBi2024
TEAM LEAD Dr Nikola Milošević
OUR TEAM
Lorenzo Cassano
Intern
Dr Adela Ljajić
Darija Medvecki
Miloš Košprdić
Dr Bojana Bašaragin
Angela Pupovac
+
Nataša Radmilović
Petar Stevanović
www.verifai-
project.com
verif.ai.project@gmail.co
m
LinkedIn
Verif.ai Project
Instagram
verif.ai_project
X / Twitter
Verif.ai Project
Facebook
Verif.ai Project
TikTok
verif.ai
THANK YOU
FOR YOUR
ATTENTION!
https://github.com/nikolamilosevic86/verifAI

More Related Content

PPTX
[DSC Europe 24] Anastasia Shapedko - How Alice, our intelligent personal assi...
PPTX
[DSC Europe 24] Joy Chatterjee - Balancing Personalization and Experimentatio...
PPTX
[DSC Europe 24] Pratul Chakravarty - Personalized Insights and Engagements us...
PPTX
[DSC Europe 24] Domagoj Maric - Modern Web Data Extraction: Techniques, Tools...
PPTX
[DSC Europe 24] Marcin Szymaniuk - The path to Effective Data Migration - Ove...
PPTX
[DSC Europe 24] Fran Mikulicic - Building a Data-Driven Culture: What the C-S...
PPTX
[DSC Europe 24] Sofija Pervulov - Building up the Bosch Semantic Data Lake
PDF
[DSC Europe 24] Dani Ei-Ayyas - Overcoming Loneliness with LLM Dating Assistant
[DSC Europe 24] Anastasia Shapedko - How Alice, our intelligent personal assi...
[DSC Europe 24] Joy Chatterjee - Balancing Personalization and Experimentatio...
[DSC Europe 24] Pratul Chakravarty - Personalized Insights and Engagements us...
[DSC Europe 24] Domagoj Maric - Modern Web Data Extraction: Techniques, Tools...
[DSC Europe 24] Marcin Szymaniuk - The path to Effective Data Migration - Ove...
[DSC Europe 24] Fran Mikulicic - Building a Data-Driven Culture: What the C-S...
[DSC Europe 24] Sofija Pervulov - Building up the Bosch Semantic Data Lake
[DSC Europe 24] Dani Ei-Ayyas - Overcoming Loneliness with LLM Dating Assistant

More from DataScienceConferenc1 (20)

PDF
[DSC Europe 24] Ewelina Kucal & Maciej Dziezyc - How to Encourage Children to...
PPTX
[DSC Europe 24] Josip Saban - Buidling cloud data platforms in enterprises
PPTX
[DSC Europe 24] Sray Agarwal - 2025: year of Ai dilemma - ethics, regulations...
PDF
[DSC Europe 24] Peter Kertys & Maros Buban - Application of AI technologies i...
PPTX
[DSC Europe 24] Orsalia Andreou - Fostering Trust in AI-Driven Finance
PPTX
[DSC Europe 24] Arnault Ioualalen - AI Trustworthiness – A Path Toward Mass A...
PDF
[DSC Europe 24] Nathan Coyle - Open Data for Everybody: Social Action, Peace ...
PPTX
[DSC Europe 24] Miodrag Vladic - Revolutionizing Information Access: All Worl...
PPTX
[DSC Europe 24] Katherine Munro - Where there’s a will, there’s a way: The ma...
PPTX
[DSC Europe 24] Ana Stojkovic Knezevic - How to effectively manage AI/ML proj...
PPTX
[DSC Europe 24] Simun Sunjic & Lovro Matosevic - Empowering Sales with Intell...
PPTX
[DSC Europe 24] Igor Sevo - Intelligent Interfaces and operating systems
PDF
[DSC Europe 24] Ali Haidar - Integrating AI into office software
PPTX
[DSC Europe 24] Nataliia Vasileva - Evaluating the Worth of Implementation: I...
PPTX
[DSC Europe 24] Stefan Stosic - BI Battle: Centralized Control vs. Decentrali...
PPTX
[DSC Europe 24] Aleksandar Cvejic - Rivian Autonomy and AI: Building a Scalab...
PDF
[DSC Europe 24] Sofia Konchakova - Effective Context Tuning Strategies for Re...
PPTX
[DSC Europe 24] Azfar Shah - Journey to AI driven company
PPTX
[DSC Europe 24] Sinisa Arsic - Accelerating AI Innovation through Hyperscaler...
PPTX
[DSC Europe 24] Lana Malic - AI Agents: The Future of Autonomous Decision-Mak...
[DSC Europe 24] Ewelina Kucal & Maciej Dziezyc - How to Encourage Children to...
[DSC Europe 24] Josip Saban - Buidling cloud data platforms in enterprises
[DSC Europe 24] Sray Agarwal - 2025: year of Ai dilemma - ethics, regulations...
[DSC Europe 24] Peter Kertys & Maros Buban - Application of AI technologies i...
[DSC Europe 24] Orsalia Andreou - Fostering Trust in AI-Driven Finance
[DSC Europe 24] Arnault Ioualalen - AI Trustworthiness – A Path Toward Mass A...
[DSC Europe 24] Nathan Coyle - Open Data for Everybody: Social Action, Peace ...
[DSC Europe 24] Miodrag Vladic - Revolutionizing Information Access: All Worl...
[DSC Europe 24] Katherine Munro - Where there’s a will, there’s a way: The ma...
[DSC Europe 24] Ana Stojkovic Knezevic - How to effectively manage AI/ML proj...
[DSC Europe 24] Simun Sunjic & Lovro Matosevic - Empowering Sales with Intell...
[DSC Europe 24] Igor Sevo - Intelligent Interfaces and operating systems
[DSC Europe 24] Ali Haidar - Integrating AI into office software
[DSC Europe 24] Nataliia Vasileva - Evaluating the Worth of Implementation: I...
[DSC Europe 24] Stefan Stosic - BI Battle: Centralized Control vs. Decentrali...
[DSC Europe 24] Aleksandar Cvejic - Rivian Autonomy and AI: Building a Scalab...
[DSC Europe 24] Sofia Konchakova - Effective Context Tuning Strategies for Re...
[DSC Europe 24] Azfar Shah - Journey to AI driven company
[DSC Europe 24] Sinisa Arsic - Accelerating AI Innovation through Hyperscaler...
[DSC Europe 24] Lana Malic - AI Agents: The Future of Autonomous Decision-Mak...
Ad

Recently uploaded (20)

PPTX
climate analysis of Dhaka ,Banglades.pptx
PDF
Recruitment and Placement PPT.pdfbjfibjdfbjfobj
PPT
Reliability_Chapter_ presentation 1221.5784
PPTX
Database Infoormation System (DBIS).pptx
PDF
22.Patil - Early prediction of Alzheimer’s disease using convolutional neural...
PDF
Galatica Smart Energy Infrastructure Startup Pitch Deck
PDF
168300704-gasification-ppt.pdfhghhhsjsjhsuxush
PPTX
DISORDERS OF THE LIVER, GALLBLADDER AND PANCREASE (1).pptx
PPTX
MODULE 8 - DISASTER risk PREPAREDNESS.pptx
PPTX
Business Ppt On Nestle.pptx huunnnhhgfvu
PDF
Introduction to Business Data Analytics.
PDF
Fluorescence-microscope_Botany_detailed content
PDF
“Getting Started with Data Analytics Using R – Concepts, Tools & Case Studies”
PPTX
Introduction to Basics of Ethical Hacking and Penetration Testing -Unit No. 1...
PPTX
Moving the Public Sector (Government) to a Digital Adoption
PDF
TRAFFIC-MANAGEMENT-AND-ACCIDENT-INVESTIGATION-WITH-DRIVING-PDF-FILE.pdf
PDF
.pdf is not working space design for the following data for the following dat...
PPTX
iec ppt-1 pptx icmr ppt on rehabilitation.pptx
PPT
Miokarditis (Inflamasi pada Otot Jantung)
climate analysis of Dhaka ,Banglades.pptx
Recruitment and Placement PPT.pdfbjfibjdfbjfobj
Reliability_Chapter_ presentation 1221.5784
Database Infoormation System (DBIS).pptx
22.Patil - Early prediction of Alzheimer’s disease using convolutional neural...
Galatica Smart Energy Infrastructure Startup Pitch Deck
168300704-gasification-ppt.pdfhghhhsjsjhsuxush
DISORDERS OF THE LIVER, GALLBLADDER AND PANCREASE (1).pptx
MODULE 8 - DISASTER risk PREPAREDNESS.pptx
Business Ppt On Nestle.pptx huunnnhhgfvu
Introduction to Business Data Analytics.
Fluorescence-microscope_Botany_detailed content
“Getting Started with Data Analytics Using R – Concepts, Tools & Case Studies”
Introduction to Basics of Ethical Hacking and Penetration Testing -Unit No. 1...
Moving the Public Sector (Government) to a Digital Adoption
TRAFFIC-MANAGEMENT-AND-ACCIDENT-INVESTIGATION-WITH-DRIVING-PDF-FILE.pdf
.pdf is not working space design for the following data for the following dat...
iec ppt-1 pptx icmr ppt on rehabilitation.pptx
Miokarditis (Inflamasi pada Otot Jantung)
Ad

[DSC Europe 24] Nikola Milosevic - VerifAI: Biomedical Generative Question-Answering engine with verifiable answers

  • 1. VerifAI: Biomedical Generative Question Answering Engine with Verifiable Answers Nikola Milošević Nikola.milosevic@bayer.com Bayer A.G. The Institute for Artificial Intelligence Research and Development of Serbia
  • 3. LLMs and hallucinations • Generative LLMs can produce answers that appear coherent, confident and articulate • However, the information conveyed may not be correct or verifiable • the limited internal knowledge of generative LLMs can hinder their ability to deliver factually accurate answers, particularly within specialized field • notably concerning in biomedicine, where accurate and factual answers are critical • privacy, sovereignty and security concerns in pharma and biomedicine often necessitate building systems where all components are controllable (e.g., deployed in-house), to avoid reliance on third-party APIs • In some domains hallucinations (creative) are good, while in others, such as biomedicine they are not acceptable
  • 4. Ways to address hallucination s Retrieval Augmented Generation (RAG) Fine-tune the model Provide sources Check the sources
  • 6. IR System Retrieves relevant articles for generating claims. PubMed title + abstract concatenation Excluding articles missing abstracts remaining ~25,500,000 abstracts Texts Segmentation max_input_lengt h 512
  • 7. IR – Lexical Search (LS) Sparse vectors BM25 • authors’ names • publication dates • journal names Metadata for filtering: OpenSearch
  • 8. IR – Semantic Search (SS) Dense vectors Embedding generation – bi- encoder for Assymetric Semantic Search sentence-transformers/msmarco-distilbert-base-tas-b MS MARCO Dataset Vector comparison: Hierarchical Navigable Small World (HNSW) indexing technique Approximate Nearest Neighbours clustering Dot product metrics Qdrant Vector Database 8-bit quantized embeddings
  • 9. IR – Hybrid Search Combination of LS and SS identification of direct matches ability to discover semantically related phrases Prerequisite – normalizing LS and SS scores Importance weights Evaluation: BioASQ Dataset Best combination: 0.7 LS and 0.3 SS
  • 10. Generative Component – PQAref Dataset • Randomly selected ~ 9000 questions from PubMedQA dataset (PMIDs) • For each question, 10 relevant abstracts from the PubMed repository • GPT-4 Turbo for creating the answers based on the retrieved abstracts • currently the number one model on the Chatbot Arena leaderboard • The prompt to instruct GPT-4 Turbo to use references (PMIDs)
  • 11. Generative Component – LLM training Mistral 7B Instruct (v0.1 and v0.2 ) • PQAref Dataset (80 : 10 : 10) • QLoRA methodology • Rank = 64 • Alpha = 16 • LoRA dropout = 0.1 Fine-tuned for the Referenced QA Task, using: • ~ 27M trainable parameters • ~ 32 hours of training over 2 epochs, using a batch size of 1 single DGX NVIDIA A100-40GB GPU
  • 12. Verification Engine •Cross-checks the generated claim and the abstract from which the claim was derived •Textual Entailment Task • Each sequence pair is categorized into 3 classes •SciFact Dataset Transformation • 80 : 10 : 10 train : valid : test split
  • 18. 17.02.2025 18 Value – on example of Perplexity.ai No programmatic access No possibility of granular control Does not have access to internal information (and loss of IP if allowed) Evaluation: $8-9B 18.June 2024 24.Oct 2024
  • 19. You can have your deployment (VerifAI core/enterprize) → We have modified VerifAI to be general Open-Source Generative Search with or without verification → You can connect it to your OpenAI/Azure/vLLM/oLlama deployment → You can index documents in MS Word, MS PowerPoint, PDF, txt, md formats (and expanding) → Easy installation (download, create venv, install requirements, run installation script, run indexing, run back-end and front-end) → Nothing goes away from your deployment (private and secure) → We can consult on deployment and add feature on roadmap
  • 20. Future Work • Semantic tagging • Long term hosting solution • Document authoring • Incorporate feedback from the scientific community, and continuously improve generative search engine • foster trust in generative AI across various scientific domains
  • 21. PUBLICATIONS • Adela Ljajić, Miloš Košprdić, Bojana Bašaragin, Darija Medvecki, Lorenzo Cassano, Nikola Milošević, “Scientific QA System with Verifiable Answers”, The 6th International Open Search Symposium 2024 • Košprdić, M., Ljajić, A., Bašaragin, B., Medvecki, D., & Milošević, N. "Verif. ai: Towards an Open- Source Scientific Generative Question-Answering System with Referenced and Verifiable Answers." The Sixteenth International Conference on Evolving Internet INTERNET 2024 (2024). • Bojana Bašaragin, Adela Ljajić, Darija Medvecki, Lorenzo Cassano, Miloš Košprdić, Nikola Milošević "How do you know that? Teaching Generative Language Models to Reference Answers to Biomedical Questions", Accepted at BioNLP 2024, Colocated with ACL 2024 • Adela Ljajić, Lorenzo Cassano, Miloš Košprdić, Bojana Bašaragin, Darija Medvecki, Nikola Milošević "Enhancing Biomedical Information Retrieval with Semantic Search: A Comparative Analysis Using PubMed Data", Belgrade Bioinformatics Conference 2024, BelBi2024
  • 22. TEAM LEAD Dr Nikola Milošević OUR TEAM Lorenzo Cassano Intern Dr Adela Ljajić Darija Medvecki Miloš Košprdić Dr Bojana Bašaragin Angela Pupovac + Nataša Radmilović Petar Stevanović
  • 23. www.verifai- project.com verif.ai.project@gmail.co m LinkedIn Verif.ai Project Instagram verif.ai_project X / Twitter Verif.ai Project Facebook Verif.ai Project TikTok verif.ai THANK YOU FOR YOUR ATTENTION! https://github.com/nikolamilosevic86/verifAI

Editor's Notes

  • #11: Both instruction-tuned versions were fine-tuned for the task of referenced QA using the QLoRA methodology allowing us to fine-tune the models on a single DGX NVIDIA A100-40GB GPU in ~32 hours
  • #18: Krishna
  • #22: Team members contributing to Verif.ai have 5 full-time members and two interns. Information retrieval - Adela, Nikola, Lorenzo RAG - Bojana, Darija, Adela Verification - Miloš and Nikola Social networks and content creator - Angela