[DSC Europe 24] Nikola Milosevic - VerifAI: Biomedical Generative Question-Answering engine with verifiable answers

VerifAI: Biomedical Generative Question Answering Engine with Verifiable Answers
Nikola Milošević
Nikola.milosevic@bayer.com
Bayer A.G.
The Institute for Artificial Intelligence Research and Development of Serbia

LLMs and hallucinations
• Generative LLMs can produce answers that appear coherent, confident and articulate
• However, the information conveyed may not be correct or verifiable
• The limited internal knowledge of generative LLMs can hinder their ability to deliver factually accurate answers, particularly within specialized fields
• This is notably concerning in biomedicine, where accurate and factual answers are critical
• Privacy, sovereignty and security concerns in pharma and biomedicine often necessitate building systems where all components are controllable (e.g., deployed in-house), to avoid reliance on third-party APIs
• In some (creative) domains hallucinations are good, while in others, such as biomedicine, they are not acceptable

Ways to address hallucinations
• Retrieval Augmented Generation (RAG)
• Fine-tune the model
• Provide sources
• Check the sources

IR System
• Retrieves relevant articles for generating claims
• PubMed: title + abstract concatenation
• Excluding articles missing abstracts: ~25,500,000 abstracts remaining
• Text segmentation: max_input_length 512
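The document preparation described on this slide can be sketched as follows. This is a minimal illustration, not the project's actual indexing code: the function names are hypothetical, and whitespace-separated words stand in for the tokens of whatever tokenizer the embedding model uses.

```python
from typing import List, Optional


def build_document(title: str, abstract: str) -> Optional[str]:
    """Concatenate title and abstract; return None when the abstract is missing."""
    if not abstract or not abstract.strip():
        return None  # articles without an abstract are excluded from the index
    return f"{title.strip()} {abstract.strip()}"


def segment(tokens: List[str], max_input_length: int = 512) -> List[List[str]]:
    """Split a token sequence into consecutive chunks of at most max_input_length."""
    return [
        tokens[start:start + max_input_length]
        for start in range(0, len(tokens), max_input_length)
    ]


if __name__ == "__main__":
    doc = build_document("Aspirin and fever", "Aspirin reduces fever in adults.")
    if doc is not None:
        # Assumption: whitespace tokens approximate the model's real tokens.
        for chunk in segment(doc.split(), max_input_length=512):
            print(len(chunk), "tokens")
```

A long abstract simply yields several chunks, each of which can be embedded and indexed on its own.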

IR – Lexical Search (LS)
• Sparse vectors
• BM25
• OpenSearch
• Metadata for filtering:
  • authors’ names
  • publication dates
  • journal names
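To make the BM25 ranking concrete, here is a small pure-Python sketch of the classic Okapi BM25 formula with a non-negative IDF. It is an illustration only: OpenSearch computes BM25 internally, and the parameter values k1 = 1.2 and b = 0.75 are common defaults assumed here, not values reported in the talk.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score every tokenized document in a non-empty corpus against a tokenized query."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        term_freq = Counter(d)
        score = 0.0
        for term in query:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = 1 - b + b * len(d) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


if __name__ == "__main__":
    corpus = [
        "aspirin reduces fever".split(),
        "insulin regulates glucose".split(),
        "aspirin and aspirin dosage".split(),
    ]
    print(bm25_scores(["aspirin"], corpus))
```

Documents that never mention a query term score zero, which is why lexical search finds direct matches well but misses paraphrases.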

IR – Semantic Search (SS)
• Dense vectors
• Embedding generation: bi-encoder for Asymmetric Semantic Search
  • sentence-transformers/msmarco-distilbert-base-tas-b
  • MS MARCO Dataset
• Vector comparison:
  • Hierarchical Navigable Small World (HNSW) indexing technique
  • Approximate Nearest Neighbours search
  • Dot product metric
• Qdrant Vector Database, 8-bit quantized embeddings
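The two ideas on this slide, dot-product ranking of dense vectors and 8-bit quantization, can be illustrated with toy vectors. This is a simplified sketch: the brute-force loop stands in for the HNSW index, the symmetric int8 scaling is an assumed scheme rather than Qdrant's actual one, and all names are hypothetical.

```python
from typing import Dict, List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_by_dot_product(query_vec: List[float],
                        doc_vecs: Dict[str, List[float]],
                        top_k: int = 3) -> List[Tuple[str, float]]:
    """Exhaustively score all documents and return the top_k by dot product."""
    scored = [(doc_id, dot(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def quantize_int8(vec: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127]; return the integers and the scale."""
    scale = max(abs(x) for x in vec) / 127 or 1.0  # avoid a zero scale
    return [round(x / scale) for x in vec], scale


if __name__ == "__main__":
    embeddings = {"a": [1.0, 0.0], "b": [0.5, 0.5], "c": [-1.0, 0.0]}
    print(rank_by_dot_product([1.0, 0.2], embeddings, top_k=2))
    print(quantize_int8([0.5, -1.0, 0.25]))
```

Quantized vectors take a quarter of the memory of 32-bit floats, at the cost of a small loss of precision in the scores.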

IR – Hybrid Search
• Combination of LS and SS
  • identification of direct matches
  • ability to discover semantically related phrases
• Prerequisite: normalizing LS and SS scores
• Importance weights
• Evaluation: BioASQ Dataset
• Best combination: 0.7 LS and 0.3 SS
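A minimal sketch of the weighted combination follows. The slide says only that the scores are normalized; min-max normalization is assumed here, as is the choice to give a score of 0 to documents returned by only one of the two searches. The function names are hypothetical.

```python
from typing import Dict


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score list maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_scores(lexical: Dict[str, float], semantic: Dict[str, float],
                  w_lexical: float = 0.7, w_semantic: float = 0.3) -> Dict[str, float]:
    """Weighted sum of normalized lexical (LS) and semantic (SS) scores."""
    ls = min_max_normalize(lexical)
    ss = min_max_normalize(semantic)
    return {
        doc_id: w_lexical * ls.get(doc_id, 0.0) + w_semantic * ss.get(doc_id, 0.0)
        for doc_id in set(ls) | set(ss)
    }


if __name__ == "__main__":
    bm25 = {"p1": 12.0, "p2": 6.0, "p3": 0.0}   # raw BM25 scores
    dense = {"p2": 0.9, "p3": 0.1, "p4": 0.5}   # raw dot products
    combined = hybrid_scores(bm25, dense)
    for doc_id in sorted(combined, key=combined.get, reverse=True):
        print(doc_id, round(combined[doc_id], 3))
```

Normalization matters because BM25 scores and dot products live on different scales; without it the weights 0.7 and 0.3 would not be comparable.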

Generative Component – PQAref Dataset
• Randomly selected ~9000 questions from PubMedQA dataset (PMIDs)
• For each question, 10 relevant abstracts from the PubMed repository
• GPT-4 Turbo for creating the answers based on the retrieved abstracts
  • currently the number one model on the Chatbot Arena leaderboard
• The prompt instructs GPT-4 Turbo to use references (PMIDs)
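How a question and its retrieved abstracts might be assembled into a referencing prompt can be sketched as below. The instruction wording and the function name are hypothetical; the actual prompt used to build PQAref is not shown on the slide.

```python
from typing import List, Tuple


def build_referenced_qa_prompt(question: str,
                               abstracts: List[Tuple[str, str]]) -> str:
    """Build a prompt from (pmid, abstract_text) pairs and a question.

    The instruction text is an illustrative assumption, not the original prompt.
    """
    lines = [
        "Answer the question using only the abstracts below.",
        "After each claim, cite the PMID of the abstract that supports it, "
        "for example (PMID: 12345).",
        "",
    ]
    for pmid, text in abstracts:
        lines.append(f"PMID: {pmid}")
        lines.append(text)
        lines.append("")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


if __name__ == "__main__":
    retrieved = [
        ("111", "Aspirin reduced fever in a randomized trial."),
        ("222", "Ibuprofen showed similar antipyretic effects."),
    ]
    print(build_referenced_qa_prompt("Does aspirin reduce fever?", retrieved))
```

Keeping each abstract next to its PMID gives the model an identifier it can copy into the answer, which is what later makes each claim checkable.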

Generative Component – LLM training
• Mistral 7B Instruct (v0.1 and v0.2)
• Fine-tuned for the Referenced QA Task, using:
  • PQAref Dataset (80 : 10 : 10)
  • QLoRA methodology
    • Rank = 64
    • Alpha = 16
    • LoRA dropout = 0.1
• ~27M trainable parameters
• ~32 hours of training over 2 epochs, using a batch size of 1, on a single DGX NVIDIA A100-40GB GPU
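The ~27M trainable parameters can be checked with a short calculation. The sketch below assumes that the LoRA adapters are attached only to the query and value projections of every attention layer; the slide does not state the target modules, so this is an assumption, although it does reproduce the reported figure. The layer sizes are those of the published Mistral 7B architecture.

```python
# Mistral 7B architecture constants.
HIDDEN_SIZE = 4096   # model width; q_proj maps 4096 -> 4096
KV_DIM = 1024        # 8 key/value heads x 128 dims; v_proj maps 4096 -> 1024
NUM_LAYERS = 32


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """A LoRA adapter adds two matrices: (d_in x rank) and (rank x d_out)."""
    return rank * (d_in + d_out)


def trainable_params(rank: int = 64) -> int:
    """Total adapter parameters, assuming adapters on q_proj and v_proj only."""
    per_layer = (lora_params(HIDDEN_SIZE, HIDDEN_SIZE, rank)   # q_proj
                 + lora_params(HIDDEN_SIZE, KV_DIM, rank))     # v_proj
    return NUM_LAYERS * per_layer


def lora_scaling(alpha: int = 16, rank: int = 64) -> float:
    """Factor applied to the adapter output in standard LoRA: alpha / rank."""
    return alpha / rank


if __name__ == "__main__":
    print(trainable_params())   # 27262976, i.e. ~27M
    print(lora_scaling())       # 0.25
```

With rank 64 and alpha 16 the adapter update is scaled by 0.25, and the adapters amount to well under 1% of the 7B base weights, which stay frozen and quantized under QLoRA.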

Verification Engine
• Cross-checks the generated claim against the abstract from which the claim was derived
• Textual Entailment Task
  • Each sequence pair is categorized into 3 classes
• SciFact Dataset Transformation
  • 80 : 10 : 10 train : valid : test split
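The cross-checking loop can be sketched as follows, with the entailment model passed in as a function. The three label names and every identifier are assumptions made for illustration; the slide states only that each claim and abstract pair is put into one of three classes.

```python
from typing import Callable, Dict, List, Tuple

# Assumed label names for the three entailment classes.
LABELS = ("SUPPORT", "CONTRADICT", "NO_EVIDENCE")


def verify_claims(claims: List[Tuple[str, str]],
                  abstracts: Dict[str, str],
                  classify: Callable[[str, str], str]) -> List[Tuple[str, str, str]]:
    """Label each (claim, pmid) pair against the abstract it cites.

    classify(premise, hypothesis) is a stand-in for the fine-tuned
    entailment model and must return one of LABELS.
    """
    report = []
    for claim, pmid in claims:
        abstract = abstracts.get(pmid)
        if abstract is None:
            label = "NO_EVIDENCE"  # cited abstract was not retrieved
        else:
            label = classify(abstract, claim)
            if label not in LABELS:
                raise ValueError(f"unexpected label: {label!r}")
        report.append((claim, pmid, label))
    return report


if __name__ == "__main__":
    def toy_classifier(premise: str, hypothesis: str) -> str:
        # Toy rule for the demo only: exact substring match counts as support.
        return "SUPPORT" if hypothesis.lower() in premise.lower() else "NO_EVIDENCE"

    found = {"111": "We show that aspirin reduces fever in adults."}
    answer = [("aspirin reduces fever", "111"), ("aspirin cures cancer", "111")]
    for row in verify_claims(answer, found, toy_classifier):
        print(row)
```

Claims labeled as unsupported or contradicted can then be flagged to the user instead of being presented as verified.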

Results – Generative Component

Results – Verification Engine

Value – the example of Perplexity.ai
• No programmatic access
• No possibility of granular control
• Does not have access to internal information (and loss of IP if allowed)
• Valuation: $8-9B
• 18 June 2024, 24 Oct 2024

You can have your own deployment (VerifAI core/enterprise)
→ We have modified VerifAI to be a general open-source generative search engine, with or without verification
→ You can connect it to your OpenAI/Azure/vLLM/Ollama deployment
→ You can index documents in MS Word, MS PowerPoint, PDF, txt and md formats (and expanding)
→ Easy installation (download, create venv, install requirements, run installation script, run indexing, run back-end and front-end)
→ Nothing leaves your deployment (private and secure)
→ We can consult on deployment and add features to the roadmap

Future Work
• Semantic tagging
• Long term hosting solution
• Document authoring
• Incorporate feedback from the scientific community, and continuously improve the generative search engine
• Foster trust in generative AI across various scientific domains

PUBLICATIONS
• Adela Ljajić, Miloš Košprdić, Bojana Bašaragin, Darija Medvecki, Lorenzo Cassano, Nikola Milošević, "Scientific QA System with Verifiable Answers", The 6th International Open Search Symposium 2024
• Košprdić, M., Ljajić, A., Bašaragin, B., Medvecki, D., & Milošević, N., "Verif.ai: Towards an Open-Source Scientific Generative Question-Answering System with Referenced and Verifiable Answers", The Sixteenth International Conference on Evolving Internet INTERNET 2024 (2024)
• Bojana Bašaragin, Adela Ljajić, Darija Medvecki, Lorenzo Cassano, Miloš Košprdić, Nikola Milošević, "How do you know that? Teaching Generative Language Models to Reference Answers to Biomedical Questions", accepted at BioNLP 2024, colocated with ACL 2024
• Adela Ljajić, Lorenzo Cassano, Miloš Košprdić, Bojana Bašaragin, Darija Medvecki, Nikola Milošević, "Enhancing Biomedical Information Retrieval with Semantic Search: A Comparative Analysis Using PubMed Data", Belgrade Bioinformatics Conference 2024, BelBi2024

TEAM LEAD Dr Nikola Milošević
OUR TEAM
Lorenzo Cassano
Intern
Dr Adela Ljajić
Darija Medvecki
Miloš Košprdić
Dr Bojana Bašaragin
Angela Pupovac
+
Nataša Radmilović
Petar Stevanović

www.verifai-project.com
verif.ai.project@gmail.com
LinkedIn: Verif.ai Project
Instagram: verif.ai_project
X / Twitter: Verif.ai Project
Facebook: Verif.ai Project
TikTok: verif.ai
THANK YOU FOR YOUR ATTENTION!
https://github.com/nikolamilosevic86/verifAI
