Implementation of Chatbot using
NLP
By Samrat Ghorui
AICTE Reg. ID. - STU66a869ada4ac51722313133
AICTE Internship ID. - INTERNSHIP_173070615967287aef12823
College Name: Heritage Institute of Technology, Kolkata
Learning Objectives:
GOAL
1. Text Extraction from PDF:
• Learn how to extract and process textual data from PDF files using Python libraries like PyMuPDF. This involves handling different document structures and ensuring the text is clean and ready for further processing.
2. Natural Language Processing (NLP) for Question Answering:
• Understand how to implement NLP techniques for creating a Question Answering (QA) system. This includes leveraging pre-trained models such as those from Hugging Face’s transformers library and adapting them to understand medical queries.
3. Text Preprocessing and Chunking:
• Gain proficiency in splitting large texts into manageable chunks to feed into NLP models. This involves processing large documents efficiently and ensuring the model can handle the text input without running into memory or size limits.
4. Building a Retrieval-Based QA System:
• Learn how to build a retrieval-based QA system using vector databases like FAISS to store and search for relevant information. This involves indexing documents, calculating similarity, and using a retriever to fetch the most relevant chunks of text for answering user queries.
5. Embedding Generation:
• Develop an understanding of how embeddings represent textual information in vector space. Learn to generate these embeddings using models like all-MiniLM-L6-v2 and to use them to improve information retrieval accuracy (a minimal embedding sketch follows this list).
6. UI Design and User Interaction:
• Learn how to create an interactive user interface with Streamlit for a smooth user experience. This involves allowing users to input questions and receive relevant answers based on the content extracted from the medical PDF.
7. Deployment of the Application:
• Understand the process of deploying a machine learning or NLP-based web application. This includes knowledge of cloud platforms like Heroku or Streamlit Cloud and setting up the environment needed to run the app in production.
8. Integration of Various Tools and Libraries:
• Gain exposure to integrating various tools and libraries, such as LangChain for QA, Hugging Face for NLP models, and FAISS for vector-based retrieval, to create a unified medical assistance bot.
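As a first taste of objective 5, the snippet below generates embeddings with the all-MiniLM-L6-v2 model and ranks two candidate sentences against a query. It is a minimal sketch that assumes the sentence-transformers package is installed; the query and candidate sentences are purely illustrative.

```python
# Minimal embedding sketch (assumes: pip install sentence-transformers).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

query = "What are the symptoms of diabetes?"  # illustrative query
candidates = [
    "Common symptoms of diabetes include increased thirst and fatigue.",
    "The liver filters toxins from the blood.",
]

# Encode the text into dense vectors and rank candidates by cosine similarity.
query_vec = model.encode(query, convert_to_tensor=True)
cand_vecs = model.encode(candidates, convert_to_tensor=True)
scores = util.cos_sim(query_vec, cand_vecs)[0]

for sentence, score in zip(candidates, scores):
    print(f"{float(score):.3f}  {sentence}")
```

The diabetes sentence should score markedly higher than the unrelated one, which is exactly the property the retrieval step relies on later.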
Tools and Technologies Used:
1. Python:
• The primary programming language for the entire application, chosen for its extensive libraries for data processing, NLP, and web development.
2. PyMuPDF (fitz):
• A library used to extract text from PDF files. It allows efficient reading of PDF contents, which is essential for processing the medical book (abc.pdf).
3. LangChain:
• A framework designed to simplify integrating language models with external data sources. It is used to build the retrieval-based QA system.
4. Hugging Face's Transformers:
• Provides access to pre-trained NLP models, such as RoBERTa, for question-answering tasks. These models process and answer user queries based on the context provided by the PDF text.
5. FAISS:
• A library for similarity search and clustering of dense vectors. FAISS stores the text chunks from the PDF and retrieves them by vector similarity, allowing fast access to relevant information.
6. Streamlit:
• A tool for building interactive web applications quickly. Streamlit provides the user interface where users enter their queries and receive answers.
7. Hugging Face Embeddings:
• A model that generates dense vector representations of text. These embeddings transform text into vectors, enabling efficient similarity searches for answering queries.
8. Jupyter Notebook / IDE:
• Jupyter Notebook is used for testing and experimentation, while IDEs like Visual Studio Code are used to write the application code.
9. GitHub:
• A platform for version control and collaboration, used to store and manage the project's code for easy sharing and deployment.
10. Cloud Platforms (Optional):
• Streamlit Cloud, Heroku, or AWS can be used to host the application, making it accessible to users from anywhere.
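Taken together, the stack reduces to a handful of imports. The check below is a sketch only: the LangChain module paths in particular are assumptions, since these classes have moved between langchain, langchain_community, and newer split packages across versions.

```python
# Quick environment check for the stack listed above.
# Module paths are assumptions and may differ by library version
# (older LangChain releases expose everything under `langchain.*`).
import fitz                                    # PyMuPDF is imported as `fitz`
import streamlit as st                         # UI layer
from transformers import pipeline              # pre-trained QA models
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

print("All core libraries imported successfully.")
```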
Methodology
Text Extraction:
The text from the medical PDF (abc.pdf) is extracted using the PyMuPDF library, cleaned, and preprocessed for further use.
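A minimal sketch of this step, assuming the PDF sits next to the script as abc.pdf and that collapsing whitespace is the only cleaning required:

```python
# Extract raw text from the medical PDF with PyMuPDF (imported as `fitz`).
import re
import fitz  # PyMuPDF

def extract_pdf_text(path: str) -> str:
    """Read every page of the PDF and return a single cleaned string."""
    with fitz.open(path) as doc:
        pages = [page.get_text() for page in doc]
    text = "\n".join(pages)
    # Basic cleanup: collapse runs of whitespace left over from the PDF layout.
    return re.sub(r"\s+", " ", text).strip()

raw_text = extract_pdf_text("abc.pdf")
print(f"Extracted {len(raw_text)} characters.")
```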
Text Chunking:
The extracted text is divided into smaller chunks using LangChain to make it manageable for the QA system.
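One way to do this is LangChain's RecursiveCharacterTextSplitter; the chunk size and overlap below are illustrative values, not tuned settings.

```python
# Split the extracted text into overlapping chunks for retrieval.
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # characters per chunk (illustrative value)
    chunk_overlap=200,  # overlap preserves context across chunk boundaries
)
chunks = splitter.split_text(raw_text)  # `raw_text` comes from the extraction step
print(f"Created {len(chunks)} chunks.")
```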
Embeddings & Vector Database:
The text chunks are converted into vector embeddings using the Hugging Face Embeddings model. These embeddings are stored in
a FAISS vector database for efficient retrieval.
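A sketch of this step using LangChain's wrappers around Hugging Face embeddings and FAISS; the choice of all-MiniLM-L6-v2 follows the tools listed earlier, while the exact import paths vary by LangChain version.

```python
# Embed the chunks and index them in a FAISS vector store.
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vector_db = FAISS.from_texts(chunks, embedding=embeddings)

# Optionally persist the index so it is not rebuilt on every app start.
vector_db.save_local("faiss_index")
```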
Retrieval-Based QA System:
When a user submits a query, the most relevant text chunks are retrieved from FAISS and passed to a QA model (e.g., deepset/roberta-base-squad2) to generate an answer.
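A minimal sketch of the retrieval-plus-answering step, reusing the vector_db built above and the extractive QA model named here; the top-k value and the sample question are illustrative.

```python
# Retrieve the most relevant chunks and run extractive QA over them.
from transformers import pipeline

qa_model = pipeline("question-answering", model="deepset/roberta-base-squad2")

def answer_query(query: str, k: int = 3) -> str:
    """Fetch the top-k chunks from FAISS and extract an answer span from them."""
    docs = vector_db.similarity_search(query, k=k)
    context = " ".join(doc.page_content for doc in docs)
    result = qa_model(question=query, context=context)
    return result["answer"]

print(answer_query("What are the symptoms of diabetes?"))  # illustrative query
```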
User Interface:
A simple UI is developed using Streamlit, allowing users to input queries and view answers from the medical PDF.
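A sketch of the Streamlit front end. It assumes the answer_query helper defined in the QA step sits in the same script (or is imported from it); the widget labels are illustrative.

```python
# streamlit_app.py -- run with: streamlit run streamlit_app.py
import streamlit as st

st.title("Medical Assistance Bot")
st.caption("Answers are drawn from the content of the medical PDF.")

query = st.text_input("Ask a medical question:")
if st.button("Get answer") and query:
    with st.spinner("Searching the document..."):
        st.write(answer_query(query))  # helper defined in the QA step above
```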
Integration & Testing:
The components are integrated and tested to ensure accurate query responses and smooth interaction.
Deployment:
The application is deployed on platforms like Streamlit Cloud, Heroku, or AWS, making it publicly accessible.
This methodology ensures a functional, interactive medical assistance bot capable of answering user queries based on the content of the PDF.
Problem Statement:
The goal of this project is to develop a Medical Assistance Bot that can answer user queries related to medical information
extracted from a PDF document. The PDF (Medical_BOOK.pdf) contains a comprehensive medical book or reference material,
and the challenge is to build a system that can efficiently extract relevant information from it and provide accurate, context-based answers to user questions.
The problem can be broken down into the following key challenges:
1. Text Extraction: Efficiently extracting and processing structured and unstructured text from a large PDF file.
2. Information Retrieval: Building a system that can search and retrieve the most relevant content from the extracted text based on user queries.
3. Natural Language Understanding: Creating a model capable of understanding medical queries and accurately answering them by analyzing the extracted information.
4. Scalability: Ensuring that the system can handle large documents while providing fast and accurate responses.
5. User Interaction: Developing a user-friendly interface for interacting with the bot, allowing users to input medical queries and receive relevant answers.
The project addresses these challenges by combining text extraction, NLP techniques, and retrieval-based question answering
in an interactive web application.
Solution:
Text Extraction: Use the PyMuPDF library to extract and clean text from the PDF.
Text Chunking: Break the extracted text into smaller chunks for easier processing using LangChain.
Vector Embeddings: Convert text chunks into vector embeddings using Hugging Face Embeddings for semantic
similarity.
Retrieval-Based QA: Store embeddings in a FAISS database and use a pre-trained QA model (e.g.,
deepset/roberta-base-squad2) to answer queries.
User Interface: Build a simple UI with Streamlit for user interaction.
Deployment: Deploy the application on platforms like Streamlit Cloud, Heroku, or AWS for public use.
Conclusion:
This project successfully developed a Medical Assistance Bot capable of answering user
queries based on a medical PDF document. By leveraging advanced Natural Language
Processing (NLP) techniques and efficient information retrieval methods, the bot provides
accurate, context-based answers to medical questions. Key components, such as text
extraction, chunking, vector embeddings, and retrieval-based QA, were integrated seamlessly
to build a functional system. The user-friendly interface, developed with Streamlit, enhances the
overall experience by allowing easy interaction with the bot. The system was deployed on
platforms like Streamlit Cloud, ensuring accessibility for users.
Overall, this solution effectively combines various technologies to address the challenge of
providing reliable medical information, making it a valuable tool for anyone seeking medical
assistance from a vast text source.
