International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 09 Issue: 08 | Aug 2022 www.irjet.net p-ISSN: 2395-0072
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 1878
Natural Language Processing and summarization of medical
symptomatic data from geographical diverse locations
Snehal N. Palve1, R.N Awale 2, Vaibhav Awandekar 3, Sunil Lakdawala 4
1 M.Tech Student, Electrical Department, Veermata Jijabai Technological Institute, Mumbai, India
2 Professor, Electrical Department, Veermata Jijabai Technological Institute, Mumbai, India
3 Senior R&D Engineer, A3 Remote Monitoring Technologies Pvt Ltd, India
4 Director, A3 Remote Monitoring Technologies Pvt Ltd, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - This paper deals with the detection of regional
symptomatic diseases at an early stage. Early detection
across geographical locations helps early diagnosis and the
prevention of deaths on a larger scale. Adopting ad hoc
methods for early detection and for its contributing factors
would be unsystematic. The proposed method detects disease
epidemics/pandemics by applying Natural Language Processing
(NLP) to large volumes of symptomatic data. This symptomatic
data is available in the comments made by doctors during
physiological data acquisition and is drawn from hospital
databases. NLP is used to find the common diseases across
different geographical locations.
Key Words: Regional symptomatic disease; natural
language processing; machine learning; physiological
data acquisition; statistical data analysis.
1. INTRODUCTION
Healthcare is today regarded as a significant challenge.
Infectious diseases are among the most serious health issues
in the world. These diseases can emerge through air, water,
direct contact with an infected person, and biological and
ecological determinants [1]. In 2020 the world witnessed an
outbreak of an infectious disease, Coronavirus disease
(COVID-19). Around 64 lakh (6.4 million) deaths have been
reported globally to date, and the toll is rising at a very
fast rate. Awareness of such infectious diseases needs to be
spread widely so that people can avoid infection. These
infectious diseases keep spreading over larger areas, leading
to epidemics and pandemics. These outbreaks also have major
social and economic impacts on the population.
During epidemics/pandemics, when everything is virtual, many
places in our country still lack medical facilities. The
traditional way of treating disease may not be enough for
serious problems. A medical diagnosis system based on Natural
Language Processing (NLP) and Machine Learning (ML)
algorithms for disease prediction can support more accurate
diagnosis and help prevent the spread of pandemics. Accurate
and timely analysis of any health-related problem is vital
for the prevention and treatment of illness. Hence, detecting
the spread of such epidemics/pandemics at an early stage
across locations helps early diagnosis and the prevention of
deaths on a larger scale. Once an emerging pandemic is
identified and its spread detected, local and international
healthcare organizations can be notified earlier so that they
can take steps to halt the disease's progress [2]. Thus,
controlling epidemic diseases at the start of their spread
may be a vital solution for epidemics/pandemics.
2. LITERATURE SURVEY
Harini D K, Natesh M [3]: In this paper, machine learning
algorithms are used for effective prediction of diseases,
using both structured and unstructured hospital data. A
latent factor model is used to overcome the difficulty of
missing data. A new convolutional neural network based
multi-modal disease risk prediction (CNN-MDRP) algorithm is
proposed. The prediction accuracy of the proposed algorithm
reaches 94.8%, which is higher than that of the CNN-based
unimodal disease risk prediction (CNN-UDRP) algorithm.
Shratik J. Mishra, Albar M. Vasi, Vinay S. Menon, Prof. K.
Jayamalini [4]: The implemented system had an accuracy of
86.67% on a dataset of 120 patient records. The system
covered general or more commonly occurring diseases, so that
early prediction and treatment could be done and the fatality
rate of deadly diseases reduced, with an accompanying
economic benefit.
Minsung Kim, Joon Yeop Lee, Hwangnam Kim [5]: This paper
presents an Early Warning System (EWS) that can predict
infectious disease outbreaks and detect a sudden increase of
any livestock disease with the potential to become an
epidemic before it spreads.
Pahulpreet Singh Kohli, Shriya Arora [6]: In this paper,
different classification algorithms, each with its own
advantages, were applied to three separate disease databases
available in the UCI repository for disease prediction.
3. PROPOSED WORK
Existing systems for disease prediction use various data
processing techniques, which form the basic foundation of
Artificial Intelligence (AI) and Machine Learning (ML).
Natural Language Processing (NLP) is a branch of AI concerned
with giving computers the ability to understand text and
spoken words in much the same way humans can.
Figure 1: Proposed System Block Diagram
Figure 1 shows the block diagram of the proposed system. It
has two main parts. The first part is ‘disease prediction’,
which uses an NLP algorithm to extract symptoms from the
complaints given by the user and then predicts the
corresponding diseases. The second part is ‘plotting over
geographical regions’, which uses the latitude and longitude
of different states to show the spread of diseases.
The main aim of the proposed system is to develop an
artificial intelligence system that detects the spread of
diseases over geographical locations. This is done by
extracting the regional symptomatic data, that is, the
symptoms recorded in the form of complaints by the
paramedic/doctor during physiological data acquisition.
The database contains three tables. The first table holds
patient details such as name, age, gender, contact details,
visit date, the complaints/symptoms the patient has, and
location. The second table consists of symptoms, disease
lists and the mapping of symptoms to diseases prepared by the
pre-registered doctors. The third table consists of cities
with the latitude and longitude of each location. The patient
data is stored on a web server so that the doctor can access
the information whenever required from anywhere and does not
have to be physically present.
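As a minimal sketch, the three tables described above could be laid out as follows. The paper does not give a schema, so the table names, column names, the `state` column, the use of SQLite and the sample rows (with approximate coordinates) are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema: every table and column name here is illustrative.
SCHEMA = """
CREATE TABLE patient_visit (
    visit_id   INTEGER PRIMARY KEY,
    name       TEXT,
    age        INTEGER,
    gender     TEXT,
    contact    TEXT,
    visit_date TEXT,   -- ISO date string, e.g. '2022-03-14'
    complaints TEXT,   -- free-text complaints noted by the doctor
    city       TEXT
);
CREATE TABLE symptom_disease (
    symptom TEXT,
    disease TEXT,
    PRIMARY KEY (symptom, disease)
);
CREATE TABLE city_location (
    city      TEXT PRIMARY KEY,
    state     TEXT,    -- assumed column, used later for state-wise counts
    latitude  REAL,
    longitude REAL
);
"""


def create_database(path=":memory:"):
    """Create the three tables in a SQLite database and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = create_database()
# Illustrative rows only; the coordinates are approximate.
conn.execute(
    "INSERT INTO city_location VALUES (?, ?, ?, ?)",
    ("Mumbai", "Maharashtra", 19.08, 72.88),
)
conn.execute(
    "INSERT INTO patient_visit "
    "(name, age, gender, contact, visit_date, complaints, city) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("Patient A", 34, "F", "n/a", "2022-03-14", "cold and cough, body pain", "Mumbai"),
)
# Join a visit to its location so the complaint can later be placed on a map.
row = conn.execute(
    "SELECT v.complaints, c.state, c.latitude, c.longitude "
    "FROM patient_visit v JOIN city_location c ON c.city = v.city"
).fetchone()
print(row)
```

Keeping the location in its own table means each complaint record only stores a city name, and the coordinates needed for plotting are looked up with a join.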
Figure 2: Symptom Extraction Process
Figure 2 shows the symptom extraction process. The user
enters the input in the form of complaints, which can be a
sentence or a paragraph. The input then goes through
tokenization, which breaks the sentence or paragraph into
smaller chunks of words. These chunks are processed with
‘stop word removal’, which removes stop words such as ‘is,
this, the, are, a, an’. After the stop words are removed, the
delimiters ‘,’ and ‘.’ are removed from the result. The words
that remain after stop words and delimiters have been removed
from the sentence/paragraph are referred to as ‘candidate
keywords’. Sometimes a candidate keyword is itself a symptom
extracted from the sentence/paragraph. Next is the
segmentation process, where meaningful phrases are created
from the data obtained after the entire tokenization process.
Finally, the system cross-checks the segmented data against
the symptom list in the database. Each match found is
considered a symptom from the user’s input. These symptoms
are later used for disease prediction.
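The extraction steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the stop word set, the symptom list, the n-gram based segmentation and all function names are hypothetical. One assumption deserves mention: because ‘and’ is treated as a stop word, the symptom list is passed through the same normalisation, so that ‘cold and cough’ is matched as ‘cold cough’.

```python
import re

# Hypothetical stop word set, delimiters and symptom list for illustration.
STOP_WORDS = {"is", "this", "the", "are", "a", "an", "and", "has", "since"}
DELIMITERS = {",", "."}
SYMPTOM_LIST = ["cold and cough", "body pain", "skin problem", "fever"]


def tokenize(text):
    """Break the text into lowercase word tokens and ',' / '.' delimiters."""
    return re.findall(r"[a-z]+|[,.]", text.lower())


def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def remove_delimiters(tokens):
    return [t for t in tokens if t not in DELIMITERS]


def candidate_keywords(text):
    """Tokenize, then drop stop words and delimiters, in that order."""
    return remove_delimiters(remove_stop_words(tokenize(text)))


def segment(keywords, max_len=3):
    """Build contiguous phrases of 1..max_len candidate keywords."""
    phrases = []
    for n in range(1, max_len + 1):
        for i in range(len(keywords) - n + 1):
            phrases.append(" ".join(keywords[i:i + n]))
    return phrases


def extract_symptoms(complaint, symptom_list=SYMPTOM_LIST):
    """Return the known symptoms found in a free-text complaint."""
    # Normalise the symptom list the same way as the input text.
    lookup = {" ".join(candidate_keywords(s)): s for s in symptom_list}
    found = []
    for phrase in segment(candidate_keywords(complaint)):
        symptom = lookup.get(phrase)
        if symptom is not None and symptom not in found:
            found.append(symptom)
    return found


print(extract_symptoms("The patient has cold and cough, and body pain since two days."))
```

Because delimiters are removed before segmentation, as in the process described above, a phrase can span a comma; a stricter variant could treat delimiters as phrase boundaries instead.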
For disease prediction, the database holds the list of
symptoms and diseases prepared by the pre-registered doctors.
After symptom extraction, the system cross-matches the
extracted symptoms with the database, which contains both the
‘symptom and disease list’. Finally, the system predicts the
disease.
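The paper does not state how the cross-matching ranks candidate diseases. One simple possibility, shown here purely as an assumption, is to count how many of the extracted symptoms map to each disease and rank by that count. The mapping below uses placeholder disease names and is not clinical data; `predict_diseases` is a hypothetical helper.

```python
from collections import Counter

# Placeholder mapping; in the proposed system this table is prepared by
# the pre-registered doctors. Disease names are deliberately generic.
SYMPTOM_TO_DISEASES = {
    "fever": ["disease_a", "disease_b"],
    "cold and cough": ["disease_a", "disease_c"],
    "body pain": ["disease_a", "disease_b"],
    "skin problem": ["disease_d"],
}


def predict_diseases(symptoms, mapping=SYMPTOM_TO_DISEASES):
    """Rank diseases by the number of extracted symptoms that map to them."""
    votes = Counter()
    for symptom in set(symptoms):
        for disease in mapping.get(symptom, []):
            votes[disease] += 1
    # Highest count first; ties are broken alphabetically for stable output.
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


print(predict_diseases(["cold and cough", "body pain"]))
```

A count-based ranking is easy to explain to the doctors who maintain the mapping, but it ignores how specific a symptom is to a disease; a weighted score would be a natural refinement.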
4. PLOTTING OVER GEOGRAPHICAL REGIONS
AND GENERATING REPORTS
The final output is more useful when the system generates
reports in the form of graphs. The graphs indicate the rise
of symptoms and diseases location-wise as well as month-wise.
Figure 3 shows the month-wise symptom rise and Figure 4 shows
the state-wise symptom rise.
The system also plots the rise of epidemics/pandemics over
geographical regions. The geographical region on the map uses
latitude and longitude from the database for mapping the
symptoms over the states. After multiple symptoms are
selected from the checkboxes of the user interface, the map
view shows the number of patients with the selected symptoms
across different regions. Figure 5 shows the geographical
view of the symptoms “Cold and cough”, “Skin problem” and
“Body pain” in the states of India. This geographical view
shows the total count of patients having a particular symptom
in the form of a bubble map. Multiple symptoms on the map are
distinguished by different color codes. When the cursor moves
over the coordinates of a particular state or symptom, it
shows the count of patients in that state for that symptom,
as displayed for the count of patients with skin problem in
the state of Gujarat. The dataset used for geographical
plotting consists of more than 7k entries.
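The counts behind the month-wise, state-wise and bubble-map views can be sketched as simple aggregations. The records, the approximate state coordinates and the function names below are hypothetical; the plotting itself is left to whichever charting library the system uses, which the paper does not name.

```python
from collections import Counter

# Hypothetical extracted records: (visit date, state, symptom).
RECORDS = [
    ("2022-01-12", "Gujarat", "skin problem"),
    ("2022-01-20", "Gujarat", "skin problem"),
    ("2022-02-03", "Maharashtra", "body pain"),
    ("2022-02-17", "Gujarat", "cold and cough"),
]

# Approximate state coordinates, for illustration only.
STATE_COORDS = {"Gujarat": (22.3, 71.2), "Maharashtra": (19.75, 75.71)}


def month_wise_counts(records):
    """Count patients per (YYYY-MM, symptom)."""
    return Counter((date[:7], symptom) for date, _state, symptom in records)


def state_wise_counts(records):
    """Count patients per (state, symptom)."""
    return Counter((state, symptom) for _date, state, symptom in records)


def bubble_points(records, selected_symptoms, coords=STATE_COORDS):
    """Build one bubble per (state, symptom) for the selected symptoms."""
    points = []
    for (state, symptom), count in sorted(state_wise_counts(records).items()):
        if symptom in selected_symptoms and state in coords:
            lat, lon = coords[state]
            points.append(
                {"state": state, "symptom": symptom, "count": count,
                 "lat": lat, "lon": lon}
            )
    return points


print(month_wise_counts(RECORDS))
print(bubble_points(RECORDS, {"skin problem", "body pain"}))
```

Each bubble carries its own count, so the hover text described above can be read directly from the point, and the symptom name can drive the color code.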
Figure 3: Month wise symptoms rise
Figure 4: State wise Symptoms rise
Figure 5: Symptom rise over geographical regions
5. CONCLUSION
The proposed system aims at detecting the spread of
epidemics/pandemics using a geographical map plot. The
generated reports can be used by medical or health
organizations for analysis at the national or international
level. Future work is to visualize the data more prominently
and add animations to the graphs. These graphs can also
consider other attributes such as year, age and gender.
Mapping over geographical regions can further be done at the
district level as well as the village level.
6. REFERENCES
[1] Inayatulloh, Selvyna Theresia, “Early Warning System
for Infectious Diseases”, DOI:
10.1109/TSSA.2015.7440435, IEEE, 2015.
[2] Khanita Duangchaemkarn, Varin Chaovatut, Phongtape
Wiwatanadate, and Ekkarat Boonchieng, “Symptom-based
Data Preprocessing for the Detection of Disease
Outbreak”, DOI: 10.1109/EMBC.2017.8037393, pp.
2614-2617, IEEE, 2017.
[3] Harini D K, Natesh M, “Prediction Of Probability Of
Disease Based On Symptoms Using Machine Learning
Algorithm”, International Research Journal of
Engineering and Technology (IRJET), Volume: 05 Issue:
05 | May-2018.
[4] Shratik J. Mishra, Albar M. Vasi, Vinay S. Menon, Prof. K.
Jayamalini, “GDPS - General Disease Prediction System”,
International Research Journal of Engineering and
Technology (IRJET), Volume: 05 Issue: 03 | Mar-2018.
[5] Minsung Kim, Joon Yeop Lee, Hwangnam Kim, “Warning
and Detection System for Epidemic Disease”,
DOI: 10.1109/ICTC.2016.7763517, pp. 478-483, IEEE,
2018.
[6] P. S. Kohli and S. Arora, “Application of Machine
Learning in Disease Prediction”, 4th International
Conference on Computing Communication and
Automation (ICCCA), DOI: 10.1109/CCAA.2018.8777449,
pp. 1-4, 2018.
[7] Fariz Bramasta Putra et al., “Identification of Symptoms
Based on Natural Language Processing (NLP) for
Disease Diagnosis Based on International Classification
of Diseases and Related Health Problems”, International
Electronics Symposium (IES), DOI:
10.1109/ELECSYM.2019.8901644, pp. 1-5, IEEE, 2019.
[8] Aswathy K P, Rathi R, Shyam Shankar E P, “NLP based
Segmentation Protocol for Predicting Diseases and
Finding Doctors”, International Research Journal of
Engineering and Technology (IRJET), Volume: 06 Issue:
02 | Feb 2019.
[9] PM. Lavanya, E. Sasikala, “Deep Learning Techniques on
Text Classification Using Natural Language Processing
(NLP) In Social Healthcare Network: A Comprehensive
Survey”, International Conference on Signal Processing
and Communication, DOI:
10.1109/ICSPC51351.2021.9451752, pp. 603-609, IEEE, 2021.
