International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 09 Issue: 01 | Jan 2022 www.irjet.net p-ISSN: 2395-0072
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 1214
PCOS Detect using Machine Learning Algorithms
Kinjal Raut1, Chaitrali Katkar2, Prof. Dr. Mrs. Suhasini A. Itkar3
1Final Year Computer Engineering Student, PES Modern College of Engineering, Pune
2Final Year Computer Engineering Student, PES Modern College of Engineering, Pune
3Professor, Dept. of Computer Engineering, PES Modern College of Engineering, Pune, India
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract - Polycystic ovary syndrome (PCOS), also known as polycystic ovarian syndrome, is a hormonal endocrine disorder among women of reproductive age. Over five million women of reproductive age worldwide suffer from PCOS. Its most common symptoms include missed, irregular or very light periods; enlarged ovaries that may contain many cysts; excess body hair (hirsutism), including on the chest and stomach; weight gain, especially around the abdomen; and acne or oily skin. The exact pathophysiology of PCOS is not yet known. This heterogeneous disorder is characterized mainly by changes in the ovaries, and it is a multifactorial and polygenic condition. Machine learning can "learn" features from very large amounts of clinical data to help diagnose this disorder. This paper puts forward a solution that supports early detection and prediction of PCOS from an optimal and minimal set of statistically analyzed parameters. The solution is built using machine learning algorithms such as Random Forest, Decision Tree, Support Vector Classifier, Logistic Regression, K-Nearest Neighbors, XGBRF and CatBoost Classifier.
Key Words: Machine Learning, Polycystic Ovary Syndrome, Random Forest, Decision Tree, Support Vector Classifier, K-Nearest Neighbours, Logistic Regression.
1. INTRODUCTION
Technology is changing every aspect of our lives and is making remarkable transformations in the healthcare industry, where technology and humans now work hand in hand. For example, robotic surgery once seemed like fiction, but robots are now used in critical and complex surgeries in hospitals.
Machine learning is a subclass of artificial intelligence that helps a system learn, identify patterns in datasets, make logical decisions and analyze digital information, including words, numbers, images and clicks. Machine learning applications include image recognition, data prediction, and medical diagnosis in health care and clinical care. Machine learning algorithms are among the many technological advances now being applied to the detection of PCOS.
PCOS is one of the most common endocrine disorders, affecting 1 in 10 women of childbearing age. Its exact prevalence is not known, and reported figures vary from 2.2% to 26% globally. It was first described in 1935 by Stein and Leventhal as a syndrome manifested by hirsutism and obesity associated with enlarged polycystic ovaries. Women of reproductive age (15-40 years) can experience this hormonal imbalance, and PCOS can occur at any age after puberty.
The hormones involved include progesterone, luteinizing hormone (LH), estrogen and follicle-stimulating hormone (FSH). Common symptoms of PCOS are an irregular menstrual cycle, excess hair growth, acne, weight gain, darkening of the skin and skin tags. PCOS carries a high risk of first-trimester miscarriage, and inappropriate follicle growth in the ovaries can be prevented by detecting PCOS at an early stage. Early detection of PCOS is therefore important, and this paper focuses on the prediction of PCOS.
The main work includes:
1. Selecting the most important attributes of the dataset using a feature selection method.
2. Applying machine learning algorithms to the selected features.
3. Comparing the accuracy of these algorithms.
1.1 Literature Review
Over 10 million young women have been affected globally, with 1 in every 4 young women having PCOS. The disease is more common in urban than in rural populations because of differences in lifestyle. The increase in the number of women with PCOS is correlated with a sedentary lifestyle, lack of nutritious food, lack of exercise, weight gain and obesity.
Table -1: Summary of Literature review

| Authors | Objectives | Research design | Results |
|---|---|---|---|
| Mehrotra et al. [2012] | A method to automate PCOS screening based on clinical and metabolic markers. | Classification of features using Bayesian and Logistic Regression classifiers. | Of the two models compared, the Bayesian classifier performed best, with 93.93% accuracy. |
| Purnama et al. [2015] | Detection of follicles in USG (ultrasound) images using binary follicle images, feature extraction and segmentation. | Three classification scenarios were designed: Neural Network (LVQ), KNN (Euclidean distance) and SVM (RBF kernel). | SVM with the RBF kernel achieved 82% accuracy at C=40, and KNN achieved 78% accuracy at K=5. |
| Denny et al. [2019] | To overcome the time and cost involved in various clinical tests and ovary scanning. | PCOS features were transformed with PCA, and machine learning algorithms such as KNN, SVM and RF were applied. | Random Forest was the best model for PCOS detection, with 0.89 accuracy. |
| Bharati et al. [2020] | Data-driven diagnosis of PCOS using a dataset from the Kaggle repository. | Classifiers: gradient boosting, random forest, logistic regression and RFLR; methods: holdout and cross-validation. | RFLR obtained the best testing accuracy, 91.01%, with a recall of 90%. |
| Xie et al. [2020] | To identify gene biomarkers and build a diagnostic model. | A computational method combining two machine learning algorithms, ANN and Random Forest. | A novel diagnostic model was developed, with an AUC of 0.7273 on the microarray dataset and 0.6488 on the RNA-seq dataset. |
| Lele et al. [2020] | Classification of PCOS using physical symptoms and sonograms, of which only the physical symptoms are presented. | K-star, IB1 instance-based learning, locally weighted learning, Decision Table, M5 rules, ZeroR, Random Forest and Random Tree were used to classify the data and find the best model. | K-star outperformed the other algorithms. |
| Tanwani [2020] | A model that takes the causes and symptoms of PCOS as inputs and predicts the presence or absence of PCOS as output. | Supervised classification algorithms K-NN and Logistic Regression. | Logistic Regression was the most accurate model, with 92% accuracy. |
| Madhumitha et al. [2021] | Identifying ovary details (range of follicles, type of cysts and follicle size) using image segmentation. | SVM, KNN and Logistic Regression, applied after preprocessing and morphological operations. | The three algorithms were combined into a hybrid model that achieved 0.98 accuracy. |
| Dutta et al. [2021] | Detection and prevention of the disease as early as possible. | SMOTE combined with five algorithms: Logistic Regression, Random Forest, Decision Tree, Support Vector Machine and K-NN. | The best model achieved 97.11% accuracy, a training time of 0.010 s, 98% recall, 98% precision and 95.6% AUROC. |
| Khan Inan et al. [2021] | A probabilistic approach to select statistically relevant features that contribute to PCOS instances. | SMOTE, ENN, the ANOVA test and the Chi-Square test were used to identify important features; classifiers included XGBoost, SVM, KNN, NB, MLP, RF and AdaBoost. | XGBoost outperformed all other classifiers, with 0.96 accuracy and 0.98 recall. |
2. METHODOLOGY
Developing a machine learning model and training it on the dataset is the central step of the implementation. The dataset contains attributes such as I beta-HCG (mIU/mL), II beta-HCG (mIU/mL), AMH (ng/mL), Age (yrs), Weight (Kg), Height (Cm), BMI, Blood Group, Pulse rate (bpm), RR (breaths/min), Fast food, Reg Exercise and BP_Systolic (mmHg).
Fig -1: Block diagram of the system
1. Defining the Problem
The first and most important step is to define the problem: the inputs provided to the model and the output expected from it.
2. Data Collection
Collecting data is a crucial step, because the performance of the model, and the accuracy obtained, depend on the dataset. Data can be collected from platforms such as Kaggle, the UCI Repository and BuzzFeed News. This work uses the Kaggle dataset named Polycystic ovary syndrome (PCOS).
3. Selection of Implementation Platform
The model was implemented in Python using Jupyter Notebook.
4. Data Preparation
Once appropriate data is identified, it must be shaped so that it can be used to train the model. The dataset is in CSV format and is loaded into Python, where the data is visualized and the correlations between different attributes are checked. This step covers checking for missing values or incomplete records, aggregation, augmentation, normalization and labelling, as well as handling structured, unstructured and semi-structured data. The dataset contains records of women patients, including those suffering from PCOS. The data preprocessing steps are:
i. Data Cleaning
Data cleaning is the process of identifying incorrect, incomplete or missing parts of the data and then modifying, replacing or deleting them. In this work, the dataset was first checked for missing values using the Pandas and NumPy libraries.
ii. Data Labeling
Data labeling is the process of identifying raw data (videos, images, text files, etc.) and adding informative tags that provide context, which makes the data more useful to a machine learning model. In this work, non-numerical values are transformed into numerical values.
iii. Feature Selection
Feature selection is an important step in which the most relevant features are selected from the dataset before the machine learning algorithms are applied, which improves the performance of the model. Its goal is to find the best possible features for building the model while ignoring irrelevant details. Common feature selection techniques include filter methods, wrapper methods and embedded methods.
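The preparation steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code (the paper used Pandas and NumPy): the inline CSV and its column names are hypothetical, missing values are handled by deleting incomplete records (one of the options named above), Y/N answers are labeled as 1/0, and the filter method shown ranks features by their absolute Pearson correlation with the target, which is only one of many possible filter criteria.

```python
import csv
import io
import math

# Hypothetical miniature dataset; the column names only mimic the style of
# the Kaggle PCOS data and are assumptions for illustration.
RAW_CSV = """Age (yrs),BMI,Fast food (Y/N),PCOS (Y/N)
28,24.5,Y,Y
31,,N,N
25,29.1,Y,Y
36,22.0,N,N
29,27.8,Y,Y
33,21.4,N,N
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per patient record."""
    return list(csv.DictReader(io.StringIO(text)))


def count_missing(rows):
    """Data cleaning: count empty cells in each column."""
    counts = {col: 0 for col in rows[0]}
    for row in rows:
        for col, value in row.items():
            if value.strip() == "":
                counts[col] += 1
    return counts


def drop_incomplete(rows):
    """One cleaning option: delete records that contain a missing value."""
    return [r for r in rows if all(v.strip() != "" for v in r.values())]


def encode(rows, yes_no_cols):
    """Data labeling: map Y/N to 1.0/0.0 and parse other columns as floats."""
    encoded = []
    for row in rows:
        new_row = {}
        for col, value in row.items():
            if col in yes_no_cols:
                new_row[col] = 1.0 if value.strip().upper() == "Y" else 0.0
            else:
                new_row[col] = float(value)
        encoded.append(new_row)
    return encoded


def pearson(xs, ys):
    """Pearson correlation coefficient (0.0 if either column is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def rank_features(rows, target):
    """Filter method: rank features by |correlation| with the target column."""
    ys = [r[target] for r in rows]
    scores = {col: abs(pearson([r[col] for r in rows], ys))
              for col in rows[0] if col != target}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


rows = load_rows(RAW_CSV)
print(count_missing(rows))  # the BMI column has one empty cell
clean = encode(drop_incomplete(rows), {"Fast food (Y/N)", "PCOS (Y/N)"})
for name, score in rank_features(clean, "PCOS (Y/N)"):
    print(f"{name}: {score:.2f}")
```

On real data, whether to delete, impute or correct a missing value depends on how many records are affected; deletion is used here only because it keeps the sketch short.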
2.1 Modelling
When the data is completely cleaned and the features are selected, it is ready to be processed by the algorithms. The algorithms used to create the model are Random Forest, Decision Tree, Support Vector Classifier, Logistic Regression, K-Nearest Neighbors, XGBRF and CatBoost Classifier.
Random Forest
Random Forest is a supervised machine learning algorithm used for both classification and regression. It builds multiple decision trees and merges their predictions to obtain a more accurate and stable result.
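The bagging-and-voting idea can be illustrated with a deliberately small random forest. This is a simplified sketch, not the implementation used in the paper: every tree is limited to depth one (a decision stump) and split on Gini impurity, each tree sees a bootstrap sample and a random subset of features, and the forest predicts by majority vote. The function names, parameters and toy data are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X, y, features):
    """Fit a depth-one tree, searching only the given feature indices."""
    best = None  # (weighted impurity, feature, threshold, left label, right label)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # no valid split (e.g. one distinct value): constant prediction
        label = majority(y)
        return (0, float("inf"), label, label)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def fit_forest(X, y, n_trees=15, n_features=1, seed=0):
    """Train each tree on a bootstrap sample using a random subset of features."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        feats = rng.sample(range(len(X[0])), n_features)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Merge the trees by majority vote."""
    return majority([predict_stump(stump, row) for stump in forest])


# Toy usage with two numeric features and binary labels (hypothetical data).
X = [[1, 2], [2, 1], [2, 3], [3, 2], [7, 8], [8, 7], [8, 9], [9, 8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y)
print(predict_forest(forest, [0.5, 0.5]), predict_forest(forest, [10, 10]))
```

Limiting trees to depth one keeps the sketch short; library implementations grow deeper trees and expose the number of trees and the number of features per split as parameters.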
Decision Tree
Decision Tree is a supervised learning algorithm that represents, as a tree, all possible solutions to a problem based on given conditions. It uses the CART (Classification and Regression Tree) algorithm.
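CART grows a binary tree by repeatedly choosing the split that most reduces node impurity; for classification the usual measure is the Gini impurity, 1 minus the sum of squared class proportions. The sketch below is a minimal, hypothetical CART-style classifier in plain Python (binary threshold splits, a `max_depth` stopping rule and majority-class leaves), not the configuration used in the paper.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum over classes of p_k squared."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (feature, threshold) minimising weighted Gini impurity, or None."""
    best, best_score = None, gini(y)  # a split must improve on the parent node
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Grow a binary CART-style tree; leaves store the majority class."""
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return Counter(y).most_common(1)[0][0]  # leaf
    f, t = split
    left = [(row, lab) for row, lab in zip(X, y) if row[f] <= t]
    right = [(row, lab) for row, lab in zip(X, y) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [l for _, l in right], depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow the conditions from the root down to a leaf."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


# Hypothetical toy data: label 1 only when feature 1 is set and feature 0 > 2.
X = [[2, 1], [3, 0], [6, 1], [7, 1], [6, 0], [8, 0]]
y = [0, 0, 1, 1, 0, 0]
tree = build_tree(X, y)
print(predict(tree, [7, 1]), predict(tree, [7, 0]))  # prints: 1 0
```

The toy labels depend on two conditions, so the learned tree needs two levels: it first splits on feature 1 and then on feature 0.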
Support Vector Classifier (SVC)
SVC fits the provided data and returns the best-fit hyperplane that divides, or categorizes, the data. The data points closest to the hyperplane, called support vectors, determine the position and orientation of the hyperplane.
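The idea of a best-fit separating hyperplane can be illustrated with a linear soft-margin SVM trained by subgradient descent on the regularised hinge loss. This is a hedged sketch under assumptions: the paper does not state its SVC settings, and library SVC implementations usually solve the dual problem and support kernels such as RBF. The learning rate, regularisation strength, epoch count and toy data below are hypothetical. Only points with a margin below 1 contribute to each update, which is why the points nearest the hyperplane determine where it ends up.

```python
def decision(w, b, x):
    """Signed score; its sign gives the side of the hyperplane w . x + b = 0."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Soft-margin linear SVM via full-batch subgradient descent.

    Minimises (lam / 2) * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))).
    Labels must be -1 or +1.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1:  # inside the margin or misclassified
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    return 1 if decision(w, b, x) >= 0 else -1


# Hypothetical, linearly separable toy data.
X = [[2, 2], [3, 2], [2, 3], [-2, -2], [-3, -2], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print(w, b)
print([predict(w, b, x) for x in X])
```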
Logistic Regression
Logistic Regression is a supervised learning technique used for solving classification problems. It predicts a categorical dependent variable from a given set of independent variables; the logistic (sigmoid) function maps the model's output to a probability between 0 and 1.
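A minimal logistic regression sketch makes the 0-to-1 range concrete: the sigmoid turns a linear score into a probability, the weights are fitted by batch gradient descent on the mean log-loss (cross-entropy), and a 0.5 threshold gives the categorical prediction. The learning rate, epoch count and toy data are assumptions chosen for illustration, not the paper's settings.

```python
import math


def sigmoid(z):
    """Logistic function: maps a real score to a probability in (0, 1).

    Written in two branches to avoid overflow in math.exp for large |z|.
    (In floating point, extreme inputs can round to exactly 0.0 or 1.0.)
    """
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Estimated probability that x belongs to class 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss (cross-entropy); y in {0, 1}."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of log-loss w.r.t. the score
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x, threshold=0.5):
    """Categorical prediction obtained by thresholding the probability."""
    return 1 if predict_proba(w, b, x) >= threshold else 0


# Hypothetical one-feature toy data.
X = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
print([round(predict_proba(w, b, x), 3) for x in X])
```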
K Nearest Neighbor (KNN)
K Nearest Neighbor, also known as a lazy learner algorithm, is a supervised learning algorithm used for both classification and regression. Instead of learning from the training set immediately, it stores the dataset and performs computation only at classification time, when a new point is assigned the majority class among its K nearest stored neighbours.
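The lazy-learning behaviour can be seen in a short K-nearest-neighbours classifier: `fit` only memorises the training data, and all distance computations happen in `predict`. This sketch assumes Euclidean distance and simple majority voting; the class name, `k=3` and the toy data are hypothetical.

```python
import math
from collections import Counter


class KNNClassifier:
    """Lazy learner: fit() only stores the data; the work happens in predict()."""

    def __init__(self, k=3):
        self.k = k
        self.X, self.y = [], []

    def fit(self, X, y):
        # No model is built here; the training set is simply memorised.
        self.X, self.y = [list(row) for row in X], list(y)
        return self

    def predict(self, x):
        # Sort stored points by Euclidean distance to the query (math.dist, Python 3.8+).
        neighbours = sorted(zip(self.X, self.y), key=lambda pair: math.dist(x, pair[0]))
        # Majority vote among the k closest labels.
        votes = Counter(label for _, label in neighbours[: self.k])
        return votes.most_common(1)[0][0]


# Hypothetical toy data with two numeric features and binary labels.
X = [[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]]
y = [0, 0, 0, 1, 1, 1]
knn = KNNClassifier(k=3).fit(X, y)
print(knn.predict([1.5, 1.5]), knn.predict([6.5, 6.5]))  # prints: 0 1
```

Because every prediction scans the whole stored dataset, prediction cost grows with the training set size, which is the usual trade-off of lazy learners.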
XGBRF
XGBoost with Random Forest (XGBRF) is an ensemble method, used in this work to classify PCOS. XGBoost is a gradient boosting algorithm, while Random Forest is an example of a bagging algorithm; XGBRF is a modified version of the XGBoost classifier that trains a random forest. Its advantage is that it helps overcome the problem of over-fitting.
CatBoost Classifier
CatBoost (Categorical Boosting) is an open-source gradient boosting library used for regression and classification. It works with multiple types of data, including audio, text, image and historical data. Its key technique is converting categorical values into numbers using various statistics on combinations of categorical features and on combinations of categorical and numerical features.
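The conversion of categorical values into numbers can be illustrated with a simplified version of the ordered target statistics described in CatBoost's documentation. Rows are visited in a random order, and each value is encoded from the targets of earlier rows in the same category using (count_in_class + prior) / (total_count + 1). This sketch covers only a single feature, one permutation and a binary target, whereas CatBoost itself uses several permutations and feature combinations; the function name, prior and toy data are assumptions.

```python
import random


def ordered_target_statistics(categories, targets, prior=0.5, seed=0):
    """Encode one categorical column numerically, CatBoost-style (simplified).

    Rows are visited in a random permutation. Each row is encoded as
        (count_in_class + prior) / (total_count + 1)
    where the counts cover only *earlier* rows with the same category, so a
    row's own label never leaks into its encoding. targets must be 0 or 1.
    """
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)
    positives, totals = {}, {}  # running per-category statistics
    encoded = [0.0] * len(categories)
    for i in order:
        cat = categories[i]
        count_in_class = positives.get(cat, 0)
        total_count = totals.get(cat, 0)
        encoded[i] = (count_in_class + prior) / (total_count + 1)
        # Update the statistics only after the current row has been encoded.
        positives[cat] = count_in_class + targets[i]
        totals[cat] = total_count + 1
    return encoded


# Hypothetical categorical attribute and binary target.
blood_group = ["A", "B", "A", "O", "A", "B"]
target = [1, 0, 1, 0, 0, 1]
print(ordered_target_statistics(blood_group, target))
```

The first occurrence of any category always receives the prior, and later occurrences drift toward that category's observed positive rate.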
Cross Validation
Cross-validation is a technique for validating a model's performance: the model is trained on a subset of the dataset and then evaluated on the complementary subset.
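K-fold cross-validation makes this train/evaluate split systematic: the data is divided into k folds, and each fold serves once as the evaluation subset while the remaining folds are used for training. The sketch below is a plain-Python illustration with hypothetical names; it does not shuffle or stratify, and the majority-class baseline merely stands in for any of the models described above.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs; each sample lands in exactly one test fold.

    Folds are contiguous and unshuffled; shuffle the data first if rows are ordered.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_val_accuracy(fit, predict, X, y, k=5):
    """Train on k-1 folds, evaluate on the held-out fold, and average the accuracies."""
    scores = []
    for train, test in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


# Usage with a trivial majority-class baseline (hypothetical stand-in for a real model).
def fit_majority(X, y):
    return max(set(y), key=y.count)


def predict_majority(model, x):
    return model


X = [[i] for i in range(10)]
y = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
print(cross_val_accuracy(fit_majority, predict_majority, X, y, k=5))
```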
Fig -2: Correlation between Features
Results and Discussion
The experiments are performed on the dataset using various machine learning algorithms. The main objective is to find the most suitable algorithm for classifying the dataset. The algorithms used to construct the model are Decision Tree, SVC, Random Forest, Logistic Regression, K Nearest Neighbor, XGBRF and CatBoost Classifier.
Table -2: Accuracy of Different Classifier Models

| Model | Accuracy (%) |
|---|---|
| Decision Tree | 82.79 |
| SVC | 69.05 |
| Random Forest | 89.42 |
| Logistic Regression | 83.32 |
| K Nearest Neighbors | 74.34 |
| XGBRF | 85.89 |
| CatBoost Classifier | 92.64 |
Fig -3: Accuracy of different Classifier Models
The accuracies obtained by the different algorithms are: Decision Tree 82.79%, SVC 69.05%, Random Forest 89.42%, Logistic Regression 83.32%, K-Nearest Neighbors 74.34%, XGBRF 85.89% and CatBoost Classifier 92.64%. These results show that the CatBoost Classifier outperformed the other models, obtaining the highest accuracy.
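Assuming accuracy has its usual meaning (correct predictions divided by all predictions, expressed as a percentage), the ranking above can be restated in a few lines. The sketch uses only the figures reported in the accuracy table and does not retrain any model; the names are hypothetical.

```python
# Accuracies (%) copied from this paper's table of classifier results.
REPORTED_ACCURACY = {
    "Decision Tree": 82.79,
    "SVC": 69.05,
    "Random Forest": 89.42,
    "Logistic Regression": 83.32,
    "K Nearest Neighbors": 74.34,
    "XGBRF": 85.89,
    "CatBoost Classifier": 92.64,
}


def accuracy(y_true, y_pred):
    """Accuracy in percent: the share of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


ranking = sorted(REPORTED_ACCURACY.items(), key=lambda kv: kv[1], reverse=True)
for name, acc in ranking:
    print(f"{name:<20} {acc:6.2f}%")
```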
3. CONCLUSIONS
In this paper, a machine learning model is successfully built and trained for the early detection of PCOS. PCOS is one of the most common conditions in women and is associated with psychological, reproductive and metabolic features. Day-to-day measures that can reduce the effects of PCOS include maintaining a healthy weight, limiting carbohydrates, staying active, exercising daily and eating healthy food. The system presented in this paper supports early detection of PCOS from an optimal and minimal set of statistically analyzed parameters. Among the algorithms used, the CatBoost Classifier performed best. This model can be used by doctors for early screening and for diagnosing patients who are likely to develop this disorder. Thus, using various machine learning techniques, we have built a model to detect PCOS at an early stage.
REFERENCES
[1] Palak Mehrotra, Jyotirmoy Chatterjee, Chandan Chakraborty, "Automated Screening of Polycystic Ovary Syndrome using Machine Learning Techniques", IEEE, 2012.
[2] Bedy Purnama, Untari Novia Wisesti, Adiwijaya, Fhira Nhita, Andini Gayatri, Titik Mutiah, "A Classification of Polycystic Ovary Syndrome Based on Follicle Detection of Ultrasound Images", 2015 3rd International Conference on Information and Communication Technology (ICoICT).
[3] Amsy Denny, Anita Raj, Ashi Ashok, Maneesh Ram C, Remya George, "i-HOPE: Detection and Prediction System for Polycystic Ovary Syndrome (PCOS) Using Machine Learning Techniques", 2019 IEEE Region 10 Conference (TENCON 2019).
[4] Subrato Bharati, Prajoy Podder, M. Rubaiyat Hossain Mondal, "Diagnosis of Polycystic Ovary Syndrome Using Machine Learning Algorithms", 2020 IEEE Region 10 Symposium (TENSYMP), 5-7 June 2020, Dhaka, Bangladesh.
[5] Ning-Ning Xie, Fang-Fang Wang, Jue Zhou, Chang Liu, Fan Qu, "Establishment and Analysis of a Combined Diagnostic Model of Polycystic Ovary Syndrome with Random Forest and Artificial Neural Network", Hindawi BioMed Research International, Volume 2020.
[6] Priyanka R. Lele, Anuradha D. Thakare, "Comparative Analysis of Classifiers for Polycystic Ovary Syndrome Detection using Various Statistical Measures", International Journal of Engineering Research & Technology (IJERT), ISSN: 2278-0181, Vol. 9, Issue 03, March 2020.
[7] Namrata Tanwani, "Detecting PCOS using Machine Learning", International Journal of Modern Trends in Engineering and Science (IJMTES), ISSN: 2348-3121, Volume 07, Issue 01, 2020.
[8] J. Madhumitha, M. Kalaiyarasi, S. Sakthiya Ram, "Automated Polycystic Ovarian Syndrome Identification with Follicle Recognition", 2021 3rd International Conference on Signal Processing and Communication
[9] Pijush Dutta, Shobhandeb Paul, Madhurima Majumder, "An Efficient SMOTE Based Machine Learning classification for Prediction & Detection of PCOS", Research Square, November 8th, 2021.
[10] Muhammad Sakib Khan Inan, Rubaiath E Ulfath, Fahim Irfan Alam, Fateha Khanam Bappee, Rizwan Hasan, "Improved Sampling and Feature Selection to Support Extreme Gradient Boosting for PCOS Diagnosis".

Namarat Tanwani A model is built using the causes and symp- toms of PCOS as in- puts and the output is predicted as presence or absence or PCOS. Machine learning su- pervised clas- sification al- gorithms used are K- NN and Lo- gistic Regres- sion. The best ac- curate model built is Lo- gistic Regres- sion with accuracy 92%. Madhumith a et al. [2021] Ovary details are large range of follicles, type of cysts, follicle size, using image segmenta- tion Based on pre- processing and morpho- logical opera- tions SVM, KNN and Lo- gistic Regres- sion were used. All three al- gorithms were com- bined and hybrid model were made and 0.98 ac- curacy were achieved Pijush et al. [2021] Detection and preven- tion of this disease as early as pos- sible. Used SMOTE and five other algorithms such as Lo- gistic Regres- sion, Random Forest, Deci- sion Tree Support vec- tor machine and K-NN together for early detec- tion of PCOS. The best model achieved ac- curacy, Training time: 97.11, F1 score: 0.010sec, Recall: 98%, Precision: 98% and AUROC: 95.6% Khan Inan et al. [2021] Conducting a probabilis- tic approach to select statistically relevant features which con- tribute to PCOS in- stances. SMOTE, ENN and ANOVA Test, Chi- Square Test were used to identify im- portant fea- tures. Classi- fiers such as XG Boost, SVM, KNN, NB, MLP, RF, AdaB were used. G Boost out- performedall other classi- fiers with 0.96 accuracy and 0.98 Re- call. 2. Methodology Development of machine learning model to train the dataset is an important step for successful implementation. The da- taset contains attributes such as I beta-HCG (mIU/mL), II beta-HCG (mIU/mL), AMH (ng/mL), Age(yrs), Weight (Kg),
Height (Cm), BMI, Blood Group, Pulse rate (bpm), RR (breaths/min), Fast food, Reg Exercise, BP_Systolic (mmHg), etc.

Fig -1: Block diagram of the system

1. Defining the Problem
The first and most important step is to define the problem, including the inputs provided to the model and the expected output of the model.

2. Data Collection
This is the crucial step of collecting data: how well the model performs, and the accuracy we obtain, depends on the dataset. Data can be collected from various platforms such as Kaggle, UCI Repository, BuzzFeed News, etc. We performed our research on the dataset obtained from Kaggle named Polycystic ovary syndrome (PCOS).

3. Selection of Implementation Platform
For this machine learning implementation, the platform used is Jupyter Notebook and the language used is Python.

4. Data Preparation
Once appropriate data is identified, it must be shaped in order to train the model. The data is in CSV format and is loaded into Python. The data is visualized and the correlations between different characteristics are checked. In this step, activities such as checking for missing values or incomplete records, aggregation, augmentation, normalization, and labelling are performed on structured, unstructured, and semi-structured data. This dataset contains records of women patients and whether they suffer from PCOS. The steps performed in data preprocessing are:

i. Data Cleaning
Data cleaning is the process of identifying incorrect, incomplete, or missing parts of the data and then modifying, replacing, or deleting them. In our work, the dataset was first checked for missing values using Pandas and NumPy.

ii.
Data Labeling
Data labeling is the process of identifying raw data (videos, images, text files, etc.) and adding informative tags that provide context and increase its usefulness to a machine learning model. Here, non-numerical values are transformed into numerical values.

iii. Feature Selection
Feature selection is an important step in which the most relevant features are extracted from the dataset before the machine learning algorithms are applied, improving the performance of the model. Its goal is to find the best possible features for building the model while ignoring irrelevant details. Feature selection can be performed with common techniques including filter methods, wrapper methods, and embedded methods.

2.1 Modelling

When the data is completely cleaned and selected, it is ready to be processed by the algorithms. The algorithms used to create the model are Random Forest, Decision Tree, Support Vector Classifier, Logistic Regression, K-Nearest Neighbors, XGBRF, and CatBoost Classifier.

Random Forest
Random Forest is a supervised machine learning algorithm used for both classification and regression. It builds multiple decision trees and merges them to get a more accurate and stable prediction.

Decision Tree
Decision Tree is a supervised learning algorithm that graphically represents all possible solutions to a problem based on given conditions. It uses the CART algorithm, which stands for Classification and Regression Tree.

Support Vector Classifier (SVC)
SVC fits the data provided and returns a best-fit hyperplane that divides, or categorizes, the data. The data points closest to the hyperplane, called support vectors, determine the position and orientation of the hyperplane.

Logistic Regression
Logistic Regression is a supervised learning technique used for solving classification problems. It is a
machine learning algorithm used for predicting a categorical dependent variable from a given set of independent variables; its output, a probability produced by the sigmoid function, lies between 0 and 1.

K Nearest Neighbor (KNN)
K Nearest Neighbor, also known as a lazy learner algorithm, is a supervised learning algorithm used for both classification and regression. Instead of learning from the dataset immediately, it stores the dataset and performs computation on it only at the time of classification.

XGBRF
XGBoost with Random Forest (XGBRF) is an ensemble method, used here for the classification of PCOS. XGBoost is a gradient boosting algorithm, and Random Forest is an example of a bagging algorithm. XGBRF is a modified version of the XGBoost classifier; its advantage is that it helps overcome the problem of overfitting.

CatBoost Classifier
CatBoost (Categorical Boosting) is an open-source boosting library used for regression and classification. It works with multiple categories of data, including audio, text, and images, as well as historical data. The technique used in this algorithm converts categorical values into numbers using different types of statistics on combinations of categorical features and on combinations of categorical and numerical features.

Cross Validation
Cross-validation in machine learning is a technique for validating model efficiency in which the model is trained on a subset of the dataset and then evaluated on the complementary subset of the dataset.

Fig -2: Correlation between Features

Results and Discussion

The experimentation is performed on the dataset using various machine learning algorithms.
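The preprocessing and evaluation steps described in the Methodology (dropping incomplete records, converting non-numerical values to numbers, then estimating accuracy with cross-validation) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code: the records, the column names `cycle`, `bmi`, and `pcos`, and every helper name are hypothetical assumptions, and a hand-written K Nearest Neighbor classifier stands in for the library implementations an experiment like this would normally use.

```python
import math
from collections import Counter

# Hypothetical toy records standing in for rows of the PCOS CSV file.
# Column names and values are illustrative assumptions, not Kaggle data.
# None marks a missing value.
RAW_RECORDS = [
    {"cycle": "irregular", "bmi": 31.0, "pcos": "Y"},
    {"cycle": "regular", "bmi": 21.0, "pcos": "N"},
    {"cycle": "irregular", "bmi": 33.5, "pcos": "Y"},
    {"cycle": None, "bmi": 24.0, "pcos": "N"},
    {"cycle": "regular", "bmi": 22.5, "pcos": "N"},
    {"cycle": "irregular", "bmi": 30.0, "pcos": "Y"},
    {"cycle": "regular", "bmi": 23.0, "pcos": "N"},
    {"cycle": "regular", "bmi": 32.0, "pcos": "Y"},
    {"cycle": "irregular", "bmi": None, "pcos": "Y"},
    {"cycle": "regular", "bmi": 20.5, "pcos": "N"},
    {"cycle": "irregular", "bmi": 34.0, "pcos": "Y"},
    {"cycle": "irregular", "bmi": 24.0, "pcos": "N"},
]
FEATURES = ["cycle", "bmi"]
TARGET = "pcos"


def drop_incomplete(records):
    """Data cleaning: keep only records that have no missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def label_encode(records, column):
    """Data labeling: replace each distinct text value in `column` with an integer code."""
    codes = {value: i for i, value in enumerate(sorted({r[column] for r in records}))}
    return [{**r, column: codes[r[column]]} for r in records], codes


def knn_predict(train, query, k=3):
    """KNN as a lazy learner: nothing is fitted; distances are computed at query time."""
    point = [query[f] for f in FEATURES]
    nearest = sorted(train, key=lambda r: math.dist([r[f] for f in FEATURES], point))[:k]
    return Counter(r[TARGET] for r in nearest).most_common(1)[0][0]


def cross_val_accuracy(records, n_folds=5, k=3):
    """k-fold cross-validation: train on n_folds - 1 folds, evaluate on the held-out fold."""
    scores = []
    for fold in range(n_folds):
        # Interleaved split for determinism; real experiments usually shuffle or stratify.
        test = [r for i, r in enumerate(records) if i % n_folds == fold]
        train = [r for i, r in enumerate(records) if i % n_folds != fold]
        correct = sum(knn_predict(train, r, k) == r[TARGET] for r in test)
        scores.append(correct / len(test))
    return sum(scores) / n_folds


clean = drop_incomplete(RAW_RECORDS)
clean, cycle_codes = label_encode(clean, "cycle")
clean, target_codes = label_encode(clean, TARGET)
accuracy = cross_val_accuracy(clean)
print(f"{len(clean)} complete records, mean 5-fold KNN accuracy: {accuracy:.2f}")
```

The toy data is deliberately well separated, so every fold is classified correctly; real clinical features overlap far more, and in practice the features would also be normalized so that one large-valued attribute such as BMI does not dominate the distance.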
The main objective is to find the most suitable algorithm for classifying the dataset. The algorithms used to construct the model are Decision Tree, SVC, Random Forest, Logistic Regression, K Nearest Neighbor, XGBRF, and CatBoost Classifier.

Table -2: Accuracy of Different Classifier Models

Model                   Accuracy (%)
Decision Tree           82.79
SVC                     69.05
Random Forest           89.42
Logistic Regression     83.32
K Nearest Neighbors     74.34
XGBRF                   85.89
CatBoost Classifier     92.64
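Choosing the best model from the accuracy table above amounts to sorting the reported figures. The sketch below only restates the accuracies reported in this paper (the dictionary and the `rank_models` helper are hypothetical names) to rank the models and compute the lead of the best model over the runner-up; it does not re-run any experiment.

```python
# Accuracies (%) exactly as reported in this paper's accuracy table.
REPORTED_ACCURACY = {
    "Decision Tree": 82.79,
    "SVC": 69.05,
    "Random Forest": 89.42,
    "Logistic Regression": 83.32,
    "K Nearest Neighbors": 74.34,
    "XGBRF": 85.89,
    "CatBoost Classifier": 92.64,
}


def rank_models(accuracies):
    """Return (model, accuracy) pairs sorted from highest to lowest accuracy."""
    return sorted(accuracies.items(), key=lambda item: item[1], reverse=True)


ranking = rank_models(REPORTED_ACCURACY)
(best_model, best_acc), (runner_up, runner_up_acc) = ranking[0], ranking[1]
# Gap in percentage points, rounded to the table's two decimal places.
margin = round(best_acc - runner_up_acc, 2)

for position, (model, acc) in enumerate(ranking, start=1):
    print(f"{position}. {model:<20} {acc:.2f}%")
print(f"{best_model} leads {runner_up} by {margin} percentage points")
```

A single accuracy per model does not show how much the score varies between cross-validation folds, so a lead of a few percentage points is best read together with how the accuracies were estimated.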
Fig -3: Accuracy of different Classifier Models

The accuracies obtained by the different algorithms are: Decision Tree: 82.79%, SVC: 69.05%, Random Forest: 89.42%, Logistic Regression: 83.32%, K-Nearest Neighbors: 74.34%, XGBRF: 85.89%, and CatBoost Classifier: 92.64%. From these results, we conclude that the CatBoost Classifier outperformed the other models and obtained the highest accuracy.

3. CONCLUSIONS

In this paper, a machine learning model is successfully built and trained for the early detection of PCOS. PCOS is a very common condition in women, associated with psychological, reproductive, and metabolic features. Some day-to-day activities that decrease the effects of PCOS are maintaining a healthy weight, limiting carbohydrates, staying active, exercising daily, and eating healthy food. The system in this paper helps in the early detection of PCOS from an optimal and minimal set of parameters that have been statistically analyzed. Among the various algorithms used, the CatBoost Classifier is found to be superior in performance. This model can be used by doctors for early screening and diagnosis of patients who are likely to develop this disorder. Therefore, using various machine learning techniques, we have built a model to detect PCOS at an early stage.

REFERENCES

[1] Palak Mehrotra, Jyotirmoy Chatterjee, Chandan Chakraborty, "Automated Screening of Polycystic Ovary Syndrome using Machine Learning Techniques", IEEE, 2012.
[2] Bedy Purnama, Untari Novia Wisesti, Adiwijaya, Fhira Nhita, Andini Gayatri, Titik Mutiah, "A Classification of Polycystic Ovary Syndrome Based on Follicle Detection of Ultrasound Images", 2015 3rd International Conference on Information and Communication Technology (ICoICT).
[3] Amsy Denny, Anita Raj, Ashi Ashok, Maneesh Ram C, Remya George, "i-HOPE: Detection And Prediction System For Polycystic Ovary Syndrome (PCOS) Using Machine Learning Techniques", 2019 IEEE Region 10 Conference (TENCON 2019).
[4] Subrato Bharati, Prajoy Podder, M. Rubaiyat Hossain Mondal, "Diagnosis of Polycystic Ovary Syndrome Using Machine Learning Algorithms", 2020 IEEE Region 10 Symposium (TENSYMP), 5-7 June 2020, Dhaka, Bangladesh.
[5] Ning-Ning Xie, Fang-Fang Wang, Jue Zhou, Chang Liu, Fan Qu, "Establishment and Analysis of a Combined Diagnostic Model of Polycystic Ovary Syndrome with Random Forest and Artificial Neural Network", Hindawi BioMed Research International, Volume 2020.
[6] Priyanka R. Lele, Anuradha D. Thakare, "Comparative Analysis of Classifiers for Polycystic Ovary Syndrome Detection using Various Statistical Measures", International Journal of Engineering Research & Technology (IJERT), ISSN: 2278-0181, Vol. 9, Issue 03, March 2020.
[7] Namrata Tanwani, "Detecting PCOS using Machine Learning", IJMTES | International Journal of Modern Trends in Engineering and Science, ISSN: 2348-3121, Volume 07, Issue 01, 2020.
[8] J. Madhumitha, M. Kalaiyarasi, S. Sakthiya Ram, "Automated Polycystic Ovarian Syndrome Identification with Follicle Recognition", 2021 3rd International Conference on Signal Processing and Communication.
[9] Pijush Dutta, Shobhandeb Paul, Madhurima Majumder, "An Efficient SMOTE Based Machine Learning Classification for Prediction & Detection of PCOS", Research Square, November 8th, 2021.
[10] Muhammad Sakib Khan Inan, Rubaiath E Ulfath, Fahim Irfan Alam, Fateha Khanam Bappee, Rizwan Hasan, "Improved Sampling and Feature Selection to Support Extreme Gradient Boosting for PCOS Diagnosis".