International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 09 Issue: 12 | Dec 2022 www.irjet.net p-ISSN: 2395-0072
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 440
Logistic Regression Model for Predicting the Malignancy of Breast
Cancer
Muneeba Ahmed1
1Systems Engineer, Infosys Limited, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Recognising breast cancer early is critical.
Breast cancer is one of the most serious cancers that can
affect women, and it can be fatal. Breast tumours are
classified into two types: benign (non-cancerous) and
malignant (cancerous). Machine learning is the process
through which a machine learns increasingly on its own. An
ML model is a mathematical technique used in artificial
intelligence, which refers to a computer that mimics human
intelligence. Just like a human, the computer improves at its
work as it gains "experience." There are several Machine
Learning approaches available for analysing breast cancer
data. This paper describes a Machine Learning model for
diagnosing breast cancer. A Logistic Regression model is used
for detecting breast cancer. This algorithm falls under the
category of supervised machine learning.
Key Words: Breast Cancer, Artificial Intelligence,
Machine Learning, Logistic Regression
1. INTRODUCTION
Breast cancer refers to uncontrolled cell growth in the
breast. Both men and women can get breast cancer, but
women are more likely to have it. Breast cancer has been one
of the main causes of female mortality when compared to
other malignancies. Breast cancer symptoms include
changes in the breast's size and shape, thickening of the
tissue around the breast, as well as crusting, scaling, and
redness of the skin. Changes in environmental variables,
hormones, and lifestyle raise the risk of breast cancer. The
lymphatic vessels carry lymphatic fluid away from the
breast. If the breast contains cancerous cells, they can
enter the lymphatic vessels and start to multiply in the
lymph nodes. Although many breast cancer patients have no
symptoms at all, breast cancer is typically discovered after
the onset of symptoms. To prevent mortality, early
detection of breast cancer is crucial, because detecting the
disease in its early stages allows therapy to begin earlier.
A reliable and efficient diagnostic method that enables
clinicians to differentiate between benign and malignant
breast tumours is required for early identification.
Automated identification of breast cancer is therefore
significant for this medical problem, and clinical diagnosis
remains a major challenge in clinical applications.
Classification of breast cancer data can be used to predict
the outcomes of specific diseases and to determine the
genetic activity of tumours [1].
Numerous methods for detecting breast cancer have been
identified in recent years. During biopsy screening, a
sample of breast tissue is taken for examination. Although
this testing yields more trustworthy results, the procedure
for collecting breast biopsies is painful and distressing
[2]. As a result, the majority of patients are reluctant to
undergo it. Mammography, which produces 2D projection images
of the breast, is the most widely used method for
detecting breast cancer. The two most frequently utilised
mammogram techniques are digital mammography and
screen-film mammography [3]. Screen-film mammography is
used on female breasts that are asymptomatic. It takes
roughly 20 minutes to do a traditional mammogram. Benign
tumours cannot be found with this method. Digital
mammography offers a solution to the screening
mammography problem. It is connected to computing
equipment, since digital mammography data is stored on a
computer. Digital mammography uses image processing
techniques to enhance the quality of the recorded images,
and it performs better for incorrectly diagnosed samples.
Magnetic resonance imaging (MRI), another common technique,
is primarily used to find breast cancer [4]. MRI is a
challenging procedure. Additionally, it can miss certain
malignancies that mammography would have detected. In women
who have been given a breast cancer diagnosis, MRI is used
to measure the breast's actual size and to spot other
disorders in the breast.
In recent years, machine learning techniques have been
used more and more in prediction, especially in the field of
medicine [5]. Machine learning gives systems the ability to
learn from past data in order to extract intricate insights
from massive data sets. In a variety of clinical settings,
these methods are most frequently employed to identify and
classify malignancies. Machine learning has been applied
above all to the diagnosis and treatment of breast cancer [6].
2. RELATED WORK
This section discusses some of the related research on
machine learning-based breast cancer diagnosis that has
been conducted in the past.
S. Vasundhara, B.V. Kiranmayee, and Chalumuru Suresh [7]
proposed employing several machine learning methods to
classify mammography pictures as benign, malignant, or
normal. A comparison of Support Vector Machines,
Convolutional Neural Networks, and Random Forest is
performed. According to the simulation results, CNN is the
best classifier since it produces automatic classification of
digital mammograms utilising filtering and morphological
processes.
Arpita Joshi and Dr. Ashish Mehta [8] compared the
classification results using KNN, SVM, Random Forest, and
Decision Tree (Recursive Partitioning and Conditional
Inference Tree). The dataset utilised was the Wisconsin
Breast Cancer dataset from the UCI repository. According to
simulation findings, KNN was the top classifier, followed by
SVM, Random Forest, and Decision Tree.
Muhammet Fatih Ak [9] used Dr. William H. Wolberg's
dataset from the University of Wisconsin Hospital. This
dataset was subjected to data visualisation and machine
learning techniques such as logistic regression, k-nearest
neighbours, support vector machine, naive Bayes, decision
tree, random forest, and rotation forest. R, Minitab, and
Python were used for machine learning and visualisation,
and all of the methods were compared. The logistic
regression model with all features included produced the
greatest classification accuracy (98.1%), and the proposed
approach exhibited an improvement in accuracy results.
Hiba Asri, Hajar Mousannif, Hassan Al Moatassime, and
Thomas Noel [10] compared the performance of four
machine learning algorithms on the Wisconsin Breast Cancer
(original) dataset: Support Vector Machine (SVM), Decision
Tree (C4.5), Naive Bayes (NB), and k-Nearest Neighbors
(k-NN). According to the experimental data, SVM had the best
accuracy (97.13%) and the lowest error rate. All
experiments were carried out in a simulation environment
using the WEKA data mining tool.
3. METHODOLOGY
3.1 Data Collection
The dataset is taken from Kaggle. It consists of 569 rows and
33 columns, the first of which is the ID number and the
second of which is the diagnosis outcome (0 for benign and
1 for malignant). The other columns describe the shape and
size of the nucleus of the target cancer cell. In a biopsy test, a
sample of cells is obtained from the breast using the Fine
Needle Aspiration (FNA) process. These characteristics are
determined for each cell nucleus by examining it under a
microscope in a pathology laboratory.
Table -1: Description of features of the dataset

Feature Name       Feature Description
Radius             Average distance from the center to points on the circumference
Texture            Standard deviation of gray-scale values
Perimeter          Gross distance between the snake points
Area               Total number of pixels inside the snake, plus one half of the pixels on the circumference
Smoothness         Local variance in radius length, quantified by calculating the length difference
Compactness        Perimeter^2 / Area
Concavity          Intensity of the contour concave points
Concave points     The number of contour concavities
Symmetry           The difference in length between lines perpendicular to the major axis, in both directions, to the cell boundary
Fractal Dimension  Coastline estimation. A higher value leads to a less normal contour, representing a higher risk of malignancy.
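As a worked illustration of the Compactness entry in Table 1, the sketch below evaluates Perimeter^2 / Area for two idealised outlines. The function name `compactness` and the sample shapes are assumptions made for illustration and are not taken from the dataset. A perfect circle gives 4π whatever its radius, and any less regular outline gives a larger value.

```python
import math

def compactness(perimeter: float, area: float) -> float:
    """Compactness as defined in Table 1: perimeter squared divided by area."""
    if area <= 0:
        raise ValueError("area must be positive")
    return perimeter ** 2 / area

# Hypothetical circular outline of radius 5 (arbitrary units).
radius = 5.0
circle_value = compactness(2 * math.pi * radius, math.pi * radius ** 2)

# A square of side 10 for comparison: perimeter 40, area 100.
square_value = compactness(40.0, 100.0)

print(round(circle_value, 3))  # 12.566, i.e. 4 * pi
print(square_value)            # 16.0
```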
3.2 Data Pre-processing
Data pre-processing is the initial step of the machine
learning process when developing a model. Real-world data
frequently lacks particular attribute values or trends and is
often inconsistent, erroneous (containing mistakes and
outliers), and incomplete. This is where data
pre-processing enters the picture; it aids in cleaning,
organising, and formatting the raw data so that machine
learning models can use it directly.
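The paper does not list its pre-processing steps, so the following is only a minimal sketch of the kind of clean-up described above. It assumes the file has an `id` column, a `diagnosis` column coded as `M`/`B`, and a trailing empty column. The column names, the two sample rows, and the helper `preprocess` are all hypothetical.

```python
import csv
import io

# Two made-up rows in an assumed layout: id, diagnosis, features, empty trailing column.
RAW = """id,diagnosis,radius_mean,texture_mean,
1001,M,17.9,10.4,
1002,B,12.1,14.2,
"""

def preprocess(text):
    """Return (features, labels) with ids and empty columns dropped, labels as 0/1."""
    features, labels = [], []
    for row in csv.DictReader(io.StringIO(text)):
        labels.append(1 if row["diagnosis"] == "M" else 0)  # 1 = malignant, 0 = benign
        features.append([
            float(value)
            for name, value in row.items()
            # Skip the identifier, the label, and any unnamed or empty column.
            if name not in ("id", "diagnosis") and name and value
        ])
    return features, labels

X, y = preprocess(RAW)
print(X)  # [[17.9, 10.4], [12.1, 14.2]]
print(y)  # [1, 0]
```

Dropping the identifier matters because it carries no information about the tumour and would otherwise be treated as a feature.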
3.3 Logistic Regression
When the dependent variable is categorical, the hyperplane
fitted by linear regression cannot be used directly to
predict it, because its output is not restricted to the
range of valid class probabilities. Logistic
regression is therefore employed when the target data are
categorical. Instead of forecasting anything continuous,
logistic regression forecasts whether something is true or
false. It serves as a classification tool. The linear
combination of the independent variables is transformed
using the sigmoid function into a probability in the range
0 to 1. It is a popular Machine Learning technique due to its
capacity to offer probabilities and to classify new samples
using continuous and discrete data.
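The sigmoid transformation can be sketched in a few lines. The sigmoid is sigma(z) = 1 / (1 + e^(-z)), applied to z = w . x + b. The weights, bias, and sample values below are hypothetical and serve only to show the arithmetic; they are not fitted to the dataset.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real number to a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, this algebraically equal form avoids overflow in exp(-z).
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)

def predict_probability(features, weights, bias):
    """Probability of the positive class: sigmoid of the linear combination."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# Hypothetical weights and one hypothetical two-feature sample.
weights = [0.8, -0.5]
bias = -1.0
p = predict_probability([2.0, 1.0], weights, bias)  # z = 1.6 - 0.5 - 1.0 = 0.1
print(round(p, 3))  # 0.525
```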
3.4 Training and Testing the Model
Overfitting is a common problem during model training. It
arises when a model performs extremely well on the data used
to train it but struggles to generalize to new, unseen data
points. Conversely, a model may
perform poorly even when tested on the training set of data,
which is underfitting. Splitting the data into separate
samples for the model's training and testing phases is the
most popular method for detecting these kinds of problems.
After the analysis, we use 80% of the data to train the
model; using data to fit the model is referred to as
"training" in this context. The remaining 20% of the data
points are held out to test the model's performance. To put
it another way, the test set measures how well what the
model has learned carries over to data it has not seen.
Fig -1: Training and Testing the model
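The 80/20 split described in this section can be sketched without any external library. The helper name `train_test_split`, the seed, and the use of plain indices in place of real samples are assumptions made for illustration. The paper's own code appears only as a figure and may differ.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# 569 placeholder sample indices, matching the row count stated in Section 3.1.
samples = list(range(569))
train, test = train_test_split(samples)
print(len(train), len(test))  # 455 114
```

Shuffling before splitting avoids a test set that reflects only the ordering of the file, and no sample appears in both parts.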
3.5 Model Evaluation
Fig -2: Checking the accuracy score of the model
The model’s accuracy was found to be 94.94% on training
data and 92.10% on testing data.
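Accuracy here is the fraction of samples whose predicted label matches the true label. The sketch below computes it on a small set of hypothetical labels, then takes the gap between the two accuracies reported above. The function name `accuracy` and the eight example labels are illustrative assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical labels (1 = malignant, 0 = benign) for eight test samples.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1]
print(accuracy(y_true, y_pred))  # 0.75

# Gap between the accuracies reported in this section, in percentage points.
reported_train, reported_test = 94.94, 92.10
gap = reported_train - reported_test
print(round(gap, 2))  # 2.84
```

A small gap between training and testing accuracy, as here, suggests the model is not strongly overfitting in the sense of Section 3.4.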
3.6 Prediction
After the machine learning model is fit, it can predict
whether the patient has a malignant tumour (that is, the
patient is suffering from cancer) or a benign tumour (that
is, the patient does not have cancer).
Fig -3: Building a predictive system
Hence, a Logistic Regression Model was implemented to
predict the malignancy of breast cancer.
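A predictive system of this kind can be sketched end to end in plain Python. The example fits a logistic regression by batch gradient descent on a handful of made-up, already standardised samples and then labels two new inputs. The training data, learning rate, epoch count, and function names are all assumptions for illustration. The paper's implementation is shown only in Fig. 3 and most likely relies on a library rather than hand-written gradient descent.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)

def fit_logistic_regression(X, y, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_samples, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for features, label in zip(X, y):
            z = sum(w * x for w, x in zip(weights, features)) + bias
            error = sigmoid(z) - label  # derivative of the log-loss with respect to z
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias

def predict_label(features, weights, bias, threshold=0.5):
    """Return 'Malignant' if the predicted probability reaches the threshold."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return "Malignant" if sigmoid(z) >= threshold else "Benign"

# Made-up, already standardised two-feature samples; 1 = malignant, 0 = benign.
X_train = [[1.2, 0.9], [0.8, 1.1], [1.5, 0.7],
           [-1.0, -0.8], [-1.3, -1.1], [-0.7, -0.9]]
y_train = [1, 1, 1, 0, 0, 0]

weights, bias = fit_logistic_regression(X_train, y_train)
print(predict_label([1.0, 1.0], weights, bias))    # Malignant
print(predict_label([-1.0, -1.0], weights, bias))  # Benign
```

The 0.5 threshold is the conventional default. In a screening setting a lower threshold might be preferred so that fewer malignant cases are missed, at the cost of more false alarms.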
4. CONCLUSION
Breast cancer is one of the most deadly diseases affecting
women today, and it is among the leading causes of cancer
death in women. As a result, early identification of this
cancer could save many lives. The aim of this study is to
create a classifier for detecting breast cancer in
its early stages. We examined a Logistic Regression model for
breast cancer detection.
REFERENCES
[1] Joshi, R. Doshi, and J. Patel, “Diagnosis and prognosis
breast cancer using classification rules,” International
Journal of Engineering Research and General Science,
vol. 2, no. 6, pp. 315–323, 2014.
[2] A. M. Ahmad, G. M. Khan, S. A. Mahmud, and J. F. Miller,
“Breast cancer detection using Cartesian genetic
programming evolved artificial neural networks,” in
Proceedings of the 14th annual conference on Genetic
and evolutionary computation, pp. 1031–1038,
Philadelphia Pennsylvania, USA, 2012.
[3] A. T. Azar and S. A. El-Said, “Probabilistic neural network
for breast cancer classification,” Neural Computing and
Applications, vol. 23, no. 6, pp. 1737–1751, 2013.
[4] E. Warner, H. Messersmith, P. Causer, A. Eisen, R.
Shumak, and D. Plewes, “Systematic review: using
magnetic resonance imaging to screen women at high
risk for breast cancer,” Annals of Internal Medicine, vol.
148, no. 9, pp. 671–679, 2008.
[5] G. R. Kumar, G. Ramachandra, and K. Nagamani, “An
efficient prediction of breast cancer data using data
mining techniques,” International Journal of Innovations
in Engineering and Technology (IJIET), vol. 2, no. 4, p.
139, 2013.
[6] J. A. Cruz and D. S. Wishart, “Applications of machine
learning in cancer prediction and prognosis,” Cancer
Informatics, Sage Journals, vol. 2, 2006.
[7] S. Vasundhara, B. V. Kiranmayee, and C. Suresh,
“Machine learning approach for breast cancer
prediction,” International Journal of Recent Technology
and Engineering, 2019.
[8] A. Joshi and A. Mehta, “Comparative analysis of
various machine learning techniques for diagnosis of
breast cancer,” International Journal on Emerging
Technologies, 2017.
[9] M. F. Ak, “A comparative analysis of breast cancer
detection and diagnosis using data visualization and
machine learning applications,” Healthcare, MDPI,
2020.
[10] H. Asri, H. Mousannif, H. Al Moatassime, and
T. Noel, “Using machine learning algorithms for
breast cancer risk prediction and diagnosis,” Procedia
Computer Science, Elsevier, 2016.

More Related Content

PDF
Breast Cancer Prediction using Machine Learning
PDF
Breast Cancer Prediction
PDF
IRJET- Diagnosis of Breast Cancer using Decision Tree Models and SVM
PDF
IRJET- Breast Cancer Prediction using Supervised Machine Learning Algorithms
PDF
CANCER TUMOR DETECTION USING MACHINE LEARNING
PPTX
DataMining Techniques in BreastCancer.pptx
PDF
A Review on Breast Cancer Detection
PDF
A Comprehensive Evaluation of Machine Learning Approaches for Breast Cancer C...
Breast Cancer Prediction using Machine Learning
Breast Cancer Prediction
IRJET- Diagnosis of Breast Cancer using Decision Tree Models and SVM
IRJET- Breast Cancer Prediction using Supervised Machine Learning Algorithms
CANCER TUMOR DETECTION USING MACHINE LEARNING
DataMining Techniques in BreastCancer.pptx
A Review on Breast Cancer Detection
A Comprehensive Evaluation of Machine Learning Approaches for Breast Cancer C...

Similar to Logistic Regression Model for Predicting the Malignancy of Breast Cancer (20)

PDF
A Comprehensive Survey On Predictive Analysis Of Breast Cancer
PDF
Comparison of breast cancer classification models on Wisconsin dataset
PPTX
Datamining in BreastCancer.pptx
PPTX
Machine Learning - Breast Cancer Diagnosis
PDF
IRJET - Breast Cancer Prediction using Supervised Machine Learning Algorithms...
PDF
Performance and Evaluation of Data Mining Techniques in Cancer Diagnosis
PDF
Comparative analysis on bayesian classification for breast cancer problem
PPTX
SET PROJECT PPT.pptx
PDF
IRJET - Survey on Analysis of Breast Cancer Prediction
PPTX
Breast Cancer Detection.pptx
PPTX
Breast Cancer detection.pptx
PDF
Performance Evaluation using Supervised Learning Algorithms for Breast Cancer...
PDF
My own Machine Learning project - Breast Cancer Prediction
PDF
Predictive modeling for breast cancer based on machine learning algorithms an...
PDF
IRJET- Detection and Classification of Breast Cancer from Mammogram Image
PDF
Analysis of Machine Learning Techniques for Breast Cancer Prediction
PDF
Breast Cancer Detection Using Machine Learning
PDF
Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...
PDF
BREAST TUMOR DETECTION USING EFFICIENT MACHINE LEARNING AND DEEP LEARNING TEC...
PDF
Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...
A Comprehensive Survey On Predictive Analysis Of Breast Cancer
Comparison of breast cancer classification models on Wisconsin dataset
Datamining in BreastCancer.pptx
Machine Learning - Breast Cancer Diagnosis
IRJET - Breast Cancer Prediction using Supervised Machine Learning Algorithms...
Performance and Evaluation of Data Mining Techniques in Cancer Diagnosis
Comparative analysis on bayesian classification for breast cancer problem
SET PROJECT PPT.pptx
IRJET - Survey on Analysis of Breast Cancer Prediction
Breast Cancer Detection.pptx
Breast Cancer detection.pptx
Performance Evaluation using Supervised Learning Algorithms for Breast Cancer...
My own Machine Learning project - Breast Cancer Prediction
Predictive modeling for breast cancer based on machine learning algorithms an...
IRJET- Detection and Classification of Breast Cancer from Mammogram Image
Analysis of Machine Learning Techniques for Breast Cancer Prediction
Breast Cancer Detection Using Machine Learning
Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...
BREAST TUMOR DETECTION USING EFFICIENT MACHINE LEARNING AND DEEP LEARNING TEC...
Breast Tumor Detection Using Efficient Machine Learning and Deep Learning Tec...
Ad

More from IRJET Journal (20)

PDF
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
PDF
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
PDF
Kiona – A Smart Society Automation Project
PDF
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
PDF
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
PDF
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
PDF
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
PDF
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
PDF
BRAIN TUMOUR DETECTION AND CLASSIFICATION
PDF
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
PDF
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
PDF
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
PDF
Breast Cancer Detection using Computer Vision
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Kiona – A Smart Society Automation Project
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
BRAIN TUMOUR DETECTION AND CLASSIFICATION
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
Breast Cancer Detection using Computer Vision
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Ad

Recently uploaded (20)

PDF
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
PDF
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
PDF
Arduino robotics embedded978-1-4302-3184-4.pdf
PPTX
additive manufacturing of ss316l using mig welding
PDF
Digital Logic Computer Design lecture notes
PPTX
OOP with Java - Java Introduction (Basics)
PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
PPTX
bas. eng. economics group 4 presentation 1.pptx
PPTX
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
PDF
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
PPTX
Foundation to blockchain - A guide to Blockchain Tech
PPTX
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PDF
Well-logging-methods_new................
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PPTX
Construction Project Organization Group 2.pptx
PPTX
Lecture Notes Electrical Wiring System Components
DOCX
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
PPTX
Welding lecture in detail for understanding
PPTX
Strings in CPP - Strings in C++ are sequences of characters used to store and...
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
Arduino robotics embedded978-1-4302-3184-4.pdf
additive manufacturing of ss316l using mig welding
Digital Logic Computer Design lecture notes
OOP with Java - Java Introduction (Basics)
Embodied AI: Ushering in the Next Era of Intelligent Systems
bas. eng. economics group 4 presentation 1.pptx
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
Foundation to blockchain - A guide to Blockchain Tech
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
Well-logging-methods_new................
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
Construction Project Organization Group 2.pptx
Lecture Notes Electrical Wiring System Components
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
Welding lecture in detail for understanding
Strings in CPP - Strings in C++ are sequences of characters used to store and...

Logistic Regression Model for Predicting the Malignancy of Breast Cancer

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 09 Issue: 12 | Dec 2022 www.irjet.net p-ISSN: 2395-0072 © 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 440 Logistic Regression Model for Predicting the Malignancy of Breast Cancer Muneeba Ahmed1 1Systems Engineer, Infosys Limited, India ---------------------------------------------------------------------***--------------------------------------------------------------------- Abstract - In today's modern environment, recognising breast cancer is critical. Breast cancer is one of the most serious tumours that can affect women, and it can be fatal. Breast cancer is classified into two types: benign (non- cancerous) and malignant (cancerous). Machine learning is the process through whicha machinelearns increasinglyonits own. The ML model is a mathematical technique used in artificial intelligence. A computer that thinks for itself and mimics human intelligence is referred to as artificial intelligence. Just like a human, the computer improves at its work as it gets "experience." There are several Machine Learning approaches available for analysing breast cancer data. This paper describes a Machine Learning model for diagnosing breast cancer. LogisticRegressionmodelis used for detecting breast cancer. This algorithm falls under the category of supervised machine learning. Key Words: Breast Cancer, Artificial Intelligence, Machine Learning, Logistic Regression 1. INTRODUCTION Breast cancer refers to the uncontrolled cell development in the breast. Both men and women can get breast cancer, but women are more likely to have it.Breastcancerhasbeen one of the main causes of female mortality when compared to other malignancies. Breast cancer symptoms include changes in the breast's size and form, the thickness of the tissue around the breast, as well as crust, scales,andredness of the skin. 
Changes in environmental variables, hormones, and lifestyle lead to breast cancer, which raises the risk factor. The lymphatic vessels allow lymphatic fluid from the breast to pass through. If the breast containscancerouscells, they go into the lymphatic vessels andstarttomultiplyin the lymph nodes. Although many breast cancer patientshave no symptoms at all, breast cancer is typically discovered after the beginning of symptoms. To prevent mortality, early detection of breast cancer is crucial. For the ability to detect breast cancer in its early stages, earlier therapy is required. A reliable and efficient diagnostic method that enables clinicians to differentiate between benign and malignant breast tumours is required for early identification. For the current medical issue, the automated identificationof breast cancer is significant. It is crucial to create an efficient and reliable diagnostic strategy. Clinical applicationsfacea major problem with clinical diagnosis. Breast cancer data classifications can be usedto predicttheoutcomesofspecific diseases and to determinethegenetic activityoftumours[1]. Numerous methods for estimating breast cancer have been identified in the last year. During biopsy screening, the breast tissues are used for the biopsies. Although the testing yields more trustworthy results, the method for collecting breast biopsies is incredibly painful and pitiable [2]. The majority of patients are not interested in this testing as a result. Since mammography produces 2D projection images of the breast, it is the most widely used method for estimating breast cancer. The two most frequently utilised mammogram techniques are digital mammography and screen-film mammography [3].Screenfilmmammographyis used on female breasts that are asymptomatic. It takes roughly 20 minutes to do a traditional mammogram. Benign cancer cannot be found with this method. Digital mammography offers a solution to the screening mammography problem. 
It is connected to a computing equipment since a computer is where digital mammography data is saved. Digital mammography uses image processing techniques to enhance the quality of the images that are recorded. Digital mammography performs better for incorrectly diagnosed samples.Magneticresonanceimaging, another common technique, is primarily used to find breast cancer [4]. The MRI is a challenging procedure. Additionally, certain malignancies that mammography would have detected could be missed. In women who have been given a breast cancer diagnosis, MRI is used to measure the breast's actual size and spot numerous disorders in the breast. In the past year, machine learning techniques have been used more and more in prediction, especially in the field of medicine [5]. It gives systems the ability to learn from the past in order to extrapolate intricate insights from massive data sets. In a variety of clinical settings, these methods are most frequently employed to identify and classify malignancies. In order to diagnose and cure breast cancer, machine learning has been used first and foremost [6]. 2. RELATED WORK This section discusses some of the related research on machine learning-based breast cancer diagnosis that has been conducted in the past. S.Vasundhara, B.V. Kiranmayee, and Chalumuru Suresh [7] proposed employing several machine learning methods to classify mammography pictures as benign, malignant, or normal. A comparison of Support Vector Machines, Convolutional Neural Networks, and Random Forest is
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 09 Issue: 12 | Dec 2022 www.irjet.net p-ISSN: 2395-0072 © 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 441 performed. According to the simulation results, CNN is the best classifier since it produces instinctive classification of digital mammograms utilising filtering and morphological processes. Arpita Joshi and Dr.Ashish Mehta [8] compared the classification results using KNN, SVM, Random Forest, and Decision Tree (Recursive Partitioning and Conditional Inference Tree). The dataset utilised was the Wisconsin Breast Cancer dataset from the UCI repository. According to simulation findings, KNN was the top classifier, followed by SVM, Random Forest, and Decision Tree. Muhammet Fatih Ak [9] used Dr. William H. Walberg's dataset from the University of Wisconsin Hospital. This dataset was subjected to data visualisation and machine learning techniques such as logistic regression, k-nearest neighbours, support vector machine, naive Bayes, decision tree, random forest, and rotation forest. R, Minitab, and Python were chosen to be used for machine learning and visualisation. All of the procedures were subjected to a comparative study. The logistic regression model with all features included produced the greatest classification accuracy (98.1%), and the proposed approach exhibited an improvement in accuracy results. Hiba Asria, Hajar Mousannif, Hassan Al Moatassime, and Thomas Noel [10] compared the performance of four machine learning algorithms onthe WisconsinBreastCancer (original) dataset: Support Vector Machine (SVM), Decision Tree (C4.5), Naive Bayes (NB), and k Nearest Neighbors (k- NN). According to the experimental data, SVM has the best accuracy (97.13%) and the lowest error rate. All experiments are carried out in a simulation environment using the WEKA data mining tool. 3. 
METHODOLOGY 3.1 Data Collection The dataset is taken from Kaggle. It consists of 569 rowsand 33 columns, the first of which is the ID number and the second of which is the diagnosis outcome (0-benign and 1- malignant). The othercolumnsdescribetheshapeandsizeof the nucleus of the target cancer cell. In a biopsy test, a sample of cells is obtained from the breast using the Fine Needle Aspiration (FNA) process. These characteristics are determined for each cell nucleus by examining it under a microscope in a pathology laboratory. Table -1: Description of features of the dataset Feature Name Feature Description Radius Average of distance from center to circumference points Texture Standard deviation of gray scale vlaue Perimeter Gross distancebetweenthesnakepoints Area Total number of pixels on the inside of the snake along with one half of the pixels in the circumference Smoothness Local variance in length of radius, quantified by calculating the length difference Compactness Perimeter ^2/ Area Concavity Intensity of the contour concave points Concave points The number of contour concavities Symmetry The difference in the length between lines perpendicular to the major axis in both directions to cell boundary Fractal Dimension Coastline estimation. A higher value leads to a less normal contour representing a higher risk of malignancy. 3.2 Data Pre-processing Data pre-processing is the initial stepthatstartsthemachine learning process while developing a model. Real-world data frequently lacks particular attribute values or trends and is frequently inconsistent, erroneous (contains mistakes and outliers), incomplete, and inconsistent. 
Data pre-processing therefore cleans, organises, and formats the raw data so that machine learning models can use it directly.
3.3 Logistic Regression
When the dependent variable is categorical, the hyperplane fitted by linear regression cannot be used to predict it from the independent variables. Logistic regression is therefore employed when the data to be predicted are categorical. Instead of forecasting a continuous quantity, logistic regression forecasts whether something is true or false, so it serves as a classification tool. A linear combination of the independent variables is passed through the sigmoid function, which converts it into a probability between 0 and 1. Logistic regression is a popular machine learning technique because it provides probabilities and can classify new samples using both continuous and discrete data.
3.4 Training and Testing the Model
Overfitting is a common problem during model training. It arises when a model performs very well on the data used to train it but fails to generalize to new, unseen data points. By contrast, a model may perform poorly even when tested on the training data,
which is underfitting. The most common way to detect these problems is to split the data into separate samples for the model's training and testing phases. After the analysis, 80% of the data is used to train the machine; "training" here means using data to guide the machine. The remaining 20% of the data points are then used to test the machine's performance. In other words, we can measure how much the machine has learned from the training data.
Fig -1: Training and Testing the model
3.5 Model Evaluation
Fig -2: Checking the accuracy score of the model
The model’s accuracy was found to be 94.94% on the training data and 92.10% on the testing data.
3.6 Prediction
Once the machine learning model has been fitted, it can predict whether a patient has a malignant tumour (the patient has cancer) or a benign tumour (the patient does not have cancer).
Fig -3: Building a predictive system
Hence, a Logistic Regression model was implemented to predict the malignancy of breast cancer.
4. CONCLUSION
Breast cancer is one of the deadliest diseases affecting women today and one of the leading causes of cancer death in women. Early identification of this cancer could therefore save many lives. This study aims to build a classifier for detecting breast cancer in its early stages. We examined the Logistic Regression model for breast cancer detection.
REFERENCES
[1] Joshi, R. Doshi, and J. Patel, “Diagnosis and prognosis breast cancer using classification rules,” International Journal of Engineering Research and General Science, vol. 2, no. 6, pp. 315–323, 2014.
[2] A. M. Ahmad, G. M. Khan, S. A. Mahmud, and J. F. Miller, “Breast cancer detection using Cartesian genetic programming evolved artificial neural networks,” in Proceedings of the 14th annual conference on Genetic and evolutionary computation, pp. 1031–1038, Philadelphia, Pennsylvania, USA, 2012.
[3] A. T. Azar and S. A. El-Said, “Probabilistic neural network for breast cancer classification,” Neural Computing and Applications, vol. 23, no. 6, pp. 1737–1751, 2013.
[4] E. Warner, H. Messersmith, P. Causer, A. Eisen, R. Shumak, and D. Plewes, “Systematic review: using magnetic resonance imaging to screen women at high risk for breast cancer,” Annals of Internal Medicine, vol. 148, no. 9, pp. 671–679, 2008.
[5] G. R. Kumar, G. Ramachandra, and K. Nagamani, “An efficient prediction of breast cancer data using data mining techniques,” International Journal of Innovations in Engineering and Technology (IJIET), vol. 2, no. 4, p. 139, 2013.
[6] J. A. Cruz and D. S. Wishart, “Applications of machine learning in cancer prediction and prognosis,” Cancer Informatics, Sage Journals, vol. 2, 2006.
[7] S. Vasundhara, B. V. Kiranmayee, and Chalumuru Suresh, “Machine Learning Approach for Breast Cancer Prediction,” International Journal of Recent Technology and Engineering, 2019.
[8] Arpita Joshi and Dr. Ashish Mehta, “Comparative Analysis of Various Machine Learning Techniques for Diagnosis of Breast Cancer,” International Journal on Emerging Technologies, 2017.
[9] Muhammet Fatih Ak, “A Comparative Analysis of Breast Cancer Detection and Diagnosis Using Data Visualization and Machine Learning Applications,” Healthcare, MDPI, 2020.
[10] Hiba Asri, Hajar Mousannif, Hassan Al Moatassime, and Thomas Noel, “Using Machine Learning Algorithms for Breast Cancer Risk Prediction and Diagnosis,” Procedia Computer Science, Elsevier, 2016.
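The steps of Sections 3.3 to 3.5 (sigmoid-based classification, an 80/20 split, and an accuracy check) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the data are synthetic rather than the Kaggle dataset, the model is fitted by plain batch gradient descent (the paper does not name a solver), and all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    # Two branches avoid overflow in exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Probability of class 1 for one feature vector x."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(X, y, lr=0.1, epochs=500):
    """Fit weights by batch gradient descent on the log-loss (assumed solver)."""
    n_features = len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, target in zip(X, y):
            error = predict_proba(weights, bias, x) - target
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias


def accuracy(weights, bias, X, y):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum(
        (predict_proba(weights, bias, x) >= 0.5) == bool(t)
        for x, t in zip(X, y)
    )
    return hits / len(y)


# Synthetic two-feature data: class 1 is centred at (2, 2), class 0 at (-2, -2).
rng = random.Random(0)
data = []
for _ in range(100):
    label = rng.randint(0, 1)
    centre = 2.0 if label else -2.0
    data.append(([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)], label))

# 80% of the shuffled samples for training, the remaining 20% for testing.
rng.shuffle(data)
split = int(0.8 * len(data))
X_train, y_train = [x for x, _ in data[:split]], [t for _, t in data[:split]]
X_test, y_test = [x for x, _ in data[split:]], [t for _, t in data[split:]]

weights, bias = train(X_train, y_train)
train_acc = accuracy(weights, bias, X_train, y_train)
test_acc = accuracy(weights, bias, X_test, y_test)
print(f"training accuracy: {train_acc:.2%}, testing accuracy: {test_acc:.2%}")
```

Reporting accuracy on both subsets mirrors Section 3.5: a large gap between the two figures would point to overfitting, while low accuracy on both would point to underfitting.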