LUNG CANCER PREDICTION MODELING USING ARTIFICIAL NEURAL NETWORK
1. THESIS ON
LUNG CANCER PREDICTION MODELING USING ARTIFICIAL
NEURAL NETWORK
FOR THE DEGREE OF
MASTER OF TECHNOLOGY
in
COMPUTER SCIENCE ENGINEERING
by
RISHKI TYAGI
(Roll No. 2201320105010)
Under the Supervision of
DR. ARUN KUMAR SINGH
Greater Noida Institute of Technology, Greater Noida
Submitted to
DR. APJ ABDUL KALAM TECHNICAL UNIVERSITY
LUCKNOW
2. Table of Contents
CHAPTER 1
INTRODUCTION
Introduction to Lung Cancer Prediction
Overview of Lung Cancer
Significance of Early Detection
Problem Statement and Research Objectives
Role of ANN
Understanding ANNs
Relevance in Medical Predictive Modeling
Structure of the Thesis
CHAPTER 2
LITERATURE REVIEW
Lung Cancer Overview
Types of Lung Cancer
Risk Factors and Epidemiology
Existing Methods for Lung Cancer Prediction
Traditional Diagnostic Techniques
Imaging Technologies
Strengths and Weaknesses
3. CHAPTER 3
METHODOLOGY
Data Collection and Preprocessing
Dataset Description
Graphical analysis of each attribute
ANN Architecture
Ethical Considerations in Data Usage
Patient Privacy
Informed Consent
CHAPTER 4
RESULTS & DISCUSSION
Results & Discussion
Data Analysis
ANN Modeling
CHAPTER 5
CONCLUSION & FUTURE WORK
Summary of Research Findings
Achievements and Contributions
Recommendations for Further Research
Improved Model Architectures
Larger and Diverse Datasets
5. Introduction to Lung Cancer Prediction
The WHO estimates that there were approximately 2.2 million
new cases of lung cancer and 1.8 million lung cancer-related
deaths in 2020 alone, highlighting the urgency of addressing this
disease [1]. One of the key factors that contribute to the high
mortality rate associated with lung cancer is its often late-stage
diagnosis. Lung cancer is characterized by the uncontrolled
growth of abnormal cells in the lung tissue, and it can be broadly
categorized into two main types: non-small cell lung cancer
(NSCLC) and small cell lung cancer (SCLC). Both types have
different subtypes and varying prognoses, but they share a
common feature: when diagnosed at an advanced stage, they are
associated with limited treatment options and poor survival rates.
6. Overview of Lung Cancer
Lung cancer is a devastating disease that continues to be a major global health
concern. Understanding its prevalence, different types, and the impact it has on
individuals and society is crucial for effective prevention, early detection, and
treatment.
Types of Lung Cancer
Lung cancer is not a single disease but rather a complex group of malignancies that
originate in the lung tissue. The two main types of lung cancer are non-small cell lung
cancer (NSCLC) and small cell lung cancer (SCLC), each with distinct characteristics:
• NSCLC
NSCLC accounts for approximately 85% of all lung cancer cases.
It includes several subtypes, such as adenocarcinoma, squamous cell carcinoma, and
large cell carcinoma.
NSCLC typically grows more slowly than SCLC and is often diagnosed at a later stage.
• SCLC
SCLC is a more aggressive form of lung cancer, representing about 15% of cases.
It is characterized by rapid growth and early metastasis, making it challenging to treat.
8. Significance of Early Detection
Early detection of lung cancer stands as a cornerstone in the
battle against this deadly disease, significantly impacting the
prognosis and survival rates of affected individuals.
Improved Treatment Options
Enhanced Survival Rates
Decreased Treatment Intensity
Lower Healthcare Costs
Psychological and Emotional Well-Being
9. Role of ANN
In the context of addressing the research problem
of early lung cancer prediction, ANNs play a pivotal
role due to their ability to capture complex patterns
in large datasets. This section introduces the role of
ANNs in addressing the research problem and
provides a reference to support their effectiveness.
10. Understanding ANNs
Modeled after the human brain's neural architecture, ANNs have gained widespread popularity
due to their ability to process vast amounts of data and extract meaningful patterns. In
this comprehensive overview, we will delve into the fundamentals of ANNs, ensuring that
readers gain a solid understanding of their structure, functioning, and applications.
The Basics of Artificial Neural Networks
ANNs, often referred to simply as neural networks, are computational models
inspired by the biological neurons in the human brain. ANNs consist of interconnected
nodes, or "neurons," organized into layers. To grasp the fundamentals of ANNs, let's
explore their key components:
Neurons
Layers
Input Layer
Hidden Layer
Output Layer
Connections and Weights
Activation Function
Bias
11. Relevance in Medical Predictive Modeling
ANNs have emerged as powerful tools in medical predictive
modeling, revolutionizing the healthcare landscape by offering
innovative solutions for disease prediction, diagnosis, and treatment
planning. In this discussion, we delve into the compelling reasons why
ANNs are particularly relevant and effective in the context of medical
predictive modeling, drawing from their unique capabilities,
adaptability, and proven successes in addressing complex healthcare
challenges.
Complexity of Medical Data
Adaptability to Evolving Data Patterns
Modeling Nonlinear Relationships
Handling Missing Data
Integration of Multimodal Data
Proven Successes in Healthcare
12. Structure of the Thesis
This thesis is organized into five distinct chapters. The first chapter serves as an introduction to the
thesis, setting the stage for the entire research endeavor. It provides essential background
information on the topic of lung cancer prediction and highlights the significance of early
detection. The chapter also defines the research problem and objectives, introduces the role of
ANNs in addressing the problem, and formulates the research hypothesis and questions. Finally, it
offers a glimpse into the overall structure of the thesis, guiding readers through the forthcoming
chapters. In Chapter 2, we delve into a comprehensive review of the existing literature related to
lung cancer prediction, artificial neural networks, and their applications in healthcare. This
chapter synthesizes prior research, highlighting key findings, methodologies, and gaps in the field.
It provides readers with a strong foundation of knowledge and insights into the current state of
lung cancer prediction modeling, setting the stage for the novel contributions of this thesis.
Chapter 3 focuses on the crucial aspects of data collection and preprocessing, representing the
initial steps in developing an effective lung cancer prediction model. It outlines the sources and
types of data used in the research, which include clinical records, genetic information, and
medical imaging data. The chapter discusses data cleaning, transformation, and feature selection
techniques to ensure data quality and relevance. Additionally, ethical considerations related to
data privacy and patient consent are addressed. Chapter 4 delves into the heart of the research,
where the ANN model for lung cancer prediction is meticulously designed, developed, and
trained. The fifth and final chapter presents the results of the research, evaluating the ANN-based
lung cancer prediction model's performance.
14. LITERATURE REVIEW
This chapter lays the foundation for our research by
providing a thorough understanding of the current
state of lung cancer prediction, the strengths and
limitations of ANNs, and the ethical considerations
surrounding the use of predictive models in
healthcare.
15. Lung Cancer Overview
Lung cancer is a complex and multifaceted disease with significant global
health implications. In this section, we provide a comprehensive overview
of lung cancer, highlighting its prevalence, various types, and the risk
factors associated with its development.
Types of Lung Cancer
Lung cancer encompasses a diverse group of malignancies that originate in
the lungs. These cancers are broadly categorized into two primary types:
NSCLC
SCLC
17. Risk Factors and Epidemiology
Lung cancer is strongly influenced by various risk factors and exhibits notable epidemiological
patterns:
Smoking and Tobacco Use: Cigarette smoking stands as the foremost and most influential risk
factor in the development of lung cancer. The association between smoking and lung cancer is
unequivocal: those who smoke are far more likely to develop lung cancer than non-smokers.
Active smokers face a risk estimated to be 15 to 30 times
greater than that of non-smokers. This elevated risk is closely linked to the duration and intensity
of smoking. The longer an individual smokes and the greater the number of cigarettes
consumed, the more pronounced their susceptibility to lung cancer becomes. Secondhand
smoke exposure adds another dimension to the smoking-lung cancer connection. Non-smokers
who are regularly exposed to the exhaled smoke from active smokers or the
emissions from burning cigarettes also encounter an increased risk of developing lung cancer.
While this risk is notably lower than that of active smokers, it remains a matter of considerable
public health concern, especially in enclosed or poorly ventilated spaces. Crucially, the act of
quitting smoking holds the promise of significantly reducing the risk of lung cancer. Although
some residual risk may persist after smoking cessation, it gradually diminishes over the years
of tobacco abstinence. Smoking cessation, therefore, emerges as a pivotal step in lowering an
individual's probability of developing lung cancer and, more broadly, in enhancing overall
health and well-being.
Recognizing the profound impact of smoking on lung cancer risk serves as a powerful
motivator for individuals to embark on the journey to quit smoking, and for society to
implement policies aimed at reducing smoking rates and safeguarding non-smokers from the
harmful effects of secondhand smoke exposure.
20. Existing Methods for Lung Cancer Prediction
This section delves into the existing methods and techniques employed for
lung cancer prediction. The focus is on traditional diagnostic techniques and
advanced imaging technologies, both of which play crucial roles in
identifying and diagnosing lung cancer cases. The detailed information
provided is complemented by tables for clarity and reference.
Traditional Diagnostic Techniques
Traditional diagnostic techniques for lung cancer involve a range of
procedures aimed at detecting the presence of cancer cells or tumors within
the lungs. These techniques are fundamental in diagnosing lung cancer and
guiding treatment decisions. Below are some traditional diagnostic methods
along with relevant details in tabular form.
22. Imaging Technologies
Imaging technologies play a pivotal role in lung cancer diagnosis,
allowing for non-invasive visualization of the lungs and detection of
abnormalities. Here, we explore various imaging techniques used in
lung cancer prediction, along with their respective advantages and
limitations.
25. METHODOLOGY
Data Collection and Preprocessing
Dataset Description
The dataset covers 300 patients and was obtained from
www.dataworld.com. Titled "survey lung
cancer.csv", it collects data on
various factors and their potential association with lung
cancer. Its 300 entries are structured into
several columns, each denoting a unique attribute relevant to the study.
At the core of the dataset are two fundamental types of data:
categorical and numerical. The sole numerical attribute is 'AGE',
indicating the age of each individual. This quantitative measure varies
across the dataset, providing a crucial demographic perspective for the
analysis.
26. Graphical analysis of each attribute
Age Distribution: The 'AGE' attribute shows a mean age of 62.7 years with a standard
deviation of 8.28 years, indicative of a primarily older adult population. The age range spans
from 21 to 87 years, covering a wide spectrum of adult life stages. This wide age range is
essential for examining the impact of age on lung cancer risk and symptoms.
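The summary statistics quoted above (mean, standard deviation, minimum, maximum) can be reproduced with a few lines of Python. The following is a minimal sketch only: the function name `summarize_ages` and the sample values are hypothetical, and the thesis does not say whether the sample or the population standard deviation was reported, so the sample form is assumed here.

```python
import statistics

def summarize_ages(ages):
    """Return the mean, sample standard deviation, minimum and maximum of a list of ages."""
    return {
        "mean": statistics.mean(ages),
        "std": statistics.stdev(ages),  # sample standard deviation (n - 1 denominator); an assumption
        "min": min(ages),
        "max": max(ages),
    }

# Hypothetical ages for illustration only; not taken from the thesis dataset.
sample_ages = [21, 55, 60, 62, 64, 70, 87]
summary = summarize_ages(sample_ages)
```

Applied to the full 'AGE' column, the same function would be expected to return the figures reported on this slide.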
27. ANN Architecture
The fundamental building blocks of an ANN are artificial
neurons, also known as perceptrons. Each perceptron takes
multiple inputs, applies weights to these inputs, sums them
up, and then passes the result through an activation function.
The weights represent the strength of connections between
neurons, and they are learned during the training process,
allowing the network to adapt and improve its performance
over time. The activation function introduces non-linearity
into the model, enabling ANNs to capture complex
relationships within the data.
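The computation performed by a single artificial neuron can be sketched as follows. This is an illustrative example, not the thesis model: the sigmoid activation, the function names, and the input, weight and bias values are all assumptions chosen for demonstration.

```python
import math

def sigmoid(z):
    """Logistic activation: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the activation function."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Hypothetical values for illustration; in a trained network the weights
# and the bias are learned from the training data.
output = neuron_output(inputs=[1.0, 0.0, 1.0], weights=[0.5, -0.3, 0.2], bias=-0.7)
```

A full network stacks many such neurons into layers, with the outputs of one layer serving as the inputs of the next.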
29. Ethical Considerations in Data Usage
Patient Privacy
In our study on lung cancer prediction using artificial neural networks,
patient privacy remains a top priority, especially considering that the
dataset was sourced from www.dataworld.com. To ensure the
confidentiality and security of the information, we have implemented
stringent measures. All data obtained is anonymized; personal identifiers
are removed to maintain the anonymity of the patients. This step is crucial
in safeguarding individual privacy. Additionally, our handling of this data
strictly adheres to relevant privacy regulations. We have established
protocols to ensure that all patient information is securely stored and
accessed solely by authorized personnel for the purpose of this research.
31. Informed Consent
Given that the dataset for our research was acquired from an external
source like www.dataworld.com, the aspect of informed consent takes on
a different dimension. In this scenario, it is crucial to verify that the
original data collection process involved obtaining informed consent from
all participants. This involves ensuring that the data provider,
www.dataworld.com, had a clear and ethical process in place for
informing patients about the use of their data, their rights, and the
purpose of the data collection.
33. Results & Discussion
Data Analysis
Data from a total of 300 patients are used in this study. The data relate to
health and lifestyle factors, specifically indicators that might be
associated with lung cancer. Of these, the data of 250 patients are used for
training the ANN model and the data of the remaining 50 patients are used for
testing. The data set is provided in Appendix-I.
This dataset comprises various attributes potentially linked to lung cancer
risk. Key columns include 'GENDER,' 'AGE,' 'SMOKING,'
'YELLOW_FINGERS,' 'ANXIETY,' 'WHEEZING,' 'ALCOHOL CONSUMING,'
'COUGHING,' 'SHORTNESS OF BREATH,' 'SWALLOWING DIFFICULTY,' 'CHEST
PAIN,' and the outcome variable 'LUNG_CANCER.' The data is encoded
numerically, likely for ease of analysis: categorical variables have
been transformed into numerical codes.
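The numerical encoding and the 250/50 division described above can be sketched as follows. This is a minimal illustration under stated assumptions: the code table `CODES`, the helper names, the random shuffle and the synthetic rows are all hypothetical, since the thesis does not list the codes it used or say how the 50 test patients were selected.

```python
import random

# Hypothetical code table: the thesis states that categorical values were
# converted to numbers but does not list the codes it used.
CODES = {"M": 0, "F": 1, "NO": 0, "YES": 1}

def encode_row(row, codes):
    """Replace categorical string values with numeric codes; keep other values as-is."""
    return {column: codes.get(value, value) for column, value in row.items()}

def split_train_test(rows, n_train, seed=0):
    """Shuffle a copy of the rows reproducibly and split it into training and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Synthetic stand-in rows for illustration only; not the real survey data.
rows = [
    {
        "GENDER": "M" if i % 2 else "F",
        "AGE": 40 + i % 40,
        "SMOKING": "YES" if i % 3 else "NO",
        "LUNG_CANCER": "YES" if i % 4 else "NO",
    }
    for i in range(300)
]

encoded = [encode_row(row, CODES) for row in rows]
train_rows, test_rows = split_train_test(encoded, n_train=250)
```

Fixing the seed keeps the split reproducible, so that training and testing results can be compared across runs.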
35. • Strong Positive Correlations
Strong positive correlations (coefficients close to 1) suggest that when one factor
increases, the other tends to increase as well. For instance, if 'SMOKING' and
'COUGHING' show a strong positive correlation, it implies that heavier smokers are
likely to cough more. This could be a critical insight in medical research, indicating
that smoking exacerbates or is associated with symptoms like coughing.
• Strong Negative Correlations
Strong negative correlations (coefficients close to -1) indicate that an increase in
one factor is associated with a decrease in another. If 'SMOKING' and 'ANXIETY'
show a strong negative correlation, it might imply that in this specific dataset,
higher smoking is associated with lower anxiety levels. This could be
counterintuitive and suggests an area for further investigation, perhaps exploring
psychological factors or stress relief methods associated with smoking.
• Weak or No Correlations
Correlations around 0 suggest no significant relationship between the factors. This
could mean that changes in one variable do not predict changes in the other. For
instance, if 'AGE' and 'YELLOW_FINGERS' have a correlation close to 0, age might
not be a good predictor of yellowing fingers among the participants. This might
indicate that other factors, not age, are more relevant in explaining why some
individuals have yellow fingers. When interpreting a heatmap, it's essential to
consider the context. In the case of health-related data, correlations do not imply
causation. Heatmaps in medical research can guide hypotheses and further studies
but must be carefully interpreted.
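Each cell of such a heatmap is a correlation coefficient between two columns. A minimal sketch of the Pearson coefficient is given below; the function name and the binary example vectors are hypothetical and are not taken from the dataset, and the thesis does not state which correlation measure was used.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long numeric sequences.

    Both sequences must vary; a constant column would cause a division by zero.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)

# Hypothetical 0/1 indicator columns for illustration only.
smoking = [1, 1, 0, 0, 1, 0]
coughing = [1, 1, 0, 0, 1, 0]   # moves together with smoking
inverse = [0, 0, 1, 1, 0, 1]    # moves opposite to smoking

r_positive = pearson(smoking, coughing)
r_negative = pearson(smoking, inverse)
r_none = pearson([1, 0, 1, 0], [1, 1, 0, 0])
```

The three results illustrate the three cases discussed above: a coefficient near 1, a coefficient near -1, and a coefficient near 0.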
36. ANN Modelling
The ANN model shown in Fig. 4.2 is trained for lung
cancer prediction on the data of 250 patients. A low Mean
Absolute Error (MAE) is the target during training, as it
indicates that the model's predictions are close to the
observed outcomes. In
this work we consider two input layers with 5 hidden
neurons.
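The error measures used to judge the model, MAE here and MSE in the concluding chapter, can be computed as sketched below. The function names and the example values are hypothetical and do not reproduce any result from the thesis.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical outcomes (1 = lung cancer, 0 = no lung cancer) and model outputs.
observed = [1, 0, 1, 1]
predicted = [0.9, 0.2, 0.8, 0.6]

mae = mean_absolute_error(observed, predicted)
mse = mean_squared_error(observed, predicted)
```

Because MSE squares each error, it penalizes a few large mistakes more heavily than MAE does, which is one reason to report both.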
40. CONCLUSION & FUTURE WORK
Summary of Research Findings
In this extensive research focused on lung cancer prediction
using ANN, we have made significant strides in developing a
highly accurate and robust model for early diagnosis. Our
study aimed to address the challenge of late-stage diagnosis by harnessing the
power of ANNs, which are well-suited for complex pattern
recognition tasks.
Our research yielded several key findings and results that
underscore the model's efficacy and adequacy in the context
of lung cancer prediction.
41. Achievements and Contributions
Our research on lung cancer prediction using ANN has made noteworthy
achievements and contributions to the field of medical AI, particularly in the context
of lung cancer diagnosis. The primary accomplishment lies in the development of an
ANN model that exhibits a high degree of accuracy, as indicated by its low MAE and
MSE on both training and testing data. This level of accuracy demonstrates the
model's proficiency in capturing intricate patterns within the data, a crucial aspect in
improving diagnostic precision. Additionally, the model's ability to generalize its
predictions to previously unseen cases signifies its robustness and adaptability,
extending its potential clinical utility to diverse patient populations. Our research
places a strong emphasis on the clinical significance of the model's predictions,
emphasizing their actionable nature for healthcare providers in early diagnosis and
treatment planning for lung cancer patients, ultimately contributing to enhanced
patient outcomes. Furthermore, our commitment to ethical considerations, including
bias mitigation and fairness in predictions, ensures the responsible development and
deployment of AI in healthcare. By advocating for continual monitoring and
adaptation of the model, we address the dynamic nature of medical data and
knowledge, guaranteeing its ongoing effectiveness.
42. Recommendations for Further Research
As our research in lung cancer prediction using Artificial
Neural Networks (ANNs) advances, there are several
promising directions for further research that can lead to
improved predictive performance and a more comprehensive
understanding of this critical healthcare challenge.
43. Improved Model Architectures
Exploring ensemble learning techniques that combine the
strengths of multiple ANNs can potentially enhance
predictive performance.
Investigating more advanced deep learning architectures,
such as CNNs or RNNs, can be beneficial.
Incorporating attention mechanisms within the model can
help identify and focus on crucial features or regions in
medical images or data, potentially improving both
interpretability and prediction accuracy.
Leveraging pre-trained models in related medical domains
and fine-tuning them for lung cancer prediction can save
computational resources and potentially boost
performance.
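As an illustration of the first suggestion above, one simple form of ensemble learning averages the outputs of several independently trained networks. This is a sketch under stated assumptions: the function name is hypothetical and the "models" are stand-in functions returning fixed probabilities, not trained ANNs.

```python
def ensemble_predict(models, features):
    """Average the predicted probabilities of several models for one patient."""
    predictions = [model(features) for model in models]
    return sum(predictions) / len(predictions)

# Hypothetical stand-ins for trained ANNs: each returns a fixed probability.
models = [lambda f: 0.2, lambda f: 0.4, lambda f: 0.9]
combined = ensemble_predict(models, features=[1, 0, 1])
```

Averaging tends to smooth out the errors of individual models; weighted averaging or voting are common alternatives.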
44. Larger and Diverse Datasets
• Collaborating with multiple healthcare institutions to obtain data from
various centers can increase dataset size and diversity.
• Collecting longitudinal data that tracks patients' health over time can
enable the development of predictive models that account for disease
progression and changes in patient health status.
• Incorporating unstructured data from clinical notes and reports, such
as radiologists' notes, patient medical histories, and pathology reports,
can provide valuable context and additional features for predictive
models.
• The potential benefits of using larger and more diverse datasets are
numerous. Firstly, they can help reduce overfitting and improve model
generalization by exposing the model to a broader range of cases and
scenarios. Secondly, enhanced data