Explainable neuro-fuzzy recurrent neural network to predict colorectal cancer with different time frame data
EUSFLAT 2019
Ostrava, Czech Republic
Servio Fernando Lima Reina, PhD (c), University of Fribourg, Switzerland
Agenda
• Why this research?
• Proposed neuro-fuzzy system
• Components of the system
• Future work
• Conclusions
• Q&A
Why this research?
• In 2018, 1.8 million cases worldwide were attributed to colorectal cancer (CRC).
[Figure: Colorectal cancer (CRC) incidence worldwide. Source: World Health Organization (WHO)]
Why this research?
Keywords: cancer AND neuro fuzzy AND colo AND rectal, between 2006 and 2019
Screening: removal of duplicates and misclassified articles
Eligibility: review of title and abstract
Inclusion: final selection of articles
Article counts through the selection process: 695 → 435 → 102 → 52
Articles by source: ACM 2, IEEE Xplore 1, Elsevier 34, Springer 15
• Too few papers address the use of neuro-fuzzy systems for analyzing time-sensitive electronic medical records (EMR) applied to CRC.
Why this research?
Rationale of this research
• In recent years, deep neural networks (DNN) have proven accurate in their predictions, yet the medical community has not adopted them because of their lack of explainability.
• Moreover, it has been shown that CRC prediction benefits from analyzing past patient data (e.g. EMR data) [2]. Algorithms such as LSTM (Long Short-Term Memory) can take this data into account and find patterns over time, but LSTM remains difficult to explain due to its internal complexity.
• We propose to explain the LSTM algorithm by means of fuzzy logic, which helps humans by:
   • representing the results of the DNN in if-then format, which is easier to interpret;
   • providing traceability of the LSTM decisions;
   • allowing humans to add or modify fuzzy rules, leading to more explainable predictions.
Proposed neuro-fuzzy system
Explainable neuro-fuzzy system for colorectal cancer (ENF-CRC)
Data source
NYU Langone Health EMR Dataset
• Data was compiled on approximately 4,300 patients: patients aged 18-49 diagnosed with CRC (200 patients), patients aged 50-90 diagnosed with CRC (3,300 patients), and healthy controls without CRC who share a year of birth with the patients in Group 1 (800 patients).
• Dataset created: Feb 12, 2018
• Dataset provided by: NYU Health Sciences Library
• Time period covered: Jan 1, 2011 - Dec 31, 2017
• Area covered: New York
• Source: https://datacatalog.med.nyu.edu/dataset/10164
Components of the system
EMR and data pre-processing
• Our source of information is EMR. We split the data in two: (1) known predictors of CRC (per the Hippisley-Cox and Coupland model [3], marked with * in Table 1) and (2) predictors yet to be identified (everything else).

Type | Purpose | Output
Raw data | - | *Patient: age, gender; Consults: symptoms/diagnosis; Medication: medication type; Referral: specialism; Lab results: measurement type:value; *Diagnosis of specific disease: 0/1
Lab value contextualization | Contextualize raw lab values to ground values | Lab value: low/normal/high; Lab value: increasing/decreasing/stable
Semantic enrichment | Add existing knowledge to compensate for incompleteness of data | *Consults: comorbidities, diagnosis type; Medication: side effects; Events that co-occur or occur in succession
Table 1: EMR data and its preparation
* The Hippisley-Cox and Coupland model uses this specific data to predict CRC [3].
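The lab value contextualization row of Table 1 could be sketched as follows. This is a minimal illustration rather than the authors' implementation: the reference range, the 5% tolerance, the sample series and the function names `contextualize_value` and `contextualize_trend` are all assumptions introduced here.

```python
def contextualize_value(value, low, high):
    """Map a raw lab value to low/normal/high given a reference range."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"


def contextualize_trend(previous, current, tolerance=0.05):
    """Label the change between two consecutive measurements.

    A relative change within `tolerance` (an assumed 5%) counts as stable.
    """
    if previous == 0:
        change = current - previous  # fall back to the absolute change
    else:
        change = (current - previous) / abs(previous)
    if change > tolerance:
        return "increasing"
    if change < -tolerance:
        return "decreasing"
    return "stable"


# Hypothetical haemoglobin series (g/dL) with an assumed reference range.
series = [13.5, 12.1, 11.0]
states = [contextualize_value(v, low=12.0, high=16.0) for v in series]
trends = [contextualize_trend(a, b) for a, b in zip(series, series[1:])]
print(states)  # ['normal', 'normal', 'low']
print(trends)  # ['decreasing', 'decreasing']
```

Keeping the value label and the trend label separate mirrors the two output lines of that table row.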
Components of the system
EMR and data pre-processing: hazard ratio
Source: Hippisley-Cox and Coupland model [3]. CI: confidence interval. Bowel cancer includes colon and rectal cancer.
• Thanks to the Hippisley-Cox and Coupland model, we know in advance the impact that each of its variables has on CRC. The model quantifies this impact as a hazard ratio (HR). It is therefore up to us to feed these variables into the LSTM.
Components of the system
EMR and data pre-processing
• Clinical variables are measured at different moments and are irregularly sampled in time.
• LSTM may struggle to distinguish lengthy and incomplete sequences of events.
• Moreover, temporal pattern mining usually returns a large number of temporal patterns, most of which may be irrelevant to the classification task. To address these problems we can use the pre-processing algorithm of Batal et al. [1].
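As we understand it, the approach in [1] starts by converting each irregularly sampled series into time intervals of abstract states before any pattern mining takes place. The sketch below illustrates only that first step under our own assumptions: the function name `to_state_intervals`, the tuple layout and the sample data are hypothetical and are not taken from [1].

```python
def to_state_intervals(measurements):
    """Merge consecutive measurements with the same state into intervals.

    `measurements` is a list of (time, state) pairs sorted by time; the
    spacing between times may be irregular.
    Returns a list of (state, start_time, end_time) tuples.
    """
    intervals = []
    for time, state in measurements:
        if intervals and intervals[-1][0] == state:
            # Same state as the open interval: extend its end time.
            last_state, start, _ = intervals[-1]
            intervals[-1] = (last_state, start, time)
        else:
            # State changed (or first measurement): open a new interval.
            intervals.append((state, time, time))
    return intervals


# Hypothetical lab states observed on irregular days since the first visit.
observed = [(0, "normal"), (3, "normal"), (30, "low"), (34, "low"), (90, "normal")]
intervals = to_state_intervals(observed)
print(intervals)
# [('normal', 0, 3), ('low', 30, 34), ('normal', 90, 90)]
```

Working with intervals instead of raw time stamps means two patients measured on different days can still yield the same sequence of states.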
Components of the system
EMR and data pre-processing
• Despite all the previous pre-processing effort, we may still end up with a large number of input features.
• In that case we will use the Mann-Whitney-Wilcoxon rank-sum test to reduce the number of input features considered. This test determines whether the occurrences of a feature differ between CRC and non-CRC cases. We will pick the features with the lowest p-values, i.e. the lowest probability that the observed difference arose by chance.
[Figure: Symptom A frequency (EMR data) over time for Patient 1 (non-CRC), Patient 2 (non-CRC), Patient 3 (CRC) and Patient 4 (CRC)]
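The feature selection step could be sketched as below. This is a rough illustration under stated assumptions, not the authors' code: the p-value uses the normal approximation without tie or continuity correction, which is crude for small groups, and the feature names, frequencies and the helpers `average_ranks`, `rank_sum_p_value` and `select_features` are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_p_value(group_a, group_b):
    """Two-sided rank-sum p-value via the normal approximation."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


def select_features(features, is_crc, top_k):
    """Keep the `top_k` features with the lowest p-values."""
    scored = []
    for name, values in features.items():
        crc = [v for v, flag in zip(values, is_crc) if flag]
        non_crc = [v for v, flag in zip(values, is_crc) if not flag]
        scored.append((rank_sum_p_value(crc, non_crc), name))
    scored.sort()
    return [name for _, name in scored[:top_k]]


# Hypothetical symptom frequencies for 4 CRC and 4 non-CRC patients.
is_crc = [True, True, True, True, False, False, False, False]
features = {
    "symptom_a": [5, 6, 7, 8, 1, 2, 3, 4],  # clearly higher in CRC cases
    "symptom_b": [2, 3, 2, 3, 3, 2, 3, 2],  # same distribution in both groups
}
selected = select_features(features, is_crc, top_k=1)
print(selected)  # ['symptom_a']
```

In practice a statistics library would normally supply the test; the hand-written version is used here only to keep the sketch self-contained.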
Components of the system
LSTM for analyzing EMR
• Once the data pre-processing is finished, we can feed the LSTM algorithm.
• LSTM can capture long-term relationships in data. Its forget gate f keeps the cell state when f = 1 and discards it when f = 0.
• It has been shown that CRC prediction benefits from analyzing past and present patient data covering up to six months [2], which is the most symptomatic phase of CRC.
• But LSTM decisions for CRC are hard to explain because the network considers multiple input variables, each changing in frequency and time.
[Figure: LSTM network and formulas, with input, forget and output gates]
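The role of the forget gate can be made concrete with a scalar version of the standard LSTM cell update, c_t = f * c_(t-1) + i * g. The sketch below is illustrative only: real LSTMs use weight matrices learned from data, whereas the weights, the dictionary layout and the function name `lstm_step` are assumptions chosen here to push the gates close to 0 or 1.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, weights):
    """One step of a scalar LSTM cell.

    `weights` maps each gate name to (input weight, recurrent weight, bias).
    Returns the new hidden state and the new cell state.
    """
    def pre_activation(name):
        w_x, w_h, bias = weights[name]
        return w_x * x + w_h * h_prev + bias

    f = sigmoid(pre_activation("forget"))    # share of the old cell state kept
    i = sigmoid(pre_activation("input"))     # share of the candidate written
    o = sigmoid(pre_activation("output"))    # share of the cell state exposed
    g = math.tanh(pre_activation("candidate"))
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


# Large biases push the gates towards 1 or 0 (hypothetical values).
remember = {
    "forget": (0.0, 0.0, 20.0),     # f close to 1: keep the cell state
    "input": (0.0, 0.0, -20.0),     # i close to 0: write nothing new
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}
forget = dict(remember, forget=(0.0, 0.0, -20.0))  # f close to 0

_, c_kept = lstm_step(x=0.5, h_prev=0.0, c_prev=2.0, weights=remember)
_, c_dropped = lstm_step(x=0.5, h_prev=0.0, c_prev=2.0, weights=forget)
print(round(c_kept, 6), round(c_dropped, 6))
```

With the forget gate near 1 the previous cell state of 2.0 survives the step almost unchanged, and with the gate near 0 it is wiped out.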
Components of the system
Explaining LSTM decisions
• We need a method that can assess the impact of each variable individually (e.g. frequency of occurrence of symptoms, lab value) and its change over time.
• The method that achieves this is IMV-LSTM [6].
Components of the system
Explaining LSTM decisions: fuzzy rules
• Once IMV-LSTM has identified the most relevant variables, and given that we already have the Hippisley-Cox and Coupland variables, we can use both sets as input to a fuzzy rule-based system (FRBS) that generates fuzzy rules automatically.
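To show what a single fuzzy if-then rule looks like once it exists, here is a minimal hand-written sketch. It does not generate rules automatically as the FRBS would, and the variables, linguistic terms and breakpoints are hypothetical values picked for illustration only.

```python
def ramp(x, start, end):
    """Membership rising linearly from 0 at `start` to 1 at `end`."""
    if x <= start:
        return 0.0
    if x >= end:
        return 1.0
    return (x - start) / (end - start)


# Hypothetical linguistic terms; the breakpoints are illustrative only.
MEMBERSHIP = {
    ("age", "old"): lambda years: ramp(years, 50, 70),
    ("bleeding_visits", "frequent"): lambda visits: ramp(visits, 1, 5),
}

# IF age is old AND bleeding_visits is frequent THEN crc_risk is high
RULE = {
    "if": [("age", "old"), ("bleeding_visits", "frequent")],
    "then": ("crc_risk", "high"),
}


def firing_strength(rule, patient):
    """Degree to which the rule applies, using min as the fuzzy AND."""
    return min(MEMBERSHIP[term](patient[term[0]]) for term in rule["if"])


def rule_as_text(rule):
    """Render the rule in readable if-then form."""
    conditions = " AND ".join(f"{var} is {label}" for var, label in rule["if"])
    var, label = rule["then"]
    return f"IF {conditions} THEN {var} is {label}"


patient = {"age": 65, "bleeding_visits": 3}
print(rule_as_text(RULE))
print(firing_strength(RULE, patient))  # 0.5
```

Because each rule is plain data, a clinician could inspect it, edit a breakpoint or add a condition, which is the kind of human modification the proposal aims for.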
Components of the system
Explaining LSTM decisions: GUAJE, GLMP, rLDCP
• GUAJE is a modeling software tool aimed at designing interpretable fuzzy systems.
• GLMP (Granular Linguistic Model of a Phenomenon) is usually represented as a hierarchical network of computational perceptions.
• rLDCP is a software tool aimed at producing GLMPs.
• GUAJE and rLDCP can be connected in order to build rule bases that can be embedded into parts of a GLMP.
NATURAL LANGUAGE: The patient has CRC with high probability because the patient is white, is a heavy smoker and a very heavy drinker, has a family history of bowel cancer and has shown pattern X in the last Y months.
[Figure: GLMP representation for CRC]
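The natural-language output can be imitated with a simple sentence template. This sketch is not GLMP or rLDCP and uses neither tool's API; the function name `explain` and the list of reasons are assumptions made to illustrate how active rule conditions could be turned into a sentence.

```python
def explain(probability_label, reasons):
    """Build a one-sentence explanation from a list of reason phrases."""
    sentence = f"The patient has CRC with {probability_label} probability"
    if not reasons:
        return sentence + "."
    if len(reasons) == 1:
        joined = reasons[0]
    else:
        # Comma-separate all but the last reason, then close with "and".
        joined = ", ".join(reasons[:-1]) + " and " + reasons[-1]
    return f"{sentence} because the patient {joined}."


# Hypothetical reasons, e.g. the conditions of the rules that fired.
message = explain("high", ["is a heavy smoker", "has a family history of bowel cancer"])
print(message)
```

A full GLMP would derive the phrases from a hierarchy of computational perceptions rather than from a flat list.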
Future work
• Implement the system.
• Try other predictive models for CRC, such as that of Marshall et al. [4].
• Incorporate images (fMRI and radiography) and use convolutional neural networks (CNN) to read them.
• Substitute LSTM with BioBERT [5], a biomedical language representation model for biomedical text mining based on BERT (Bidirectional Encoder Representations from Transformers) and pre-trained on PubMed (4.5 billion words) and PMC (13.5 billion words).
• Add a mechanism for evaluating the generated explanations, such as the psychological model of explanation proposed by the Defense Advanced Research Projects Agency (DARPA).
Conclusions
• Cancer analysis can be greatly improved by fusing deep learning and fuzzy logic, which makes the overall system more explainable to medical doctors.
• A system that allows decisions to be traced, and modified by humans when those decisions are biased, is more likely to be accepted by the medical community as an aid in caring for CRC patients.
Q&A
Thank you
Servio Fernando Lima Reina, PhD student
Human-IST institute, University of Fribourg, Switzerland
servio.lima@unifr.ch
Bibliography
[1] Iyad Batal, Hamed Valizadegan, Gregory F Cooper, and Milos Hauskrecht. A temporal pattern mining approach for classifying electronic health record data. ACM Transactions on Intelligent Systems and Technology (TIST), 4(4):63, 2013.
[2] Amirkhan R. et al. Using Recurrent Neural Networks to Predict Colorectal Cancer among Patients. IEEE Symposium Series on Computational Intelligence (SSCI), Honolulu, HI, USA, 2017.
[3] Hippisley-Cox J, Coupland C (2015) Development and validation of risk prediction algorithms to estimate future risk of common cancers in men and women: prospective cohort study. BMJ Open 5: e007825.
[4] Marshall T, Lancashire R, Sharp D, Peters TJ, Cheng KK, Hamilton W (2011) The diagnostic performance of scoring systems to identify symptomatic colorectal cancer compared to current referral guidance. Gut 60: 1242–1248.
[5] Lee J. et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, Oxford University Press, 2019.
[6] Guo T. et al. Exploring Interpretable LSTM Neural Networks over Multi-Variable Data. Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019.
