International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 05 | May 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 283
Multilevel Predictor Model for Detecting Depressed Posts in
Social Media
Neha U1, Ayush Shrivastava1, Manish Bhande1
1Dept. of CSE, Dayananda Sagar College of Engineering, Bangalore 560078, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Society is facing an unexpected rise in mental
disorders, with an estimated 300 million people suffering
from depression worldwide. People with high life
satisfaction are usually less likely to suffer from severe
mental health issues. The data generated on social network
platforms makes it possible to detect hidden patterns that
yield meaningful insights. This work aims to (a) explore the
relationship between life satisfaction and depression in
social network users, using Twitter as an example, and
(b) develop a multilevel predictive model to detect posts
that convey signs of depression, using positive, negative,
and neutral as the three levels. We trained a set of
predictive models on datasets obtained using a dictionary
that carries values for emotions and a survey called the
Center for Epidemiologic Studies Depression (CES-D) scale.
The resulting multilevel model establishes a negative
correlation between life satisfaction and depression, and
the accuracy of each predictive model is reported using a
confusion matrix.
Key Words: Predictive model, Machine learning,
Mental health, Depression, Life satisfaction, Social
media, Confusion matrix.
1. INTRODUCTION
Mental health is, and continues to be, a prominent
concern for the world. It is estimated that 300 million
people are suffering from depression all over the world.
Depression is associated with inactivity and
overthinking, which can lead to major depressive disorder.
Major Depressive Disorder is commonly referred to as
clinical depression.
The probability for an individual to encounter a major
depressive episode within a duration of one year is 3 – 5%
for males and 8 – 10% for females. Yet the effects of
depression reach beyond societal well-being.
Depression costs workdays, erodes work habits,
and can cause difficulties with concentration,
memory, and decision-making.
Its reach also extends beyond the economic sphere:
depression often co-occurs with other illnesses and mental
conditions. One in four cancer patients experiences
depression, one in three heart attack survivors undergoes
depression, and up to 75% of individuals diagnosed with
an eating disorder will encounter the disease. Untreated
depression increases the chance of dangerous behaviors.
A significant challenge in detecting depression is that
depressive symptoms may differ with a patient's behavior
and personality. While these factors may seem random on
their own, they often coalesce into an undue burden on an
ailing patient, vastly degrading the quality of life of the
individual and their peers.
1.1 MDD - Major Depressive Disorder
The suicide rate rises sharply, since an
individual suffering from one or more of these mental
illnesses will likely experience a snowball effect towards
others. Suicide is itself a leading cause of death. Clearly,
depression has the potential to manifest itself within a
wide range of other social issues, and therefore becomes a
problem of high priority for our society to solve.
Self-reported depression questionnaires may be manipulated
in an effort to obtain anti-depressants, or to mask one's
depression from a friend or family member. Furthermore,
these questionnaires are often costly, which significantly
adds to the economic burden of receiving treatment for
depression.
Yet before even examining flaws in the current
methods of treating depression, a person must first be
identified as having the illness: the World Health
Organization reports that the vast majority of depressed
individuals never seek treatment. This is a particular
problem for the younger generation, which commonly resorts
to blame and stifled self-esteem before seeking any sort
of help. Even during visits with a primary health care
physician, depression often goes unrecognized, and
therefore undiagnosed.
Yet when Major Depressive Disorder (MDD) is
properly identified, contained, and treated, the benefits
to society may be far-reaching. Up to 80% of those treated
for depression showed an improvement in their symptoms
within a period of four to six weeks [10], thus improving
their lives and productivity and boosting their economic
status. A study funded by the National Institute of Mental
Health, known as the Sequenced Treatment Alternatives to
Relieve Depression (STAR*D), set out to determine the
effectiveness of depression treatment; it reported
depression remission rates of over 65 percent after
about six months of treatment. Therefore, it has
become clear that our broadest contribution to
combating clinical depression in certain countries would
lie in improving techniques to identify Major
Depressive Disorder, rather than in its treatment methods.
1.2 Treatment Methods
The most popular depression screening questionnaires are
the Center for Epidemiologic Studies Depression Scale
(CES-D), Beck's Depression Inventory (BDI), and Zung's
Self-Rating Depression Scale (SDS). Results on these
examinations are determined from the patients themselves,
or from a third-party observation, but never from empirical
data. Thus, these questionnaires are prone to flaws through
subjective human testing, and may be easily manipulated
to achieve a pre-determined prognosis.
People increasingly use social network platforms to
share their innermost thoughts and desires and to voice
their opinions on social matters. Postings on these sites
are made in a naturalistic manner, and therefore provide a
solution to the manipulation that self-reported depression
questionnaires often encounter. We have concluded that
social media provides a means to capture an individual's
present state of mind, and is even effective at representing
the feelings of worthlessness, guilt, helplessness, and
self-hatred that often characterize clinical
depression. We pursue the hypothesis that social media,
through word vectorization, may be used to construct
statistical models to detect and even predict Major
Depressive Disorder, and possibly even complement and
extend traditional approaches to depression diagnosis.
2. METHODS
Twitter has become one of the most popular social media
platforms since its launch; as of June 2016, it reported
313 million active users who produce about 6,000 tweets
every second. To gather depression-related data, we
monitor each streaming tweet that includes the word
“depression” across the entire Twitter platform [1]. In
total, we gathered a large number of tweets that discuss
topics relevant to depression. Starting from web scraping
of an initial webpage, thousands of professional mental
health tweets were also accumulated.
A dataset can be developed from an amalgamation of users
with public Twitter accounts who posted a status update
in the form of a statement of diagnosis, such as “I was
diagnosed with X today”, where X would represent either
depression or PTSD.
2.1 Data Collection
Create a Twitter developer account and briefly state the
intended use, here sentiment analysis. Once approved, you
need four credentials from that account: consumer_key = '',
consumer_secret = '', access_token = '', and
access_secret = ''.
Create a file and insert the credentials so that you can
download current tweets using keywords such as
depression, anxiety, or sadness. When the data sets are
ready, you may proceed to the preprocessing stage.
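The credentials file and the keyword filter described above could be sketched as follows. This is only an illustration under stated assumptions: the JSON layout and the helper names parse_credentials, load_credentials, and matches_keywords are hypothetical, and the call that actually downloads tweets is left out because it depends on the Twitter client library and API version in use.

```python
import json

# The four credentials named in the text.
REQUIRED_KEYS = ("consumer_key", "consumer_secret", "access_token", "access_secret")
# Search keywords mentioned in the text.
KEYWORDS = ("depression", "anxiety", "sadness")


def parse_credentials(raw_json):
    """Parse a JSON string and check that all four credentials are present."""
    creds = json.loads(raw_json)
    missing = [key for key in REQUIRED_KEYS if not creds.get(key)]
    if missing:
        raise ValueError("missing credentials: " + ", ".join(missing))
    return {key: creds[key] for key in REQUIRED_KEYS}


def load_credentials(path):
    """Read the credentials file (hypothetical JSON layout) from disk."""
    with open(path, encoding="utf-8") as handle:
        return parse_credentials(handle.read())


def matches_keywords(text, keywords=KEYWORDS):
    """Return True if the tweet text mentions any of the search keywords."""
    lowered = text.lower()
    return any(word in lowered for word in keywords)
```

Keeping the credentials in a separate file, rather than in the analysis code, makes it easier to share the code without exposing the keys.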
2.2 Data Preprocessing
Because the data we have collected from the tweets are
biased and noisy, making sure that the data is free of such
noise is our first task. Special characters, such as
retweet tags “@RT: xxx” and link addresses “http://www.”,
carry little information, so they are removed first. In
the next step, stop words and punctuation are removed
using a stop word list obtained online. Non-words are very
common in social media data because of typos and acronyms,
for instance “hrt” and “lmao”. These words are filtered
using the NLTK toolkit available for Python. The
preprocessing stage then goes through the data sets and
the given dictionary. The dictionary contains words with
their corresponding polarity, which is used to calculate
the sentiment of each tweet: each word is filtered,
tokenized, and given its polarity. The score of a tweet is
the sum of the polarities of its words divided by the
number of words in that tweet. Finally, we had the raw
data cleaned. Table 1 shows the number of words left after
each step of the data preprocessing procedure.
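The cleaning and scoring steps above can be sketched in a few lines. This is a simplified illustration, not the authors' code: the stop word list and polarity dictionary below are tiny hypothetical stand-ins for the online resources the paper mentions, the regular expressions are assumptions about what counts as a retweet tag or link, and the NLTK-based non-word filter is omitted.

```python
import re
import string

# Hypothetical stand-ins for the online stop word list and polarity dictionary.
STOP_WORDS = {"i", "a", "an", "the", "is", "am", "so", "and", "to", "of"}
POLARITY = {"happy": 1.0, "love": 1.0, "sad": -1.0, "hopeless": -1.0, "tired": -0.5}

# Assumed patterns for retweet tags / mentions and for link addresses.
RETWEET_TAG = re.compile(r"\bRT\b:?|@\w+:?")
URL = re.compile(r"https?://\S+|www\.\S+")
PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text):
    """Strip links, retweet tags, punctuation, and stop words; return tokens."""
    text = URL.sub(" ", text)          # remove links before stripping punctuation
    text = RETWEET_TAG.sub(" ", text)  # remove "RT" markers and @mentions
    text = text.translate(PUNCTUATION_TABLE)
    tokens = text.lower().split()
    return [tok for tok in tokens if tok not in STOP_WORDS]


def mean_polarity(tokens, polarity=POLARITY):
    """Sum of word polarities divided by the number of words (0.0 if empty)."""
    if not tokens:
        return 0.0
    return sum(polarity.get(tok, 0.0) for tok in tokens) / len(tokens)


tokens = clean_tweet("RT @user: I am so sad and tired http://www.example.com/x")
score = mean_polarity(tokens)
```

Words missing from the dictionary are scored as 0 here, which is one possible convention; the paper does not say how unknown words are handled.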
2.2.1 Word Frequency
The standard online dictionary is used primarily for word
frequency analysis, the most basic approach to analyzing
text data: counting how often each word occurs in the
documents, here the text files. In traditional text mining
research, frequent words are considered the important
words in natural language processing. The collected data
includes many common words that are semantically related
to the depressive symptoms we are familiar with; e.g. the
words “anxiety” and “disorder” are ubiquitous in the data
set.
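Word frequency counting as described here can be illustrated with Python's standard library. The token lists below are made-up examples, and word_frequencies is a hypothetical helper name.

```python
from collections import Counter


def word_frequencies(token_lists):
    """Count how often each word occurs across all tokenized documents."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts


# Made-up tokenized tweets for illustration only.
docs = [
    ["anxiety", "disorder", "sleep"],
    ["anxiety", "tired"],
    ["disorder", "anxiety"],
]
freq = word_frequencies(docs)
top_two = freq.most_common(2)  # the two most frequent words with their counts
```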
2.2.2 Data Analysis
Once preprocessing is done, the output file can be found
in the directory. Opening it shows the ID (Twitter user
or tweet) and the sentiment of each tweet in two separate
columns. This output is a Twitter data set, filtered by
depression keywords, together with the sentiment of each
tweet (Positive, Neutral, or Negative) as per the
dictionary that contains words with their respective
polarity. Each word taken from the tweet is looked up in
the dictionary and given a score. The polarities are
summed for each tweet: if the sum is above 0, the tweet is
positive; if it is equal to 0, it is neutral; and if it is
less than 0, it is negative. In this way tweets are
classified as positive, neutral, or negative.
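The three-way rule above is simple enough to state directly in code. In this sketch the polarity values and the helper names polarity_sum and label_tweet are illustrative assumptions; only the thresholds come from the text.

```python
# Hypothetical polarity values; the paper's dictionary is not reproduced here.
POLARITY = {"happy": 1, "love": 2, "sad": -1, "hopeless": -2}


def polarity_sum(tokens, polarity=POLARITY):
    """Add up the polarity of every word; unknown words count as 0."""
    return sum(polarity.get(tok, 0) for tok in tokens)


def label_tweet(total):
    """Map a polarity sum to Positive (> 0), Neutral (== 0), or Negative (< 0)."""
    if total > 0:
        return "Positive"
    if total < 0:
        return "Negative"
    return "Neutral"


labels = [
    label_tweet(polarity_sum(["love", "sad"])),       # 2 - 1 = 1
    label_tweet(polarity_sum(["happy", "sad"])),      # 1 - 1 = 0
    label_tweet(polarity_sum(["hopeless", "happy"])), # -2 + 1 = -1
]
```

Because the number of words in a tweet is positive, dividing the sum by the word count does not change its sign, so the averaged score and the raw sum give the same label.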
2.2.3 Training
For training and prediction, all the files must be located
in the proper folders. The analysis code reads the
output.xlsx file, which holds each ID and the sentiment
associated with it, and recovers the tweet corresponding
to the ID of each sentiment. We then feed the original
data to our classifiers, which are defined in the next
step.
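The step of recovering each tweet from its ID might look like the sketch below. It assumes the rows of output.xlsx have already been loaded as (id, sentiment) pairs and that the original tweets are available in a dictionary keyed by ID; how the spreadsheet is read is not shown, and attach_tweets is a hypothetical helper name.

```python
def attach_tweets(sentiment_rows, tweets_by_id):
    """Pair each sentiment label with the text of the tweet it belongs to.

    Rows whose ID has no matching tweet are skipped.
    """
    texts, labels = [], []
    for tweet_id, sentiment in sentiment_rows:
        text = tweets_by_id.get(tweet_id)
        if text is None:
            continue
        texts.append(text)
        labels.append(sentiment)
    return texts, labels


# Made-up rows and tweets for illustration only.
rows = [(101, "Negative"), (102, "Positive"), (999, "Neutral")]
tweets = {101: "i feel hopeless", 102: "what a lovely day"}
texts, labels = attach_tweets(rows, tweets)
```

The two parallel lists returned here are in the shape most classifier libraries expect for training.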
2.2.4 Classifiers
We employ four different types of binary classifiers in
order to estimate the likelihood of depression within
users. For each classifier, we utilize Scikit-Learn to
implement the learning algorithms. We chose to
evaluate Linear, Non-Linear, and Tree-based
approaches in order to explore foundational learning
models against our dataset. Ultimately, we decided
upon Decision Trees, a Linear Support Vector Classifier,
, as well as a Naïve Bayes algorithm. In this section, we
attempt to explain how these algorithms work, as well
as our implementation of them.
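A minimal sketch of how the three classifiers named above could be set up with Scikit-Learn follows. The toy tweets, the use of CountVectorizer for word counts, the choice of the multinomial Naïve Bayes variant, and all parameter settings are assumptions made for illustration; they are not taken from the paper.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Made-up training tweets and labels, for illustration only.
texts = [
    "i feel hopeless and sad",
    "so tired of everything sad",
    "what a lovely happy day",
    "i love my friends happy",
    "going to the store today",
    "the bus was on time",
]
labels = ["Negative", "Negative", "Positive", "Positive", "Neutral", "Neutral"]


def build_classifiers():
    """Return one word-count pipeline per classifier named in the text."""
    return {
        "decision_tree": make_pipeline(
            CountVectorizer(), DecisionTreeClassifier(random_state=0)
        ),
        "linear_svc": make_pipeline(CountVectorizer(), LinearSVC()),
        "naive_bayes": make_pipeline(CountVectorizer(), MultinomialNB()),
    }


models = build_classifiers()
for model in models.values():
    model.fit(texts, labels)

# Pass one sample tweet through every classifier.
sample = ["i am sad and hopeless"]
predictions = {name: str(model.predict(sample)[0]) for name, model in models.items()}
```

Wrapping the vectorizer and the classifier in one pipeline means a raw sample tweet can be passed straight to predict, which matches the paper's description of typing in a tweet and running it through the list of classifiers.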
3. RESULTS
Based on how the algorithms work, a Positive result
means that the person is unlikely to have depression or
anxiety. Neutral is the middle level, in which the user
may or may not have depression but may be more prone to
becoming depressed; at this stage the user may display
some depression-like symptoms. Lastly, Negative is the
lowest level, where depression and anxiety symptoms are
detected in the user's tweets. The more negative words the
user uses, the more negative emotion the tweet carries.
We also discuss the degree of accuracy with which the
presence of active depression within a body of text may be
ascertained from the analysis conducted. Classifiers were
constructed by machine learning for estimating the
presence of signs of depression.
We use accuracy to measure each classifier's performance.
We also obtain the performance of each classifier in the
form of a confusion matrix on a set of test data for which
the true values are known. A confusion matrix, also known
as an error matrix, is a specific table layout that allows
visualization of the performance of an algorithm,
typically a supervised learning one.
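To make the confusion matrix concrete, the following sketch builds one for the three sentiment levels and derives accuracy from it. The true and predicted labels are made up, and the convention that rows are true classes and columns are predicted classes is an assumption of this example.

```python
LABELS = ("Positive", "Neutral", "Negative")


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Count predictions per class: rows are true labels, columns predicted."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, guess in zip(y_true, y_pred):
        matrix[index[truth]][index[guess]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Made-up test labels for illustration only.
y_true = ["Positive", "Positive", "Neutral", "Negative", "Negative", "Negative"]
y_pred = ["Positive", "Neutral", "Neutral", "Negative", "Negative", "Positive"]
matrix = confusion_matrix(y_true, y_pred)
acc = accuracy(matrix)
```

Off-diagonal cells show which classes are confused with each other, which a single accuracy figure cannot reveal.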
The table shows the accuracy with which our constructed
classifiers were able to discern the class of a small body
of text. The classification accuracies are the average values
which were used to vectorize the tweet. To obtain a
result, the user types in a sample tweet; the tweet goes
through the list of classifiers, which predict its
sentiment and return positive, negative, or neutral. The
algorithms were run on the test set and were able to give
an accuracy in predicting positive and negative tweets.
4. CONCLUSIONS
Depression has for decades been a serious mental illness
that negatively affects human health. It is difficult to
confirm a person's depression symptoms from their
behavior via restricted clinical records. Our proposed
methods and experiments illustrate that social networks
provide rich and vast information for extracting
depression symptoms from a distinctive perspective. We
have demonstrated the potential of using Twitter as a tool
for measuring and predicting major depressive disorder in
individuals. First, we compiled a dataset. Next, we
proposed a Bag of Words approach to quantifying this
dataset, creating a high-dimensional feature space as
our input vector. Finally, we leveraged these
distinguishing attributes to build, compare, and contrast
several statistical classifiers which may predict the
likelihood of depression within an individual.
Feeding the original data to our classifiers yielded the
AUC of each classifier. The ability to type in a sample
tweet was also enabled: the tweet is passed to the
classifier with the highest AUC in the list, which
predicts its sentiment. A Positive result means that the
person is unlikely to have depression or anxiety. Neutral
is the middle level, in which the user may or may not have
depression but may be more prone to becoming depressed; at
that stage the user may display some depression-like
symptoms. Lastly, Negative is the lowest level, where
depression and anxiety symptoms are detected in the user's
tweets. The more negative words the user uses, the more
negative emotion the tweet has. Thus, through these
results, the depression level can be determined.
In the future, we will collect other types of data, e.g.
images and videos, from other social networks.
Additionally, advanced entity selection techniques will
be used to select more accurate and meaningful depression
symptoms.
ACKNOWLEDGEMENT
This research paper would be incomplete without
mentioning the people who have constantly helped us
achieve this milestone. I take this opportunity to
express my deepest gratitude and appreciation to all
those who have given moral support and helped me,
directly or indirectly, towards the successful
completion of this research paper.
I would like to express my gratitude to Dr. C P S
Prakash, Principal, Dayananda Sagar College of
Engineering, DSCE, Bengaluru, for the warm support
throughout the course.
I am extremely thankful to Dr. Ramesh Babu D R,
Vice Principal, and Head of the Department of Computer
Science and Engineering, DSCE, Bengaluru for his
motivation and encouragement and support.
For overall direction, I am very grateful to the 8th
semester coordinator, Dr. Vindhya P Malagi, Assoc.
Professor, Dept. of Computer Science and Engineering,
DSCE, Bengaluru, who helped me by providing details
relevant to this field and helped us achieve the best
outcome for this paper.
I am grateful to the Dept. of Computer Science and
Engineering and to Dayananda Sagar College of Engineering.
My sincere thanks to my guide, Prof. Prameetha Pai, for
imparting the knowledge with which I could do my
best. Finally, I would like to thank all the staff members
of the Computer Science and Engineering Department, DSCE,
and of other departments who helped me, directly or
indirectly, to complete this research regarding
depression.
I sincerely thank my mentors, parents, family, and
friends who have advised and pushed me forward
throughout this endeavour. The product of this research
paper would not be possible without all of them.
REFERENCES
[1] Extracting Depression Symptoms from Social
Networks and Web Blogs via Text Mining.
Long Ma, Zhibo Wang, Yanqing Zhang,
Department of Computer Science, Georgia State University,
Atlanta, USA.
[2] Identifying Depression on Twitter.
Moin Nadeem, Mike Horn, Glen Coppersmith,
Johns Hopkins University, and Dr. Sandip Sen, PhD,
University of Tulsa.
[3] Detecting Depression Using an Ensemble Logistic
Regression Model Based on Multiple Speech Features.
Haihua Jiang1, Bin Hu1, Zhenyu Liu2, Gang Wang3, Lan
Zhang4, Xiaoyu Li2, and Huanyu Kang2
1 Faculty of Information Technology, Beijing University of
Technology, Beijing 100124, China
2 Gansu Provincial Key Laboratory of Wearable Computing,
School of Information Science and Engineering, Lanzhou
University, Lanzhou 730000, China
3 Beijing Anding Hospital of Capital Medical University,
Beijing 100088, China
4 Lanzhou University Second Hospital, Lanzhou 730030,
China
[4] Predicting Depression Levels Using Social Media Posts.
Maryam Mohammed Aldarwish, King Saud University for
Health Science National Guard Health-Affairs. Al-Ahssa,
Kingdom of Saudi Arabia
Hafiz Farooq Ahmed, College of Computer Science and
Computer Technology, King Faisal University, Al-Ahssa,
Kingdom of Saudi Arabia

More Related Content

PDF
best engineering journals
PPTX
Studying Mental Heath via social Media
PDF
Monitoring Clinical Depressive symptoms in Social Media(Invited talk at NLM)
PDF
IRJET- Intelligent Depression Detection System
PDF
DepXGBoot: Depression detection using a robust tuned extreme gradient boostin...
PDF
Depression prognosis using natural language processing and machine learning ...
PDF
Proactive depression detection from Facebook text and behavior data
PDF
Depression Detection Using Various Machine Learning Classifiers
best engineering journals
Studying Mental Heath via social Media
Monitoring Clinical Depressive symptoms in Social Media(Invited talk at NLM)
IRJET- Intelligent Depression Detection System
DepXGBoot: Depression detection using a robust tuned extreme gradient boostin...
Depression prognosis using natural language processing and machine learning ...
Proactive depression detection from Facebook text and behavior data
Depression Detection Using Various Machine Learning Classifiers

Similar to IRJET- Multilevel Predictor Model for Detecting Depressed Posts in Social Media (20)

PPTX
DEPRESSIVE-AND-NON-DEPRESSIVE-TWEETS-CLASSIFICATION-USING-MACHINE-LEARNING-MO...
PPT
An Investigation Into The Early Detection Of Psychosis And Depression In Indi...
PPTX
Understanding the Spread of Major Depressive Disorder in the Twitter Population
PDF
Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social...
PDF
Depression Screening in Humans With AI and Deep Learning Techniques.pdf
PPTX
Social Media Sentiment Analysis fro Depression detection Using Machine Learni...
PDF
Depression and anxiety detection through the Closed-Loop method using DASS-21
PDF
A_Hybrid_Deep_Learning_Model_to_Predict_the_Impact (1).pdf
PDF
Suicide Analysis and Prevention Application using Machine Learning Classifiers
PDF
Depression Detection in Tweets using Logistic Regression Model
DOCX
Detecting Mental Disorders in social Media through Emotional patterns-The cas...
PDF
Predicting depression using deep learning and ensemble algorithms on raw twit...
PDF
A comparative analysis of transfer learning models on suicide and non-suicide...
PPTX
Depression Analayis
DOCX
PSYC706-ASSIGNMENT2-CRITICAL Reflective Paper-Print copy
PDF
Subconscious Crowdsourcing: A Feasible Data Collection Mechanism for Mental D...
PPTX
Depression- Diagnosis, Causes, Treatments
PDF
Stress Sentimental Analysis Using Machine learning (Reddit): A Review
PDF
Quantifying the efficacy of ML models at predicting mental health illnesses
PDF
Understanding Depression
DEPRESSIVE-AND-NON-DEPRESSIVE-TWEETS-CLASSIFICATION-USING-MACHINE-LEARNING-MO...
An Investigation Into The Early Detection Of Psychosis And Depression In Indi...
Understanding the Spread of Major Depressive Disorder in the Twitter Population
Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social...
Depression Screening in Humans With AI and Deep Learning Techniques.pdf
Social Media Sentiment Analysis fro Depression detection Using Machine Learni...
Depression and anxiety detection through the Closed-Loop method using DASS-21
A_Hybrid_Deep_Learning_Model_to_Predict_the_Impact (1).pdf
Suicide Analysis and Prevention Application using Machine Learning Classifiers
Depression Detection in Tweets using Logistic Regression Model
Detecting Mental Disorders in social Media through Emotional patterns-The cas...
Predicting depression using deep learning and ensemble algorithms on raw twit...
A comparative analysis of transfer learning models on suicide and non-suicide...
Depression Analayis
PSYC706-ASSIGNMENT2-CRITICAL Reflective Paper-Print copy
Subconscious Crowdsourcing: A Feasible Data Collection Mechanism for Mental D...
Depression- Diagnosis, Causes, Treatments
Stress Sentimental Analysis Using Machine learning (Reddit): A Review
Quantifying the efficacy of ML models at predicting mental health illnesses
Understanding Depression
Ad

More from IRJET Journal (20)

PDF
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
PDF
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
PDF
Kiona – A Smart Society Automation Project
PDF
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
PDF
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
PDF
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
PDF
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
PDF
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
PDF
BRAIN TUMOUR DETECTION AND CLASSIFICATION
PDF
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
PDF
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
PDF
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
PDF
Breast Cancer Detection using Computer Vision
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Kiona – A Smart Society Automation Project
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
BRAIN TUMOUR DETECTION AND CLASSIFICATION
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
Breast Cancer Detection using Computer Vision
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Ad

Recently uploaded (20)

PPTX
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PDF
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
PPTX
web development for engineering and engineering
PPT
Project quality management in manufacturing
PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
DOCX
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
PDF
R24 SURVEYING LAB MANUAL for civil enggi
PPTX
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
PDF
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
PPT
CRASH COURSE IN ALTERNATIVE PLUMBING CLASS
PPTX
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
PDF
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
PPTX
bas. eng. economics group 4 presentation 1.pptx
PPTX
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
PDF
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
PPTX
additive manufacturing of ss316l using mig welding
PPTX
UNIT 4 Total Quality Management .pptx
PPTX
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
PDF
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
web development for engineering and engineering
Project quality management in manufacturing
Embodied AI: Ushering in the Next Era of Intelligent Systems
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
R24 SURVEYING LAB MANUAL for civil enggi
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
CRASH COURSE IN ALTERNATIVE PLUMBING CLASS
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
bas. eng. economics group 4 presentation 1.pptx
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
additive manufacturing of ss316l using mig welding
UNIT 4 Total Quality Management .pptx
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...

IRJET- Multilevel Predictor Model for Detecting Depressed Posts in Social Media

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 05 | May 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 283 Multilevel Predictor Model for Detecting Depressed Posts in Social Media Neha U1, Ayush Shrivastava1, Manish Bhande1 1Dept. of CSE, Dayananda Sagar College of Engineering, Bangalore 560078, India ---------------------------------------------------------------------***--------------------------------------------------------------------- Abstract - The society is now facing an unexpected growth in the matter of mental disorders, with an estimated 300 million people suffering from depression all over the world. People with high life satisfaction usually do not suffer mental health issues at a higher level. Generation of data on social network platforms enables us to detect hidden patterns in data which are useful in obtaining meaningful insights. This work aims to (a) explore the relationship between life satisfaction and depression in social network users, using Twitter as an example, and (b) develop a multilevel predictive model to detect posts which convey a sign of depression using positive, negative and neutral as the three values . We trained a set of predictive models on datasets obtained by using a dictionary which carries the values for emotions and a survey called Centre for Epidemiological Study Depression (CES-D) scale. The resulting multilevel model establishes a negative correlation between life satisfaction and depression, and it can also display the accuracy of a predictive model using confusion matrix. Key Words: Predictive model, Machine learning, Mental health, Depression, Life satisfaction, Social media, Confusion matrix. 1. INTRODUCTION Mental health is and continues to be a prominent invasion for the world. It is estimated that 300 million people are suffering from depression all over the world. . 
Depression is associated with being inactive and overthinking leading to major depressive disorder. Major Depressive Disorder is commonly referred to as clinical depression. The probability for an individual to encounter a major depressive episode within a duration of one year is 3 – 5% for males and 8 – 10% for females. Yet, these effects of depression reach further than simply societal well-being. Depression affects workdays, diminishing work habits, and potentially inciting complications with concentration, memory, and decision-making behaviors. This also stems further than simply the economic sphere, and often co-occurs with other illnesses and mental conditions. One in four cancer patients experience depression, one in three heart attack survivors undergo depression, and up to 75% of individuals diagnosed with an eating disorder will encounter the disease . Untreated depression increases the chance of dangerous behaviors. The significant challenge of detecting depression is the recognition that depressive symptoms may differ from patients’ behavior and personality While these factors may be seemingly random on their own, they often coalesce into an undue burden on an ailing patient, thus vastly degrading the quality of life for an individual and their peers. 1.1 MDD - Major Depressive Disorder The suicide rate exponentially increases as an individual suffering through one or more of these mental illnesses will likely experience a snowball effect towards others. Suicide itself is the cause leading to death. Clearly, depression has the potential to manifest itself within a cornucopia of other social issues, and therefore becomes a problem of high priority for our society to solve. Effort to gain anti-depressants, or otherwise mask one’s depression from a friend or family member. Furthermore, these questionnaires are often costly, and further it economically becomes a burden of receiving treatment for depression significantly. 
Yet before the flaws in current methods of treating depression can even be examined, a person must first be identified as having the illness: the World Health Organization reports that the vast majority of depressed individuals never seek out treatment. This is a particular problem for the younger generation, which commonly resorts to blame and stifled self-esteem before seeking any sort of help. Even during visits to a primary health care physician, depression often goes unrecognized, and therefore undiagnosed.

Yet when Major Depressive Disorder (MDD) is properly identified, contained, and treated, the benefits to society may be far-reaching. Up to 80% of those treated for depression showed an improvement in their symptoms within four to six weeks [10], bettering their lives and productivity and boosting their economic status. A study funded by the National Institute of Mental Health, known as the Sequenced Treatment Alternatives to Relieve Depression (STAR*D), was designed to determine the effectiveness of depression treatment; it reported depression remission rates of over 65 percent after about six months of treatment. Therefore, it has
become patently obvious that the most valuable contribution to combating clinical depression, in certain countries at least, would lie in improving techniques to identify Major Depressive Disorder rather than in its treatment methods.

1.2 Treatment Methods

The most popular depression questionnaires are the Center for Epidemiologic Studies Depression Scale (CES-D), the Beck Depression Inventory (BDI), and Zung's Self-Rating Depression Scale (SDS). Results on these examinations are determined from the patients themselves, or from third-party observation, but never from empirical data. Thus, these questionnaires often inherit the flaws of subjective human testing, and may be easily manipulated to achieve a pre-determined prognosis.

People increasingly use social network platforms to share their innermost thoughts and desires and to voice their opinions on social matters. Postings on these sites are made in a naturalistic manner, and therefore provide a solution to the manipulation that self-reported depression questionnaires often encounter. We have concluded that social media provides a means to capture an individual's present state of mind, and is even effective at representing the feelings of worthlessness, guilt, helplessness, and self-hatred that often characterize clinical depression. We pursue the hypothesis that social media, through word vectorization, may be used to construct statistical models that detect and even predict Major Depressive Disorder, and possibly even complement and extend traditional approaches to depression diagnosis.

2. METHODS

Twitter has become one of the most popular social media platforms since its launch; as of June 2016 it reports 313 million active users, who produce about 6,000 tweets every second.
To gather depression-related data, we continuously monitor every streaming tweet across the entire Twitter platform that includes the word "depression" [1]. In total, we have gathered a large number of tweets discussing fields relevant to depression. Starting by web scraping the initial webpage, we also accumulated thousands of tweets from mental health professionals. A dataset can be developed from an amalgamation of users with public Twitter accounts who posted a status update in the form of a statement of diagnosis, such as "I was diagnosed with X today", where X represents either depression or PTSD.

2.1 Data Collection

Create a Twitter developer account and briefly specify the intent of the work, namely sentiment analysis. Once the account is approved, you need four credentials from it: consumer_key = '', consumer_secret = '', access_token = '', and access_secret = ''. Create a file and insert the credentials so that you can download current tweets using keywords such as depression, anxiety, or sadness. When the data sets are ready, you may proceed to the preprocessing stage.

2.2 Data Preprocessing

Because the data collected from the tweets are biased and noisy, our first task is to make sure that the data is free of such noise. Special elements such as retweet tags ("@RT: xxx") and link addresses ("http://www.") generally carry little information, so they are removed first. In the next step, stop words and punctuation are removed using a stop word list obtained online. Non-words, for instance "hrt" and "lmao", are very common in social media data because of typos and acronyms. These words are filtered out with the NLTK toolkit available among the Python libraries. Essentially, the preprocessing stage runs over your data sets and the given dictionary.
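The cleaning steps described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name clean_tweet, the regular expressions, and the tiny STOP_WORDS set are assumptions made for the example (the paper uses a stop word list obtained online), and the NLTK-based filtering of non-words is not reproduced here.

```python
import re

# Hypothetical, deliberately tiny stop word list for illustration only;
# the paper uses a list obtained online (NLTK's stopwords corpus is one option).
STOP_WORDS = {"a", "an", "and", "the", "is", "i", "to", "of", "so", "it", "in"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # link addresses
MENTION_RE = re.compile(r"@\w+:?")              # @user and "@RT:" style tags
RETWEET_RE = re.compile(r"\bRT\b:?")            # bare retweet marker
NON_LETTER_RE = re.compile(r"[^a-z\s]")         # punctuation, digits, symbols


def clean_tweet(text):
    """Return the lowercase word tokens that remain after removing noise."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = RETWEET_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text.lower())
    # Whitespace tokenization, then stop word removal.
    return [word for word in text.split() if word not in STOP_WORDS]
```

Links and retweet tags are stripped before lowercasing so that the patterns can match the original casing; punctuation is replaced with spaces rather than deleted so that adjacent words are not fused together.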
The dictionary contains words with their corresponding polarity, which is used to calculate the sentiment of each tweet: each word is filtered, tokenized, and assigned its polarity. The score of a tweet is the sum of the polarities of its words divided by the number of words in that tweet. At this point the raw data has been cleaned. Table 1 shows the number of words left after each step of the data preprocessing procedure.

2.2.1 Word Frequency

The primary use of the standard online dictionary involves word frequency: the most basic approach to analyzing the data is to calculate the word frequency in the documents, that is, the text files. In traditional text mining research, frequent words are considered the important words in natural language processing. The collected data includes many common words that are semantically related to familiar depressive symptoms; for example, the words "anxiety" and "disorder" occur throughout the data set.

2.2.2 Data Analysis

Once preprocessing is done, you can find the output file in the directory. On opening it, you will find that the ID (Twitter user or tweet) and the sentiment of each tweet are separated into two columns. With this output you now have a Twitter data set and its corresponding sentiment (Positive, Neutral, or Negative), filtered by depression keywords and assigned as per the
dictionary that contains words with their respective polarity. Each word taken from the tweet is compared with the dictionary and given a score. The polarities are summed for each tweet: if the sum is above 0, the tweet is positive; if it is equal to 0, the tweet is neutral; and if it is less than 0, the tweet is negative. In this way tweets are classified as positive, neutral, or negative.

2.2.3 Training

For training and prediction we need to make sure that all the files are located in the proper folders. The analysis code runs through the output.xlsx file, which holds each ID and its associated sentiment, and at the same time recovers the tweet corresponding to the ID of each sentiment. We then feed this original data to our classifiers, which are defined in the next step.

2.2.4 Classifiers

We employ four different types of binary classifiers in order to estimate the likelihood of depression within users. For each classifier, we use Scikit-Learn to implement the learning algorithm. We chose to evaluate linear, non-linear, and tree-based approaches in order to explore foundational learning models against our dataset. Ultimately, we decided upon Decision Trees, a Linear Support Vector Classifier, as well as a Naïve Bayes algorithm. In this section, we attempt to explain how these algorithms work, as well as our implementation of them.

3. RESULTS

As we understand the working of the algorithms, a Positive result means that the person is unlikely to have depression or anxiety. Neutral is the middle level, where the user may or may not have depression but may also be more prone to being depressed.
At this stage the user may display some depression-like symptoms. Lastly, Negative is the lowest level, where depression and anxiety symptoms are detected in the user's tweets. The more negative words the user uses, the more negative emotion the tweet carries.

We also discuss the degree of accuracy with which the presence of active depression within a body of text may be ascertained from the analysis conducted. Classifiers were constructed by machine learning, as detailed above, for estimating the presence of signs of depression. We report the accuracy of each classifier. We also obtain the performance of each classifier in the form of a confusion matrix on a set of test data for which the true values are known. The confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one. The table shows the accuracy with which our constructed classifiers were able to discern the class of a small body of text. The classification accuracies are average values obtained on the vectorized tweets.

To obtain a result, you type in a sample tweet; the tweet goes through the list of classifiers, which predict its sentiment and report the result as positive, negative, or neutral. The algorithms were run on the test set and yielded an accuracy for predicting positive and negative tweets.
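The polarity rule from Section 2.2.2 and the confusion matrix described above can be sketched together in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the POLARITY dictionary is a hypothetical stand-in for the paper's dictionary, all function names are invented for the example, and the dictionary-based label is used here in place of the Scikit-Learn classifiers' predictions.

```python
from collections import Counter

# Hypothetical polarity dictionary; the dictionary used in the paper is not reproduced.
POLARITY = {"happy": 1.0, "love": 1.0, "good": 0.5,
            "sad": -1.0, "hopeless": -1.0, "tired": -0.5}

LABELS = ("positive", "neutral", "negative")


def tweet_score(tokens):
    """Mean polarity of a tokenized tweet; words not in the dictionary count as 0."""
    if not tokens:
        return 0.0
    return sum(POLARITY.get(token, 0.0) for token in tokens) / len(tokens)


def label_tweet(tokens):
    """Apply the sign rule: above 0 is positive, 0 is neutral, below 0 is negative."""
    score = tweet_score(tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true classes, columns are predicted classes, both in `labels` order."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(true, pred)] for pred in labels] for true in labels]


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0
```

For instance, label_tweet(["good", "but", "sad"]) returns "negative", because the mean polarity is (0.5 - 1.0) / 3. Dividing by the number of words does not change the sign of the score, so the averaged score and the plain sum always produce the same label.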

4. CONCLUSIONS

Depression has been a serious mental illness for decades, and it negatively affects human health. It is difficult to confirm a person's depression symptoms from their behavior via restricted clinical records. Our proposed methods and experiments illustrate that social networks provide rich and vast information for extracting depression symptoms from a distinctive perspective. We have demonstrated the potential of using Twitter as a tool for measuring and predicting major depressive disorder in individuals. First, we compiled a dataset. Next, we proposed a Bag of Words approach to quantifying this dataset, creating a high-dimensional feature space as our input vector. Finally, we leveraged these distinguishing attributes to build, compare, and contrast several statistical classifiers that may predict the likelihood of depression within an individual. Feeding the original data to our classifiers yielded the AUC of each classifier. We also enabled typing in a sample tweet, which is passed to the classifier with the highest AUC to predict its sentiment.

A Positive result means that the person is unlikely to have depression or anxiety. Neutral is the middle level, where the user may or may not have depression but may also be more prone to being depressed; at that stage the user may display some depression-like symptoms. Lastly, Negative is the lowest level, where depression and anxiety symptoms are detected in the user's tweets. The more negative words the user uses, the more negative emotion the tweet has. Through these results the depression level can be determined. In the future, we will collect other types of data, e.g.
image and video from other social networks. Additionally, an advanced entity selection technique would be used to select more accurate and meaningful depression symptoms.

ACKNOWLEDGEMENT

This research paper would be incomplete without the names of the people who have constantly helped us achieve this milestone. I take this opportunity to express my deepest gratitude and appreciation to all those who have given moral support and helped me, directly or indirectly, towards the successful completion of this research paper.

I would like to express my gratitude to Dr. C P S Prakash, Principal, Dayananda Sagar College of Engineering (DSCE), Bengaluru, for his warm support throughout the course. I am extremely thankful to Dr. Ramesh Babu D R, Vice Principal and Head of the Department of Computer Science and Engineering, DSCE, Bengaluru, for his motivation, encouragement, and support. For overall direction, I am very grateful to the 8th semester coordinator, Dr. Vindhya P Malagi, Assoc. Professor, Dept. of Computer Science and Engineering, DSCE, Bengaluru, who provided details relevant to this field and helped us achieve the best outcome for this paper. I am grateful to the Dept. of Computer Science and Engineering and to Dayananda Sagar College of Engineering.

My sincere thanks go to my guide, Prof. Prameetha Pai, for imparting the knowledge with which I could do my best. Finally, I would like to thank all the staff members of the Computer Science and Engineering Department, DSCE, and of other departments, who helped me directly or indirectly to complete this research on depression. I sincerely thank my mentors, parents, family, and friends, who have advised me and pushed me forward throughout this endeavour. This research paper would not have been possible without all of them.

REFERENCES

[1] Extracting Depression Symptoms from Social Networks and Web Blogs via Text Mining. Long Ma, Zhibo Wang, Yanqing Zhang. Department of Computer Science, Georgia State University, Atlanta, USA.
[2] Identifying Depression on Twitter. Moin Nadeem, Mike Horn, Glen Coppersmith, Johns Hopkins University, and Dr. Sandip Sen, PhD, University of Tulsa.
[3] Detecting Depression Using an Ensemble Logistic Regression Model Based on Multiple Speech Features. Haihua Jiang (1), Bin Hu (1), Zhenyu Liu (2), Gang Wang (3), Lan Zhang (4), Xiaoyu Li (2), and Huanyu Kang (2). (1) Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; (2) Gansu Provincial Key Laboratory of Wearable Computing, School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China; (3) Beijing Anding Hospital of Capital Medical University, Beijing 100088, China; (4) Lanzhou University Second Hospital, Lanzhou 730030, China.
[4] Predicting Depression Levels Using Social Media Posts.
Maryam Mohammed Aldarwish, King Saud University for Health Science, National Guard Health Affairs, Al-Ahssa, Kingdom of Saudi Arabia; Hafiz Farooq Ahmed, College of Computer Science and Computer Technology, King Faisal University, Al-Ahssa, Kingdom of Saudi Arabia.