A Deep Learning Approach
For Twitter Spam Detection
Lijie Zhou (lijie@mail.sfsu.edu) & Hao Yue
San Francisco State University
Outline
• Problem and Challenges
• Past Work
• Our Model and Results
• Conclusion
• Future Work
What Is Spam?
Spam on Facebook and Twitter
            # of active users   # of spam accounts   %
Facebook    2.2 billion         60-83 million        2.73%-3.77%
Twitter     330 million         23 million           6.97%
Source: https://www.statista.com/
Various Social Media Sites
Social Media’s Fundamental Design Flaw
• Sophisticated spam accounts know how to exploit various platform features to
cause the greatest harm:
• Use shortened URLs to trick users
• Buy compromised accounts to look legitimate
• Run campaigns to gain traction in a short period of time
• Use bots to amplify the noise
• Social media makes it easier and faster to spread spam.
Related Work
• Detection at the tweet level
• Focus on the content of tweets
• E.g., spam words? Overuse of hashtag, URL, mention, …?
• Detection at the account level
• Focus on the characteristics of spam accounts
• E.g., Age of the account? # of followers? # of followees? …
Challenges
• Large amount of unlabeled data
• Time and labor intensive
• Feature selection may cause model overfitting
• Twitter spam drift
• Spamming behavior changes over time, so the performance of existing
machine-learning-based classifiers degrades.
Research Questions
• Question 1: Can we find an unsupervised way to learn from the
unlabeled data and later apply what we have learned to labeled data?
• Will this approach outperform the hand-labeling process?
• Question 2: Can we find a more systematic way to reduce the feature
dimensions instead of relying on manual feature engineering?
Stage 1: Self-taught Learning From Unlabeled Data
Training Data w/o Label → One-to-N Encoding → Max-Min Normalization → Sparse Auto-encoder → Trained Parameter Set
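A minimal sketch of the Stage 1 preprocessing steps (one-to-N encoding and max-min normalization); the column names are hypothetical, since the deck does not list the actual 62 features:

```python
# Sketch of Stage 1 preprocessing: one-to-N (one-hot) encoding + max-min normalization.
import numpy as np
import pandas as pd

def preprocess(df, categorical_cols):
    """One-to-N encode categorical columns, then max-min normalize to [0, 1]."""
    df = pd.get_dummies(df, columns=categorical_cols)      # one-to-N encoding
    x = df.to_numpy(dtype=float)
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    denom = np.where(x_max > x_min, x_max - x_min, 1.0)    # avoid divide-by-zero
    return (x - x_min) / denom

# Example with made-up feature columns:
unlabeled = pd.DataFrame({"account_age_days": [10, 400, 35],
                          "num_followers": [3, 1200, 87],
                          "has_url": ["yes", "no", "yes"]})
X_unlabeled = preprocess(unlabeled, categorical_cols=["has_url"])
```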
Stage 2: Soft-max Classifier Training
Preprocessed Labeled Training Data → Sparse Auto-encoder → Soft-max Regression → Trained Parameter Set
Stage 3: Classification
Preprocessed Test Data → Sparse Auto-encoder → Soft-max Regression → Spam / Non-Spam
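A minimal sketch of the Stage 3 classification pass, assuming encoder parameters (W1, b1) and soft-max parameters (theta, bias) learned in Stages 1-2; the names and shapes are illustrative, not the authors' implementation:

```python
# Sketch of Stage 3: encode preprocessed test data, then apply the soft-max classifier.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def classify(X_test, W1, b1, theta, bias):
    hidden = sigmoid(X_test @ W1.T + b1)          # encoder hidden representation
    scores = hidden @ theta.T + bias              # soft-max logits (2 classes)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return np.where(probs[:, 1] > 0.5, "spam", "non-spam")   # assumes class 1 = spam
```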
Self-taught Learning
• Assumption:
• A single unlabeled record is not very informative
• A large amount of unlabeled records may reveal certain patterns
• Goal:
• Find an effective model to reveal this pattern (if it exists)
• Choose the sparse auto-encoder for its good performance and simplicity
Auto-encoder
• A special neural network whose output is (almost) identical to its input
• A compression tool
• The hidden layer is considered the compressed representation of the input
Auto-encoder
• Model parameters: $(W, b) = (W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)})$
• Activations (with activation function $f$):
$a_1^{(2)} = f(W_{11}^{(1)} x_1 + W_{12}^{(1)} x_2 + W_{13}^{(1)} x_3 + b_1^{(1)})$
$a_2^{(2)} = f(W_{21}^{(1)} x_1 + W_{22}^{(1)} x_2 + W_{23}^{(1)} x_3 + b_2^{(1)})$
$a_3^{(2)} = f(W_{31}^{(1)} x_1 + W_{32}^{(1)} x_2 + W_{33}^{(1)} x_3 + b_3^{(1)})$
• Hypothesis $h_{W,b}(x)$:
$h_{W,b}(x) = a_1^{(3)} = f(W_{11}^{(2)} a_1^{(2)} + W_{12}^{(2)} a_2^{(2)} + W_{13}^{(2)} a_3^{(2)} + b_1^{(2)}) \approx x$
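A minimal numpy sketch of this forward pass with 3 inputs and 3 hidden units; the sizes, random initialization, and the sigmoid choice of $f$ are illustrative assumptions:

```python
# Sketch of the auto-encoder forward pass: input -> hidden a^(2) -> reconstruction h(x).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 3
W1, b1 = rng.normal(scale=0.1, size=(n_hidden, n_in)), np.zeros(n_hidden)
W2, b2 = rng.normal(scale=0.1, size=(n_in, n_hidden)), np.zeros(n_in)

x = np.array([0.2, 0.7, 0.1])
a2 = sigmoid(W1 @ x + b1)        # hidden activations a^(2)
h  = sigmoid(W2 @ a2 + b2)       # reconstruction h_{W,b}(x) ≈ x
```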
Sparse Auto-encoder
• Sparsity parameter
• Definition: a constraint imposed on the hidden layer
• Goal: ensure the pattern is revealed even if the hidden layer is large
• Average activation: $\hat{\rho}_j = \frac{1}{m}\sum_{i=1}^{m} a_j^{(2)}(x^{(i)})$
• Penalty term
• Enforce $\hat{\rho}_j = \rho$ (with $\rho = 0.05$)
• Kullback-Leibler (KL) divergence: $\sum_{j=1}^{K} KL(\rho \,\|\, \hat{\rho}_j) = \sum_{j=1}^{K} \Big[ \rho \log\frac{\rho}{\hat{\rho}_j} + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j} \Big]$
• $\sum_{j=1}^{K} KL(\rho \,\|\, \hat{\rho}_j) = 0$ if and only if $\hat{\rho}_j = \rho$ for every hidden unit $j$
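A short sketch of the sparsity penalty under these definitions; A2 is assumed to be the m-by-K matrix of hidden activations over the training batch:

```python
# Sketch of the sparsity penalty: average activation per hidden unit and the
# KL divergence from the target sparsity rho = 0.05.
import numpy as np

def kl_penalty(A2, rho=0.05, eps=1e-8):
    """A2: (m, K) matrix of hidden activations a_j^(2)(x^(i)) over m examples."""
    rho_hat = A2.mean(axis=0)                     # average activation per hidden unit
    rho_hat = np.clip(rho_hat, eps, 1 - eps)      # keep log() finite
    kl = rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))
    return kl.sum()                               # sum over the K hidden units
```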
Cost Function
$J(W,b) = \frac{1}{m}\sum_{i=1}^{m} \lVert x_i - \hat{x}_i \rVert^2 + \frac{\lambda}{2}\Big(\sum_{k,n} W^2 + \sum_{n,k} V^2 + \sum_{k} b_1^2 + \sum_{n} b_2^2\Big) + \beta \sum_{j=1}^{K} KL(\rho \,\|\, \hat{\rho}_j)$
• First term: average sum-of-squares (reconstruction) error
• Second term: weight decay
• Third term: sparsity penalty
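A sketch of the full cost under the formula above, assuming sigmoid() and kl_penalty() from the previous sketches are in scope; lam and beta are hyperparameters not specified in the deck:

```python
# Sketch of the cost: reconstruction error + weight decay + sparsity penalty.
import numpy as np

def cost(X, W1, b1, W2, b2, lam=1e-3, beta=3.0, rho=0.05):
    m = X.shape[0]
    A2 = sigmoid(X @ W1.T + b1)                       # hidden activations
    X_hat = sigmoid(A2 @ W2.T + b2)                   # reconstructions
    recon = np.sum((X - X_hat) ** 2) / m              # average sum-of-squares error
    decay = lam / 2 * (np.sum(W1 ** 2) + np.sum(W2 ** 2)
                       + np.sum(b1 ** 2) + np.sum(b2 ** 2))
    return recon + decay + beta * kl_penalty(A2, rho) # add sparsity penalty
```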
Cost Function
• Goal: minimize J(W, b) as a function of W and b
• Steps
• Initialization
• Update parameters with gradient descent
$W_{ij}^{(l)} := W_{ij}^{(l)} - \alpha \frac{\partial}{\partial W_{ij}^{(l)}} J(W,b)$
$b_i^{(l)} := b_i^{(l)} - \alpha \frac{\partial}{\partial b_i^{(l)}} J(W,b)$
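A one-step sketch of these updates; grad_W and grad_b would come from back-propagation (next slide), and alpha is the learning rate:

```python
# Sketch of one gradient-descent step on (W, b).
def gd_step(W, b, grad_W, grad_b, alpha=0.01):
    W = W - alpha * grad_W
    b = b - alpha * grad_b
    return W, b
```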
Back-propagation
• $\delta_i^{(n_l)}$ is the "error term": how much node $i$ is "responsible" for any error in the output
Back-propagation
1. Perform a feedforward pass, computing the activations for layers $L_2, L_3, \dots$ up to the output layer $L_{n_l}$.
2. For each output unit $i$ in layer $n_l$ (the output layer), set
$\delta_i^{(n_l)} = -(y_i - a_i^{(n_l)}) \, f'(z_i^{(n_l)})$
3. For $l = n_l - 1, n_l - 2, \dots, 2$: for each node $i$ in layer $l$, set
$\delta_i^{(l)} = \Big( \sum_{j=1}^{s_{l+1}} W_{ji}^{(l)} \, \delta_j^{(l+1)} \Big) f'(z_i^{(l)})$
4. Compute the partial derivatives:
$\frac{\partial}{\partial W_{ij}^{(l)}} J(W,b;x,y) = a_j^{(l)} \, \delta_i^{(l+1)}$
$\frac{\partial}{\partial b_i^{(l)}} J(W,b;x,y) = \delta_i^{(l+1)}$
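A minimal sketch of these steps for the single-hidden-layer auto-encoder, assuming sigmoid activations (so $f'(z) = f(z)(1 - f(z))$) and reusing sigmoid() from the earlier sketch; the sparsity term's contribution to the hidden-layer error is omitted for brevity:

```python
# Sketch of back-propagation for one example x through the auto-encoder.
import numpy as np

def backprop(x, W1, b1, W2, b2):
    a2 = sigmoid(W1 @ x + b1)                 # feedforward: hidden activations
    a3 = sigmoid(W2 @ a2 + b2)                # feedforward: reconstruction
    delta3 = -(x - a3) * a3 * (1 - a3)        # output error term delta^(n_l)
    delta2 = (W2.T @ delta3) * a2 * (1 - a2)  # propagate error to the hidden layer
    grad_W2 = np.outer(delta3, a2)            # dJ/dW^(2) = delta^(3) * a^(2)
    grad_b2 = delta3
    grad_W1 = np.outer(delta2, x)             # dJ/dW^(1) = delta^(2) * x
    grad_b1 = delta2
    return grad_W1, grad_b1, grad_W2, grad_b2
```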
Fine-tuning
Preprocessed Labeled Training Data → Sparse Auto-encoder → Soft-max Regression → Trained Parameter Set
• Fine-tuning: after the soft-max classifier is trained, the classification error is back-propagated through both the soft-max layer and the encoder, so all parameters are updated together using the labeled data.
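A sketch of the Stage 2 soft-max training on encoder features (binary spam / non-spam); full fine-tuning would additionally back-propagate these gradients into the encoder weights, which is omitted here. Shapes and hyperparameters are assumptions:

```python
# Sketch of soft-max regression on hidden features from the sparse auto-encoder.
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)      # numerical stability
    e = np.exp(Z)
    return e / e.sum(axis=1, keepdims=True)

def train_softmax(H, y, n_classes=2, alpha=0.1, epochs=200):
    """H: (m, K) hidden features; y: integer labels in {0, 1}."""
    m, K = H.shape
    theta = np.zeros((n_classes, K))
    Y = np.eye(n_classes)[y]                  # one-hot labels
    for _ in range(epochs):
        P = softmax(H @ theta.T)              # class probabilities
        grad = (P - Y).T @ H / m              # cross-entropy gradient
        theta -= alpha * grad                 # gradient-descent update
    return theta
```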
Dataset
• 1065 instances; each instance has 62 features.
• Split 1065 instances into three groups:
• Training w/o label – 600 instances
• Training w label – 365 instances
• Test w label - 100 instances
• Comparison group: SVM, Naïve Bayes, and Random Forests
• Training w label – 365 instances
• Test w label – 100 instances
Evaluation
• True Positive (TP): actual spammer, prediction spammer.
• True Negative (TN): actual non-spammer, prediction non-spammer.
• False Positive (FP): actual non-spammer, prediction spammer.
• False Negative (FN): actual spammer, prediction non-spammer.
Evaluation
Accuracy: the correctly classified instances over the total number of test instances.
Precision: P = TP / (TP + FP) × 100%
Recall: R = TP / (TP + FN) × 100%
F-Measure: F = 2 × P × R / (P + R)
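A small sketch computing these metrics from a confusion matrix; the example numbers are the SAE row of the comparison tables below:

```python
# Sketch of the evaluation metrics from TP/TN/FP/FN counts.
def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure

print(metrics(tp=34, tn=52, fp=3, fn=11))   # ≈ (0.86, 0.919, 0.756, 0.830)
```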
Results
Rows: size of hidden layer 1 (Hidden L1); columns: size of hidden layer 2 (Hidden L2).

Hidden L1 \ Hidden L2    15    20    25    30    35    40    45    50    55    Avg
55                       86%   88%   85%   84%   87%   85%   83%   86%   86%   86%
50                       84%   84%   86%   88%   86%   89%   87%   86%   88%   86%
45                       85%   88%   87%   86%   85%   84%   88%   86%   86%   86%
40                       88%   87%   85%   85%   85%   87%   87%   86%   89%   87%
35                       87%   88%   87%   86%   87%   86%   86%   85%   86%   86%
30                       85%   86%   89%   85%   85%   84%   83%   87%   88%   86%
25                       87%   87%   88%   87%   85%   88%   85%   87%   88%   87%
20                       84%   88%   83%   88%   86%   85%   88%   87%   86%   86%
15                       83%   83%   83%   87%   85%   82%   85%   86%   85%   84%
Avg                      85%   87%   86%   86%   86%   86%   86%   86%   87%
Results – Comparison with SVM
         TP   TN   FP   FN   A     P      R      F
SAE      34   52   3    11   86%   91.9%  75.6%  83.0%
Top 5    28   52   2    18   80%   93.3%  60.9%  73.7%
Top 10   27   52   3    18   79%   90.0%  60.0%  72.0%
Top 20   28   52   3    17   80%   90.3%  62.2%  73.7%
Top 30   29   52   3    16   81%   90.6%  64.4%  75.3%
Results – Comparison with Random Forests & Naïve Bayes
                TP   TN   FP   FN   A     P      R      F
SAE             34   52   3    11   86%   91.9%  75.6%  83.0%
Random Forest   32   52   3    13   84%   91.0%  71.0%  80.0%
Naïve Bayes     33   50   5    12   83%   86.8%  73.0%  79.5%
Conclusion
• Self-taught learning: leverages a large amount of unlabeled data plus a small amount of labeled data
• Sparse auto-encoder: reduces the feature dimensions
• Fine-tuning: improves the deep learning model to a large extent
Limitation & Future Work
• The dataset we use is relatively small.
• We are still exploring new ways to apply this model to raw data.
A Deep Learning Approach
For Twitter Spam Detection
Lijie Zhou (lijie@mail.sfsu.edu) and Hao Yue
San Francisco State University
Editor's Notes
• #19 (Back-propagation): The key is to compute the partial derivatives.
• #21 (Fine-tuning): We conducted an experiment on this implementation, but the result was not as expected.