Quora Duplicate Question Pair Detection Using Semantic Analysis
Guided by: Ms. Safa Hamdare
Group Members
Name              Roll No.
Jai Mulye         64
Anshul Pawaskar   87
Tannmay Redij     88
Akshata Talankar  89
St. Francis Institute of Technology
Department of Computer Engineering
Quora Duplicate Question Pair Detection using Semantic Analysis
1 28/05/2021
Content
● Introduction
● Literature
● Problem Statement
● Proposed Solution
● Work Flow of the system
● Algorithm with Implementation details
● Experimental Set Up
● Data Set
● Performance Evaluation Parameters
● Validation with Test Cases
● Results & Discussion
● Conclusion
● References
Introduction
• What is Quora?
Current Scenario:
Quora uses a Random Forest technique to identify duplicate
questions.
Let’s look at two hypothetical questions:
1. Is it true that time flies like an arrow?
2. Do fruit flies like a banana?
These questions share two words, "flies" and "like", yet they
mean entirely different things, so word overlap alone is a poor
signal of duplication.
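The word overlap between the two example questions can be checked with a short sketch. The tokenizer and the Jaccard measure here are illustrative assumptions, not the features Quora or this project actually uses.

```python
import re

def tokens(question: str) -> set:
    """Lowercase a question and return its set of distinct words."""
    return set(re.findall(r"[a-z']+", question.lower()))

def jaccard(q1: str, q2: str) -> float:
    """Jaccard overlap: shared words divided by all distinct words."""
    t1, t2 = tokens(q1), tokens(q2)
    union = t1 | t2
    return len(t1 & t2) / len(union) if union else 0.0

q1 = "Is it true that time flies like an arrow?"
q2 = "Do fruit flies like a banana?"

shared = tokens(q1) & tokens(q2)
print(sorted(shared))  # ['flies', 'like']
print(round(jaccard(q1, q2), 3))
```

The shared words are found, but nothing in the score reflects that "flies" is a verb in one question and a noun in the other, which is the gap a semantic approach is meant to close.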
Let’s consider these
Literature
• The paper [1] explores the Transformer-based
Universal Sentence Encoder, which relies on an
attention mechanism.
• The paper [2] introduces the Deep Averaging Network,
which performs competitively with neural networks
that model semantic and syntactic compositionality.
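The idea behind a Deep Averaging Network can be sketched in a few lines: average the word vectors of a sentence, then pass the average through feed-forward layers. This is a minimal sketch only; the word vectors, the layer weights, and the names `EMBEDDINGS`, `W1`, `B1` and `dan_encode` are made up for illustration and are not taken from the Universal Sentence Encoder.

```python
# Hypothetical 3-dimensional word vectors, made up for illustration only.
EMBEDDINGS = {
    "how": [0.1, 0.3, -0.2],
    "do": [0.0, 0.2, 0.1],
    "i": [0.4, -0.1, 0.0],
    "learn": [0.5, 0.5, 0.3],
    "python": [0.9, -0.4, 0.6],
}
DIM = 3

def average_embedding(words):
    """Composition step: average the vectors of all known words."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]

def dense_relu(x, weights, bias):
    """One feed-forward layer: ReLU(W x + b)."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]

# Hypothetical layer parameters; a trained network would learn these.
W1 = [[1.0, 0.0, 0.5], [0.0, 1.0, -1.0]]
B1 = [0.0, 0.1]

def dan_encode(sentence):
    """Average the word vectors, then apply the feed-forward layer."""
    words = sentence.lower().split()
    return dense_relu(average_embedding(words), W1, B1)

a = dan_encode("how do i learn python")
b = dan_encode("python learn i do how")
print(a)
print(b)
```

Because averaging discards word order, the two calls produce the same encoding up to floating-point rounding, which is the main trade-off of this design against attention-based encoders.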
Literature
• The paper [3] explores the two variants of the
Universal Sentence Encoder: the Transformer and
the Deep Averaging Network (DAN).
• The paper [4] analyses several neural network
designs and their variations for sentence pair
modelling and compares their performance
extensively across eight datasets, covering
paraphrase identification, semantic textual similarity,
natural language inference, and question answering
tasks.
Problem Statement
• On Quora, people often ask a question that already exists
on the platform, only worded differently. Identifying which
new questions are duplicates of existing ones reduces
redundancy on the platform, removes the manual work of
matching questions to their correct answers, and lets
existing answers be shown to the asker instantly.
• We build a model that predicts whether two entered
questions are similar in meaning, using a deep learning
approach based on the DAN and Transformer models.
Proposed Solution
1. Pre Processing
2. Sentence to Vector Conversion (USE)
3. Deep Learning Approach (DAN & Transformer)
Fig 1: Workflow of the System
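The three stages can be wired together roughly as follows. This is a sketch under stated assumptions: the slides do not list the exact cleaning rules or how the two sentence vectors are combined, so `preprocess`, `pair_features` and `detect_duplicate` are hypothetical helpers, and the `embed` and `classify` arguments stand in for the Universal Sentence Encoder and the trained network.

```python
import re

def preprocess(question: str) -> str:
    """Stage 1 (assumed cleaning): lowercase, drop punctuation, squeeze spaces."""
    text = question.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def pair_features(u, v):
    """Assumed pair representation: u, v, |u - v| and u * v concatenated."""
    diff = [abs(a - b) for a, b in zip(u, v)]
    prod = [a * b for a, b in zip(u, v)]
    return list(u) + list(v) + diff + prod

def detect_duplicate(q1, q2, embed, classify, threshold=0.5):
    """Run both questions through the stages and threshold the score.

    embed    -- callable mapping a cleaned sentence to a vector (stage 2)
    classify -- callable mapping pair features to a probability (stage 3)
    """
    u = embed(preprocess(q1))
    v = embed(preprocess(q2))
    return classify(pair_features(u, v)) >= threshold

# Stand-in components, only so the sketch runs end to end.
toy_embed = lambda sentence: [float(len(sentence)), 1.0]
always_duplicate = lambda features: 0.9

print(preprocess("  How do I learn Python?? "))
print(detect_duplicate("How do I learn Python?", "How to learn Python",
                       toy_embed, always_duplicate))
```

Passing the encoder and classifier in as callables keeps the sketch independent of any particular library, so either USE variant could be plugged into the same flow.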
Work Flow of the system
Fig 2: Architecture Diagram
Algorithm with Implementation Details
Fig 3: Algorithm
Algorithm with Implementation Details
Fig 4: Implementation
Experimental Setup
Fig 5: Dataset[5]
Experimental Setup
Fig 6: Model accuracy of Transformer
Fig 7: Model loss of Transformer
Experimental Setup
Fig 8: Model accuracy of DAN
Fig 9: Model loss of DAN
Validation with Test cases
Results and Discussions
Fig 10: Browse Questions
Results and Discussions
Fig 11: Post Questions
Results and Discussions
Fig 12: Results by DAN Model
Results and Discussions
Fig 13: Results by Transformer Model
Conclusion
Model                Embedding technique          F1-score (weighted average)  F1-score (macro average)
Logistic Regression  Word2Vec, Similarity scores  0.66                         0.62
Random Forest        Word2Vec, Similarity scores  0.70                         0.69
Table 1: Accuracy of machine learning models
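The two F1 columns differ only in how per-class scores are combined: the macro average weights every class equally, while the weighted average weights each class by its number of true examples. The sketch below computes both on a small made-up label set; the labels are illustrative and are not drawn from the project's data.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 for one class, from true positives, false positives, false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_and_weighted_f1(y_true, y_pred):
    """Return (macro F1, weighted F1) over the classes present in y_true."""
    labels = sorted(set(y_true))
    scores = {label: f1_for_class(y_true, y_pred, label) for label in labels}
    macro = sum(scores.values()) / len(labels)
    weighted = sum(scores[l] * y_true.count(l) for l in labels) / len(y_true)
    return macro, weighted

# Made-up labels: 0 = not duplicate, 1 = duplicate.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

macro, weighted = macro_and_weighted_f1(y_true, y_pred)
print(round(macro, 3), round(weighted, 3))
```

In this toy data the majority class is predicted better than the minority class, so the weighted average comes out above the macro average.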
Conclusion
Table 2: Accuracy of deep learning models (DAN & Transformer)
Model           Embedding technique                       Epochs  Training accuracy (%)  Validation accuracy (%)
Neural Network  Universal Sentence Encoder (DAN)          20      88.63                  86
Neural Network  Universal Sentence Encoder (Transformer)  20      89.16                  85
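One way to read Table 2 is to look at the gap between training and validation accuracy for each model. The short calculation below uses only the figures from the table; the dictionary layout and names are illustrative.

```python
# (training accuracy %, validation accuracy %) as reported in Table 2.
results = {
    "USE (DAN)": (88.63, 86.0),
    "USE (Transformer)": (89.16, 85.0),
}

# Gap in percentage points between training and validation accuracy.
gaps = {name: round(train - val, 2) for name, (train, val) in results.items()}

for name, gap in gaps.items():
    print(f"{name}: {gap} points")
```

The Transformer variant reaches slightly higher training accuracy, while the DAN variant shows the smaller gap and the higher validation accuracy.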
Conclusion
• Deep learning models using sentence-level
embeddings outperform the basic classification
models.
• The DAN model sometimes underperforms on
questions containing double negation.
• The Transformer-based Universal Sentence Encoder
can be used as an alternative.
References
[1] Mueller J, Thyagarajan A. Siamese recurrent architectures for learning sentence similarity. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (2016).
[2] Agirre E, Gonzalez-Agirre A, Lopez-Gazpio I, Maritxalar M, Rigau G, Uria L. SemEval-2016 task 2: Interpretable semantic textual similarity. In: Proceedings of the 10th International Workshop on Semantic Evaluation (2016).
[3] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I. Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998-6008 (2017).
[4] Cer D, Yang Y, Kong S-Y, et al. Universal Sentence Encoder for English. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (2018). doi: 10.18653/v1/d18-2029
[5] https://www.kaggle.com/c/quora-question-pairs/data
Thank you