«Tag-based Semantic Website Recommendation for Turkish Language»

Onur Yılmaz
mail@onuryilmaz.me
Outline

Introduction
Related Work
Problem Definition and Algorithm
Experimental Evaluation
Conclusion
Future Work
Demo
Introduction - Definitions


Tags: non-hierarchical keywords or terms

Reasons for tagging: categorizing, memorizing, archiving and sharing
Introduction - Motivation


Dramatic increase in the number of websites on the internet: 7.14 billion pages

Difficulty in finding and exploring new websites

Social bookmarking

Recommendation systems
Introduction – Turkish Effect


Recommendation systems search within user inputs


Users tend to use their own language on the internet



Turkey ranks 32nd among countries in English proficiency

Turkish and English are very different languages!
Introduction – What is proposed?


A tag-based recommendation system

For the Turkish language

Based on similarity, tag weight and tag popularity

Taking the semantic properties of tags into account
Related Work


Collaborative filtering: widely accepted, but uses no context

Topic and pattern extraction

Usage of WordNet: a lexical database for the English language

Two papers on a Turkish WordNet were found, but no source is available
Related Work


Similarity calculation methods

Durao & Dolog (2009): the reference paper

Uses tag popularity, tag representativeness and tag-user affinity

Achieved a 60 % acceptance level without any semantic analysis
Problem Definition


Take inputs

Websites and tags

Recommendation System
Problem Definition


Provide personal recommendations

Websites and tags

Recommendation System
Problem Definition


Aim -> User satisfaction

Recommend websites that the user
wants to use in the future, or
is already using and finds interesting
Problem Definition

Challenge -> Different tagging purposes and expectations
Website | Tag | Potential Purpose
zaytung.com | zaytung | Archiving
eksisozluk.com | alışkanlık (ENG: habit) | Internet usage habit
evekitap.com | ücretsiz kargo (ENG: free shipping) | Categorizing
9gag.com | eğlenceli (ENG: funny) | Definition
Data are taken from the experiment
Algorithm


Steps of the algorithm

Spell-check -> Stemming -> Semantics Analysis -> Similarity Calculation
Algorithm – Spell-Checking


Spell check on the tags

Candidate corrections are generated by:
adding a single letter,
deleting a single letter,
replacing one letter and
transposing two letters

Each candidate is then checked for occurrence in the Turkish National Corpus.
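
The four edit operations above can be sketched as a candidate generator in the style of well-known single-edit spell checkers. This is only a minimal illustration, not the implementation used in the experiment: the `corpus_counts` dictionary is a hypothetical stand-in for word frequencies from the Turkish National Corpus, and picking the most frequent candidate is an assumption (the slides only say that candidates are checked for occurrence).

```python
# Lowercase Turkish alphabet (29 letters), used for insertions and replacements.
TURKISH_ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz"


def edits1(word):
    """Return every string that is one add/delete/replace/transpose away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in TURKISH_ALPHABET]
    inserts = [left + c + right for left, right in splits for c in TURKISH_ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct(tag, corpus_counts):
    """Keep known tags; otherwise pick a known candidate one edit away.

    corpus_counts is a hypothetical {word: frequency} mapping.
    Choosing the most frequent candidate is an assumption of this sketch.
    """
    if tag in corpus_counts:
        return tag
    candidates = [w for w in edits1(tag) if w in corpus_counts]
    if not candidates:
        return tag  # nothing plausible found, leave the tag untouched
    return max(candidates, key=lambda w: (corpus_counts[w], w))


# Hypothetical miniature corpus for demonstration only.
corpus_counts = {"haber": 120, "futbol": 80, "eğlence": 40}
print(correct("habre", corpus_counts))
print(correct("futbl", corpus_counts))
```

Tags such as site names (for example "zaytung") have no close corpus match and pass through unchanged in this sketch.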
Algorithm – Spell-Checking


Correction on URLs
Original URL | Corrected URL
https://www.deviantart.com/ | deviantart.com
http://www.sahadan.com/Default.aspx | sahadan.com
http://www.yemeksepeti.com/AnonymouseDefault.aspx | yemeksepeti.com
Data are taken from the experiment
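
A URL correction of this kind could look roughly like the sketch below, which keeps only the host name and drops a leading "www.". The function name `normalize_url` is hypothetical, and reducing every URL to its bare host is an assumption inferred from the three examples in the table.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Reduce a URL to its host name without a leading 'www.' (assumed rule)."""
    if "://" not in url:
        url = "//" + url  # let urlsplit treat a bare domain as a network location
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


print(normalize_url("https://www.deviantart.com/"))
print(normalize_url("http://www.sahadan.com/Default.aspx"))
```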
Algorithm – Stemming


Stems of the tags are extracted by removing suffixes.
Website | Original Tag | Corrected Tag
facebook.com | arkadaşlık (ENG: friendship) | arkadaş (ENG: friend)
metu.edu.tr | mühendislik (ENG: engineering) | mühendis (ENG: engineer)
deviantart.com | eğlenceli (ENG: funny) | eğlence (ENG: fun)
Data are taken from the experiment
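
The slides do not name the stemmer that was used. As a rough illustration of suffix removal only, the sketch below strips a small, hand-picked set of derivational suffixes (the vowel-harmony variants of -lık and -lı); the suffix list, the minimum stem length and the function name are all assumptions, and a real Turkish stemmer handles far more cases.

```python
# Hypothetical, deliberately tiny suffix list: variants of -lık and -lı.
# Longer suffixes are tried first so that "-lik" wins over "-li".
SUFFIXES = sorted(("lık", "lik", "luk", "lük", "lı", "li", "lu", "lü"),
                  key=len, reverse=True)


def strip_suffix(tag, min_stem=3):
    """Remove at most one listed suffix, keeping at least min_stem letters."""
    for suffix in SUFFIXES:
        if tag.endswith(suffix) and len(tag) - len(suffix) >= min_stem:
            return tag[:-len(suffix)]
    return tag


for tag in ("arkadaşlık", "mühendislik", "eğlenceli", "futbol"):
    print(tag, "->", strip_suffix(tag))
```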
Algorithm – Semantics Analysis


An open-source «Turkish Thesaurus» project

125,022 <Word, Synonym> pairs
Algorithm – Semantics Analysis


Algorithm applied:
for each "tag" in ALL-DATA do:
    for each "synonym" of "tag" in SYNONYM-LIST do:
        if "synonym" occurs in ALL-DATA then:
            add <user, site, "synonym"> to ALL-DATA
Algorithm – Semantics Analysis
Original data (ALL-DATA)
User | Website | Tag
User1 | milliyet.com.tr | haber (ENG: news)
User2 | sabah.com.tr | gazete (ENG: newspaper)

Synonym List (SYNONYM-LIST)
Word | Synonym
haber (ENG: news) | gazete (ENG: newspaper)

Added data to ALL-DATA
User | Website | Tag
User1 | milliyet.com.tr | gazete (ENG: newspaper)
User2 | sabah.com.tr | haber (ENG: news)
Data are taken from the experiment
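
The synonym expansion from the pseudocode can be reproduced on the example rows above with a short sketch. The function name `expand_with_synonyms` is hypothetical, and treating each <word, synonym> pair as symmetric is an assumption made so that the sketch yields both added rows shown in the example.

```python
from collections import defaultdict


def expand_with_synonyms(all_data, synonym_pairs):
    """Return the <user, site, synonym> rows to add to ALL-DATA.

    all_data: list of (user, site, tag) tuples.
    synonym_pairs: list of (word, synonym) tuples, treated as symmetric
    here (an assumption of this sketch).
    """
    synonyms = defaultdict(set)
    for word, synonym in synonym_pairs:
        synonyms[word].add(synonym)
        synonyms[synonym].add(word)

    known_tags = {tag for _, _, tag in all_data}  # tags someone already used
    existing = set(all_data)
    added = []
    for user, site, tag in all_data:
        for synonym in sorted(synonyms.get(tag, ())):
            row = (user, site, synonym)
            # Only add synonyms that already occur as tags in ALL-DATA.
            if synonym in known_tags and row not in existing:
                existing.add(row)
                added.append(row)
    return added


all_data = [
    ("User1", "milliyet.com.tr", "haber"),
    ("User2", "sabah.com.tr", "gazete"),
]
synonym_pairs = [("haber", "gazete")]
added_rows = expand_with_synonyms(all_data, synonym_pairs)
print(added_rows)
```

Synonyms that nobody has used as a tag are skipped, which keeps the expansion limited to vocabulary already present in the data.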
Algorithm – Semantics Analysis



The result is an environment in which every user contributes both their own tags and the synonyms of those tags that other people have already used.
Algorithm – Similarity Calculation

$$\text{Website Rating} = \sum_{\text{all tags}} \text{Tag Popularity} \times \text{Tag Representativeness}$$

Tag Popularity: how often is this tag used?

Tag Representativeness: how much can a tag represent a document? The more a tag is used for a document, the more representative it is.
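
To make the rating concrete, the sketch below sums popularity times representativeness over the tags of one website. The slides do not give exact formulas for the two factors, so both definitions here are assumptions: popularity is taken as the tag's share of all tag assignments, and representativeness as the tag's share of the assignments made to that website. The function name and the sample data are hypothetical.

```python
from collections import Counter


def website_rating(site, assignments):
    """Sum of popularity * representativeness over the site's tags.

    assignments: list of (user, site, tag) tuples.
    Popularity = tag count / all assignments (assumed definition).
    Representativeness = tag count on this site / site assignments (assumed).
    """
    all_counts = Counter(tag for _, _, tag in assignments)
    site_counts = Counter(tag for _, s, tag in assignments if s == site)
    total = sum(all_counts.values())
    site_total = sum(site_counts.values())
    if site_total == 0:
        return 0.0
    return sum((all_counts[tag] / total) * (count / site_total)
               for tag, count in site_counts.items())


# Hypothetical assignments for illustration only.
assignments = [
    ("u1", "a.com", "haber"),
    ("u2", "a.com", "haber"),
    ("u1", "a.com", "spor"),
    ("u2", "b.com", "haber"),
]
print(website_rating("a.com", assignments))
print(website_rating("b.com", assignments))
```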
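Algorithm – Similarity Calculation

$$\text{Similarity}(a, b) = \bigl(\text{Document Score}(a) + \text{Document Score}(b)\bigr) \times \text{Cosine Similarity}(a, b)$$

Cosine Similarity: tags are treated as vectors, and the cosine similarity between the vectors is computed.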
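
A small sketch of this step, under stated assumptions: each website's tags are turned into a count vector, the cosine of the two vectors is computed, and the result is scaled by the two document scores. Summing the two scores before multiplying is one reading of the slide layout, not a confirmed formula, and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(tags_a, tags_b):
    """Cosine of the angle between two tag-count vectors."""
    vec_a, vec_b = Counter(tags_a), Counter(tags_b)
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity(score_a, score_b, tags_a, tags_b):
    """(score_a + score_b) * cosine similarity; the grouping is an assumption."""
    return (score_a + score_b) * cosine_similarity(tags_a, tags_b)


radikal = ["haber", "gündem", "güncel"]
mynet = ["portal", "genel", "haber"]
print(cosine_similarity(radikal, mynet))
print(similarity(0.5, 0.25, radikal, mynet))
```

The two tag lists share one of three tags, so the cosine is about one third; websites with no tags in common get a similarity of zero regardless of their scores.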
Experimental Evaluation

Call for participation -> Gather websites and tags -> Find recommendations -> Ask for evaluation

Call for participation: www.eksiduyuru.com
Gather websites and tags: 25 users, 122 websites, 366 tags (bit.ly/oneri-sistemi)
Ask for evaluation: 20 of 25 users (bit.ly/oneri-sistemi-degerlendirme)
Experimental Evaluation
Expected Results

Recommendation acceptance
50 %: Not acceptable
80 %: Excellent (not expected)
Experimental Evaluation
Results for top 5 recommendations

[Pie chart: accepted vs. rejected recommendations; slice labels 28% and 72%]
[Bar chart: Accepted recommendations by each user (5 Recommendations); x-axis: User (1 to 20), y-axis: accepted recommendations (0 to 5)]
Experimental Evaluation
Results for top 3 recommendations

[Pie chart: accepted vs. rejected recommendations; slice labels 22% and 78%]
[Bar chart: Accepted recommendations by each user (3 Recommendations); x-axis: User (1 to 20), y-axis: accepted recommendations (0 to 3)]
Conclusion


What is presented?

A Turkish-language tag-based recommendation system
Based on similarity, tag weight and tag popularity
Semantic properties of tags are taken into account
Conclusion


Main contribution

Combining well-known similarity measures and calculations with Turkish semantic analysis
Conclusion


Evaluation


An experiment with 25 people



Participants provide websites and tags



Then evaluate recommendations
Future Work


Pre-processing Stage

English inputs
Site | Tags
yandex.com | harita, e-mail, arama

Turkish inputs with English letters
Site | Tags
eksiduyuru.com | duyuru, alinik, satilik

Translate or check such inputs
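
One possible way to handle Turkish tags typed with English letters is to compare tags after folding the Turkish-specific letters to their ASCII look-alikes. The sketch below is only an illustration of that idea, not part of the presented system: the `vocabulary` set is a hypothetical stand-in for a Turkish word list, and the input is assumed to be lowercase.

```python
# Map Turkish-specific lowercase letters to their ASCII look-alikes.
FOLD = str.maketrans("çğıöşü", "cgiosu")


def fold(word):
    """Return the ASCII-folded form of a lowercase Turkish word."""
    return word.translate(FOLD)


def restore_turkish(tag, vocabulary):
    """List vocabulary words whose folded form matches the folded tag.

    vocabulary is a hypothetical set of correctly spelled Turkish words.
    """
    folded = fold(tag)
    return sorted(word for word in vocabulary if fold(word) == folded)


# Hypothetical vocabulary for demonstration only.
vocabulary = {"satılık", "duyuru", "alışveriş"}
print(restore_turkish("satilik", vocabulary))
print(restore_turkish("alisveris", vocabulary))
```

More than one vocabulary word can fold to the same ASCII string, so the function returns a list and leaves the final choice to a later step.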
Future Work


Semantic Analysis

The current synonym list is small: 125,022 <word, synonym> pairs

A larger and more comprehensive thesaurus is needed
Demo
2 users from experiment
Demo
XXX@candasdemir.com
Website | Tags
http://candasdemir.com | pazarlama, kişisel, blog
http://radikal.com.tr | haber, gündem, güncel
http://mynet.com | portal, genel, haber
http://sahibinden.com | alışveriş, market, sahibinden
http://markafoni.com | moda, e-ticaret, alışveriş
Demo
XXX@candasdemir.com
Added after semantics analysis (in addition to the tags above)
Website | Tag
mynet.com | bilgi
mynet.com | gazete
radikal.com.tr | bilgi
radikal.com.tr | gazete
Demo
XXX@candasdemir.com
User Inputs
Website
candasdemir.com
radikal.com.tr
sahibinden.com
markafoni.com
mynet.com

Recommended Websites
Website | User Satisfaction
eksisozluk.com | Accepted
zaytung.com | Accepted
sabah.com.tr | Accepted
ntvmsnbc.com | Accepted
golfdunyasi.com.tr | Not Accepted
Demo
demirkolXXX@hotmail.com
Website | Tags
http://www.sahadan.com/Default.aspx | eğlence, merak, futbol
http://www.erepublik.com/ | iletişim, strateji, oyun
http://www.1907unifeb.org/forums | fenerbahçe, sohbet, eğlence
http://ligtv.com.tr/ | maç özetleri, haber, futbol
https://www.tuttur.com/ | para, futbol, eğlence
Demo
demirkolXXX@hotmail.com
Added after semantics analysis (in addition to the tags above)
Website | Tag
ligtv.com.tr | bilgi
ligtv.com.tr | gazete
Demo
demirkolXXX@hotmail.com
User Inputs
Website
sahadan.com
erepublik.com
1907unifeb.org/forums
ligtv.com.tr
tuttur.com

Recommended Websites
Website | User Satisfaction
mackolik.com | Accepted
zaytung.com | Accepted
9gag.com | Accepted
dizi-mag.com | Accepted
galatasaray.com.tr | Not Accepted
References


Adrian, B., Sauermann, L., & Roth-Berghofer, T. (2007). ConTag: A Semantic Tag Recommendation System. Proceedings of I-SEMANTICS '07.

Aksan, Y. et al. (2012). Construction of the Turkish National Corpus (TNC). In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012). İstanbul, Türkiye. http://www.lrec-conf.org/proceedings/lrec2012/papers.html

Brill, E., & Moore, R. C. (2000). An Improved Error Model for Noisy Channel Spelling Correction. Microsoft Research.

Cattuto, C., Benz, D., Hotho, A., & Stumme, G. (2008). Semantic Grounding of Tag Relatedness in Social Bookmarking Systems. In The Semantic Web - ISWC 2008. Springer.

Durao, F., & Dolog, P. (2009). A Personalized Tag-based Recommendation in Social Web Systems. International Workshop on Adaptation and Personalization for Web 2.0.
References


Education First. (2012). EF EPI Country Rankings.

Frankfurt International School. (2001). The Differences Between English and Turkish.

ISPA (Investment Support and Promotion Agency) of Turkey. (2010). Turkish Information and Communication Technologies Industry. Deloitte.

Nakamoto, R., Nakajima, S., Miyazaki, J., & Uemura, S. (2007). Tag-based Contextual Collaborative Filtering. IAENG International Journal of Computer Science.

Özbek, A. (2012). Türkçe Eşanlamlı Kelimeler Sözlüğü Projesi (Turkish Thesaurus Project). Retrieved from http://github.com/maidis/mythes-tr
Thank you!
