Temporal Semantic Techniques for
Text Analysis and Applications
FEDELUCIO NARDUCCI, PHD - UNIVERSITY OF BARI ALDO MORO, ITALY
SEMANTIC WEB ACCESS & PERSONALIZATION (SWAP) RESEARCH GROUP
Research Workshop of the Israel Science Foundation on
User Modeling and Recommender Systems
Haifa, July 17, 2017
Credits: Pierpaolo Basile, PhD
INTRODUCTION
➤ Basic concepts
➤ Techniques for Temporal Analysis of Natural Language
➤ T-Recs: Temporal Analysis of Conference Proceedings
➤ Time-aware Recommender Systems
➤ Future applications
2
BASIC CONCEPTS
➤Natural language…
➤ language spoken by people, e.g. English,
Japanese, Hebrew, Arabic, Italian…
➤…processing
➤ applications that deal with natural
language in one way or another
➤NLP Applications
➤ classify text into categories
➤ index and search text
➤ machine translation
➤ …
3
PROBLEM
➤ Systems that deal with textual content should
take into account that language changes over time
meat: "any kind of food" (archaic) -> "flesh of mammals" (current)
4
POSSIBLE SOLUTION
➤Synchronic Analysis
➤ the language is described by rules at a
specific point in time, without taking its
history into account
➤Diachronic Analysis
➤ the language is described through its
evolution over time
5
LINGUISTICS
➤Computational Linguistics
➤ Doing linguistics on computers. More on the linguistic side
than NLP, but closely related (interdisciplinary field)
➤Diachronic Linguistics
➤ The scientific study of how language changes over time;
also called Historical Linguistics
6
DIACHRONIC ANALYSIS: WHY?
➤ Observe changes in particular languages
➤ Reconstruct the pre-history of languages
➤ Develop general theories about how and why a language
changes
➤ Describe the history of speech communities
➤ Etymology
7
TOOLS
➤Google n-gram viewer
➤Temporal Random Indexing
➤Explicit Semantic Analysis
8
GOOGLE N-GRAM ANALYZER
9
GOOGLE N-GRAM VIEWER
➤Search and visualize n-gram statistics from
Google Books
➤N-gram: sequence of n words
➤Google Books digitizes millions of books
10
N-GRAM
“Google Books digitizes millions of books”
➤1-gram
➤ Google, Books, digitizes, millions, of, books
➤2-gram
➤ Google Books, Books digitizes, digitizes millions,
millions of, of books
➤3-gram
➤ Google Books digitizes, Books digitizes millions,
digitizes millions of, millions of books
11
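A minimal sketch of n-gram extraction in plain Python (illustrative only, not the Google tooling):

# Extract the n-grams (as tuples) of a tokenized sentence.
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = "Google Books digitizes millions of books"
tokens = sentence.split()
for n in (1, 2, 3):
    print(n, [" ".join(g) for g in ngrams(tokens, n)])
# n=1 -> Google, Books, digitizes, ...   n=2 -> Google Books, Books digitizes, ...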
GOOGLE N-GRAM VIEWER
https://books.google.com/ngrams
12
GRAMMAR EVOLUTION
to burn is evolving from an irregular form to a regular form (n-gram plot of "burned" vs. "burnt" over time)
13
MEANING EVOLUTION
invention of the telephone (plot annotation)
"mail" is now also a synonym of "email"
"communication" now appears more frequently combined with "mail" than with "telephone"
14
TEMPORAL RANDOM INDEXING
P. Basile, A. Caputo, G. Semeraro. Temporal random indexing: A system for analysing word meaning over time. IJCoL vol. 1: Emerging Topics at the First Italian
Conference on Computational Linguistics, Accademia University Press.
15
DISTRIBUTIONAL SEMANTICS
Meaning of a word
is determined by its
usage
Ludwig Wittgenstein 16
DISTRIBUTIONAL SEMANTICS
wine beer
dog cat
17
DISTRIBUTIONAL SEMANTICS
extract co-occurrences
‘Yes, you may still call me’
Yes -> [you, may]
you -> [Yes, may, still]
may -> [Yes, you, still, call]
still -> [you, may, call, me]
call -> [may, still, me, …]
me -> [still, call, …]
co-occurrences (max distance: 3)
18
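A minimal sketch of the extraction step (plain Python; the window size is a parameter, and a symmetric window of two words per side reproduces the lists above):

from collections import defaultdict

# Map each token to the tokens that co-occur with it within a fixed window.
def cooccurrences(tokens, window=2):
    ctx = defaultdict(list)
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        ctx[w].extend(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return ctx

for word, context in cooccurrences("Yes you may still call me".split()).items():
    print(word, "->", context)   # Yes -> [you, may], you -> [Yes, may, still], ...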
DISTRIBUTIONAL SEMANTICS
count co-occurrences
dog cat bread pasta meat mouse
dog 40 7 1 0 1 5
cat 7 32 0 1 0 8
bread 1 0 22 15 8 0
pasta 0 1 15 24 10 1
meat 1 0 8 10 30 2
mouse 5 8 0 1 2 31
19
DISTRIBUTIONAL SEMANTICS
word similarity
dog cat bread pasta meat mouse
dog 40 7 1 0 1 5
cat 7 32 0 1 0 8
bread 1 0 22 15 8 0
pasta 0 1 15 24 10 1
meat 1 0 8 10 30 2
mouse 5 8 0 1 2 31
20
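Word similarity is then the cosine between two rows of the co-occurrence matrix; a sketch using the toy counts above:

import math

rows = {
    "dog":   [40, 7, 1, 0, 1, 5],
    "cat":   [7, 32, 0, 1, 0, 8],
    "bread": [1, 0, 22, 15, 8, 0],
}

# Cosine similarity between two count vectors.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

print(cosine(rows["dog"], rows["cat"]))    # ~0.39: dog and cat share contexts
print(cosine(rows["dog"], rows["bread"]))  # ~0.06: dog and bread barely do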
DISTRIBUTIONAL SEMANTICS
geometric space: dog, pasta, mouse, cat plotted as points
23
DISTRIBUTIONAL SEMANTICS
geometric space: cat and mouse are close in the space
24
DISTRIBUTIONAL SEMANTICS
➤ WordSpace
➤ a snapshot of a specific corpus
➤ It does not take into account
temporal information
25
RANDOM INDEXING
➤Incremental, scalable and effective technique
for dimensionality reduction
➤RI belongs to the class of distributional models
➤The meaning of a word can be inferred by
analyzing its use (distribution) in a large corpus
➤Words that occur in the same context tend to have
the same meaning
26
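A minimal Random Indexing sketch (plain Python, toy dimensionality): each word gets a fixed sparse ternary index vector, and a word's semantic vector is the sum of the index vectors of the words it co-occurs with.

import random
from collections import defaultdict

DIM, NON_ZERO = 300, 10   # toy dimensionality and number of non-zero components

# Sparse ternary random vector: a few +1/-1 entries, the rest zeros.
def index_vector(rng):
    v = [0.0] * DIM
    for pos in rng.sample(range(DIM), NON_ZERO):
        v[pos] = rng.choice((-1.0, 1.0))
    return v

# Accumulate, for each word, the index vectors of its context words.
def build_space(sentences, index, window=2):
    space = defaultdict(lambda: [0.0] * DIM)
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    space[w] = [a + b for a, b in zip(space[w], index[tokens[j]])]
    return space

rng = random.Random(42)
index = defaultdict(lambda: index_vector(rng))   # one fixed random vector per word
space = build_space([["the", "dog", "chases", "the", "cat"]], index)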
TEMPORAL RANDOM INDEXING
For each time period, Random Indexing builds a separate WordSpace from that period's corpus:
Corpus_1900 -> RI -> Space_1
Corpus_1920 -> RI -> Space_2
Corpus_1930 -> RI -> Space_3
Corpus_1940 -> RI -> Space_4
27
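Continuing the sketch above: in the temporal setting one space is built per period, and the random index vectors are kept fixed across periods so that vectors from different spaces remain comparable (the corpora here are toy placeholders).

# One WordSpace per period, all built on the same shared index vectors.
corpora_by_period = {
    "1860": [["the", "call", "of", "the", "wild"]],
    "1900": [["a", "phone", "call", "on", "the", "telephone"]],
}
spaces = {period: build_space(corpus, index) for period, corpus in corpora_by_period.items()}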
MEANING EVOLUTION
the vector of "call" tracked across the word spaces built for 1860, 1870, and 1900; "phone" appears alongside it in the two later spaces
28
MEANING EVOLUTION: CHANGING POINT
➤ Track how the meaning of a word changes over
time
➤ Build a time series by taking into account the
semantic shift of each word
➤ Find significant changes: mean shift model (see the sketch below)
Example for "call" (vector similarity in different Word Spaces):
1860: 0.25, 1870: 0.3, 1900: 0.7, 1910: 0.8, 1920: 0.75 (change point: 1900)
29
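A simplified change-point sketch over the toy series for "call" (a stand-in for the mean shift model, shown only to illustrate the idea): pick the year where the difference between the mean of the series after it and before it is largest.

series = {1860: 0.25, 1870: 0.30, 1900: 0.70, 1910: 0.80, 1920: 0.75}
years = sorted(series)
values = [series[y] for y in years]

def mean(xs):
    return sum(xs) / len(xs)

# Shift at candidate year k = mean of the values from k onward minus mean of the values before k.
shifts = {years[k]: mean(values[k:]) - mean(values[:k]) for k in range(1, len(values))}
change_point = max(shifts, key=shifts.get)
print(change_point, round(shifts[change_point], 3))   # 1900 0.475: the series jumps there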
EXPLICIT SEMANTIC ANALYSIS
30
WIKIPEDIA AS KNOWLEDGE SOURCE
➤ Exploit Wikipedia as a knowledge source in
different Artificial Intelligence tasks
➤ Semantic Relatedness computation between
texts
➤ Word Sense and Named Entity
disambiguation
➤ Information Retrieval
➤ Clustering
➤ Text Classification
31
EXPLICIT SEMANTIC ANALYSIS
➤ Methodology for representing the knowledge
stored in Wikipedia through an inverted index
➤ ESA defines a relationship between terms in a
vocabulary and Wikipedia articles
➤ ESA has been effectively used for semantic
relatedness computation, text categorization,
and information retrieval
Evgeniy Gabrilovich, Shaul Markovitch: Wikipedia-based Semantic Interpretation for Natural Language Processing. 
J. Artif. Intell. Res. 34: 443-498 (2009)
32
EXPLICIT SEMANTIC ANALYSIS
ESA     concept 1  concept 2  …  concept k
term 1  TF-IDF     TF-IDF     …  TF-IDF
term 2  TF-IDF     TF-IDF     …  TF-IDF
…
term n  TF-IDF     TF-IDF     …  TF-IDF
(rows: terms occurring in Wikipedia articles; columns: Wikipedia articles, i.e. concepts)
➤ The semantic relatedness between a word and a
Wikipedia concept (article) is expressed by its TF-IDF weight
33
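A sketch of ESA as an inverted index (scikit-learn assumed; the three articles are toy stand-ins for Wikipedia): the term-by-concept matrix is the transpose of a TF-IDF document-term matrix built over the articles.

from sklearn.feature_extraction.text import TfidfVectorizer

articles = {   # hypothetical stand-ins for Wikipedia articles (concepts)
    "Cat": "the cat is a small domesticated carnivorous mammal",
    "Panthera": "panthera is a genus of large wild cats including the lion tiger and leopard",
    "Jane Fonda": "jane fonda is an american actress and activist",
}

vectorizer = TfidfVectorizer()
doc_term = vectorizer.fit_transform(articles.values())   # concepts x terms
esa = doc_term.T.tocsr()                                  # terms x concepts
terms = list(vectorizer.get_feature_names_out())

# Semantic interpretation vector of a term = its row: one TF-IDF weight per concept.
row = esa[terms.index("cat")].toarray().ravel()
print(dict(zip(articles.keys(), row)))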
EXPLICIT SEMANTIC ANALYSIS
➤ Every Wikipedia article represents a concept
Panthera: Cat [0.92], Leopard [0.84], Roar [0.77]
34
EXPLICIT SEMANTIC ANALYSIS
➤ The semantics of a word is the vector of its
associations with Wikipedia concepts
cat -> Cat [0.95], Panthera [0.92], Jane Fonda [0.07]
(semantic interpretation vector)
35
EXPLICIT SEMANTIC ANALYSIS
➤ The semantic interpretation vector of a text fragment
is the centroid vector of the terms occurring in the fragment
➤ A term or a text fragment can be represented in terms of
its most related Wikipedia concepts, e.g.:
button -> Button [0.93], Dick Button [0.84], Mouse (computing) [0.81], Game Controller [0.32]
mouse -> Mouse (rodent) [0.91], Mouse (computing) [0.89], Mickey Mouse [0.81], John Steinbeck [0.17]
mouse button -> Mouse (computing) [0.85], Mouse (rodent) [0.46], IBM PS/2 [0.35], Drag-and-drop [0.32]
(centroid vector)
36
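A sketch of the centroid computation, using the toy weights shown above (the per-term vectors are truncated, so only some entries can be checked against the slide):

from collections import defaultdict

button = {"Button": 0.93, "Dick Button": 0.84, "Mouse (computing)": 0.81, "Game Controller": 0.32}
mouse  = {"Mouse (rodent)": 0.91, "Mouse (computing)": 0.89, "Mickey Mouse": 0.81, "John Steinbeck": 0.17}

# Centroid of sparse concept vectors: average the weights concept by concept.
def centroid(vectors):
    acc = defaultdict(float)
    for vec in vectors:
        for concept, weight in vec.items():
            acc[concept] += weight / len(vectors)
    return dict(sorted(acc.items(), key=lambda kv: -kv[1]))

print(centroid([mouse, button]))
# Mouse (computing) ~0.85 and Mouse (rodent) ~0.46 match the slide; the remaining
# slide entries come from concepts not shown in the truncated per-term vectors.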
TEMPORAL ESA
➤ It is possible to build a separate matrix for each
time period
2000
ESA concept 1 concept 2 concept k
term 1 0.87 0.13 0.44
term 2 0.34 0.50 0.10
term n 0.22 0.60 0.55
2010
ESA concept 1 concept 2 concept k
term 1 0.77 0.24 0.30
term 2 0.34 0.50 0.10
term n 0.22 0.60 0.55
2020
ESA concept 1 concept 2 concept k
term 1 0.45 0.33 0.29
term 2 0.34 0.50 0.10
term n 0.22 0.60 0.55
37
TEMPORAL ESA
semantic interpretation vector of the
same term in different periods
meat (1920): fruit [0.73], vegetable [0.64], beverage [0.55]
meat (2017): beef [0.81], pork [0.72], chicken [0.68]
38
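How much a term has drifted between two periods can be measured as the cosine between its interpretation vectors; a sketch with the toy weights above:

import math

meat_1920 = {"Fruit": 0.73, "Vegetable": 0.64, "Beverage": 0.55}
meat_2017 = {"Beef": 0.81, "Pork": 0.72, "Chicken": 0.68}

# Cosine over sparse concept vectors (missing concepts count as 0).
def cosine(u, v):
    dot = sum(u.get(c, 0.0) * v.get(c, 0.0) for c in set(u) | set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)

print(cosine(meat_1920, meat_2017))   # 0.0 here: the truncated vectors share no concept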
T-RECS¹
TEMPORAL SEMANTIC ANALYSIS OF
CONFERENCE PROCEEDINGS
39
¹ Narducci, F., Basile, P., Lops, P., de Gemmis, M., & Semeraro, G. (2017, April). Temporal Semantic Analysis of Conference Proceedings. In European Conference on Information Retrieval (pp. 762-765). Springer, Cham.
http://193.204.187.192/recsys/
T-RECS: IDEA
➤Identify linguistic phenomena reflecting interesting
variations for the research community
➤topic shift
➤correlation between two topics
➤similarity between authors
➤Answer questions like
➤authors who studied emotions in recommender systems
➤the most used recommendation paradigm in 2007
➤correlation between matrix factorization and collaborative filtering
➤…
40
T-RECS: TECHNIQUES & DATA
➤Techniques
➤N-gram analyzer
➤Temporal Random Indexing
➤Temporal ESA
➤Corpus
➤ ACM RecSys Conference Proceedings from 2007 to
2015
41
T-RECS: TECHNIQUES
➤N-gram analyzer
➤we counted n-grams (n = 1, …, 5) in the
papers, grouped by author and year
➤we show the percentage of a given n-gram
in the corpus for each year, grouped by
author (see the sketch below)
42
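A sketch of the per-year n-gram percentage behind these plots (hypothetical data layout: a dict mapping each year to that year's tokenized papers):

from collections import Counter

def ngram_share(papers_by_year, target, n):
    # Fraction of all n-grams in each year's papers that equal `target`.
    shares = {}
    for year, papers in papers_by_year.items():
        counts = Counter()
        for tokens in papers:
            counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        total = sum(counts.values())
        shares[year] = counts[tuple(target.split())] / total if total else 0.0
    return shares

# e.g. ngram_share(recsys_papers_by_year, "matrix factorization", n=2)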
T-RECS: N-GRAM ANALYZER
bi-grams: matrix factorization, collaborative filtering 43
T-RECS: N-GRAM ANALYZER
D. Jannach, M. Zanker, M. Ge, and M. Gröning,
Recommender systems in computer science and information systems: a landscape of research. Springer,
2012.
44
T-RECS: TEMPORAL RI
➤Temporal Random Indexing
➤A word space for each year has been built
➤Word vectors are compared across different
time periods for
➤identifying changes in semantics
➤computing relatedness over time
45
T-RECS: TEMPORAL RANDOM INDEXING
similarity between matrix factorization and collaborative
filtering is computed for each word space
word spaces
46
T-RECS: TEMPORAL EXPLICIT SEMANTIC ANALYSIS
➤A matrix for each year has been built
➤columns: authors
➤rows: terms occurring in the papers
ESA author 1 author 2 author k
term 1 0.87 0.13 0.44
term 2 0.34 0.50 0.10
term n 0.22 0.60 0.55
ESA concept 1 concept 2 concept k
term 1 0.87 0.13 0.44
term 2 0.34 0.50 0.10
term n 0.22 0.60 0.55
47
T-RECS: TEMPORAL EXPLICIT SEMANTIC ANALYSIS
2005
ESA author 1 author 2 author k
term 1 0.87 0.13 0.44
term 2 0.34 0.50 0.10
term n 0.22 0.60 0.55
similarity between two authors
compute the cosine similarity
between the two column vectors 48
T-RECS: TEMPORAL EXPLICIT SEMANTIC ANALYSIS
2007
ESA author 1 author 2 author k
term 1 0.87 0.13 0.44
term 2 0.34 0.50 0.10
term n 0.22 0.60 0.55
similarity between two terms
2010
ESA author 1 author 2 author k
term 1 0.87 0.13 0.44
term 2 0.34 0.50 0.10
term n 0.22 0.60 0.55
compute the cosine
similarity between the
row vectors of the same
term in different years 49
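Both kinds of similarity reduce to cosines over the yearly term-by-author ESA matrices; a sketch with toy 3x3 matrices (numpy assumed):

import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

esa_2007 = np.array([[0.87, 0.13, 0.44],
                     [0.34, 0.50, 0.10],
                     [0.22, 0.60, 0.55]])
esa_2010 = np.array([[0.77, 0.24, 0.30],
                     [0.34, 0.50, 0.10],
                     [0.22, 0.60, 0.55]])

# Similarity between two authors in one year: cosine of two column vectors.
print(cosine(esa_2007[:, 0], esa_2007[:, 1]))

# Drift of one term between two years: cosine of its row vectors in the two matrices.
print(cosine(esa_2007[0, :], esa_2010[0, :]))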
T-RECS: TEMPORAL ESA
50
DIACHRONIC ANALYSIS AND
RECOMMENDER SYSTEMS
51
TIME-AWARE RECOMMENDER SYSTEMS
➤Temporal aspects have not yet been extensively
investigated in recommendation scenarios
➤State of the art
➤context-aware approaches, with context = time
➤decay functions (sketched below)
➤time categories (day, week, weekday or weekend)
➤model-based recsys (e.g., TimeSVD)
52
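As an example of the decay-function family (illustrative only; the half-life is a hypothetical parameter), older ratings can be down-weighted exponentially:

import math

# Weight of a rating that is `age_days` old; it halves every `half_life` days.
def decayed_weight(age_days, half_life=90.0):
    return math.pow(0.5, age_days / half_life)

print(decayed_weight(0), decayed_weight(90), decayed_weight(180))   # 1.0 0.5 0.25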
OUR PROPOSAL FOR A TIME-AWARE CBRS
the classic content-based pipeline (Content Analyzer -> Profile Learner -> Recommender, fed by item descriptions, processed items, rated items, and user feedback, and producing recommended items) is extended with a Temporal Analyzer component
53
FUTURE WORK
➤Analyze different corpora (proceedings, news
collections, user profiles)
➤Implement temporal analyses in content-
based recommender systems
➤how (content) preferences change over
time
54
Thanks
fedelucio.narducci@uniba.it
55
