Uprising microblogs: A Bayesian network
    retrieval model for tweet search

 Lamjed Ben Jabeur, Lynda Tamine and Mohand Boughanem
 IRIT, Université Paul Sabatier
A Bayesian network retrieval model for tweet search

     Outline

1.   Microblogging service
2.   Tweet search
3.   Bayesian network topology
4.   Computing conditional probabilities
5.   Experimental evaluation
6.   Conclusion and future work




Microblogging service

        Microblog?

“Microblogging is a new form of communication [...] that enables users to
broadcast and share information about their activities, opinions and status.”
[Java et al., 2007]

• Microblog post
    –   Short (140 characters)
    –   Real-time
    –   Social motivation
    –   Mobile device

• Usage statistics
    –   1 billion publications per week
    –   50 million publications per day
    –   177 million publications in March 2011
    –   +106 million user accounts
Microblogging service

          Tweet, retweet and hashtag?

“ Jack Dorsey, 21 March 2006 (first tweet)
  inviting coworkers

  #oilspill

“ Stephen Colbert, 21 June 2010 (Golden Tweet Award 2010)
  In honor of oil-soaked birds, 'tweets' are now 'gurgles.' http://bit.ly/cIhZNf

“ Wendy's, 8 June 2011 (Golden Tweet Award 2011)
  RT for a good cause. Each Retweet sends 50¢ to help kids in foster care. #TreatItFwd

“ CORIA11, 16 March 2010
  CORIA 2011 : Université d'Avignon #CORIA11 http://yfrog.com/h3y

“ MohBoughanem, 17 March 2010
  @coria2011 well visualized, quickly found

“ MohBoughanem, CORIA11, 17 March 2010
  @coria2011 well visualized, quickly found
Microblogging service

Social information network




Tweet search

       Microblog IR

• Users are overwhelmed by the huge quantity of tweets
   – High publication rate
   – Diverse sources of information
   → Difficulty accessing interesting posts

• Microblog IR tasks
   –   Person search and follower suggestion
   –   Trend extraction
   –   Opinion search
   –   Tweet search
Tweet search

        Tweet search task

“real-time search task, where the user wishes to see the most recent but
relevant information to the query.” (Ounis et al., 2011)

“adhoc search on Twitter, where a user’s information need is represented by
a query at a specific time.” (Ounis et al., 2011)

• Search motivations
    –   access to concise and credible information
    –   access to fresh and real-time news
    –   follow an event
    –   collect opinions and public sentiments
Tweet search

     Related work

1. Spatio-temporal context
   – TwitterStand (Sankaranarayanan et al., 2009)
   – TweetSieve (Grinev et al., 2009)

2. Microblog features
   – followership, tweets, retweets, replies, hashtags, URLs
   – Linear combination (Nagmoti et al., 2010)
   – Learning to rank (Duan et al., 2010)
Tweet search

    Related work

3. Social network structure
   – Indegree, retweet and mention influence (Cha et al., 2010);
     TweetRank, FollowerRank (Nagmoti et al., 2010)
   – Authority (Kwak et al., 2010)
   – Influence (Kwak et al., 2010), TwitterRank (Weng et al., 2010),
     Popularity (Duan et al., 2010)
Tweet search

        Contributions
•   Relevance features:
    –     Term occurrence
    –     Social influence
    –     Time magnitude

• Bayesian network model

[Diagram: three relevance dimensions: topical, temporal and social]
Bayesian network topology

    Definitions and notations

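•   Query: $q \in \{0,1\}$, with states $q$, $\bar{q}$
•   Term: $k_i \in \{0,1\}$, with states $k_i$, $\bar{k}_i$
•   Term configuration: $\vec{k}$
    example with two terms $k_1$, $k_2$:
    $\vec{k} \in \{(k_1, k_2),\ (k_1, \bar{k}_2),\ (\bar{k}_1, k_2),\ (\bar{k}_1, \bar{k}_2)\}$
•   Tweet: $t_j \in \{0,1\}$, with states $t_j$, $\bar{t}_j$
•   Microblogger: $u_k \in \{0,1\}$, with states $u_k$, $\bar{u}_k$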
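The notion of a term configuration can be illustrated with a short sketch. It assumes each configuration is encoded as a tuple of 0/1 flags (1 for a term that is "on", 0 for its negation); the function name `term_configurations` is hypothetical and not part of the original deck.

```python
from itertools import product


def term_configurations(n_terms):
    """Return every on/off assignment of n_terms binary term variables.

    Each configuration is a tuple of flags; flag i is 1 when term k_i is
    "on" in that configuration and 0 when it is negated.
    """
    return list(product((1, 0), repeat=n_terms))


# Two terms give the four configurations listed on the slide.
configs = term_configurations(2)
print(configs)
```

With n terms there are 2^n configurations, so a practical system would only enumerate the terms that actually appear in the query.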
Bayesian network topology

Network nodes and edges

            Query                q




            Terms     k1         k2   k3




            Tweets    t1         t2   t3




            Microbloggers   u1        u2



Computing conditional probabilities
         Query evaluation


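[Diagram: network layers from top to bottom: Query (q); Terms (k1, k2, k3); Tweets (t1, t2, t3); Microbloggers (u1, u2)]

$P(q \wedge t_j) = \sum_{\vec{k}} P(q \mid \vec{k})\, P(\vec{k} \mid t_j)\, P(t_j \mid u_k)\, P(u_k)$

$P(q \wedge t_j) = \sum_{\vec{k}} P(q \mid \vec{k})\, P(t_j \mid u_k)\, P(u_k) \left[ \prod_{k_i \mid on(i,\vec{k})=1} P(k_i \mid t_j) \times \prod_{k_i \mid on(i,\vec{k})=0} P(\bar{k}_i \mid t_j) \right]$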
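The query-evaluation sum can be sketched as follows. This is a minimal illustration, assuming a conjunctive $P(q \mid \vec{k})$ that equals 1 only when every query term is on, and assuming the per-term probabilities $P(k_i \mid t_j)$, $P(t_j \mid u_k)$ and $P(u_k)$ are already known. All function and parameter names are hypothetical.

```python
from itertools import product


def p_query_given_config(config, query_idx):
    # Assumption: conjunctive query, 1 only if all query terms are "on".
    return 1.0 if all(config[i] == 1 for i in query_idx) else 0.0


def p_config_given_tweet(config, p_term):
    # Product of P(k_i | t_j) for "on" terms and 1 - P(k_i | t_j) for "off" terms.
    prob = 1.0
    for on, p in zip(config, p_term):
        prob *= p if on else (1.0 - p)
    return prob


def joint_score(p_term, query_idx, p_tweet_given_user, p_user):
    """Sum over all term configurations, then weight by the author factors."""
    total = 0.0
    for config in product((1, 0), repeat=len(p_term)):
        total += (p_query_given_config(config, query_idx)
                  * p_config_given_tweet(config, p_term))
    return total * p_tweet_given_user * p_user


# Three terms; the query uses the first two (indices 0 and 1).
score = joint_score([0.8, 0.5, 0.1], query_idx=[0, 1],
                    p_tweet_given_user=0.25, p_user=0.2)
print(score)
```

Terms outside the query sum out to 1, so only the query terms influence the score in this conjunctive setting.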
Computing conditional probabilities

        Query

$P(q \mid \vec{k}) = \prod_{i,\ k_i \in q} on(i, \vec{k})$
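A small worked example may help here. It assumes a query containing $k_1$ and $k_2$, a vocabulary of three terms, and that $P(q \mid \vec{k})$ is the product of the $on(i, \vec{k})$ indicators over the query terms.

```latex
% Assumption: q = {k_1, k_2}; on(i, \vec{k}) is 1 when k_i is "on" in \vec{k}, else 0.
\begin{align*}
P(q \mid (k_1, k_2, k_3))             &= on(1,\vec{k}) \cdot on(2,\vec{k}) = 1 \cdot 1 = 1 \\
P(q \mid (k_1, k_2, \bar{k}_3))       &= 1 \cdot 1 = 1 \\
P(q \mid (k_1, \bar{k}_2, k_3))       &= 1 \cdot 0 = 0 \\
P(q \mid (\bar{k}_1, \bar{k}_2, k_3)) &= 0 \cdot 0 = 0
\end{align*}
```

Under this reading, the state of $k_3$ is irrelevant because it is not a query term.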
Computing conditional probabilities

        Tweet

$P(k_i \mid t_j) = (1 - \lambda)\, F(k_i, t_j)\, H(k_i, t_j) + \lambda\, T(k_i, t_j)\, L(t_j)$

    –   Term occurrence: $F(k_i, t_j)\, H(k_i, t_j)$
    –   Tweet properties: $T(k_i, t_j)\, L(t_j)$

$P(\bar{k}_i \mid t_j) = 1 - P(k_i \mid t_j)$
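The mixture of term-occurrence and tweet-property factors can be sketched as below. The mixing weight is named `lam` here as an assumption, and the four factor values are made-up inputs rather than values from the deck.

```python
def term_given_tweet(f, h, t, l, lam=0.25):
    """Combine the factors as (1 - lam) * F * H + lam * T * L.

    f, h: term-occurrence factors (term frequency, hashtag)
    t, l: tweet-property factors (time magnitude, tweet length)
    lam:  mixing weight between the two groups (assumed name)
    """
    return (1.0 - lam) * f * h + lam * t * l


# Illustrative factor values only.
p_on = term_given_tweet(f=0.75, h=0.6, t=0.5, l=0.8)
p_off = 1.0 - p_on  # probability of the negated term
print(p_on, p_off)
```

Because every factor lies in [0, 1] and the weights sum to 1, the result stays a valid probability.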
Computing conditional probabilities

        Term frequency

$F(k_i, t_j) = \begin{cases} 1 - \dfrac{a}{tf_{k_i,t_j}} & \text{if } k_i \in t_j \\ 0 & \text{otherwise} \end{cases}$

[Figure: $F(k_i, t_j)$ (0 to 1) plotted against $tf_{k_i,t_j}$ (0 to 10) for a = 0.1, 0.25, 0.5, 0.75 and 1]
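A minimal sketch of the term-frequency factor, assuming the form 1 - a / tf for terms that occur in the tweet and 0 otherwise. The function name is hypothetical.

```python
def term_frequency_factor(tf, a=0.25):
    """Return 1 - a / tf when the term occurs (tf >= 1), else 0."""
    if tf <= 0:
        return 0.0
    return 1.0 - a / tf


# The factor saturates quickly: repeating a term adds little after a few uses.
for tf in (0, 1, 2, 5, 10):
    print(tf, term_frequency_factor(tf))
```

A smaller `a` gives even a single occurrence a score close to 1, which suits short texts where terms rarely repeat.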
Computing conditional probabilities

        Hashtag

$H(k_i, t_j) = \begin{cases} 1 - \dfrac{b}{tf_{\#k_i,t_j}} & \text{if } \#k_i \in t_j \\ b & \text{otherwise} \end{cases}$
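The hashtag factor can be sketched in the same spirit, assuming it equals 1 - b / tf when the term appears as a hashtag and falls back to b otherwise. The regular-expression tokenization and the function names are assumptions made for illustration.

```python
import re


def hashtag_count(term, tweet_text):
    """Count occurrences of '#term' in the tweet, ignoring case."""
    tags = re.findall(r"#(\w+)", tweet_text.lower())
    return tags.count(term.lower())


def hashtag_factor(term, tweet_text, b=0.4):
    """Return 1 - b / count if the term is used as a hashtag, else b."""
    n = hashtag_count(term, tweet_text)
    return 1.0 - b / n if n > 0 else b


tweet = ("RT for a good cause. Each Retweet sends 50¢ to help kids "
         "in foster care. #TreatItFwd")
h_tag = hashtag_factor("TreatItFwd", tweet)   # term used as a hashtag
h_plain = hashtag_factor("kids", tweet)       # term present, but not as a hashtag
print(h_tag, h_plain)
```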
Computing conditional probabilities

        Time magnitude

$T(k_i, t_j) = \dfrac{df_{k_i,\theta_j}}{|\theta_j|}$

$\theta_j = \{\, t_k : |\tau_{t_j} - \tau_{t_k}| \le \Delta t \,\}$

[Figure: number of tweets (0 to 30) over time (1 to 5), series t1 and t2]
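A hedged sketch of the time-magnitude idea: the share of tweets published around the same time that contain the term. It assumes a symmetric window of width Δt on each side of the tweet's timestamp and a corpus given as (timestamp, set of terms) pairs. Both are illustrative assumptions, as are all names.

```python
from datetime import datetime, timedelta


def time_magnitude(term, tweet_time, corpus, window=timedelta(hours=1)):
    """Fraction of tweets within `window` of tweet_time that contain `term`.

    corpus: iterable of (timestamp, set_of_terms) pairs (assumed layout).
    """
    in_window = [terms for ts, terms in corpus
                 if abs(tweet_time - ts) <= window]
    if not in_window:
        return 0.0
    return sum(1 for terms in in_window if term in terms) / len(in_window)


base = datetime(2011, 1, 25, 12, 0)
corpus = [
    (base, {"tahrir", "clashes"}),
    (base + timedelta(minutes=30), {"tahrir"}),
    (base + timedelta(minutes=45), {"egypt"}),
    (base + timedelta(hours=3), {"tahrir"}),  # outside the 1 h window
]
t_mag = time_magnitude("tahrir", base, corpus)
print(t_mag)
```

A term that bursts around the tweet's publication time scores high, which is what makes the feature useful for event-driven queries.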
Computing conditional probabilities

        Tweet length

$L(t_j) = \dfrac{1}{1 + |avgtl - tl_{t_j}|}$
Computing conditional probabilities

        Microblogger

$P(t_j \mid u_k) = \dfrac{1}{|u_k|}$
Computing conditional probabilities

        Social influence

$P(u_k) = Inf(u_k)$

PageRank on the retweet social network:

$Inf_G^{k}(u_i) = d\, \dfrac{1}{|U|} + (1 - d) \sum_{u_j,\ e(u_j,u_i) \in E} w_{j,i}\, \dfrac{Inf_G^{k-1}(u_j)}{O(u_j)}$

[Edge weight $w_{j,i}$: a ratio of quantities defined on $u_j$; the operator symbols are not recoverable from the source]
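The iterative influence computation can be sketched as a weighted PageRank. This is an illustration under stated assumptions, not the authors' implementation: edge weights are supplied by the caller (their exact definition is not legible in the source), $O(u_j)$ is taken to be the number of outgoing edges of $u_j$, and `d` multiplies the uniform term as written on the slide. All names are hypothetical.

```python
def retweet_influence(users, edges, d=0.15, iterations=50):
    """Weighted PageRank-style influence scores on a retweet graph.

    users: list of user ids
    edges: dict mapping (source, target) -> weight w_{j,i} (assumed given)
    d:     weight of the uniform term, following the slide's formula
    """
    n = len(users)
    out_degree = {u: 0 for u in users}
    for src, _dst in edges:
        out_degree[src] += 1

    inf = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        new_inf = {u: d / n for u in users}
        for (src, dst), w in edges.items():
            new_inf[dst] += (1.0 - d) * w * inf[src] / out_degree[src]
        inf = new_inf
    return inf


# Toy graph: b and c point to a; a points back to b with a weaker weight.
users = ["a", "b", "c"]
edges = {("b", "a"): 1.0, ("c", "a"): 1.0, ("a", "b"): 0.5}
inf = retweet_influence(users, edges)
print(inf)
```

The scores are not renormalized here, so they need not sum to 1; a ranking only needs their relative order.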
Computing conditional probabilities

        Social influence

[Figure: retweet network example illustrating the edge weight $w_{i,j}$, a ratio of quantities defined on $u_j$ and $u_i$; the operator symbols are not recoverable from the source]
Experimental evaluation

TREC 2011 Microblog
NESTOR: Microblog Search Engine

Statistic                                   Value
Tweets                                 16,141,812
Retweets                                1,128,179
Tweet                                   1,860,112
Terms                                   7,781,775
Hashtags                                  455,179
Microbloggers                           5,356,432
Retweet relationships                   1,060,551
Social network of retweets: nodes       5,495,081
Social network of retweets: edges       1,024,914
Giant component                            11.12%

[Figure: term frequency, hashtags and tweet length distributions]
Experimental evaluation

   Queries and ground truth

• “Arab Spring” query dataset (25 queries)
  – Topical
     “Number of protesters in Tahrir”, “Tunisian revolution”

  – Temporal
     “ElBaradei arrives in Egypt”, “Clashes in Tahrir”, “SMS Down Egypt”

  – Social
     “Wael Ghonim”, “Mubarak dissolves government”

• User rating (relevant, not relevant)
• Tweets ranked by score; measures: p@10, p@20
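For reference, precision at rank k (p@10, p@20) can be computed as in the sketch below. The tweet ids and relevance judgments are made up for illustration.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked items that were judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for tweet_id in ranked_ids[:k] if tweet_id in relevant_ids)
    return hits / k


# Hypothetical ranking and judgments.
ranking = ["t7", "t2", "t9", "t4", "t1"]
relevant = {"t2", "t4", "t5"}
p_at_5 = precision_at_k(ranking, relevant, 5)
print(p_at_5)
```

Dividing by k rather than by the number of returned results penalizes rankings that return fewer than k tweets.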
Experimental evaluation

       Configurations and baselines

BNTS         Bayesian network model for tweet search*
BNTS-L       BNTS, Tweet length feature disabled
BNTS-T       BNTS, Time magnitude feature disabled
BNTS-H       BNTS, Hashtag feature disabled
BNTS-S       BNTS, Social influence feature disabled
BM25         Okapi BM25
VSM          Vector Space Model
BM           Boolean Model


* $\lambda = 0.25$, $a = 0.25$, $b = 0.4$, $\Delta t = 1\,\text{h}$, $d = 0.15$
Experimental evaluation

 Features impact

            p@10     p@20
BNTS        0.552    0.548
BNTS-L      0.532    0.502
BNTS-T      0.256    0.294
BNTS-H      0.584    0.542
BNTS-S      0.58     0.528
Experimental evaluation

Features impact
Topical

            p@10      p@20
BNTS        0.66      0.6867
BNTS-L      0.6867    0.6833
BNTS-T      0.2867    0.3767
BNTS-H      0.7533    0.7233
BNTS-S      0.7333    0.6833
Experimental evaluation

 Features impact
Temporal

            p@10      p@20
BNTS        0.4333    0.35
BNTS-L      0.2333    0.2
BNTS-T      0.0667    0.1
BNTS-H      0.3333    0.3
BNTS-S      0.4       0.3167
Experimental evaluation

 Features impact
Social

            p@10      p@20
BNTS        0.3714    0.3357
BNTS-L      0.3286    0.2429
BNTS-T      0.2714    0.2
BNTS-H      0.3286    0.2571
BNTS-S      0.3286    0.2857
Experimental evaluation

 Retrieval effectiveness




            p@10     BNTS gain    p@20     BNTS gain
BNTS        0.552                 0.548
BM25        0.576    -4%          0.494    11%
BM          0.416    ** 33%       0.382    ** 34%
VSM         0.376    ** 47%       0.36     ** 52%
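The percentage columns appear to report the relative improvement of BNTS over each baseline. A minimal sketch, assuming gain = (BNTS - baseline) / baseline, reproduces most of the printed values; under that assumption the BM p@20 gain comes out near 43% rather than the printed 34%, which may be a digit transposition on the slide.

```python
def relative_improvement(model, baseline):
    """Relative gain of `model` over `baseline`, as a fraction."""
    return (model - baseline) / baseline


bnts = {"p@10": 0.552, "p@20": 0.548}
baselines = {
    "BM25": {"p@10": 0.576, "p@20": 0.494},
    "BM":   {"p@10": 0.416, "p@20": 0.382},
    "VSM":  {"p@10": 0.376, "p@20": 0.36},
}

# Gains in whole percent, per baseline and measure.
gains = {
    name: {m: round(100 * relative_improvement(bnts[m], scores[m]))
           for m in bnts}
    for name, scores in baselines.items()
}
print(gains)
```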
A Bayesian network retrieval model for tweet search

        Conclusion and future work

•   Tweet search model
    –     Normalized term frequency
    –     Time magnitude
    –     Social influence
•   Integrating relevance factors within a Bayesian network
•   The query profile affects feature performance.
•   Our model outperforms traditional IR baselines.
•   Future work
    –     Automatically detect the optimal time window
    –     Select the appropriate features depending on the query profile
Thank you for your attention!




            Follow me on Twitter!
             http://twitter.com/amjedbj
Computing conditional probabilities
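      Query evaluation

$P(t_j \mid q) \propto \sum_{\vec{k}} P(q \mid \vec{k})\, P(t_j \mid \vec{k})\, P(\vec{k})$

$P(t_j \mid q) \propto \sum_{\vec{k}} P(q \mid \vec{k})\, P(t_{kj} \mid \vec{k})\, P(t_{oj} \mid \vec{k})\, P(t_{sj} \mid \vec{k})\, P(\vec{k})$

[Diagram: network with nodes q; k1, k2, k3; o1, o2; u1, u1; tk1, tk2, tk3; to3, to2, to3; ts1, ts2, ts3; t1, t2, t3]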
Experimental evaluation

      Term frequency normalization

•    BNTS.K

$P(t_{kj} \mid \vec{k}) = \dfrac{1}{|\vec{k}|} \sum_{k_i \in \vec{k} \wedge t_j} \dfrac{tf_{k_i,t_j}}{\alpha + tf_{k_i,t_j}}$

[Figure: p@30 (0 to 0.35) as a function of the normalization parameter (0 to 1)]
Experimental evaluation

               Time window

 •             BNTS.KO

$o_e : \left[\theta_{o_e} - \dfrac{\Delta t}{2},\ \theta_{o_e} + \dfrac{\Delta t}{2}\right]$

[Figure: p@30 (0.29 to 0.32) as a function of the time window $\Delta t$ in days (0 to 17)]
Experimental evaluation

       Retrieval effectiveness
[Figure: bar chart of p@30 and MAP (0 to 0.5) for isiFDL, DFReeKLIM30, BNTS, Median, Nestor, BM25 and Disjunctive]
Experimental evaluation

         TREC Microblogs 2011
               Ranked by time                           Ranked by score
               All rel            High rel              All rel
               p@30     MAP       p@30     MAP          p@30     MAP
Nestor*        0.2027   0.1305    0.0838   0.1287       0.2218   0.1384
Nestor-S*      0.2027   0.1305    0.0838   0.1286       0.2184   0.1360
Nestor-T       0.2082   0.1343    0.0585   0.0912       0.1912   0.1196
Nestor-L       0.2048   0.1306    0.0565   0.0867       0.2293   0.1426
Median         0.2592   0.1433    0.2646   0.1381
Experimental evaluation

        TREC Microblogs 2011
     System                                Threshold   p@10     p@20     p@30     MAP
1    Sum of IDF of the terms present            30     0.3633   0.3316   0.3333   0.1759
2    BM25                                       30     0.3571   0.3245   0.2973   0.1546
3    Proportion of the terms present            30     0.2653   0.2561   0.2782   0.14
4    Sum of Boolean frequencies                 30     0.2571   0.2663   0.2755   0.1387
5    EBM (AND)                                  30     0.3041   0.2918   0.2714   0.1282
6    Bayesian inference network                 30     0.302    0.2888   0.2687   0.1274
7    Sum of TF*IDF                              30     0.302    0.2888   0.2687   0.1274
8    VSM                                        30     0.302    0.2888   0.2687   0.1274
9    Sum of TF                                  30     0.2327   0.2276   0.2238   0.1066
10   Nestor                                            0.2857   0.2347   0.2027   0.1305
11   EBM (OR)                                   30     0.1837   0.1786   0.166    0.0541
12   Sum of hashtag frequencies                 30     0.1612   0.1541   0.1469   0.0512
13   Lucene-Baseline                          1000     0.1612   0.1143   0.0986   0.1411
14   Sum of TF (normalized by length)           30     0.0816   0.0673   0.0612   0.0223
15   Reverse chronological order                30     0.0184   0.0255   0.0218   0.0082

                                                                                      38

More Related Content

PDF
Intégration des facteurs temps et autorité sociale dans un modèle bayésien de...
PPTX
Un modèle de recherche d’information sociale dans les microblogs : cas de Twi...
PDF
Un modèle de Recherche d'Information Sociale pour l'Accès aux Ressources Bib...
PPTX
Poster Recherche d'Information Sociale
PDF
Diachronic Analysis of the Italian Language exploiting Google Ngram
PDF
Master Minds on Data Science - Maarten de Rijke
PDF
MICROBLOGGING CONTENT PROPAGATION MODELING USING TOPIC-SPECIFIC BEHAVIORAL FA...
PDF
Rethinking Microblogging: Open Distributed Semantic
Intégration des facteurs temps et autorité sociale dans un modèle bayésien de...
Un modèle de recherche d’information sociale dans les microblogs : cas de Twi...
Un modèle de Recherche d'Information Sociale pour l'Accès aux Ressources Bib...
Poster Recherche d'Information Sociale
Diachronic Analysis of the Italian Language exploiting Google Ngram
Master Minds on Data Science - Maarten de Rijke
MICROBLOGGING CONTENT PROPAGATION MODELING USING TOPIC-SPECIFIC BEHAVIORAL FA...
Rethinking Microblogging: Open Distributed Semantic

Viewers also liked (7)

PDF
UNIBA: Exploiting a Distributional Semantic Model for Disambiguating and Link...
ODP
Semantic Microblogging
PDF
Web-scale semantic search
PPT
(Micro)Blog : un sujet de recherche actuel [08/02/2011]
PDF
Barometre RegionsJob/Bringr : les conversations "emploi" sur les réseaux sociaux
PPTX
Quels facteurs de pertinence pour la recherche de produits e-commerce ?
PDF
Moederpresentatie Cross Media Cafe - Uit het Lab
UNIBA: Exploiting a Distributional Semantic Model for Disambiguating and Link...
Semantic Microblogging
Web-scale semantic search
(Micro)Blog : un sujet de recherche actuel [08/02/2011]
Barometre RegionsJob/Bringr : les conversations "emploi" sur les réseaux sociaux
Quels facteurs de pertinence pour la recherche de produits e-commerce ?
Moederpresentatie Cross Media Cafe - Uit het Lab
Ad

Similar to Uprising microblogs: A Bayesian network retrieval model for tweet search (8)

PDF
Iscc2011 ioannis stavrakakis_ keynote
PPTX
Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams
PDF
Machine Learning at PeerIndex
PPTX
Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams
PPTX
Learning from Twitter Hashtags: Leveraging Proximate Tags to Enhance Graph-ba...
PDF
Predicting Potential Responders in Twitter: A Query Routing Algorithm
PDF
A new approach to achieve the users’ habitual opportunities on social media
PDF
Tweet Recommendation with Graph Co-Ranking
Iscc2011 ioannis stavrakakis_ keynote
Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams
Machine Learning at PeerIndex
Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams
Learning from Twitter Hashtags: Leveraging Proximate Tags to Enhance Graph-ba...
Predicting Potential Responders in Twitter: A Query Routing Algorithm
A new approach to achieve the users’ habitual opportunities on social media
Tweet Recommendation with Graph Co-Ranking
Ad

More from Lamjed Ben Jabeur (6)

PDF
Accès à l’information dans les réseaux sociaux : quelles formes de collaborat...
PDF
IRIT at clef 2015: A product search model for head queries
PPTX
Challenges of managing Data Science Project
PDF
Leveraging social relevance: Using social networks to enhance literature acce...
PPTX
A social model for Literature Access: Towards a weighted social network of au...
PPTX
An Exploratory Study on Using Social Information Networks for Flexible Litera...
Accès à l’information dans les réseaux sociaux : quelles formes de collaborat...
IRIT at clef 2015: A product search model for head queries
Challenges of managing Data Science Project
Leveraging social relevance: Using social networks to enhance literature acce...
A social model for Literature Access: Towards a weighted social network of au...
An Exploratory Study on Using Social Information Networks for Flexible Litera...

Recently uploaded (20)

PDF
Paper A Mock Exam 9_ Attempt review.pdf.
PPTX
Introduction to pro and eukaryotes and differences.pptx
PDF
David L Page_DCI Research Study Journey_how Methodology can inform one's prac...
PDF
FOISHS ANNUAL IMPLEMENTATION PLAN 2025.pdf
PDF
1.3 FINAL REVISED K-10 PE and Health CG 2023 Grades 4-10 (1).pdf
PPTX
Onco Emergencies - Spinal cord compression Superior vena cava syndrome Febr...
PDF
AI-driven educational solutions for real-life interventions in the Philippine...
PPTX
Share_Module_2_Power_conflict_and_negotiation.pptx
PPTX
20th Century Theater, Methods, History.pptx
PPTX
History, Philosophy and sociology of education (1).pptx
PPTX
Unit 4 Computer Architecture Multicore Processor.pptx
PDF
HVAC Specification 2024 according to central public works department
PDF
Environmental Education MCQ BD2EE - Share Source.pdf
PDF
BP 704 T. NOVEL DRUG DELIVERY SYSTEMS (UNIT 2).pdf
PDF
FORM 1 BIOLOGY MIND MAPS and their schemes
PDF
medical_surgical_nursing_10th_edition_ignatavicius_TEST_BANK_pdf.pdf
DOCX
Cambridge-Practice-Tests-for-IELTS-12.docx
PPTX
B.Sc. DS Unit 2 Software Engineering.pptx
PPTX
CHAPTER IV. MAN AND BIOSPHERE AND ITS TOTALITY.pptx
PDF
Hazard Identification & Risk Assessment .pdf
Paper A Mock Exam 9_ Attempt review.pdf.
Introduction to pro and eukaryotes and differences.pptx
David L Page_DCI Research Study Journey_how Methodology can inform one's prac...
FOISHS ANNUAL IMPLEMENTATION PLAN 2025.pdf
1.3 FINAL REVISED K-10 PE and Health CG 2023 Grades 4-10 (1).pdf
Onco Emergencies - Spinal cord compression Superior vena cava syndrome Febr...
AI-driven educational solutions for real-life interventions in the Philippine...
Share_Module_2_Power_conflict_and_negotiation.pptx
20th Century Theater, Methods, History.pptx
History, Philosophy and sociology of education (1).pptx
Unit 4 Computer Architecture Multicore Processor.pptx
HVAC Specification 2024 according to central public works department
Environmental Education MCQ BD2EE - Share Source.pdf
BP 704 T. NOVEL DRUG DELIVERY SYSTEMS (UNIT 2).pdf
FORM 1 BIOLOGY MIND MAPS and their schemes
medical_surgical_nursing_10th_edition_ignatavicius_TEST_BANK_pdf.pdf
Cambridge-Practice-Tests-for-IELTS-12.docx
B.Sc. DS Unit 2 Software Engineering.pptx
CHAPTER IV. MAN AND BIOSPHERE AND ITS TOTALITY.pptx
Hazard Identification & Risk Assessment .pdf

Uprising microblogs: A Bayesian network retrieval model for tweet search

  • 1. Uprising microblogs: A Bayesian network retrieval model for tweet search Lamjed Ben Jabeur, Lynda Tamine and Mohand Boughanem IRIT, Université Paul Sabatier
  • 2. A Bayesian network retrieval model for tweet search Outline 1. Microblogging service 2. Tweet search 3. Bayesian network topology 4. Computing conditional probabilities 5. Experimental evaluation 6. Conclusion and future work 2
  • 3. Microblogging service Microblog? “ Microblogging is a new form of communication [….] that enables users to broadcast and share information about their activities, opinions and status. [Java et al.2007]. ” • Microblog post – Short (140 characters) 1 billions Publications /week – Real-time 50 millions Publications /day – Social motivation 177 million Publications in mars 2011 – Mobile device +106 millions User accounts 3
  • 4. Microblogging service: Tweet, retweet and hashtag?
    – Jack Dorsey, 21 March 2006 (first tweet): “inviting coworkers”
    – Hashtag example: #oilspill
    – Stephen Colbert, 21 June 2010 (Golden Tweet Award 2010): “In honor of oil-soaked birds, 'tweets' are now 'gurgles.'” http://bit.ly/cIhZNf
    – Wendy's, 8 June 2011 (Golden Tweet Award 2011): “RT for a good cause. Each Retweet sends 50¢ to help kids in foster care. #TreatItFwd”
    – CORIA11, 16 March 2010: “CORIA 2011 : Université d'Avignon #CORIA11 http://yfrog.com/h3y”
    – MohBoughanem, 17 March 2010: “@coria2011 well visualized, quickly found”
    – MohBoughanem, CORIA11, 17 March 2010: “@coria2011 well visualized, quickly found”
  • 6. Tweet search: Microblog IR
    • Users are overwhelmed by the huge quantity of tweets
      – High publication rate
      – Diverse sources of information
    • Difficulty accessing interesting posts
    • Microblog IR tasks
      – Person search and follower suggestion
      – Trend extraction
      – Opinion search
      – Tweet search
  • 7. Tweet search: Tweet search task
    “real-time search task, where the user wishes to see the most recent but relevant information to the query.” (Ounis et al., 2011)
    “adhoc search on Twitter, where a user’s information need is represented by a query at a specific time.” (Ounis et al., 2011)
    • Search motivations
      – Access to concise and credible information
      – Access to fresh and real-time news
      – Follow an event
      – Collect opinions and public sentiments
  • 8. Tweet search: Related work
    1. Spatio-temporal context
       – TwitterStand (Sankaranarayanan et al., 2009)
       – TweetSieve (Grinev et al., 2009)
    2. Microblog features
       – Followership, tweets, retweets, replies, hashtags, URLs
       – Linear combination (Nagmoti et al., 2010)
       – Learning to rank (Duan et al., 2010)
  • 9. Tweet search: Related work
    3. Social network structure
       – Indegree, retweet and mention influence (Cha et al., 2010); TweetRank, FollowerRank (Nagmoti et al., 2010)
       – Authority (Kwak et al., 2010)
       – Influence (Kwak et al., 2010); TwitterRank (Weng et al., 2010); Popularity (Duan et al., 2010)
  • 10. Tweet search: Contributions
    • Relevance features
      – Term occurrence (topical)
      – Social influence (social)
      – Time magnitude (temporal)
    • Bayesian network model
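  • 11. Bayesian network topology: Definitions and notations
    • Query: $q \in \{0,1\}$, with states $q$, $\bar{q}$
    • Term: $k_i \in \{0,1\}$, with states $k_i$, $\bar{k}_i$
    • Term configuration: $\vec{k}$. Example for two terms $k_1, k_2$: $\vec{k} \in \{(k_1, k_2), (k_1, \bar{k}_2), (\bar{k}_1, k_2), (\bar{k}_1, \bar{k}_2)\}$
    • Tweet: $t_j \in \{0,1\}$, with states $t_j$, $\bar{t}_j$
    • Microblogger: $u_k \in \{0,1\}$, with states $u_k$, $\bar{u}_k$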
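    A term configuration assigns an on/off state to every term, so n terms yield 2^n configurations. The minimal Python sketch below enumerates them; the function name `term_configurations` and the 1/0 encoding of on/off are illustrative choices, not part of the slides.

```python
from itertools import product

def term_configurations(terms):
    """Yield every on/off assignment of the given terms as a dict (1 = on, 0 = off)."""
    for bits in product((1, 0), repeat=len(terms)):
        yield dict(zip(terms, bits))

# Two terms give the four configurations listed on the slide.
configs = list(term_configurations(["k1", "k2"]))
for config in configs:
    print(config)
```

    The number of configurations doubles with each extra term, which is why such sums are usually restricted to the terms of the query in practice.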
  • 12. Bayesian network topology: Network nodes and edges
    [Figure: layered network with a query node q, term nodes k1, k2, k3, tweet nodes t1, t2, t3 and microblogger nodes u1, u2]
  • 13. Computing conditional probabilities: Query evaluation
    $$P(q \wedge t_j) = \sum_{\vec{k}} P(q \mid \vec{k})\, P(\vec{k} \mid t_j)\, P(t_j \mid u_k)\, P(u_k)$$
    $$P(q \wedge t_j) = \sum_{\vec{k}} P(q \mid \vec{k})\, P(t_j \mid u_k)\, P(u_k) \left[ \prod_{k_i \mid on(i,\vec{k})=1} P(k_i \mid t_j) \times \prod_{k_i \mid on(i,\vec{k})=0} P(\bar{k}_i \mid t_j) \right]$$
    [Figure: network layers Query (q), Terms (k1, k2, k3), Tweets (t1, t2, t3), Microbloggers (u1, u2)]
  • 14. Computing conditional probabilities: Query
    $$P(q \mid \vec{k}) = \prod_{i,\, k_i \in q} on(i, \vec{k})$$
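    The query evaluation sum can be sketched by brute force over all term configurations. This is only an illustration under stated assumptions: P(q | k) is read as conjunctive (1 when every query term is on in the configuration, 0 otherwise), and the per-term probabilities P(k_i | t_j), P(t_j | u_k) and P(u_k) are passed in as plain numbers. All function and parameter names are hypothetical.

```python
from itertools import product

def p_query_given_config(query_terms, config):
    # Assumption: conjunctive reading, 1.0 only if every query term is on.
    return 1.0 if all(config[t] for t in query_terms) else 0.0

def score(query_terms, term_probs, p_tweet_given_user, p_user):
    """Brute-force P(q and t_j): sum over every on/off configuration of the terms."""
    terms = sorted(term_probs)
    total = 0.0
    for bits in product((True, False), repeat=len(terms)):
        config = dict(zip(terms, bits))
        # Product of P(k_i | t_j) for "on" terms and 1 - P(k_i | t_j) for "off" terms.
        p_config = 1.0
        for t in terms:
            p = term_probs[t]
            p_config *= p if config[t] else (1.0 - p)
        total += p_query_given_config(query_terms, config) * p_config
    return total * p_tweet_given_user * p_user

# Illustrative numbers only, not taken from the experiments.
example = score(["egypt", "tahrir"],
                {"egypt": 0.5, "tahrir": 0.4, "sms": 0.9},
                p_tweet_given_user=0.1, p_user=0.2)
print(example)
```

    Under the conjunctive assumption, terms outside the query marginalize out, so the sum reduces to the product of P(k_i | t_j) over the query terms times the tweet and microblogger factors.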
  • 15. Computing conditional probabilities: Tweet
    $$P(k_i \mid t_j) = (1-\lambda)\, \underbrace{F(k_i,t_j)\, H(k_i,t_j)}_{\text{term occurrence}} + \lambda\, \underbrace{T(k_i,t_j)\, L(t_j)}_{\text{tweet properties}}$$
    $$P(\bar{k}_i \mid t_j) = 1 - P(k_i \mid t_j)$$
  • 16. Computing conditional probabilities: Term frequency
    $$F(k_i,t_j) = \begin{cases} 1 - \dfrac{a}{tf_{k_i,t_j}} & \text{if } k_i \in t_j \\ 0 & \text{otherwise} \end{cases}$$
    [Figure: $F(k_i,t_j)$ as a function of $tf_{k_i,t_j}$ (0 to 10) for a = 0.1, 0.25, 0.5, 0.75 and 1]
  • 17. Computing conditional probabilities: Hashtag
    $$H(k_i,t_j) = \begin{cases} 1 - \dfrac{b}{tf_{\#k_i,t_j}} & \text{if } \#k_i \in t_j \\ b & \text{otherwise} \end{cases}$$
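    The term frequency and hashtag components can be sketched as two small functions. The sketch assumes the reading F = 1 - a/tf when the term occurs (0 otherwise) and H = 1 - b/tf_# when the term occurs as a hashtag (b otherwise); this reading and the function names are assumptions made for illustration. The defaults a = 0.25 and b = 0.4 are the values listed with the BNTS configuration.

```python
def term_frequency_feature(tf, a=0.25):
    """F(k_i, t_j): 0 when the term is absent, otherwise 1 - a / tf (assumed form)."""
    if tf <= 0:
        return 0.0
    return 1.0 - a / tf

def hashtag_feature(hashtag_tf, b=0.4):
    """H(k_i, t_j): b when the term is not a hashtag of the tweet, otherwise 1 - b / tf (assumed form)."""
    if hashtag_tf <= 0:
        return b
    return 1.0 - b / hashtag_tf

# F grows toward 1 as the term is repeated; a controls how fast.
for tf in (0, 1, 2, 5):
    print(tf, term_frequency_feature(tf), hashtag_feature(tf))
```

    With these forms a single occurrence already earns most of the credit, which suits tweets where terms rarely repeat.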
  • 18. Computing conditional probabilities: Time magnitude
    $$T(k_i,t_j) = \frac{df_{k_i,\Theta_j}}{|\Theta_j|}, \qquad \Theta_j = \{\, t_k : |\theta_{t_j} - \theta_{t_k}| \le \Delta t \,\}$$
    [Figure: number of tweets over time, with tweets t1 and t2 marked]
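    One possible reading of the time magnitude feature is the share of tweets published within a window of width Δt around the tweet that contain the term. The sketch below implements that reading; the data layout (a list of timestamp and term-set pairs), the function name and the inclusive window bound are assumptions made for illustration. The default window of one hour matches the Δt listed with the BNTS configuration.

```python
from datetime import datetime, timedelta

def time_magnitude(term, tweet_time, corpus, delta=timedelta(hours=1)):
    """Share of tweets within +/- delta of tweet_time that contain the term."""
    window = [terms for ts, terms in corpus if abs(ts - tweet_time) <= delta]
    if not window:
        return 0.0
    return sum(term in terms for terms in window) / len(window)

# Toy corpus of (timestamp, set of terms); values are illustrative only.
t0 = datetime(2011, 1, 28, 12, 0)
corpus = [
    (t0, {"tahrir"}),
    (t0 + timedelta(minutes=30), {"tahrir", "egypt"}),
    (t0 - timedelta(minutes=45), {"sms"}),
    (t0 + timedelta(hours=3), {"tahrir"}),  # outside the one-hour window
]
magnitude = time_magnitude("tahrir", t0, corpus)
print(magnitude)
```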
  • 19. Computing conditional probabilities: Tweet length
    $$L(t_j) = \frac{1}{1 + |avgtl - tl_{t_j}|}$$
  • 20. Computing conditional probabilities: Microblogger
    $$P(t_j \mid u_k) = \frac{1}{|u_k|}$$
  • 21. Computing conditional probabilities: Social influence
    $$P(u_k) = Inf(u_k)$$
    PageRank on the retweet social network:
    $$Inf_G^{k}(u_i) = d\,\frac{1}{|U|} + (1-d) \sum_{u_j,\, e(u_j,u_i) \in E} w_{j,i}\, \frac{Inf_G^{k-1}(u_j)}{O(u_j)}$$
    Edge weight: w_{j,i} = […](u_j) […] […](u_j) / […](u_j)
  • 22. Computing conditional probabilities: Social influence
    Edge weight: w_{i,j} = […](u_j) […] […](u_i) / […](u_i)
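    The social influence score is described as PageRank on the retweet social network. A minimal power-iteration sketch follows, assuming the update Inf(u_i) = d/|U| + (1 - d) Σ w_{j,i} Inf(u_j)/O(u_j) over incoming edges, with O(u_j) taken as the out-degree of u_j. Edge weights are supplied by the caller, since their definition is not recoverable here. The function name, the edge direction convention and the fixed iteration count are illustrative assumptions; d = 0.15 is the value listed with the BNTS configuration.

```python
def retweet_influence(users, edges, weights, d=0.15, iterations=50):
    """Weighted PageRank-style influence by power iteration.

    users:   list of user ids
    edges:   list of (source, target) pairs of the retweet network
    weights: dict mapping (source, target) to the edge weight w
    """
    n = len(users)
    out_degree = {u: 0 for u in users}
    for src, _dst in edges:
        out_degree[src] += 1
    influence = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        # Every user receives the uniform jump share d / |U|.
        updated = {u: d / n for u in users}
        # Each edge passes on a weighted share of its source's influence.
        for src, dst in edges:
            updated[dst] += (1 - d) * weights[(src, dst)] * influence[src] / out_degree[src]
        influence = updated
    return influence

# Two users who retweet each other end up with equal influence.
pair = retweet_influence(["a", "b"], [("a", "b"), ("b", "a")],
                         {("a", "b"): 1.0, ("b", "a"): 1.0})
print(pair)
```

    Users without outgoing edges leak influence in this simple form; a production implementation would redistribute that mass.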
  • 23. Experimental evaluation: TREC 2011 Microblog, NESTOR Microblog Search Engine
    – Tweets: 16 141 812
    – Microbloggers: 5 356 432
    – Retweets: 1 128 179
    – Retweet relationships: 1 060 551
    – Tweet: 1 860 112
    – Social network of retweets, nodes: 5 495 081
    – Terms: 7 781 775
    – Social network of retweets, edges: 1 024 914
    – Hashtags: 455 179
    – Giant component: 11.12%
    [Figure: term frequency, hashtags and tweet length distributions]
  • 24. Experimental evaluation: Queries and ground truth
    • “Arab Spring” query dataset (25 queries)
      – Topical: “Number of protesters in Tahrir”, “Tunisian revolution”
      – Temporal: “ElBaradei arrives in Egypt”, “Clashes in Tahrir”, “SMS Down Egypt”
      – Social: “Wael Ghonim”, “Mubarak dissolves government”
    • User rating (relevant, not relevant)
    • Tweets ranked by score; evaluation with p@10 and p@20
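    Precision at k (p@k) is the fraction of the top k ranked tweets that users rated relevant. A minimal sketch, with a hypothetical function name and toy tweet ids:

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the first k ranked items that are in the relevant set."""
    top = ranked_ids[:k]
    return sum(1 for item in top if item in relevant_ids) / k

# Toy ranking: two of the first four tweets are relevant.
ranking = ["t1", "t2", "t3", "t4", "t5"]
relevant = {"t1", "t3", "t5"}
p_at_4 = precision_at_k(ranking, relevant, 4)
print(p_at_4)
```

    Dividing by k rather than by the number of returned tweets penalizes rankings shorter than k, which is the usual convention.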
  • 25. Experimental evaluation: Configurations and baselines
    – BNTS: Bayesian network model for tweet search*
    – BNTS-L: BNTS, tweet length feature disabled
    – BNTS-T: BNTS, time magnitude feature disabled
    – BNTS-H: BNTS, hashtag feature disabled
    – BNTS-S: BNTS, social influence feature disabled
    – BM25: Okapi BM25
    – VSM: Vector Space Model
    – BM: Boolean Model
    * λ = 0.25, a = 0.25, b = 0.4, Δt = 1 h, d = 0.15
  • 26. Experimental evaluation: Features impact
    [Bar chart: p@10 and p@20 for BNTS, BNTS-L, BNTS-T, BNTS-H and BNTS-S. Bar values: 0.584, 0.58, 0.552, 0.532, 0.548, 0.542, 0.528, 0.502, 0.294, 0.256]
  • 27. Experimental evaluation: Features impact, topical queries
    [Bar chart: p@10 and p@20 for BNTS, BNTS-L, BNTS-T, BNTS-H and BNTS-S. Bar values: 0.7533, 0.7333, 0.7233, 0.66, 0.6867, 0.6867, 0.6833, 0.6833, 0.3767, 0.2867]
  • 28. Experimental evaluation: Features impact, temporal queries
    [Bar chart: p@10 and p@20 for BNTS, BNTS-L, BNTS-T, BNTS-H and BNTS-S. Bar values: 0.4333, 0.4, 0.3333, 0.35, 0.3, 0.3167, 0.2333, 0.2, 0.1, 0.0667]
  • 29. Experimental evaluation: Features impact, social queries
    [Bar chart: p@10 and p@20 for BNTS, BNTS-L, BNTS-T, BNTS-H and BNTS-S. Bar values: 0.3714, 0.3286, 0.3286, 0.3286, 0.3357, 0.2714, 0.2857, 0.2429, 0.2571, 0.2]
  • 30. Experimental evaluation: Retrieval effectiveness
    Model   p@10       Improvement   p@20       Improvement
    BNTS    0.552                    0.548
    BM25    0.576      -4%           0.494      11%
    BM      0.416 **   33%           0.382 **   34%
    VSM     0.376 **   47%           0.36 **    52%
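    The percentages next to each baseline appear to be the relative change of BNTS with respect to that baseline. The slide does not state the formula, so this reading is an assumption. The sketch below applies it to the p@10 column.

```python
# p@10 values as reported on the slide.
bnts_p10 = 0.552
baselines_p10 = {"BM25": 0.576, "BM": 0.416, "VSM": 0.376}

def relative_change(model, baseline):
    """Relative change of the model over the baseline, in percent."""
    return 100.0 * (model - baseline) / baseline

changes = {name: round(relative_change(bnts_p10, value))
           for name, value in baselines_p10.items()}
print(changes)
```

    Under this reading, BNTS is about 4% below BM25 at p@10 and well above the Boolean and vector space baselines.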
  • 31. A Bayesian network retrieval model for tweet search: Conclusion and future work
    • Tweet search model
      – Normalized term frequency
      – Time magnitude
      – Social influence
    • Relevance factors are integrated within a Bayesian network
    • The query profile affects how much each feature contributes to performance
    • Our model outperforms traditional IR baselines
    • Future work
      – Automatically detect the optimal time window
      – Select the appropriate features depending on the query profile
  • 32. Thank you for your attention! Follow me on Twitter! http://twitter.com/amjedbj
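  • 33. Computing conditional probabilities: Query evaluation
    $$P(t_j \mid q) \propto \sum_{\vec{k}} P(q \mid \vec{k})\, P(t_j \mid \vec{k})\, P(\vec{k})$$
    $$P(t_j \mid q) \propto \sum_{\vec{k}} P(q \mid \vec{k})\, P(tk_j \mid \vec{k})\, P(to_j \mid \vec{k})\, P(ts_j \mid \vec{k})\, P(\vec{k})$$
    [Figure: network with nodes q; k1, k2, k3; o1, o2; u1, u1; tk1, tk2, tk3; to3, to2, to3; ts1, ts2, ts3; t1, t2, t3]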
  • 34. Experimental evaluation: Term frequency normalization
    • BNTS.K
    [Figure: p@30 (axis from 0 to 0.35) as a function of the normalization parameter (0 to 1). The accompanying formula defines $P(tk_j \mid \vec{k})$ from the term frequencies $tf_{k_i,t_j}$ of the terms $k_i \in \vec{k}$, $k_i \in t_j$.]
  • 35. Experimental evaluation: Time window
    • BNTS.KO
    $$o_e : \left[ \theta_{o_e} - \frac{\Delta t}{2},\ \theta_{o_e} + \frac{\Delta t}{2} \right]$$
    [Figure: p@30 (axis from 0.29 to 0.32) as a function of $\Delta t$ in days (0 to 17)]
  • 36. Experimental evaluation: Retrieval effectiveness
    [Bar chart: p@30 and MAP (axis from 0 to 0.5) for isiFDL, DFReeKLIM30, BNTS, Median, Nestor and BM25 Disjunctive]
  • 37. Experimental evaluation: TREC Microblogs 2011
                 Ranked by time                      Ranked by score
                 All rel           High rel          All rel
                 p@30     MAP      p@30     MAP      p@30     MAP
    Nestor*      0.2027   0.1305   0.0838   0.1287   0.2218   0.1384
    Nestor-S*    0.2027   0.1305   0.0838   0.1286   0.2184   0.1360
    Nestor-T     0.2082   0.1343   0.0585   0.0912   0.1912   0.1196
    Nestor-L     0.2048   0.1306   0.0565   0.0867   0.2293   0.1426
    Median (p@30, MAP, p@30, MAP): 0.2592, 0.1433, 0.2646, 0.1381
  • 38. Experimental evaluation: TREC Microblogs 2011
    Rank  System                              Threshold  p@10    p@20    p@30    MAP
    1     Sum of IDF of present terms         30         0.3633  0.3316  0.3333  0.1759
    2     BM25                                30         0.3571  0.3245  0.2973  0.1546
    3     Proportion of present terms         30         0.2653  0.2561  0.2782  0.14
    4     Sum of Boolean frequencies          30         0.2571  0.2663  0.2755  0.1387
    5     EBM (AND)                           30         0.3041  0.2918  0.2714  0.1282
    6     Bayesian inference network          30         0.302   0.2888  0.2687  0.1274
    7     Sum of TF*IDF                       30         0.302   0.2888  0.2687  0.1274
    8     VSM                                 30         0.302   0.2888  0.2687  0.1274
    9     Sum of TF                           30         0.2327  0.2276  0.2238  0.1066
    10    Nestor                                         0.2857  0.2347  0.2027  0.1305
    11    EBM (OR)                            30         0.1837  0.1786  0.166   0.0541
    12    Sum of hashtag frequencies          30         0.1612  0.1541  0.1469  0.0512
    13    Lucene-Baseline                     1000       0.1612  0.1143  0.0986  0.1411
    14    Sum of TF (length-normalized)       30         0.0816  0.0673  0.0612  0.0223
    15    Reverse chronological order         30         0.0184  0.0255  0.0218  0.0082