35th Annual International ACM SIGIR Conference on Research
                    and Development in Information Retrieval (SIGIR 2012)



                        Explicit Relevance Models
                   in Intent-Aware IR Diversification
                           Saúl Vargas, Pablo Castells and David Vallet
                               Universidad Autónoma de Madrid
                                        http://ir.ii.uam.es

                                         Portland, OR, 13 August 2012




IRG
                                           Explicit Relevance Models in Intent-Aware IR Diversification
                          35th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2012)
IR Group @ UAM                                             Portland, OR, 13 August 2012
Outline


            Context: IR diversification formulation and algorithms

            Proposed approach: relevance-based reformulation
                 of diversification algorithms

            Experiments

            Adjustable tolerance to redundancy

            Conclusion




IR diversity – Brief recap

   Diversity as a means to address uncertainty in user queries
     – The same query may have different intents or aspects in the
       information need underneath
   Revision of document relevance independence
     – Marginal utility of additional relevant documents decreases fast
   Trade diminishing marginal utility for increased intent coverage
     – Thus maximize the number of users who obtain at least some
       useful document

   Example intents (image labels): Nutrition / Health, Appliance, Chemical element, Golf, Mining / Metallurgy



IR diversification – Problem statement

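   Given a query $q$ on a collection $\mathcal{D}$
   Find $S \subset \mathcal{D}$ of given size maximizing (NP-hard):
     $$p(\text{some } d \in S \text{ relevant} \mid q)$$
   Agrawal 2009, Santos 2010, Chen 2006, …

   Greedy approximation: starting from the baseline ranking $R$ (ordered by $p(d \mid q)$),
   repeatedly move
     $$\arg\max_{d \in R - S} \varphi(d, S \mid q)$$
   from $R - S$ to the end of the diversified ranking $S$, where

     $$\varphi(d, S \mid q) \propto p(d \text{ is relevant} \wedge \text{no } d' \in S \text{ is relevant} \mid q)$$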
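
The greedy approximation can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: `greedy_diversify`, `toy_phi` and the toy data are hypothetical names and values, and the objective `phi` is passed in as a function so that any of the instantiations discussed in this talk could be plugged in.

```python
from typing import Callable, Hashable, List, Sequence


def greedy_diversify(
    baseline: Sequence[Hashable],
    phi: Callable[[Hashable, List[Hashable]], float],
    n: int,
) -> List[Hashable]:
    """Build S by repeatedly taking the arg max of phi(d, S) over R - S."""
    remaining = list(baseline)      # R - S, kept in baseline order
    selected: List[Hashable] = []   # S, the diversified ranking
    while remaining and len(selected) < n:
        # max() returns the first maximal item, so ties keep baseline order
        best = max(remaining, key=lambda d: phi(d, selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy objective (hypothetical): each document covers one aspect, and a
# document is worth much less once its aspect is already covered by S.
aspect = {"d1": "golf", "d2": "golf", "d3": "appliance"}
score = {"d1": 0.9, "d2": 0.8, "d3": 0.5}


def toy_phi(d, S):
    covered = {aspect[s] for s in S}
    return score[d] * (0.1 if aspect[d] in covered else 1.0)


ranking = greedy_diversify(["d1", "d2", "d3"], toy_phi, n=3)
```

With these toy values the second golf document drops below the appliance document once the first golf document has been selected.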



IR diversity – Instantiations of objective function

   State of the art aspect-based approaches

    IA-Select scheme (Agrawal 2009)

      $$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(z \mid d)\, p(d \mid q) \prod_{d' \in S} \bigl(1 - p(z \mid d')\, p(d' \mid q)\bigr)$$

    xQuAD scheme (Santos 2010)

      $$\varphi(d, S \mid q) = (1 - \lambda)\, p(d \mid q) + \lambda\, p(d, \neg S \mid q)$$
      $$= (1 - \lambda)\, p(d \mid q) + \lambda \sum_z p(z \mid q)\, p(d \mid q, z) \prod_{d' \in S} \bigl(1 - p(d' \mid q, z)\bigr)$$

    Both instantiate the general objective

      $$\varphi(d, S \mid q) \propto p(d \text{ is relevant} \wedge \text{no } d' \in S \text{ is relevant} \mid q)$$

    Components highlighted in turn on the slide:
      – Explicit query aspects: $z$
      – Query aspect coverage
      – Document “relevance” for query aspect: $p(z \mid d)\, p(d \mid q)$ in IA-Select, $p(d \mid q, z)$ in xQuAD
      – Redundancy penalization: the product over $d' \in S$
      – Mixture with baseline: $(1 - \lambda)\, p(d \mid q)$, with $\lambda$ the degree of diversification
      – Probability to observe documents: what the $p(d \mid \cdot)$ terms measure, whereas the objective is stated in terms of relevance
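
As a reading aid, the two objective functions can be written as plain Python functions over dictionaries of probability estimates. This is a sketch under stated assumptions: the function names, the dictionary layout and all numbers are hypothetical, and how the probabilities are estimated is left open.

```python
from math import prod

# Hypothetical toy estimates: two aspects, three documents.
p_z_q = {"a": 0.5, "b": 0.5}                    # p(z|q)
p_d_q = {"d1": 0.5, "d2": 0.3, "d3": 0.2}       # p(d|q)
p_z_d = {                                        # p(z|d), keyed by d then z
    "d1": {"a": 1.0, "b": 0.0},
    "d2": {"a": 1.0, "b": 0.0},
    "d3": {"a": 0.0, "b": 1.0},
}
p_d_qz = {                                       # p(d|q,z), keyed by d then z
    "d1": {"a": 0.6, "b": 0.0},
    "d2": {"a": 0.4, "b": 0.0},
    "d3": {"a": 0.0, "b": 1.0},
}


def ia_select_phi(d, S, p_z_q, p_z_d, p_d_q):
    """sum_z p(z|q) p(z|d) p(d|q) prod_{d' in S} (1 - p(z|d') p(d'|q))."""
    return sum(
        p_z_q[z] * p_z_d[d][z] * p_d_q[d]
        * prod(1.0 - p_z_d[s][z] * p_d_q[s] for s in S)
        for z in p_z_q
    )


def xquad_phi(d, S, lam, p_z_q, p_d_qz, p_d_q):
    """(1-lam) p(d|q) + lam sum_z p(z|q) p(d|q,z) prod_{d' in S} (1 - p(d'|q,z))."""
    diversity = sum(
        p_z_q[z] * p_d_qz[d][z] * prod(1.0 - p_d_qz[s][z] for s in S)
        for z in p_z_q
    )
    return (1.0 - lam) * p_d_q[d] + lam * diversity


# After d1 (aspect a) is selected, d3 (aspect b) outscores d2 (aspect a).
phi_d2 = ia_select_phi("d2", ["d1"], p_z_q, p_z_d, p_d_q)
phi_d3 = ia_select_phi("d3", ["d1"], p_z_q, p_z_d, p_d_q)
```

Setting `lam` to 0 in `xquad_phi` returns the baseline score $p(d \mid q)$ unchanged, which is the "mixture with baseline" role of $\lambda$.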




IR diversity – Relevance-based instantiation of objective function

      $$\varphi(d, S \mid q) \propto p(d \text{ is relevant} \wedge \text{no } d' \in S \text{ is relevant} \mid q)$$

    Our proposal: use the probability of relevance $p(r \mid \cdot)$

    IA-Select scheme – relevance-based

      $$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$$

    xQuAD scheme – relevance-based

      $$\varphi(d, S \mid q) = (1 - \lambda)\, p(r \mid d, q) + \lambda\, p(r_d, \neg r_S \mid q)$$
      $$= (1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$$

      – More literal interpretation of initial problem statement
      – The two schemes are equivalent for $\lambda = 1$
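
The equivalence at $\lambda = 1$ is easy to check numerically. The sketch below is illustrative only: the function names and the relevance estimates are hypothetical, and the relevance-based xQuAD function simply reuses the relevance-based IA-Select sum as its diversity term.

```python
from math import prod

# Hypothetical relevance estimates for two aspects and three documents.
p_z_q = {"a": 0.5, "b": 0.5}                    # p(z|q)
p_r_dq = {"d1": 0.45, "d2": 0.35, "d3": 0.3}    # p(r|d,q)
p_r_dqz = {                                      # p(r|d,q,z), keyed by d then z
    "d1": {"a": 0.8, "b": 0.1},
    "d2": {"a": 0.7, "b": 0.0},
    "d3": {"a": 0.0, "b": 0.6},
}


def rel_ia_select_phi(d, S, p_z_q, p_r_dqz):
    """sum_z p(z|q) p(r|d,q,z) prod_{d' in S} (1 - p(r|d',q,z))."""
    return sum(
        p_z_q[z] * p_r_dqz[d][z] * prod(1.0 - p_r_dqz[s][z] for s in S)
        for z in p_z_q
    )


def rel_xquad_phi(d, S, lam, p_z_q, p_r_dq, p_r_dqz):
    """(1-lam) p(r|d,q) + lam * (relevance-based IA-Select sum)."""
    return (1.0 - lam) * p_r_dq[d] + lam * rel_ia_select_phi(d, S, p_z_q, p_r_dqz)


# With lam = 1 the baseline term vanishes and both schemes agree.
gaps = [
    abs(rel_xquad_phi(d, ["d1"], 1.0, p_z_q, p_r_dq, p_r_dqz)
        - rel_ia_select_phi(d, ["d1"], p_z_q, p_r_dqz))
    for d in ("d2", "d3")
]
```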




Relevance distribution vs. document distribution

   $p(r \mid d, \cdot)$ vs. $p(d \mid \cdot)$ – The difference does matter (in this context)

      $$\sum_d p(d \mid q, z) = 1$$
      $$\sum_d p(r \mid d, q, z) = E[\text{number of relevant docs}] \ge 1$$

   [Figure: $p(d \mid q, z)$ and $p(r \mid d, q, z)$ over documents $d$; vertical axis from 0 to 1]

   Different potential behavior
     – E.g. stronger redundancy penalization
   Potential rank equivalences do not apply here

      $$(1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$$
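
A small numeric illustration of why the difference matters for redundancy penalization. All numbers are made up: a document distribution must sum to 1 over the result list, so each value is small and the factor $\prod (1 - \cdot)$ stays close to 1, whereas relevance probabilities are not constrained in that way.

```python
from math import prod

# Hypothetical numbers for one aspect z and a result list of 100 documents.
n_docs = 100

# p(d|q,z) is a distribution over documents: values sum to 1, so each is small.
p_d_qz = [1.0 / n_docs] * n_docs

# p(r|d,q,z) is one probability per document: values need not sum to 1.
p_r_dqz = [0.5] * 10 + [0.05] * 90


def penalty(values, k):
    """Redundancy factor prod_{d' in S} (1 - value(d')) for S = first k docs."""
    return prod(1.0 - v for v in values[:k])


doc_based = penalty(p_d_qz, 5)    # 0.99 ** 5, close to 1: weak penalization
rel_based = penalty(p_r_dqz, 5)   # 0.5 ** 5 = 0.03125: strong penalization
```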
Relevance-based greedy diversification



         Relevance-based reformulation of diversification algorithm

         1. Need to estimate $p(r \mid d, q, z)$

         2. Does it work? Test empirically

         3. Further development: parameterized tolerance to redundancy




Aspect-based relevance model

     Estimate $p(r \mid d, q, z)$

     Cannot use odds, logs, constant removal… or any other rank-preserving step
     (we need the specific values)

     $p(r \mid d, q, z)$ is built from:
       – $p(r \mid d, q)$: positional relevance $p(r \mid \text{rank}(d, q))$
       – $p(z \mid d)$, $p(z \mid q)$, $p(z)$: estimate $p(z \mid d)$ or $p(z \mid q)$ depending
         on available observations:
           • $z$ as document classes (e.g. ODP)
           • $z$ as subqueries (e.g. reformulations)
         Then derive the other two parameters
       – $p(d \mid q)$: normalized baseline IR system score
         (as in e.g. Bache 2009)
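
The slide does not say which normalization is applied to the baseline score. As one simple possibility, and not necessarily the one used by the authors or by Bache 2009, non-negative scores can be divided by their sum. The function name and the toy scores below are hypothetical.

```python
def normalize_scores(scores):
    """Map non-negative retrieval scores to a distribution over documents.

    Assumption: scores are non-negative (e.g. not raw log-likelihoods).
    """
    if any(s < 0 for s in scores.values()):
        raise ValueError("scores must be non-negative")
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("at least one score must be positive")
    return {d: s / total for d, s in scores.items()}


p_d_q = normalize_scores({"d1": 3.0, "d2": 1.0, "d3": 1.0})
```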

Positional relevance distribution estimate

      $$p(r \mid d, q) \sim p(r \mid \text{rank}(d, q)) = p(r \mid k)$$

   [Figure: $p(r \mid k)$ on a log scale (1E-05 to 1E+00) against rank $k$ (0 to 200);
    curves labeled pLSA and Lemur (precision estimates) and AOL (click log statistics)]
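
The slides describe the precision-based estimate only as "P@k estimates". One plausible reading, sketched here as an assumption rather than as the authors' procedure, is the fraction of training queries whose document at rank $k$ is judged relevant, with no smoothing. The function name and toy data are hypothetical.

```python
from typing import Dict, List, Set


def positional_relevance(
    rankings: Dict[str, List[str]],   # query id -> ranked document ids
    qrels: Dict[str, Set[str]],       # query id -> relevant document ids
    depth: int,
) -> List[float]:
    """Estimate p(r|k) for k = 1..depth as the share of queries whose
    document at rank k is relevant (index 0 holds rank 1)."""
    if not rankings:
        raise ValueError("need at least one training query")
    counts = [0] * depth
    for q, ranked in rankings.items():
        relevant = qrels.get(q, set())
        for k, d in enumerate(ranked[:depth]):
            if d in relevant:
                counts[k] += 1
    return [c / len(rankings) for c in counts]


rankings = {"q1": ["a", "b", "c"], "q2": ["d", "e", "f"]}
qrels = {"q1": {"a", "c"}, "q2": {"d"}}
p_r_k = positional_relevance(rankings, qrels, depth=3)
```

Because the diversification formulas need the literal values rather than a rank-equivalent score, an estimate like this would be looked up by the baseline rank of each document.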


Experiments
   Search diversity
          Collection: ClueWeb09 category B (50M documents)
          Query/subtopic set: TREC 2009/10 diversity task (100 queries)
          Baseline ranking: Lemur Indri search engine (Web service)
          Diversified top n: 100
          Query aspect space:
                 a) ODP categories level 4 (~7K categories)
                 b) TREC subtopics (oracle for reference)
          Specific parameter estimates:
                 $p(z \mid q)$: Uniform
                 $p(z \mid d)$:
                     ODP categories: semi-supervised text classification by Textwise
                     TREC subtopics: Indri search system run on $z$ as if a query
                 $p(r \mid k)$:
                     i. P@k estimates with TREC relevance judgments (2-fold 2009/10 cross validation)
                     ii. Click statistics from AOL log (thus different IR system)




Experiments – Search diversity on TREC

     xQuAD scheme, $p(r \mid k)$ from qrels

   [Figure: ERR-IA as a function of λ, one panel for ODP categories and one for TREC subtopics;
    two curves per panel: based on $p(r \mid d, q, z)$ and based on $p(d \mid q, z)$]
Experiments – Search diversity on TREC

   Aspect space        Diversifier                   λ     α-nDCG@20   ERR-IA@20   nDCG-IA@20   S-recall@20
   a) ODP categories   Lemur                         -     0.2587      0.1630      0.2396       0.4636
                       IA-Select                     -     0.2651      0.1681      0.2423       0.4483
                       xQuAD                         0.9   0.2675      0.1656      0.2451       0.4864
                       Rel-based xQuAD, i. Qrels     0.1   0.2858△▲    0.1828△▲    0.2655△▲     0.4898▲△
                       Rel-based xQuAD, ii. Clicks   0.4   0.2841▲△    0.1831△△    0.2605△▲     0.4830▲▽
   b) TREC subtopics   IA-Select                     -     0.3541      0.2346      0.3213       0.5787
                       xQuAD                         1.0   0.3445      0.2241      0.3127       0.5704
                       Rel-based xQuAD, i. Qrels     1.0   0.3543△△    0.2349△△    0.3192▽△     0.5782▽△
                       Rel-based xQuAD, ii. Clicks   1.0   0.3512▽△    0.2320▽△    0.3166▽△     0.5748▽△

   λ chosen by “informally” maximizing ERR-IA in 0.1 steps for each diversifier
   Best value per column shown in bold green on the slide
   ▲▼: $p < 0.05$


Experiments
   Recommendation diversity
          Dataset 1: MovieLens 1M
                 Collection: 6K users, 4K movies, 1M ratings
                 Subtopic set: 10 movie genres
          Dataset 2: Last.fm crawl
                 Collection: 1K users, 175K artists, 20M playcounts
                 Subtopic set: 120K social tags on artists by Last.fm users
          Adaptation of IR diversity paradigm (Vargas, Castells & Vallet SIGIR 2011):
                 Queries → users
                 Documents → items (movies, music artists)
                 Subtopics → item features (genres, tags)
                 Relevance judgments → test ratings from data split
          Baseline rankings:
                 a) pLSA
                 b) Popularity-based recommendation
          Diversified top n: 100
          Specific parameter estimates:
                 $p(z \mid q)$: Uniform
                 $p(z \mid d)$: Uniform on $d$ (based on binary aspect/item association)
                 $p(r \mid k)$: P@k estimates with 2-fold cross-validation on test users



Experiments – Recommendation diversity on MovieLens and Last.fm

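   [Figure: ERR-IA as a function of λ in a 2 × 2 grid of panels;
    rows: pLSA recommender, recommendation by item popularity; columns: MovieLens 1M, Last.fm;
    two curves per panel: based on $p(r \mid d, q, z)$ and based on $p(d \mid q, z)$]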

Adjustable tolerance to redundancy

      Generalization of relevance-based diversification scheme
      Formally support adjustable redundancy penalization
      Approach: generalize relevance to browsing model

      $$\varphi(d, S \mid q) = (1 - \lambda)\, p(r \mid d, q) + \lambda\, p(r_d, \neg \text{stop}_S \mid q) = \cdots$$
      $$= (1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\, p(\text{stop} \mid r)\bigr)$$

      Tolerance to redundancy is controlled by the factor $p(\text{stop} \mid r)$
      Adjustable redundancy tolerance parameter $p(\text{stop} \mid r) \in [0, 1]$
                 – High $p(\text{stop} \mid r)$ for aggressive penalization, low for e.g. high-recall searches
                 – In this view, original formulations would implicitly assume $p(\text{stop} \mid r) = 1$,
                    i.e. a single relevant document is sought
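
The effect of the stop parameter can be seen in a short sketch. Function and variable names and all numbers are hypothetical; the formula is the generalized relevance-based xQuAD objective from this slide.

```python
from math import prod

# Hypothetical relevance estimates: two aspects, two documents.
p_z_q = {"a": 0.5, "b": 0.5}                     # p(z|q)
p_r_dq = {"d1": 0.45, "d2": 0.35}                # p(r|d,q)
p_r_dqz = {                                       # p(r|d,q,z), keyed by d then z
    "d1": {"a": 0.8, "b": 0.1},
    "d2": {"a": 0.7, "b": 0.0},
}


def browsing_phi(d, S, lam, p_stop_r, p_z_q, p_r_dq, p_r_dqz):
    """(1-lam) p(r|d,q)
    + lam sum_z p(z|q) p(r|d,q,z) prod_{d' in S} (1 - p(r|d',q,z) p(stop|r))."""
    diversity = sum(
        p_z_q[z] * p_r_dqz[d][z]
        * prod(1.0 - p_r_dqz[s][z] * p_stop_r for s in S)
        for z in p_z_q
    )
    return (1.0 - lam) * p_r_dq[d] + lam * diversity


# Score of d2 after d1 (same dominant aspect) is selected, for lam = 1.
full_penalty = browsing_phi("d2", ["d1"], 1.0, 1.0, p_z_q, p_r_dq, p_r_dqz)
half_penalty = browsing_phi("d2", ["d1"], 1.0, 0.5, p_z_q, p_r_dq, p_r_dqz)
no_penalty = browsing_phi("d2", ["d1"], 1.0, 0.0, p_z_q, p_r_dq, p_r_dqz)
```

With $p(\text{stop} \mid r) = 1$ the function reduces to the relevance-based xQuAD objective, and with $p(\text{stop} \mid r) = 0$ previously selected documents no longer lower the score of redundant ones.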


Adjustable tolerance to redundancy

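   Empirical observation: $p(\text{stop} \mid r)$ vs. $\lambda$ in α-nDCG

   [Figure: two grids of α-nDCG over $\lambda$ (horizontal, 0 to 1) and $p(\text{stop} \mid r)$ (vertical, 0 to 1):
    Search task (Lemur on TREC / Subtopics) and Recommendation task (pLSA on MovieLens / Genres).
    For each $\lambda$, the best and the worst α-nDCG value of the column are marked.]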

Conclusion

    Alternative, relevance-based formulation of greedy aspect-based diversification
                 – Unifies two previous aspect-based algorithms

                 – More literal expression of formal problem statement (and metrics?)

    p(r | d, q, z) vs. p(d | q, z)
                 – Literal value estimates needed (rather than rank-equivalent approximations)

                 – Estimate based on positional relevance (relevance or click data needed)

    Seems to perform well empirically

                 – Light requirements on relevance or click data for training positional relevance

                 – Improvement trend, but needs to be tested under further optimizations

    Formal support for redundancy tolerance adjustment
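
The positional relevance estimate mentioned above can be illustrated with a short sketch. The exact estimator is not restated on this slide, so the code assumes one simple option: p(r | k) is the fraction of training rankings whose document at rank k is judged relevant (or clicked). The function name and the toy judgments are hypothetical.

```python
def positional_relevance(judged_rankings, depth):
    """Estimate p(r | k) for k = 1..depth from binary judgments.

    judged_rankings: one list per query (or user), holding True/False
    relevance labels in rank order. Positions beyond the end of a short
    ranking are counted as non-relevant.
    """
    if not judged_rankings:
        raise ValueError("at least one judged ranking is required")
    counts = [0] * depth
    for ranking in judged_rankings:
        for k, is_relevant in enumerate(ranking[:depth]):
            if is_relevant:
                counts[k] += 1
    return [c / len(judged_rankings) for c in counts]


# Hypothetical judgments for four queries, top 3 positions each.
judgments = [
    [True, False, True],
    [True, True, False],
    [False, False, False],
    [True, False, False],
]
p_r_given_k = positional_relevance(judgments, depth=3)
print(p_r_given_k)  # [0.75, 0.25, 0.25]
```

Because the diversification objective needs actual probability values rather than rank-equivalent scores, an estimate of this kind would then be looked up by the baseline rank of each document.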



