Fast matrix primitives for ranking, communities and more.

David F. Gleich
Computer Science, Purdue University
Netflix

Previous work
Models and algorithms for high-performance matrix and network computations.

Massive matrix computations
SC '05, WAW '07, SISC '10, WWW '10, ...

Fast & scalable network centrality
Ax = b,   min ||Ax − b||,   Ax = λx
SIMAX '09, SISC '11, MapReduce '11, ICASSP '12

Network alignment with matrix methods
Previous work from the PI tackled network alignment: match graphs A and B through an alignment L by matching triangles (an edge (i, j) in A overlapping an edge (i', j') in B) using tensor methods on multi-threaded and distributed architectures.
ICDM '09, SC '11, TKDE '13

Tensor eigenvalues and a power method
maximize  Σ_{ijk} T_{ijk} x_i x_j x_k   subject to  ||x||_2 = 1
[x^(next)]_i = ρ · ( Σ_{jk} T_{ijk} x_j x_k + x_i ),  where ρ ensures the 2-norm constraint
(the SSHOPM method due to Kolda and Mayo)

Big data methods and data clustering
WSDM '12, KDD '12, CIKM '13, ...

The talk ends, you believe whatever you want to.
Everything in the world can be explained by a matrix, and we see how deep the rabbit hole goes.
(Image from rockysprings, deviantart, CC share-alike.)

Matrix computations in a red-pill
Solve a problem better by exploiting its structure!

Problem 1 – (Faster)
Recommendation as link prediction
Pairwise scores give user similarity.
Top-k predicted "links" are movies to watch!

Problem 2 – (Better)
Best movies

Matrix computations in a red-pill
Solve a problem better by exploiting its structure!

Matrix structure

Problem 1: the Netflix graph
Movies "liked" (>3 stars?) connect users to movies.
Adjacency matrix, normalized Laplacian matrix, random walk matrix.

Problem 2: the Netflix matrix
Ratings (1-5 stars) yield a pairwise comparison matrix.

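A minimal sketch (Python/SciPy) of these three matrix views; it treats the "liked" relationships as a plain undirected graph, and the edge list and variable names are illustrative assumptions rather than data from the talk:

import numpy as np
import scipy.sparse as sp

# Hypothetical undirected "liked" graph on nodes 0..n-1 (users and movies).
edges = [(0, 3), (0, 4), (1, 3), (2, 4), (2, 5)]
n = 6

rows, cols = zip(*edges)
A = sp.coo_matrix((np.ones(len(edges)), (rows, cols)), shape=(n, n))
A = (A + A.T).tocsr()                            # adjacency matrix (symmetric)

d = np.asarray(A.sum(axis=1)).ravel()            # degrees
Dinv = sp.diags(1.0 / d)
Dinv_sqrt = sp.diags(1.0 / np.sqrt(d))

P = Dinv @ A                                     # random walk matrix (row stochastic)
L = sp.identity(n) - Dinv_sqrt @ A @ Dinv_sqrt   # normalized Laplacian
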
Problem 1 – (Faster)
Recommendation as link prediction
Pairwise scores give user similarity.
Top-k predicted "links" are movies to watch!

Matrix based link predictors

The Katz score (edge-based) is

  pred. on movie = Σ_{ℓ=1}^∞ α^ℓ · ( num. paths of length ℓ from user to movie )

so the movie prediction vector is

  k = Σ_{ℓ=1}^∞ α^ℓ A^ℓ e_i,   where e_i is the user indicator vector.

Matrix based link predictors

  k = Σ_{ℓ=1}^∞ α^ℓ A^ℓ e_i,   where e_i is the user indicator vector.

By the Neumann series (Carl Neumann),

  Σ_{k=0}^∞ (αA)^k = (I − αA)^{-1},

so the Katz vector solves  (I − αA) k = e_i.

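To make the equivalence concrete, here is a hedged sketch (Python/SciPy) that computes Katz scores both ways, by a truncated Neumann series and by the linear solve; the random graph and the choice of α are assumptions for illustration only:

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

rng = np.random.default_rng(0)
n = 200
A = sp.random(n, n, density=0.02, random_state=rng)
A = ((A + A.T) > 0).astype(float)           # symmetric 0/1 adjacency matrix

alpha = 0.9 / spla.norm(A, 1)               # keep alpha * ||A|| < 1 so the series converges
e_i = np.zeros(n); e_i[0] = 1.0             # indicator vector of the user node

# Truncated Neumann series: k ~= sum_{ell=1..50} alpha^ell A^ell e_i
k_series = np.zeros(n)
term = e_i.copy()
for _ in range(50):
    term = alpha * (A @ term)
    k_series += term

# Direct solve of (I - alpha A) k = e_i, then drop the ell = 0 term (e_i itself)
k_solve = spla.spsolve(sp.identity(n, format="csc") - alpha * A.tocsc(), e_i) - e_i

print(np.max(np.abs(k_series - k_solve)))   # small once the series has converged
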
Matrix based link predictors

  Katz                       (I − αA) k = e_i
  PageRank                   (I − αP) x = e_i
  Semi-supervised learning   (I − αL) x = e_i
  Heat kernel                exp{αP} x = e_i

They all look at sums of damped paths, but change the details slightly.

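A hedged sketch (Python/SciPy) of the PageRank and heat-kernel variants for the same kind of adjacency matrix as above; the operators are written exactly as on the slide, and the row- versus column-stochastic convention and any normalization details are glossed over as assumptions of this sketch:

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def pagerank_scores(A, i, alpha=0.5):
    """Solve (I - alpha P) x = e_i with P the random walk matrix D^{-1} A."""
    n = A.shape[0]
    d = np.asarray(A.sum(axis=1)).ravel()
    P = sp.diags(1.0 / d) @ A
    e_i = np.zeros(n); e_i[i] = 1.0
    return spla.spsolve(sp.identity(n, format="csc") - alpha * P.tocsc(), e_i)

def heat_kernel_scores(A, i, alpha=0.5):
    """Compute exp{alpha P} e_i without forming the dense matrix exponential."""
    n = A.shape[0]
    d = np.asarray(A.sum(axis=1)).ravel()
    P = sp.diags(1.0 / d) @ A
    e_i = np.zeros(n); e_i[i] = 1.0
    return spla.expm_multiply(alpha * P.tocsc(), e_i)
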
Matrix based link predictors are localized!

PageRank scores for one node, on a crawl of flickr from 2006: ~800k nodes, 6M edges, alpha = 1/2.
(Figure: plot(x) of the PageRank vector over the ~8 x 10^5 nodes, and the error ||x_true − x_nnz||_1 as a function of the number of nonzeros retained.)

Matrix based link predictors are localized!

Katz scores are localized: up to 50 neighbors is 99.65% of the total mass.

Matrix computations in a red-pill
Solve a problem better by exploiting its structure!

How do we compute them fast?

PageRank:   x_j = α · Σ_{i neigh. of j} x_i / deg(i)  + 1 if j is the target user

PageRankPull (w/ access to in-links & degrees)
Solve for x_j^(k+1) directly from j's in-neighbors; for the blue node j with neighbors a, b, c:

  x_j^(k+1) = α x_a^(k)/6 + α x_b^(k)/2 + α x_c^(k)/3 = f_j
  (in general, x_j^(k+1) = α Σ_{i→j} x_i^(k) / deg(i) = f_j)

PageRankPush (w/ access to out-links)
Keep a residual r alongside the solution x. Let j = blue node; then

  x_j^(k+1) = x_j^(k) + r_j^(k)
  r_j^(k+1) = 0

and update the residuals of j's neighbors a, b, c:

  r_a^(k+1) = r_a^(k) + α r_j^(k)/3
  r_b^(k+1) = r_b^(k) + α r_j^(k)/3
  r_c^(k+1) = r_c^(k) + α r_j^(k)/3

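As a concrete illustration of the push formulation, here is a minimal sketch (Python) of a queue-based push solver on an adjacency-list graph; the queue scheduling, tolerance, and toy graph are assumptions of this sketch, not the exact algorithm analyzed in the talk:

from collections import defaultdict, deque

def pagerank_push(adj, seed, alpha=0.5, tol=1e-8):
    """Approximately solve x_j = alpha * sum_{i -> j} x_i / deg(i) + [j == seed]
    by repeatedly 'pushing' residual mass along out-links."""
    x = defaultdict(float)
    r = defaultdict(float)
    r[seed] = 1.0
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        ru = r[u]
        if ru <= tol:
            continue
        x[u] += ru                       # move the residual into the solution
        r[u] = 0.0
        push = alpha * ru / len(adj[u])  # spread alpha * r_u over u's out-links
        for v in adj[u]:
            old = r[v]
            r[v] += push
            if old <= tol < r[v]:        # v newly has enough residual to process
                queue.append(v)
    return x

# Tiny assumed graph: node 'j' links to 'a', 'b', 'c', which link back.
adj = {'j': ['a', 'b', 'c'], 'a': ['j'], 'b': ['j'], 'c': ['j']}
scores = pagerank_push(adj, seed='j', alpha=0.5)
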
We have good theory for this algorithm ...
... and even better empirical performance.

Theory

Andersen, Chung, Lang (2006): for PageRank, "fast runtimes" and "localization".
Bonchi, Esfandiar, Gleich, et al. (2010/2013): for Katz, "fast runtimes".
Kloster, Gleich (2013): for Katz and the heat kernel, "fast runtimes" and "localization" (assuming power-law degrees).

Accuracy vs. work (heat kernel)

For the dblp collaboration graph (dblp-cc, 225k vertices), we study the precision in finding the 100 largest nodes as we vary the work, measured in effective matrix-vector products. This set of 100 does not include the node's immediate neighbors. (One column, but representative.)
(Figure: precision@10, @25, @100, and @1000 versus effective matrix-vector products, at tolerances 10^-4 and 10^-5.)

Empirical runtime (Katz)

Never got to try it: the plan was to test on HelloMovies.com analytics (I collaborate with the company behind HelloMovies). We ran out of money once we had the algorithms ... promising initial results, though!

Problem 2 – (Better)
Best movies

Which is a better list of good DVDs?

Standard rank aggregation (the mean rating):
  Lord of the Rings 3: The Return of ...
  Lord of the Rings 1: The Fellowship
  Lord of the Rings 2: The Two Towers
  Lost: Season 1
  Battlestar Galactica: Season 1
  Fullmetal Alchemist
  Trailer Park Boys: Season 4
  Trailer Park Boys: Season 3
  Tenchi Muyo!
  Shawshank Redemption

Nuclear Norm based rank aggregation (not matrix completion on the Netflix rating matrix):
  Lord of the Rings 3: The Return of ...
  Lord of the Rings 1: The Fellowship
  Lord of the Rings 2: The Two Towers
  Star Wars V: Empire Strikes Back
  Raiders of the Lost Ark
  Star Wars IV: A New Hope
  Shawshank Redemption
  Star Wars VI: Return of the Jedi
  Lord of the Rings 3: Bonus DVD
  The Godfather

Rank Aggregation

Given partial orders on subsets of items, rank aggregation is the problem of finding an overall ordering.

Voting: find the winning candidate.
Program committees: find the best papers given reviews.
Dining: find the best restaurant in Chicago.

Ranking is really hard
(John Kemeny, Ken Arrow)

All rank aggregations involve some measure of compromise.
A good ranking is the "average" ranking under a permutation distance.
Kemeny's ranking is NP-hard to compute (Dwork, Kumar, Naor, Sivakumar).

Suppose we had scores

Let s_i be the score of the ith movie/song/paper/team to rank.
Suppose we can compare the ith to the jth:  Y_ij = s_i − s_j.
Then Y = s e^T − e s^T is skew-symmetric and rank 2.
Also works for ratios s_i / s_j with an extra log.

Numerical ranking is intimately intertwined with skew-symmetric matrices.
Kemeny and Snell, Mathematical Models in the Social Sciences (1978)

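A tiny worked check (Python/NumPy) of the rank-2 skew-symmetric structure; the score vector is an arbitrary assumed example:

import numpy as np

s = np.array([3.0, 1.0, 4.0, 1.5])   # assumed scores for four items
e = np.ones_like(s)
Y = np.outer(s, e) - np.outer(e, s)  # Y_ij = s_i - s_j

print(np.allclose(Y, -Y.T))          # skew-symmetric: True
print(np.linalg.matrix_rank(Y))      # rank 2
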
Using ratings as comparisons

Ratings induce various skew-symmetric matrices, e.g., the arithmetic mean and the log-odds.
From David (1988), The Method of Paired Comparisons.

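As a sketch of one such construction (an assumption of this example, not necessarily the exact definition used in the talk): aggregate, over users who rated both movies, the mean rating difference, which yields a skew-symmetric pairwise matrix:

import numpy as np

# Assumed tiny ratings matrix: rows = users, cols = movies, NaN = not rated.
R = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 2.0],
              [np.nan, 4.0, 5.0],
              [5.0, 2.0, 4.0]])

n_users, n_movies = R.shape
Y = np.full((n_movies, n_movies), np.nan)  # pairwise comparison matrix
C = np.zeros((n_movies, n_movies), int)    # number of users comparing each pair

for i in range(n_movies):
    for j in range(n_movies):
        both = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])
        if both.any():
            C[i, j] = both.sum()
            Y[i, j] = np.mean(R[both, i] - R[both, j])  # mean rating difference

# Y is skew-symmetric on its observed entries: Y[i, j] == -Y[j, i]
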
Extracting the scores

Given Y with all entries, then s = (1/n) Y e is the Borda count, the least-squares solution to  minimize_s ||Y − (s e^T − e s^T)||.

How many Y_ij do we have? Most. The Netflix data has 17k movies, 500k users, and 100M ratings; the pairwise comparison matrix is 99.17% filled.
Do we trust all Y_ij? Not really.
(Figure: number of movie pairs, up to 10^7, versus number of comparisons, 10^1 to 10^5.)

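A quick numerical check (Python/NumPy) that the row mean of a complete Y recovers the centered scores; Y is built from an assumed score vector:

import numpy as np

s = np.array([3.0, 1.0, 4.0, 1.5])
s = s - s.mean()                 # center the scores
e = np.ones_like(s)
Y = np.outer(s, e) - np.outer(e, s)

s_borda = Y @ e / len(s)         # s = (1/n) Y e, the Borda count
print(np.allclose(s_borda, s))   # True: the least-squares solution
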
Only partial info? Complete it!

Let Y_ij be known for (i, j) in Ω. We trust these scores.

Goal: find the simplest skew-symmetric matrix that matches the data, i.e., minimize the rank of X subject to
  X_ij = Y_ij on Ω   (noiseless), or
  X_ij ≈ Y_ij on Ω   (noisy).

Both of these are NP-hard too.

Solution: GO NUCLEAR!
(From a French nuclear test in 1970; image from http://picdit.wordpress.com/2008/07/21/8insane-nuclear-explosions/)

The Ranking Algorithm

0. INPUT: ratings data and c (a trust threshold on comparisons)
1. Compute the pairwise comparison matrix Y from the ratings
2. Discard entries with fewer than c comparisons
3. Set Ω to be the indices and values of what's left
4. Run X = SVP(Ω)
5. OUTPUT: the scores extracted from X

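A hedged sketch (Python/NumPy) of the completion step via singular value projection (SVP): take a gradient step on the observed entries and project back onto rank-2 matrices. This is a generic SVP loop under assumed parameters (step size, iteration count, random sampling), not the implementation behind the results in the talk:

import numpy as np

def svp_rank2(Y_obs, mask, n, step=1.0, iters=1000):
    """Complete a skew-symmetric matrix from entries Y_obs observed where mask is True,
    by singular value projection onto rank-2 matrices."""
    X = np.zeros((n, n))
    for _ in range(iters):
        G = np.where(mask, X - Y_obs, 0.0)     # residual on observed entries only
        U, sing, Vt = np.linalg.svd(X - step * G)
        X = (U[:, :2] * sing[:2]) @ Vt[:2, :]  # best rank-2 approximation
    return X

# Assumed example: observe a random symmetric pattern of entries of Y = s e^T - e s^T.
rng = np.random.default_rng(1)
n = 30
s = rng.normal(size=n); s -= s.mean()
Y = np.outer(s, np.ones(n)) - np.outer(np.ones(n), s)
mask = np.triu(rng.random((n, n)) < 0.5, 1)
mask = mask | mask.T                           # observe (i, j) and (j, i) together
X = svp_rank2(np.where(mask, Y, 0.0), mask, n)
s_hat = X @ np.ones(n) / n                     # recover centered scores from X
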
Exact recovery results

David Gross showed how to recover Hermitian matrices, i.e., the conditions under which we get exact recovery. Note that ıY is Hermitian, so we can adapt his result. Consider the operator basis for Hermitian matrices H = S ∪ K ∪ D, where

  S = { (1/√2)(e_i e_j^T + e_j e_i^T) : 1 ≤ i < j ≤ n },
  K = { (ı/√2)(e_i e_j^T − e_j e_i^T) : 1 ≤ i < j ≤ n },
  D = { e_i e_i^T : 1 ≤ i ≤ n }.

Thus our new result:

Theorem 5. Let s be centered, i.e., s^T e = 0. Let Y = s e^T − e s^T, where θ = max_i s_i^2 / (s^T s) and ρ = ((max_i s_i) − (min_i s_i)) / ||s||. Also, let Ω ⊂ H be a random set of elements with size |Ω| ≥ O(2nν(1 + β)(log n)^2), where ν = max((nθ + 1)/4, nρ^2). Then the solution of

  minimize ||X||_*   subject to   trace(X W_i) = trace((ıY) W_i),  W_i ∈ Ω

is equal to ıY with probability at least 1 − n^{−β}.

The proof follows directly from Theorem 4 with Y = s e^T − e s^T. We view this theorem as providing intuition for the noisy problem.
(Figure: fraction of trials with exact recovery.)

Recovery Discussion and Experiments

Confession: if ..., then one can just look at differences from a connected set. Constants? Not very good.
Intuition for the truth.

Recovery Experiments

Evaluation

(Figure 3: median Kendall's tau versus error for the nuclear norm ranking (left, the performance of our algorithm) and the mean rating (right); curves labeled 20, 10, 5, 2, 1.5.)

Tie in with PageRank

Another way to compute the scores is through a close relative of PageRank and the link-prediction methods: the Massey or Colley methods,

  (2I + D − A) s = "differences"
  (L + 2D^{-1}) x = "scaled differences"

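A hedged sketch (Python/NumPy) of a Colley-style solve of the first system; here A counts how many users compared each pair of movies, D = diag(A·1), and the right-hand side sums the rating differences. These input definitions are assumptions of the sketch (reusing the C and Y summaries from the ratings sketch above), not necessarily the exact quantities used in the talk:

import numpy as np

def colley_style_scores(C, Y):
    """Solve (2I + D - A) s = b, with A the pairwise comparison counts,
    D = diag(A 1), and b the summed rating differences per movie."""
    n = C.shape[0]
    A = C.astype(float)
    np.fill_diagonal(A, 0.0)
    D = np.diag(A.sum(axis=1))
    b = (np.nan_to_num(Y) * A).sum(axis=1)   # summed rating differences
    return np.linalg.solve(2 * np.eye(n) + D - A, b)
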
Ongoing Work

Finding communities in large networks
We have the best community finder (as of CIKM 2013): Whang, Gleich, Dhillon (CIKM).

Fast clique detection
We have the fastest solver for max-clique problems, useful for computing temporal strong components: Rossi, Gleich, et al. (arXiv).

Scalable network alignment

& Low-rank clustering with features + links
& Evolving network analysis
& Scalable, distributed implementations of fast graph kernels

References

Papers
Gleich & Lim, KDD 2011 – Nuclear Norm Ranking
Esfandiar, Gleich, Bonchi, et al. – WAW 2010; J. Internet Math. 2013
Kloster & Gleich, WAW 2013, arXiv:1310.3423

Code
www.cs.purdue.edu/homes/dgleich/codes
bit.ly/dgleich-code

Supported by NSF CAREER 1149756-CCF
www.cs.purdue.edu/homes/dgleich