Active attacks on social networks

Active Attacks
In Social Networks
Tanasache Florin & Ragonese Alberto
Seminar in Web Security and Privacy

Why Social Networks?
❖ Human interaction and socialization have
changed along with the evolution of
technology.
❖ In particular, emotions, feelings, thoughts,
opinions can all be shared instantly by
simply pressing a button in our favourite
social network application.
«Users publish detailed
personal information about
their preferences and daily
life»

Social Networks
❖ Social networks model social relationships by graph structures
using nodes and edges.
❖ Nodes correspond to people or other social entities and edges
correspond to the social relationships between them.
❖ The world's biggest social networks:
1. Facebook (1.9 billion users)
2. WhatsApp (1.2 billion users as of February 2017)
3. Messenger (1.2 billion users as of April 2017)
4. YouTube (1 billion users)
5. WeChat/Weixin (889 million users)
6. QQ (869 million users)
7. Instagram (700 million users)
8. Qzone (638 million users)
9. Twitter (328 million users)
10. Weibo (313 million users)
Facebook properties

Model: Social Graph
[Figures: the Facebook graph and the Twitter graph]

Research on
Social Networks
❖ Digital traces of human social interactions in a wide variety of
online settings:
▪ Public data: whatever users explicitly choose to disclose.
No privacy!
▪ Sensitive data: email, phone and messaging networks.
These need privacy protection!
❖ Example:

Anonymized
Social Networks
❖ In designing studies of such systems, one needs to set up the
data to protect the privacy of individual users while preserving
the global network properties for the research studies.
❖ Anonymization: a simple procedure in which each individual’s
“name” is replaced by a random userID, but the connections
between the people are revealed.
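
The naive anonymization just described can be sketched in a few lines of Python. This is only an illustration: the function name `anonymize`, the edge-list input format and the sample names are assumptions, not part of the original slides.

```python
import random

def anonymize(edges, seed=None):
    """Replace every name with a random user ID but keep all connections."""
    rng = random.Random(seed)
    names = sorted({name for edge in edges for name in edge})
    ids = list(range(len(names)))
    rng.shuffle(ids)                      # random name -> ID assignment
    mapping = dict(zip(names, ids))
    anon_edges = {frozenset((mapping[u], mapping[v])) for u, v in edges}
    return anon_edges, mapping

# Hypothetical toy network.
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "alice"), ("carol", "dave")]
anon_edges, mapping = anonymize(edges, seed=1)
```

Only `anon_edges` would be released. The structure of the graph is untouched, which is exactly what the attacks in the following slides exploit.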

Attacks on
Anonymized Networks
Can anonymization protect users’ privacy?
❖ Identifying nodes and learning about the edge relations
among them compromises privacy:
Privacy Breach!
❖ There are two types of attack:
▪ Passive Attack: An adversary tries to learn the identities of the nodes
only after the anonymized network has been released
▪ Active Attack: An adversary tries to compromise privacy by strategically
creating new user accounts and links before the anonymized network
is released

Active Attack
❖ First step: before the anonymized network is released, while G
still has n-k nodes, the attacker:
▪ Chooses a set of b targeted users in G.
▪ Creates a subgraph H containing k new nodes.
▪ Attaches H to the targeted nodes.
❖ The secret subgraph H constructed for the attack can be thought
of as a kind of structural steganography.

Active Attack
❖ Second step: after the release of the anonymized network, the attacker:
▪ Finds the subgraph H in the graph G
▪ Follows edges from H to locate the b target nodes and their true
location in G
▪ Determines all edges among these b nodes: privacy is breached

Graph Isomorphism
❖ The construction of H succeeds, i.e. H can later be found in G, if:
i. There is no subgraph S ≠ H in G such that G[S] and G[H] are isomorphic.
ii. The subgraph H can be found efficiently, given G.
iii. The subgraph H has no non-trivial automorphism.
❖ An isomorphism between two sets of nodes P and Q in G is a one-to-one
correspondence f : P → Q that maps edges to edges and non-edges to
non-edges: two nodes u and v in P are connected if and only if the
corresponding nodes f(u) and f(v) are connected in Q.
❖ An automorphism is an isomorphism from a set of nodes to itself.
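
As a small illustration of the definition above, the sketch below checks whether a given correspondence f is an isomorphism. The adjacency representation (a dict of neighbor sets) and the toy graph are assumptions made for the example.

```python
def is_isomorphism(adj, f):
    """True if f (a dict P -> Q) is one-to-one and maps edges to edges
    and non-edges to non-edges."""
    p_nodes = list(f)
    if len(set(f.values())) != len(p_nodes):      # f must be one-to-one
        return False
    for i, u in enumerate(p_nodes):
        for v in p_nodes[i + 1:]:
            # u-v is an edge in P  <=>  f(u)-f(v) is an edge in Q
            if (v in adj[u]) != (f[v] in adj[f[u]]):
                return False
    return True

# Toy graph: two disjoint paths a-b-c and d-e-f.
adj = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b"},
    "d": {"e"}, "e": {"d", "f"}, "f": {"e"},
}
good = is_isomorphism(adj, {"a": "d", "b": "e", "c": "f"})
bad = is_isomorphism(adj, {"a": "d", "b": "f", "c": "e"})
# Reversing the path a-b-c is a non-trivial automorphism of that path.
flip = is_isomorphism(adj, {"a": "c", "b": "b", "c": "a"})
```

The last call shows why condition iii matters: a path has a non-trivial automorphism, so an attacker could not tell its two ends apart after the release.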

Walk-Based Attack
The construction of H
❖ Construction of H:
▪ H = set of nodes X of size k = (2+δ) log n, for a small constant δ > 0.
▪ W = {w1, w2, …, wb} = set of targeted users, of size b = O(log² n)
• e.g. n = 1000M, b = 900, k ≈ 30

Walk-Based Attack
▪ Choose two constants d0 ≤ d1 = O(log n). For each node xi, choose an
external degree Δi ∈ [d0, d1] specifying the number of edges xi will have to
nodes in G-H.
▪ Each wj connects to a set of nodes Nj ⊆ X.
▪ Each set Nj must have size at most c = 3, and the sets must be distinct
across all nodes wj.
▪ The total degree of xi is Δ'i.

Walk-Based Attack
▪ Add arbitrary edges from each xi to nodes in G-H until its external degree is Δi
▪ Add internal edges in H: an edge (xi, xi+1) for every i
▪ Add each additional internal edge (xi, xj) with probability 0.5
▪ Therefore, each node xi has total degree Δ'i = Δi + (#internal edges)
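
The construction steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the greedy way the sets Nj are picked, and the toy ring network are all hypothetical, and `adj` (a dict of neighbor sets) is modified in place.

```python
import itertools
import random

def build_walk_based_h(adj, targets, k, d0, d1, c=3, seed=None):
    rng = random.Random(seed)
    target_set = set(targets)
    originals = list(adj)                          # nodes of G-H
    xs = [("x", i) for i in range(k)]              # new accounts x_1..x_k
    for x in xs:
        adj[x] = set()
    delta = {x: rng.randint(d0, d1) for x in xs}   # external degree of each x_i
    ext = {x: 0 for x in xs}

    # Distinct sets N_j of size <= c, never exceeding any external degree.
    subsets = [s for r in range(1, c + 1) for s in itertools.combinations(xs, r)]
    rng.shuffle(subsets)
    pool = iter(subsets)                           # each subset is offered once
    for w in targets:
        for n_j in pool:
            if all(ext[x] < delta[x] for x in n_j):
                break
        else:
            raise ValueError("not enough external capacity for all targets")
        for x in n_j:
            adj[x].add(w); adj[w].add(x); ext[x] += 1

    # Arbitrary extra edges to non-target nodes until the external degree is delta.
    others = [v for v in originals if v not in target_set]
    for x in xs:
        for v in rng.sample(others, delta[x] - ext[x]):
            adj[x].add(v); adj[v].add(x)

    # Internal edges: the path x_i - x_{i+1}, every other pair with probability 0.5.
    for i, j in itertools.combinations(range(k), 2):
        if j == i + 1 or rng.random() < 0.5:
            adj[xs[i]].add(xs[j]); adj[xs[j]].add(xs[i])

    xs_set = set(xs)
    internal = {frozenset((a, b)) for a in xs for b in adj[a] if b in xs_set}
    total_deg = [len(adj[x]) for x in xs]          # the values Δ'_i
    return xs, total_deg, internal

# Hypothetical toy network: a ring of 40 users, three of them targeted.
n = 40
adj = {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}
targets = [0, 5, 9]
xs, total_deg, internal = build_walk_based_h(adj, targets, k=6, d0=2, d1=4, seed=7)
```

The attacker keeps `total_deg` and `internal` secret; they are the only information needed to recognize H again once the anonymized graph is published.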

Walk-Based Attack
Finding H
❖ When the graph G is released, we want to identify H by searching along
k-node paths in G, looking for a k-node path P for which the edges
induced among the nodes of P have precisely the structure of H.
❖ Therefore, for every k-node path P = {y1, y2, …, yk} in G, we visit the nodes
of P in order, declaring that P has failed the comparison to H as soon as
we reach a node yi that fails one of the following two tests:
▪ Degree test: the degree of node yi must be equal to the value Δ'i, which we
know to be the degree of node xi in G.
▪ Internal structure test: for each j < i, there must be an edge (yj, yi) in G if
and only if (xj, xi) is an edge of H.

Walk-Based Attack
Finding H
❖ Finally, if we reach the end of the path P
without failing either test, we have found a
copy of H in G.
❖ Search tree T: every node αi in T has a
corresponding node f(αi) in G.
❖ Every path of nodes α1, α2, …, αj from the
root must have a corresponding path in G
formed by nodes f(α1), f(α2), …, f(αj) with
the same degree sequence and internal edge
structure as x1, x2, …, xj.
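
The search can be sketched as a depth-first traversal that prunes a partial path as soon as a node fails the degree test or the internal structure test. The function name, the index-based encoding of the internal edges of H, and the toy graph are assumptions made for this example.

```python
def find_h(adj, total_deg, internal):
    """Return every path y_1..y_k in G that passes both tests.

    total_deg[i] is Δ'_i; internal holds frozenset({i, j}) for each
    internal edge (x_i, x_j) of H, with 0-based indices."""
    k = len(total_deg)
    found = []

    def passes(path, y):
        i = len(path)
        if y in path or len(adj[y]) != total_deg[i]:          # degree test
            return False
        return all((path[j] in adj[y]) == (frozenset((j, i)) in internal)
                   for j in range(i))                         # structure test

    def extend(path):
        if len(path) == k:
            found.append(list(path))
            return
        candidates = adj[path[-1]] if path else list(adj)
        for y in candidates:
            if passes(path, y):
                path.append(y)
                extend(path)
                path.pop()

    extend([])
    return found

# Toy graph: a 6-node ring plus H = path a-b-c attached to it.
edges = [(i, (i + 1) % 6) for i in range(6)]
edges += [("a", "b"), ("b", "c")]                             # internal edges of H
edges += [("a", 0), ("a", 1), ("a", 2), ("a", 3), ("b", 4), ("c", 5)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

matches = find_h(adj, [5, 3, 2], {frozenset((0, 1)), frozenset((1, 2))})
```

In this toy graph the only surviving path is `["a", "b", "c"]`; the recursion mirrors the search tree T, where each partial path is one tree node.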

Walk-Based Attack
Analysis
❖ Theorem 1 [Uniqueness]:
▪ With high probability, there is no subset of nodes S ≠ X in G such that G[S] is isomorphic to
G[X] = H. Formally, suppose that:
• H is a random subgraph and G is arbitrary
• the edges between H and G-H are arbitrary
• H contains the path edges (xi, xi+1)
Then, with high probability, no other subgraph of G is isomorphic to H.
❖ Theorem 2 [Efficiency]:
▪ The search tree T does not grow too large. Formally:
for every ε > 0, with high probability the size of T is O(n^(1+ε)).

Walk-Based Attack
Experiment
❖ Data: Network of friends on
LiveJournal
▪ 4.4M nodes and 77M edges
▪ Anonymized it
❖ Uniqueness: with only 7 new nodes, an average of
70 nodes can be de-anonymized
▪ even though the analysis asks for (2+δ) log(4.4M) ≈ 44 nodes
❖ Efficiency: |T| is typically ~ 9·10⁴
❖ Detectability: the figure shows the
success frequency for two different
choices of d0 and d1 (intervals [10,20]
and [20,60]) and varying values of k. In
both cases, only 7 nodes are enough for a
high success rate.

The Cut-Based Attack
❖ Any active attack has a theoretical asymptotic lower bound on the number
of new nodes: Ω(√(log n))
❖ Subgraph H = (V, E), V = {x1, x2, …, xk}, k = O(√(log n))
❖ How many nodes can be compromised?
❖ b = Θ(√(log n))

Construction of H
❖ Let W = {w1, w2, …, wb} be the targeted users
❖ Create k = 3b+3 new user accounts X = {x1, x2, …, xk}
❖ Create a link between each pair (xi, xj) with probability 0.5
❖ Choose b arbitrary nodes {x1, x2, …, xb}
❖ Connect xi to wi for i = 1, …, b
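
A minimal sketch of this construction, assuming the network is stored as a dict of neighbor sets that is modified in place; the function name and the toy ring network are hypothetical.

```python
import itertools
import random

def build_cut_based_h(adj, targets, seed=None):
    rng = random.Random(seed)
    b = len(targets)
    k = 3 * b + 3
    xs = [("x", i) for i in range(k)]          # k new user accounts
    for x in xs:
        adj[x] = set()
    # Each pair (x_i, x_j) is linked with probability 0.5.
    for u, v in itertools.combinations(xs, 2):
        if rng.random() < 0.5:
            adj[u].add(v); adj[v].add(u)
    # The first b new nodes are each connected to one target.
    for x, w in zip(xs, targets):
        adj[x].add(w); adj[w].add(x)
    return xs

# Hypothetical toy network: a ring of 20 users, two of them targeted.
adj = {v: {(v - 1) % 20, (v + 1) % 20} for v in range(20)}
xs = build_cut_based_h(adj, targets=[3, 11], seed=0)
```

By construction exactly b edges leave H, one per target, which is the small cut the recovery algorithm later looks for.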

Example
b = 2
k = 3b+3 = 9
[Figure: the subgraph H]

Properties
❖ With high probability:
➢ H has no non-trivial automorphism
➢ b is the size of the cut between H and G-H
➢ All internal cuts of H have size > b
➢ Cuts of size ≤ b are therefore external to H,
so these cuts will never break H.

Recovery H from G
❖ Algorithm:
1. Compute the Gomory-Hu tree of G, in time O(nm)
2. Delete all edges of weight at most b from the tree
3. Iterate over all components of size exactly k,
testing each for isomorphism to H.
4. H has no non-trivial automorphisms, so in the component found
we can identify the nodes x1,...,xb and hence the
targeted users {w1, w2,…, wb}
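
Steps 1 to 3 can be sketched in pure Python as below. Two substitutions are made and should be read as assumptions: instead of a full Gomory-Hu tree the sketch builds an equivalent-flow tree with Gusfield's simple method (it yields the same components after deleting light edges, because two nodes stay connected exactly when their minimum cut exceeds b), and the maximum flows use plain BFS augmentation, which is far slower than the O(nm) bound quoted above. The isomorphism test of step 3 is left out, and the toy graph uses a complete H as a stand-in for the random one.

```python
from collections import deque

def min_cut(adj, s, t):
    """Unit-capacity max flow s -> t; returns (value, nodes on the s side)."""
    flow = {}                                      # net flow on (u, v)
    value = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and flow.get((u, v), 0) < 1:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return value, set(parent)              # residual-reachable side
        v = t
        while parent[v] is not None:               # augment along the path
            u = parent[v]
            flow[(u, v)] = flow.get((u, v), 0) + 1
            flow[(v, u)] = flow.get((v, u), 0) - 1
            v = u
        value += 1

def equivalent_flow_tree(adj):
    nodes = list(adj)
    parent = {v: nodes[0] for v in nodes}
    weight = {}
    for i in range(1, len(nodes)):
        s = nodes[i]
        t = parent[s]
        weight[s], side = min_cut(adj, s, t)
        for v in nodes[i + 1:]:
            if v in side and parent[v] == t:
                parent[v] = s
    return parent, weight                          # tree edge (v, parent[v])

def candidate_components(adj, b, k):
    parent, weight = equivalent_flow_tree(adj)
    rep = {v: v for v in adj}                      # union-find over heavy edges
    def find(v):
        while rep[v] != v:
            rep[v] = rep[rep[v]]
            v = rep[v]
        return v
    for v, w in weight.items():
        if w > b:                                  # edges of weight <= b are deleted
            rep[find(v)] = find(parent[v])
    groups = {}
    for v in adj:
        groups.setdefault(find(v), set()).add(v)
    return [g for g in groups.values() if len(g) == k]

# Toy graph: H = complete graph on 9 nodes, attached by b = 2 edges to a ring.
hs = [f"h{i}" for i in range(9)]
rs = [f"r{i}" for i in range(8)]
edges = [(u, v) for i, u in enumerate(hs) for v in hs[i + 1:]]
edges += [(rs[i], rs[(i + 1) % 8]) for i in range(8)]
edges += [("h0", "r0"), ("h1", "r3")]
adj = {v: set() for v in hs + rs}
for u, v in edges:
    adj[u].add(v); adj[v].add(u)
comps = candidate_components(adj, b=2, k=9)
```

Here the only component of size k = 9 is H itself; in a real network each candidate would still have to pass the isomorphism test of step 3.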

Some Statistics
Number of active users (billions of users), based on Statista.com

            2015    2016    2017
Facebook    1.49    1.7     1.936
Instagram   0.4     0.5     0.7
LinkedIn    0.35    0.4     0.5
Twitter     0.3     0.31    0.328

Specific numbers
❖Facebook
➢ n = 1.968 billion users: by creating k = 21 new user accounts we can
identify b = 6 targeted users
❖ Instagram
➢ n = 0.7 billion users: k = 18 and b = 5
❖LinkedIn
❖Twitter
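
The figures above are consistent with taking b as √(log₂ n) rounded to the nearest integer and k = 3b+3. The rounding rule and the base-2 logarithm are assumptions inferred from the numbers, not something the slides state; a short sketch of the arithmetic:

```python
import math

def cut_attack_size(n):
    # Assumption: b = sqrt(log2 n), rounded to the nearest integer.
    b = round(math.sqrt(math.log2(n)))
    k = 3 * b + 3                     # number of new accounts
    return k, b

facebook = cut_attack_size(1.968e9)   # (21, 6)
instagram = cut_attack_size(0.7e9)    # (18, 5)
```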

Walk vs Cut
Walk-Based
✓ Fast recovery algorithm
✓ Hard to detect
✗ Needs more new nodes: Θ(log n)
✓ Can de-anonymize b = Θ(log² n) =
Θ(k²) users
Cut-Based
✗ More expensive recovery algorithm
✗ Easier to detect because H is dense
and tends to stand out
✓ Needs fewer new nodes: O(√(log n))
(close to the theoretical asymptotic
lower bound Ω(√(log n)))
✗ Can de-anonymize only Θ(√(log n))
= Θ(k) users

Active vs Passive
Active attack
✓ More effective: works with high
probability in any network.
✓ The attacker can choose the victims
✗ Risk of being detected
Passive attack
✗ Attackers may not be able to
identify themselves after seeing
the released anonymized network.
✗ The victims are only those linked to
the attackers (neighbors).
✓ Harder to detect

Semi-passive attack
❖ A coalition of existing users colludes to attack specific users
❖ They create only additional links to the targeted nodes, and no
additional nodes.

Conclusions
❖ An anonymized network is not safe, regardless of how
privacy is defined
▪ For the curator of sensitive data, it is very difficult
to detect the subgraph H without knowing its structure.
❖ Data utility vs. privacy
❖ Differential privacy

Conclusions
Countermeasures
1. Detect fake accounts when they are created. Fake accounts send random friend
requests at the time they are created, so an account whose friends all belong
to different communities is very suspicious.

Conclusions
Countermeasures
2. Add random perturbation
▪ E.g. delete m edges and add m other edges.
▪ A model is needed to bias the perturbation so that the main
properties of the graph are preserved
3. Add non-random perturbation
▪ (k,l)-anonymity: a user cannot be re-identified with
probability higher than 1/k by an active attacker able to
introduce l sybil nodes into the graph
▪ It can be shown that real-life social graphs tend to be
(1,1)-anonymous, which is the lowest privacy level
▪ A new approach [3] consists in transforming a graph into another with
higher anonymity than (1,1)-anonymity by only adding edges
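
The simplest form of countermeasure 2 can be sketched as follows. This version perturbs uniformly at random and does not include the bias model mentioned above; the function name and the toy ring graph are assumptions, and listing all non-edges is only practical for small graphs.

```python
import random

def perturb(edges, nodes, m, seed=None):
    """Delete m random edges and add m random edges that were not present."""
    rng = random.Random(seed)
    edges = {frozenset(e) for e in edges}
    removed = set(rng.sample(sorted(edges, key=sorted), m))
    non_edges = [frozenset((u, v))
                 for i, u in enumerate(nodes) for v in nodes[i + 1:]
                 if frozenset((u, v)) not in edges]
    added = set(rng.sample(non_edges, m))
    return (edges - removed) | added

# Hypothetical toy graph: a ring of 10 users.
nodes = list(range(10))
original = [(i, (i + 1) % 10) for i in range(10)]
released = perturb(original, nodes, m=3, seed=42)
```

The number of edges is preserved, but the attacker's subgraph H may lose or gain edges, so the exact-match tests used to recover it can fail.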

Conclusions
Countermeasures
Original: the original graph G
Random approach: G anonymized with the random approach
Our approach: G anonymized by adding edges so that it is no longer (1,1)-anonymous

Conclusions
Active Attacks

References
Thank you!
[1] Lars Backstrom, Cynthia Dwork, and Jon Kleinberg. Wherefore Art Thou R3579X? Anonymized
Social Networks, Hidden Patterns, and Structural Steganography.
[2] Vitaly Shmatikov. Technical Perspective: Anonymity Is Not Privacy.
[3] Sjouke Mauw, Rolando Trujillo-Rasua, and Bochuan Xuan. Counteracting Active Attacks in
Social Network Graphs.
And several other minor resources.
