Journal of Science and Technology (JST)
Volume 2, Issue 3, October 2017, PP 41-46
www.jst.org.in ISSN: 2456 - 5660
www.jst.org.in 41 | Page
Deducing Private Information from Social Network
Using Unified Classification
T Hima Bindu
Department of CSE, Newtons Institute of Science and Technology, Macherla, India
ABSTRACT: Online social networks are used by many people. These networks allow their users to connect through various link types and give people an opportunity to list details about themselves that are relevant to the nature of the network. When users release some personal information in the network, there is a risk that other, undisclosed information can be inferred. A social network is represented as a graph structure in which nodes denote the users of the network and edges denote relationship links between friends. In this paper, social network data are classified using the collective classification method (both node and link classification). Using collective classification, the system can infer more sensitive information from the network with high accuracy. The collective classification method involves three components: a local classifier, a relational classifier and collective inference. The experiments conducted in this work show that the proposed approach provides better classification accuracy owing to the application of collective classification in link analysis.
KEYWORDS: Social Network Analysis, Data Mining, Inference, machine learning methods, Collective
Classification Algorithm.
I. INTRODUCTION
Social networking is used to connect and share information with friends. People may use social networking services for different reasons: to network with new contacts, reconnect with former friends, maintain current relationships, build or promote a business or project, participate in discussions about a certain topic, or just have fun meeting and interacting with other users.
Facebook and Twitter have a broad range of users. LinkedIn has positioned itself as a professional networking site: profiles include résumé information, and groups are created to share questions and ideas with peers in similar fields. Unlike traditional personal homepages, social networks let people publish not only their personal attributes but also their relationships with friends, which may lead to privacy violations. Users therefore need information privacy. Existing techniques aim to prevent the direct disclosure of sensitive personal information.
This paper focuses on classifying social network data and inferring individuals’ private information. More private information can be inferred by applying a collective classification algorithm. The system explores how online social network data could be used to predict a private trait that a user is not willing to disclose.
For instance, in an office, people connect to each other because of similar professions. Therefore, one may be able to infer someone’s attribute from the attributes of his or her friends. In such cases, privacy is disclosed indirectly through social relations rather than directly by the owner. This is called personal information leakage from inference.
II. ORGANIZATION OF THE PAPER
This paper is organized as follows. Section 1 is the introduction. Section 2 describes the organization of the paper. Section 3 briefly describes related work. Section 4 describes the proposed system, its system design and the system functions. Section 5 discusses the results, and Section 6 presents the conclusion and future work.
III. RELATED WORK
Lars Backstrom, Cynthia Dwork and Jon Kleinberg consider an attack against an anonymized network. In their model, the network consists only of nodes and edges; detail values are not included. The goal of the attacker is simply to identify people. Backstrom et al. consider a “communication graph,” in which nodes are e-mail addresses and there is a directed edge (u, v) if u has sent at least a certain number of e-mail messages or instant messages to v, or if v is included in u’s address book. They consider the “purest” form of social network data, in which there are simply nodes corresponding to individuals and edges indicating social interaction, without any further annotation such as time-stamps or textual data.
Michael Hay, Gerome Miklau, David Jensen, Philipp Weis and Siddharth Srivastava consider several ways of anonymizing social networks. Advances in technology have made it possible to collect data about individuals and the connections between them, such as e-mail correspondence and friendships. Agencies and researchers who have collected such social network data often have a compelling interest in allowing others to analyze the data.
Hay et al. and Liu and Terzi consider several ways of anonymizing social networks. Our work, by contrast, focuses on inferring details of nodes in the network, not on identifying individuals. He et al. consider ways to infer private information via friendship links by creating a Bayesian network from the links inside a social network. Although they crawl a real social network, LiveJournal, they use hypothetical attributes to analyze their learning algorithm.
Compared with the approach of Jianming He et al., the aim here is to provide techniques that can help in choosing the most effective details or links to remove in order to protect privacy. Sen and Getoor compare various methods of link-based classification, including loopy belief propagation, mean-field relaxation labeling and iterative classification. They rate each algorithm in terms of its robustness to noise, both in attribute values and in correlations across links, and also compare the performance of these classification methods under various types of correlation across links.
Zheleva and Getoor attempt to predict the private attributes of users in four real-world data sets: Facebook, Flickr, Dogster and BibSonomy. They do not attempt to actually anonymize or sanitize any graph data. Their work provides substantial motivation for the solution proposed in our work.
Talukder et al. propose a method that measures the amount of information a user reveals to the outside world and automatically determines which information (on a per-user basis) should be removed to increase the privacy of an individual.
Macskassy and Provost discuss various classification algorithms for networked data, noting that such data present both complications and opportunities for classification and machine learning. For example, telephone accounts previously determined to be fraudulent may be linked, perhaps indirectly, to those for which no assessment has yet been made. Finally, the proposed system infers individuals’ private information by classifying publicly released social network user data.
IV. PROPOSED SYSTEM
The proposed system uses a collective classification algorithm to classify the social network data. It has three components: a local classifier, a relational classifier and collective inference. Relaxation labeling is used as the collective inference method. By applying collective classification, the system can infer (through indirect disclosure) users’ private information from the released network data.
The advantage of the system is that collective classification improves classifier accuracy. The collective inference method (relaxation labeling) runs 99 iterations to classify the network data. It uses the local classifier in the first iteration to set a prior, and the relational classifier from the second iteration onward, trying more combinations of nodes and links to gain more of the user attributes that are used to infer personal information.
4.1. SYSTEM ARCHITECTURE
Fig 4.1 System Architecture Diagram
The social network (e.g., Facebook) was crawled to gather data for the experiments. The crawler loaded a profile, parsed the details out of the HTML and stored them in a MySQL database. It then loaded all friends of the current profile and stored them in the database, both as friendship links and as possible profiles to crawl later.
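The crawling procedure above is essentially a breadth-first traversal of the friendship graph. The Python sketch below illustrates it under stated assumptions: `FAKE_SITE` and `fetch_profile` are hypothetical stand-ins for downloading and parsing real profile pages, and an in-memory dictionary and set replace the MySQL tables. It is not the crawler used in the experiments.

```python
from collections import deque

# Hypothetical stand-in for the social network: user id -> (profile details, friend ids).
# A real crawler would download and parse each profile's HTML instead.
FAKE_SITE = {
    "alice": ({"gender": "female"}, ["bob", "carol"]),
    "bob": ({"gender": "male"}, ["alice", "dave"]),
    "carol": ({"religion": "none"}, ["alice"]),
    "dave": ({}, ["bob"]),
}


def fetch_profile(user_id):
    """Hypothetical fetcher returning (details, friend ids) for one profile."""
    return FAKE_SITE[user_id]


def crawl(seed, max_profiles=100):
    """Breadth-first crawl from a seed profile.

    Each loaded profile's details are stored, and each friend is recorded both as a
    friendship link and as a profile to crawl later (the dict and set stand in for
    the MySQL tables)."""
    details = {}
    links = set()  # undirected links stored as sorted (u, v) tuples
    frontier = deque([seed])
    queued = {seed}
    while frontier and len(details) < max_profiles:
        user = frontier.popleft()
        attrs, friends = fetch_profile(user)
        details[user] = attrs
        for friend in friends:
            links.add(tuple(sorted((user, friend))))
            if friend not in queued:
                queued.add(friend)
                frontier.append(friend)
    return details, links


details, links = crawl("alice")
print(sorted(details))  # ['alice', 'bob', 'carol', 'dave']
print(sorted(links))    # [('alice', 'bob'), ('alice', 'carol'), ('bob', 'dave')]
```

The `max_profiles` cap is an added assumption that keeps the crawl bounded; links to friends who were never crawled are still recorded.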
The dataset for the experiment was collected by crawling profiles in this way. The user profiles and links in the dataset are then converted into a graph structure, and the collective classification method is applied to the social network user data to infer users’ private information.
a. SOCIAL NETWORK DATA GATHERING
For the proposed work, the details were collected as follows. Username and password details of users of a social network such as Facebook are collected. The system logs in to the user accounts and downloads their profiles as .html files. An HTML parser then parses these files and collects the attribute values of the user profiles, and the results are stored in a database. The database records are exported to a .csv file for network classification, and the dataset file is modelled as a network graph.
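As a minimal sketch of the parse-and-export step, the code below uses Python’s standard `html.parser` and `csv` modules. The profile markup (`<li data-name="...">value</li>`) and the column names are hypothetical; real profile pages are structured differently and would need their own parsing rules.

```python
import csv
import io
from html.parser import HTMLParser


class ProfileParser(HTMLParser):
    """Collects attribute values from profile HTML, assuming (hypothetically) that
    each attribute appears as <li data-name="ATTR">VALUE</li>."""

    def __init__(self):
        super().__init__()
        self.attributes = {}
        self._current = None  # attribute name whose value is being read

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._current = dict(attrs).get("data-name")

    def handle_data(self, data):
        if self._current and data.strip():
            self.attributes[self._current] = data.strip()

    def handle_endtag(self, tag):
        if tag == "li":
            self._current = None


def parse_profile(html):
    parser = ProfileParser()
    parser.feed(html)
    parser.close()
    return parser.attributes


def export_csv(rows, fieldnames):
    """Write one row per user as CSV text, leaving missing attributes blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


html = '<ul><li data-name="gender">Female</li><li data-name="religion">None</li></ul>'
row = {"user": "alice", **parse_profile(html)}
print(export_csv([row], ["user", "gender", "religion", "political_views"]))
```

Writing blanks for undisclosed attributes (`restval=""`) keeps every row aligned to the same columns, which matters later because those blanks are exactly the values the classifiers try to infer.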
Fig 4.2 Social network graph structure
A social network is represented as a graph structure. The graph model contains vertices, edges and details, where each node represents a unique user of the social network. The set of edges in the graph consists of the links defined in the social network, which establish the connections between friends in the network.
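Once the records are exported, the graph model can be built from a node file and an edge file. The following sketch assumes (hypothetically) a node CSV with a `user` column plus one column per detail, and an edge CSV with `source` and `target` columns; each friendship link is stored in both directions because the links are undirected.

```python
import csv
import io
from collections import defaultdict


def load_graph(nodes_csv, edges_csv):
    """Build node details and an undirected adjacency map from two CSV texts.
    The column names (user, source, target) are assumptions for illustration."""
    details = {row["user"]: row for row in csv.DictReader(io.StringIO(nodes_csv))}
    neighbours = defaultdict(set)
    for row in csv.DictReader(io.StringIO(edges_csv)):
        u, v = row["source"], row["target"]
        neighbours[u].add(v)  # friendship links are undirected,
        neighbours[v].add(u)  # so each edge is stored in both directions
    return details, neighbours


nodes = "user,gender\nalice,female\nbob,male\ncarol,\n"
edges = "source,target\nalice,bob\nalice,carol\n"
details, neighbours = load_graph(nodes, edges)
print(sorted(neighbours["alice"]))  # ['bob', 'carol']
```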
b. NETWORK CLASSIFICATION
Collective inference is a method of classifying social network data using a combination of node details and the connecting links in the social graph. Each such classifier consists of three components: a local classifier, a relational classifier and a collective inference algorithm.
Local classifiers are a type of learning method applied in the initial step of collective inference. The naïve Bayes algorithm is used as the local classifier. This classifier builds a model based on the details of nodes in the training set and then applies this model to the nodes whose labels are unknown.
The relational classifier is a separate type of learning algorithm that looks at the link structure of the graph and uses the labels of the nodes. Four relational classifiers are considered: class-distribution relational neighbour (cdRN), weighted-vote relational neighbour (wvRN), network-only Bayes classifier (nBC) and network-only link-based classification (nLB).
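Of the four relational classifiers, cdRN is perhaps the least self-explanatory. The sketch below is a simplified reading of the idea, assuming hard labels for known nodes, unit link weights and cosine similarity: each class gets a reference vector (the average neighbour-label count vector of the labelled nodes in that class), and an unknown node is scored by how similar its own neighbour-label vector is to each reference. The class names and the example graph are hypothetical.

```python
import math
from collections import Counter

CLASSES = ["A", "B"]  # hypothetical class labels for one private attribute


def class_vector(node, neighbours, labels):
    """Count the known labels among a node's neighbours (all link weights 1)."""
    counts = Counter(labels[n] for n in neighbours[node] if n in labels)
    return [counts[c] for c in CLASSES]


def cosine(x, y):
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return sum(a * b for a, b in zip(x, y)) / norm if norm else 0.0


def cdrn(node, neighbours, labels):
    """Class-distribution relational neighbour sketch: score each class by the cosine
    similarity between the node's neighbour class vector and that class's reference
    vector (the mean class vector of the labelled nodes in the class)."""
    reference = {}
    for c in CLASSES:
        vectors = [class_vector(n, neighbours, labels) for n, lab in labels.items() if lab == c]
        if vectors:
            reference[c] = [sum(col) / len(vectors) for col in zip(*vectors)]
        else:
            reference[c] = [0.0] * len(CLASSES)
    own = class_vector(node, neighbours, labels)
    scores = {c: cosine(own, reference[c]) for c in CLASSES}
    total = sum(scores.values())
    if total == 0:
        return {c: 1.0 / len(CLASSES) for c in CLASSES}  # no evidence: uniform
    return {c: s / total for c, s in scores.items()}


neighbours = {
    "a1": ["a2", "x"], "a2": ["a1", "x"], "b1": ["b2"], "b2": ["b1", "y"],
    "x": ["a1", "a2"], "y": ["b2"],
}
labels = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}  # x and y are unknown
print(cdrn("x", neighbours, labels))  # {'A': 1.0, 'B': 0.0}
```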
Local classifiers consider only the details of the node being classified, whereas relational classifiers consider only the link structure around a node. Collective inference uses both nodes and links in the network to improve classifier accuracy. By using a local classifier in the first iteration, collective inference ensures that every node has an initial probabilistic classification, referred to as a prior. The algorithm then uses a relational classifier to reclassify nodes: at each step i ≥ 2, the relational classifier uses the fully labelled graph from step i − 1 to classify each node in the graph. The collective inference method also controls how long the algorithm runs.
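The control flow just described (local classifier for the prior, then repeated relational reclassification over the previous step’s fully labelled graph) can be sketched as a generic loop. In this illustration the relational classifier is a toy majority vote rather than wvRN, the early-stopping test is an assumption, and `max_steps=99` simply mirrors the iteration count mentioned for the proposed system.

```python
from collections import Counter


def collective_inference(unknown, known, local, relational, neighbours, max_steps=99):
    """Generic collective inference loop.

    Step 1: the local classifier labels every unknown node (the prior).
    Steps i >= 2: the relational classifier relabels each unknown node using the
    fully labelled graph from step i - 1. Stops early once no label changes."""
    labels = dict(known)
    for node in unknown:
        labels[node] = local(node)
    for _ in range(2, max_steps + 1):
        previous = dict(labels)  # snapshot of the graph at step i - 1
        for node in unknown:
            labels[node] = relational(node, neighbours, previous)
        if labels == previous:
            break
    return labels


def majority_vote(node, neighbours, labels):
    """Toy relational classifier (not wvRN): most common neighbour label."""
    votes = Counter(labels[n] for n in neighbours[node])
    return votes.most_common(1)[0][0] if votes else labels[node]


neighbours = {
    "u1": ["k1", "k2", "u2"], "u2": ["u1", "k2", "k3"],
    "k1": ["u1"], "k2": ["u1", "u2"], "k3": ["u2"],
}
known = {"k1": "red", "k2": "red", "k3": "blue"}
result = collective_inference(["u1", "u2"], known, lambda node: "blue",
                              majority_vote, neighbours)
print(result["u1"], result["u2"])  # red red
```

Even though the (deliberately poor) local prior labels both unknown nodes "blue", the relational passes move them to "red", which is the behaviour collective inference relies on.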
For collective inference, relaxation labeling performed best when few labels were known. For relational classification, the link-based classifier (nLB) was clearly preferable when many labels were known, while the lower-variance methods (wvRN and cdRN) dominated when fewer labels were known. Relaxation labeling repeatedly estimates the class distributions of all unknown nodes, based on the current estimates.
Steps involved in Collective classification:
Step 1: Assign an initial label using the local classifier; the naïve Bayes algorithm is used as the local classifier.
Step 2: In the first iteration, the naïve Bayes classifier selects the most likely classification V_NB given the attribute values a1, a2, …, an. This results in
$$V_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_{i} P(a_i \mid v_j)$$
P(a_i | v_j) is generally estimated using m-estimates:
$$P(a_i \mid v_j) = \frac{n_c + m\,p}{n + m}$$
where n is the number of training examples for which v = v_j; n_c is the number of examples for which v = v_j and a = a_i; p is the a priori estimate for P(a_i | v_j); and m is the equivalent sample size.
Fig 4.4 Assign Initial Label
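Steps 1 and 2 can be illustrated with a small naïve Bayes classifier that uses the m-estimate for P(a_i | v_j). This is a sketch, not the author’s implementation: the attribute names and training profiles are hypothetical, and p is assumed to be uniform over the values observed for each attribute (a common choice).

```python
from collections import Counter, defaultdict


def train_naive_bayes(examples, m=1.0):
    """Train naive Bayes on (attribute dict, label) pairs.

    Prediction picks v_NB = argmax_v P(v) * prod_i P(a_i | v), where each
    P(a_i | v) is the m-estimate (n_c + m*p) / (n + m). The prior estimate p is
    assumed uniform over the values observed for that attribute."""
    label_counts = Counter(label for _, label in examples)
    value_counts = defaultdict(Counter)  # (attribute, label) -> value counts (n_c)
    values = defaultdict(set)            # attribute -> observed values
    for attrs, label in examples:
        for attr, val in attrs.items():
            value_counts[(attr, label)][val] += 1
            values[attr].add(val)
    total = len(examples)

    def predict(attrs):
        scores = {}
        for v, n in label_counts.items():  # n = examples with label v
            score = n / total  # P(v)
            for attr, val in attrs.items():
                if attr not in values:
                    continue  # attribute never seen in training: ignore it
                p = 1.0 / len(values[attr])
                n_c = value_counts[(attr, v)][val]
                score *= (n_c + m * p) / (n + m)
            scores[v] = score
        return max(scores, key=scores.get), scores

    return predict


# Hypothetical training profiles: public details -> known gender label.
examples = [
    ({"likes": "rock", "group": "chess"}, "male"),
    ({"likes": "rock", "group": "chess"}, "male"),
    ({"likes": "pop", "group": "yoga"}, "female"),
    ({"likes": "pop", "group": "chess"}, "female"),
]
predict = train_naive_bayes(examples)
label, scores = predict({"likes": "rock", "group": "chess"})
print(label)  # male
```

The m-estimate keeps "female" from receiving a zero score even though no female training profile likes rock, so the classifier stays usable as a soft prior for the later relational steps.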
Step 3: Assign the initial label with the highest probability and set it as the prior. Start the second iteration using the weighted-vote relational neighbour (wvRN) classifier as the relational classifier.
Step 4: In the wvRN relational classifier, to classify a node n_i, each of its neighbours n_j is given a weight. The probability of n_i being in class C_x is the weighted mean of the class probabilities of n_i’s neighbours, that is,
$$P(c_i = C_x \mid N_i) = \frac{1}{Z} \sum_{n_j \in N_i} w_{i,j}\, P(c_j = C_x \mid N_j), \qquad Z = \sum_{n_j \in N_i} w_{i,j}$$
where c_i denotes the class of n_i, N_i is the set of neighbours of n_i and w_{i,j} is a link weight parameter given to the wvRN classifier. All link weights are assumed to be 1.
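The wvRN formula in Step 4 translates directly into code. In this sketch, `probs` holds the current class probabilities of each neighbour, link weights default to 1 as the text assumes, and the uniform fallback for a node with no neighbours is an added assumption.

```python
def wvrn(node, neighbours, probs, classes, weights=None):
    """wvRN estimate: P(c_i = C_x | N_i) = (1/Z) * sum_j w_ij * P(c_j = C_x | N_j),
    with Z = sum_j w_ij. Link weights default to 1, as assumed in the text."""
    weights = weights or {}
    totals = {c: 0.0 for c in classes}
    z = 0.0
    for nbr in neighbours[node]:
        w = weights.get((node, nbr), 1.0)
        z += w
        for c in classes:
            totals[c] += w * probs[nbr][c]
    if z == 0:
        # Isolated node: no relational evidence, so fall back to a uniform estimate
        # (an assumption; the text does not cover this case).
        return {c: 1.0 / len(classes) for c in classes}
    return {c: totals[c] / z for c in classes}


neighbours = {"n1": ["n2", "n3", "n4"]}
probs = {
    "n2": {"yes": 1.0, "no": 0.0},
    "n3": {"yes": 1.0, "no": 0.0},
    "n4": {"yes": 0.25, "no": 0.75},
}
print(wvrn("n1", neighbours, probs, ["yes", "no"]))  # {'yes': 0.75, 'no': 0.25}
```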
Step 5: Learn a classifier that maps the labels and/or attributes of a node’s neighbours to the label of that node. Here, the network information is used.
Fig.4.5 Use the attributes of related objects.
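One concrete way to realise Step 5 is a network-only Bayes classifier (nBC), which learns how a node’s label relates to its neighbours’ labels. The sketch below is a simplified version in that spirit, not a faithful reimplementation, assuming unit link weights, Laplace smoothing and neighbour labels treated as independent evidence; the graph and labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nbc(labels, neighbours, classes, smoothing=1.0):
    """Network-only Bayes sketch: learn P(neighbour label | node label) from links
    between labelled nodes (Laplace-smoothed) plus class priors, then classify a
    node from its neighbours' labels alone, treating them as independent evidence."""
    prior = Counter(labels.values())
    pair_counts = defaultdict(Counter)  # node label -> counts of its neighbours' labels
    for node, label in labels.items():
        for nbr in neighbours[node]:
            if nbr in labels:
                pair_counts[label][labels[nbr]] += 1

    def classify(node, current_labels):
        log_scores = {}
        for c in classes:
            score = math.log((prior[c] + smoothing) / (len(labels) + smoothing * len(classes)))
            denom = sum(pair_counts[c].values()) + smoothing * len(classes)
            for nbr in neighbours[node]:
                if nbr in current_labels:
                    score += math.log((pair_counts[c][current_labels[nbr]] + smoothing) / denom)
            log_scores[c] = score
        return max(log_scores, key=log_scores.get)

    return classify


neighbours = {"a": ["b", "u"], "b": ["a", "u"], "c": ["d"], "d": ["c", "u"], "u": ["a", "b", "d"]}
labels = {"a": "X", "b": "X", "c": "Y", "d": "Y"}  # u is unknown
classify = train_nbc(labels, neighbours, ["X", "Y"])
print(classify("u", labels))  # X
```

Working in log space avoids underflow when a node has many neighbours; the learned table captures homophily (X nodes link to X nodes), which is what lets u be labelled from its friends alone.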
Step 6: Apply the relational classifier to each node iteratively and reclassify the labels.
Step 7: Relaxation labeling determines the number of iterations to run; iterate until the inconsistency between neighbouring labels is minimized.
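The text does not define the inconsistency measure precisely. One plausible reading, assumed here, is the fraction of links whose endpoints carry different labels; the sketch below stops iterating as soon as a relabelling pass no longer reduces that fraction. The majority-vote pass is a hypothetical stand-in for the relational classifier.

```python
from collections import Counter


def inconsistency(labels, edges):
    """Fraction of links whose two endpoints currently carry different labels."""
    if not edges:
        return 0.0
    return sum(labels[u] != labels[v] for u, v in edges) / len(edges)


def make_majority_step(edges, unknown):
    """Hypothetical relabelling pass: each unknown node takes its neighbours' most
    common label from the previous pass (a stand-in for the relational classifier)."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, []).append(v)
        neighbours.setdefault(v, []).append(u)

    def step(labels):
        new = dict(labels)
        for node in unknown:
            votes = Counter(labels[n] for n in neighbours.get(node, []))
            if votes:
                new[node] = votes.most_common(1)[0][0]
        return new

    return step


def iterate_until_stable(labels, edges, step, max_iters=99):
    """Keep applying `step` while it lowers the neighbour-label inconsistency."""
    best = inconsistency(labels, edges)
    for _ in range(max_iters):
        candidate = step(labels)
        score = inconsistency(candidate, edges)
        if score >= best:
            break  # no further reduction: stop iterating
        labels, best = candidate, score
    return labels, best


edges = [("k1", "u1"), ("k2", "u1"), ("u1", "u2"), ("k3", "u2")]
start = {"k1": "A", "k2": "A", "k3": "B", "u1": "B", "u2": "B"}
labels, score = iterate_until_stable(start, edges, make_majority_step(edges, ["u1", "u2"]))
print(labels["u1"], labels["u2"], score)  # A B 0.25
```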
V. RESULTS AND DISCUSSION
Classifying the social network data with the collective classification method improves classifier accuracy, so the proposed system can infer users’ private information with high accuracy. Table 5.1 compares the accuracy of the different classification methods when inferring each private attribute.
Table 5.1 Classifier Accuracy
Private data          Local classifier only   Relational classifier only   Collective classification
Gender                0.7214                  0.1672                       0.8621
Religion              0.5134                  0.4751                       0.9519
Political Views       0.5541                  0.2151                       0.6273
Sexual Orientation    0.4023                  0.2543                       0.6979
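To make the comparison in Table 5.1 concrete, the snippet below recomputes, from the table’s own values only, the gain of collective classification over the local classifier alone, checks whether collective classification beats both single classifiers on every attribute, and averages its accuracy across the four attributes.

```python
# Accuracies copied from Table 5.1: (local only, relational only, collective).
TABLE_5_1 = {
    "Gender": (0.7214, 0.1672, 0.8621),
    "Religion": (0.5134, 0.4751, 0.9519),
    "Political Views": (0.5541, 0.2151, 0.6273),
    "Sexual Orientation": (0.4023, 0.2543, 0.6979),
}

# Gain of collective classification over the local (naive Bayes) classifier alone.
gains = {attr: round(coll - local, 4) for attr, (local, _, coll) in TABLE_5_1.items()}
# Does collective classification beat both single classifiers on every attribute?
always_best = all(coll > max(local, rel) for local, rel, coll in TABLE_5_1.values())
mean_collective = round(sum(row[2] for row in TABLE_5_1.values()) / len(TABLE_5_1), 4)

print(gains)            # {'Gender': 0.1407, 'Religion': 0.4385, 'Political Views': 0.0732, 'Sexual Orientation': 0.2956}
print(always_best)      # True
print(mean_collective)  # 0.7848
```

On these figures the largest gain is for religion (0.4385) and the smallest for political views (0.0732).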
In the proposed system, the local classifier uses the naïve Bayes algorithm. Naïve Bayes classifies the user nodes in the network and computes class probabilities from the node attributes. The wvRN algorithm is used for relational classification; it infers details from the friendship links. Together, the two algorithms infer data from both nodes and links. The system first classifies the node attributes and sets the result as the prior, so some class labels are already known. For collective inference, relaxation labeling with wvRN performed best when few labels were known. The relational classifier then reassigns the class labels based on the link details. Table 5.1 shows the accuracy of each classifier.
Fig.5.1 Calculating Classifier Accuracy
The experiments conducted in this work show that the proposed approach provides better classification accuracy owing to the application of collective classification in link analysis: in Table 5.1, collective classification achieves the highest accuracy for all four private attributes.
VI. CONCLUSION AND FUTURE WORK
Here, the collective classification method is used to infer private information from user nodes and their related links. The system showed that users’ private information can be inferred through social relations and the release of personal information in the social network.
To protect individuals against private information leakage in social networks, users can either hide their friendship relations or ask their friends to hide their attributes. To protect users’ private information, sanitization and suppression techniques can be applied to the network data; sanitizing the network data reduces the chance of inferring individuals’ private information.
REFERENCES
[1] R. Heatherly, M. Kantarcioglu, and B. Thuraisingham, “Preventing Private Information Inference Attacks on Social Networks,” IEEE Trans. Knowledge and Data Engineering, vol. 25, no. 8, pp. 1849-1861, Aug. 2013.
[2] L. Backstrom, C. Dwork, and J. Kleinberg, “Wherefore Art Thou r3579x?: Anonymized Social Networks, Hidden Patterns, and Structural
[3] M. Hay, G. Miklau, D. Jensen, P. Weis, and S. Srivastava, “Anonymizing Social Networks,” Technical Report 07-19, Univ. of
Massachusetts Amherst, 2007.
[4] K. Liu and E. Terzi, “Towards Identity Anonymization on Graphs,” Proc. ACM SIGMOD Int’l Conf. Management of Data
(SIGMOD ’08), pp. 93-106, 2008.
[5] J. He, W. Chu, and V. Liu, “Inferring Privacy Information from Social Networks,” Proc. Intelligence and Security Informatics,
2006.
[6] P. Sen and L. Getoor, “Link-Based Classification,” Technical Report CS-TR-4858, Univ. of Maryland, Feb. 2007.
[7] S.A. Macskassy and F. Provost, “Classification in Networked Data: A Toolkit and a Univariate Case Study,” J. Machine
Learning Research, vol. 8, pp. 935-983, 2007.
[8] C. Johnson, “Project Gaydar,” The Boston Globe, Sept. 2009.
[9] E. Zheleva and L. Getoor, “To Join or Not to Join: The Illusion of Privacy in Social Networks with Mixed Public and Private User Profiles,” Technical Report CS-TR-4926, Univ. of Maryland, College Park, July 2008.
[10] J. Lindamood, R. Heatherly, M. Kantarcioglu, and B. Thuraisingham, “Inferring Private Information Using Social Network Data,” Proc. 18th Int’l Conf. World Wide Web (WWW), 2009.

More Related Content

PDF
Q046049397
PDF
Identification of inference attacks on private Information from Social Networks
DOCX
Preventing private information inference attacks on social networks
PDF
Current trends of opinion mining and sentiment analysis in social networks
PPTX
PDF
Fuzzy AndANN Based Mining Approach Testing For Social Network Analysis
PDF
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)
PDF
05 20275 computational solution...
Q046049397
Identification of inference attacks on private Information from Social Networks
Preventing private information inference attacks on social networks
Current trends of opinion mining and sentiment analysis in social networks
Fuzzy AndANN Based Mining Approach Testing For Social Network Analysis
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)
05 20275 computational solution...

Similar to Deducing Private Information from Social Network Using Unified Classification (20)

PDF
APPLICATION OF CLUSTERING TO ANALYZE ACADEMIC SOCIAL NETWORKS
PDF
Distributed Link Prediction in Large Scale Graphs using Apache Spark
PDF
Organizational Overlap on Social Networks and its Applications
PDF
Mining and Analyzing Academic Social Networks
PPTX
01 Network Data Collection (2017)
PDF
2009-Social computing-Analyzing social media networks
PDF
An Empirical Study On IMDb And Its Communities Based On The Network Of Co-Rev...
PPT
01 Introduction to Networks Methods and Measures
PPT
01 Introduction to Networks Methods and Measures (2016)
PDF
Multi-Mode Conceptual Clustering Algorithm Based Social Group Identification ...
PDF
Stabilization of Black Cotton Soil with Red Mud and Formulation of Linear Reg...
PDF
Dv31821825
PDF
An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...
PDF
IRJET- A Survey on Link Prediction Techniques
PDF
Multimode network based efficient and scalable learning of collective behavior
PDF
International Journal of Engineering Research and Development
PDF
An iac approach for detecting profile cloning
PDF
a modified weight balanced algorithm for influential users community detectio...
PDF
A COMPREHENSIVE STUDY ON DATA EXTRACTION IN SINA WEIBO
APPLICATION OF CLUSTERING TO ANALYZE ACADEMIC SOCIAL NETWORKS
Distributed Link Prediction in Large Scale Graphs using Apache Spark
Organizational Overlap on Social Networks and its Applications
Mining and Analyzing Academic Social Networks
01 Network Data Collection (2017)
2009-Social computing-Analyzing social media networks
An Empirical Study On IMDb And Its Communities Based On The Network Of Co-Rev...
01 Introduction to Networks Methods and Measures
01 Introduction to Networks Methods and Measures (2016)
Multi-Mode Conceptual Clustering Algorithm Based Social Group Identification ...
Stabilization of Black Cotton Soil with Red Mud and Formulation of Linear Reg...
Dv31821825
An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...
IRJET- A Survey on Link Prediction Techniques
Multimode network based efficient and scalable learning of collective behavior
International Journal of Engineering Research and Development
An iac approach for detecting profile cloning
a modified weight balanced algorithm for influential users community detectio...
A COMPREHENSIVE STUDY ON DATA EXTRACTION IN SINA WEIBO
Ad

More from pmaheswariopenventio (20)

PDF
Article+Bentounsi+++Ghimouze+++Bentounsi+++Hechiche.pdf
PDF
Sustainable Talent Management as a Driver of Competitive Advantage Via a Lea...
PDF
The Israeli Occupation's Employment of Propaganda Techniques and Media Frame...
PDF
Enhancing Teacher Knowledge and Application of Assistive Technology for Engl...
PDF
Waste Water Treatment by Using Reed Bed System
PDF
A Review on Different Generations of Geo-Thermal Energy and Power Plants
PDF
Cobalt Oxide Nanoparticles: Synthesis and Their Adsorption Study
PDF
Nutritional Management of Covid-19 Disease
PDF
Efficient Face Features Extraction and Recognition Using Principal Component...
PDF
Formulation, Development and Evaluation of Lopinavir Loaded Polymeric Micelles
PDF
Simulation and Controlling of Wind Energy System using ANN Controller
PDF
Biomedical Engineering / Water as A Fuel
PDF
Super Cube Root Cube Mean Labeling of Graphs
PDF
Dispersion Analysis in Single Mode and Multimode Fiber
PDF
Using Stadd Pro: Building Design And Analysis
PDF
Dispersion Analysis in Single Mode and Multimode Fiber
PDF
Comparative Study Of Cash Flow Statemen
PDF
Protecting Virtualized Infrastructures in Cloud Computing Based On Big Data ...
PDF
Integrated Teaching Programme S.Satyendra Kumar
PDF
AVIAN DIVERSITY IN MANJAMALAI SACRED GROVE
Article+Bentounsi+++Ghimouze+++Bentounsi+++Hechiche.pdf
Sustainable Talent Management as a Driver of Competitive Advantage Via a Lea...
The Israeli Occupation's Employment of Propaganda Techniques and Media Frame...
Enhancing Teacher Knowledge and Application of Assistive Technology for Engl...
Waste Water Treatment by Using Reed Bed System
A Review on Different Generations of Geo-Thermal Energy and Power Plants
Cobalt Oxide Nanoparticles: Synthesis and Their Adsorption Study
Nutritional Management of Covid-19 Disease
Efficient Face Features Extraction and Recognition Using Principal Component...
Formulation, Development and Evaluation of Lopinavir Loaded Polymeric Micelles
Simulation and Controlling of Wind Energy System using ANN Controller
Biomedical Engineering / Water as A Fuel
Super Cube Root Cube Mean Labeling of Graphs
Dispersion Analysis in Single Mode and Multimode Fiber
Using Stadd Pro: Building Design And Analysis
Dispersion Analysis in Single Mode and Multimode Fiber
Comparative Study Of Cash Flow Statemen
Protecting Virtualized Infrastructures in Cloud Computing Based On Big Data ...
Integrated Teaching Programme S.Satyendra Kumar
AVIAN DIVERSITY IN MANJAMALAI SACRED GROVE
Ad

Recently uploaded (20)

PDF
A Brief Introduction About Julia Allison
DOCX
unit 1 COST ACCOUNTING AND COST SHEET
PPTX
Belch_12e_PPT_Ch18_Accessible_university.pptx
PDF
WRN_Investor_Presentation_August 2025.pdf
PDF
Chapter 5_Foreign Exchange Market in .pdf
PDF
Outsourced Audit & Assurance in USA Why Globus Finanza is Your Trusted Choice
PDF
BsN 7th Sem Course GridNNNNNNNN CCN.pdf
PPTX
job Avenue by vinith.pptxvnbvnvnvbnvbnbmnbmbh
PDF
IFRS Notes in your pocket for study all the time
PDF
Roadmap Map-digital Banking feature MB,IB,AB
PDF
COST SHEET- Tender and Quotation unit 2.pdf
PPTX
The Marketing Journey - Tracey Phillips - Marketing Matters 7-2025.pptx
PDF
Elevate Cleaning Efficiency Using Tallfly Hair Remover Roller Factory Expertise
PPTX
Probability Distribution, binomial distribution, poisson distribution
PDF
DOC-20250806-WA0002._20250806_112011_0000.pdf
PDF
Ôn tập tiếng anh trong kinh doanh nâng cao
PDF
Laughter Yoga Basic Learning Workshop Manual
PDF
Power and position in leadershipDOC-20250808-WA0011..pdf
PDF
MSPs in 10 Words - Created by US MSP Network
DOCX
Euro SEO Services 1st 3 General Updates.docx
A Brief Introduction About Julia Allison
unit 1 COST ACCOUNTING AND COST SHEET
Belch_12e_PPT_Ch18_Accessible_university.pptx
WRN_Investor_Presentation_August 2025.pdf
Chapter 5_Foreign Exchange Market in .pdf
Outsourced Audit & Assurance in USA Why Globus Finanza is Your Trusted Choice
BsN 7th Sem Course GridNNNNNNNN CCN.pdf
job Avenue by vinith.pptxvnbvnvnvbnvbnbmnbmbh
IFRS Notes in your pocket for study all the time
Roadmap Map-digital Banking feature MB,IB,AB
COST SHEET- Tender and Quotation unit 2.pdf
The Marketing Journey - Tracey Phillips - Marketing Matters 7-2025.pptx
Elevate Cleaning Efficiency Using Tallfly Hair Remover Roller Factory Expertise
Probability Distribution, binomial distribution, poisson distribution
DOC-20250806-WA0002._20250806_112011_0000.pdf
Ôn tập tiếng anh trong kinh doanh nâng cao
Laughter Yoga Basic Learning Workshop Manual
Power and position in leadershipDOC-20250808-WA0011..pdf
MSPs in 10 Words - Created by US MSP Network
Euro SEO Services 1st 3 General Updates.docx

Deducing Private Information from Social Network Using Unified Classification

  • 1. Journal of Science and Technology (JST) Volume 2, Issue 3, October 2017, PP 41-46 www.jst.org.in ISSN: 2456 - 5660 www.jst.org.in 41 | Page Deducing Private Information from Social Network Using Unified Classification T Hima Bindu Department of CSE, Newtons Institute of Science and Technology, Macherla, India Abstract: ABSTRACT: Online social networks are used by many people. These Social networks allow their users to connect by means of various link types in which the network gives an opportunity for people to list details about themselves that are relevant to the nature of the network. Here there is a chance of inference when user released some personal information in the network. Social network is represented as graph structure in which nodes and edges denotes user’s of network and relationship links with friends. In this paper, the social network data has been classified with the help of collective classification (both node and link classification) method. Using the collective classification method the system could infer more sensitive information from the network with high accuracy. In collective classification method, it involves three components called local classifier, relational classifier and collective inference. From this experiments conducted in this research work, it is observed that the proposed work provide better classification accuracy due to the application of collective classification method in link analysis. KEYWORDS: Social Network Analysis, Data Mining, Inference, machine learning methods, Collective Classification Algorithm. I. INTRODUCTION Social networking used to connect and share information with friends. People may use social networking services for different reasons: to network with new contacts, reconnect with former friends, maintain current relationships, build or promote a business or project, participate in discussions about a certain topic, or just have fun meeting and interacting with other users. 
Facebook and Twitter, have a broad range of users. LinkedIn has positioned itself as a professional networking site— profiles include resume information, and groups are created to share questions and ideas with peers in similar fields. Unlike traditional personal homepages, people in these societies publish not only their personal attributes, but also their relationships with friends. It may causes the privacy violation in social networks. Information privacy is needed for users. Existing techniques are used to prevent direct disclosure of sensitive personal information. This paper focuses on social network data classification and inferring the individual’s private information. More private information are inferred by applying collective classification algorithm. The system explore how the online social network data could be used to predict some individual private trait that a user is not willing to disclose. For instance, in an office, people connect to each other because of similar professions. Therefore, it is possible that one may be able to infer someone's attribute from the attributes of his/her friends. In such cases, privacy is indirectly disclosed by their social relations rather than from the owner directly. This is called personal information leakage from inference.
  • 2. Journal of Science and Technology (11italic) www.jst.org.in 42 | Page II. ORGANIZATION OF THE PAPER This paper is organized as follows. In Section 1, it deals with introduction. Section 2 , describes the organization of the thesis. Section 3 , briefly describes the related work of the research. Section 4, describes the proposed system, system design of the proposed work and the system functions. the system design of the proposed work, Section 5, discuss the result and Section 6, describes the conclusion and future work for proposals. III. RELATED WORK Lars Backstrom, Cynthia Dwork and Jon Kleinberg consider an attack against an anonymized network. In their model, the network consists of only nodes and edges. Detail values are not included. The goal of the attacker is simply to identify people. Backstrom and Kleinberg consider a “communication graph,” in which nodes are e-mail addresses, and there is a directed edge (u, v) if u has sent at least a certain number of e-mail messages or instant messages to v, or if v is included in u’s address book. Here they will be considering the “purest” form of social network data, in which there are simply nodes corresponding to individuals and edges indicating social interaction, without any further annotation such as time-stamps or textual data. Michael Hay, Gerome Miklau, David Jensen, Philipp Weis, and Siddharth Srivastava consider several ways of anonym zing social networks. Advances in technology have made it possible to collect data about individuals and the connections between them, such as email correspondence and friendships. Agencies and researchers who have collected such social network data often have a compelling interest in allowing others to analyze the data. Hay et al. and Liu and Terzi consider several ways of anonymizing social networks. Our work focuses on inferring details from nodes in the network, not individually identifying individuals. He et al. 
consider ways to infer private information via friendship links by creating a Bayesian network from the links inside a social network. While they crawl a real social network, Live Journal, they use hypothetical attributes to analyze their learning algorithm. Compared to Jianming He approach, provide techniques that can help with choosing the most effective details or links that need to be removed for protecting privacy. Sen and Getoor compare various methods of link-based classification including loopy belief propagation, mean field relaxation labeling, and iterative classification. They rate each algorithm in terms of its robustness to noise, both in attribute values and correlations across links. And also compare the performance of these classification methods &various types of correlations across links. Zheleva and Getoor attempt to predict the private attributes of users in four real-world data sets: Facebook, Flickr, Dogster, and BibSonomy. They do not attempt to actually anonymize or sanitize any graph data. Zheleva and Getoor work provides a substantial motivation for the need of the solution proposed in our work. Talukder et al. propose a method of measuring the amount of information that a user reveals to the outside world and which automatically determines which information (on a per-user basis) should be removed to increase the privacy of an individual. For example, telephone accounts previously determined to be fraudulent may be linked, perhaps indirectly, to those for which no assessment yet has been made. Macskassy and Provost discuss various classification algorithms for social network classification and Such networked data present both complications and opportunities for classification and machine learning. Finally, the system infer the individuals private information by classifying the publically released social network user data.
IV. PROPOSED SYSTEM
The proposed system uses a collective classification algorithm to classify the social network data. The algorithm has three components: a local classifier, a relational classifier, and collective inference. Relaxation labeling is used as the collective inference method. By applying collective classification, the system can infer (indirect disclosure) users' private information from the released network data.
The advantage of the system is that collective classification improves classifier accuracy. The collective inference method (relaxation labeling) runs 99 iterations to classify the network data. It applies the local classifier in the first iteration and sets its output as a prior, then applies the relational classifier from the second iteration onward, trying more combinations of nodes and links to gain more user attributes, which are used to infer personal information.
4.1. SYSTEM ARCHITECTURE
Fig 4.1 System Architecture Diagram
The social network (e.g., Facebook) is crawled to gather data for the experiments. The crawler loaded a profile, parsed the details out of the HTML, and stored the details in a MySQL database. It then loaded all friends of the current profile and stored them in the database both as friendship links and as possible profiles to crawl later. The experimental dataset was collected by crawling profiles in this way. From the dataset, the user profiles and links are converted into a graph structure, and the collective classification method is then applied to the social network user data to infer users' private information.
a. SOCIAL NETWORK DATA GATHERING
For the proposed work, the details were collected as follows. The username and password details of users of a social network such as Facebook are collected. The system logs in to the user accounts and downloads their profiles as .html files. An HTML parser then parses the HTML files and collects the attribute values of the user profiles.
The results are stored in the database, and the database records are exported to a .csv file for network classification. The dataset file is then modeled as a network graph.
Fig 4.2 Social network graph structure
A social network is represented as a graph structure. The graph model contains vertices, edges, and details (the attribute values of each user), where each vertex represents a unique user of the social network. The edges of the graph are the links defined in the social network, which establish the connections between friends in the network.
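The graph model above can be sketched in a few lines of Python. This is a minimal sketch, not the paper's implementation: the CSV layouts (a profiles file with a `user_id` column plus one column per detail, and a links file with hypothetical `user_a`/`user_b` columns) and the function name `load_graph` are assumptions, since the export schema is not specified. Friendship links are treated as undirected, and empty cells are read as undisclosed details.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export layouts (the paper does not give its CSV schema):
#   profiles: user_id,<detail 1>,<detail 2>,...   (empty cell = undisclosed)
#   links:    user_a,user_b                        (one friendship per row)


def load_graph(profiles_file, links_file):
    """Build G = (V, E, D) from the exported CSV files.

    Returns the node set V, the undirected friendship links E as a
    neighbour-set dictionary, and the details D as node -> {attribute: value}.
    """
    details = {}
    for row in csv.DictReader(profiles_file):
        user = row.pop("user_id")
        # Keep only the details the user actually disclosed.
        details[user] = {attr: value for attr, value in row.items() if value}

    neighbours = defaultdict(set)
    for row in csv.DictReader(links_file):
        a, b = row["user_a"], row["user_b"]
        # Friendship is mutual, so store the link in both directions.
        neighbours[a].add(b)
        neighbours[b].add(a)

    nodes = set(details) | set(neighbours)
    return nodes, dict(neighbours), details


profiles = io.StringIO("user_id,gender,religion\nu1,female,\nu2,male,christian\nu3,,christian\n")
links = io.StringIO("user_a,user_b\nu1,u2\nu2,u3\n")
V, E, D = load_graph(profiles, links)
print(sorted(V), sorted(E["u2"]), D["u1"])  # ['u1', 'u2', 'u3'] ['u1', 'u3'] {'gender': 'female'}
```

Storing neighbours as sets keeps the neighbour lookups that a relational classifier performs cheap; a graph library could serve the same purpose.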
b. NETWORK CLASSIFICATION
Collective inference is a method of classifying social network data using a combination of node details and connecting links in the social graph. Each such classifier consists of three components: a local classifier, a relational classifier, and a collective inference algorithm.
Local classifiers are learning methods applied in the initial step of collective inference. The Naive Bayes algorithm is used as the local classifier. It builds a model based on the details of the nodes in the training set and then applies this model to the nodes whose labels are unknown. The relational classifier is a separate type of learning algorithm that looks at the link structure of the graph and uses the labels of neighbouring nodes. Macskassy and Provost describe four relational classifiers: class-distribution relational neighbour (cdRN), weighted-vote relational neighbour (wvRN), network-only Bayes classifier (nBC), and network-only link-based classification (nLB).
A local classifier considers only the details of the node it is classifying, whereas a relational classifier considers only the link structure around a node. Collective inference uses both nodes and links to improve classifier accuracy. By using a local classifier in the first iteration, collective inference ensures that every node has an initial probabilistic classification, referred to as a prior. The algorithm then uses a relational classifier to reclassify nodes: at each step i ≥ 2, the relational classifier uses the fully labelled graph from step i − 1 to classify each node in the graph. The collective inference method also controls how long the algorithm runs.
In the case study of Macskassy and Provost, relaxation labeling was the best collective inference method when few labels were known. For relational classification, the link-based classifier was clearly preferable when many labels were known.
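The Naive Bayes local classifier, using the m-estimate of Step 2, might look as follows. This is an illustrative sketch: the class name `NaiveBayesLocal`, the default m = 1.0, and the uniform prior estimate p = 1/k (k being the number of distinct values observed for an attribute) are assumptions, not values taken from the paper. It returns a full class distribution rather than only the most likely class, because collective inference uses that distribution as the prior.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesLocal:
    """Naive Bayes local classifier over node details, using m-estimates.

    Sketch only: m (equivalent sample size) and the uniform prior estimate
    p = 1/k for each attribute are illustrative assumptions.
    """

    def __init__(self, m=1.0):
        self.m = m  # must be > 0 so that unseen values never get probability 0

    def fit(self, details, labels):
        # details: node -> {attribute: value}; labels: node -> class (training nodes)
        self.class_counts = Counter(labels.values())
        self.total = len(labels)
        self.n = Counter()               # (attr, cls) -> n: examples with v = v_j reporting attr
        self.n_c = Counter()             # (attr, value, cls) -> n_c
        self.values = defaultdict(set)   # attr -> distinct values seen in training
        for node, cls in labels.items():
            for attr, value in details.get(node, {}).items():
                self.n[(attr, cls)] += 1
                self.n_c[(attr, value, cls)] += 1
                self.values[attr].add(value)
        return self

    def m_estimate(self, attr, value, cls):
        """P(a_i | v_j) = (n_c + m*p) / (n + m)."""
        p = 1.0 / len(self.values[attr])
        return (self.n_c[(attr, value, cls)] + self.m * p) / (self.n[(attr, cls)] + self.m)

    def predict_proba(self, attrs):
        """Normalised class distribution, usable as the prior for collective inference."""
        log_scores = {}
        for cls, count in self.class_counts.items():
            score = math.log(count / self.total)          # log P(v_j)
            for attr, value in attrs.items():
                if attr in self.values:                   # ignore attributes never seen in training
                    score += math.log(self.m_estimate(attr, value, cls))
            log_scores[cls] = score
        top = max(log_scores.values())                    # subtract max for numerical stability
        weights = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(weights.values())
        return {c: w / z for c, w in weights.items()}

    def predict(self, attrs):
        probs = self.predict_proba(attrs)
        return max(probs, key=probs.get)                  # v_NB: the argmax class


details = {"a": {"likes": "x"}, "b": {"likes": "x"}, "c": {"likes": "y"}}
labels = {"a": "L", "b": "L", "c": "R"}
nb = NaiveBayesLocal(m=1.0).fit(details, labels)
print(nb.predict({"likes": "x"}), nb.predict({"likes": "y"}))  # L R
```

Working in log space avoids underflow when a profile has many details; the final normalisation turns the scores back into probabilities.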
The lower-variance methods (wvRN and cdRN) dominated when fewer labels were known.
Relaxation labeling repeatedly estimates the class distributions of all unknown nodes based on the current estimates.
Steps involved in collective classification:
Step 1: Assign an initial label using the local classifier; the Naive Bayes algorithm is used as the local classifier.
Step 2: In the first iteration, the Naive Bayes classifier selects the most likely classification $v_{NB}$ given the attribute values $a_1, a_2, \ldots, a_n$:
$$v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_{i=1}^{n} P(a_i \mid v_j)$$
where $V$ is the set of possible class values. $P(a_i \mid v_j)$ is generally estimated using m-estimates:
$$P(a_i \mid v_j) = \frac{n_c + m\,p}{n + m}$$
where $n$ = the number of training examples for which $v = v_j$; $n_c$ = the number of examples for which $v = v_j$ and $a = a_i$; $p$ = the a priori estimate for $P(a_i \mid v_j)$; and $m$ = the equivalent sample size.
Fig 4.4 Assign Initial Label
Step 3: Assign the initial label with the highest probability and set it as the prior. Start the second iteration using the weighted-vote relational neighbour (wvRN) relational classifier.
Step 4: In the wvRN relational classifier, to classify a node $n_i$, each of its neighbours $n_j$ is given a weight. The probability of $n_i$ being in class $C_x$ is the weighted mean of the class probabilities of $n_i$'s neighbours. That is,
$$P(n_i = C_x \mid N_i) = \frac{1}{Z} \sum_{n_j \in N_i} w_{i,j}\, P(n_j = C_x \mid N_j), \qquad Z = \sum_{n_j \in N_i} w_{i,j}$$
where $N_i$ is the set of neighbours of $n_i$ and $w_{i,j}$ is a link weight parameter given to the wvRN classifier. All link weights are assumed to be 1.
Step 5: Learn a classifier that maps the labels and/or attributes of a node's neighbours to the label of that node; here the network information is used.
Fig 4.5 Use the attributes of related objects
Step 6: Apply the relational classifier to each node iteratively and reclassify the labels.
Step 7: Relaxation labeling sets the number of iterations to run; iterate until the inconsistency between neighbouring labels is minimized.
V. RESULTS AND DISCUSSION
Classifying the social network data with the collective classification method improves classifier accuracy, so the proposed system can infer users' private information with higher accuracy. Table 5.1 compares the accuracy obtained when inferring each private attribute with the different classification methods; for all four attributes, collective classification is more accurate than either the local classifier alone or the relational classifier alone.
Table 5.1 Classifier Accuracy
| Private data | Local classifier only | Relational classifier only | Collective classification |
|---|---|---|---|
| Gender | 0.7214 | 0.1672 | 0.8621 |
| Religion | 0.5134 | 0.4751 | 0.9519 |
| Political Views | 0.5541 | 0.2151 | 0.6273 |
| Sexual Orientation | 0.4023 | 0.2543 | 0.6979 |
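Steps 3 to 7 of Section IV (priors from the local classifier, then repeated wvRN updates under relaxation labeling) can be sketched as below. This is a hedged illustration, not the paper's code: the function names, the convergence tolerance, and the annealing schedule (a mixing weight `beta` that starts at 1.0 and decays by a factor of 0.99 per iteration, modeled on Macskassy and Provost's description of relaxation labeling) are assumptions. All link weights are 1 as in Step 4, nodes with publicly known labels are held fixed, and at most 99 iterations run.

```python
def wvrn(node, neighbours, estimates, classes):
    """wvRN with all link weights w_ij = 1: the mean of the neighbours' class distributions."""
    nbrs = neighbours.get(node, set())
    if not nbrs:
        return dict(estimates[node])  # no links: nothing to learn from, keep the estimate
    z = len(nbrs)                     # Z = sum of the weights, all equal to 1
    return {c: sum(estimates[j][c] for j in nbrs) / z for c in classes}


def relaxation_labeling(neighbours, known, priors, classes,
                        max_iter=99, beta=1.0, alpha=0.99, tol=1e-6):
    """Collective inference: local-classifier priors, then repeated wvRN updates.

    known:  node -> publicly disclosed label (held fixed)
    priors: node -> class distribution from the local classifier (first iteration)
    """
    nodes = set(known) | set(priors) | set(neighbours)
    for nbrs in neighbours.values():
        nodes.update(nbrs)
    uniform = {c: 1.0 / len(classes) for c in classes}
    est = {}
    for n in nodes:
        if n in known:
            est[n] = {c: float(c == known[n]) for c in classes}
        else:
            est[n] = dict(priors.get(n, uniform))  # no prior given: assume uniform
    unknown = [n for n in nodes if n not in known]

    for _ in range(max_iter):
        # Synchronous step: every proposal uses the graph labelled in the previous step.
        proposed = {n: wvrn(n, neighbours, est, classes) for n in unknown}
        change = 0.0
        for n in unknown:
            mixed = {c: beta * proposed[n][c] + (1 - beta) * est[n][c] for c in classes}
            change = max(change, max(abs(mixed[c] - est[n][c]) for c in classes))
            est[n] = mixed
        beta *= alpha              # annealing: later steps move the estimates less
        if change < tol:           # estimates have stabilised
            break
    labels = {n: max(est[n], key=est[n].get) for n in unknown}
    return labels, est


neighbours = {"u1": {"k1", "k2", "k3"}, "k1": {"u1"}, "k2": {"u1"}, "k3": {"u1"}}
labels, est = relaxation_labeling(
    neighbours,
    known={"k1": "A", "k2": "A", "k3": "B"},
    priors={"u1": {"A": 0.1, "B": 0.9}},  # the local classifier favoured B
    classes=["A", "B"],
)
print(labels)  # {'u1': 'A'}
```

In the example, two of the three friends of `u1` disclose class A, so the relational step overrides the local prior that favoured B; this is the kind of indirect disclosure the system exploits.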
In the proposed system, the local classifier uses the Naive Bayes algorithm, which classifies the user nodes in the network by computing class probabilities from the node attributes. The wvRN algorithm is used for relational classification, to infer details from the friendship links. Both algorithms therefore infer data, from nodes and from links respectively. The system first classifies the nodes from their attributes and sets the result as the prior, so some class labels are already known. For collective inference, relaxation labeling with wvRN was the best choice when few labels were known. The relational classifier (wvRN) then reassigns the class labels based on the link details. Table 5.1 shows the accuracy of each classifier.
Fig 5.1 Calculating Classifier Accuracy
The experiments conducted in this research show that the proposed work provides better classification accuracy because collective classification is applied in the link analysis.
VI. CONCLUSION AND FUTURE WORK
Here, the collective classification method is used to infer private information from user nodes and their links. The system showed that users' private information can be inferred via social relations and the release of personal information in the social network. To prevent the leakage of private information in social networks, users can either hide their friendship relations or ask their friends to hide their attributes. To protect users' private information, sanitization and suppression techniques can be applied to the network data; sanitizing the network data reduces the chance of inferring individuals' private information.
REFERENCES
[1] R. Heatherly, M. Kantarcioglu, and B. Thuraisingham, “Preventing Private Information Inference Attacks on Social Networks,” IEEE Trans. Knowledge and Data Engineering, vol. 25, no. 8, pp. 1849-1861, Aug. 2013.
[2] L. Backstrom, C. Dwork, and J. Kleinberg, “Wherefore Art Thou r3579x?: Anonymized Social Networks, Hidden Patterns, and Structural
[3] M. Hay, G. Miklau, D. Jensen, P. Weis, and S. Srivastava, “Anonymizing Social Networks,” Technical Report 07-19, Univ. of Massachusetts Amherst, 2007.
[4] K. Liu and E. Terzi, “Towards Identity Anonymization on Graphs,” Proc. ACM SIGMOD Int’l Conf. Management of Data (SIGMOD ’08), pp. 93-106, 2008.
[5] J. He, W. Chu, and V. Liu, “Inferring Privacy Information from Social Networks,” Proc. Intelligence and Security Informatics, 2006.
[6] P. Sen and L. Getoor, “Link-Based Classification,” Technical Report CS-TR-4858, Univ. of Maryland, Feb. 2007.
[7] S.A. Macskassy and F. Provost, “Classification in Networked Data: A Toolkit and a Univariate Case Study,” J. Machine Learning Research, vol. 8, pp. 935-983, 2007.
[8] C. Johnson, “Project Gaydar,” The Boston Globe, Sept. 2009.
[9] E. Zheleva and L. Getoor, “To Join or Not to Join: The Illusion of Privacy in Social Networks with Mixed Public and Private User Profiles,” Technical Report CS-TR-4926, Univ. of Maryland, College Park, July 2008.
[10] J. Lindamood, R. Heatherly, M. Kantarcioglu, and B. Thuraisingham, “Inferring Private Information Using Social Network Data,” Proc. 18th Int’l Conf. World Wide Web (WWW), 2009.