DETERMINING A DIGITAL PROFILE FROM PUBLIC SOCIAL MEDIA INFORMATION
Department of Informatics, School of Informatics and Engineering
2013/14
Outline
• Motivation
• Review of existing tools
• Data harvesting
• Data harvesting with Selenium 2.0
• Resolving e-mail address
• Demo
• Results
Motivation
• Q2 2014 surveys conducted by Ipsos MRBI with 1000 respondents aged 15
Motivation
• The CPL “Employment Market Monitor” report indicates that 60% of those surveyed use social media sources for pre-screening or for background digital footprinting of job applicants prior to employment (CPL, Q3 2013)

Options    Regularly  Sometimes  Never  Do not approve of this
Google     13%        26%        53%    8%
LinkedIn   30%        41%        24%    5%
Facebook   9%         22%        59%    10%
Google+    3%         7%         81%    9%
Other      10%        4%         77%    9%
Motivation
[Figure: jobseekers lying or exaggerating. Labels: false claim to speak another language; inflating IT skills]
• Global-Lingo.com surveyed the UK jobseeker market in Q1 2014
• 63% of jobseekers admitted to lying on their CV
Motivation – “resume-less”
• “Having an ability to showcase and validate a candidate’s work through a social graph (Twitter, About.me, Facebook, Slideshare, Google+, forums, etc.), search engine footprint (special URL references to projects, linkbacks, publications, etc.), network connections is much more powerful than just 1 – 2 pages and 3 prepped references. The prospective employer now has an ability to fully evaluate a candidate and understand if they are a fit or not based on actual work, not just 2 pages of crafty wording.” #socialCV
Mr. Vala Afshar (Chief Customer Officer @Enterasys)
Motivation – GitHub
“Forget LinkedIn: Companies turn to GitHub to find tech talent” – CNET.COM
• In the red-hot market for skilled software engineers, companies looking to make great hires are discovering that relying on traditional services that showcase candidates' work histories -- but not their actual work -- is a great way to miss out on the best available talent.
• GitHub, a place where hiring managers and recruiters alike are increasingly turning to find not just the potential employees who look best on paper, but the ones that actively (and publicly) demonstrate their capabilities.
Review of existing tools
Data harvesting
• Website Application Programming Interface (also called Web API) – gives the client an interface for querying the website provider's database via HTTP request messages. As a result, the client receives the data in XML or JSON.
• Web scraping – a software-based technique that transforms unstructured data on the web (typically HTML) into structured data that can be stored and analysed.
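The difference between the two harvesting routes can be sketched in a few lines of Python. This is a minimal illustration only: the JSON body, the HTML page, and the field names `user` and `followers` are made up for the example and do not come from any of the networks discussed in the slides.

```python
import json
from html.parser import HTMLParser

# Hypothetical Web API response body (JSON), as a client would receive it.
API_RESPONSE = '{"user": "jdoe", "followers": 42}'

# Hypothetical HTML page carrying the same information for a browser.
HTML_PAGE = """
<html><body>
  <h1 class="user">jdoe</h1>
  <span id="followers">42</span>
</body></html>
"""


def from_api(body):
    """Web API route: the payload is already structured, just decode it."""
    data = json.loads(body)
    return {"user": data["user"], "followers": int(data["followers"])}


class ProfileScraper(HTMLParser):
    """Web scraping route: pick the wanted fields out of the markup."""

    def __init__(self):
        super().__init__()
        self._field = None  # name of the field whose text we expect next
        self.record = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h1" and attrs.get("class") == "user":
            self._field = "user"
        elif attrs.get("id") == "followers":
            self._field = "followers"

    def handle_data(self, data):
        if self._field and data.strip():
            self.record[self._field] = data.strip()
            self._field = None

    def handle_endtag(self, tag):
        self._field = None


def from_html(page):
    scraper = ProfileScraper()
    scraper.feed(page)
    return {
        "user": scraper.record["user"],
        "followers": int(scraper.record["followers"]),
    }


if __name__ == "__main__":
    print(from_api(API_RESPONSE))
    print(from_html(HTML_PAGE))
```

Both functions end with the same structured record; the scraping route simply has to do the structuring itself, which is why it breaks when the page layout changes.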
Web harvesting - caveats
Caveats
• Web 2.0 sites rely heavily on AJAX and populate HTML dynamically, depending on the user’s preferences and various conditions
• Basic Python libraries don’t capture all of the source code, as objects may be hidden or event-driven
• Sites are usually secured with SSL/TLS
Solution
• Selenium 2.0
• provides native WebDriver capability
• imitates the functionality of Android, Firefox, Google Chrome, Internet Explorer, Safari, Opera, and even the headless HtmlUnit and PhantomJS (JavaScript) browsers
• well suited to dynamically populated elements
• allows selecting elements via various HTML attributes, from tag name and id to XPath and even CSS selectors
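The selection strategies listed above, combined with waiting for dynamically populated elements, might look roughly like the sketch below. It is not the prototype's code. The locator values (`friends`, `div#friends a`, and so on) are hypothetical, and `driver` is assumed to be any object with a Selenium-style `find_elements(by, value)` method, so the sketch runs without Selenium installed. The strategy strings match the values of Selenium's `By` constants.

```python
import time

# Locator strategy strings; these match the values of Selenium's
# selenium.webdriver.common.by.By constants (By.TAG_NAME, By.ID, ...).
TAG_NAME = "tag name"
ID = "id"
XPATH = "xpath"
CSS_SELECTOR = "css selector"

# Hypothetical locators for a page that lists profile links.
LOCATORS = {
    "all_links": (TAG_NAME, "a"),
    "friend_box": (ID, "friends"),
    "friend_names": (XPATH, "//div[@id='friends']//a"),
    "friend_links": (CSS_SELECTOR, "div#friends a"),
}


def wait_for_elements(driver, by, value, timeout=10.0, poll=0.5):
    """Poll until at least one element matches or the timeout expires.

    AJAX-populated elements may not exist right after the page loads,
    so a single lookup is not enough.
    """
    deadline = time.monotonic() + timeout
    while True:
        found = driver.find_elements(by, value)
        if found or time.monotonic() >= deadline:
            return found
        time.sleep(poll)


def harvest_text(driver, locator_name, timeout=10.0, poll=0.5):
    """Return the visible text of every element matching a named locator."""
    by, value = LOCATORS[locator_name]
    elements = wait_for_elements(driver, by, value, timeout, poll)
    return [element.text for element in elements]
```

With Selenium installed, a real browser session such as `webdriver.Firefox()` could be passed as `driver` after calling `driver.get(url)`. Selenium also ships its own explicit wait, `WebDriverWait`; the polling loop here is only a stand-in for it.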
Web harvesting 
Facebook Friends List example
Resolving e-mail address

#  Network         Method       Extra details
1  Facebook        Direct
2  Twitter         Gmail
3  SlideShare.net  Direct       Advanced search, by user
4  Academia.edu    Gmail
5  GitHub          Semi-Direct  Specific query over the local-part of the e-mail address
6  LinkedIn        Gmail        Caveats: 1) does not resolve the e-mail address until the user sends an invitation to the e-mail address owner; 2) caches previous search queries and suggests them in the next query round
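The "semi-direct" GitHub method queries over the local-part of the address, that is, the text before the `@`. A minimal sketch of that step follows. The slides do not say which query the prototype issued, so the GitHub user search endpoint and the `in:email` qualifier used here are an assumption, and the function names are hypothetical.

```python
from urllib.parse import urlencode


def local_part(email):
    """Return the part of an e-mail address before the last '@'."""
    local, sep, domain = email.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an e-mail address: {email!r}")
    return local


def github_user_search_url(email):
    """Build a user search URL from the local-part of the address."""
    # Assumption: GitHub's public user search endpoint with the
    # `in:email` qualifier; the slides do not name the actual query.
    query = f"{local_part(email)} in:email"
    return "https://api.github.com/search/users?" + urlencode({"q": query})


if __name__ == "__main__":
    print(github_user_search_url("jane.doe@example.com"))
```

Searching on the local-part alone can return several candidates, which is presumably why the method is listed as semi-direct rather than direct: the results still have to be matched against the full address.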
Resolving e-mail address 
Facebook
• Search Engine
• Find/Invite Friends
Resolving e-mail address
• Academia.edu
• Twitter
Resolving e-mail address
• GitHub
• SlideShare
Resolving e-mail address
LinkedIn
• SlideShare
Demo
Results 
Overview of performance test of three open source search engines vs. the implemented prototype (ScrapYA).
[Figure: chart. Vertical axis: 0 to 9. Platforms: gravatar, twitter, facebook, github, stumbleupon, vimeo, YouTube, Picasa, Pinterest, klout, foursquare, amazon, ebay, aol livestream, soundCloud, instagram, g+, home, slideshare, about.me, linkedin, academia.edu. Series: people smart, pipl, spokeo, scrapYA.]
Results 
Refined result of the search test, limited to the Social Media platforms implemented by the prototype (ScrapYA).
[Figure: chart. Vertical axis: 0% to 100%. Series: scrapYA, spokeo, pipl, people smart.]
Determining a digital profile from public social media 
information 
Karolina Stamblewska 
B00075232
