Pleasures of basic Facebook data shoveling
Jan Fait
STEM/MARK

Guest Lecture at Charles University,
Prague, 4.12.2013
Today we are going to talk about :

1. Why

A tiny philosophical
corner

2. How

No programming, just copy
pasting
Why would I even try to mine FB data
myself?
The Boring part

The Fun part

Why are we doing
this?
What's in it for you?
What are other ways
to do this?

How is it done?
What is a facebook like worth for your
business?
Here's why. Sample questions:
In what ways are my fans like my other customers?
What do I actually know about my fans and followers on top of
their age?

Can I group my followers into segments?
Can I target my followers based on what they (are) like ?

Which ones are creating the most activity?
What on earth are all the other ones doing?

How similar/different is my competitors' fanbase?
Built-in insights are fine for fanpage
managers, but not for research

Who could have
guessed..
Limitations of FB research?
External validity

Research in social media tells you little about life outside social
media
Facebook self vs. Real self

Sampling

Only some profiles are public > Is there enough data to make
claims about my fanbase?

Organic environment

Network engineers keep changing stuff so you are in constant
need of adjustment
OK, but there are other ways..

Bambillion !

Always posted by a lady in her 40s
Indeed, there are ways:
Ask professionals and pay them accordingly

(see below)

Set up a social media login or create an app

(a rather good

investment)

Use ready-made tools and solutions

(and pay for the useful ones)

DO IT YOURSELF – PARTISAN STYLE
Come
Buy
Recommend
Return
Buy more

What does
a brand
manager
want from
a
customer?
Come
Engage
(Share)
Return
Engage more

What does a
fanpage
manager want
from a fan?
How is it done?
Obstacles ahead
Facebook developers are smart so the road is a
bit thorny
Good tools are usually not free
Open source tools are usually not as good
It's mostly fine legally
… but I am not a
technical type.
a) Find someone who is
b) Break it down into little
steps
c) Your chance to stand
out
Tools to use

(where facebook meets google and google meets microsoft)

Facebook's own Graph API
https://developers.facebook.com/tools/explorer

OpenRefine

http://openrefine.org/download.html
Engineered at Google Inc., formerly named Google Refine

MS Excel / Apple Numbers
Programs > MS Office / ??
Subjects to examine
(pick any fanpage or group or event)

https://www.facebook.com/Gambrinus.cz
Subjects to examine
(pick any fanpage or group or event)

https://www.facebook.com/PilsnerUrquellCzech
Stand-off

Brand | Pilsner Urquell | Gambrinus
Product | More expensive, high-end beer | Widely and wildly consumed cheaper beer
Image | Quality, tradition, national heritage, craftsmanship | Fun, shared moments, soccer
Number of fans | 204 734 | 47 566
Number of posts in 2013 | 415 | 425

Not really competitors: they have the same mothership!
Hypothesis time

H1 : Their active fanbase consists of at least 10% of the total
fans

H2 : There is more than 10% overlap in their active fanbase
H3 : Gambrinus and Pilsner Urquell have the same
engagement per post

H4 : The interest positioning will show little affinity, as beer
is widely appreciated across the population
Action !
Step 1 - Do not fear the Graph API

https://developers.facebook.com
Step 1 - Do not fear the Graph API

https://developers.facebook.com/tools/
Step 1 - Do not fear the Graph API
Access_token !

Result window

Fields selector
https://developers.facebook.com/tools/explorer
Step 1 – Facebook is nothing but a couple of
big tables

https://developers.facebook.com/docs/reference/fql
Step 1 – The JSON result format
(JavaScript object notation)

Graph API gives you a
result in JSON Format.
Visually disturbing
yet convenient format
used in web applications.
Wait and see how
OpenRefine handles it..

No, not this Json
Step 2 – Making a simple Graph API query
Get the id of the fanpage - there are many ways to do it, e.g.:
1) Click on a page profile pic

2) Look in the address bar and cut the last number before
„type“
146991996743
Step 2 – Making a simple Graph API query
1) Get a fresh access_token

Important, otherwise you
will only get a handful of results

2) And get data from your own timeline

123455687/posts?post_id&limit=50
Step 2 – Making a more complex query
1) Repeat with our Gambrinus.cz fanpage
2) And add some more fields – query likes and comments,
increase limit, reduce timespan with a unix timestamp (135..)

146991996743/posts?fields=likes,comments&limit=20000&since=1356998400
(since = 1.1.2013)
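
The value after since= is a Unix timestamp: seconds elapsed since 1.1.1970, counted in UTC. As a minimal sketch, and assuming midnight UTC is the intended cut-off, it can be computed rather than looked up. The helper name to_unix is made up for this example.

```python
from datetime import datetime, timezone


def to_unix(year: int, month: int, day: int) -> int:
    """Return the Unix timestamp (in seconds) for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


since = to_unix(2013, 1, 1)
print(since)  # 1356998400, the value used in the query
```

Change the date to move the start of the time window; the rest of the query stays the same.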
Step 3 – Build a string to post the same
query in browser address bar
A) URL :
https://graph.facebook.com/
B) query :
146991996743/posts?fields=likes,comments&limit=20000&since=1356998400
C) Access token :
&access_token=XXXXXXXXX……and so on
Put together A+B+C :
https://graph.facebook.com/146991996743/posts?fields=likes,comments&limit=20000&since=1356998400&access_token=XXXXX
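
The A+B+C gluing can also be done by a few lines of Python, which helps when the same query has to be repeated for several page ids. This is only a sketch of the 2013-style URL from the slide: the function name build_query_url and its defaults are invented here, the token is a placeholder, and later Graph API versions may expect a different URL shape.

```python
from urllib.parse import urlencode

GRAPH_BASE = "https://graph.facebook.com/"  # part A


def build_query_url(page_id: str, access_token: str,
                    since: int = 1356998400, limit: int = 20000,
                    fields: str = "likes,comments") -> str:
    """Glue together A (base URL) + B (query) + C (access token)."""
    params = {
        "fields": fields,
        "limit": limit,
        "since": since,
        "access_token": access_token,  # part C, always last here
    }
    # safe="," keeps the comma in the field list unescaped, as on the slide
    return f"{GRAPH_BASE}{page_id}/posts?{urlencode(params, safe=',')}"


url = build_query_url("146991996743", "XXXXX")
print(url)
```

Swapping in another page id is then a one-argument change.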
Step 4 – Run OpenRefine
1) Run the programme

(it opens in your browser)

2) Select Web Addresses
Step 5 – Paste your address into the field
1) Take our query

https://graph.facebook.com/146991996743/posts?fields=likes,comments&limit=20000&since=1356998400&access_token=XXXXXXX

2) Paste here
3) Click next
Step 6 – Transform your result

1) Tell the programme that
your result is JSON by
clicking on „JSON Files“
Step 7 – Pick an individual node !
This is one „like“ on a post made by user Maggu Ka
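
What OpenRefine does at this point is turn nested JSON into flat rows, one per like or comment. The sketch below does the same by hand. The sample payload is invented, and its shape (a top-level "data" list with nested "likes" and "comments" lists) is an assumption modelled on the result window, not a guaranteed format.

```python
import json

# Invented sample in the assumed shape of a /posts?fields=likes,comments result
SAMPLE = """
{"data": [
  {"id": "146991996743_1",
   "likes": {"data": [{"id": "101", "name": "Fan A"},
                      {"id": "102", "name": "Fan B"}]},
   "comments": {"data": [{"id": "c1", "from": {"id": "101", "name": "Fan A"}}]}},
  {"id": "146991996743_2",
   "likes": {"data": [{"id": "101", "name": "Fan A"}]}}
]}
"""


def flatten_interactions(payload: dict) -> list[tuple[str, str, str]]:
    """Return one (post_id, kind, user_id) row per like or comment."""
    rows = []
    for post in payload.get("data", []):
        for like in post.get("likes", {}).get("data", []):
            rows.append((post["id"], "like", like["id"]))
        for comment in post.get("comments", {}).get("data", []):
            rows.append((post["id"], "comment", comment["from"]["id"]))
    return rows


rows = flatten_interactions(json.loads(SAMPLE))
for row in rows:
    print(row)
```

Each printed tuple corresponds to one spreadsheet row; posts without likes or comments simply contribute no rows.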
Step 7 – Behold !
Click on „Create Project“ in the upper left and download the data
as an Excel sheet

Be sure this does
not exceed your
„limit“ in the query,
otherwise increase
the limit
Back to Step 3 !
The only thing you need to change is the id – instead of
Gambrinus, now try the Pilsner Urquell id
Don‘t remember?

https://www.youtube.com/watch?v=vUxdB-nl0Bw
Analysis

Note : The metrics chosen could
be redesigned to reflect other
things, like time and location

(sort of)
Engagement, like .. ehm, kiwi .. has layers
Skin : All fans
Core : Fans
who interact
regularly

Inside :
Number of fans
who interact

Sample question : Has my post attracted anyone outside the usual
bunch of followers who simply like everything?
Make crude metrics of those layers
Skin : All fans = 100%
Fans with more
than 1
interaction /
All fans = 2%

Unique Ids
within
interactions /
All fans = 7%

Tip : By messing around with the column named created_time you can
see how your core fanbase has been losing and gaining interest in your
posts and whether it kept interacting = compute the lifetime of a fan
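
For those who prefer a script to spreadsheet formulas, here is a rough sketch of the layer metrics and the fan lifetime from the tip. It assumes the data has been reduced to (user_id, created_time) pairs with created_time as a Unix timestamp; the function name, the sample values and the definition of lifetime as last minus first interaction are all assumptions made for this example.

```python
from collections import Counter


def engagement_layers(interactions, total_fans):
    """interactions: iterable of (user_id, unix_time), one entry per like or comment."""
    interactions = list(interactions)
    counts = Counter(user for user, _ in interactions)
    unique = len(counts)                                # "inside" layer
    repeat = sum(1 for n in counts.values() if n > 1)   # "core" layer
    first_seen, last_seen = {}, {}
    for user, ts in interactions:
        first_seen[user] = min(ts, first_seen.get(user, ts))
        last_seen[user] = max(ts, last_seen.get(user, ts))
    return {
        "unique_share": unique / total_fans,
        "repeat_share": repeat / total_fans,
        # lifetime = seconds between a fan's first and last interaction
        "lifetime_seconds": {u: last_seen[u] - first_seen[u] for u in counts},
    }


# Made-up sample: fan "a" interacts three times, "b" and "c" once each
sample = [("a", 100), ("b", 150), ("a", 400), ("c", 500), ("a", 900)]
result = engagement_layers(sample, total_fans=10)
print(result)
```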
Try it with real Gambrinus fanpage data
47 566 = 100%
575 interactors
with more than
1 action =
1.2% (28% of
all active fans)

2004 unique
interactors =
4.2%

Tip : What are these ratios among competitors? Isn't that more
important than the widely cited number of fans? Are any of your fans
also in the competitor's core fanbase? Uhh, you nasty weasels!
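
The percentages on this slide are plain ratios of three counts, so they are easy to recompute for any page. A small sketch using the Gambrinus counts from the slide; the function and key names are invented for illustration.

```python
def layer_ratios(total_fans: int, unique_interactors: int, repeat_interactors: int) -> dict:
    """Express the kiwi layers as shares of the fanbase and of the active fans."""
    return {
        "active_of_total": unique_interactors / total_fans,
        "core_of_total": repeat_interactors / total_fans,
        "core_of_active": repeat_interactors / unique_interactors,
    }


# Counts taken from the Gambrinus slide
gambrinus = layer_ratios(total_fans=47_566, unique_interactors=2_004, repeat_interactors=575)
for name, value in gambrinus.items():
    print(f"{name}: {value:.1%}")
```

Feeding in a competitor's counts gives directly comparable ratios.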
And now the Pilsner Urquell
204 734 = 100%
715 interactors
with more than
1 action =
0.35% (30% of
all active fans)

2358 unique
interactors =
1%

Stand-off revisited. H1 rejected and H2 confirmed

Brand | Pilsner Urquell | Gambrinus
Number of fans | 204 734 | 47 566
Number of posts in 2013 | 415 | 425
Number of active fans in 2013 | 2358 / 1.1% | 2004 / 4.2%
Number of repeated interactions | 715 / 30% of active | 575 / 28% of active
Fanbase overlap | 5% of active (both brands)

Variations : Share of all interactions created by the TOP 10% of fans..
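
Both the overlap and the "top 10%" variation can be computed from the lists of interacting user ids. The sketch below is one possible reading, not the method behind the slide: the denominator for the overlap (the smaller of the two active fanbases) and the rounding-up of the top 10% are assumptions, and both functions expect non-empty input.

```python
import math
from collections import Counter


def top_share(interactions_by_user: Counter, top_fraction: float = 0.10) -> float:
    """Share of all interactions made by the most active `top_fraction` of fans."""
    counts = sorted(interactions_by_user.values(), reverse=True)
    k = max(1, math.ceil(len(counts) * top_fraction))  # at least one fan
    return sum(counts[:k]) / sum(counts)


def overlap_share(active_a: set, active_b: set) -> float:
    """Fans active on both pages, relative to the smaller active fanbase (assumed denominator)."""
    return len(active_a & active_b) / min(len(active_a), len(active_b))


# Made-up data: one heavy user plus nine fans with a single interaction each
activity = Counter({"heavy": 11, **{f"fan{i}": 1 for i in range(9)}})
print(top_share(activity))

page_a = {"u1", "u2", "u3", "u4"}
page_b = {"u3", "u4", "u5", "u6", "u7"}
print(overlap_share(page_a, page_b))
```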
How to compute average engagement?
1) You may want to try to query the „insights“ table, but
mostly no success for pages other than yours
2) Else you need all the posts with likes,comments (and
shares) already aggregated
https://graph.facebook.com/fql?q=select post_id,
like_info,comment_info,share_info from stream where
source_id=146991996743 and created_time>1356998400 and
actor_id=146991996743 LIMIT 20000&access_token=XXXXX

3) Paste this query to OpenRefine like previously and work
with Excel sheet from there

Tip : Limit the type by adding type in(46,80,128,247) to the where clause so you don't get posts like „group created“
Stand-off again. H3 rejected

Brand | Pilsner Urquell | Gambrinus
Average engagement | 248 | 74
Median engagement | 144 | 29
10% top-trimmed average | 169 / diff of 79 | 44 / diff of 30

This may look surprising, especially considering the active fanbases are
more or less equal. It seems the total fanbase does play a role.

Tip : For more precise information, you may want to exclude the top 5% of fans to see how much it changes
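
The three figures in the table (average, median, top-trimmed average) can be reproduced outside Excel as well. In this sketch "engagement" is assumed to mean likes + comments + shares per post, and "10% top trimmed" is read as dropping the 10% of posts with the highest engagement; the numbers are made up.

```python
from statistics import mean, median


def top_trimmed_mean(values, trim: float = 0.10) -> float:
    """Mean after dropping the highest `trim` share of values (top end only)."""
    ordered = sorted(values)
    keep = len(ordered) - int(len(ordered) * trim)
    return mean(ordered[:keep])


# Made-up engagement per post; one viral post skews the plain average
engagement = [12, 20, 25, 29, 31, 40, 55, 60, 90, 900]

avg = mean(engagement)
med = median(engagement)
trimmed = top_trimmed_mean(engagement)
print(avg, med, round(trimmed, 1), round(avg - trimmed, 1))
```

The gap between the plain and the trimmed average (the "diff" in the table) shows how much a handful of top posts carries the page.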
Study competitor's top posts

https://www.facebook.com/PilsnerUrquellCzech/posts/10151304524945974

https://www.facebook.com/Gambrinus.cz/posts/10151581664231744

Tip : Take the URL of the page and add /posts/ and the post id you get from the spreadsheet.
Some conclusions
Followers have a lifespan: some are zombies, some have left Facebook.
A large group of active followers is superior to a large zombie fanbase => Facebook's EdgeRank has buried your posts for those guys anyway.
You can make up metrics once you have the data > sometimes it is better to have the data first.
The Graph API returns errors all the time, so don't be discouraged..
Step 4 – Sum it up
The dodgy part : Know more about the fans
The fans are well described by their
favorites, likes, interests, ...
Facebook ids of fans + Web Scraper

You have the Facebook id of someone
=> you can visit her profile
You have a web scraper (like
OpenRefine) => you can visit all
the profiles without actually
browsing through them

.. and download whatever the
browser sees..
It is against Facebook's
policies to scrape profile pages
en masse, but it's „ok“ as a
training exercise.

Pete Warden scraped 200
000 000 FB profiles and they
let the lawyers off the leash
http://www.facebook.com/apps/site_scraping_tos_terms.php
Step 2 – Preparing data for OutWit Hub
OutWit Hub is a free intelligent
scraper (limited amounts of data)
Prepare the links to the Pilsner fans' profiles in a
Notepad file like the one below, then use File =>
Open to load the .txt file in OutWit Hub

http://download.cnet.com/OutWitHub/3000-11745_410846181.html
Step 3 – Creating a scraper in OutWit Hub
Prepare a scraper
1) Go to the „scrapers“ tab
2) Click new
3) Name the scraper somehow
4) Do the rest as below

Get everything
starting with -- and ending
with
Step 4 – Running the scraper on a couple of links
Step 5 – Calculate Affinity
Count occurrences of individual fanpages in the results and
compare them to their occurrence in the total Czech Facebook
population of 3 770 000
1) Natural affinity = Total fans of the page / 3 770 000
2) Pilsner affinity = Occurrences in results / Fans of Pilsner
3) Affinity ratio = the ratio of the two
4) Repeat for all fanpages
5) Bring up those where the occurrence is the largest
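
As a worked example of the affinity steps, here is a minimal sketch with invented numbers. It assumes that "Fans of Pilsner" in the second formula means the number of fan profiles actually examined, and that the affinity ratio is Pilsner affinity divided by natural affinity; the function and parameter names are hypothetical.

```python
CZECH_FB_POPULATION = 3_770_000  # figure from the slide


def affinity_ratio(page_total_fans: int, occurrences_in_results: int,
                   fans_examined: int, population: int = CZECH_FB_POPULATION) -> float:
    """How much more common a fanpage is among the examined fans than in the population."""
    natural_affinity = page_total_fans / population           # step 1
    pilsner_affinity = occurrences_in_results / fans_examined  # step 2
    return pilsner_affinity / natural_affinity                 # step 3


# Invented example: a page liked by 10% of Czech Facebook users
# turns up on 30 of 100 examined fan profiles
ratio = affinity_ratio(page_total_fans=377_000, occurrences_in_results=30, fans_examined=100)
print(round(ratio, 2))  # 3.0
```

A ratio above 1 means the page is over-represented among the fans; a ratio near 1 means the fans like it about as often as everyone else.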

Step 6 – Results (sample)
Step 6 – Troubleshooting

a) Go to Preferences > Time Settings and make sure none of
the sliders is „in the red“. That would result in frequent
CAPTCHA checks on most protected servers..
b) Make sure your scraper is targeting the right domain
c) Make sure your „Marker Before“ and „Marker After“ are
actually present on the page..
d) It is becoming easier to program an app than to try to
scrape a meaningful amount of data
Thank you. Now to your questions.

fait@stemmark.cz
www.stemmark.cz

Credits for affinity idea :
Work by Jan Schmid & Josef Šlerka

Images :
Photopin.com
Download all materials at :

www.stemmark.cz/downloads/educ/fb_mining.zip

By the way, Mark Zuckerberg likes Pilsner
Urquell.
