Good morning!

Enjoy your coffee and install
Putty and NotepadPlus via "Software Maintenance/Application
Catalogue". And the Pattern package (see my e-mail). Thanks.
Big Data? What are we talking about?

The process: collect, store, analyze

Python

Questions?

Hands-on-Workshop
Big (Twitter) Data
Damian Trilling
d.c.trilling@uva.nl
@damian0604
www.damiantrilling.net
Afdeling Communicatiewetenschap
Universiteit van Amsterdam

30 January 2014
9.30
#bigdata

Analyzing social media with Python and other tools (1/4)

The next one and a half days
You’ll hear about
• Collecting social media data via APIs, RSS and scraping (and the tools for it)
• Technical infrastructure (via SURFsara)
• Python
• Sentiment analysis
• Automated coding
• Frequencies and other statistics
• Social network analysis with Gephi
• ...

In this session (1/4):
1 Big Data? What are we talking about?

Exploring the field
Some examples
2 The process: collect, store, analyze

A scheme
Our implementation
3 Python

What it is
When to use it
When not to use it
4 Questions?

Exploring the field

What’s big data?
What are we talking about?

Today, it’s a hands-on workshop, so let’s keep this important (!)
discussion for later.

So, no definition, but some brief thoughts
• Existing data (≠ experiments or surveys)
• Too big to code manually
• Too big to handle with normal tools
• New research questions
• Call to revisit the relationship between theory and empirical research

Today, ...
• we are not going to talk about REALLY BIG data,
• but we will have some exercises on datasets a normal computer can handle

Tomorrow, ...
• we will also learn about scaling up these techniques
• SURFsara provides infrastructure for this

Some sources
• Social Network Sites
• RSS-feeds
• Databases
• Scraping text from the web
• ...

It’s out there!
You only have to collect it.

But why should we care?
We can answer new questions
• Find needles in haystacks
• Identify networks, co-word analysis, linguistic analysis, ...
• Verify our theories in larger datasets

It makes sense
• There are things that computers are simply better at than humans, e.g. counting things
• Having human coders look for words in texts is like calculating a regression analysis by hand

Some examples

A recent master thesis

The needle in the haystack

Imagine you want to analyze some very rare content.
Normal sampling won’t work, that’s for sure.

So you’d better collect everything first

Getting all news coverage from Dutch news sites
1. Collect all articles from nine news sites during a period of two months, resulting in a database with 74,000 articles.
2. Filter articles containing specific keywords.
3. Those 292 articles were then manually coded.

Pöll, B. (2013). Social media: new sources, new profession? A content analysis of the use of social media as a
source for journalists in online news articles. Master Thesis, Universiteit van Amsterdam.
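
Step 2, the keyword filter, can be sketched in a few lines of Python. This is only an illustration: the keyword list and the sample articles are made up, and the slides do not say which search terms or which tool the thesis actually used.

```python
import re

# Hypothetical keyword list; the search terms of the thesis are not given on the slide.
KEYWORDS = ["twitter", "facebook", "tweet"]

def contains_keyword(text, keywords=KEYWORDS):
    """Return True if any keyword occurs in the text as a whole word (case-insensitive)."""
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

def filter_articles(articles, keywords=KEYWORDS):
    """Keep only the (name, text) pairs whose text mentions a keyword."""
    return [(name, text) for name, text in articles if contains_keyword(text, keywords)]

# Made-up sample data standing in for the article database
articles = [
    ("a1.txt", "The minister announced the plan on Twitter."),
    ("a2.txt", "Heavy rain is expected in the south."),
]
selected = filter_articles(articles)
```

Matching whole words only is a design choice of this sketch; depending on the research question you may prefer to match word stems as well.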

It’s just one line of code!

urls.txt
http://www.gmx.at/themen/wissen/mensch/108g5xi-baeuerlich-schiefe-zaehne
http://www.gmx.at/themen/unterhaltung/klatsch-tratsch/408g740-fuermannbittet-um-verzeihung
http://www.gmx.at/themen/nachrichten/aufruhr-arabien/268g70u-regierungwill-zuruecktreten
http://www.gmx.at/themen/nachrichten/panorama/828g54y-neues-zur-klagegegen-republik
http://www.gmx.at/themen/nachrichten/panorama/968g72s-millionstrafewegen-oelpest
http://www.gmx.at/themen/unterhaltung/klatsch-tratsch/368g6yc-keinbabybauch-nur-fast-food
...

wget command
wget -i urls.txt
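
If wget is not available, roughly the same thing can be done in Python. The following is a minimal sketch, not what was used for the thesis: the function names are made up, and the file naming (last part of the URL path, or index.html if there is none) is an assumption that only loosely mirrors what wget does.

```python
import os
from urllib.parse import urlsplit
from urllib.request import urlopen

def local_name(url):
    """Derive a file name from the last path segment of a URL (assumed naming scheme)."""
    name = os.path.basename(urlsplit(url).path)
    return name or "index.html"

def download_all(url_file, target_dir="."):
    """Download every URL listed (one per line) in url_file into target_dir."""
    with open(url_file) as f:
        urls = [line.strip() for line in f if line.strip()]
    saved = []
    for url in urls:
        path = os.path.join(target_dir, local_name(url))
        # Fetch the page and write the raw bytes to disk
        with urlopen(url) as response, open(path, "wb") as out:
            out.write(response.read())
        saved.append(path)
    return saved
```

Calling download_all("urls.txt") would then fetch all listed pages into the current folder. Note that two URLs ending in the same file name would overwrite each other in this sketch.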

A recent bachelor thesis

Tone in tweets

Imagine you want to know something about someone’s behavior on
Twitter, or how a specific topic is discussed on Twitter.
Do you really want to go through thousands of tweets by hand?

So you’d better think about automating your coding
Finding out how negative or positive politicians are towards
their opponents
The student took lists with positive and negative words and made
additional ones with a politician’s opponents.
She used a Python script to check which type of words was used to
refer to opponents.
For further analysis, the results were imported into SPSS.

Schut, L. (2013). Verenigde Staten vs. Verenigd Koningrijk: Een automatische inhoudsanalyse naar verklarende
factoren voor het gebruik van positive campaigning en negative campaigning door vooraanstaande politici en
politieke partijen op Twitter. Bachelor Thesis, Universiteit van Amsterdam.
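
The general idea of such a word-list approach can be sketched as follows. This is not the student's script: the word lists, the opponent names and the sample tweets are invented for illustration, and real lists would be much longer.

```python
import re

# Hypothetical, tiny word lists; the thesis used its own lists.
POSITIVE = {"great", "good", "strong"}
NEGATIVE = {"bad", "weak", "failed"}
OPPONENTS = {"smith", "jones"}  # made-up opponent names

def code_tweet(tweet):
    """Return (mentions_opponent, n_positive, n_negative) for one tweet."""
    words = re.findall(r"[a-z']+", tweet.lower())
    mentions_opponent = any(w in OPPONENTS for w in words)
    n_pos = sum(w in POSITIVE for w in words)
    n_neg = sum(w in NEGATIVE for w in words)
    return mentions_opponent, n_pos, n_neg

tweets = [
    "Smith has failed our schools. Bad policy!",
    "Great turnout today, thank you all!",
]
coded = [code_tweet(t) for t in tweets]
```

Each tweet becomes one row of numbers, which is exactly the kind of table that can be saved as CSV and imported into SPSS.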

Frame adoption on Twitter

Which phrases used by Merkel and Steinbrück on TV make it
to the #tvduell discussion on Twitter?
Identify frequently used words in the transcript of the debate and
in tweets.
Find co-occurrences.
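
A minimal sketch of these two steps, assuming the transcript and the tweets are already available as plain strings. The sample texts, the minimum word length and the function name are made up; the actual study may have defined "frequently used" and "co-occurrence" differently.

```python
import re
from collections import Counter

def top_words(text, n=3, min_length=4):
    """Return the n most frequent words of at least min_length characters."""
    words = re.findall(r"\w+", text.lower())
    counts = Counter(w for w in words if len(w) >= min_length)
    return [word for word, _ in counts.most_common(n)]

# Made-up stand-ins for the debate transcript and the collected tweets
transcript = "Taxes must fall. Lower taxes create jobs, and jobs create growth."
tweets = ["Jobs, jobs, jobs. Heard that before.", "Lower taxes? Who pays for that?"]

frequent = top_words(transcript)
# For each frequent transcript word, count the tweets in which it also occurs
adoption = {
    word: sum(word in re.findall(r"\w+", t.lower()) for t in tweets)
    for word in frequent
}
```

In practice you would also remove stop words before counting; the minimum word length used here is only a crude substitute.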

The process: collect, store, analyze
A scheme

Our implementation

datacollection.followthenews-uva.cloudlet.sara.nl
yourTwapperkeeper
Continuously calls the Twitter API and saves all
tweets containing specific hashtags to a
MySQL database.

rsshond
Calls the RSS feeds of news sites 1x/hour,
saves title, time, header, and teaser of all new
articles into a CSV table, follows the link to
the full text and downloads it.

snapshot
Visits some URLs 4x/day and downloads
them.

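
To give an idea of what an RSS collector does, here is a minimal sketch of the "save all new articles into a CSV table" step. It is not the code of rsshond: the function names and the sample feed are invented, it assumes a plain RSS 2.0 feed, and it leaves out the hourly scheduling and the download of the full text.

```python
import csv
import io
import xml.etree.ElementTree as ET

FIELDS = ("title", "pubDate", "link", "description")

def rss_items(rss_xml):
    """Yield (title, pubDate, link, description) for each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        yield tuple(item.findtext(tag, default="") for tag in FIELDS)

def append_new_items(rss_xml, table, seen_links):
    """Write rows for items whose link was not seen before; return the number of new rows."""
    writer = csv.writer(table)
    new = 0
    for row in rss_items(rss_xml):
        link = row[2]
        if link not in seen_links:
            seen_links.add(link)
            writer.writerow(row)
            new += 1
    return new

# Made-up feed with two items
SAMPLE = """<rss version="2.0"><channel><title>Example</title>
<item><title>First</title><pubDate>Thu, 30 Jan 2014 09:30:00 +0100</pubDate>
<link>http://example.org/1</link><description>Teaser one</description></item>
<item><title>Second</title><pubDate>Thu, 30 Jan 2014 10:30:00 +0100</pubDate>
<link>http://example.org/2</link><description>Teaser two</description></item>
</channel></rss>"""

table = io.StringIO()   # stands in for an open CSV file
seen = set()
first_run = append_new_items(SAMPLE, table, seen)
second_run = append_new_items(SAMPLE, table, seen)  # same feed again: nothing new
```

Keeping track of the links already seen is what makes repeated calls safe: an article that is still in the feed an hour later is not saved twice.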
How to access the collected data?
Apache web server
Download the data from http://datacollection.followthenews-uva.cloudlet.sara.nl.

SSH (scp)
Transfer data directly to your computer or another server (like speeltuin.followthenews-uva.cloudlet.sara.nl)

Beehub
Connect the server to Beehub, which can be mounted like the "P-schijf" (P: drive) or accessed online.

Python

What it is

One tool to rule them all?

Of course there are ready-made tools for some of the questions we
want to answer. But for many, there aren’t. Python offers us the
possibility to build exactly the tool we need.

And it’s fun!

What is Python?
It is a programming language
• It is flexible. You can use it for (in principle) any kind of data
• There are virtually no limits regarding the amount of data to process
• You can run it on every platform
• And yet it is easy to learn!

It is widely used for content analysis
• Many online resources and toolkits
• Books about NLP and web scraping with Python

You do not have to become a
programmer. If you know how to
write SPSS or STATA syntax, you
will understand Python.
(But if you have ever had contact with any programming language,
it helps.)

It’s enough if you can read and
modify the code.

Think of the following task

RQ: What are the differences in terms of actors mentioned
between Israeli and Palestinian news coverage?
1. The data structure: You have a folder with articles
2. The desired output: You want a table with the file names and
a column per actor, counting how often they are mentioned
3. A typical task for a short Python script!

You need something like this:
for every file in folder:
    read the file
    count actors
    add new row to table with filename and actor counts
save table
(such a notation is called pseudo-code)

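import csv
import re
from os import listdir
from os.path import isfile, join

# Folder with one article per file
mypath = r"C:\Users\Ricarda\Documents\Artikelen"
# "Israel", followed later on the same line by one of the actor words.
# The alternatives are grouped with (?:...); the slide had an unbalanced bracket here.
regex54 = re.compile(r'Israel.*(?:minister|politician.*|[Aa]uthorit)')
filename_list = []
matchcount54_list = []
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
for f in onlyfiles:
    matchcount54 = 0
    # Go through the article line by line and count the matches
    with open(join(mypath, f), "r") as artikel:
        for line in artikel:
            matchcount54 += len(regex54.findall(line))
    filename_list.append(f)
    matchcount54_list.append(matchcount54)
# One row per file: file name and number of matches
output = zip(filename_list, matchcount54_list)
with open("overzichtstabel.csv", "w", newline="") as csvfile:
    csv.writer(csvfile).writerows(output)
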
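
The script above counts one actor only. The desired output, a table with a column per actor, could be produced along the following lines. This is a sketch under assumptions: the actor names, the regular expressions and the function names are invented and would have to be replaced by your own codebook.

```python
import csv
import os
import re

# Hypothetical actor patterns; adapt names and expressions to your own codebook.
ACTORS = {
    "israeli_government": re.compile(r"Israeli (?:government|minister|authorit)"),
    "palestinian_authority": re.compile(r"Palestinian (?:Authority|minister|official)"),
}

def count_actors(folder, actors=ACTORS):
    """Return one row per file: [filename, count for actor 1, count for actor 2, ...]."""
    rows = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8") as f:
            text = f.read()
        rows.append([name] + [len(rx.findall(text)) for rx in actors.values()])
    return rows

def save_table(rows, csv_path, actors=ACTORS):
    """Write a header row plus the counts to a CSV file."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["filename"] + list(actors))
        writer.writerows(rows)
```

Usage would be something like save_table(count_actors("Artikelen"), "overzichtstabel.csv"). Adding another actor then only means adding one more entry to the dictionary.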
This is not too different from a script Jelle uses for his dissertation.
The main difference: He doesn’t code regular expressions, but
calculates document similarity.
slides-jelle.pdf

When to use Python

1st group of tasks

Highly repetitive tasks
Simple tasks (counting things, comparing texts, . . . ) that can be
described in a formalized way. Saves time even with few cases, but
there is virtually no size limit.
Example: Retweets start with RT, optionally followed by a space,
and some letters, so it is very easy to identify them automatically.
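
Such a rule translates almost directly into a regular expression. The sketch below additionally requires an @-mention after "RT", which is an assumption on top of the rule stated above (it avoids flagging tweets that merely start with a word like "RTL"); the sample tweets are made up.

```python
import re

# "RT", an optional space, then an @-mention (the @ is an added assumption)
RETWEET = re.compile(r"^RT ?@\w+")

def is_retweet(tweet):
    """Return True if the tweet text starts like a retweet."""
    return RETWEET.match(tweet) is not None

tweets = [
    "RT @damian0604: Workshop starts at 9.30 #bigdata",
    "Looking forward to the workshop!",
    "RTL news is on",
]
flags = [is_retweet(t) for t in tweets]
```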

2nd group of tasks

Tasks for which specific Python modules exist
There are thousands of modules suitable for text analysis. You
basically only have to write code for data input and output.
Example: Sentiment analysis

3rd group of tasks

APIs, RSS, web scraping ...
You can use Python if you want to collect and store information.
Example: Collecting bios of Twitter users, scraping the web (data
journalism!), downloading Facebook data

When not to use Python

Maybe you do not need to write a Python script ...

... when there are already suitable tools available.
Sometimes, the perfect ready-made tool already exists.
But still, sometimes it is more efficient to write something that does exactly
what you want.
Example: Axel Bruns’ awk scripts for Twitter analysis
(www.mappingonlinepublics.net). If I had to write such a tool, I’d do it in
Python, but hey, he did it already with awk and it works.

And, let’s face it, ...

... we are not programmers.
So maybe, some tasks are too complex for us to program ourselves.
But there is a huge online community that helps you.

Recap
1 Big Data? What are we talking about?

Exploring the field
Some examples
2 The process: collect, store, analyze

A scheme
Our implementation
3 Python

What it is
When to use it
When not to use it
4 Questions?

After the break

Hands-on! Exploring a basic Python script

Questions or comments?

Damian Trilling
d.c.trilling@uva.nl
@damian0604
www.damiantrilling.net
#bigdata

Damian Trilling

More Related Content

What's hot (20)

PPTX
Python for Big Data Analytics
PPTX
Informatics is a natural science
PPTX
Python and BIG Data analytics | Python Fundamentals | Python Architecture
PDF
Kim Hammar Msc Thesis Defense - 2018
PPTX
Frontiers of Computational Journalism week 2 - Text Analysis
Python for Big Data Analytics
Informatics is a natural science
Python and BIG Data analytics | Python Fundamentals | Python Architecture
Kim Hammar Msc Thesis Defense - 2018
Frontiers of Computational Journalism week 2 - Text Analysis
Ad

Similar to Analyzing social media with Python and other tools (1/4) (20)

PDF
Concepts, use cases and principles to build big data systems (1)
PDF
OpenFest 2012 : Leveraging the public internet
PDF
Data science presentation
PPTX
Python PPT
PDF
SuanIct-Bigdata desktop-final
PDF
Google Cloud - Google's vision on AI
PPTX
2014 pycon-talk
PDF
Introduction To Data Science With Python
PDF
GalvanizeU Seattle: Eleven Almost-Truisms About Data
PDF
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
PPT
Searching tech2
PDF
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
PDF
PPTX
Foundations of Big Data: Concepts, Techniques, and Applications
PDF
How You Can Use Open Source Materials to Learn Python & Data Science - EuroPy...
PDF
Deep Learning using Tensorflow and Data Science Experience
PPTX
SKILLWISE-BIGDATA ANALYSIS
PPTX
Algorithm Marketplace and the new "Algorithm Economy"
Concepts, use cases and principles to build big data systems (1)
OpenFest 2012 : Leveraging the public internet
Data science presentation
Python PPT
SuanIct-Bigdata desktop-final
Google Cloud - Google's vision on AI
2014 pycon-talk
Introduction To Data Science With Python
GalvanizeU Seattle: Eleven Almost-Truisms About Data
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Searching tech2
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
Foundations of Big Data: Concepts, Techniques, and Applications
How You Can Use Open Source Materials to Learn Python & Data Science - EuroPy...
Deep Learning using Tensorflow and Data Science Experience
SKILLWISE-BIGDATA ANALYSIS
Algorithm Marketplace and the new "Algorithm Economy"
Ad

More from Department of Communication Science, University of Amsterdam (8)

PDF
Media diets in an age of apps and social media: Dealing with a third layer of...
PDF
Conceptualizing and measuring news exposure as network of users and news items
PDF
Data Science: Case "Political Communication 2/2"
PDF
Data Science: Case "Political Communication 1/2"
Media diets in an age of apps and social media: Dealing with a third layer of...
Conceptualizing and measuring news exposure as network of users and news items
Data Science: Case "Political Communication 2/2"
Data Science: Case "Political Communication 1/2"

Recently uploaded (20)

PPTX
Microbial diseases, their pathogenesis and prophylaxis
PDF
2.FourierTransform-ShortQuestionswithAnswers.pdf
PPTX
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
PDF
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
PDF
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
PDF
TR - Agricultural Crops Production NC III.pdf
PPTX
GDM (1) (1).pptx small presentation for students
PDF
Module 4: Burden of Disease Tutorial Slides S2 2025
PPTX
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
PDF
Pre independence Education in Inndia.pdf
PDF
Supply Chain Operations Speaking Notes -ICLT Program
PPTX
1st Inaugural Professorial Lecture held on 19th February 2020 (Governance and...
PDF
Sports Quiz easy sports quiz sports quiz
PPTX
Institutional Correction lecture only . . .
PPTX
Lesson notes of climatology university.
PDF
FourierSeries-QuestionsWithAnswers(Part-A).pdf
PDF
O7-L3 Supply Chain Operations - ICLT Program
PDF
Basic Mud Logging Guide for educational purpose
PPTX
Introduction_to_Human_Anatomy_and_Physiology_for_B.Pharm.pptx
PPTX
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
Microbial diseases, their pathogenesis and prophylaxis
2.FourierTransform-ShortQuestionswithAnswers.pdf
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
TR - Agricultural Crops Production NC III.pdf
GDM (1) (1).pptx small presentation for students
Module 4: Burden of Disease Tutorial Slides S2 2025
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
Pre independence Education in Inndia.pdf
Supply Chain Operations Speaking Notes -ICLT Program
1st Inaugural Professorial Lecture held on 19th February 2020 (Governance and...
Sports Quiz easy sports quiz sports quiz
Institutional Correction lecture only . . .
Lesson notes of climatology university.
FourierSeries-QuestionsWithAnswers(Part-A).pdf
O7-L3 Supply Chain Operations - ICLT Program
Basic Mud Logging Guide for educational purpose
Introduction_to_Human_Anatomy_and_Physiology_for_B.Pharm.pptx
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx

Analyzing social media with Python and other tools (1/4)

  • 1. Good morning! Enjoy your coffee and install Putty and NotepadPlus via "Software Maintance/Application Catalgue". And the Pattern-package (see my e-mail). Thanks.
  • 2. Big Data? What are we talking about? The process: collect, store, analyze Python Questions? Hands-on-Workshop Big (Twitter) Data Damian Trilling d.c.trilling@uva.nl @damian0604 www.damiantrilling.net Afdeling Communicatiewetenschap Universiteit van Amsterdam 30 January 2014 9.30 #bigdata Damian Trilling
  • 4. The next one and a half days. You’ll hear about: • Collecting social media data via APIs, RSS and scraping (and the tools for it) • Technical infrastructure (via surfsara) • Python • Sentiment analysis • Automated coding • Frequencies and other statistics • Social network analysis with Gephi • ...
  • 5. In this session (1/4): 1 Big Data? What are we talking about? (Exploring the field; Some examples) 2 The process: collect, store, analyze (A scheme; Our implementation) 3 Python (What it is; When to use it; When not to use it) 4 Questions?
  • 6. Exploring the field: What’s big data? What are we talking about?
  • 7. What are we talking about? Today, it’s a hands-on workshop, so let’s keep this important (!) discussion for later.
  • 8. What are we talking about? So, no definition, but some brief thoughts: • Existing data (= experiments or surveys) • Too big to code manually • Too big to handle with normal tools • New research questions • Call to revisit the relationship between theory and empirical research
  • 10. What are we talking about? Today, . . . • we are not going to talk about REALLY BIG data, • but we will have some exercises on datasets a normal computer can handle. Tomorrow, . . . • we will also learn about scaling up these techniques • SurfSARA provides infrastructure for this
  • 11. What are we talking about? Some sources: • Social Network Sites • RSS-feeds • Databases • Scraping text from the web • ...
  • 12. It’s out there! You only have to collect it.
  • 14. But why should we care? We can answer new questions: • Find needles in haystacks • Identify networks, co-word analysis, linguistic analysis, . . . • Verify our theories in larger datasets. It makes sense: • There are things that computers are simply better at than humans, e.g. counting things • Having human coders look for words in texts is like calculating a regression analysis by hand
  • 17. Some examples
  • 20. A recent master thesis: The needle in the haystack. Imagine you want to analyze some very rare content. Normal sampling won’t work, that’s for sure.
  • 24. So you’d better collect everything first: Getting all news coverage from Dutch news sites. 1 Collect all articles from nine news sites during a period of two months, resulting in a database with 74,000 articles. 2 Filter articles containing specific keywords. 3 Those 292 articles were then manually coded. Pöll, B. (2013). Social media: new sources, new profession? A content analysis of the use of social media as a source for journalists in online news articles. Master Thesis, Universiteit van Amsterdam.
  • 26. It’s just one line of code! urls.txt:
      http://www.gmx.at/themen/wissen/mensch/108g5xi-baeuerlich-schiefe-zaehne
      http://www.gmx.at/themen/unterhaltung/klatsch-tratsch/408g740-fuermannbittet-um-verzeihung
      http://www.gmx.at/themen/nachrichten/aufruhr-arabien/268g70u-regierungwill-zuruecktreten
      http://www.gmx.at/themen/nachrichten/panorama/828g54y-neues-zur-klagegegen-republik
      http://www.gmx.at/themen/nachrichten/panorama/968g72s-millionstrafewegen-oelpest
      http://www.gmx.at/themen/unterhaltung/klatsch-tratsch/368g6yc-keinbabybauch-nur-fast-food
      ...
    wget command:
      wget -i urls.txt
  • 29. A recent bachelor thesis: Tone in tweets. Imagine you want to know something about someone’s behavior on Twitter. Or how a specific topic is discussed on Twitter. Do you really want to go through thousands of tweets by hand?
  • 33. So you’d better think about automating your coding: Finding out how negative or positive politicians are towards their opponents. The student took lists of positive and negative words and made additional lists with the names of each politician’s opponents. She used a Python script to check which type of words was used to refer to opponents. For further analysis, the results were imported into SPSS. Schut, L. (2013). Verenigde Staten vs. Verenigd Koningrijk: Een automatische inhoudsanalyse naar verklarende factoren voor het gebruik van positive campaigning en negative campaigning door vooraanstaande politici en politieke partijen op Twitter. Bachelor Thesis, Universiteit van Amsterdam.
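The word-list approach described for this thesis could be sketched roughly as follows. This is not the student's script: the word lists, the opponent names and the example tweets are all made up for illustration, and a real study would use validated lexicons.

```python
import re

# Hypothetical word lists (assumptions, not taken from the thesis)
POSITIVE = {"good", "great", "strong"}
NEGATIVE = {"bad", "weak", "failed"}
OPPONENTS = {"smith", "jones"}  # made-up opponent names


def code_tweet(text):
    """Return (mentions_opponent, n_positive, n_negative) for one tweet."""
    words = re.findall(r"[a-z']+", text.lower())
    mentions_opponent = any(w in OPPONENTS for w in words)
    n_pos = sum(w in POSITIVE for w in words)
    n_neg = sum(w in NEGATIVE for w in words)
    return mentions_opponent, n_pos, n_neg


# Made-up example tweets
tweets = [
    "Smith failed on jobs and has a weak plan",
    "Great rally tonight, strong turnout!",
]
rows = [code_tweet(t) for t in tweets]
for tweet, row in zip(tweets, rows):
    print(row, tweet)
```

Each row can then be written to a CSV file and opened in SPSS or another statistics package.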
  • 36. Frame adoption on Twitter: Which phrases used by Merkel and Steinbrück on TV make it to the #tvduell discussion on Twitter? Identify frequently used words in the transcript of the debate and in tweets. Find co-occurrences.
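A minimal sketch of the first step, counting words in both sources and listing the ones that appear in each, might look like this. The texts and the stop word list are invented placeholders, and "shared words" is a simplification of the co-occurrence analysis the slide mentions.

```python
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption; real lists are much longer)
STOPWORDS = {"the", "a", "and", "of", "to", "we", "is", "in"}


def word_counts(texts):
    """Count all words in a list of texts, ignoring stop words."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"\w+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts


# Made-up stand-ins for the debate transcript and the collected tweets
transcript = ["We need a fair pension reform", "The pension reform is fair and needed"]
tweets = ["pension reform again #tvduell", "fair? that pension plan is not fair"]

debate_counts = word_counts(transcript)
tweet_counts = word_counts(tweets)

# Words used in both sources, with (debate frequency, tweet frequency)
shared = {w: (debate_counts[w], tweet_counts[w]) for w in debate_counts if w in tweet_counts}
print(shared)
```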
  • 38. The process: collect, store, analyze. A scheme
  • 48. Our implementation: datacollection.followthenews-uva.cloudlet.sara.nl. yourTwapperkeeper: Continuously calls the Twitter API and saves all tweets containing specific hashtags to a MySQL database. rsshond: Calls the RSS feeds of news sites once per hour, saves title, time, header, and teaser of all new articles into a CSV table, follows the links to the full texts and downloads them. snapshot: Visits some URLs four times a day and downloads them.
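How a collector in the style of rsshond might turn a feed into CSV rows can be sketched with the standard library alone. The actual rsshond code is not shown in these slides, so everything below is an assumption: the feed is a made-up RSS 2.0 snippet, the column order is arbitrary, and the hourly scheduling and full-text download are left out.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 feed standing in for a news site's feed
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example news</title>
<item><title>First story</title><link>http://example.com/1</link>
<pubDate>Thu, 30 Jan 2014 09:30:00 +0100</pubDate><description>Teaser one</description></item>
<item><title>Second story</title><link>http://example.com/2</link>
<pubDate>Thu, 30 Jan 2014 10:30:00 +0100</pubDate><description>Teaser two</description></item>
</channel></rss>"""


def rss_rows(xml_text, seen_links):
    """Return one row per item whose link has not been seen before."""
    rows = []
    for item in ET.fromstring(xml_text).iter("item"):
        link = item.findtext("link", default="")
        if link in seen_links:
            continue  # article was already saved in an earlier run
        seen_links.add(link)
        rows.append([
            item.findtext("title", default=""),
            item.findtext("pubDate", default=""),
            item.findtext("description", default=""),
            link,
        ])
    return rows


seen = set()
first_run = rss_rows(SAMPLE_RSS, seen)   # both articles are new
second_run = rss_rows(SAMPLE_RSS, seen)  # nothing new an hour later

# Write to an in-memory buffer here; a real collector would append to a file
buffer = io.StringIO()
csv.writer(buffer).writerows(first_run)
print(buffer.getvalue())
```

Keeping a set of links that were already saved is one simple way to store only new articles on each run.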
  • 52. Our implementation: How to access the collected data? Apache webserver: Download the data from http://datacollection.followthenews-uva.cloudlet.sara.nl. SSH (scp): Transfer data directly to your computer or another server (like speeltuin.followthenews-uva.cloudlet.sara.nl). Beehub: Connect the server to Beehub, which can be mounted like the P: drive ("p-schijf") or accessed online.
  • 53. Python
  • 56. One tool to rule them all? Of course there are ready-made tools for some of the questions we want to answer. But for many, there aren’t. Python offers us the possibility to build exactly the tool we need. And it’s fun!
  • 59. What is Python? It is a programming language: • It is flexible. You can use it for (in principle) any kind of data • There are virtually no limits regarding the amount of data to process • You can run it on every platform • And yet it is easy to learn! It is widely used for content analysis: • Many online resources and toolkits • Books about NLP and Web Scraping with Python
  • 63. You do not have to become a programmer. If you know how to write SPSS or STATA syntax, you will understand Python. (But if you have ever had contact with any programming language, it helps.) It’s enough if you can read and modify the code.
  • 67. Think of the following task. RQ: What are the differences in terms of actors mentioned between Israeli and Palestinian news coverage? 1 The data structure: You have a folder with articles. 2 The desired output: You want a table with the file names and a column per actor, counting how often they are mentioned. 3 A typical task for a short Python script!
  • 68. You need something like this:
      for every file in folder:
          read the file
          count actors
          add new row to table with filename and actor counts
      save table
    (such a notation is called pseudo-code)
  • 69. What it is
      import csv
      import re
      from os import listdir
      from os.path import isfile, join

      # Folder with one article per file
      mypath = r"C:\Users\Ricarda\Documents\Artikelen"
      # Actor 54: "Israel" followed later on the same line by an official's title
      regex54 = re.compile(r"Israel.*(?:minister|politician.*|[Aa]uthorit)")

      filename_list = []
      matchcount54_list = []
      onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
      for f in onlyfiles:
          matchcount54 = 0
          with open(join(mypath, f), "r") as artikel:
              for line in artikel:
                  # Add the number of matches found on this line
                  matchcount54 += len(regex54.findall(line))
          filename_list.append(f)
          matchcount54_list.append(matchcount54)

      # One row per file: file name and number of actor mentions
      with open("overzichtstabel.csv", "w", newline="") as outfile:
          csv.writer(outfile).writerows(zip(filename_list, matchcount54_list))
  • 70. This is not too different from a script Jelle uses for his dissertation. The main difference: He doesn’t code regular expressions, but calculates document similarity. slides-jelle.pdf
  • 71. When to use Python
  • 72. 1st group of tasks: Highly repetitive tasks. Simple tasks (counting things, comparing texts, . . . ) that can be described in a formalized way. Saves time even with few cases, but there is virtually no size limit. Example: Retweets start with RT, optionally followed by a space, and some letters. So it is very easy to identify them automatically.
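The retweet rule from this slide translates almost directly into a regular expression. In the sketch below, the "some letters" are assumed to be a user name introduced by @; that detail is an assumption added here so that words such as "RTL" are not mistaken for retweets. The example tweets are made up.

```python
import re

# "RT", an optional space, then @ and a user name (the @ is an assumption)
RETWEET = re.compile(r"^RT ?@\w+")


def is_retweet(tweet):
    """Return True if the tweet text starts like a retweet."""
    return RETWEET.match(tweet) is not None


# Made-up example tweets
examples = [
    "RT @damian0604: Big data workshop today",
    "RT@someuser nice slides",
    "RTL Nieuws is live",
    "Great talk, RT if you agree",
]
for text in examples:
    print(is_retweet(text), text)
```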
  • 73. 2nd group of tasks: Tasks for which specific Python modules exist. There are thousands of modules suitable for text analysis. You basically only have to write code for data input and output. Example: Sentiment analysis
  • 74. 3rd group of tasks: APIs, RSS, web scraping . . . You can use Python if you want to collect and store information. Example: Collecting bios of Twitter users, scraping the web (data journalism!), downloading Facebook data
  • 75. When not to use Python
  • 77. Maybe you do not need to write a Python script . . . . . . when there are already suitable tools available. Sometimes, the perfect ready-made tool already exists. Example: Axel Bruns’ awk-scripts for Twitter analysis (www.mappingonlinepublics.net). If I had to write such a tool, I’d do it in Python, but hey, he did it already with awk and it works. But still, sometimes it is more efficient to write something that does exactly what you want.
  • 79. And, let’s face it, . . . . . . we are not programmers. So maybe, some tasks are too complex for us to program ourselves. But there is a huge online community that helps you.
  • 80. Recap: 1 Big Data? What are we talking about? (Exploring the field; Some examples) 2 The process: collect, store, analyze (A scheme; Our implementation) 3 Python (What it is; When to use it; When not to use it) 4 Questions?
  • 81. After the break: Hands-on! Exploring a basic Python script
  • 82. Questions or comments? Damian Trilling d.c.trilling@uva.nl @damian0604 www.damiantrilling.net