@supercoco9 #devoxxBigQuery
Big Data with Google BigQuery
Javier Ramirez
@supercoco9
https://teowaki.com
Managing Big Data with BigQuery
Javier Ramirez
•Writing software since 1996
•Web dev. since 1999 (C++, Java, PHP, Ruby, JS...)
•Founder of https://teowaki.com
•Google Developer Expert on the Cloud Platform
BIG DATA
BIG SERVERS
BIG DEVOPS
BIG MONEY
bigdata is cool but...
hard to set up and monitor
expensive cluster
not interactive enough
bigdata is doing a full scan over 330MM rows, matching them against a regexp, and getting the result (223MM rows) in just 5 seconds
Google BigQuery
Data analysis as a service
http://developers.google.com/bigquery
Based on “Dremel”
Specifically designed for interactive queries over petabytes of real-time data
Your only worries
•Load data
•Query the dataset
loading data
You just send the data in text (or JSON) format
up to 100K inserts per second in stream mode
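As a rough illustration of the JSON option, the sketch below serializes rows as newline-delimited JSON (one object per line), the JSON layout BigQuery accepts for loading. The `to_ndjson` helper and the field names are made up for this example.

```python
import json


def to_ndjson(rows):
    """Serialize an iterable of dicts as newline-delimited JSON (one object per line)."""
    return "".join(json.dumps(row, sort_keys=True) + "\n" for row in rows)


# Hypothetical rows; the field names are illustrative only.
rows = [
    {"user": "alice", "total": 12.5},
    {"user": "bob", "total": 3.0},
]
payload = to_ndjson(rows)
print(payload, end="")
```

Each line is a complete JSON object, so the payload can be split, streamed, or appended to without re-parsing the whole file.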
It's just SQL
select name from USERS order by date;
select count(*) from users;
select max(date) from USERS;
select sum(total) from ORDERS group by user;
Subselect and joins out of the box
SELECT Year, Actor1Name, Actor2Name, Count FROM (
  SELECT Actor1Name, Actor2Name, Year, COUNT(*) Count,
    RANK() OVER(PARTITION BY YEAR ORDER BY Count DESC) rank
  FROM
    (SELECT Actor1Name, Actor2Name, Year FROM [gdelt-bq:full.events]
     WHERE Actor1Name < Actor2Name
       AND Actor1CountryCode != '' AND Actor2CountryCode != ''
       AND Actor1CountryCode != Actor2CountryCode),
    (SELECT Actor2Name Actor1Name, Actor1Name Actor2Name, Year FROM [gdelt-bq:full.events]
     WHERE Actor1Name > Actor2Name
       AND Actor1CountryCode != '' AND Actor2CountryCode != ''
       AND Actor1CountryCode != Actor2CountryCode),
  WHERE Actor1Name IS NOT null
    AND Actor2Name IS NOT null
  GROUP EACH BY 1, 2, 3
  HAVING Count > 100
)
WHERE rank=1
ORDER BY Year
http://gdeltproject.org/data.html#googlebigquery
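For readers less familiar with window functions, the query above can be read as: put each actor pair in a canonical order, count events per pair and year, then keep the top-ranked pair for each year. Below is a minimal Python sketch of that logic over a few made-up rows. The `top_pair_per_year` helper and the sample data are hypothetical, and the HAVING threshold is exposed as a parameter.

```python
from collections import Counter


def top_pair_per_year(events, min_count=100):
    """events: iterable of (year, actor1, actor2, country1, country2) tuples."""
    counts = Counter()
    for year, a1, a2, c1, c2 in events:
        # Mirror the WHERE clauses: names present and distinct,
        # both country codes non-empty and different from each other.
        if a1 is None or a2 is None or a1 == a2:
            continue
        if not c1 or not c2 or c1 == c2:
            continue
        # Canonical order so (A, B) and (B, A) count as the same pair.
        pair = (a1, a2) if a1 < a2 else (a2, a1)
        counts[(year, pair)] += 1

    best = {}
    for (year, pair), n in counts.items():
        if n <= min_count:  # HAVING Count > min_count
            continue
        # Simplification: on ties the first pair seen wins,
        # whereas RANK() = 1 would return every tied pair.
        if year not in best or n > best[year][1]:
            best[year] = (pair, n)
    return [(year, pair[0], pair[1], n) for year, (pair, n) in sorted(best.items())]


# Made-up sample rows, not GDELT data.
events = [
    (1979, "USA", "RUSSIA", "USA", "RUS"),
    (1979, "RUSSIA", "USA", "RUS", "USA"),
    (1979, "USA", "CHINA", "USA", "CHN"),
    (1980, "IRAN", "IRAQ", "IRN", "IRQ"),
]
result = top_pair_per_year(events, min_count=0)
print(result)
```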
specific extensions for analytics
within
flatten
nest
stddev
top
first
last
nth
variance
var_pop
var_samp
covar_pop
covar_samp
quantiles
correlations
Things you always wanted to try but were too scared to
select count(*) from
[publicdata:samples.wikipedia]
where REGEXP_MATCH(title, "[0-9]*")
AND wp_namespace = 0;
223,163,387 Query complete (5.6s elapsed, 9.13 GB processed, Cost: 32¢)
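One detail worth noting about the pattern: `[0-9]*` means "zero or more digits", so under unanchored search semantics it also matches titles that contain no digits at all. The Python sketch below illustrates the difference from `[0-9]+` using `re.search`. It assumes REGEXP_MATCH performs an unanchored search, and the sample titles are made up.

```python
import re

titles = ["1984", "Madrid", "Apollo 11"]

# "[0-9]*" can match the empty string, so an unanchored search succeeds for every title.
star = [t for t in titles if re.search(r"[0-9]*", t)]

# "[0-9]+" requires at least one digit.
plus = [t for t in titles if re.search(r"[0-9]+", t)]

print(star)
print(plus)
```

Either way the engine still has to read the whole `title` column, which is what makes the query a useful full-scan benchmark.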
columnar storage
https://cookbook.experiencesaphana.com/crm/what-is-crm-on-hana/technology-innovation/row-vs-column-based/
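A toy illustration of why a columnar layout helps analytics queries: when each column is stored separately, a query that touches one column only has to read that column. The sketch below uses plain Python lists and made-up log rows. It is not a description of how BigQuery stores data internally.

```python
# Row-oriented: each record keeps all of its fields together.
rows = [
    {"path": "/home", "status": 200, "bytes": 512},
    {"path": "/login", "status": 302, "bytes": 128},
    {"path": "/home", "status": 200, "bytes": 640},
]

# Column-oriented: one list per field, aligned by position.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Summing "bytes" over rows visits every record, with all fields stored alongside it.
total_from_rows = sum(row["bytes"] for row in rows)
# The columnar layout reads only the "bytes" column.
total_from_columns = sum(columns["bytes"])

# Count how many stored values each layout has to touch for this query.
values_touched_rows = sum(len(row) for row in rows)
values_touched_columns = len(columns["bytes"])

print(total_from_rows, total_from_columns)
print(values_touched_rows, values_touched_columns)
```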
highly distributed execution using a tree
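The idea of tree-shaped execution can be sketched as follows: leaf workers each scan one shard and return a partial result, and parent nodes combine the partials on the way up. This is a simplified, single-process illustration with hypothetical names (`leaf_count`, `combine`), not Dremel's actual implementation.

```python
def leaf_count(shard, predicate):
    """Leaf node: scan one shard and return a partial count."""
    return sum(1 for value in shard if predicate(value))


def combine(partials, fan_in=2):
    """Intermediate nodes: sum groups of partial counts until one value is left."""
    level = list(partials)
    while len(level) > 1:
        level = [sum(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0] if level else 0


# Four made-up shards; in a real system each would live on a different machine.
shards = [[1, 5, 8], [2, 9], [7, 7, 3], [10]]
partials = [leaf_count(shard, lambda v: v > 4) for shard in shards]
total = combine(partials)

print(partials)
print(total)
```

Because counts are associative, the shape of the tree does not change the answer, only how the work is spread out.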
web console screenshot
country segmented traffic
window functions
our most active user
Worldwide events in the last 36 years
SELECT Year, Actor1Name, Actor2Name, Count FROM (
  SELECT Actor1Name, Actor2Name, Year, COUNT(*) Count,
    RANK() OVER(PARTITION BY YEAR ORDER BY Count DESC) rank
  FROM
    (SELECT Actor1Name, Actor2Name, Year FROM [gdelt-bq:full.events]
     WHERE Actor1Name < Actor2Name
       AND Actor1CountryCode != '' AND Actor2CountryCode != ''
       AND Actor1CountryCode != Actor2CountryCode),
    (SELECT Actor2Name Actor1Name, Actor1Name Actor2Name, Year FROM [gdelt-bq:full.events]
     WHERE Actor1Name > Actor2Name
       AND Actor1CountryCode != '' AND Actor2CountryCode != ''
       AND Actor1CountryCode != Actor2CountryCode),
  WHERE Actor1Name IS NOT null
    AND Actor2Name IS NOT null
  GROUP EACH BY 1, 2, 3
  HAVING Count > 100
)
WHERE rank=1
ORDER BY Year
http://gdeltproject.org/data.html#googlebigquery
Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki
SELECT repository_name, repository_language, repository_description,
  COUNT(repository_name) as cnt,
  repository_url
FROM github.timeline
WHERE type="WatchEvent"
  AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC("#{yesterday} 20:00:00")
  AND repository_url IN (
    SELECT repository_url
    FROM github.timeline
    WHERE type="CreateEvent"
      AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('#{yesterday} 20:00:00')
      AND repository_fork = "false"
      AND payload_ref_type = "repository"
    GROUP BY repository_url
  )
GROUP BY repository_name, repository_language, repository_description, repository_url
HAVING cnt >= 5
ORDER BY cnt DESC
LIMIT 25
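The `#{yesterday}` placeholders suggest the query text is built by string interpolation before it is sent. Below is a minimal Python sketch of computing that value. The `YYYY-MM-DD` date format and the `since_timestamp` helper name are assumptions, not taken from the slides.

```python
from datetime import date, timedelta


def since_timestamp(today):
    """Return 'YYYY-MM-DD 20:00:00' for the day before `today` (assumed format)."""
    yesterday = today - timedelta(days=1)
    return f"{yesterday.isoformat()} 20:00:00"


# Fixed date so the example is reproducible; a scheduled job would pass date.today().
since = since_timestamp(date(2014, 6, 13))
condition = f'PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC("{since}")'
print(condition)
```

Interpolation is acceptable here only because the value is generated by the program itself; user-supplied input should not be spliced into query text this way.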
Automation with Apps Script
•Read from BigQuery
•Create a spreadsheet on Drive
•E-mail it every day as a PDF
https://developers.google.com/apps-script/
bigquery pricing
$26 per stored TB
1000000 rows => $0.00416 / month
£0.00243 / month
$5 per processed TB
1 full scan = 160 MB
1 count = 0 MB
1 full scan over 1 column = 5.4 MB
100 GB => $0.05 / month £0.03
Apps Script is free
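The storage figure above ($0.00416 / month for 1,000,000 rows occupying 160 MB) can be reproduced with simple arithmetic. The sketch below assumes decimal units (1 TB = 1,000,000 MB) and uses the prices and table sizes from the slide. The helper names are made up for illustration.

```python
MB_PER_TB = 1_000_000  # assumption: decimal units

STORAGE_USD_PER_TB_MONTH = 26.0  # "$26 per stored TB"
QUERY_USD_PER_TB = 5.0           # "$5 per processed TB"


def storage_cost_usd(size_mb):
    """Monthly storage cost in USD for a table of `size_mb` megabytes."""
    return size_mb / MB_PER_TB * STORAGE_USD_PER_TB_MONTH


def query_cost_usd(processed_mb):
    """Cost in USD of a query that processes `processed_mb` megabytes."""
    return processed_mb / MB_PER_TB * QUERY_USD_PER_TB


print(f"storage, 160 MB table: ${storage_cost_usd(160):.5f} / month")
print(f"one full scan (160 MB): ${query_cost_usd(160):.6f}")
print(f"one single-column scan (5.4 MB): ${query_cost_usd(5.4):.6f}")
```

The gap between the last two lines is the practical payoff of columnar storage: selecting only the columns you need directly reduces the bytes billed.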
price per month
£0.054307 / month* per 1MM rows
*the 1st 1TB every month is free of charge
**assuming your rows have web server logs-like info
THANKS!
Javier Ramirez
@supercoco9
https://teowaki.com
Related links at:
https://teowaki.com/teams/javier-community/link-categories/bigquery-talk
Thanks / Creative Commons
•Presentation Template — Guillaume LaForge
•The Queen — A prestigious heritage with some
inspiration from The Sex Pistols and funny Devoxxians
•Girl with a Balloon — Banksy
•Tube — Michael Keen
