Exploring Open Data with BigQuery
Jenny Tong
Developer Advocate
Google Cloud Platform
@MimmingCodes
Agenda
● Origin story
● Count stuff
● How it works
● Some cool open data
● Do something useful
Google Research Publications → Managed Cloud Versions
Bigtable → Bigtable
Flume → Dataflow
Dremel → BigQuery
Google BigQuery
Let's count some stuff
SELECT count(word)
FROM [publicdata:samples.shakespeare]
Words in Shakespeare
SELECT sum(requests) as total
FROM [fh-bigquery:wikipedia.pagecounts_20150511_05]
Wikipedia hits over 1 hour
SELECT sum(requests) as total
FROM [fh-bigquery:wikipedia.pagecounts_201505]
Wikipedia hits over 1 month
Several years of Wikipedia data
SELECT sum(requests) as total
FROM
[fh-bigquery:wikipedia.pagecounts_201105],
[fh-bigquery:wikipedia.pagecounts_201106],
[fh-bigquery:wikipedia.pagecounts_201107],
...
SELECT
SUM(requests) AS total
FROM
TABLE_QUERY(
[fh-bigquery:wikipedia],
'REGEXP_MATCH(
table_id,
r"pagecounts_2015[0-9]{2}$")')
Several years of Wikipedia data
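To see which tables a TABLE_QUERY filter like this one would pick up, the same regular expression can be tried locally. A minimal Python sketch, assuming a hypothetical list of table ids (the dataset's real contents are not shown in the slides) and using Python's re.search as a stand-in for REGEXP_MATCH:

```python
import re

# Hypothetical table ids, for illustration only.
TABLE_IDS = [
    "pagecounts_201105",
    "pagecounts_201412",
    "pagecounts_201501",
    "pagecounts_201505",
    "pagecounts_20150511_05",
]

# Same pattern as in the TABLE_QUERY filter.
PATTERN = re.compile(r"pagecounts_2015[0-9]{2}$")


def matching_tables(table_ids):
    """Return the ids the pattern accepts."""
    return [t for t in table_ids if PATTERN.search(t)]


if __name__ == "__main__":
    print(matching_tables(TABLE_IDS))
```

As written, the pattern only accepts the 2015 monthly tables. The trailing $ is what keeps hourly tables such as pagecounts_20150511_05 out; loosening the year part of the pattern would widen the range.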
How about a RegExp
SELECT
SUM(requests) AS total
FROM
TABLE_QUERY(
[fh-bigquery:wikipedia],
'REGEXP_MATCH(
table_id,
r"pagecounts_2015[0-9]{2}$")')
WHERE
(REGEXP_MATCH(title, '.*[dD]inosaur.*'))
How did it do that?
o_O
Qualities of a good RDBMS
● Inserts & locking
● Indexing
● Cache
● Query planning
Storing data
[Diagram labels: Table, Columns, Disks]
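The slide's labels (Table, Columns, Disks) point at a column-oriented layout: each column is stored separately, so a query only has to read the columns it names. A toy Python sketch of the idea, with made-up rows and a made-up "cells read" counter; it illustrates the layout only and makes no claim about BigQuery's actual storage format:

```python
# Made-up rows, for illustration only.
ROWS = [
    {"title": "Dinosaur", "language": "en", "requests": 120},
    {"title": "Jenny", "language": "en", "requests": 7},
    {"title": "Dinosaurier", "language": "de", "requests": 30},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def sum_requests_row_store(rows):
    """Row layout: every cell of every row is read to reach one field."""
    cells_read = sum(len(row) for row in rows)
    return sum(row["requests"] for row in rows), cells_read


def sum_requests_column_store(columns):
    """Column layout: only the 'requests' column is read."""
    column = columns["requests"]
    return sum(column), len(column)


if __name__ == "__main__":
    print(sum_requests_row_store(ROWS))
    print(sum_requests_column_store(to_columns(ROWS)))
```

Both functions return the same total, but the column version touches a third of the cells in this three-column example. The gap grows with the number of columns the query does not mention.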
Reading data: Life of a BigQuery
SELECT sum(requests) as sum
FROM (
SELECT requests, title
FROM [fh-bigquery:wikipedia.pagecounts_201501]
WHERE
(REGEXP_MATCH(title, '[Jj]en.+'))
)
Life of a BigQuery
[Diagram, built up over several slides: a serving tree with a Root Mixer at the top, Mixers (M) below it, Leaf nodes (L) below those, and Storage at the bottom. The query enters at the top. Labels added as the build progresses: "SELECT requests, title"; "5.4 Bil"; "WHERE (REGEXP_MATCH(title, '[Jj]en.+'))"; "5.8 Mil"; "SELECT sum(requests)" (twice).]
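One plausible reading of the Life of a BigQuery slides: leaves scan their share of storage and apply the WHERE filter, then mixers combine partial sums on the way up to the root. A small Python sketch of that shape, assuming made-up shards and a fan-in of two (both hypothetical, as are the function names), with re.search standing in for REGEXP_MATCH:

```python
import re

# Made-up (title, requests) rows split across four leaf shards.
SHARDS = [
    [("Jenny", 5), ("Dinosaur", 100)],
    [("jenga", 3), ("Jen", 9)],
    [("Jennifer_Lopez", 40)],
    [("Genomics", 8), ("Jenkins", 2)],
]

TITLE_FILTER = re.compile(r"[Jj]en.+")


def leaf(shard):
    """Scan one shard, apply the filter, return a partial sum."""
    return sum(req for title, req in shard if TITLE_FILTER.search(title))


def mixer(partials):
    """Combine the partial sums handed up by child nodes."""
    return sum(partials)


def run_query(shards, fan_in=2):
    """Leaves -> mixers (fan_in children each) -> root mixer."""
    partials = [leaf(s) for s in shards]
    mixed = [mixer(partials[i:i + fan_in])
             for i in range(0, len(partials), fan_in)]
    return mixer(mixed)


if __name__ == "__main__":
    print(run_query(SHARDS))
```

The tree works for SUM because addition can be regrouped freely: summing the per-leaf results gives the same answer as summing every matching row in one place.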
Open Data
Finding Open Data
opendata.stackexchange.com
Finding Open Data
reddit.com/r/dataisbeautiful
Time to explore
GSOD
Weather in Half Moon Bay
SELECT DATE(year+mo+da) day, min, max
FROM [fh-bigquery:weather_gsod.gsod2013]
WHERE stn IN (
SELECT usaf FROM [fh-bigquery:weather_gsod.stations]
WHERE name = 'HALF MOON BAY AIRPOR')
AND max < 200
ORDER BY day;
Global high temperatures
SELECT year, max(max) as max
FROM
TABLE_QUERY(
[fh-bigquery:weather_gsod],
'table_id CONTAINS "gsod"')
where max < 200
group by year order by year asc
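The max < 200 condition in the weather queries presumably drops placeholder readings: GSOD records a missing temperature as 9999.9, which would otherwise win every MAX. A short Python sketch with made-up readings (the function name and cutoff parameter are hypothetical) showing the effect on a yearly maximum:

```python
# GSOD's placeholder for a missing temperature reading.
MISSING = 9999.9

# Made-up (year, daily max) readings, for illustration only.
READINGS = [
    (2012, 71.1),
    (2012, MISSING),
    (2013, 68.0),
    (2013, 102.9),
    (2013, MISSING),
]


def yearly_max(readings, cutoff=200.0):
    """Highest reading per year, ignoring values at or above the cutoff."""
    result = {}
    for year, value in readings:
        if value < cutoff:
            result[year] = max(value, result.get(year, value))
    return dict(sorted(result.items()))


if __name__ == "__main__":
    print(yearly_max(READINGS))
    print(yearly_max(READINGS, cutoff=float("inf")))
```

With the cutoff disabled, every year that has even one missing reading reports 9999.9 as its high.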
GDELT
Stories per month - Massachusetts
SELECT DATE(STRING(MonthYear) + '01') month,
SUM(ActionGeo_ADM1Code='USMA') US
FROM [gdelt-bq:full.events]
WHERE MonthYear > 0
GROUP BY 1 ORDER BY 1
SELECT DATE(STRING(MonthYear) + '01') month,
SUM(ActionGeo_ADM1Code='USMA') / COUNT(*) newsyness
FROM [gdelt-bq:full.events]
WHERE MonthYear > 0
GROUP BY 1 ORDER BY 1
Stories per month, normalized
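The difference between the two GDELT queries is the division by COUNT(*): the first counts Massachusetts events per month, the second reports them as a share of all events that month. A Python sketch of the same arithmetic on made-up events (the function name is hypothetical):

```python
from collections import defaultdict

# Made-up (MonthYear, ActionGeo_ADM1Code) events, for illustration only.
EVENTS = [
    (201301, "USMA"), (201301, "USCA"), (201301, "USNY"), (201301, "USMA"),
    (201302, "USMA"), (201302, "USCA"), (201302, "USCA"), (201302, "USTX"),
    (201302, "USNY"), (201302, "USMA"), (201302, "USMA"), (201302, "USWA"),
]


def monthly_share(events, code="USMA"):
    """Per month: (matching events, all events, matching / all)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for month, adm1 in events:
        totals[month] += 1
        hits[month] += adm1 == code  # a boolean adds 0 or 1, like SUM(condition)
    return {m: (hits[m], totals[m], hits[m] / totals[m]) for m in sorted(totals)}


if __name__ == "__main__":
    for month, row in monthly_share(EVENTS).items():
        print(month, row)
```

In this toy data the raw count rises from 2 to 3 while the share falls from 0.5 to 0.375, which is the kind of distortion normalizing is meant to remove when the total volume of recorded events changes over time.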
Genomics
https://developers.google.com/genomics/
SELECT Sample, SUM(single), SUM(double),
FROM (
SELECT call.call_set_name AS Sample,
SOME(call.genotype > 0) AND NOT EVERY(call.genotype > 0) WITHIN call AS single,
EVERY(call.genotype > 0) WITHIN call AS double,
FROM [genomics-public-data:1000_genomes.variants]
OMIT RECORD IF reference_name IN ("X","Y","MT"))
GROUP BY Sample ORDER BY Sample
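The SOME / EVERY ... WITHIN call expressions classify each call by its genotype list: "single" when some but not all entries are greater than 0, "double" when all of them are. A Python sketch of the same logic using any() and all() on made-up records; the record layout here is a simplification chosen for the example, not the real table schema:

```python
from collections import Counter

# Made-up variant records: each maps a sample name to its genotype list
# (0 = reference allele, > 0 = an alternate allele). Illustration only.
VARIANTS = [
    {"reference_name": "1", "calls": {"S1": [0, 1], "S2": [1, 1]}},
    {"reference_name": "2", "calls": {"S1": [1, 1], "S2": [0, 0]}},
    {"reference_name": "X", "calls": {"S1": [1], "S2": [0, 1]}},
    {"reference_name": "3", "calls": {"S1": [1, 0], "S2": [0, 2]}},
]

SKIPPED = {"X", "Y", "MT"}


def count_variants(variants):
    """Per sample: (calls with some but not all > 0, calls with all > 0)."""
    single, double = Counter(), Counter()
    for variant in variants:
        if variant["reference_name"] in SKIPPED:  # mirrors OMIT RECORD IF
            continue
        for sample, genotype in variant["calls"].items():
            some = any(g > 0 for g in genotype)
            every = all(g > 0 for g in genotype)
            single[sample] += some and not every
            double[sample] += every
    samples = sorted(set(single) | set(double))
    return {s: (single[s], double[s]) for s in samples}


if __name__ == "__main__":
    print(count_variants(VARIANTS))
```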
Something useful:
Use Wikipedia data to pick a movie
1. Wikipedia edits
2. ???
3. Movie recommendation
Follow the edits
Same editor
select title, id, count(id) as edits
from [publicdata:samples.wikipedia]
where
title contains 'Hackers'
and title contains '(film)'
and wp_namespace = 0
group by title, id
order by edits
limit 10
Pick a great movie
select title, id, count(id) as edits
from [publicdata:samples.wikipedia]
where contributor_id in (
select contributor_id
from [publicdata:samples.wikipedia]
where
id=264176
and contributor_id is not null
and is_bot is null
and wp_namespace = 0
and title CONTAINS '(film)'
group by contributor_id)
and wp_namespace = 0
and id != 264176
and title CONTAINS '(film)'
group each by title, id
order by edits desc
limit 100
Find edits in common
Discover the most broadly popular films
select id from (
select id, count(id) as edits
from [publicdata:samples.wikipedia]
where
wp_namespace = 0
and title CONTAINS '(film)'
group each by id
order by edits desc
limit 20)
Edits in common, minus broadly popular
select title, id, count(id) as edits
from [publicdata:samples.wikipedia]
where contributor_id in (
select contributor_id
from [publicdata:samples.wikipedia]
where
id=264176
and contributor_id is not null
and is_bot is null
and wp_namespace = 0
and title CONTAINS '(film)'
group by contributor_id)
and wp_namespace = 0
and id != 264176
and title CONTAINS '(film)'
and id not in (
select id from (
select id, count(id) as edits
from [publicdata:samples.wikipedia]
where
wp_namespace = 0
and title CONTAINS '(film)'
group each by id
order by edits desc
limit 20
)
)
group each by title, id
order by edits desc
limit 100
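The last query combines three steps: find everyone who edited the seed film's article, count the other film articles those people edited, and drop the films that are heavily edited by everyone. A Python sketch of the same logic on a made-up edit log; the function name and the top_popular parameter are hypothetical, with top_popular playing the role of the inner limit 20:

```python
from collections import Counter

# Made-up (contributor_id, film) edit log, for illustration only.
EDITS = [
    (1, "Hackers (film)"), (1, "Sneakers (film)"), (1, "Titanic (film)"),
    (2, "Hackers (film)"), (2, "Sneakers (film)"), (2, "WarGames (film)"),
    (3, "Titanic (film)"), (3, "Titanic (film)"), (3, "Avatar (film)"),
    (4, "Titanic (film)"), (4, "Hackers (film)"),
]


def recommend(edits, seed, top_popular=1):
    """Films edited by the seed film's editors, minus the most-edited films."""
    # Step 1: who edited the seed film?
    editors = {who for who, film in edits if film == seed}
    # Step 2: which films are the most edited overall?
    overall = Counter(film for _, film in edits)
    popular = {film for film, _ in overall.most_common(top_popular)}
    # Step 3: count the remaining films those editors touched.
    counts = Counter(
        film for who, film in edits
        if who in editors and film != seed and film not in popular
    )
    return counts.most_common()


if __name__ == "__main__":
    print(recommend(EDITS, "Hackers (film)"))
```

Without the popularity filter (top_popular=0), Titanic ties for first place in this toy data simply because it is the most-edited film overall, which is the effect the extra id not in (...) clause is there to cancel.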
What we talked about
● Origin story
● Count stuff
● How it works
● Some cool open data
● Practical applications
● Try BigQuery
○ bigquery.cloud.google.com
● Queries we ran
○ github.com/mimming/snippets
● Me
○ @MimmingCodes
○ google.com/+mimming
The end