Powering Interactive Data
Analysis with Google BigQuery
Márton KODOK
@martonkodok
Software Architect @ REEA.net
Every company,
no matter how far from tech it is,
is evolving into a software company,
and by extension a data company.
Turning everything into “data” drives innovation
Powering Interactive Data Analysis with Google BigQuery @martonkodok #dfua
For a small company it’s important
to have access to modern Big Data tools
without running a dedicated team for it.
Small companies should do BigData - but how?
❏ Need backend/database to STORE, QUERY, EXTRACT data
❏ Deep analytics - large, multi-source, complex, unstructured
❏ Be real time
❏ Terabyte scale - Cost effective
❏ Run Ad-Hoc reports - Without Developer - interactive
❏ Minimal engineering efforts - no dedicated BigData team
❏ Simple query language (preferably SQL / JavaScript)
Making analytics accessible to more companies
Legacy Business Reporting System
[Diagram labels: Web; Mobile; Web Server; SQL Database; Cached; Platform Services; CMS/Framework; Report & Share; Business Analysis; Scheduled Tasks; Batch Processing; Compute Engine; Multiple Instances]
Behind the scenes: days to insights
Minutes to kick in
Hours to run batch processing
Hours to clean and aggregate
Days to insights
● Analytics-as-a-Service - Data Warehouse in the Cloud
● Fully-Managed by Google (US or EU zone)
● Scales into Petabytes
● Ridiculously fast
● SQL:2011 standard + JavaScript UDFs (User Defined Functions)
● Familiar DB Structure (table, views, record, nested, JSON)
● Open Interfaces (REST, ODBC, Web UI, BQ command line tool)
● Integrates with Google Sheets + Google Cloud Storage + Pub/Sub connectors
● Decent pricing (queries: $5/TB, storage: $20/TB, cold storage: $10/TB; as of Oct 2017)
What is BigQuery?
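As a rough illustration of the pricing bullet, the Python sketch below estimates a bill from the October 2017 prices quoted on the slide. The function name and the sample workload are hypothetical, and the model assumes purely linear pricing with storage billed per TB per month, ignoring free tiers and later price changes.

```python
# Prices quoted on the slide (USD per TB, as of Oct 2017).
QUERY_PRICE_PER_TB = 5.0
STORAGE_PRICE_PER_TB = 20.0
COLD_STORAGE_PRICE_PER_TB = 10.0


def estimate_monthly_cost(scanned_tb, active_tb, cold_tb):
    """Linear estimate in USD; ignores free tiers and discounts (assumption)."""
    return (scanned_tb * QUERY_PRICE_PER_TB
            + active_tb * STORAGE_PRICE_PER_TB
            + cold_tb * COLD_STORAGE_PRICE_PER_TB)


# Hypothetical workload: 10 TB scanned by queries, 2 TB active storage, 4 TB cold storage.
print(estimate_monthly_cost(10, 2, 4))  # 10*5 + 2*20 + 4*10 = 130.0
```

Because queries are billed by data scanned, the first term is usually the one worth watching when analysts run many ad-hoc reports.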
Architecting for the Cloud
[Diagram labels: BigQuery; On-Premises Servers; Pipelines; ETL Engine; Event Sourcing; Frontend; Platform Services; Metrics / Logs / Streaming]
Data Pipeline Integration at REEA.net
[Diagram labels: Analytics Backend; BigQuery; On-Premises Servers; Pipelines; FluentD; Event Sourcing; Frontend; Platform Services; Metrics / Logs / Streaming; Development Team; Data Analysts; Report & Share; Business Analysis; Tools (Tableau, QlikView, Data Studio, Internal Dashboard); SQL Database; Application Servers; Cloud Storage archive; Load; Export; Replay; Standard Devices; HTTPS]
The following slides will present a sample Fluentd configuration to:
1. Transform a record
2. Copy event to multiple outputs
3. Store event data in File (for backup/log purposes)
4. Stream to BigQuery (for immediate analyses)
<filter frontend.user.*>
  @type record_transformer
</filter>
<match frontend.user.*>
  @type copy
  <store>
    @type forest
    subtype file
  </store>
  <store>
    @type bigquery
  </store>
  …
</match>
The filter plugin mutates incoming data: add, modify, or delete
event attributes without a code deploy.
The copy output plugin copies events to multiple outputs:
files, multiple databases, DB engines.
Great for shipping the same event to multiple subsystems.
The BigQuery output plugin streams each event on the fly to
the BigQuery warehouse. No need to write integration code.
Data is available immediately for querying.
Other output plugins can be wired in whenever needed,
e.g. the Kafka or Google Cloud Storage output plugins.
Step 1: record_transformer
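<filter frontend.user.*>
  @type record_transformer
  # Allow Ruby expressions inside ${...} placeholders
  enable_ruby true
  # Drop the top-level host key once it has been copied into bq
  remove_keys host
  <record>
    # Metadata for the BigQuery insert, built from fields of the incoming record
    bq {"insert_id":"${record['uid']}","host":"${record['host']}",
        "created":"${time.to_i}"}
    # Computed on the fly from two fields of the incoming record
    avg ${record["total"] / record["count"]}
  </record>
</filter>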
Syntax: Ruby, easy to use.
Great for:
- date transformation,
- quick normalizations,
- calculating something on the fly
  and storing it in a clear log/analytics DB,
- renaming fields without a code deploy.
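To make the effect of the filter concrete, here is a minimal Python sketch of what it does to a single event. It is not Fluentd's implementation: the function name and the sample values are made up, the field names (uid, host, total, count) are taken from the config above, and the sketch keeps native Python types, whereas the real plugin may emit strings depending on its typecasting settings.

```python
def transform(record, event_time):
    """Mimic the record_transformer filter above on one event dict (sketch)."""
    out = dict(record)  # work on a copy; the incoming record is left untouched
    # Fields under <record> are evaluated against the incoming record.
    out["bq"] = {
        "insert_id": record["uid"],
        "host": record["host"],
        "created": int(event_time),  # corresponds to time.to_i
    }
    # Assuming integer fields: Ruby's / on two integers floors, like Python's //.
    out["avg"] = record["total"] // record["count"]
    # remove_keys is applied after the new fields have been added.
    out.pop("host", None)
    return out


# Hypothetical event and timestamp.
event = {"uid": "u-42", "host": "web-1", "total": 90, "count": 4}
print(transform(event, 1507500000.7))
```

The point of the design is that the host value survives inside bq while the top-level key disappears, all without touching application code.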
Steps 2 and 3: copy and file
<match frontend.user.*>
  @type copy
  <store>
    @type forest
    subtype file
    <template>
      path /tank/storage/${tag}.*.log
      time_slice_format %Y%m%d
    </template>
  </store>
</match>
Step 4: BigQuery
<match frontend.user.*>
  @type bigquery
  method insert
  auth_method json_key
  json_key /etc/td-agent/keys/key-31da042be48c.json
  project project_id
  dataset dataset_name
  time_field timestamp
  time_slice_format %Y%m%d
  table user$%{time_slice}
  ignore_unknown_values
  schema_path /etc/td-agent/schema/user_login.json
</match>
Connector uses:
- JSON key auth file
- JSON table schema
Pro features:
- streaming to partitioned tables
- ignoring unknown values
  (fields not reflected in the schema)
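The table setting user$%{time_slice} combined with time_slice_format %Y%m%d yields a partition decorator of the form table$YYYYMMDD. The Python sketch below shows how such a name can be derived from an event timestamp. The function name is hypothetical, and the use of UTC is an assumption; the plugin's actual timezone handling may differ.

```python
from datetime import datetime, timezone


def partition_table(base, epoch_seconds):
    """Build a `table$YYYYMMDD` partition decorator from a Unix timestamp (UTC assumed)."""
    day = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return f"{base}${day.strftime('%Y%m%d')}"


# 1507852800 is 2017-10-13 00:00:00 UTC.
print(partition_table("user", 1507852800))  # user$20171013
```

Events one second earlier land in the previous day's partition, which is what lets queries prune by date instead of scanning the whole table.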
● On data that is difficult to process/analyze using traditional databases
● On exploring unstructured data
● Not a replacement for traditional DBs, but it complements the system
● Applying JavaScript UDFs on columnar storage to resolve complex tasks
(e.g. JavaScript for natural language processing)
● On streams (forms, Kafka, IoT streams)
● Major strength is handling large datasets
Where to use BigQuery?
➢ Optimize product pages
Find, store, and analyse in BQ time-consuming user actions, using
25x more custom events/hits than Google Analytics
➢ Email engagement
With the raw data of every open/click stored, improve subject lines, layout,
follow-up action emails, and the assistant-like experience through heavy
A/B split tests on email marketing campaigns (interactive feedback loop)
➢ Funnel analysis
Wrangle all the data to discover: a small improvement, an AI-driven
personalized upsell experience, pre-sell products configured on the go that are
not yet in the catalog but can easily be tweaked/customized
Achievements - goal reached by measuring everything
Funnel analysis: Time on upsell pages
Example HITS chain:
● article1 -> page2 -> page3 -> page4 -> orderpage1 -> thankyoupage1
● page1 -> article2 -> page3 -> orderpage2 -> ...
Attribute credit to first article visited on purchase
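The attribution rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the query used in the talk: the function name is hypothetical, pages are identified by the name prefixes used in the example chains, and a purchase is assumed to be signalled by a hit on a thankyoupage.

```python
def first_article(hits):
    """Return the first article of a session that ended in a purchase, else None."""
    # Assumption: a "thankyoupage..." hit marks a completed purchase.
    if not any(h.startswith("thankyoupage") for h in hits):
        return None  # no purchase, so no credit is attributed
    for h in hits:
        if h.startswith("article"):
            return h
    return None  # purchase without any article visit


chain = ["article1", "page2", "page3", "page4", "orderpage1", "thankyoupage1"]
print(first_article(chain))  # article1
```

The second example chain is shown only up to orderpage2, so with the hits listed it would receive no credit under this rule.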
● No manual sharding
● No capacity guessing
● No idle resources
● No maintenance windows
● No manual scaling
● No file management
BigQuery: Serverless Data Warehouse
● no provisioning/deploy
● no running out of resources
● no more focus on large scale execution plan
● no more throwing away, expiring, or aggregating old data
● run raw ad-hoc queries (either by analysts/sales or Devs)
● use JavaScript in SQL for an awesome Big Data
experience, wrangling “unstructured” data like a nerd
Our benefits
Easily Build Custom Reports and Dashboards
Thank you.
Slides available on: slideshare.net/martonkodok
Reea.net - Integrated web solutions driven by creativity to deliver projects.


GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
