Making advanced analytics
accessible to more companies
Márton Kodok / @martonkodok
Google Developer Expert at REEA.net, Targu Mures, Romania
24 May 2017 Tirgu Mures, Romania
Issue 59 - May 2017
About me
● Geek. Hiker. Do-er.
● Crafting Web/Mobile backends at REEA.net, Targu Mures
● Among the top 3 Romanians on Stack Overflow
● Google Developer Expert on Cloud technologies
● BigQuery and database engine expert
● Active in mentoring
Twitter: @martonkodok
StackOverflow: pentium10
Slideshare: martonkodok
GitHub: pentium10
Making advanced analytics accessible to more companies @martonkodok
Agenda
Making advanced analytics accessible to more companies
● The Challenge
● Architecture Overview
● Strategy & Tricks
● Winning Solution
3 principles to get from small data to BigData
Companies:
❏ must be able to identify, combine, and manage multiple sources of data
❏ should be able to obtain advanced analytics using concepts they are familiar with
❏ must deploy the right technology architecture, matching their capabilities
Legacy Business Reporting System
[Diagram, built up over three slides. Labels: Web, Mobile, Web Server, CMS/Framework, Platform Services, Database (SQL), Cached, Scheduled Tasks, Batch Processing, Compute Engine (Multiple Instances), Business Analysis, Report & Share]
Behind the scenes: days to insights
● Minutes to kick in
● Hours to run batch processing
● Hours to clean and aggregate
● DAYS TO INSIGHTS
Desired system
❏ Need a backend/database to STORE, QUERY, and EXTRACT data
❏ Deep analytics: large, multi-source, complex, unstructured
❏ Real time
❏ Terabyte scale, cost-effective
❏ Run ad-hoc reports interactively, without a developer
❏ Minimal engineering effort: no dedicated BigData team
❏ Simple query language (preferably SQL / JavaScript)
What is BigQuery?
● Analytics-as-a-Service: a Data Warehouse in the Cloud
● Fully managed by Google (US or EU zone)
● Scales into petabytes
● Ridiculously fast
● SQL 2011 standard + JavaScript UDFs (User Defined Functions)
● Familiar DB structure (tables, views, records, nested fields, JSON)
● Integrates with Tableau, Google Sheets, Cloud Storage, and Pub/Sub connectors
● Decent pricing (queries: $5/TB, storage: $20/TB, cold storage: $10/TB) *as of May 2017
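The May 2017 list prices on the "What is BigQuery?" slide can be turned into a rough budget estimate. The sketch below is only illustrative arithmetic: the function name and the example volumes are hypothetical, and it assumes the storage prices are charged per TB per month and the query price per TB scanned.

```python
# Prices quoted on the slide (May 2017), in USD per TB.
QUERY_USD_PER_TB = 5.0          # per TB scanned by queries
STORAGE_USD_PER_TB = 20.0       # active storage (assumed to be per month)
COLD_STORAGE_USD_PER_TB = 10.0  # cold storage (assumed to be per month)


def estimate_monthly_cost(scanned_tb: float, active_tb: float, cold_tb: float) -> float:
    """Return a rough monthly bill in USD for the given volumes (hypothetical helper)."""
    query_cost = scanned_tb * QUERY_USD_PER_TB
    storage_cost = active_tb * STORAGE_USD_PER_TB + cold_tb * COLD_STORAGE_USD_PER_TB
    return query_cost + storage_cost


if __name__ == "__main__":
    # Example volumes are made up: 2 TB scanned, 1 TB active, 3 TB cold.
    print(estimate_monthly_cost(scanned_tb=2, active_tb=1, cold_tb=3))
```

Because queries are billed by data scanned rather than by provisioned capacity, the query term drops to zero in a month with no queries.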
Architecting for The Cloud
[Diagram. Labels: On-Premises Servers, Frontend, Platform Services, Metrics / Logs / Streaming, Event Sourcing, Pipelines, ETL Engine, BigQuery]
Data Pipeline Integration
[Diagram. Labels: Standard Devices (HTTPS), Application Servers, Database (SQL), On-Premises Servers, Frontend, Platform Services, Metrics / Logs / Streaming, Event Sourcing, Pipelines (FluentD), Cloud Storage archive (Load, Export, Replay), Analytics Backend (BigQuery), Development Team, Data Analysts, Business Analysis, Report & Share, Tools (Tableau, QlikView, Data Studio, Internal Dashboard)]
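# Enrich every frontend.user.* event before it is routed.
<filter frontend.user.*>
  @type record_transformer
  # enable_ruby allows Ruby expressions such as time.to_i in placeholders
  enable_ruby
  # drop the top-level host key from the record
  remove_keys host
  <record>
    # add a "bq" field carrying the insert id, host, and event time in epoch seconds
    bq {"insert_id":"${uid}","host":"${host}","created":"${time.to_i}"}
  </record>
</filter>

# Send each event both to a local file archive and to BigQuery.
<match frontend.user.*>
  @type copy
  <store>
    # forest creates one file output per tag from the template below
    @type forest
    subtype file
    <template>
      path /tank/storage/${tag}.*.log
      time_slice_format %Y%m%d
      time_slice_wait 10m
    </template>
  </store>
  <store>
    # streaming inserts into BigQuery
    @type bigquery
    method insert
    ...
    # bigquery section continued
    auth_method json_key
    json_key /etc/td-agent/keys/key-31da042be48c.json
    project project_id
    dataset dataset_name
    time_field timestamp
    time_slice_format %Y%m%d
    # write into the daily partition that matches the event time
    table user$%{time_slice}
    ignore_unknown_values
    schema_path /etc/td-agent/schema/user_login.json
  </store>
</match>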
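The `table user$%{time_slice}` setting in the config above, combined with `time_slice_format %Y%m%d`, targets a daily partition through a `$YYYYMMDD` suffix on the table name. A minimal sketch of that naming rule follows; the helper name is hypothetical, and using UTC for the day boundary is an assumption (the time zone the agent actually applies depends on its configuration).

```python
from datetime import datetime, timezone


def partition_table(base_table: str, epoch_seconds: int, slice_format: str = "%Y%m%d") -> str:
    """Build a partition-decorated table name such as 'user$20170524' (hypothetical helper).

    Assumes the day boundary is computed in UTC.
    """
    day = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime(slice_format)
    return f"{base_table}${day}"


if __name__ == "__main__":
    # 1495584000 is 2017-05-24 00:00:00 UTC
    print(partition_table("user", 1495584000))  # user$20170524
```

Routing each event to the partition of its own day keeps late or replayed events in the right place, which matters when archived logs are loaded again from Cloud Storage.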
Where to use BigQuery?
● On data that is difficult to process/analyze using traditional databases
● On exploring unstructured data
● Not a replacement for traditional DBs, but it complements the system
● Applying JavaScript UDFs on columnar storage to resolve complex tasks (e.g. JS for natural language processing)
● On streams (form wizard ...)
● On IoT streams
● Major strength is handling large datasets
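To make the JavaScript UDF point on the "Where to use BigQuery?" slide concrete, here is a small sketch that only assembles the text of a BigQuery standard SQL query with a temporary JavaScript function. Everything named here is hypothetical: the `word_count` function, the table `my_project.my_dataset.user`, and the `message` column. The naive whitespace split stands in for real natural language processing, and the query is built but not executed.

```python
def build_word_count_query(table: str, column: str) -> str:
    """Return BigQuery standard SQL that defines a temporary JS UDF and applies it.

    The UDF body is a deliberately naive tokenizer, used only for illustration.
    """
    return f'''
CREATE TEMP FUNCTION word_count(text STRING)
RETURNS FLOAT64
LANGUAGE js AS """
  return text ? text.trim().split(" ").length : 0;
""";

SELECT {column}, word_count({column}) AS words
FROM `{table}`
LIMIT 10
'''.strip()


if __name__ == "__main__":
    # Hypothetical table and column names.
    print(build_word_count_query("my_project.my_dataset.user", "message"))
```

The resulting text could be pasted into the BigQuery console or passed to a client library. The point of the pattern is that the JavaScript logic runs next to the columnar data instead of requiring an export first.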
BigQuery Benefits: Serverless Data Warehouse
● no manual sharding
● no capacity guessing
● no idle resources
● no manual scaling
● no provisioning, deploying, or running out of resources
● run raw ad-hoc queries (by analysts or sales)
● no more throwing away, expiring, or aggregating old data
Easily Build Custom Reports and Dashboards
Thank you.
Slides available on: slideshare.net/martonkodok
