Applying BigQuery ML
on e-commerce data analytics
Márton Kodok
Google Developer Expert
REEA.net
● Among the top 3 Romanians on Stack Overflow (137k reputation)
● Google Developer Expert on Cloud technologies
● Crafting Web/Mobile backends at REEA.net
● BigQuery + Redis database engine expert
Slideshare: martonkodok
Twitter: @martonkodok
StackOverflow: pentium10
GitHub: pentium10
Applying BigQuery ML on e-commerce data analytics @martonkodok
About me
1. What is BigQuery? - Data warehouse in the Cloud
2. Introduction to BigQuery ML - execute ML models using SQL
3. Practical use cases
4. Segment and recommend with BigQuery ML
5. Conclusions
Agenda
Legacy Reporting System
[Architecture diagram. Components: App; Load Balancing (NGINX on Compute Engine, 10GB PD); Database Service (Master/Slave) on three Compute Engine instances (10GB PD each); Scheduled Tasks; Batch Processing (Compute Engine, multiple instances); Report & Share; Business Analysis]
Serverless Reporting System
[Architecture diagram. The legacy components (App; Load Balancing with NGINX on Compute Engine; Database Service (Master/Slave) on Compute Engine; Scheduled Tasks; Batch Processing on multiple Compute Engine instances; Report & Share; Business Analysis) shown alongside BigQuery and Data Studio, which take over Report & Share and Business Analysis]
Analytics-as-a-Service - Data Warehouse in the Cloud
Familiar DB Structure (table, columns, views, struct, nested, JSON)
Decent pricing (storage: $20/TB, cold storage: $10/TB, queries: $5/TB) *Sep 2019
SQL:2011 + JavaScript UDFs (User-Defined Functions)
BigQuery ML enables users to create machine learning models by SQL queries
Scales into Exabytes on Managed Infrastructure
Integrates with Cloud SQL + Cloud Storage + Sheets + Pub/Sub connectors
What is BigQuery?
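To make the pricing bullet concrete, here is a minimal Python sketch of the arithmetic. It uses the Sep 2019 prices quoted on the slide, which may no longer be current, and it assumes 1 TB is counted as 2^40 bytes and that storage is billed per month. The function names are hypothetical.

```python
# Prices quoted on the slide (Sep 2019); illustrative, not current.
STORAGE_USD_PER_TB_MONTH = 20.0
COLD_STORAGE_USD_PER_TB_MONTH = 10.0
QUERY_USD_PER_TB = 5.0
BYTES_PER_TB = 2 ** 40  # assumption: 1 TB counted as 2^40 bytes


def query_cost_usd(bytes_scanned: int) -> float:
    """Query cost for the given number of scanned bytes."""
    return bytes_scanned / BYTES_PER_TB * QUERY_USD_PER_TB


def monthly_storage_cost_usd(active_bytes: int, cold_bytes: int = 0) -> float:
    """Monthly storage cost, split into active and cold data."""
    active = active_bytes / BYTES_PER_TB * STORAGE_USD_PER_TB_MONTH
    cold = cold_bytes / BYTES_PER_TB * COLD_STORAGE_USD_PER_TB_MONTH
    return active + cold


if __name__ == "__main__":
    # A query scanning 200 GB, and a table with 1 TB active and 2 TB cold data.
    print(round(query_cost_usd(200 * 2 ** 30), 2))
    print(monthly_storage_cost_usd(1 * BYTES_PER_TB, 2 * BYTES_PER_TB))
```

Because queries are billed by bytes scanned, selecting only the needed columns is the simplest lever on the query side of this estimate.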
1. Load from file - either local or from GCS (max 5TB each)
2. Streaming rows - event driven approach - high throughput 1M rows/sec
3. Functions - observer-trigger based (Google Cloud Functions)
4. Join with Cloud SQL - Ability to join with MySQL, Postgres
5. Pipelines - flexibility to do ETL - FluentD, Kafka, Google Dataflow
6. Export from connected services - Firestore, Billing, AuditLogs, Stackdriver
7. Firebase - Analytics - Messaging - Crashlytics - Perf. Monitoring - Predictions
Loading Data into BigQuery
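For the "load from file" route, BigQuery load jobs accept newline-delimited JSON among other formats. The sketch below only prepares such a file body in Python; the event fields (`user_id`, `action`, `ts`) are hypothetical and the upload itself is left out.

```python
import json
from datetime import datetime, timezone


def make_event(user_id, action, ts=None):
    """Build one event row; the field names are illustrative only."""
    ts = ts or datetime.now(timezone.utc)
    return {"user_id": user_id, "action": action, "ts": ts.isoformat()}


def to_ndjson(rows):
    """Serialize dict rows as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(row, sort_keys=True) + "\n" for row in rows)


if __name__ == "__main__":
    events = [make_event(1, "view_product"), make_event(2, "add_to_cart")]
    print(to_ndjson(events), end="")
```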
Serverless file ingest: Cloud Functions
[Architecture diagram. Components: On-Premises Servers; Application Event Sourcing; Frontend Platform Services; Metrics / Logs / Streaming; Cloud Storage; Cloud Functions (Triggered Code); BigQuery]
“We have our app outside of GCP.
We need to join with our SQL database.”
Solution: EXTERNAL_QUERY
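A rough sketch of what such a federated statement could look like, built as a Python string. `EXTERNAL_QUERY` takes a connection ID and the SQL to run on the external database. The connection ID, the `my_dataset.events` table, and the `customers` columns are all hypothetical names chosen for illustration.

```python
def external_query_sql(connection_id: str, inner_sql: str) -> str:
    """Wrap a Cloud SQL statement in BigQuery's EXTERNAL_QUERY table function."""
    if "'''" in inner_sql:
        raise ValueError("inner_sql must not contain triple single quotes")
    # The inner statement goes on its own lines inside a triple-quoted literal.
    return (
        "SELECT *\n"
        f"FROM EXTERNAL_QUERY('{connection_id}', '''\n"
        f"{inner_sql}\n"
        "''')"
    )


def events_per_customer_sql(connection_id: str) -> str:
    """Join a BigQuery table with customer rows fetched from Cloud SQL."""
    external = external_query_sql(
        connection_id, "SELECT customer_id, email FROM customers"
    )
    return (
        "SELECT c.email, COUNT(*) AS events\n"
        "FROM `my_dataset.events` AS e  -- hypothetical BigQuery table\n"
        f"JOIN (\n{external}\n) AS c ON c.customer_id = e.customer_id\n"
        "GROUP BY c.email"
    )


if __name__ == "__main__":
    print(events_per_customer_sql("my-project.us.my-connection"))
```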
Combine on-premise with Cloud
[Architecture diagram. Components: App; Load Balancing (NGINX on Compute Engine, 10GB PD); Database Service (Master/Slave) on three Compute Engine instances (10GB PD each); Cloud VPN Gateway; Cloud SQL replica (Zone 1, us-east1-a); BigQuery, which executes the combined queries for the Report]
➢ Optimize product pages
Find, store, and analyse in BQ the time-consuming user actions, using
25x more custom events/hits than Google Analytics
➢ Email engagement
With the raw data of every open/click stored, improve subject lines, layout,
follow-up action emails, and the assistant-like experience through heavy
A/B split tests on email marketing campaigns (interactive feedback loop)
➢ Funnel Analysis
Wrangle all the data to discover: a small improvement, an AI-driven
personal-like upsell experience, pre-sell products configured on the go
(not yet in the catalog, but easy to tweak/customize)
Where to use BigQuery?
● SQL language to run BigData queries
● run raw ad-hoc queries (either by analysts/sales or Devs)
● no more throwing away, expiring, or aggregating old data
● it’s serverless
● no provisioning/deploy
● no running out of resources
● no more focus on large scale execution plan
Our benefits
Easily Build Custom Reports and Dashboards
What is BigQuery ML?
BigQuery ML
1. Execute ML initiatives without moving
data from BigQuery
2. Iterate on models in SQL in BigQuery
to increase development speed
3. Automate common ML tasks and
hyperparameter tuning
Use cases and skills, by audience (Developer, SQL user, Data Scientist):
TensorFlow and Cloud ML Engine
● Build and deploy state-of-the-art custom models
● Requires deep understanding of ML and programming
BigQuery ML
● Build and deploy custom models using SQL
● Requires only basic understanding of ML
AutoML and Cloud ML APIs
● Build and deploy Google-provided models for standard use cases
● Requires almost no ML knowledge
Making ML accessible for all audiences
● Linear regression for forecasting
● Binary or Multiclass logistic regression for classification (labels can have up to 50 unique values)
● K-means clustering for data segmentation (unsupervised learning: requires no labels or labeled training data)
● Import TensorFlow models for prediction in BigQuery
● Matrix factorization (Alpha)
● Deep Neural Networks using TensorFlow (Alpha)
● Feature pre-processing functions (Alpha)
Alphas are whitelist only. Please contact your Google CE/Sales/TAM.
Supported models in BigQuery ML
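As a quick orientation, the generally available models above correspond to the `model_type` option values `linear_reg`, `logistic_reg`, `kmeans`, and `tensorflow`. The Python sketch below maps a task to one of these values and applies the 50-label limit quoted on the slide. The task names and the helper function are hypothetical.

```python
# model_type option values for the non-alpha models listed on the slide.
MODEL_TYPES = {
    "forecasting": "linear_reg",
    "classification": "logistic_reg",
    "segmentation": "kmeans",
    "imported_tensorflow": "tensorflow",
}
MAX_CLASS_LABELS = 50  # multiclass limit quoted on the slide


def pick_model_type(task: str, distinct_labels: int = 0) -> str:
    """Return the model_type for a task, checking the label-count limit."""
    if task not in MODEL_TYPES:
        raise ValueError(f"unsupported task: {task}")
    if task == "classification" and distinct_labels > MAX_CLASS_LABELS:
        raise ValueError(
            f"{distinct_labels} labels exceeds the limit of {MAX_CLASS_LABELS}"
        )
    return MODEL_TYPES[task]


if __name__ == "__main__":
    print(pick_model_type("classification", distinct_labels=2))
    print(pick_model_type("segmentation"))
```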
Conversion/Purchase prediction. MODEL: Logistic regression
Predict whether a user "converts" or "purchases". It is in the company's interest that many users sign up for this
membership, as it helps streamline their ad conversion and also helps with recurring revenue.
Customer Lifetime Value (LTV) prediction. MODEL: Logistic regression
Used by organisations to identify and prioritize significant customer segments that would be most
valuable to the company.
Customer Segmentation. MODEL: K-means clustering
Dividing a client base into groups in specific ways relevant to marketing, such as interests and spending
habits. Segmentation allows marketers to better customize their efforts to various audience groups.
E-commerce Use Cases
Create a MODEL that predicts whether a website visitor will make a transaction.
● CREATE MODEL statement
● The ML.EVALUATE function to evaluate the ML model
● The ML.PREDICT function to make predictions using the ML model
Getting started with BigQuery ML
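The three steps above could look roughly like the statements generated below. This is a sketch, not the code from the talk: it assumes the public Google Analytics sample dataset as the data source, a hypothetical model name, and an illustrative choice of features. The Python functions only assemble the SQL text.

```python
GA_SAMPLE = "bigquery-public-data.google_analytics_sample.ga_sessions_*"


def feature_query(start: str, end: str, with_label: bool = True) -> str:
    """SELECT producing the features (and optionally the label) per session."""
    label = "  IF(totals.transactions IS NULL, 0, 1) AS label,\n" if with_label else ""
    return (
        "SELECT\n"
        f"{label}"
        "  IFNULL(device.operatingSystem, '') AS os,\n"
        "  device.isMobile AS is_mobile,\n"
        "  IFNULL(geoNetwork.country, '') AS country,\n"
        "  IFNULL(totals.pageviews, 0) AS pageviews\n"
        f"FROM `{GA_SAMPLE}`\n"
        f"WHERE _TABLE_SUFFIX BETWEEN '{start}' AND '{end}'"
    )


def create_model_sql(model: str, start: str, end: str) -> str:
    """Step 1: train a binary logistic regression model on labeled sessions."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        "OPTIONS (model_type = 'logistic_reg') AS\n"
        + feature_query(start, end)
    )


def evaluate_sql(model: str, start: str, end: str) -> str:
    """Step 2: evaluate the model on a labeled date range."""
    return f"SELECT * FROM ML.EVALUATE(MODEL `{model}`, (\n{feature_query(start, end)}\n))"


def predict_sql(model: str, start: str, end: str) -> str:
    """Step 3: predict purchases per country for unlabeled sessions."""
    return (
        "SELECT country, SUM(predicted_label) AS predicted_purchases\n"
        f"FROM ML.PREDICT(MODEL `{model}`, (\n"
        f"{feature_query(start, end, with_label=False)}\n))\n"
        "GROUP BY country\n"
        "ORDER BY predicted_purchases DESC"
    )


if __name__ == "__main__":
    model = "bqml_tutorial.sample_model"  # hypothetical dataset.model name
    print(create_model_sql(model, "20160801", "20170630"))
    print(evaluate_sql(model, "20170701", "20170801"))
    print(predict_sql(model, "20170701", "20170801"))
```

Sharing one feature query across all three statements keeps the columns seen at training, evaluation, and prediction time identical.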
Create a binary logistic regression model
[Annotated screenshot of a CREATE MODEL statement. Callouts: CREATE MODEL syntax; create the training dataset using a label column; SELECT features]
Evaluate your model
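For a binary classifier, the evaluation output includes metrics such as precision, recall, accuracy, and F1 score. As a reminder of what those numbers mean, here is a small Python sketch that derives them from confusion counts. The counts in the usage example are made up.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Precision, recall, accuracy and F1 from binary confusion counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1_score": f1,
    }


if __name__ == "__main__":
    # Made-up counts: 80 true positives, 20 false positives,
    # 10 false negatives, 90 true negatives.
    for name, value in classification_metrics(80, 20, 10, 90).items():
        print(f"{name}: {value:.3f}")
```

With rare events such as purchases, accuracy alone can look high while recall stays poor, so it helps to read the metrics together.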
Predict
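Conceptually, a binary logistic regression model turns a weighted sum of features into a probability and then applies a threshold. The Python sketch below illustrates that idea only. The weights, bias, and the 0.5 threshold are assumptions for the example, not values taken from a trained BigQuery ML model.

```python
import math


def predict_proba(weights, bias, features):
    """Probability of the positive class for one row of numeric features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable sigmoid for large positive or negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_label(weights, bias, features, threshold=0.5):
    """1 ("will purchase") if the probability reaches the threshold, else 0."""
    return 1 if predict_proba(weights, bias, features) >= threshold else 0


if __name__ == "__main__":
    weights = [0.4, 1.2]  # hypothetical weights: pageviews, is_mobile
    bias = -3.0           # hypothetical intercept
    print(predict_proba(weights, bias, [10, 0]))
    print(predict_label(weights, bias, [2, 1]))
```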
Use cases:
● Customer segmentation
● Data quality
Options and defaults
● Number of clusters: defaults to log10(num_rows)
● Distance type: Euclidean (default), Cosine
● Supports all major SQL data types including GIS
K-means clustering
CREATE MODEL yourmodel
OPTIONS (model_type = 'kmeans')
AS SELECT..
ml.PREDICT maps rows to closest clusters
ml.CENTROIDS for cluster centroids
ml.EVALUATE
ml.TRAINING_INFO
ml.FEATURE_INFO
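To show what "maps rows to closest clusters" means, here is a tiny pure-Python version of Lloyd's k-means algorithm with Euclidean distance. It is an illustration of the technique only, not BigQuery ML's implementation. The starting centroids and the sample points are made up.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points, centroids):
    """Index of the closest centroid for every point (the predict step)."""
    return [
        min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
        for p in points
    ]


def kmeans(points, initial_centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = [list(c) for c in initial_centroids]
    for _ in range(iterations):
        labels = assign(points, centroids)
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assign(points, centroids)


if __name__ == "__main__":
    # Made-up (orders, average basket value) pairs for four customers.
    points = [(1, 1), (1, 2), (9, 9), (10, 9)]
    centroids, labels = kmeans(points, [(0, 0), (10, 10)])
    print(centroids, labels)
```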
Available data:
● Encode yes/no features
(eg: has a microwave, has a kitchen, has a TV, has a bathroom)
● Can apply clustering on the encoded data
K-means clustering: Problem definition
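A minimal sketch of the yes/no encoding, in Python. The feature list follows the slide's example, while the function name and the input shape (a list of amenity names per item) are assumptions.

```python
# Yes/no features taken from the slide's example.
FEATURES = ["microwave", "kitchen", "tv", "bathroom"]


def encode(amenities):
    """Encode an item's amenities as a 0/1 vector in FEATURES order."""
    present = {a.strip().lower() for a in amenities}
    return [1 if feature in present else 0 for feature in FEATURES]


if __name__ == "__main__":
    listings = {
        "studio": ["Kitchen", "TV"],
        "suite": ["microwave", "kitchen", "tv", "bathroom"],
    }
    for name, amenities in listings.items():
        print(name, encode(amenities))
```

Each encoded vector can then serve as one row of numeric input to the clustering step.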
Premise
Predicting LTV for a new user
helps a company determine
which users are of most “value”,
understand those users’
common characteristics,
and focus more on them.
K-means clustering: Customer Lifetime Value
Premise
We can identify oddities
(potential data quality issues)
by grouping things together
and separating outliers.
K-means clustering: Problem definition
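One simple way to separate outliers once points are grouped is to flag rows that lie unusually far from their cluster centroid. The Python sketch below does this for a single cluster. The rule (distance greater than a factor times the mean distance) and the default factor of 2 are assumptions for illustration, not something prescribed in the talk.

```python
import math


def flag_outliers(points, centroid, factor=2.0):
    """True for points farther from the centroid than factor x mean distance."""
    if not points:
        return []
    distances = [math.dist(p, centroid) for p in points]
    mean_distance = sum(distances) / len(distances)
    return [d > factor * mean_distance for d in distances]


if __name__ == "__main__":
    # Made-up cluster members; the last one sits far from the rest.
    members = [(0, 0), (1, 0), (0, 1), (10, 10)]
    print(flag_outliers(members, centroid=(0, 0)))
```

Flagged rows are candidates for manual inspection, which matches the idea of routing oddities somewhere for review.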
Use cases:
● Easily add TensorFlow predictions to BigQuery
● Build unstructured data models in TensorFlow,
predict in BigQuery
Key restrictions
● Model size limit of 250MB
Import TensorFlow models for prediction (Alpha)
CREATE MODEL yourmodel
OPTIONS (model_type = 'tensorflow',
model_path = 'gs://')
ml.PREDICT()
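A hedged Python sketch of how the import statement might be assembled, with a pre-check for the 250MB limit mentioned above. The model name and bucket path are hypothetical, and treating 250MB as 250 x 1024 x 1024 bytes is an assumption.

```python
MAX_MODEL_BYTES = 250 * 1024 * 1024  # slide's 250MB limit; binary MB assumed


def import_tf_model_sql(model: str, gcs_path: str, model_size_bytes: int) -> str:
    """Build a CREATE MODEL statement that imports a TensorFlow SavedModel."""
    if not gcs_path.startswith("gs://"):
        raise ValueError("model_path must be a Cloud Storage URI (gs://...)")
    if model_size_bytes > MAX_MODEL_BYTES:
        raise ValueError("model exceeds the 250MB size limit")
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'tensorflow', model_path = '{gcs_path}')"
    )


if __name__ == "__main__":
    print(
        import_tf_model_sql(
            "my_dataset.imported_tf_model",        # hypothetical model name
            "gs://example-bucket/saved_model/*",   # hypothetical path
            model_size_bytes=40 * 1024 * 1024,
        )
    )
```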
DEMO
Search 'QueryIt Smart' on GitHub to learn more.
What is on the roadmap of BigQuery ML?
Cloud Next '19 announcements
New on BigQuery UI - Training tab charts
New on BigQuery UI - Evaluation charts
New on BigQuery UI - Confusion Matrix
Percentage of actual
labels that were
classified:
- Correctly (Blue)
- Incorrectly (Grey)
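The percentages in such a chart can be reproduced from raw counts by normalizing each row of the confusion matrix, where a row holds all examples of one actual label. A small Python sketch with made-up counts:

```python
def row_percentages(matrix):
    """Convert each row of counts (one actual label) into percentages."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([100.0 * v / total if total else 0.0 for v in row])
    return result


def correctly_classified(matrix):
    """Percentage of each actual label that was classified correctly."""
    percentages = row_percentages(matrix)
    return [percentages[i][i] for i in range(len(percentages))]


if __name__ == "__main__":
    # Rows: actual label; columns: predicted label (made-up counts).
    counts = [[90, 10], [20, 80]]
    print(row_percentages(counts))
    print(correctly_classified(counts))
```

The diagonal values correspond to the correctly classified share, and the rest of each row is the incorrectly classified share.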
Conclusion
Automation
● Run the process daily
● Determine hyperparameters
● Surface the results and route them somewhere for inspection and improvement
Testing
● A/B test around the impact of data quality on conversion and customer NPS (Net Promoter Score)
Improvements
● Determine and explore outliers
● Repeat, automate
Considerations
● “ML is hard, and we don’t have a dedicated team.”
With BigQuery ML you only need devs who have good SQL skills.
● With BigQuery ML, extending your current stack with ML no longer involves a steep learning curve
● Understand how to connect pieces of tabular data to fulfil a business requirement
● Start using the Cloud benefits and BigQuery ML as a complementary system
● Understand BigQuery ML to see that you don’t need a large budget to add ML product improvements
#increase #innovation #work on #fun #stuff
Common mindset blockers
● Democratizes the use of ML by empowering data analysts to build and run models using existing
business intelligence tools and spreadsheets
● Generalist team. Models are trained using SQL. There is no need to program an ML solution using
Python or Java.
● Increases the innovation and speed of model development by removing the need to export data from
the data warehouse.
● A Model serves a purpose. Easy to change/recycle.
Benefits of BigQuery ML
The possibilities are endless
Marketing
● Predict customer value
● Predict funnel conversion
● Personalize ads, email, webpage content
Retail
● Optimize inventory
● Forecast revenue
● Enable product recommendations
● Optimize staff promotions
Industrial and IoT
● Forecast demand for parking, traffic utilities, personnel
● Prevent equipment downtime
● Predict maintenance needs
Media/gaming
● Personalize content
● Predict game difficulty
● Predict player lifetime value
Thank you.
Slides available on: slideshare.net/martonkodok
Reea.net - Integrated web solutions driven by creativity
to deliver projects.
