Copyright © 2017 GoDaddy Inc. All Rights Reserved.
Customer Success Dashboard
GoDaddy Business Score
Baburao Kamble; Ryan Kleck; Robert Brown
Agenda
GoDaddy’s Mission for Small Business Success
Customer Success Dashboard
Machine Learning on the Content of 64M Websites Using Spark 2.1
Business Score
Marketing Use Cases
See the Internet like never before.
hello.
Everything you need to know to get your business online.
Customer Success Dashboard
Develop a content analytics platform (online and offline) that helps GoDaddy’s customers understand the digital presence of small businesses in a particular industry segment.
Architecture diagram: Customer Success Dashboard
Data sources: HTML, machine logs, security logs, internal data, open source data
Processing: ETL, FeatureDB, score engine
Outputs: industry verticals, traffic forecast (global, local), economic footprints, Business Score (GD SBI)
Consumers: communication, Go Initiative, marketing, ad hoc analytics, user interface
Components
Using the customer’s website content, develop a Customer Success Dashboard from the business’s digital presence and the local econometrics of the city or neighborhood where it is located.
• The Customer Success Dashboard will contain all of the website’s analytics, computed in real time or in batch, depending on the use case.
• Batch processing: the GoDaddy (GD) data crawler is the primary data source, supplemented by third-party social and business APIs. Use case: the customer’s Online Presence Index, reported as the Business Score.
• Real-time processing: scrape the website’s content and pass its DNS address to various internal tools to create features in real time. Use case: recommendations on what to add to the website to help the small business succeed.
SEO: site content & structure
ECOM: small business websites
SOCIAL: social media coverage
ECON: local economy index
WEB: website perfection
HOSTING: faster traffic for your content
INDUSTRY: industry profiles by objectives
Industry Segments/Profiles
TLDs: NLP on TLD keywords
PARTNERSHIPS: SIC industry codes
SURVEY: self-identification
CONTENT: NAICS classification
TLD Verticalization
Business Need
• Use natural language processing (NLP) to understand each domain and its purpose in detail.
• Determine whether a domain is personal (blog, portfolio) or business (LLC, consultancy).
• Classify domains into 6 classes identified by modelers and marketers.
• Enable domain-specific solutions that uncover hidden businesses.
Model
• We use an internal tokenizer to split domain names into tokens.
• These tokens feed a feature-generation model that detects language, human names, and geography.
• A machine learning model is trained on the features and keywords to classify the domains into 6 classes.
• In the second iteration, a business-keyword rule model was designed to override some misclassified results.
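The internal tokenizer is not described, so the following is only a minimal sketch of one common approach: dictionary-based word segmentation by dynamic programming. The lexicon `LEXICON`, its frequencies, and the two penalty constants are hypothetical; a real system would use a large English dictionary plus name and place lists.

```python
from functools import lru_cache

# Hypothetical lexicon: word -> relative frequency (illustration only).
LEXICON = {
    "kids": 50, "kid": 30, "medicine": 40, "stickers": 20, "sticker": 25,
    "action": 60, "window": 55, "tint": 10, "hardware": 35,
    "michigan": 30, "fellowship": 15, "devin": 5,
}

UNKNOWN_TOKEN_COST = 10.0  # hypothetical fixed penalty per out-of-lexicon chunk
UNKNOWN_CHAR_COST = 1.0    # hypothetical extra penalty per character in that chunk


def tokenize_domain(domain: str, lexicon: dict = LEXICON) -> list:
    """Split the second-level label of a domain (e.g. 'aubihardware.com') into words."""
    label = domain.lower().split(".")[0]
    n = len(label)

    @lru_cache(maxsize=None)
    def best(i: int) -> tuple:
        # Cheapest segmentation of label[i:], returned as (cost, tokens).
        if i == n:
            return (0.0, ())
        options = []
        for j in range(i + 1, n + 1):
            word = label[i:j]
            if word in lexicon:
                cost = 1.0 / lexicon[word]  # frequent words are cheaper
            else:
                # Unknown chunks are expensive, but one long chunk beats many short ones.
                cost = UNKNOWN_TOKEN_COST + UNKNOWN_CHAR_COST * len(word)
            rest_cost, rest_tokens = best(j)
            options.append((cost + rest_cost, (word,) + rest_tokens))
        return min(options)

    return list(best(0)[1])
```

With this lexicon, `tokenize_domain("aubihardware.com")` keeps the unknown prefix "aubi" as one token and recovers "hardware", which matches the keywords shown for that domain in the feature table.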
TLD Verticalization
containsenglish  domainname                hasname  isallenglish  keywords                language  pos        wordcount
TRUE             kidsmedicinestickers.com  FALSE    TRUE          kids medicine stickers  english   N|V-N|V-N  3
TRUE             actionwindowtint.com      FALSE    TRUE          action window tint      english   N|V-N-N|V  3
TRUE             aubihardware.com          TRUE     FALSE         aubi hardware           english   N-N        2
TRUE             michiganfellowship.com    TRUE     TRUE          michigan fellowship     english   N-N        2
FALSE            devinlamontagne.com       TRUE     FALSE         devin lamontagne        unknown   V-N        2
We use an internal Python tokenizer and NLTK to generate the features.
The features and keywords above are used to classify domains into 6 classes based on topic modeling; some of the class assignments are overridden by business rules.
domainname                Segment
kidsmedicinestickers.com  Business
actionwindowtint.com      Business
aubihardware.com          Business
michiganfellowship.com    Government
devinlamontagne.com       Personal
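A minimal sketch of how token lists could be turned into boolean and count features like the columns above, followed by a second-pass business-keyword override. The word lists (`ENGLISH_WORDS`, `PERSON_NAMES`, `PLACE_NAMES`, `BUSINESS_KEYWORDS`) are hypothetical stand-ins for the NLTK resources and internal lists the deck mentions, and `hasplace` is an illustrative extra feature not in the table.

```python
# Hypothetical lexicons, for illustration only.
ENGLISH_WORDS = {"kids", "medicine", "stickers", "action", "window", "tint",
                 "hardware", "michigan", "fellowship"}
PERSON_NAMES = {"aubi", "devin", "lamontagne"}
PLACE_NAMES = {"michigan"}
BUSINESS_KEYWORDS = {"hardware", "tint", "stickers", "medicine", "shop", "llc"}


def domain_features(tokens: list) -> dict:
    """Build features resembling the containsenglish/isallenglish/hasname columns."""
    english = [t in ENGLISH_WORDS for t in tokens]
    return {
        "containsenglish": any(english),
        "isallenglish": bool(tokens) and all(english),  # empty input -> False
        "hasname": any(t in PERSON_NAMES for t in tokens),
        "hasplace": any(t in PLACE_NAMES for t in tokens),
        "keywords": " ".join(tokens),
        "wordcount": len(tokens),
    }


def apply_business_rules(tokens: list, model_segment: str) -> str:
    """Override the model's segment when a known business keyword is present."""
    if model_segment != "Business" and any(t in BUSINESS_KEYWORDS for t in tokens):
        return "Business"
    return model_segment
```

For example, if a classifier labeled aubihardware.com as Personal because "aubi" looks like a name, the rule pass would still move it to Business because of "hardware".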
Industry Verticalization
Business Need
• Identify North American Industry Classification System (NAICS) industry verticals for GoDaddy’s 64M domains based on content analysis and domain names.
• Rank each domain’s industry verticals by probability.
• Enable business-specific marketing solutions.
NAICS
• A numerical, hierarchical classification system that groups all business establishments into industries based on their production processes.
• The foundation on which economic census data is collected, tabulated, analyzed, and disseminated.
• A common language used across North America (the United States, Canada, and Mexico).
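Because NAICS is hierarchical by digit prefix (2 digits for the sector, 3 for the subsector, 4 for the industry group, 5 for the NAICS industry, 6 for the national industry), a code’s ancestors can be read off its prefixes. A small sketch; note that a few sectors span ranges (31-33, 44-45, 48-49), so for those the 2-digit prefix is one part of a combined sector.

```python
# Level names by code length, from 2 to 6 digits.
NAICS_LEVELS = ["sector", "subsector", "industry group", "NAICS industry", "national industry"]


def naics_lineage(code: str) -> list:
    """Return (level, prefix) pairs from the sector down to the given code."""
    if not code.isdigit() or not 2 <= len(code) <= 6:
        raise ValueError(f"not a NAICS code: {code!r}")
    return [(NAICS_LEVELS[k - 2], code[:k]) for k in range(2, len(code) + 1)]
```

For instance, the subsector 541 rolls up to sector 54, so a domain scored at the 3-digit level can also be reported at the sector level.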
LDA: Latent Dirichlet Allocation
• We developed training data from business keywords, NAICS descriptions, and keywords from US Census data.
• We used Latent Dirichlet Allocation (LDA) to match the topic “fingerprint” of an “unknown” domain against the lineup of known NAICS sector fingerprints.
• LDA is a generative statistical model that allows sets of observations in a linguistic corpus to be explained by unobserved groups.
• Among topic-modeling algorithms, LDA, a technique based on Bayesian modeling, is one of the most commonly used today.
• Assumptions on all variables:
  • Word: the basic unit of discrete data
  • Document: a collection of words (exchangeability assumption)
  • Corpus: a collection of documents
  • Topic (hidden): a distribution over words; the number of topics 𝑲 is known.
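The deck does not say how a domain’s fingerprint is compared with the sector fingerprints. One reasonable choice, assumed here, is the Hellinger distance between topic distributions, ranking sectors from nearest to farthest. The sector fingerprints in the usage note are hypothetical.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length (0 = identical)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def nearest_sectors(domain_topics, sector_topics, k=3):
    """Rank NAICS sectors by how close their topic fingerprint is to the domain's."""
    scored = sorted((hellinger(domain_topics, fp), code) for code, fp in sector_topics.items())
    return [code for _, code in scored[:k]]
```

Usage, with made-up 3-topic fingerprints: `nearest_sectors([0.6, 0.3, 0.1], {"51": [0.7, 0.2, 0.1], "54": [0.2, 0.7, 0.1], "72": [0.1, 0.1, 0.8]}, k=1)` picks sector 51. Hellinger distance is bounded in [0, 1], which makes thresholds easier to set than with KL divergence.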
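Mathematical Representation of LDA
Plate diagram: $\alpha$ (Dirichlet parameter) → $\theta_d$ (per-document topic proportions) → $z_{d,n}$ (per-word topic assignment) → $w_{d,n}$ (observed word) ← $\beta_k$ (topics) ← $\eta$ (topic hyperparameter); plates over $N$ words, $D$ documents, and $K$ topics.
$$
\begin{aligned}
&\text{For each topic } k = 1,\dots,K: && \beta_k \sim \mathrm{Dir}(\eta)\\
&\text{For each document } d = 1,\dots,D: && \theta_d \sim \mathrm{Dir}(\alpha)\\
&\quad\text{for each word } n = 1,\dots,N: && z_{d,n} \sim \mathrm{Multi}(\theta_d)\\
& && w_{d,n} \sim \mathrm{Multi}\big(\beta_{z_{d,n}}\big)
\end{aligned}
$$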
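The generative process above can be simulated directly, which is a useful sanity check on the notation. This NumPy sketch follows the steps line by line; the corpus sizes, `alpha=0.5`, `eta=0.1`, the seed, and the fixed document length N are arbitrary illustrative choices.

```python
import numpy as np


def sample_lda_corpus(num_docs, doc_len, num_topics, vocab_size,
                      alpha=0.5, eta=0.1, seed=0):
    """Draw a synthetic corpus by following the LDA generative process step by step."""
    rng = np.random.default_rng(seed)
    # For each topic k: beta_k ~ Dir(eta), a distribution over the vocabulary.
    beta = rng.dirichlet(np.full(vocab_size, eta), size=num_topics)  # shape (K, V)
    thetas, topics, words = [], [], []
    for _ in range(num_docs):
        # For each document d: theta_d ~ Dir(alpha), its topic proportions.
        theta = rng.dirichlet(np.full(num_topics, alpha))
        # For each word position n: z_{d,n} ~ Multi(theta_d) ...
        z = rng.choice(num_topics, size=doc_len, p=theta)
        # ... then w_{d,n} ~ Multi(beta_{z_{d,n}}).
        w = np.array([rng.choice(vocab_size, p=beta[k]) for k in z])
        thetas.append(theta)
        topics.append(z)
        words.append(w)
    return beta, np.array(thetas), np.array(topics), np.array(words)
```

Fitting LDA inverts this process: given only the words `w`, inference recovers estimates of `beta` and each document’s `theta`, and `theta` is the per-domain fingerprint used for matching.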
Modeling Approach
Training: input training Spark DataFrame → preprocessing → feature selection & transformation → ML model
Testing: input testing Spark DataFrame → preprocessing → feature selection & transformation → ML model → score
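The key point of the two parallel paths is that preprocessing and feature transformation are fitted once on the training data and then applied unchanged to the test data. A plain-Python sketch of that fit/transform split, similar in spirit to Spark ML’s `Tokenizer` and `CountVectorizer` stages; the `BagOfWords` class is a hypothetical illustration, not Spark’s API.

```python
import re
from collections import Counter


class BagOfWords:
    """Fit a vocabulary on training text, then apply the same mapping to test text."""

    def __init__(self, min_df=1):
        self.min_df = min_df  # keep words that appear in at least this many documents
        self.vocab = {}

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z]+", text.lower())

    def fit(self, docs):
        doc_freq = Counter()
        for doc in docs:
            doc_freq.update(set(self.tokenize(doc)))
        words = sorted(w for w, c in doc_freq.items() if c >= self.min_df)
        self.vocab = {w: i for i, w in enumerate(words)}
        return self

    def transform(self, docs):
        rows = []
        for doc in docs:
            vec = [0] * len(self.vocab)
            for tok in self.tokenize(doc):
                idx = self.vocab.get(tok)
                if idx is not None:  # words never seen in training are dropped
                    vec[idx] += 1
            rows.append(vec)
        return rows
```

Fitting on the training set only avoids leaking test-set vocabulary into the model, and guarantees the test vectors have the same columns the model was trained on.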
Industry Verticalization pipeline
Input: NAICS keywords ({NAICS}) and website HTML stored in HDFS (hdfs://html)
Preprocessing & NLP: tokenize the NAICS keywords; convert website HTML with html2text, then tokenize
Machine learning: LDA maps industries to topics and websites to topics
Output: websites mapped to industries, passed to analytics
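The deck converts HTML with html2text before tokenizing. As a dependency-free stand-in (an assumption, not the original code), Python’s standard `html.parser` can extract visible text, skipping `<script>` and `<style>`, before a simple regex tokenizer and a small hypothetical stopword list are applied.

```python
import re
from html.parser import HTMLParser

# Hypothetical, deliberately short stopword list.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "with", "your", "our"}


class _TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_tokens(html):
    """Visible text -> lowercase alphabetic tokens, minus stopwords and very short words."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts).lower()
    return [t for t in re.findall(r"[a-z]+", text) if t not in STOPWORDS and len(t) > 2]
```

The resulting token lists play the role of "documents" for LDA, in the same vocabulary space as the tokenized NAICS keywords.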
Sample Results
Industry vertical sample scores for godaddy.com
{'godaddy.com': [0.0, 0.037261519371214455, 0.12163335800668636, 0.12212609505637625, 0.0, 0.0, 0.0391883272323731, 0.006822915544477993,
0.07649356215522182, 0.03418770216006102, 0.08399637531579494, 0.0, 0.044632956375623346, 0.08008591676796364, 0.23397736516285688,
0.023100845006343924, 0.05476212126480185, 0.045210897428489265, 0.01195930071056271, 0.05362539333191509, 0.030935349109237273,
0.03476925258258562, 0.028247390216576837, 0.04214249732233799, 0.11761814220985195, 0.0, 0.03485210250149108, 0.0, 0.036624106810977664,
0.023490911619049086, 0.08161086708509405, 0.09972968094058156, 0.024987767078109006, 0.01669645821975547, 0.027306928087764512, 0.0, 0.0,
0.59144987630468255, 0.01714920401645141, 0.057044712502075504, 0.33934270898168075, 0.026937393520935108]}
Top three industry verticals with probabilities
Row(domainname=u'godaddy.com', 541=u'0.591449876', 551=u'0.339342709', 518=u'0.233977365')
518 - Data processing, hosting, and related services
541 - Professional, scientific, and technical services
551 - Management of companies and enterprises
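Selecting the top three verticals from the score vector is a simple ranking step. The slide does not say which NAICS code each of the 42 positions corresponds to, so `codes` below is a hypothetical input supplied by the caller.

```python
def label_scores(codes, vector):
    """Pair each score in the model's output vector with its NAICS code."""
    if len(codes) != len(vector):
        raise ValueError("codes and vector must have the same length")
    return dict(zip(codes, vector))


def top_verticals(scores, k=3):
    """Return the k highest-scoring (code, score) pairs, best first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
```

Applied to the godaddy.com scores for codes 541, 551, and 518 (0.5914, 0.3393, 0.2340), this ordering reproduces the Row shown above.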
WEB
• For a small business, it’s vital that the website positions the business online as a strong, professional destination that gives customers the impression you mean business and the motivation to engage more with your business.
• We have created a process to analyze a website based on what makes a small business website perfect. This helps visitors connect with the site and easily find its important sections on desktop and mobile.
WEBSITE: website completeness
DESIGN: modern UI/UX features, templates
METADATA: analyze metadata completeness
CODING: coding style & text-to-code ratio
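The text-to-code ratio mentioned under CODING is commonly defined as visible text length divided by total HTML length. A minimal sketch under that definition, using only the standard library; collapsing whitespace and ignoring `<script>`/`<style>` are assumptions about how "text" is counted.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def text_to_code_ratio(html):
    """Visible text length as a percentage of the raw HTML length."""
    if not html:
        return 0.0
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so indentation is not counted as content.
    text = " ".join(" ".join(parser.parts).split())
    return 100.0 * len(text) / len(html)
```

For `<p>hello world</p>`, 11 of 18 characters are visible text, giving a ratio of about 61%.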
ECOM
• For a small online e-commerce business, it’s vital to have a secure, professional website to sell products online.
• Based on prominent e-commerce websites, we identified the important components of a successful e-commerce business and checked for these components on customers’ websites.
• We have created a process to analyze a website for e-commerce components (shopping cart, payment solution, SSL, shipping, etc.).
SECURITY: web security & SSL encryption
CONTACT: contact, chat, email, maps, etc.
ANALYTICS: analytics plugins & campaigns
CART: online shopping & payment solution
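One simple way to check a page for these components is pattern matching on the raw HTML. The signature patterns below are hypothetical examples, not GoDaddy’s list, and treating an `https://` URL as "has SSL" is only a rough proxy (it says nothing about certificate type).

```python
import re

# Hypothetical signature patterns per component (illustration only).
SIGNATURES = {
    "cart": [r"add[\s_-]?to[\s_-]?cart", r"shopping[\s_-]?cart", r"/cart\b"],
    "payment": [r"paypal", r"stripe", r"checkout"],
    "shipping": [r"shipping", r"delivery"],
    "contact": [r"mailto:", r"tel:", r"contact\s+us"],
    "analytics": [r"google-analytics\.com", r"googletagmanager\.com"],
}


def detect_components(url, html):
    """Flag which e-commerce components appear on a page."""
    found = {
        name: any(re.search(pattern, html, re.IGNORECASE) for pattern in patterns)
        for name, patterns in SIGNATURES.items()
    }
    found["ssl"] = url.lower().startswith("https://")  # rough proxy only
    return found
```

Keyword signatures are cheap enough to run across millions of crawled pages; a production check would need a curated, versioned signature list to keep false positives down.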
HOSTING
• Most small businesses end up making the wrong hosting choice at the beginning because they do not properly evaluate their needs.
• We have created a process to analyze hosting based on the website’s content, SEO, content size, speed, and hosting information.
A-RECORDS: hosting information
SPEED & COUNTS: download speed & monthly DNS count
BANDWIDTH: self-identification
CONTENT: type & size of the content
SEO
• Small businesses definitely need an SEO strategy in place if they want to succeed at online marketing.
• Investing in organic SEO is more important for small businesses now than ever before.
• We have created a process that, like an SEO expert, analyzes the keywords in a website’s meta title, meta description, h1 and h2 headings, and image alt tags. This helps search engines connect the website with its keywords, making the page more relevant.
METADATA: metadata information
LINKS: analyze all URLs & links on the website
ROBOTS: check robots.txt & sitemap
WEB: web elements & keywords
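A minimal sketch of the keyword check described above: collect the title, meta description, h1, h2, and image alt text with the standard `html.parser`, then report which of those elements contain a target keyword. The `SeoElements` class and `keyword_coverage` function are hypothetical names, and substring matching is a simplification of real keyword analysis.

```python
from html.parser import HTMLParser


class SeoElements(HTMLParser):
    """Collect the on-page elements an SEO keyword check looks at."""

    TEXT_TAGS = ("title", "h1", "h2")

    def __init__(self):
        super().__init__()
        self.found = {"title": [], "meta_description": [], "h1": [], "h2": [], "img_alt": []}
        self._open = None  # which text tag we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.TEXT_TAGS:
            self._open = tag
            self.found[tag].append("")
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.found["meta_description"].append(attrs.get("content") or "")
        elif tag == "img":
            self.found["img_alt"].append(attrs.get("alt") or "")

    def handle_endtag(self, tag):
        if tag == self._open:
            self._open = None

    def handle_data(self, data):
        if self._open:
            self.found[self._open][-1] += data


def keyword_coverage(html, keyword):
    """Map each element type to whether any instance of it contains the keyword."""
    parser = SeoElements()
    parser.feed(html)
    parser.close()
    kw = keyword.lower()
    return {field: any(kw in value.lower() for value in values)
            for field, values in parser.found.items()}
```

A page for a bakery targeting "bread" would then show at a glance which elements are missing the keyword, which is the kind of actionable gap the dashboard reports.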
ECON
• Small business growth depends heavily on the local economy and the purchasing power of the town.
• The local economy has a big impact on small businesses, and vice versa.
• We have used open online datasets, such as tax, income, and housing prices, to create the ECON index.
INCOME: average family income
CITY DYNAMICS: population, growth, tourism
HOUSING: housing price index
TAX: local small business tax
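The slide does not say how the open datasets are combined into the ECON index. A common pattern, assumed here, is to z-score each indicator across cities and take a weighted sum, with a negative weight for tax. The indicator names and `ECON_WEIGHTS` values are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical indicator weights; tax counts against the index.
ECON_WEIGHTS = {
    "income": 0.35,
    "population_growth": 0.25,
    "housing_index": 0.25,
    "small_business_tax": -0.15,
}


def econ_index(cities):
    """Score each city as a weighted sum of z-scored indicators (higher is better)."""
    stats = {}
    for key in ECON_WEIGHTS:
        values = [city[key] for city in cities.values()]
        # Guard against zero spread so identical values contribute 0, not a division error.
        stats[key] = (mean(values), pstdev(values) or 1.0)
    return {
        name: sum(w * (city[key] - stats[key][0]) / stats[key][1]
                  for key, w in ECON_WEIGHTS.items())
        for name, city in cities.items()
    }
```

Z-scoring puts income in dollars and growth in percent on the same scale before weighting, so no single indicator dominates just because of its units.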
SOCIAL
• Small businesses need a strong online presence to attract customers to local shops or online e-commerce businesses.
• Traffic to the website or local store can be driven by large-scale social media efforts and listings on review websites such as Yelp and Foursquare.
• We use third-party APIs, data providers, and listings to aggregate a score that serves as an index of the business’s popularity.
SOCIAL: social media for small business
LINKS: analyze all URLs & links on the website
LISTINGS: local business listings (e.g., Yelp)
MEDIA: web elements & keywords
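The LINKS and LISTINGS checks can start from the site itself: which social and review platforms does it link to? A standard-library sketch; the `PLATFORMS` table is a hypothetical, non-exhaustive list, and matching on the URL hostname avoids false hits such as "notfacebook.com".

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical platform list: registered domain -> platform name.
PLATFORMS = {
    "facebook.com": "facebook", "twitter.com": "twitter", "instagram.com": "instagram",
    "linkedin.com": "linkedin", "yelp.com": "yelp", "foursquare.com": "foursquare",
}


class _LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def social_presence(html):
    """Return the set of social/listing platforms the page links to."""
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    found = set()
    for href in collector.hrefs:
        host = (urlparse(href).hostname or "").lower()  # relative links have no host
        for domain, name in PLATFORMS.items():
            if host == domain or host.endswith("." + domain):
                found.add(name)
    return found
```

The resulting set could then be combined with follower or review counts from third-party APIs to form the popularity score.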
GoDaddy Business Score
• The GoDaddy Small Business Index (score) is a number derived by applying machine learning to the digital information collected about a business.
• The score measures how well a website is presented, secured, and socialized, and how well it is performing relative to other sites in the same industry segment.
• We use a Bayesian network to select the features and Gaussian constants from the feature database.
Web Features Relevancy by Industry Segments
The Bayesian network diagram shows the relationships between the different components (data) of the Customer Success Dashboard.
For example, SEO, industry, social (Facebook, Twitter, etc.), mobile optimization, and site completeness all have causal effects on the likelihood of website traffic, as expected. The relationship between industry and traffic also becomes clearer.
Business Score
Component weights:
WEB 15%
SOCIAL 10%
ECOM 21%
HOSTING 16%
ECON 18%
SEO 20%
The Business Score is based on the credit scoring system: each component is given a weight based on its importance to the small business.
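With the component weights read from the slide’s chart (they sum to 100%), the combined score is a weighted sum. This sketch assumes each component score is already on a 0 to 100 scale; the slide does not state the scale.

```python
# Component weights as read from the slide's chart.
COMPONENT_WEIGHTS = {"WEB": 0.15, "SOCIAL": 0.10, "ECOM": 0.21,
                     "HOSTING": 0.16, "ECON": 0.18, "SEO": 0.20}


def business_score(components):
    """Combine per-component scores (assumed 0-100 each) into one weighted score."""
    missing = set(COMPONENT_WEIGHTS) - set(components)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(COMPONENT_WEIGHTS[name] * components[name] for name in COMPONENT_WEIGHTS)
```

For example, a site scoring 100 on every component except SEO at 50 loses 0.20 × 50 = 10 points, for a Business Score of 90.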
GoDaddy Small Business Index
The GoDaddy Business Score/Index is similar to a credit scoring system, applied to digital presence.
Each digital component is given a weight.
We analyze and rescale the score by comparing the website’s features with those of the top 100 domains in each industry segment.
GD Business Score Model: based on a Gaussian formula
Top-performing domains: identified in each vertical using a Bayesian network over the feature data
FeatureDB: a database of 120 features created by Spark ETL data pipelines
Sample output:
{'domainname': 'godaddy.com', '518': '98.8', '541': '93.2', '551': '97.3'}
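The slide says only that the model is "based on a Gaussian formula" and rescales against the top 100 domains in each segment. One plausible reading, and it is purely an assumption, is to z-score a domain’s raw score against that benchmark set and map it through the normal CDF onto 0 to 100, so a domain equal to the benchmark mean scores 50.

```python
import math
from statistics import mean, stdev


def gaussian_rescale(value, benchmark):
    """Map a raw score onto 0-100 via the normal CDF of its z-score against a benchmark.

    `benchmark` would hold raw scores of the top domains in the same vertical.
    """
    mu, sigma = mean(benchmark), stdev(benchmark)  # stdev needs at least 2 values
    if sigma == 0:
        # Degenerate benchmark: everything above/below the common value is clamped.
        return 50.0 if value == mu else (100.0 if value > mu else 0.0)
    z = (value - mu) / sigma
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
```

A domain two standard deviations above its vertical’s benchmark would map to roughly 97.7, in the same range as the per-vertical values in the sample output.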
Marketing Use Cases
Mobile-Friendly Websites
• Let’s make sure GoDaddy customers’ websites show up in search results.
• Detect whether a website is configured for multiple devices, and help search engines understand the small business website so that it shows up in mobile searches.
• We have designed an algorithm to detect responsive web design, which is the design pattern recommended by major search engines.
• The algorithm also detects mobile-optimized websites.
• The algorithm was validated against Google’s Mobile-Friendly tool on a 0.2% sample, which showed 94% accuracy.
• We send notifications to shoppers and advise them based on the mobile-friendliness of their websites.
• GoDaddy also helps shoppers make their websites mobile-friendly on all devices.
MISSION: increase mobile-device-based SEO
MODEL: text analytics on raw HTML data
REVENUE: help small businesses
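The detection algorithm itself is not described. A common heuristic for responsive design, assumed here, is a viewport meta tag with `width=device-width` plus at least one width-based CSS media query, found either in the page or in stylesheet text the caller fetched separately.

```python
import re

VIEWPORT_META = re.compile(r'<meta[^>]+name=["\']viewport["\'][^>]*>', re.IGNORECASE)
WIDTH_DEVICE = re.compile(r"width\s*=\s*device-width", re.IGNORECASE)
MEDIA_QUERY = re.compile(r"@media[^{]*\((?:max|min)-width", re.IGNORECASE)


def looks_responsive(html, css=""):
    """Heuristic responsive-design check.

    True when the page declares a device-width viewport AND the inline HTML or the
    supplied stylesheet text contains a min-width/max-width media query.
    """
    meta = VIEWPORT_META.search(html)
    has_viewport = bool(meta and WIDTH_DEVICE.search(meta.group(0)))
    has_media_query = bool(MEDIA_QUERY.search(html) or MEDIA_QUERY.search(css))
    return has_viewport and has_media_query
```

Requiring both signals keeps false positives down: many templates include a viewport tag without actually adapting their layout, and media queries without a device-width viewport are ignored by most mobile browsers.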
Cart + Payment - SSL
• Let’s make sure GoDaddy customers’ online small businesses accept credit cards safely.
• Detect whether a website has a shopping cart, a payment solution, and SSL.
• We have designed an algorithm to detect SSL and its type, the shopping cart, and the payment solution provider.
• The algorithm also detects the SSL encryption type.
• We send advice to shoppers based on the features detected on their websites.
• GoDaddy also helps shoppers make their websites more secure and shopper-friendly for accepting online transactions.
MISSION: secure the transactions
MODEL: text analytics on raw HTML data
REVENUE: help e-commerce small businesses
The goal of GoDaddy’s Business Score is to help established small businesses get to the next level by increasing their online presence.
GoDaddy Customer Success Dashboard Using Apache Spark with Baburao Kamble

More Related Content

PDF
RubiOne: Apache Spark as the Backbone of a Retail Analytics Development Envir...
PDF
Databricks delta
PPTX
Data Lakehouse Symposium | Day 1 | Part 1
PDF
Maximize the Value of Your Data: Neo4j Graph Data Platform
PDF
Organising the Data Lake - Information Management in a Big Data World
PDF
Why Data Virtualization Matters in Your Portfolio
PDF
Three Dimensions of Data as a Service
PDF
Demystifying Data Virtualization: Why it’s Now Critical for Your Data Strategy
RubiOne: Apache Spark as the Backbone of a Retail Analytics Development Envir...
Databricks delta
Data Lakehouse Symposium | Day 1 | Part 1
Maximize the Value of Your Data: Neo4j Graph Data Platform
Organising the Data Lake - Information Management in a Big Data World
Why Data Virtualization Matters in Your Portfolio
Three Dimensions of Data as a Service
Demystifying Data Virtualization: Why it’s Now Critical for Your Data Strategy

What's hot (14)

PDF
Introduction to Neo4j
PDF
A Key to Real-time Insights in a Post-COVID World (ASEAN)
PPTX
Open Source in the Energy Industry - Creating a New Operational Model for Dat...
PDF
Smart data for a predictive bank
PDF
Best Practices in the Cloud for Data Management (US)
PDF
[XConf Brasil 2020] Data mesh
PDF
Partner Keynote: How Logical Data Fabric Knits Together Data Visualization wi...
PDF
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
PDF
Advanced Analytics and Machine Learning with Data Virtualization
PDF
Idera live 2021: Why Data Lakes are Critical for AI, ML, and IoT By Brian Flug
PDF
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...
PDF
Transforming GE Healthcare with Data Platform Strategy
PPTX
Big Data Application Architectures - IoT
PPTX
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
Introduction to Neo4j
A Key to Real-time Insights in a Post-COVID World (ASEAN)
Open Source in the Energy Industry - Creating a New Operational Model for Dat...
Smart data for a predictive bank
Best Practices in the Cloud for Data Management (US)
[XConf Brasil 2020] Data mesh
Partner Keynote: How Logical Data Fabric Knits Together Data Visualization wi...
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
Advanced Analytics and Machine Learning with Data Virtualization
Idera live 2021: Why Data Lakes are Critical for AI, ML, and IoT By Brian Flug
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...
Transforming GE Healthcare with Data Platform Strategy
Big Data Application Architectures - IoT
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
Ad

Similar to GoDaddy Customer Success Dashboard Using Apache Spark with Baburao Kamble (20)

PDF
Personalization Strategies Leveraging a Data Management Platform - with Bank ...
PDF
CIO priorities and Data Virtualization: Balancing the Yin and Yang of the IT
PPTX
WPEngine Summit 2019
PDF
[Notes] Customer 360 Analytics with LEO CDP
PDF
Hadoop’s Impact on Recruit Company
PDF
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
PPTX
Allan Cook (Deloitte Consulting): How Do I Sell a “First” VR/AR Project to My...
PPTX
Enterprise Cloud Adoption
PDF
MongoDB .local Chicago 2019: MongoDB – Powering the new age data demands
PDF
Achieving Business Value by Fusing Hadoop and Corporate Data
DOCX
Rajasekhar_resume
PDF
An Innovative Big-Data Web Scraping Tech Company
PPT
Making Hadoop Ready for the Enterprise
PDF
Run your in-house AI chatbot on an AMD EPYC 9534 processor-powered Dell Power...
PDF
WEBINAR – DAM 2020 Report & Analysis along side the user perspective
PDF
An Innovative Big-Data Web Scraping Tech Company
PDF
Microsoft Dynamics 365 xRM4Legal xRM4Accounting Technical Overview
PPTX
Operational Analytics Using Spark and NoSQL Data Stores
PDF
Deploying Enterprise Scale Deep Learning in Actuarial Modeling at Nationwide
PDF
Digital Reinvention by NRB
Personalization Strategies Leveraging a Data Management Platform - with Bank ...
CIO priorities and Data Virtualization: Balancing the Yin and Yang of the IT
WPEngine Summit 2019
[Notes] Customer 360 Analytics with LEO CDP
Hadoop’s Impact on Recruit Company
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
Allan Cook (Deloitte Consulting): How Do I Sell a “First” VR/AR Project to My...
Enterprise Cloud Adoption
MongoDB .local Chicago 2019: MongoDB – Powering the new age data demands
Achieving Business Value by Fusing Hadoop and Corporate Data
Rajasekhar_resume
An Innovative Big-Data Web Scraping Tech Company
Making Hadoop Ready for the Enterprise
Run your in-house AI chatbot on an AMD EPYC 9534 processor-powered Dell Power...
WEBINAR – DAM 2020 Report & Analysis along side the user perspective
An Innovative Big-Data Web Scraping Tech Company
Microsoft Dynamics 365 xRM4Legal xRM4Accounting Technical Overview
Operational Analytics Using Spark and NoSQL Data Stores
Deploying Enterprise Scale Deep Learning in Actuarial Modeling at Nationwide
Digital Reinvention by NRB
Ad

More from Databricks (20)

PPTX
DW Migration Webinar-March 2022.pptx
PPT
Data Lakehouse Symposium | Day 1 | Part 2
PPTX
Data Lakehouse Symposium | Day 2
PPTX
Data Lakehouse Symposium | Day 4
PDF
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
PDF
Democratizing Data Quality Through a Centralized Platform
PDF
Learn to Use Databricks for Data Science
PDF
Why APM Is Not the Same As ML Monitoring
PDF
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
PDF
Stage Level Scheduling Improving Big Data and AI Integration
PDF
Simplify Data Conversion from Spark to TensorFlow and PyTorch
PDF
Scaling your Data Pipelines with Apache Spark on Kubernetes
PDF
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
PDF
Sawtooth Windows for Feature Aggregations
PDF
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
PDF
Re-imagine Data Monitoring with whylogs and Spark
PDF
Raven: End-to-end Optimization of ML Prediction Queries
PDF
Processing Large Datasets for ADAS Applications using Apache Spark
PDF
Massive Data Processing in Adobe Using Delta Lake
PDF
Machine Learning CI/CD for Email Attack Detection
DW Migration Webinar-March 2022.pptx
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 4
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Democratizing Data Quality Through a Centralized Platform
Learn to Use Databricks for Data Science
Why APM Is Not the Same As ML Monitoring
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Stage Level Scheduling Improving Big Data and AI Integration
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Sawtooth Windows for Feature Aggregations
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Re-imagine Data Monitoring with whylogs and Spark
Raven: End-to-end Optimization of ML Prediction Queries
Processing Large Datasets for ADAS Applications using Apache Spark
Massive Data Processing in Adobe Using Delta Lake
Machine Learning CI/CD for Email Attack Detection

Recently uploaded (20)

PPTX
CEE 2 REPORT G7.pptxbdbshjdgsgjgsjfiuhsd
PPTX
advance b rammar.pptxfdgdfgdfsgdfgsdgfdfgdfgsdfgdfgdfg
PDF
168300704-gasification-ppt.pdfhghhhsjsjhsuxush
PPT
Chapter 2 METAL FORMINGhhhhhhhjjjjmmmmmmmmm
PPTX
mbdjdhjjodule 5-1 rhfhhfjtjjhafbrhfnfbbfnb
PPTX
Global journeys: estimating international migration
PPTX
1_Introduction to advance data techniques.pptx
PPTX
Introduction-to-Cloud-ComputingFinal.pptx
PDF
BF and FI - Blockchain, fintech and Financial Innovation Lesson 2.pdf
PPTX
Introduction to Firewall Analytics - Interfirewall and Transfirewall.pptx
PPTX
Moving the Public Sector (Government) to a Digital Adoption
PDF
Foundation of Data Science unit number two notes
PPTX
Introduction to Basics of Ethical Hacking and Penetration Testing -Unit No. 1...
PPTX
IBA_Chapter_11_Slides_Final_Accessible.pptx
PPTX
The THESIS FINAL-DEFENSE-PRESENTATION.pptx
PPT
Chapter 3 METAL JOINING.pptnnnnnnnnnnnnn
PDF
Mega Projects Data Mega Projects Data
PPTX
IB Computer Science - Internal Assessment.pptx
PPTX
Acceptance and paychological effects of mandatory extra coach I classes.pptx
PPTX
05. PRACTICAL GUIDE TO MICROSOFT EXCEL.pptx
CEE 2 REPORT G7.pptxbdbshjdgsgjgsjfiuhsd
advance b rammar.pptxfdgdfgdfsgdfgsdgfdfgdfgsdfgdfgdfg
168300704-gasification-ppt.pdfhghhhsjsjhsuxush
Chapter 2 METAL FORMINGhhhhhhhjjjjmmmmmmmmm
mbdjdhjjodule 5-1 rhfhhfjtjjhafbrhfnfbbfnb
Global journeys: estimating international migration
1_Introduction to advance data techniques.pptx
Introduction-to-Cloud-ComputingFinal.pptx
BF and FI - Blockchain, fintech and Financial Innovation Lesson 2.pdf
Introduction to Firewall Analytics - Interfirewall and Transfirewall.pptx
Moving the Public Sector (Government) to a Digital Adoption
Foundation of Data Science unit number two notes
Introduction to Basics of Ethical Hacking and Penetration Testing -Unit No. 1...
IBA_Chapter_11_Slides_Final_Accessible.pptx
The THESIS FINAL-DEFENSE-PRESENTATION.pptx
Chapter 3 METAL JOINING.pptnnnnnnnnnnnnn
Mega Projects Data Mega Projects Data
IB Computer Science - Internal Assessment.pptx
Acceptance and paychological effects of mandatory extra coach I classes.pptx
05. PRACTICAL GUIDE TO MICROSOFT EXCEL.pptx

GoDaddy Customer Success Dashboard Using Apache Spark with Baburao Kamble

  • 1. Copyright© 2017GoDaddyInc. AllRights Reserved. Customer Success Dashboard GoDaddy Business Score Baburao Kamble; Ryan Kleck; Robert Brown
  • 2. Copyright© 2017GoDaddyInc. AllRights Reserved. Agenda 2 GoDaddy’s Mission for Small Business Success Customer Success Dashboard Machine Learning on 64M Websites Content using Spark 2.1 Business Score Marketing Use Case
  • 3. See the Internet like never before. hello. Everything you need to know to get your business online.
  • 7. Copyright© 2017GoDaddyInc. AllRights Reserved. Customer Success Dashboard Develop the Content Analytics Platform (online and offline) for the GoDaddy’s Customers to Understand Digital Presence of the Small Business in a particular industry segment.
  • 8. Copyright© 2017GoDaddyInc. AllRights Reserved. HTML Machine Logs Security Logs Internal Data Open Source Data TE L Communication Score Engine Industry Verticals Traffic Forecast (global,local) EconomicFootprints Business Score GD SBI Go Initiative Marketing Ad hoc Analytics User Interface FeatureDB Customer Success Dashboard
  • 9. Copyright© 2017GoDaddyInc. AllRights Reserved. Components 9 Using Customer’s website content, develop a Customer Success Dashboard from digital presence & local econometrics in the city/neighborhood where business is situated . • Customer Success Dashboard will have all the analytics of the website done in real-time and batch according to the use case. • Batch Processing: We are using GD data crawler as primary source of the data third party social & business API’s. Use case: Customers Online Presence Index as Business Score. • Real-time Processing: Scrape the content of the website and plug DNS address for various internal tools to create features in real-time. Use case: What you should add in your website to make small business successful. SEO site content & structure ECOM small business websites SOCIAL social media coverage ECON local economy index WEB website perfection HOSTING faster traffic for your contents ECOM small business websites INDUSTRY industry profiles by objectives
  • 10. Copyright© 2017GoDaddyInc. AllRights Reserved. Industry Segments/Profiles 10 Using Customer’s website content, develop a Customer Success Dashboard from digital presence & local econometrics in the city/neighborhood where business is situated . • Customer Success Dashboard will have all the analytics of the website done in real-time and batch according to the use case. • Batch Processing: We are using GD data crawler as primary source of the data third party social & business API’s. Use case: Customers Online Presence Index as Business Score. • Real-time Processing: Scrape the content of the website and plug DNS address for various internal tools to create features in real-time. Use case: What you should add in your website to make small business successful. TLDs nlp on TLDs keywords PARTNERSHIPs sic industry codes SURVEY self identification CONTENT naics classification
  • 11. Copyright© 2017GoDaddyInc. AllRights Reserved. TLD’s Verticalization 11 Business Need • Know the domains and purpose of the domain in detail by natural language processing (NLP). • Are they personal/blog/portfolio or are they business/LLC/consultancy. • Classify domains into 6 classes identified by modelers and marketers. • Enable domain specific solution to enable hidden business. Model • We have used internal tokenizer to tokenize domains • These tokens feed to feature generation model which detect language, human names, & geography. • Machine Learning model is trained on the features and keywords to classify the tld’s into 6 classes. • In 2nd iteration, business keyword rule model is designed to overwrite some misclassified data.
  • 12. Copyright© 2017GoDaddyInc. AllRights Reserved. TLD’s Verticalization 12 containsenglish domainname hasname isallenglish keywords language pos wordcount TRUE kidsmedicinestickers.com FALSE TRUE kids medicine stickers english N|V-N|V-N 3 TRUE actionwindowtint.com FALSE TRUE action window tint english N|V-N-N|V 3 TRUE aubihardware.com TRUE FALSE aubi hardware english N-N 2 TRUE michiganfellowship.com TRUE TRUE michigan fellowship english N-N 2 FALSE devinlamontagne.com TRUE FALSE devin lamontagne unknown V-N 2 We use Python internal tokenize and NLTK to generate the features. Above features and keywords have used to classify domains in 6 classes based on the topics modeling some of the classes are overwritten by business rules. domainname Segment kidsmedicinestickers.com Business actionwindowtint.com Business aubihardware.com Business michiganfellowship.com Government devinlamontagne.com Personal
  • 13. Copyright© 2017GoDaddyInc. AllRights Reserved. Industry Verticalization 13 Business Need • Identify the North American Industry Classification System (NAICS) based industry verticals for the GoDaddy domains (64M) based on the content analysis and domain names. • Rank each domain into industry vertical based on probabilities. • Enable business specific marketing solution. NAICS • A numerical, hierarchical classification system that groups all business establishments into industries based on production process. • The foundation on which economic census data is collected, tabulated, analyzed, and disseminated. • A common language used among all North America.
  • 14. Copyright© 2017GoDaddyInc. AllRights Reserved. LDA: Latent Dirichlet Allocation 14 • We have developed training data using business keywords, NAICS description and keywords from the US Census data. • We used a technique called Latent Dirichlet Allocation to classify the fingerprint of the “unknown” domain to the lineup of known NAICS Sector fingerprints. • LDAis a generative statistical model that allows sets of observations in a linguistic corpus to be explained by unobserved groups. • Among these algorithms, Latent DirichletAllocation (LDA), technique based in Bayesian Modeling, is the most commonly used techniques for nowadays. • Assumptions on all variables: • Word: the basic unit of discrete data • Document: a collection of words (exchangeability assumption) • Corpus: a collection of documents • Topic (hidden): a distribution over words & the number of topics 𝑲 is known. LDA
  • 15. Copyright© 2017GoDaddyInc. AllRights Reserved. 𝛼 𝜃% 𝑍%,( 𝑊%,( 𝛽+ 𝜂 𝑁 𝐷 𝑘 Dirichlet parameter Per document topic proportion Per word topic assignment Observed word Topics Topics hyperparameter For each topic 𝑘,draw a multinomial over words 𝛽+~𝐷𝑖𝑟 𝜂 For each document 𝑑, • Draw a document topic proportion 𝜽%~𝐷𝑖𝑟 𝛼 • For each word 𝑤%,(: • Draw a topic 𝑧%,(~𝑀𝑢𝑙𝑡𝑖 𝜽% • Draw a word 𝑤%,(~𝑀𝑢𝑙𝑡𝑖(𝛽<=,> ) Mathematical Representation of LDA
  • 16. Modeling Approach • Training: input training Spark DataFrame → preprocessing → feature selection & transformation → ML model. • Testing: input testing Spark DataFrame → the same preprocessing and feature selection & transformation → ML model → score.
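The key property of this two-branch flow is that preprocessing and feature transformation are fitted on the training data only and then reused unchanged on the test data. The following plain-Python sketch shows that contract. It is loosely modeled on the fit/transform pattern of Spark ML estimators such as `CountVectorizer` but does not use Spark. The class name `BagOfWords` and its `min_df` threshold are assumptions.

```python
from collections import Counter

class BagOfWords:
    """Fit a vocabulary on training documents, then vectorize any document with it."""

    def __init__(self, min_df=1):
        self.min_df = min_df      # keep words appearing in at least min_df training docs
        self.vocab = {}

    def fit(self, docs):
        df = Counter(w for doc in docs for w in set(doc))   # document frequency
        kept = sorted(w for w, n in df.items() if n >= self.min_df)
        self.vocab = {w: i for i, w in enumerate(kept)}
        return self

    def transform(self, docs):
        vectors = []
        for doc in docs:
            vec = [0] * len(self.vocab)
            for w in doc:
                if w in self.vocab:       # words unseen in training are dropped
                    vec[self.vocab[w]] += 1
            vectors.append(vec)
        return vectors
```

Fitting on the test set as well would leak information and give the two branches different feature spaces, which is why `transform` never changes `vocab`.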
  • 17. Industry Verticalization Pipeline • Input: NAICS keywords ({NAICS}) and crawled HTML (hdfs://html). • Preprocessing & NLP: tokenize NAICS keywords; convert HTML with html2text and tokenize. • Machine learning (LDA): industries to topics; websites to topics; websites to industries. • Output: results to analytics.
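The preprocessing step turns raw HTML into tokens. The deck uses html2text for this. As a dependency-free stand-in, the sketch below uses the standard library's `html.parser.HTMLParser` to collect visible text, skipping `<script>` and `<style>`, and then lowercases the text and splits it into alphabetic tokens. The stop-word list and the names `TextExtractor` and `tokenize_html` are illustrative assumptions.

```python
import re
from html.parser import HTMLParser

# Hypothetical, minimal stop-word list.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "on", "with", "your", "we"}

class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0            # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def tokenize_html(html: str) -> list:
    """Return lowercase alphabetic tokens of the page's visible text, minus stop words."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts).lower()
    return [t for t in re.findall(r"[a-z]+", text) if t not in STOP_WORDS]
```

The same tokenizer can be applied to NAICS keyword descriptions, which puts industries and websites in one shared vocabulary for LDA.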
  • 18. Sample Results • Sample industry-vertical scores for godaddy.com: {'godaddy.com': [0.0, 0.037261519371214455, 0.12163335800668636, 0.12212609505637625, 0.0, 0.0, 0.0391883272323731, 0.006822915544477993, 0.07649356215522182, 0.03418770216006102, 0.08399637531579494, 0.0, 0.044632956375623346, 0.08008591676796364, 0.23397736516285688, 0.023100845006343924, 0.05476212126480185, 0.045210897428489265, 0.01195930071056271, 0.05362539333191509, 0.030935349109237273, 0.03476925258258562, 0.028247390216576837, 0.04214249732233799, 0.11761814220985195, 0.0, 0.03485210250149108, 0.0, 0.036624106810977664, 0.023490911619049086, 0.08161086708509405, 0.09972968094058156, 0.024987767078109006, 0.01669645821975547, 0.027306928087764512, 0.0, 0.0, 0.59144987630468255, 0.01714920401645141, 0.057044712502075504, 0.33934270898168075, 0.026937393520935108]} • Top three industry verticals with probabilities: Row(domainname=u'godaddy.com', 541=u' 0.591449876', 551=u' 0.339342709', 518=u'0.233977365') • 518 - Data Processing, Hosting, and Related Services • 541 - Professional, Scientific, and Technical Services • 551 - Management of Companies and Enterprises
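The scores above are not a single probability distribution, since they sum to more than 1, so each entry reads as a per-vertical score. The deck does not give the formula. One plausible approach, offered only as an assumption, is to compare a website's LDA topic distribution with each NAICS vertical's topic distribution and keep the best three. The sketch uses cosine similarity. `top_verticals` and the toy topic vectors are hypothetical.

```python
import heapq
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top_verticals(site_topics, vertical_topics, k=3):
    """Return the k (naics_code, score) pairs most similar to the site's topic mix."""
    scores = {code: cosine(site_topics, topics) for code, topics in vertical_topics.items()}
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])
```

Cosine similarity scores each vertical independently, which matches the observation that the per-vertical scores need not sum to 1.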
  • 19. WEB • For a small business, it's vital to position the business online with a strong, professional destination that gives customers the impression you mean business and the motivation to engage further with the business. • We created a process to analyze each website against what makes a small business website effective. This helps visitors connect with the business and find the important sections of the website easily on desktop and mobile. • WEBSITE: website completeness. • DESIGN: modern UI/UX features and templates. • METADATA: metadata completeness. • CODING: coding style and text-to-code ratio.
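The CODING check mentions a text-to-code ratio. The deck does not define it, so this sketch assumes a common definition: the length of the page's visible text divided by the length of the full HTML source. The sketch strips markup with regular expressions, which is rough but adequate for a ratio. `text_to_code_ratio` is a hypothetical name. Collapsing whitespace before counting is a design choice.

```python
import html as htmllib
import re

def text_to_code_ratio(page: str) -> float:
    """Share of the HTML source that is visible text (rough, regex-based)."""
    if not page:
        return 0.0
    # Drop <script>/<style> blocks entirely, then every remaining tag.
    stripped = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", page)
    stripped = re.sub(r"(?s)<[^>]*>", " ", stripped)
    # Decode entities such as &amp; and collapse runs of whitespace.
    text = " ".join(htmllib.unescape(stripped).split())
    return len(text) / len(page)
```

A very low ratio usually signals a page dominated by markup and scripts with little indexable content.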
  • 20. ECOM • For a small online ecommerce business, it's vital to have a secure, professional website to sell products online. • Based on prominent ecommerce websites, we identified the important components of a successful ecommerce business and checked each website for these components. • We created a process to analyze websites for ecommerce components (shopping cart, payment solution, SSL, shipping, etc.). • SECURITY: web security and SSL encryption. • CONTACT: contact, chat, email, maps, etc. • ANALYTICS: analytics plugins and campaigns. • CART: online shopping and payment solution.
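Here is a minimal sketch of such a component check. It assumes simple pattern matching on the page URL and raw HTML. The signature patterns are illustrative guesses, not the deck's rules, and `ECOM_SIGNATURES` and `ecom_components` are hypothetical names. A production detector would need much richer signatures.

```python
import re

# Hypothetical, deliberately small signature lists for each ECOM component.
ECOM_SIGNATURES = {
    "CONTACT": [r"mailto:", r"tel:", r"maps\.google\.", r"contact\s*us"],
    "ANALYTICS": [r"google-analytics\.com", r"googletagmanager\.com"],
    "CART": [r"add[\s_-]*to[\s_-]*cart", r"checkout", r"shopping[\s_-]*cart"],
}

def ecom_components(url: str, page: str) -> dict:
    """Report which ECOM components appear to be present on a page."""
    found = {name: any(re.search(p, page, re.IGNORECASE) for p in patterns)
             for name, patterns in ECOM_SIGNATURES.items()}
    # SECURITY here only checks that the page itself was served over HTTPS.
    found["SECURITY"] = url.lower().startswith("https://")
    return found
```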
  • 21. HOSTING • Most small businesses end up making the wrong hosting choice at the start because they do not properly evaluate their needs. • We created a process to analyze hosting based on the website's content, SEO, content size, speed, and hosting information. • A-RECORDS: hosting information. • SPEED & COUNTS: download speed and DNS count per month. • BANDWIDTH: self-identification. • CONTENT: type and size of the content.
  • 22. SEO • A small business needs an SEO strategy in place if it wants to succeed at online marketing. • Investing in organic SEO is more important for small businesses now than ever before. • Like an SEO expert, our process analyzes the keywords in a website's meta title, meta description, h1 and h2 headings, and image alt text. This helps search engines connect the website with its keywords, making the page more relevant. • METADATA: metadata information. • LINKS: analysis of all URLs and links on the website. • ROBOTS: robots.txt and sitemap check. • WEB: web elements and keywords.
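Here is a minimal sketch of that on-page keyword check. It assumes a single target keyword and plain case-insensitive substring matching. `OnPageSEO` and `keyword_coverage` are hypothetical names. Real SEO tools weight these elements rather than simply flagging them.

```python
from html.parser import HTMLParser

class OnPageSEO(HTMLParser):
    """Collect the <title>, meta description, h1/h2 text, and <img> alt text."""

    def __init__(self):
        super().__init__()
        self.fields = {"title": "", "meta_description": "", "h1": "", "h2": "", "img_alt": ""}
        self._current = None      # which text element we are inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("title", "h1", "h2"):
            self._current = tag
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.fields["meta_description"] += attrs.get("content") or ""
        elif tag == "img":
            self.fields["img_alt"] += " " + (attrs.get("alt") or "")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current:
            self.fields[self._current] += data

def keyword_coverage(page: str, keyword: str) -> dict:
    """For each on-page element, report whether it contains the keyword."""
    parser = OnPageSEO()
    parser.feed(page)
    parser.close()
    kw = keyword.lower()
    return {name: kw in text.lower() for name, text in parser.fields.items()}
```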
  • 23. ECON • A small business's growth depends heavily on the local economy and the purchasing power of its town. • The local economy has a big impact on small businesses, and vice versa. • We used open online datasets such as tax, income, and housing prices to create the ECON index. • INCOME: average family income. • CITY DYNAMICS: population, growth, tourism. • HOUSING: housing price index. • TAX: local small business tax.
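The deck does not say how these indicators are combined. One simple possibility, shown only as an assumption, is to min-max normalize each indicator across cities and average the results. Indicators where a higher value is worse for the business, such as tax, are inverted. The indicator names, equal weights, and `econ_index` are hypothetical.

```python
def econ_index(cities: dict, invert=("tax",)) -> dict:
    """Equal-weight average of min-max normalized indicators, scaled to 0-100.

    cities maps a city name to {indicator: value}; every city must have the same keys.
    Indicators listed in `invert` count against the score (higher raw value, lower score).
    """
    indicators = next(iter(cities.values())).keys()
    normalized = {city: [] for city in cities}
    for ind in indicators:
        values = [row[ind] for row in cities.values()]
        lo, hi = min(values), max(values)
        for city, row in cities.items():
            # Map to [0, 1]; if every city has the same value, use the midpoint.
            x = (row[ind] - lo) / (hi - lo) if hi > lo else 0.5
            normalized[city].append(1 - x if ind in invert else x)
    return {city: 100 * sum(xs) / len(xs) for city, xs in normalized.items()}
```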
  • 24. SOCIAL • Small businesses need a strong online presence to attract customers to their local shops or online ecommerce businesses. • Traffic to a website or local store can be driven by large-scale social media efforts and by listings on review websites such as Yelp and Foursquare. • We use third-party APIs, data providers, and listings to aggregate a score that serves as an index of the business's popularity. • SOCIAL: social media for small business. • LINKS: analysis of all URLs and links on the website. • LISTINGS: local business listings (e.g. Yelp). • MEDIA: web elements and keywords.
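Popularity counts from different sources, such as followers, reviews, and check-ins, can span several orders of magnitude. One reasonable way to aggregate them, assumed here rather than taken from the deck, is to log-scale each count against a reference maximum before averaging. The source names, the positive reference maxima, and `popularity_index` are hypothetical.

```python
import math

def popularity_index(counts: dict, reference_max: dict) -> float:
    """Average of log-scaled counts, each capped at 1.0, scaled to 0-100.

    counts and reference_max map a source name (e.g. "followers") to a count;
    reference maxima must be positive, and a source missing from counts contributes 0.
    """
    parts = []
    for source, ref in reference_max.items():
        c = counts.get(source, 0)
        parts.append(min(1.0, math.log1p(c) / math.log1p(ref)))
    return 100 * sum(parts) / len(parts)
```

Log scaling keeps one very large count, such as a viral post's follower spike, from dominating the index.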
  • 25. GoDaddy Business Score • The GoDaddy Small Business Index (score) is a number produced by a machine learning process applied to a business's digital information. • The score measures how well a website is presented, secured, and socialized, and how well it is doing relative to other sites in the same industry segment. • We use a Bayesian network to select the features and Gaussian constants from the feature database.
  • 26. Web Features Relevancy by Industry Segments • The Bayesian network diagram shows the relationships between the different components (data) of the Customer Success Dashboard. For example, SEO, industry, social (Facebook, Twitter, etc.), mobile optimization, and site completeness all have causal effects on the likelihood of website traffic, as expected. The relationship between industry and traffic also becomes clearer.
  • 27. Business Score • The Business Score is based on a credit-scoring approach: each component is given a weight according to its importance to small businesses. • Component weights: WEB 15%, SOCIAL 10%, ECOM 21%, HOSTING 16%, ECON 18%, SEO 20% (total 100%).
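With those weights, the overall score is a weighted average of the six component scores. This minimal sketch assumes each component score is already on a 0-100 scale, which the deck does not state. `WEIGHTS` and `business_score` are hypothetical names.

```python
WEIGHTS = {"WEB": 0.15, "SOCIAL": 0.10, "ECOM": 0.21, "HOSTING": 0.16, "ECON": 0.18, "SEO": 0.20}

def business_score(components: dict) -> float:
    """Weighted sum of component scores (each assumed to be on a 0-100 scale)."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(w * components[name] for name, w in WEIGHTS.items())
```

For example, component scores of WEB 80, SOCIAL 50, ECOM 70, HOSTING 90, ECON 60, and SEO 40 give 12 + 5 + 14.7 + 14.4 + 10.8 + 8 = 64.9.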
  • 28. GoDaddy Small Business Index • The GoDaddy Business Score/Index is similar to a credit-scoring system, but for digital presence: each digital component is given a weight. We analyze the website's features and rescale its score against the top 100 domains in each industry segment. • GD Business Score model: based on a Gaussian formula. • Top-performing domains in each vertical are selected using the Bayesian network over the feature data. • FeatureDB: a database of 120 features created by Spark ETL data pipelines. • Example output: {'domainname': 'godaddy.com', '518': '98.8', '541': '93.2', '551': '97.3'}
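The deck says only that the rescaling against the top 100 domains uses a Gaussian formula. One interpretation, offered strictly as an assumption, is to standardize a site's raw score against the mean and standard deviation of the top domains in its vertical. The resulting z-score is then mapped through the standard normal CDF to a 0-100 scale. `gaussian_rescale` is a hypothetical name.

```python
import math
import statistics

def gaussian_rescale(raw: float, top_scores: list) -> float:
    """Map a raw score to 0-100 via a normal CDF fitted to the top domains' scores."""
    mu = statistics.mean(top_scores)
    sigma = statistics.pstdev(top_scores)
    if sigma == 0:
        # Degenerate reference set: place the site at, above, or below the single value.
        return 50.0 if raw == mu else (100.0 if raw > mu else 0.0)
    z = (raw - mu) / sigma
    # Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))
```

Under this reading, a site scoring exactly at the top-100 mean rescales to 50, and scores near 100 (like the 98.8 for vertical 518 above) mean the site is well above that reference group.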
  • 29. Marketing Use Cases
  • 30. Mobile-Friendly Websites • Let's make sure GoDaddy customers' websites show up in search results. • Detect whether a website is configured for multiple devices, and help search engines understand the small business website so that it appears in mobile searches. • We designed an algorithm to detect responsive web design, the design pattern recommended by major search engines. • The algorithm also detects mobile-optimized websites. • The algorithm was validated against Google's Mobile-Friendly tool on a 0.2% sample, which showed 94% accuracy. • We send notifications to shoppers and advise them based on the mobile-friendliness of their websites. • GoDaddy also helps shoppers make their websites mobile-friendly on all devices. • MISSION / MODEL / REVENUE: Increase mobile-device-based SEO; text analytics on raw HTML data; help small businesses.
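A common signal of responsive design is a viewport meta tag containing `width=device-width`, usually accompanied by CSS media queries. The deck does not describe its algorithm, so the following is only a heuristic sketch under that assumption. `looks_responsive` is a hypothetical name. The sketch inspects only inline `<style>` blocks. A real check would also fetch linked stylesheets, so this version can miss media queries defined there.

```python
import re
from html.parser import HTMLParser

class _ViewportFinder(HTMLParser):
    """Record the content of <meta name="viewport"> and any inline <style> text."""

    def __init__(self):
        super().__init__()
        self.viewport = None
        self.css = []
        self._in_style = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "viewport":
            self.viewport = attrs.get("content") or ""
        elif tag == "style":
            self._in_style = True

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        if self._in_style:
            self.css.append(data)

def looks_responsive(page: str) -> dict:
    """Heuristic: responsive only if both a device-width viewport and a media query exist."""
    finder = _ViewportFinder()
    finder.feed(page)
    finder.close()
    viewport = (finder.viewport or "").replace(" ", "").lower()
    viewport_ok = "width=device-width" in viewport
    media_queries = bool(re.search(r"@media\b", " ".join(finder.css), re.IGNORECASE))
    return {"viewport": viewport_ok, "media_queries": media_queries,
            "responsive": viewport_ok and media_queries}
```

Requiring both signals favors precision over recall. Checking only the viewport tag would flag more sites, including some that merely declare a viewport without adapting their layout.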
  • 31. Cart + Payment - SSL • Let's make sure GoDaddy customers' online small businesses accept credit cards safely. • Detect whether a website has a shopping cart, a payment solution, and SSL. • We designed an algorithm to detect SSL and its type, the shopping cart, and the payment solution provider. • The algorithm also detects the SSL encryption type. • We advise shoppers based on the features detected on their websites. • GoDaddy also helps shoppers make their websites more secure and shopper-friendly for accepting online transactions. • MISSION / MODEL / REVENUE: Secure the transactions; text analytics on raw HTML data; help ecommerce small businesses.
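Payment providers are typically embedded through script tags, links, or form actions that point at the provider's domains. One way to detect them, sketched here under that assumption, is to collect those URLs and match their host names. The same pass also flags plain-HTTP resources and form targets. The provider-to-domain table is a small hypothetical example, and `detect_payment_providers` is a hypothetical name. Detecting the SSL certificate and encryption type would require a live TLS connection, which this offline sketch does not make.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical, incomplete mapping from provider name to domains it serves from.
PROVIDER_DOMAINS = {
    "PayPal": ("paypal.com", "paypalobjects.com"),
    "Stripe": ("stripe.com",),
}

class _UrlCollector(HTMLParser):
    """Collect src/href/action attribute values from every tag."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href", "action") and value:
                self.urls.append(value)

def detect_payment_providers(page: str) -> dict:
    """Return known payment providers referenced by the page and any plain-HTTP URLs."""
    collector = _UrlCollector()
    collector.feed(page)
    collector.close()
    providers, plain_http = set(), []
    for url in collector.urls:
        parsed = urlparse(url)
        if parsed.scheme == "http":
            plain_http.append(url)        # resource or form target not served over TLS
        host = (parsed.hostname or "").lower()
        for provider, domains in PROVIDER_DOMAINS.items():
            if any(host == d or host.endswith("." + d) for d in domains):
                providers.add(provider)
    return {"providers": sorted(providers), "plain_http_urls": plain_http}
```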
  • 32. The goal of GoDaddy's Business Score is to help established small businesses get to the next level by increasing their online presence.