Proprietary and confidential
Making Sense of Applying ML to APIs
Anant Jhingran, CTO, Apigee (Google)
Sridhar Rajagopalan, Chief Architect, Data, Apigee (Google)
APIs are everywhere
Check social media = API
Flight check in = API
Online purchase = API
Apigee Full Lifecycle API Management Platform
Deliver connected experiences for customers, employees and partners
/idm/account
/erp/pricing/id
/cms/images/item
/inventory/itemID
APIs
/catalog
/payment
/customer
Internet of Things
Legacy Apps
Cloud Apps
Employee Apps
Consumer Apps
Partner Apps
API Management Platform
Exposure
Consumption
RETAIL
GOVERNMENT
TRAVEL & HOSPITALITY
OTHER
FINANCIAL SERVICES
MEDIA & ENTERTAINMENT
TECHNOLOGY
MANUFACTURING
HEALTHCARE
TELCO
EDUCATION
700+ Customers, > 1T calls/year
A Sample Response from Google Places API
X 1,000,000,000,000
With so much data,
what can we do?
Visibility for API Teams
Custom Reports
Business and Operational
Analytics
Out of the Box
Dashboards
Predict if an API
call is from a “bot”
Predict time to
resolution of a
support ticket
We are seeing 500n server side errors impacting customers for second time today
{"organization":"promisepay","environment":"prod","apiProduct":null,"proxyName":"prelive_rev9_2016_10_04","appName":null,"verb":"GET","url":"https://secure.api.promisepay.com/status","responseCode":500,"responseReason":"Internal Server Error","clientLatency":9,"targetLatency":0,"totalLatency":9,"remote_ip":"192.168.61.56","correlation_id":"rrt-089f6aa9f9c90adaa-b-wo-30508-63404252-1","request_host":"secure.api.promisepay.com","platform_name":"unknown","platform_quota_volume":"null/null","endpoint_quota_volume":"null/null","platform_quota_period":"nullmin","platform_spike_volume":"null/null","endpoint_spike_volume":"null/null","asm_quota_req_quota_platform_key":"unknown-status","asm_quota_all_quota_default_key":"default-all","asm_quota_all_quota_platform_key":"unknown-all"}
Start at 15:36 ESDT
Predict correctly
that this rollout is
troublesome
In other words, we will only focus on
“predictions”
Don’t predictions need ML?
“Wow! ML”: “Sometimes needed”
“BI level Analysis”: “Always needed, and often enough for predictions”
“Data Prep”: “Always needed”
ML at the top might or might not be needed, but ML helping
at the bottom seems really promising
Lesson #1:
Data prep is always needed
Garbage In, Garbage Out...
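import re

# Patterns for noise that carries no signal in free-text tickets.
http_pattern = r'(https|http)?://(\w|\.|/|\?|=|&|%|-|#)*\b'
html_pattern = r'<[^<]+?>'
injection_pattern = r"\">"
json_pattern = r'\{.*\}+'
email_pattern = r'[\w.-]+@[\w.-]+(\.[\w]+)+'
quote_pattern = r'["\']'
nonalpha_pattern = r'[^A-Za-z0-9.? ]+'
patterns = [http_pattern, html_pattern, injection_pattern, json_pattern, email_pattern,
            quote_pattern, nonalpha_pattern]
pattern = "|".join(patterns)

def substitute(text):
    # Expand common contractions and collapse runs of spaces.
    text = re.sub("[Cc]an't", "cannot", text)
    text = re.sub("I'm", "I am", text)
    text = re.sub(" +", " ", text)
    return text

def cleanup(text):
    # Expand contractions first: the quote and non-alphanumeric patterns
    # would otherwise strip the apostrophes that substitute() looks for.
    text = substitute(text)
    text = re.sub(pattern, ' ', text)
    # Collapse the spaces left behind by the removals.
    text = re.sub(" +", " ", text)
    return text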
A good area of research is applying
ML to the data prep problem
Lesson #2:
Databases are not likely to be
eliminated any time soon.
Aka: select count(*) goes a loooonng way
Bot Detection Rules
Many of these are expressible as
SQL. Deep learning might give
marginally better results, BUT…
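As a rough illustration of a bot rule written as SQL, the sketch below runs a `count(*)` query through Python's built-in `sqlite3` module. The `api_calls` table, its columns, the sample rows, and both thresholds are assumptions made up for this example; they are not the rules from the talk.

```python
import sqlite3

# Hypothetical schema: one row per API call.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE api_calls (client_ip TEXT, minute TEXT, response_code INTEGER)"
)

rows = (
    [("10.0.0.1", "12:00", 200)] * 5      # ordinary client
    + [("10.0.0.2", "12:00", 200)] * 150  # very chatty client
    + [("10.0.0.3", "12:00", 404)] * 40   # client probing for missing URLs
)
conn.executemany("INSERT INTO api_calls VALUES (?, ?, ?)", rows)

# Illustrative thresholds, not values from the talk.
MAX_CALLS_PER_MINUTE = 100
MAX_NOT_FOUND_PER_MINUTE = 20

BOT_RULE_SQL = """
SELECT client_ip,
       COUNT(*) AS calls,
       SUM(CASE WHEN response_code = 404 THEN 1 ELSE 0 END) AS not_found
FROM api_calls
GROUP BY client_ip, minute
HAVING COUNT(*) > ?
    OR SUM(CASE WHEN response_code = 404 THEN 1 ELSE 0 END) > ?
ORDER BY client_ip
"""

suspects = conn.execute(
    BOT_RULE_SQL, (MAX_CALLS_PER_MINUTE, MAX_NOT_FOUND_PER_MINUTE)
).fetchall()

for client_ip, calls, not_found in suspects:
    print(client_ip, calls, not_found)
```

Each flagged row carries the counts that triggered it, so a rule of this kind explains itself in a way a learned model usually does not.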
Customers always want
explainability
A good area of research is applying ML
to the explainability ↔ predictability
tradeoff in SQL
Lesson #3:
Humans are not likely to be
eliminated any time soon
Distinguishing QA from bot traffic...
[Diagram labels: Public API calls, QA, API]
A good area of research is
transfer learning from a huge
number of QA tasks
Lesson #4:
ML replaces some human tasks,
not select count(*)
Visually detecting Anomalies does not work at scale
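One way to take the eyeballing out of anomaly detection is a simple statistical baseline. The sketch below is not ML and not the speakers' method: it flags points with a median/MAD "modified z-score", using the conventional 0.6745 scaling and 3.5 cutoff. The latency values are invented for the example.

```python
from statistics import median

def robust_anomalies(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold."""
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against; nothing is flagged
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]

# Hypothetical per-minute total latencies in milliseconds.
latencies_ms = [9, 10, 11, 10, 9, 250, 10, 11, 9, 10]
anomalous_minutes = robust_anomalies(latencies_ms)
print(anomalous_minutes)  # [5]
```

A check like this can run over every API and every metric without anyone looking at a chart, which is where a learned detector would also have to start.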
A good area of research is ML
for anomaly detection
Lesson #5:
Without good labeled data,
you get nowhere
[glacier]
Google Photos
But labeling improves precision,
not recall
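One reading of this claim, sketched with made-up client names: if humans only review what the model already flagged, they can reject false positives (raising precision) but never see the bots the model missed (leaving recall unchanged).

```python
def precision_recall(flagged, actual):
    """Precision and recall of a set of flagged items against ground truth."""
    true_positives = len(flagged & actual)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall

# Hypothetical ground truth and model output.
actual_bots = {"client-a", "client-b", "client-c", "client-d"}
flagged = {"client-a", "client-b", "client-x", "client-y"}

before = precision_recall(flagged, actual_bots)

# Reviewers label only the flagged clients and discard the false positives.
confirmed = flagged & actual_bots
after = precision_recall(confirmed, actual_bots)

print(before)  # (0.5, 0.5)
print(after)   # (1.0, 0.5)
```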
A good area of research is ML for
giving labeling tasks to humans
Lesson #6:
Without enough data, you
get nowhere
Is the new deployment of API or platform bad?
[Chart: APIs vs. Traffic]
Have enough data for ML
Not enough data to do anything, but we could pool similar APIs together to get enough data.
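The pooling idea can be sketched with a two-proportion z-test on error rates before and after a deployment. This is only an illustration under strong assumptions: the three API names and all counts are invented, the APIs are assumed similar enough to pool, and the normal approximation is rough at counts this small.

```python
from math import sqrt

def error_rate_z(err_before, n_before, err_after, n_after):
    """z statistic for the change in error rate across a deployment."""
    pooled = (err_before + err_after) / (n_before + n_after)
    se = sqrt(pooled * (1 - pooled) * (1 / n_before + 1 / n_after))
    if se == 0:
        return 0.0
    return (err_after / n_after - err_before / n_before) / se

# Hypothetical low-traffic APIs:
# (errors before, calls before, errors after, calls after)
small_apis = {
    "/catalog": (1, 50, 4, 50),
    "/payment": (1, 50, 4, 50),
    "/customer": (1, 50, 4, 50),
}
Z_CRITICAL = 1.96  # two-sided 5% level under the normal approximation

# Each API on its own: too little traffic to call the change significant.
per_api = {name: error_rate_z(*counts) for name, counts in small_apis.items()}

# Pool the counts column by column and test once on the combined data.
pooled_counts = [sum(column) for column in zip(*small_apis.values())]
pooled_z = error_rate_z(*pooled_counts)

for name, z in per_api.items():
    print(name, round(z, 2), z > Z_CRITICAL)
print("pooled", round(pooled_z, 2), pooled_z > Z_CRITICAL)
```

With these numbers no single API crosses the threshold, while the pooled counts do; whether pooling is legitimate depends entirely on how similar the APIs really are.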
A good area of research is learning
about anomalies by grouping a
number of similar small data sets.
Lesson #7:
Data litter will always
get you.
1. Latency (minutes) at which 99% of data is ingested into the warehouse. Let’s say we have an internal
goal of X minutes.
2. Regardless of when you run an analysis, you will not get 100% of the data. So you have to accept that
any timely analysis will have incomplete data.
3. Tradeoff between timeliness of analysis, completeness of data, and cost of collection is fundamental.
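The three points above can be made concrete with a small calculation: the 99th-percentile ingestion latency, and the fraction of data available if an analysis runs at a given cutoff. The latency values are invented, and the nearest-rank percentile is just one of several common definitions.

```python
def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]

def completeness_at(latencies, cutoff_minutes):
    """Fraction of records already ingested when analysis runs at the cutoff."""
    arrived = sum(1 for latency in latencies if latency <= cutoff_minutes)
    return arrived / len(latencies)

# Hypothetical ingestion latencies in minutes for 100 records.
latencies_min = [1] * 90 + [5] * 9 + [120]

p99 = percentile_nearest_rank(latencies_min, 99)
print(p99)                                # 5
print(completeness_at(latencies_min, 1))  # 0.9
print(completeness_at(latencies_min, 5))  # 0.99
```

In this made-up data, waiting five minutes instead of one buys nine more points of completeness, and the last record would cost two hours: the timeliness versus completeness tradeoff in miniature.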
A good area of research is to
be able to distinguish whether
missing data is random noise
or systematic drops.
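A crude, non-ML starting point for this question, assuming we know how many records each source sent: compare every source's drop count with what the median drop rate would predict, and flag sources whose excess is far beyond binomial noise. The source names, counts, and the threshold of 3 are all assumptions for the sketch.

```python
from math import sqrt
from statistics import median

def systematic_drop_suspects(sent, received, z_threshold=3.0):
    """Return sources that drop far more records than the typical source."""
    rates = {s: (sent[s] - received[s]) / sent[s] for s in sent}
    baseline = median(rates.values())  # robust estimate of the "random" rate
    suspects = []
    for source, n in sent.items():
        dropped = n - received[source]
        excess = dropped - n * baseline
        sd = sqrt(n * baseline * (1 - baseline))  # binomial standard deviation
        if (sd == 0 and excess > 0) or (sd > 0 and excess / sd > z_threshold):
            suspects.append(source)
    return sorted(suspects)

# Hypothetical per-source record counts.
sent = {"us-east": 1000, "us-west": 1000, "eu": 1000}
received = {"us-east": 990, "us-west": 988, "eu": 910}

suspect_sources = systematic_drop_suspects(sent, received)
print(suspect_sources)  # ['eu']
```

The check is one-sided on purpose: only sources losing more than the baseline are interesting here.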
Of course, “Wow! ML” is also
needed for end user problems
Rapidly accelerating use of deep learning at Google
[Chart: 2012 to 2018]
But it is as much needed in the boiler room...
“Wow! ML”
“BI level Analysis”
“Data Prep”
ML at the top might or might not be needed, but ML helping
at the bottom seems really promising
Lesson #1: Data prep is always needed.
Lesson #2: Databases are not likely to be eliminated any time soon.
Lesson #3: Humans are not likely to be eliminated any time soon.
Lesson #4: ML replaces some human tasks, not select count(*).
Lesson #5: Without good labeled data, you get nowhere.
Lesson #6: Without enough data, you get nowhere.
Lesson #7: Data litter will always get you.
Don’t go chasing the next publication
on reinforced, one-shot, transfer
learning using a gensim word2vec
model for ImageNet. Chase it for
index selection, query
optimization, data prep, labeling… :)
https://cyberomin.github.io/startup/2018/07/01/sql-ml-ai.html
https://cyberomin.github.io/startup/2018/07/01/sql-ml-ai.html
ML/AI Applied to
often
Thank You!

VLDB Slides on Making Sense of Applying ML to APIs