VLDB Slides on Making Sense of Applying ML to APIs

Proprietary and confidential
Making Sense of Applying ML to APIs
Anant Jhingran, CTO, Apigee (Google)
Sridhar Rajagopalan, Chief Architect, Data, Apigee (Google)

APIs are everywhere

Check social media = API

Flight check in = API

Online purchase = API

Apigee Full Lifecycle API Management Platform
Deliver connected experiences for customers, employees and partners
/idm/account
/erp/pricing/id
/cms/images/item
/inventory/itemID
APIs
/catalog
/payment
/customer
Internet of Things
Legacy Apps
Cloud Apps
Employee Apps
Consumer Apps
Partner Apps
API Management Platform
Exposure
Consumption

RETAIL
GOVERNMENT
TRAVEL & HOSPITALITY
OTHER
FINANCIAL SERVICES
MEDIA & ENTERTAINMENT
TECHNOLOGY
MANUFACTURING
HEALTHCARE
TELCO
EDUCATION
700+ Customers, > 1T calls/year

A Sample Response from Google Places API

X 1,000,000,000,000

With so much data,
what can we do?

Visibility for API Teams
Custom Reports
Business and Operational
Analytics
Out of the Box
Dashboards

Predict if an API
call is from a “bot”

Predict time to
resolution of a
support ticket
We are seeing 500n server side errors impacting customers for second time today
{"organization":"promisepay","environment":"prod","apiProduct":null,"proxyName":"prelive_rev9_2016_10_04","appName":null,"verb":"GET","url":"https://secure.api.promisepay.com/status","responseCode":500,"responseReason":"Internal Server Error","clientLatency":9,"targetLatency":0,"totalLatency":9,"remote_ip":"192.168.61.56","correlation_id":"rrt-089f6aa9f9c90adaa-b-wo-30508-63404252-1","request_host":"secure.api.promisepay.com","platform_name":"unknown","platform_quota_volume":"null/null","endpoint_quota_volume":"null/null","platform_quota_period":"nullmin","platform_spike_volume":"null/null","endpoint_spike_volume":"null/null","asm_quota_req_quota_platform_key":"unknown-status","asm_quota_all_quota_default_key":"default-all","asm_quota_all_quota_platform_key":"unknown-all"}
Start at 15:36 ESDT

Predict correctly
that this rollout is
troublesome

In other words, we will only focus on
“predictions”

Don’t predictions need ML?
“Wow! ML”: “Sometimes needed”
“BI level Analysis”: “Always needed, and often enough for predictions”
“Data Prep”: “Always needed”
ML at the top might or might not be needed, but ML helping
at the bottom seems really promising

Lesson #1:
Dataprep is always needed

Garbage In, Garbage Out...
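import re

# Patterns for ticket content that carries no signal for text analysis.
http_pattern = r'(https|http)?://(\w|\.|/|\?|=|&|%|-|#)*\b'
html_pattern = r'<[^<]+?>'
injection_pattern = r'">'
json_pattern = r'\{.*\}+'
email_pattern = r'[\w.-]+@[\w.-]+(\.[\w]+)+'
quote_pattern = r'["\']'
nonalpha_pattern = r'[^A-Za-z0-9.? ]+'
patterns = [http_pattern, html_pattern, injection_pattern, json_pattern, email_pattern,
            quote_pattern, nonalpha_pattern]
pattern = "|".join(patterns)

def substitute(text):
    # Expand common contractions, then collapse runs of spaces.
    text = re.sub("[Cc]an't", "cannot", text)
    text = re.sub("I'm", "I am", text)
    text = re.sub(" +", " ", text)
    return text

def cleanup(text):
    # Expand contractions first: the patterns below strip apostrophes,
    # after which "can't" and "I'm" would no longer match.
    text = substitute(text)
    text = re.sub(pattern, ' ', text)
    # Collapse the spaces left behind by the removals.
    text = re.sub(" +", " ", text)
    return text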

A good area of research is applying
ML to the data prep problem

Lesson #2:
Databases are not likely to be
eliminated any time soon.
Aka: select count(*) goes a loooonng way

Bot Detection Rules

Many of these are expressible as
SQL. Deep learning might give
marginally better results, BUT…
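
As a rough illustration of what a bot rule written as SQL might look like, the sketch below flags clients that exceed a per-minute call count. The table name api_calls, its columns, and the threshold of 100 calls per minute are assumptions made for this example, not rules taken from the talk.

```python
import sqlite3

# Hypothetical threshold, chosen only for illustration.
MAX_CALLS_PER_MINUTE = 100

def find_suspected_bots(rows, threshold=MAX_CALLS_PER_MINUTE):
    """Return (client_ip, minute, calls) groups whose call count exceeds threshold."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per API call.
    conn.execute("CREATE TABLE api_calls (client_ip TEXT, minute TEXT, url TEXT)")
    conn.executemany("INSERT INTO api_calls VALUES (?, ?, ?)", rows)
    query = """
        SELECT client_ip, minute, COUNT(*) AS calls
        FROM api_calls
        GROUP BY client_ip, minute
        HAVING COUNT(*) > ?
        ORDER BY calls DESC
    """
    result = conn.execute(query, (threshold,)).fetchall()
    conn.close()
    return result

# Synthetic traffic: one client hammers the API, another behaves normally.
rows = [("10.0.0.1", "12:00", "/catalog")] * 150 + [("10.0.0.2", "12:00", "/payment")] * 3
suspects = find_suspected_bots(rows)
print(suspects)
```

A rule in this form is easy to explain to a customer: the flagged client made more than the allowed number of calls in one minute.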

Customers always want
explainability

A good area of research is applying ML
to the Explainability ←→ Predictability
tradeoff in SQL

Lesson #3:
Humans are not likely to be
eliminated any time soon

Public
API calls
Distinguishing QA from bot traffic...
API
QA

A good area of research is
transfer learning from a huge
number of QA tasks

Lesson #4:
ML replaces some human tasks,
not select count(*)

Visually detecting Anomalies does not work at scale

A good area of research is ML
for anomaly detection

Lesson #5:
Without good labeled data,
you get nowhere

[glacier]
Google Photos

But labeling improves precision,
not recall

A good area of research is ML for
giving labeling tasks to humans

Lesson #6:
Without enough data, you
get nowhere

APIs
Traffic
Have enough data for ML
Not enough data to do
anything, but we could
pool similar APIs together
to get enough data.
Is the new deployment of API or platform bad?

A good area of research is learning
about anomalies by grouping a
number of similar small data sets.
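
One way to picture the pooling idea is sketched below, under several assumptions: the grouping of "similar" APIs is already given, the signal is error counts before and after a deployment, and a two-proportion z-test stands in for whatever method would actually be used. The function name and the traffic numbers are hypothetical.

```python
import math

def pooled_error_rate_z(before, after):
    """Two-proportion z statistic on pooled counts.

    before/after: lists of (errors, calls) tuples, one per API in the group.
    """
    e0 = sum(e for e, _ in before)
    n0 = sum(n for _, n in before)
    e1 = sum(e for e, _ in after)
    n1 = sum(n for _, n in after)
    p0, p1 = e0 / n0, e1 / n1
    p = (e0 + e1) / (n0 + n1)  # error rate under "no change"
    se = math.sqrt(p * (1 - p) * (1 / n0 + 1 / n1))
    return (p1 - p0) / se

# Synthetic data: five low-traffic APIs, each with 200 calls per window.
before = [(2, 200)] * 5   # 1% errors before the deployment
after = [(6, 200)] * 5    # 3% errors after the deployment

z_single = pooled_error_rate_z(before[:1], after[:1])  # one API on its own
z_pooled = pooled_error_rate_z(before, after)          # all five pooled
print(round(z_single, 2), round(z_pooled, 2))
```

With these synthetic numbers a single API stays below the usual 1.96 cutoff while the pooled group clears it, which is the point of pooling: the same shift becomes detectable once enough calls are combined.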

Lesson #7:
Data litter will always
get you.

1. Latency (minutes) at which 99% of data is ingested into the warehouse. Let’s say we have an internal
goal of X minutes.
2. Regardless of when you run an analysis, you will not get 100% of the data. So you have to accept that
any timely analysis will have incomplete data.
3. Tradeoff between timeliness of analysis, completeness of data, and cost of collection is fundamental.
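
The first two points can be made concrete with a small sketch. It assumes each record carries an ingestion latency in minutes (ingest time minus event time); the function names and the sample latencies are hypothetical.

```python
def latency_percentile(latencies_min, pct=99):
    """Nearest-rank percentile of ingestion latencies, in minutes."""
    ordered = sorted(latencies_min)
    # Integer ceiling of pct/100 * n, avoiding floating-point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]

def completeness_at(latencies_min, delay_min):
    """Fraction of records already ingested delay_min minutes after the event."""
    arrived = sum(1 for latency in latencies_min if latency <= delay_min)
    return arrived / len(latencies_min)

# Synthetic latencies: most records arrive quickly, a few straggle.
latencies = [1] * 90 + [5] * 8 + [60, 600]

p99 = latency_percentile(latencies)
at_5_min = completeness_at(latencies, 5)
at_60_min = completeness_at(latencies, 60)
print(p99, at_5_min, at_60_min)
```

In this sample an analysis run 5 minutes after the fact sees 98% of the data, and waiting a full hour only raises that to 99%, which illustrates the timeliness versus completeness tradeoff.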

A good area of research is to
be able to distinguish whether
missing data is random noise
or systematic drops.
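
A very simple non-ML baseline for this question is sketched below. It assumes we know, per source (for example per collector host), how many records were expected and how many arrived, and it flags sources whose drop rate sits far above the overall rate. The source names, counts, and the z threshold of 3 are assumptions made for the example.

```python
import math

def flag_systematic_drops(expected, received, z_threshold=3.0):
    """Flag sources whose drop rate is far above the overall drop rate.

    expected/received: dicts mapping source name to record counts.
    """
    total_expected = sum(expected.values())
    total_dropped = sum(expected[s] - received.get(s, 0) for s in expected)
    p = total_dropped / total_expected  # overall drop rate
    flagged = []
    for source, n in expected.items():
        dropped = n - received.get(source, 0)
        # Standard error of a drop rate over n records if drops were random.
        se = math.sqrt(p * (1 - p) / n)
        z = (dropped / n - p) / se if se > 0 else 0.0
        if z > z_threshold:
            flagged.append(source)
    return flagged

# Synthetic counts: three sources lose about 1%, one loses 10%.
expected = {"a": 1000, "b": 1000, "c": 1000, "d": 1000}
received = {"a": 990, "b": 989, "c": 991, "d": 900}

flagged = flag_systematic_drops(expected, received)
print(flagged)
```

If drops were random noise, every source would lose roughly the same share; a source that stands out this far suggests a systematic drop worth investigating.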

Of course, “Wow! ML” is also
needed for end user problems

Rapidly accelerating use of deep learning at Google
[Chart: 2012 to 2018, and more …]

But it is as much needed in the boiler room...
“Wow! ML”
“BI level Analysis”
“Data Prep”
ML at the top might or might not be needed, but ML helping
at the bottom seems really promising

Lesson #1: Dataprep is always needed.
Lesson #2: Databases are not likely to be eliminated any time soon.
Lesson #3: Humans are not likely to be eliminated any time soon.
Lesson #4: ML replaces some human tasks, not select count(*).
Lesson #5: Without good labeled data, you get nowhere.
Lesson #6: Without enough data, you get nowhere.
Lesson #7: Data litter will always get you.

Don’t go chasing the next publication
on reinforced, one-shot, transfer
learning using a gensim word2vec
model for ImageNet; chase it for
index selection, query
optimization, data prep, labeling… :)

https://cyberomin.github.io/startup/2018/07/01/sql-ml-ai.html


Thank You!
