BUILDING	A	MACHINE	LEARNING	
APPLICATION	WITH	AWS	LAMBDA
Ludi Rehak
ludi@h2o.ai
Silicon	Valley	Big	Data	Science	Meetup
March	17,	2016
(+	help	from	Tom	and	Prithvi)
Q: What	is	AWS	Lambda?
A: AWS Lambda is a compute service that runs code (a Lambda function) on demand. It simplifies the process of running code in the cloud by managing compute resources automatically.
Offloads	DevOps tasks	related	to	VMs:
• Server	and	operating	system	maintenance
• Capacity	provisioning
• Scaling
• Code	monitoring	and	logging
• Security	patches
MAJOR	STEPS
Step	1:		Identify	problem	to	solve
Step	2:	 Train	model	on	data
Step	3:	 Export	the	model	as	a	POJO
Step	4:		Write	code	for	Lambda	handler
Step	5:	 Build	deployment	package	(.zip	file)	and	
upload	to	Lambda
Step	6:	 Map	API	endpoint	to	Lambda	function
Step	7:		Embed	endpoint	in	application
A	CONCRETE	 USE	CASE:	DOMAIN	NAME	
CLASSIFICATION
Malicious	domains
• Carry	out	malicious	activity	- botnets,	phishing,	
malware	hosting,	etc
• Names	are	generated	by	algorithms	to	defeat	security	
systems
Goal:	Classify	domains	as	legitimate	vs.	malicious
Legitimate     Malicious
h2o            zyxgifnjobqhzptuodmzov
zen-cart       c3p4j7zdxexg1f2tuzk117wyzn
fedoraforum    batdtrbtrikw
FEATURES
• String	length
• Shannon	Entropy
o Measure	of	uncertainty	in	a	random	variable
• Number	of	substrings	that	are	English	words
• Proportion	of	vowels
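A minimal sketch of the four features listed above, in plain Python. The function names `entropy`, `p_vowels`, and `num_valid_substrings` follow the names used in the Jython code later in the deck, but the bodies here are assumptions, not the app's actual implementation. In particular, the minimum substring length (`min_len=2`), counting every matching position, and the tiny `WORDS` set are illustrative choices.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def entropy(s):
    """Shannon entropy (bits per character) of the character distribution in s."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def p_vowels(s):
    """Proportion of characters in s that are vowels."""
    if not s:
        return 0.0
    return sum(ch in VOWELS for ch in s.lower()) / len(s)


def num_valid_substrings(s, words, min_len=2):
    """Count substrings of s (length >= min_len) that appear in the word set."""
    s = s.lower()
    return sum(
        s[i:j] in words
        for i in range(len(s))
        for j in range(i + min_len, len(s) + 1)
    )


# Tiny stand-in for the English word list (assumption for illustration only).
WORDS = {"zen", "cart", "art", "car", "forum", "for", "fed"}

for name in ["zen-cart", "batdtrbtrikw"]:
    print(name, len(name), round(entropy(name), 3),
          round(p_vowels(name), 3), num_valid_substrings(name, WORDS))
```

A string with repeated characters has lower entropy than one whose characters are all different, which is one reason algorithmically generated names tend to score higher on this feature.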
DATA
• Domains	and	whether	they	are	malicious
o http://datadrivensecurity.info/blog/data/2014/10/legit-dga_domains.csv.zip
o 133,927 rows
• English	words
o https://raw.githubusercontent.com/dwyl/english-words/master/words.txt
o 354,985	rows
MODEL	INFORMATION
Malicious	Domain	Model
Algorithm:	 GLM
Model	family: Binomial
Regularization: Ridge
Threshold	(max	F1): 0.4935
Confusion matrix on validation data (rows: actual class, columns: predicted class)
Actual    Predicted 0    Predicted 1    Error
0         15889          315            FPR 0.0194
1         346            10043          FNR 0.0333
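The error rates in the confusion matrix can be checked with a few lines of arithmetic. This sketch assumes class 1 is the positive class and that rows are actual classes, which is the reading under which the reported FPR and FNR match the counts.

```python
# Counts from the confusion matrix on validation data.
# Assumption: class 1 is the positive class, rows are actual classes.
tn, fp = 15889, 315    # actual class 0
fn, tp = 346, 10043    # actual class 1

fpr = fp / (fp + tn)               # false positive rate
fnr = fn / (fn + tp)               # false negative rate
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print("FPR: %.4f" % fpr)
print("FNR: %.4f" % fnr)
print("F1:  %.4f" % f1)
```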
WORKFLOW FOR THIS APP
[Flowchart with steps: Input domain name; Get Predictions; Malicious Domain? (Yes: Malicious, No: Legitimate); Visit web page]
APP ARCHITECTURE DIAGRAM
[Diagram: the JavaScript App sends the domain name by HTTPS POST to the REST endpoint, which invokes Lambda. Inside Lambda: Lambda Function Handler, Jython Feature Munging, H2O Model POJO Prediction. The app receives JSON with the prediction.]
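As a rough illustration of the HTTPS POST leg of this architecture, the sketch below builds the request a client could send. The endpoint URL is hypothetical, and the JSON field name `domain` is an assumption based on the `request.domain` field read by the Lambda handler. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute the URL that API Gateway assigns to your deployment.
ENDPOINT = "https://example.execute-api.us-east-1.amazonaws.com/prod/predict"


def build_request(domain, endpoint=ENDPOINT):
    """Build an HTTPS POST request carrying the domain name as JSON."""
    body = json.dumps({"domain": domain}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("zen-cart.com")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To call a deployed endpoint, send the request and decode the JSON reply:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(json.load(resp))
```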
LAMBDA	FUNCTION	HANDLER
public static ResponseClass myHandler(RequestClass request, Context context) throws PyException {
    PyModule module = new PyModule();
    // Prediction code is in pymodule.py
    double[] predictions = module.predict(request.domain);
    return new ResponseClass(predictions);
}
JYTHON	FEATURE	MUNGING
def predict(domain):
    # Keep only the text before the first dot of the domain name
    domain = domain.split('.')[0]
    row = RowData()
    functions = [len, entropy, p_vowels, num_valid_substrings]
    eval_features = [f(domain) for f in functions]
    names = NamesHolder_MaliciousDomainModel().VALUES
    beta = MaliciousDomainModel().BETA().VALUES
    # Start from the last coefficient (the intercept in H2O GLM models)
    feature_coef_product = [beta[len(beta) - 1]]
    for i in range(len(names)):
        row.put(names[i], float(eval_features[i]))
        feature_coef_product.append(eval_features[i] * beta[i])
    # prediction
    model = EasyPredictModelWrapper(MaliciousDomainModel())
    p = model.predictBinomial(row)
H2O	MODEL	POJO
static final class BETA_0 implements java.io.Serializable {
    static final void fill(double[] sa) {
        sa[0] = 1.49207826021648;
        sa[1] = 2.8502716978560194;
        sa[2] = -8.839804567200542;
        sa[3] = -0.7977065034624655;
        sa[4] = -14.94132841574946;
    }
}
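To show how a binomial GLM turns these coefficients into a prediction, here is a minimal sketch of the logistic scoring step. It is not the generated POJO's code. It assumes the first four coefficients apply to the raw features in the order length, entropy, vowel proportion, valid-substring count (the order of `functions` in the Jython code), that the last coefficient is the intercept, and that class 1 means malicious. The feature vectors at the bottom are made-up inputs.

```python
import math

# Coefficients copied from the BETA_0 block; the last entry is treated as the intercept.
BETA = [1.49207826021648, 2.8502716978560194, -8.839804567200542,
        -0.7977065034624655, -14.94132841574946]
THRESHOLD = 0.4935  # max-F1 threshold from the model information slide


def linear_predictor(features, beta=BETA):
    """Intercept plus the sum of coefficient * feature value."""
    coefs, intercept = beta[:-1], beta[-1]
    return intercept + sum(c * x for c, x in zip(coefs, features))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(features):
    """Return (probability of class 1, whether it meets the threshold)."""
    p1 = sigmoid(linear_predictor(features))
    return p1, p1 >= THRESHOLD


# Illustrative feature vectors: [length, entropy, vowel proportion, valid substrings]
print(predict([22, 4.0, 0.227, 0]))   # long, high-entropy name
print(predict([3, 1.585, 0.333, 1]))  # short name
```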
HANDS-ON	DEMONSTRATION
STEP	1:	Build
$ git clone https://github.com/h2oai/app-malicious-domains
$ cd app-malicious-domains
$	gradle wrapper
$	./gradlew build
STEP	2:	Create	Lambda	function	and	set	API	endpoint
See	instructions	and	screenshots	in	README.md
STEP	3:		Use	the	app	in	a	web	browser
$ ./gradlew jettyRunWar -x generateModel
http://localhost:8080
TROUBLESHOOTING
• Common Python errors
  o Another H2O is already running
    • Python script can’t find the data in h2o.import_file()
• Common Java errors
  o Java not installed at all
    • A JDK (Java Development Kit) must be installed so that the Java compiler is available (a JRE is not sufficient)
  o Not connected to the internet
    • Gradle needs to fetch some dependencies from the internet
• Common Lambda errors
  o Error uploading the .zip file
    • Check whether the function already exists and, if not, try again. On slower internet connections, try uploading the .zip file via an S3 link.
  o Timeout error when testing the Lambda function
    • Go to advanced settings and increase the Timeout value
  o Gateway Timeout (504 error)
    • This is Lambda’s cold start behavior. Keep retrying; Lambda eventually responds.
CAVEATS
• Stateless
o Can access stateful data by calling other web services, such as Amazon S3 or Amazon DynamoDB.
• Cold start behavior
o Containers are instantiated on the first request, then reused, and stay active for a window of time (10-20 minutes)
o “the longer I leave it between invocations, the longer the function takes to warm up”
• API Gateway timeout of 10 seconds
o Can request a longer timeout
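One way a client can cope with cold starts is to retry with exponential backoff. The sketch below is an assumption about how that might look, not part of the app: `GatewayTimeout` is a hypothetical stand-in for an HTTP 504 response, and `flaky_call` simulates an endpoint that times out twice before answering.

```python
import time


class GatewayTimeout(Exception):
    """Hypothetical stand-in for an HTTP 504 response from the endpoint."""


def call_with_retries(call, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call `call()`, retrying on GatewayTimeout with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return call()
        except GatewayTimeout:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            sleep(base_delay * 2 ** attempt)


# Simulated endpoint: the first two calls time out, the third succeeds.
attempts = {"n": 0}


def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise GatewayTimeout("504")
    return {"status": "ok"}


delays = []
result = call_with_retries(flaky_call, sleep=delays.append)  # record delays instead of sleeping
print(result, delays)
```

Passing `sleep` as a parameter keeps the demo fast and makes the backoff schedule easy to inspect.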
CONFIGURING	LAMBDA	FUNCTIONS
• Memory	
o Allocates	proportional	CPU	power,	network	
bandwidth,	and	disk	I/O
o Easy	single-dial	solution	
o Log	shows	how	much	memory	was	used	for	tuning	
and	cost	savings
• Timeout
LAMBDA	RESOURCE	LIMITS
Resource                                            Default Limit
Memory                                              512 MB
Number of file descriptors                          1,024
Number of processes and threads (combined total)    1,024
Maximum execution duration per request              300 seconds
Invoke request body payload size                    6 MB
Invoke response body payload size                   6 MB
Concurrent executions per region                    100

Item                                                                                                Default Limit
Lambda function deployment package size (.zip/.jar file)                                            50 MB
Size of code/dependencies that you can zip into a deployment package (uncompressed zip/jar size)    250 MB
LAMBDA	PRICING
• Lambda
  o Requests
    • First 1 million per month are free
    • $0.20 per 1 million requests thereafter
  o Duration
    • First 400,000 GB-seconds of compute time per month are free
    • $0.00001667 for every GB-second thereafter
• API Gateway
  o $3.50 per million API calls received plus data transfer costs
• Estimate for Malicious Domain Application:
  o Lambda: $0.37/hour with 10 threads after free-tier
  o API Gateway: $0.71/hour
  o Total: ~$1/hr
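The prices above combine into an hourly cost as sketched below. The call rate (200,000 calls per hour), billed duration (0.2 seconds per call), and memory size (512 MB) are illustrative assumptions, not the inputs behind the estimate on this slide. The free tier and data transfer costs are ignored.

```python
# Prices quoted on the slide.
REQUEST_PRICE = 0.20 / 1_000_000      # dollars per Lambda request
GB_SECOND_PRICE = 0.00001667          # dollars per GB-second of compute
API_GATEWAY_PRICE = 3.50 / 1_000_000  # dollars per API call


def hourly_cost(calls_per_hour, billed_seconds, memory_mb):
    """Return (Lambda cost, API Gateway cost) in dollars per hour, past the free tier."""
    gb_seconds = calls_per_hour * billed_seconds * (memory_mb / 1024)
    lambda_cost = calls_per_hour * REQUEST_PRICE + gb_seconds * GB_SECOND_PRICE
    gateway_cost = calls_per_hour * API_GATEWAY_PRICE
    return lambda_cost, gateway_cost


# Assumed workload, for illustration only.
lam, gw = hourly_cost(calls_per_hour=200_000, billed_seconds=0.2, memory_mb=512)
print("Lambda:      $%.2f/hour" % lam)
print("API Gateway: $%.2f/hour" % gw)
print("Total:       $%.2f/hour" % (lam + gw))
```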
LAMBDA	PERFORMANCE
Memory (MB)  Threads  Loops  Samples  Median (ms)  Min (ms)  Max (ms)  % Error  Throughput (calls/sec)
512          1        10000  10000    102          85        2137      0        8.4
512          10       1000   10000    102          85        30330     0.18     44
512          100      100    10000    149          85        30307     0.43     168
LAMBDA	SCALING
• Automatically	scales	to	support	the	rate	of	
incoming	requests
• “No	limit	to	the	number	of	requests	your	code	
can	handle”
• Starts	as	many	instances	of	Lambda	function	
as	needed
RELATED	EXAMPLES
• H2O	Generated	Model	POJO	in	a	Java	Servlet	container
o GitHub: h2oai/app-consumer-loan
• H2O	Generated	Model	POJO	in	a	Storm	bolt
o GitHub:		h2oai/h2o-world-2015-training
o tutorials/streaming/storm
• H2O	Generated	Model	POJO	in	Spark	Streaming
o GitHub:	h2oai/sparkling-water
o examples/src/main/scala/org/apache/spark/examples/h2o/CraigslistJobTitlesStreamingApp.scala
RESOURCES	ON	THE	WEB
• Slides
o GitHub h2oai/h2o-tutorials/tree/master/tutorials/aws-lambda-app
• Source	code
o GitHub h2oai/app-malicious-domains
• Latest	stable	H2O	for	Python	release
o http://h2o.ai/download/h2o/python
• Generated POJO model Javadoc
o http://h2o-release.s3.amazonaws.com/h2o/rel-turan/3/docs-website/h2o-genmodel/javadoc/index.html
• AWS Lambda
o http://docs.aws.amazon.com/lambda/latest/dg/welcome.html
Q	&	A
• Thanks	for	attending!
• Send follow-up questions to:
Ludi Rehak
ludi@h2o.ai
