Boosting Machine Learning with Redis Modules and Spark
Dvir Volk, Redis Labs, November 2016
2
Hello World
Redis: open source. The leading in-memory database
Redis Labs: the open source home and commercial provider of Redis, cloud and on-premise
Dvir Volk: Senior System Architect at Redis Labs. Redis user and contributor for ~6 years
@dvirsky
dvirvolk
3
A Brief Overview of Redis
● Started in 2009 by Salvatore Sanfilippo
● Mostly a one man show
● Most popular KV store
● Notable Users:
○ Twitter, Netflix, Uber, Groupon, Twitch
○ Many, many more...
4
A Brief Overview of Redis
▪ Key => Data Structure server
▪ In-memory, disk-backed
▪ Optional cluster mode
▪ Embedded Lua scripting
▪ Single Threaded!
▪ Key features: Fast, Flexible, Simple
5
A Lego For Your Database
Key => one of:
▪ Strings/Blobs/Bitmaps: "I'm a Plain Text String!"
▪ Hash Tables (objects!): { A: "foo", B: "bar", C: "baz" }
▪ Linked Lists: [ A → B → C → D → E ]
▪ Sets: { A , B , C , D , E }
▪ Sorted Sets: { A: 0.1, B: 0.3, C: 100, D: 1337 }
▪ Geo Sets: { A: (51.5, 0.12), B: (32.1, 34.7) }
▪ HyperLogLog: 00110101 11001110 10101010
6
Redis In Practice
▪ “Front End Database”
▪ Real Time Counters
▪ Ad Serving
▪ Message Queues
▪ Geo Database
▪ Time Series
▪ Cache
▪ Session State
▪ Etc
7
But Can Redis Do X?
Secondary Index?
Time Series?
Full Text Search?
Graph?
Machine Learning?
AutoComplete?
SQL?
8
So You Want a New Feature?
▪ Try a Lua script
▪ Convince @antirez
▪ Fork Redis
▪ Build Your Own Database!
9
Enter Redis Modules
▪ In development since March 2016
▪ Redis 4.0 RC out soon
▪ Several modules already exist
▪ Key paradigm shift for Redis
10
What Modules Actually Are
▪ Dynamic libraries loaded into Redis
▪ Written in C/C++
▪ Use a C ABI/API isolating Redis internals
▪ Near-zero latency access to data
▪ New capabilities: new commands, new data types
11
Obligatory Module Example
12
LEFTPAD Example
127.0.0.1:6379> MODULE LOAD "./example.so"
OK
127.0.0.1:6379> COMMAND INFO EXAMPLE.LEFTPAD
1) 1) "example.leftpad"
...
127.0.0.1:6379> EXAMPLE.LEFTPAD "foo" 8
"     foo"
127.0.0.1:6379> EXAMPLE.LEFTPAD "foo" 8 "_"
"_____foo"
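
The padding behaviour in this transcript can be mirrored in a few lines of Python. This is only a sketch of the semantics visible above (pad on the left up to a target length, with a space as the default pad character). The function name `leftpad` and the handling of inputs that are already long enough are assumptions, not the module's actual C implementation.

```python
def leftpad(s: str, length: int, pad: str = " ") -> str:
    """Pad `s` on the left with `pad` until it is `length` characters long."""
    if len(pad) != 1:
        raise ValueError("pad must be a single character")
    # Assumption: strings already at or beyond the target length come back unchanged.
    return pad * max(0, length - len(s)) + s


print(repr(leftpad("foo", 8)))       # five spaces followed by foo
print(repr(leftpad("foo", 8, "_")))  # '_____foo'
```

A module would do the same work next to the data, so a client needs a single round trip instead of a read, a local pad, and a write.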
13
Real Module: RediSearch
▪ From-scratch search index over Redis
▪ Uses Strings for holding compressed index data
▪ Includes stemming, exact phrase match, etc.
▪ Fast Fuzzy Auto-complete
▪ Up to 5x faster than Elasticsearch / Solr
> FT.SEARCH "lcd tv" FILTER price 100 +inf
> FT.SUGGET "lcd" FUZZY
14
More Modules Out There
▪ Native JSON Support
▪ Time Series
▪ Secondary Indexing
▪ Encryption
▪ Bloom Filters
▪ Online Neural Network
▪ Many Many more...
15
Spark ML + Redis modules
16
Redis + Spark So Far
▪ Current connector:
- RDD abstraction
- SparkSQL
- Streaming Source
▪ ML is not addressed specifically
▪ Used for pre-computed results
▪ We felt we could take it further
17
Addressing The ML Pain
▪ The missing piece of ML: Serving your model
- Not standardized
- Vendor lock-in with cloud platforms
- Reliable services are hard to build
- If only we had a “database” for this!
- Well, maybe we do?
18
Why Modules for ML?
With modules we can:
▪ Define data structures for models
▪ Store training output as “hot model”
▪ Perform evaluation directly in Redis
▪ Easily integrate existing C/C++ libs
19
Spark + Modules = AWESOME
▪ Train ML model on Spark
▪ Save model to Redis and get:
- High availability
- Clustering
- Persistence
- Performance
- Client libraries
20
Spark-ML End-to-End Flow
[Diagram. Components: Data Loaded to Spark, Spark Training, Model saved to Parquet file, Batch Evaluation, Pre-computed results, Custom Server (?), ClientApp]
21
Adding Redis Into The Mix
[Diagram. Components: Data Loaded to Spark, Spark Training, Any Training Platform, Redis-ML "Active Model", ClientApp]
22
The Redis-ML Module
A Redis module providing:
▪ Tree Ensembles
▪ Linear Regression
▪ Logistic Regression
▪ Matrix + Vector Operations
▪ More to come...
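
For the two regression model types named on this slide, evaluation is a short computation once the trained parameters are available. The sketch below shows that arithmetic in plain Python. The function names and the example parameters are made up for illustration and say nothing about how Redis-ML stores or names things.

```python
import math
from typing import Sequence


def linear_predict(intercept: float, coefficients: Sequence[float],
                   features: Sequence[float]) -> float:
    """Return intercept + sum(coefficient_i * feature_i)."""
    if len(coefficients) != len(features):
        raise ValueError("coefficients and features must have the same length")
    return intercept + sum(c * x for c, x in zip(coefficients, features))


def logistic_predict(intercept: float, coefficients: Sequence[float],
                     features: Sequence[float]) -> float:
    """Squash the linear score through the sigmoid to get a probability in (0, 1)."""
    score = linear_predict(intercept, coefficients, features)
    return 1.0 / (1.0 + math.exp(-score))


# Made-up parameters, purely for illustration.
print(linear_predict(2.0, [0.5, -1.0], [4.0, 1.0]))  # 2 + 2 - 1 = 3.0
print(logistic_predict(0.0, [1.0], [0.0]))           # sigmoid(0) = 0.5
```

Because the computation is this small, keeping the parameters in the database and evaluating there avoids shipping the model to every client.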
23
Example: Random Forest
24
Forest Data Type
▪ A collection of decision trees
▪ Supports classification & regression
▪ Splitter Node can be
- Categorical (e.g. day == “Sunday”)
- Numerical (e.g. age < 43)
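
The two splitter kinds can be pictured as small node types plus a loop that walks from the root to a leaf. This is a minimal sketch in Python, not the module's internal layout: the class names, the "condition true goes left" convention, and the strict `<` comparison are all assumptions chosen to match the examples on this slide.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    value: float


@dataclass
class NumericSplit:
    feature: str
    threshold: float
    left: "Node"   # taken when sample[feature] < threshold (assumed convention)
    right: "Node"


@dataclass
class CategoricalSplit:
    feature: str
    category: str
    left: "Node"   # taken when sample[feature] == category (assumed convention)
    right: "Node"


Node = Union[Leaf, NumericSplit, CategoricalSplit]


def evaluate(node: Node, sample: Dict[str, object]) -> float:
    """Follow splitter nodes until a leaf is reached and return its value."""
    while not isinstance(node, Leaf):
        if isinstance(node, NumericSplit):
            go_left = sample[node.feature] < node.threshold
        else:
            go_left = sample[node.feature] == node.category
        node = node.left if go_left else node.right
    return node.value


# Hypothetical tree combining both splitter kinds from the slide.
tree = CategoricalSplit(
    "day", "Sunday",
    left=Leaf(1.0),
    right=NumericSplit("age", 43, left=Leaf(0.0), right=Leaf(1.0)),
)
print(evaluate(tree, {"day": "Sunday", "age": 50}))  # 1.0
print(evaluate(tree, {"day": "Monday", "age": 30}))  # 0.0
```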
25
Decision Tree Example
The famous Titanic survival predictor
sex = male?
  no  → Survived
  yes → age > 9.5?
          yes → Died
          no  → sibsp > 2.5?
                  yes → Died
                  no  → Survived
* sibsp = siblings + spouses
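
The three questions on this slide can be written as nested conditions. The sketch below assumes the commonly published layout of this tree (the extracted slide text lost the arrows): non-males survive, males older than 9.5 die, and young males survive only when sibsp is at most 2.5. The function name and signature are hypothetical.

```python
def titanic_survived(sex: str, age: float, sibsp: int) -> bool:
    """Walk the three-question tree: sex, then age, then sibsp."""
    if sex != "male":
        return True       # not male -> Survived
    if age > 9.5:
        return False      # male older than 9.5 -> Died
    return sibsp <= 2.5   # young male: Died if sibsp > 2.5, otherwise Survived


print(titanic_survived("female", 30, 0))  # True
print(titanic_survived("male", 30, 0))    # False
print(titanic_survived("male", 5, 1))     # True
print(titanic_survived("male", 5, 4))     # False
```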
26
Forest Data Type Example
> MODULE LOAD "./redis-ml.so"
OK
> ML.FOREST.ADD myforest 0 . CATEGORIC sex "male" .L LEAF 1 .R LEAF 0
OK
> ML.FOREST.RUN myforest sex:male
"1"
> ML.FOREST.RUN myforest sex:yes_please
"0"
27
Using Redis-ML With Spark
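scala> import org.apache.spark.ml.classification.RandomForestClassificationModel
scala> import redis.clients.jedis.Jedis
scala> import com.redislabs.client.redisml.MLClient
scala> import com.redislabs.provider.redis.ml.Forest
scala> // pipelineModel is an already-fitted Spark ML pipeline whose last stage is the forest
scala> val rfModel = pipelineModel.stages.last.asInstanceOf[RandomForestClassificationModel]
scala> // Convert the trained trees and store them in Redis under the key "forest-test"
scala> val f = new Forest(rfModel.trees)
scala> f.loadToRedis("forest-test", "localhost")
scala> // Evaluate the forest inside Redis; makeInputString is a helper defined outside this snippet
scala> val jedis = new Jedis("localhost")
scala> jedis.getClient.sendCommand(MLClient.ModuleCommand.FOREST_RUN, "forest-test", makeInputString(0))
scala> jedis.getClient.getStatusCodeReply
res53: String = 1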
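
To make the FOREST_RUN call less opaque, here is a pure-Python sketch of what evaluating a classification forest against an input string involves: parse the `feature:value` pairs, run every tree, and return the majority label. It does not talk to Redis. The comma separator between pairs, the majority-vote rule, and all names in the block are assumptions for illustration; the single tree is the `sex == "male"` example from the Forest Data Type slide.

```python
from collections import Counter
from typing import Callable, Dict, List

Sample = Dict[str, str]
Tree = Callable[[Sample], str]


def parse_sample(text: str) -> Sample:
    """Turn 'sex:male,class:1' into {'sex': 'male', 'class': '1'}."""
    pairs = (item.split(":", 1) for item in text.split(",") if item)
    return {name: value for name, value in pairs}


def run_forest(trees: List[Tree], text: str) -> str:
    """Evaluate every tree on the parsed sample and return the most common label."""
    sample = parse_sample(text)
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]


def sex_tree(sample: Sample) -> str:
    # One categorical split: sex == "male" -> leaf 1, anything else -> leaf 0.
    return "1" if sample.get("sex") == "male" else "0"


print(run_forest([sex_tree], "sex:male"))        # 1
print(run_forest([sex_tree], "sex:yes_please"))  # 0
```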
28
Benchmarking Redis-ML
Step                       Spark + Parquet   Spark + Redis ML
Model Preparation + Save   3785 ms           292 ms
Model Load                 2769 ms           0 ms (model is in memory)
Classification (AVG)       13 ms             1 ms
● Forest size: 15000 trees
● Data: $(SPARK_HOME)/data/mllib/sample_libsvm_data.txt
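
The ratios behind these numbers are easy to check. The sketch below only divides the timings reported on this slide; the helper name `speedup` is made up. Model Load is left out because the Redis side is reported as 0 ms, so a ratio is undefined.

```python
# Timings in milliseconds, copied from the benchmark slide:
# (Spark + Parquet, Spark + Redis ML)
timings = {
    "Model Preparation + Save": (3785, 292),
    "Classification (AVG)": (13, 1),
}


def speedup(parquet_ms: float, redis_ms: float) -> float:
    """How many times faster the Redis ML path is than the Parquet path."""
    return parquet_ms / redis_ms


for step, (parquet_ms, redis_ms) in timings.items():
    print(f"{step}: {speedup(parquet_ms, redis_ms):.1f}x faster")
```

Both steps come out at roughly 13x in favour of Redis ML for this forest and dataset.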
29
Going Forward - More Features
▪ Implement more Spark-ML model types
- SVM
- Naive Bayes Classifier
- Neural Networks
▪ Integration with Redis’ native types
▪ Data Processing (e.g. Word2Vec, TF-IDF)
▪ PMML Support
30
PS: Neural Redis
▪ Developed by Salvatore Sanfilippo
▪ Training is done inside Redis
▪ Online continuous training process
▪ Builds Fully Connected NNs
31
More Resources
Redis-ML:
https://github.com/RedisLabsModules/redis-ml
Spark-Redis-ML:
https://github.com/RedisLabs/spark-redis-ml
Neural-Redis:
https://github.com/antirez/neural-redis