Home of Redis
Analytics at the Speed of
Business with Redis and Spark
Leena Joshi
VP Product Marketing
Who We Are
The open source home and commercial provider
of Redis
Open source. The leading in-memory data
structure store, supporting any high
performance operational or analytic use case.
Redis Cloud
Available since mid-2013
6,100+ enterprise customers
Redis Labs Enterprise Cluster (RLEC)
Available since early-2015
100+ enterprise customers
50,000+ total customers
Redis is a Game Changer
Simplicity
(through Data Structures)
Extensibility
(through Redis Modules)
Performance
Data structures:
• Strings
• Lists
• Sets
• Sorted Sets
• Hashes
• HyperLogLogs
• Bitmaps
• Bit fields
• Geospatial Indexes
Why Use Redis in Analytics
Popular Redis Use Cases
• Geo Search: Location-based Applications
• Data Ingestion: High Throughput Buffering
• Social Functionality: Following, Followers, Relations
• Job & Queue: Any Business Application
• Caching: Any Web or Mobile App
• High Speed Transactions: Business Applications
• Time-Series: Time-Based Analysis
• Analytics: Real-time Computations
Example: Redis for Bid Management
The Application Problem
• Many users bidding on items
• Need to instantly show who’s leading, in what order, and by how much
• May also need to display analytics, such as how many users are bidding in each range
• Disk-based DBMSs are too slow for real-time, high-scale calculations
Why Redis Rocks This
• Sorted sets automatically keep the list of users and scores updated and in order (ZADD)
• ZRANGE and ZREVRANGE return users in ascending or descending bid order; ZREVRANGE from index 0 gives your top users
• ZRANK returns any user's rank in O(log N) time
• ZCOUNT returns a count of users in a bid range
• ZRANGEBYSCORE returns all the users in a range by their bids
Redis Sorted Sets
ZADD item:1 10000 id:2 21000 id:1
ZADD item:1 34000 id:3 35000 id:4
ZINCRBY item:1 10000 id:3
ZREVRANGE item:1 0 0
id:3

item:1 (highest bid first)
id:3 44000
id:4 35000
id:1 21000
id:2 10000
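The sorted-set commands on this slide can be mirrored in a few lines of plain Python to make their semantics concrete. This is only an in-process model, not Redis client code: the `BidBoard` class and its method names are hypothetical, it re-sorts on every read instead of maintaining a skip list as Redis does, and it only handles non-negative indexes.

```python
from typing import Dict, List, Tuple


class BidBoard:
    """Hypothetical in-process stand-in for one Redis sorted set."""

    def __init__(self) -> None:
        self._scores: Dict[str, float] = {}

    def zadd(self, score: float, member: str) -> None:
        # Like ZADD: insert the member or overwrite its score.
        self._scores[member] = score

    def zincrby(self, increment: float, member: str) -> float:
        # Like ZINCRBY: a missing member starts from 0.
        self._scores[member] = self._scores.get(member, 0) + increment
        return self._scores[member]

    def _ordered(self) -> List[Tuple[str, float]]:
        # Ascending by score, ties broken by member string.
        return sorted(self._scores.items(), key=lambda kv: (kv[1], kv[0]))

    def zrevrange(self, start: int, stop: int) -> List[Tuple[str, float]]:
        # Like ZREVRANGE ... WITHSCORES: highest score first, inclusive stop.
        return self._ordered()[::-1][start:stop + 1]

    def zcount(self, low: float, high: float) -> int:
        # Like ZCOUNT: members whose score lies in [low, high].
        return sum(1 for s in self._scores.values() if low <= s <= high)


# Replay the bids from the slide for a single item.
item1 = BidBoard()
for score, member in [(10000, "id:2"), (21000, "id:1"),
                      (34000, "id:3"), (35000, "id:4")]:
    item1.zadd(score, member)
item1.zincrby(10000, "id:3")

leader = item1.zrevrange(0, 0)          # current top bidder
ranking = item1.zrevrange(0, 3)         # full leaderboard
mid_range = item1.zcount(20000, 40000)  # bidders between 20000 and 40000
print(leader, ranking, mid_range)
```

With a real Redis server the same steps would be the ZADD, ZINCRBY, ZREVRANGE and ZCOUNT commands themselves, issued against one key per item.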
Example: Redis for Recommendations
The Application Problem
• Users, items, likes, dislikes, similarities
• Set comparisons of user likes and user dislikes should help create similarity scores, which can then be stored in a sorted set
• Set comparisons of similar users' likes/dislikes with items not purchased by the current user should yield suggestions
• High speed and low latency requirements
Why Redis Rocks This
• Redis Sets are unordered collections of strings; SADD adds objects to each tag
• Set operations execute in memory at high speed
• SINTER and SINTERSTORE intersect multiple sets
• SUNIONSTORE unions multiple sets and stores the result
• SISMEMBER determines membership; SMEMBERS retrieves all values
• Sets and Sorted Sets combined are a great choice for recommendation engines
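One way to read the "set comparisons create similarity scores" idea is sketched here in plain Python. Several things are assumptions for illustration only: the slide does not name a similarity metric, so Jaccard similarity is swapped in; the `likes` data, the `user:N` keys, and the helper names are hypothetical; and liked items stand in for purchases. In Redis the intersections and differences would come from set commands, and the scores would be kept in a sorted set.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical data: one set of liked items per user.
likes: Dict[str, Set[str]] = {
    "user:1": {"item:1", "item:3", "item:14"},
    "user:2": {"item:1", "item:3", "item:22"},
    "user:3": {"item:7"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    # Size of the intersection divided by size of the union.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_users(user: str) -> List[Tuple[str, float]]:
    # Score every other user, most similar first.
    scores = [(other, jaccard(likes[user], likes[other]))
              for other in likes if other != user]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def suggestions(user: str) -> Set[str]:
    # Items the most similar user likes that this user does not have yet.
    best, score = similar_users(user)[0]
    return likes[best] - likes[user] if score > 0 else set()


print(similar_users("user:1"))
print(suggestions("user:1"))
```

The design choice worth noting is that the similarity list is ordered by score, which is the part a sorted set would hold so that the "most similar users" lookup stays a range query.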
Redis Sets
SADD item:1 tag:1 tag:22 tag:24
SADD tag:1 item:1 item:3
SADD tag:2 item:22 item:14 item:3
SINTER tag:1 tag:2
item:3
SUNIONSTORE tag:x tag:1 tag:2
SMEMBERS tag:x
item:1 item:3 item:22 item:14

item:1 = {tag:1, tag:22, tag:24}
tag:1 = {item:1, item:3}
tag:2 = {item:22, item:14, item:3}
tag:x = {item:1, item:3, item:22, item:14}
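The set commands on this slide map directly onto Python's built-in set operators, which makes the expected results easy to check. The sketch below is a plain-Python stand-in, not Redis client code; the `tags` dictionary is a hypothetical substitute for the Redis keyspace, filled with the tag contents shown in the slide's diagram.

```python
# Plain Python sets standing in for Redis Sets; no Redis server involved.
tags = {
    "tag:1": {"item:1", "item:3"},
    "tag:2": {"item:22", "item:14", "item:3"},
}

# SINTER tag:1 tag:2 -> members present in both sets.
in_both = tags["tag:1"] & tags["tag:2"]

# SUNIONSTORE tag:x tag:1 tag:2 -> store the union under a new key.
tags["tag:x"] = tags["tag:1"] | tags["tag:2"]

# SISMEMBER tag:x item:14 -> membership test.
has_item_14 = "item:14" in tags["tag:x"]

print(sorted(in_both))
print(sorted(tags["tag:x"]))
print(has_item_14)
```

Because a set holds each member once, `item:3` appears a single time in the union even though both source sets contain it.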
Redis & Spark
Redis Labs proprietary & confidential information
Spark Operation without Redis
Data Source -> Read to RDD -> Deserialization -> Processing -> Serialization -> Write to RDD -> Data Sink -> Analytics & BI
(steps numbered 1 to 6 on the slide)
Spark Operation with Redis
Data Source -> Processing (Spark SQL & DataFrame) -> Serving Layer -> Analytics & BI
Spark-Redis connector: reads filtered/sorted data, writes filtered/sorted data (steps numbered 1 and 2 on the slide)
Accelerating Spark Time-Series with Redis
Redis is up to 100 times faster than HDFS and over 45 times faster than Tachyon or Spark
More Details About the Redis & Spark Integration
GitHub link: Spark-Redis Connector Package
https://github.com/RedisLabs/spark-redis
How to get started with Spark and Redis:
https://redislabs.com/solutions/spark-and-redis
Blog: https://redislabs.com/blog/connecting-spark-and-redis
Cost Effective Analytics
Price/Performance of Memory Technology
Redis on Flash
Flash used as a RAM extender and NOT as persistent storage
How to Achieve Optimal Price/Performance
By dynamically setting the RAM/Flash ratio
Behind the scenes…
Single Server Results with Dell & Samsung NVMe
• Throughput (ops/sec): avg 2.04M, max 2.14M
• Latency (msec): avg 0.91, max 0.98, 100% below 1 msec
• Disk bandwidth (MB/sec): avg 313 MB read / 9.4 MB write, max 1.71 GB read / 96 MB write
• Network bandwidth (Gb/sec): avg 1.45 Gbps (Tx) / 0.97 Gbps (Rx), max 1.6 Gbps (Tx) / 1.2 Gbps (Rx)
Test setup:
• Redis Labs Enterprise Cluster v3.2
• Dell Xeon CPU E5-2670 v3 @ 2.50GHz
• 4x Samsung NVMe PM1725
• Memtier benchmark (open source tool)
• 100B object size
• 80% read, 20% write
Summary: >2M ops/sec, <1 msec latency, >1GB disk bandwidth
Customer Example: Redis on Flash
• Genome dataset: 31TB of raw data
• Optimized data set through encoding and using Redis Hashes
• Resulting data runs high-speed analyses with 55GB of RAM and 4.5TB of Flash
• 97% annual savings compared to a pure RAM solution

              Redis on RAM             Redis on Flash
RAM size      5TB                      0.5TB
Flash size    N/A                      4.5TB
Servers       on AWS: 21x r3.8xlarge   on P8: 2x S822LC
1yr costs     $489,333                 $15,677
P8 savings                             97%
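The 97% figure follows from the two one-year costs on the slide, and the arithmetic is short enough to check directly. The variable names below are only illustrative; the two dollar amounts are the ones quoted above.

```python
# One-year costs as quoted on the slide (USD).
ram_only_cost_usd = 489_333   # Redis on RAM, 21x r3.8xlarge on AWS
flash_cost_usd = 15_677       # Redis on Flash, 2 servers on P8

# Savings as a share of the pure-RAM cost.
savings_fraction = 1 - flash_cost_usd / ram_only_cost_usd
savings_percent = round(savings_fraction * 100, 1)

# How many times more expensive the pure-RAM deployment is.
cost_ratio = ram_only_cost_usd / flash_cost_usd

print(f"savings: {savings_percent}%  cost ratio: {cost_ratio:.1f}x")
```

The result is about 96.8%, which rounds to the 97% shown on the slide; put another way, the pure-RAM deployment costs roughly 31 times as much.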
RLEC Flash on AWS SSDs - Customer Example
• Next-gen community engagement platform, >200M unique users per month
• Uses Redis as their only database for handling 400k-1M user requests/day (peak of 500k messages/sec on AWS)
• RLEC Flash on AWS SSD instances helps reduce operational costs by up to 70%
“I am yet to encounter limits with Redis Labs’ scalability. It allows me to handle peaks in traffic that grow 2000% without any need to scale my database infrastructure.”
Ishay Green, CTO, Spot.IM
Extending Redis Analytics
What Can Modules Do
• All modules are certified by Redis Labs for full compliance with OSS
Redis, Redis Cloud and Redis Labs Enterprise Cluster (RLEC)
• Full Text Search
• Enhanced JSON
• Graph Operations
• Secondary Indexes
• Linear Algebra
• SQL Support
• Image Processing
• N-Dimension Queries
• …
redisearch: The world's fastest text search engine

Average latency (msec)   Full text search   Prefix search
RLEC                     3.15               2.40
Elasticsearch            21.00              8.70
Solr                     24.57              10.61
Callouts: 7.8x faster, 4.1x faster

Ops/sec                  Full text search   Prefix search
RLEC                     20,045             6,831
Elasticsearch            690                3,686
Solr                     621                3,133
Callouts: 32x higher, 85% higher
Redis Module Hub (www.redismodules.com)
Next Steps
Learn More:
Redis with Spark: https://redislabs.com/solutions/spark-and-redis
Redis on Flash: https://redislabs.com/solutions/redis-for-very-large-datasets
Redis Modules: www.redismodules.com
Home of Redis
Questions?
@socialeena


Big Data Day LA 2016/ NoSQL track - Analytics at the Speed of Light with Redis and Spark, Dave Neilsen, Developer Relations, Redis Labs
