Spark & Hadoop at Production at Scale

  • 1. © 2015 MapR Technologies. Taking Your Spark To Production Scale
  • 2. The Journey To Production Scale: from trials and science projects to large, mission-critical, operational deployments
  • 3. Companies with Spark & MapR in Production: Global Telecom; Healthcare; Global Financial Services
  • 4. 3 Key Issues To Plan For
  • 5. Security Intelligence Operations: Global Managed Security Services delivered on Hadoop. Spark Streaming is used to first check for known threats; data is next processed on Hadoop using MLlib and GraphX; additional SQL querying is done via Spark SQL.
  • 6. Delivers Lightning Fast Analytics for Clients: building the largest Hadoop cluster in Australia; real-time analytics using Spark on MapR, reducing data loading time from hours to minutes; leverages the multi-tenancy, high performance, and reliability of MapR.
  • 7. Next-Gen Genomics: develop a flexible platform to keep up with fast-changing research techniques; POSIX file access lets bioinformaticians use existing tools alongside open source tools (Spark); graph manipulations can be done reliably and at scale using Spark.
  • 8. Real-Time Customer Analytics: MapR Data Lake stores both online and archive data; Spark on MapR reduced ETL processing; NFS moved data into the cluster seamlessly; 1/10th the Total Cost of Ownership vs. the old way; new customer onboarding cut from months to weeks.
  • 9. Databricks & MapR Strategic Partnership (since April 2014): support for the complete Spark stack; engineering & roadmap collaboration; back-end support.
  • 10. The Most Complete Spark Environment: Spark SQL (SQL); Spark Streaming (streaming); MLlib (machine learning); GraphX (graph computation). Foundation For Enterprise-Grade Spark.
  • 11. Operations + Analytics on One Hadoop Platform with SQL Access. Two halves: DB Operations; Real-Time and Actionable Analytics. Components shown: mobile application server; web application server; Customer 360 dashboard; churn analysis; product/service optimization and personalization; real-time ad targeting; data exploration (SQL). Data: user profiles and state; user interactions; real-time location data; web and mobile session state; comments/rankings.
  • 12. Spark + MapR = Ready For Production Success. High Performance: world-record performance on disk. Reliability for Production: SLA-driven applications (high availability, data protection, disaster recovery). 24/7 Best-in-class Global Support: strategic partnership with Databricks to ensure enterprise support for the entire stack. Operational Data Store: MapR-DB + Spark = real-time analytics.
  • 13. Free On-Demand Training: www.mapr.com/training
  • 14. Self-Service Data Exploration: Data Agility with Less IT Required; Single SQL Interface for Structured and Semi-Structured Data
  • 15. MapR Introduces 3 New Spark-Based Quick Start Solutions: Real-Time Security Log Analytics; Time Series Analytics; Genome Sequencing
  • 16. Get Your Tattoo In The MapR Booth! Show off your Kickstart My Heart skills and enter to win Xbox 360 & Guitar Hero
  • 17. Top-Ranked NoSQL; Top-Ranked Hadoop Distribution; Top-Ranked SQL-on-Hadoop Solution

Editor's Notes

  • #3: JN: I’d make the arrow not absolutely vertical; tilt it steeply to the right.
  • #6: http://www.thinkstockphotos.com/image/stock-photo-businessman-selecting-a-futuristic-padlock/179694007/popup?sq=Datacenter%20security/f=CPIHVX/s=Popularity
  • #8: http://www.thinkstockphotos.com/image/stock-photo-scientist-holding-test-tube/461208473/popup?sq=drug%20discovery/f=CPIHVX/s=Popularity
    a. Interested in the ADAM project, which runs on top of Spark, for next-gen genomics. There is a good white paper: search for "Apache Spark ADAM" and the ADAM GitHub page; the notes link to the AMPLab white paper, http://www.eecs.berkeley.edu/Pubs/TechRpts/2013/EECS-2013-207.html. ADAM is a genomics realignment tool, the crux of next-gen genomic medicine; it lets you access and manage the alignment of data much more quickly.
    b. In the existing genomics pipeline, realignment takes many weeks. Researchers drill down, augment, and work through their chemical compounds; geneticists can then come and test their compound against the alignment.
    c. If the alignment does not cover the exact sections they need, they cannot focus; they have to back out and zoom back in on the right set of sequences. That process takes 6 weeks: the geneticist takes 1 day to shift it, then waits 6 weeks to get to the change. With ADAM it takes a matter of hours and geneticists can do it themselves; otherwise it needs a whole team of HPC experts.
    d. This is the case for almost all pharma companies; Novartis is also the same.
    e. ADAM is one tool in a bigger framework; there are several other use cases as well.
  • #9: Objectives: Razorsight's cloud-based predictive analytics software delivers insights to help communications service providers (CSPs) and media companies optimize operations and offer superior customer experiences. “As we grew as a company and big data evolved, there was a lot more data available,” explains Razorsight CTO Suren Nathan. “Today’s data has higher volumes and different structures. There are new types of devices generating data for the Internet of Things, mobile phones using broadband for apps, and VoIP.”
    Challenges: Razorsight’s old technology platform could not keep up with the demands and opportunities of this increasing volume of data. Storage costs were exploding and they wanted to be able to maintain performance and scalability at a lower cost. The prior platform was based on IBM Pure Data (Netezza) as a data warehouse appliance and couldn’t support high-speed data ingestion and reporting. Additionally, it required having a separate database for each customer.
    Solution: MapR came out on top for several reasons and is being used as the primary data store for online and archive data. 1. Having the flexibility of the full Spark stack as part of the Hadoop distribution was very important; Spark helps transition a large part of ETL processing. 2. MapR provided production-class Hadoop with enterprise support. 3. The NFS gateway was critical for easy, high-speed access.
    Business Impact: Customers such as Virgin Mobile LA are reducing churn by tailoring campaigns to the right subscriber at the right time. Much lower TCO compared to IBM Netezza (1/8th the cost). Razorsight is also able to reduce on-boarding time for new customers from 4-6 months to 8-12 weeks.
  • #11: Spark execution engine rides on the Mesos framework and any distributed file system (in this case HDFS)
  • #15: Data Agility with No IT Intervention
  • #18: Just Thought You Might Want To Know….
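
The genomics and customer-analytics slides both credit POSIX/NFS file access with letting existing tools move data into the cluster. A minimal sketch of what that means in practice, using only the Python standard library: once the cluster is mounted over NFS, an ordinary file copy is enough to land data there. The function name `ingest_via_nfs`, the directory `landing/raw`, and the mount path are hypothetical illustrations, not taken from the deck; the demo uses a temporary directory as a stand-in for a real mount point.

```python
import shutil
import tempfile
from pathlib import Path


def ingest_via_nfs(source: Path, mount_root: Path, dest_dir: str) -> Path:
    """Copy a local file into a directory under an NFS-mounted cluster path.

    No cluster-specific client is involved: the mount makes the cluster
    look like a normal directory tree, so standard file APIs are enough.
    """
    target_dir = mount_root / dest_dir
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copy2(source, target)  # plain POSIX copy, preserves timestamps
    return target


def demo() -> str:
    """Run the copy against a temporary directory standing in for the mount."""
    with tempfile.TemporaryDirectory() as tmp:
        # Assumption: a real deployment would use the actual NFS mount path here.
        mount_root = Path(tmp) / "mapr" / "demo.cluster"
        mount_root.mkdir(parents=True)

        source = Path(tmp) / "events.csv"
        source.write_text("subscriber_id,event\n42,call_dropped\n")

        landed = ingest_via_nfs(source, mount_root, "landing/raw")
        return landed.read_text()


if __name__ == "__main__":
    print(demo())
```

The same pattern would apply to any existing command-line tool that reads and writes files, which is the point the slides make about reusing existing bioinformatics tooling alongside Spark.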