© 2014 MapR Technologies
Hadoop and NoSQL Joining Forces
Topics 
Big Data, Hadoop, and NoSQL 
The In-Hadoop Advantage 
NoSQL-on-Hadoop in Action 
Other In-Hadoop Examples 
Integrating with SQL
Big Data is Overwhelming Traditional Systems 
• Mission-critical reliability 
• Transaction guarantees 
• Deep security 
• Real-time performance 
• Backup and recovery 
• Interactive SQL 
• Rich analytics 
• Workload management 
• Data governance 
• Backup and recovery 
ENTERPRISE 
USERS 
Enterprise 
Data 
Architecture 
OPERATIONAL 
SYSTEMS 
ANALYTICAL 
SYSTEMS 
PRODUCTION 
REQUIREMENTS 
PRODUCTION 
REQUIREMENTS 
OUTSIDE SOURCES
Scaling on Traditional Technologies
[Chart: data volume and velocity (low to high) against data variety (low to high)]
• Data volume, velocity: scale up to bigger, faster machines
• Data variety: extensive data modeling and ETL
Scaling on Newer Technologies
[Chart: data volume and velocity (low to high) against data variety (low to high), with repeated NoSQL boxes]
• Data volume, velocity: scale out with commodity hardware
• Data variety: use the right tool for unstructured, multi-structured, semi-structured, non-relational data
Hadoop and NoSQL Relieve the Pressure from Enterprise Systems 
[Diagram labels: OPERATIONAL SYSTEMS, ANALYTICAL SYSTEMS, ENTERPRISE USERS, + NoSQL]
• Data staging
• Archive
• Data transformation
• Data exploration
• Streaming, interactions
Keys for Production Success
1 Reliability and DR
2 Interoperability
3 High performance
4 Supports operations and analytics
You Already Know:
• NoSQL is a class of databases that specialize in: 
– Scale-out on commodity servers – no application-level sharding 
– Flexible data models – no fixed schema required 
• Hadoop is a distributed platform designed for: 
– Storing/processing huge volumes of data cost-effectively 
– Spreading work across many servers (“divide and conquer”) 
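The "divide and conquer" idea can be pictured with a minimal sketch. This is only an illustration under stated assumptions: threads on one machine stand in for servers in a cluster, and the function names (`split_into_chunks`, `count_words`, `merge_counts`, `distributed_word_count`) are hypothetical, not part of Hadoop or any NoSQL product.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, num_workers):
    """Partition records into one chunk per worker (round-robin)."""
    return [records[i::num_workers] for i in range(num_workers)]


def count_words(chunk):
    """'Map' step: each worker counts words in its own chunk only."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def merge_counts(partials):
    """'Reduce' step: combine the per-worker partial counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def distributed_word_count(records, num_workers=3):
    """Divide the input, process the chunks in parallel, merge the results."""
    chunks = split_into_chunks(records, num_workers)
    # Threads are an assumption standing in for separate servers.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(count_words, chunks))
    return merge_counts(partials)
```

Each worker sees only its own chunk, so adding workers (servers) spreads the load without changing the per-chunk logic.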
Before we continue, let’s take a quick look back:
What Would (Did) Google Do?
Google’s operational data store (BigTable) has enabled multiple revolutions within the company:
(1) The transition from batch to real-time
    2004: Web index is batch (GFS/MapReduce)
    2010: Web index is real-time (BigTable)
(2) The explosion in operational applications
Timeline: 2003 GFS, 2004 MapReduce, 2006 BigTable
Operations Vs. Analytics 
Operations (Databases) 
• Real-time 
• Reads/writes/updates 
• Current/recent data 
• Updated regularly 
• Fast inserts/updates 
• Large volumes of data 
Analytics (Hadoop) 
• Batch 
• Reports/Computations 
• Historical data 
• Generally non-volatile 
• Fast retrievals 
• Even larger volumes of data 
But is the data different?
Handling Multiple Workloads
[Diagram labels]
• Application servers: mobile application server, web application server
• Operational: operational NoSQL DBMS
• Analytics: Hadoop
• Between the two: batch import/export
• Workloads: data exploration (SQL), Customer 360 dashboard, churn analysis (predictive analytics)
In-Hadoop Databases
• Single cluster
• High performance, low latency
• Large-scale analytics
• Enterprise-grade HA/DR
• Unified file and table administration
[Diagram labels: mobile application server; web application server; real-time and operational; actionable analytics; real-time ad targeting; product/service optimization and personalization; data exploration (SQL); Customer 360 dashboard; churn analysis (predictive analytics)]
Separate Clusters Versus Single Cluster 
Separate Hadoop and Database 
• Delays analyzing live data 
• Network traffic 
– Heavy bandwidth usage 
– Heavy cleanup upon error 
• Complexity 
– Higher maintenance, risk of error 
– More HA/DR administration 
– Risk to SLAs 
• Unnecessarily duplicated 
resources 
Consolidated Deployment 
• Real-time analysis/computation 
• Data locality 
– Reduced bandwidth utilization 
– Efficient divide-and-conquer analysis 
• Architectural simplicity 
– Lower risk of error 
– Lower administrative overhead 
• No unnecessary data/hardware 
duplication (except for HA/DR)
Databases on Direct Attached Storage (DAS) 
Advantages 
• Fast local file access 
• Lower cost vs. SAN/NAS 
Databases on Networked Storage (SAN/NAS) 
Advantages 
• Snapshot/backup 
• Easy capacity expansion 
• Disaster recovery 
• Improved disk utilization 
• Seamless maintenance 
• Reliable 
Databases on Hadoop (“In-Hadoop”) 
Advantages 
• Benefits of DAS 
• Reduced complexity vs. 
SAN 
• Lower operational cost 
• Faster local file access 
• Easy capacity expansion 
• Dynamic storage utilization 
Hadoop
Lambda Architecture (lambda-architecture.net)
[Diagram: a new data stream feeds a batch layer and a speed layer, whose views are merged in a serving layer]
• Batch layer (Hadoop): all data is stored in HDFS; a batch recompute precomputes views with MapReduce, producing batch views
• Speed layer (Storm): processes the stream of real-time data; real-time increments update partial aggregates, producing real-time views
• Serving layer (HBase): merges batch views and real-time views into a merged view
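The three layers of the Lambda Architecture can be sketched in a few lines. This is a toy, in-memory illustration under stated assumptions: plain dictionaries stand in for HDFS, Storm, and HBase, the example metric (page-view counts) is invented, and all names (`precompute_batch_view`, `SpeedLayer`, `merged_view`) are hypothetical.

```python
from collections import defaultdict


def precompute_batch_view(all_data):
    """Batch layer: recompute totals from the full dataset of (page, hits)."""
    view = defaultdict(int)
    for page, hits in all_data:
        view[page] += hits
    return dict(view)


class SpeedLayer:
    """Speed layer: keep incremental partial aggregates for recent events."""

    def __init__(self):
        self.realtime_view = defaultdict(int)

    def increment(self, page, hits=1):
        """Apply one real-time increment as an event arrives."""
        self.realtime_view[page] += hits

    def reset(self):
        """Discard increments once a batch recompute has absorbed them."""
        self.realtime_view.clear()


def merged_view(batch_view, realtime_view):
    """Serving layer: merge batch and real-time views at query time."""
    pages = set(batch_view) | set(realtime_view)
    return {p: batch_view.get(p, 0) + realtime_view.get(p, 0) for p in pages}
```

A possible usage: build the batch view from historical records, call `increment` for each new event, and answer queries from `merged_view(batch, speed.realtime_view)`. After the next batch recompute covers the recent events, `reset` clears the speed layer so nothing is counted twice.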
Enterprise Data Hub Architecture
[Diagram: data sources are loaded into the MapR Distribution including Hadoop, which serves data marts, the data warehouse, and BI reports and applications]
• Steps: Load more data sources; Enrich data in Hadoop; Analyze; Offload / Enrich / Reload
• Data sources: relational, SaaS, mainframe; documents, emails; blogs, tweets, link data; log files, clickstreams
• Data movement labels: parse, profile, ETL; load; replicate, CDC; streaming (high-speed streaming); cleanse, match
• MapR Distribution including Hadoop:
  – MapR Control System (MCS), Hadoop User Experience (HUE)
  – Batch processing: MR, YARN, Hive, Pig, etc.
  – Interactive querying: Drill, Impala, Presto, etc.
  – HBase, other data stores
  – MapR Data Platform: MapR-DB Tables
• Downstream: data marts, data warehouse, BI reports and applications
Customer data, network 
security event data 
Anomaly detection on 
large volumes of security 
event data, analytics on 
customer data to enable 
incremental sales 
Industry data analysis, 
SaaS-based reporting 
Advertising 
Automation 
Cloud 
Buyers 
Cloud 
Sales performance 
management data 
combined with fast 
responsiveness SaaS-delivered 
reports
Customer profile data, 
customer behavior data 
Analytics on customer 
behavior for better 
recommendations 
Telecommunications Company
MapR Overview 
BIG 
DATA 
BEST 
PRODUCT 
BUSINESS 
IMPACT 
Hadoop 
Top Ranked 
Production 
Success
The Power of the Open Source Community 
Provisioning 
& 
coordination 
Savannah* 
Workflow 
& Data 
Governance 
Data 
Integration 
& Access 
Hue 
HttpFS 
Flume Knox* Falcon* 
Management
APACHE HADOOP AND OSS ECOSYSTEM 
Streaming 
Storm* 
NoSQL & 
Search 
Solr 
MapR Data Platform 
Security 
SQL 
Drill* 
Shark 
Impala 
YARN 
Batch 
Spark 
Cascading 
Pig 
Spark 
Streaming 
HBase 
Juju 
ML, Graph 
GraphX 
MLLib 
Mahout 
MapReduce 
v1 & v2 
EXECUTION ENGINES DATA GOVERNANCE AND OPERATIONS 
Tez* 
Accumulo* 
Hive 
Sqoop Sentry* Oozie ZooKeeper 
MapR-DB MapR-FS 
* Certification/support planned for 2014
MapR-DB: Powerful NoSQL Integrated with Hadoop 
Benefit                             Features
High Performance                    Over 1 million ops/sec with 10 nodes, in-memory processing
Continuous Low Latency              No I/O storms, no compaction delays
24x7 Applications                   Instant recovery, online schema modification, snapshots, mirroring
Consistency                         Strong data consistency, row-level ACID transactions
Simplified Database Administration  No processes to manage, automated splits, self-tuning
High Scalability                    1 trillion tables, trillions of rows, millions of columns
Low TCO                             Files and tables on one platform, more work with fewer nodes
Row groups: Performance, Reliability, Easy Administration
MapR-DB (in MapR Enterprise Database Edition) 
MapR-DB 
• NoSQL Table-Style Store
• Apache HBase API
• In-Hadoop Database
[Diagram: storage stacks, listed top to bottom]
• Other Distros: HBase, JVM, HDFS, JVM, ext3/ext4, Disks
• MapR: Tables/Files, Disks
Fast, scalable, reliable.
HBase API, in-memory option, Hadoop integration.
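To make "table-style store" concrete, here is a minimal in-memory sketch of the general wide-column data model: rows identified by a sorted row key, each holding an open-ended set of (column family, qualifier) cells. It is an illustration only. `ToyTable` and its methods are hypothetical names, and this is neither the Apache HBase API nor how MapR-DB is implemented.

```python
from bisect import bisect_left, insort


class ToyTable:
    """Toy wide-column table: row key -> {(family, qualifier): value}."""

    def __init__(self):
        self._row_keys = []  # kept sorted so range scans follow key order
        self._rows = {}

    def put(self, row_key, family, qualifier, value):
        """Write one cell; rows need not share the same columns."""
        if row_key not in self._rows:
            insort(self._row_keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][(family, qualifier)] = value

    def get(self, row_key):
        """Return a copy of all cells in a row (empty dict if absent)."""
        return dict(self._rows.get(row_key, {}))

    def scan(self, start_key, stop_key):
        """Yield (row_key, cells) for start_key <= row_key < stop_key."""
        i = bisect_left(self._row_keys, start_key)
        while i < len(self._row_keys) and self._row_keys[i] < stop_key:
            key = self._row_keys[i]
            yield key, dict(self._rows[key])
            i += 1
```

Because no schema is declared up front, one row can carry an `activity` column that other rows lack, which is the "flexible data model" property in miniature. Keeping row keys sorted is what makes range scans over adjacent keys straightforward.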
Consistent, Low Read Latency 
[Chart legend: MapR-DB read latency vs. others' read latency]
Other In-Hadoop Database Technologies 
• Databases in Hadoop 
– Apache HBase 
– Apache Accumulo 
– Splice Machine 
– MarkLogic 
• Data Warehouses on Hadoop 
– HP Vertica 
– Pivotal HAWQ
What Other Trends? 
• SQL query engines 
– Apache Drill 
– Impala 
– Presto 
– Etc. 
• In-memory processing 
– GridGain 
– Apache Spark 
– HAMRTech
SQL Query Engines for Hadoop and NoSQL Together 
Impala
APACHE DRILL
• Pioneering Data Agility for Hadoop
• Apache open source project
• Scale-out execution engine for low-latency queries
• Unified SQL-based API for analytics & operational applications
Vibrant Community
• 40+ contributors
• 150+ years of experience building databases and distributed systems
Q & A
Engage with us! 
@mapr maprtech 
dalekim@mapr.com 
MapR 
maprtech 
mapr-technologies


Hadoop and NoSQL joining forces by Dale Kim of MapR
