© 2015 MapR Technologies
MapR-Elasticsearch Integration
Brian Parkison
bparkison@maprtech.com
Agenda
Introduction to MapR-DB
Integration with Elasticsearch
Extending to other external sinks
Introduction to MapR-DB
MapR-DB and MapR Enterprise Database Edition
Hadoop-Enabled Architecture
• True operational, real-time analytics
• Reduced complexity and cost versus separate-cluster deployments
• Leverage existing HBase skill sets and expertise
Proven Production Success
• Zero downtime
• Zero database administration
• Integrated security
Consistent High Performance at Any Scale
• High throughput
• Continuous low latency
• Extreme database scalability
MapR-DB: The Best In-Hadoop Database
[Figure: a typical stack (HBase on a JVM, HDFS on a JVM, ext3/ext4, disks) compared with MapR-DB (tables and files in MapR-DB)]
The most scalable, enterprise-grade NoSQL database that supports online applications and analytics
Consistent, Low Read Latency
[Chart: MapR-DB read latency compared with others' read latency]
Integration with Elasticsearch
Motivations
Key-value stores are very fast, but they don’t provide efficient searches on anything but the row key
- Secondary indexes don’t allow the kinds of free-form text searches that Elasticsearch (ES) supports
Customers want to run complex queries on their data without:
- Full table scans
- Application-level, in-sync updates to Elasticsearch
Provide near real-time access to MapR-DB data in ES queries
- At internet-scale volumes
Robust, fault-tolerant architecture with no changes to the operational application
Inside Lonely Planet
Let’s take THIS, and let’s come up with THIS:
● What activities can I do on a ‘$’ budget?
● Where can I see some “Howler monkeys”?
● Can I zipline with Arenal Volcano views?
● Tell me some “$$” Mexican restaurants that are within 5 miles of Monteverde.
● Give me all the hotels with a user rating of 8.5 or higher that are less than 5 years old.
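As a rough illustration of how questions like the last two might be expressed once the data is in Elasticsearch, the sketch below builds ES 1.x-style query bodies (a `filtered` query with `range` and `geo_distance` filters) as plain Python dicts. The field names (`rating`, `opened`, `price`, `cuisine`, `location`) are hypothetical, `price` is assumed to be a not_analyzed string, `location` a geo_point, and the Monteverde coordinates are left to the caller.

```python
def hotels_query(min_rating=8.5, max_age_years=5):
    """Hotels rated >= min_rating that opened within the last max_age_years.

    Hypothetical fields: `rating` (number) and `opened` (date).
    """
    return {
        "query": {
            "filtered": {  # ES 1.x style: a query plus a non-scoring filter
                "query": {"match_all": {}},
                "filter": {
                    "bool": {
                        "must": [
                            {"range": {"rating": {"gte": min_rating}}},
                            # Date math: opened later than "now minus N years"
                            {"range": {"opened": {"gt": "now-%dy" % max_age_years}}},
                        ]
                    }
                },
            }
        }
    }


def restaurants_query(lat, lon, price="$$", cuisine="mexican", radius="5mi"):
    """Restaurants of one cuisine and price band within `radius` of a point.

    Assumes `price` is a not_analyzed string and `location` is a geo_point.
    """
    return {
        "query": {
            "filtered": {
                "query": {"match": {"cuisine": cuisine}},
                "filter": {
                    "bool": {
                        "must": [
                            {"term": {"price": price}},
                            {"geo_distance": {
                                "distance": radius,
                                "location": {"lat": lat, "lon": lon},
                            }},
                        ]
                    }
                },
            }
        }
    }
```

Filters are used for the rating, age, price, and distance constraints because they are yes/no conditions; only the free-text part (`cuisine`) goes through a scoring query.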
Replication Architecture
[Diagram: on the MapR-DB cluster, MapR servers host Volume 1 with Table 1 through Table n, which receive DB client operations. Each MapR-DB write is sent as a replication stream to gateway nodes running the replication gateway and an ES client, which write it to the ES cluster (the replica). ES client operations go to the ES cluster.]
Elasticsearch Client Management
Supports both node and transport Elasticsearch clients
Updates are flushed to Elasticsearch as a BulkRequest every 128 KB
Multiple ES clients are started on each gateway
- Flushes can happen in parallel
- Ordering is maintained for any individual row/document
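The batching and ordering rules above can be modeled roughly as follows: each change is routed to one of several per-client buffers by a stable hash of its row key, so all changes to a given row land in the same buffer and stay in order, while different buffers are independent and could be flushed in parallel (this sketch flushes them sequentially for simplicity). A buffer flushes once it holds at least 128 KB. This is a hypothetical model, not the gateway's code: `ClientBuffer`, `Router`, and the `send` callback are illustrative names, and a flushed batch is just a list standing in for an Elasticsearch BulkRequest.

```python
import zlib
from typing import Callable, List, Tuple

FLUSH_THRESHOLD = 128 * 1024  # bytes; the flush size quoted on the slide

Change = Tuple[str, bytes]  # (row key, serialized update document)


class ClientBuffer:
    """Accumulates updates for one (hypothetical) ES client."""

    def __init__(self, send: Callable[[List[Change]], None]):
        self.send = send  # stands in for submitting a BulkRequest
        self.pending: List[Change] = []
        self.size = 0

    def add(self, row_key: str, doc: bytes) -> None:
        self.pending.append((row_key, doc))
        self.size += len(doc)
        if self.size >= FLUSH_THRESHOLD:
            self.flush()

    def flush(self) -> None:
        if self.pending:
            self.send(self.pending)
            self.pending, self.size = [], 0  # start a fresh batch


class Router:
    """Pins each row key to one buffer so per-row ordering is preserved."""

    def __init__(self, num_clients: int, send: Callable[[int, List[Change]], None]):
        # Default argument i=i binds each buffer to its own client index.
        self.buffers = [
            ClientBuffer(lambda batch, i=i: send(i, batch)) for i in range(num_clients)
        ]

    def route(self, row_key: str, doc: bytes) -> None:
        # crc32 is stable across processes, unlike Python's salted str hash.
        idx = zlib.crc32(row_key.encode("utf-8")) % len(self.buffers)
        self.buffers[idx].add(row_key, doc)

    def flush_all(self) -> None:
        for buf in self.buffers:
            buf.flush()
```

Routing by row key, rather than round-robin, is what lets batches from different clients be sent concurrently without two updates to the same document racing each other.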
Data Type Conversions
The gateway converts the change-log stream into JSON documents and builds UpdateRequests from them
By default, ES treats everything as a string
Add a mapping to Elasticsearch before setting up replication
https://www.elastic.co/guide/en/elasticsearch/reference/1.4/mapping-core-types.html
Use a custom conversion class
- Implement com.mapr.fs.external.es.MapRESConverter
- Provide the path to the JAR file and the class name when setting up replication
- May need to be combined with manually creating a mapping in ES
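A rough sketch of why a mapping or custom converter matters: change-log values arrive as raw bytes, so unless something decodes them into typed values, they can only be emitted as strings. The `HotelConverter` class below is hypothetical and only loosely mirrors the role of `com.mapr.fs.external.es.MapRESConverter`, whose real (Java) signature the slides do not show; the column names, the big-endian byte encodings, and the ES 1.x mapping for a `hotel` type are likewise illustrative assumptions.

```python
import struct
from typing import Dict


def default_convert(columns: Dict[str, bytes]) -> Dict[str, object]:
    """Without type information, every value can only become a string."""
    return {name: raw.decode("utf-8", errors="replace") for name, raw in columns.items()}


class HotelConverter:
    """Hypothetical custom converter that knows how each column was encoded."""

    def convert(self, columns: Dict[str, bytes]) -> Dict[str, object]:
        doc: Dict[str, object] = {}
        for name, raw in columns.items():
            if name == "rating":
                # Assumed encoding: 8-byte big-endian IEEE 754 double
                doc[name] = struct.unpack(">d", raw)[0]
            elif name == "rooms":
                # Assumed encoding: 4-byte big-endian signed integer
                doc[name] = struct.unpack(">i", raw)[0]
            else:
                doc[name] = raw.decode("utf-8")
        return doc


# Hypothetical ES 1.x mapping for a "hotel" type, created before replication
# starts so that the typed values above are indexed as numbers, not strings.
HOTEL_MAPPING = {
    "hotel": {
        "properties": {
            "name": {"type": "string"},
            "rating": {"type": "double"},
            "rooms": {"type": "integer"},
        }
    }
}
```

The converter and the mapping have to agree: emitting a number for `rating` only helps range queries if the index also maps `rating` as a numeric type, which is why the two are often set up together.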
Future Extensions
Supporting Other External Sinks
ES integration is a specific implementation of a generic external replication interface
Support for other sinks could easily be added in the future
- e.g., Spark Streaming
The API could also be opened up, giving others access to the replication stream
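One way to picture a generic external replication interface is an abstract sink that the gateway feeds ordered change records, with a search-index sink as one implementation and other consumers plugged in alongside it. Everything here (`ChangeRecord`, `ExternalSink`, `ReplicationGateway`, and the two in-memory sinks) is a hypothetical illustration, not MapR's API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ChangeRecord:
    """One entry from the (hypothetical) replication stream."""
    table: str
    row_key: str
    columns: Dict[str, object]
    deleted: bool = False


class ExternalSink(ABC):
    """Anything that can consume the ordered replication stream."""

    @abstractmethod
    def apply(self, record: ChangeRecord) -> None: ...


class InMemorySearchSink(ExternalSink):
    """Stand-in for an Elasticsearch sink: keeps one document per row."""

    def __init__(self):
        self.docs: Dict[str, Dict[str, object]] = {}

    def apply(self, record: ChangeRecord) -> None:
        key = f"{record.table}/{record.row_key}"
        if record.deleted:
            self.docs.pop(key, None)
        else:
            # Partial update: merge new column values into the existing doc
            self.docs.setdefault(key, {}).update(record.columns)


class EventLogSink(ExternalSink):
    """Stand-in for a streaming consumer that wants every change in order."""

    def __init__(self):
        self.events: List[ChangeRecord] = []

    def apply(self, record: ChangeRecord) -> None:
        self.events.append(record)


class ReplicationGateway:
    """Delivers each change, in arrival order, to every registered sink."""

    def __init__(self, sinks: List[ExternalSink]):
        self.sinks = sinks

    def publish(self, record: ChangeRecord) -> None:
        for sink in self.sinks:
            sink.apply(record)
```

Keeping the stream format independent of any one sink is what would let a new target be added by writing one more `apply` implementation rather than changing the replication path.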
Q & A
Find my presentation and other related resources here:
http://events.mapr.com/Elastic
(You can also find this link on the event’s page at meetup.com.)
Slide decks
Whiteboard & demo videos
Free Hadoop Sandbox
Free On-Demand Training
Free eBooks
And more...
