Mobile Data with Couchbase Lite & Big Data HPCC Systems
By Fujio Turner
What is Couchbase Lite?
NoSQL JSON Document Database for Mobile
Embedded Database: Your Code + Couchbase Lite (0.5 MB)
Why do I need Couchbase Lite?
Mobile Myths: The mobile network is:
1. Always Available
2. Always High Performing
How Couchbase Lite tackles the Mobile Myths
Local data is always faster,
but I need to save the data non-locally
and/or I need to send data to other mobile devices
EZ Data Syncing with Couchbase Sync Gateway
https://github.com/couchbase/sync_gateway
How Sync Gateway Works
• http(s):// REST server
• Channels
• Authentication & Sessions
• Definable channel rules via JavaScript
{"data":"yes"}
Written in:
Data Flow:
CRUD:
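As a rough illustration of CRUD against Sync Gateway's REST server, the Python sketch below builds (but does not send) an HTTP PUT that would create or update a JSON document. The host, port 4984, database name `mydb`, document ID, and `channels` value are assumptions for illustration, not values from the slides; adjust them to your own Sync Gateway configuration.

```python
import json
import urllib.request


def build_put_request(base_url, db, doc_id, doc):
    """Build an HTTP PUT request that stores `doc` as JSON under /{db}/{doc_id}."""
    url = f"{base_url}/{db}/{doc_id}"
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical values: a local Sync Gateway, a database named "mydb",
# and a document tagged with an assumed "public" channel.
req = build_put_request(
    "http://localhost:4984",
    "mydb",
    "doc1",
    {"data": "yes", "channels": ["public"]},
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it. How a document is routed to channels depends on the JavaScript rules defined in the gateway, so the `channels` property here only has an effect if those rules read it.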
Who is using Couchbase Lite?
How
Uses Couchbase Lite
https://youtu.be/tYolHnbCavA
What Big Data solution is ready for the next 20-plus years?
What is HPCC Systems? Who is LexisNexis?
LexisNexis is a provider of legal, tax, regulatory, news, business information, and analysis to legal, corporate, government, accounting, and academic markets.
LexisNexis has been in business since 1977 and has over 30,000 employees worldwide.
LexisNexis Risk is the division of LexisNexis that focuses on data, Big Data processing, linking, and vertical expertise. It supports HPCC Systems as an open source project under the Apache 2.0 License.
Comparison
JAVA C++
Petabytes
1-80,000 Jobs/day
Since 2005
Exabytes
Since 2000
Indexed: 2K-3K Jobs/sec*
? ? ? ? ? ?
Thor Roxie
Block Based File Based
In-Memory: 30 - 40 Jobs/min*
Non-Indexed: 4-1,040,000 Jobs/day
 *based on job (size / result set / complexity)
“I’m sub-second
fast.”
“I can query all
or part of your
data.”
Thor Roxie
Single Threaded
Hard Disk
Index(optional)
Multi-Threaded
Hard Disk
Index(optional)
In-memory
SSD
Either/Both
Architecture
Business Development Customers
1 20
Non-Indexed Full Data Set
http://hpccsystems.com/why-hpcc/benchmarks
How is Data Stored on HPCC Systems?
Example: Customer Data May 2010 (300GB File)

Name   State  Age
Kevin  CA     45
Mark   MI     27
Sara   FL     64
Store Data
Data is distributed evenly in the cluster with replica copies and is seen as a file (example below).
File Name: ~/customers_2010-05
[Diagram: Thor Master and Thor Slaves; the rows Kevin CA 45, Mark MI 27, and Sara FL 64 are spread across the slaves]
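To make "distributed evenly with replica copies" concrete, here is a minimal Python sketch. It assumes a simple round-robin placement with each replica stored on the next node; this is an illustrative scheme only, not HPCC Systems' actual placement logic, and the function name `distribute` is hypothetical.

```python
def distribute(records, num_nodes, replicas=1):
    """Spread records round-robin over nodes and copy each one to the next node(s).

    Illustrative assumption: replicas go to the following node(s) in the ring.
    """
    nodes = [{"primary": [], "replica": []} for _ in range(num_nodes)]
    for i, record in enumerate(records):
        home = i % num_nodes
        nodes[home]["primary"].append(record)
        for step in range(1, replicas + 1):
            nodes[(home + step) % num_nodes]["replica"].append(record)
    return nodes


# The three customer rows from the slides, placed on three slave nodes.
customers = [("Kevin", "CA", 45), ("Mark", "MI", 27), ("Sara", "FL", 64)]
cluster = distribute(customers, num_nodes=3)
for number, node in enumerate(cluster):
    print(number, node)
```

Each node ends up with one primary row and one replica of its neighbour's row, so losing a single node does not lose data in this toy layout.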
Dali: File Location & Job Scheduler
File locations are stored on disk.
Query: What state do most people live in?
[Diagram: ESP, Dali (File Location & Job Scheduler), Thor Master, Thor Slaves]
1a. A pre-compiled query is triggered (mostly used in Roxie).
1b. An ad-hoc query is submitted.
2. The query is sent to Dali to get file locations.
3. The job is placed in a queue to be sent to Thor Master. Thor Master coordinates job execution on the Thor Slave nodes.
Jobs are done locally on the slaves and/or coordinated globally by the master.
4. The job is returned, optionally grouped and sorted at run time:
MI 500
CA 120
FL 7
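The flow above (work done locally on the slaves, then coordinated globally by the master) can be sketched in Python for the question "What state do most people live in?". This is a conceptual sketch only, not ECL and not how Thor is implemented; the function names and the extra customer rows (Ann, Bob, Eve) are hypothetical.

```python
from collections import Counter


def local_count(rows):
    """Slave step: count people per state using only the rows held locally."""
    return Counter(state for _name, state, _age in rows)


def merge_counts(partial_counts):
    """Master step: combine the per-slave counts, then sort by count, descending."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total.most_common()


# Hypothetical partitions of a customer file across three slaves.
slave_partitions = [
    [("Kevin", "CA", 45), ("Mark", "MI", 27)],
    [("Sara", "FL", 64), ("Ann", "MI", 31)],
    [("Bob", "MI", 52), ("Eve", "CA", 29)],
]
result = merge_counts(local_count(rows) for rows in slave_partitions)
print(result)
```

Only the small per-state counts travel to the master, which is the point of doing the grouping locally first.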
Multiple other actions can be done on the data in a single job:
SORT, GROUP, DEDUP, JOIN, MERGE, BETWEEN, LENGTH, REGEX, ROUND, SUM, COUNT, TRIM, WHEN, AVE, CASE, NORMALIZE, DENORMALIZE, K-MEANS, and more.
Sort
Count
Group
Classification
(ROXIE) 0.27 seconds to (THOR) few hours
Country = ‘US’
Join
Index of
~/facebook_2013
Query is Completed in a Single Job!
Asynchronously
~/facebook_2013
Country = ‘US’
~/twitter_2013
optional
Speed - Part 1: Indexing
• index per file
• customize by field(s)
File Name: ~/customers_2010-05
File Name: ~/customers_2010-05_index
[Diagram: Thor Master and Thor Slaves, each slave holding an index next to its data; example entries: CA row #3, MI row #17, MI row #4, FL row #5]
1 40
Non-Indexed
1 200
To
Indexed
Example Index
CA row #3
MI row #17
MI row #4
FL row #5
Example Index
male row #345
female row #4
male row #97
female row #267
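The example indexes above map a field value to the row numbers where it occurs. A minimal Python sketch of that idea follows; it assumes a plain in-memory dictionary and is not the on-disk index format HPCC Systems uses. The function name `build_index` and the fourth row (Mia) are hypothetical.

```python
from collections import defaultdict


def build_index(rows, field):
    """Map each value of `field` to the list of row numbers that contain it."""
    index = defaultdict(list)
    for row_number, row in enumerate(rows):
        index[row[field]].append(row_number)
    return dict(index)


rows = [
    {"name": "Kevin", "state": "CA", "age": 45},
    {"name": "Mark", "state": "MI", "age": 27},
    {"name": "Sara", "state": "FL", "age": 64},
    {"name": "Mia", "state": "MI", "age": 38},  # hypothetical extra row
]
state_index = build_index(rows, "state")

# An indexed lookup touches only the matching rows instead of scanning the file.
michigan = [rows[n]["name"] for n in state_index["MI"]]
print(state_index, michigan)
```

One index is built per file and per chosen field, which matches the "index per file" and "customize by field(s)" points on the indexing slide.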
Speed - Part 2: Roxie
[Diagram: Roxie Master and Roxie Slaves, each slave holding an index]
• Index In-Memory, or Index In-Memory & Part or All Data
• Roxie is Multi-Threaded
• SSDs are OK - write few / read many
2004
Common Cluster
Thor Master, Thor Slaves, Dali, ESP, Roxie Master, Roxie Slaves
Data is a mix of structured and unstructured. Use Thor to do ETL and send results to Roxie for user queries.
HPCC Systems 5.2
New JSON file support
https://github.com/couchbase/sync_gateway/wiki/Webhooks
Flow Data
From: Sync Gateway
To: HPCC Systems
{"data":"yes"}
Sync Gateway's Webhooks API lets you catch every JSON document coming into Sync Gateway.
Couchbase Lite to HPCC Systems Transport
{"data":"yes"}
A simple Python web server that catches every HTTP POST from Sync Gateway and writes it to a file for HPCC Systems to store.
https://github.com/househippo
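A minimal sketch of such a transport, using only the Python standard library, might look like the following. It is not the author's code from the repository above; the function name `make_server`, the default port 8000, and the one-JSON-document-per-line output format are assumptions for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_server(output_path, port=8000, host="127.0.0.1"):
    """Return an HTTP server that appends each POSTed JSON document to output_path."""

    class WebhookHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length)
            try:
                doc = json.loads(body)
            except ValueError:
                # Reject bodies that are not valid JSON.
                self.send_response(400)
                self.end_headers()
                return
            # Assumed layout: one JSON document per line, appended to the file.
            with open(output_path, "a", encoding="utf-8") as out:
                out.write(json.dumps(doc) + "\n")
            self.send_response(200)
            self.end_headers()

        def log_message(self, fmt, *args):
            # Silence the default per-request logging.
            pass

    return HTTPServer((host, port), WebhookHandler)
```

Calling `make_server("sync_gateway_docs.json").serve_forever()` would start listening; the Sync Gateway webhook would then be pointed at that address, and the resulting file handed to HPCC Systems for loading. The file name here is hypothetical.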
Learning More - Couchbase Lite
INSTALL in 5 Minutes
Download: http://couchbase.com/download
Source Code: https://github.com/couchbase
http://developer.couchbase.com/mobile/get-started/get-started-mobile/index.html
Mountain View, CA
San Francisco, CA
Learning More - HPCC Systems
INSTALL in 5 Minutes
Download: http://hpccsystems.com/download/
or
Source Code: https://github.com/hpcc-systems
Atlanta, GA
Mountain View, CA
https://youtu.be/8SV43DCUqJg


NoSQL Couchbase Lite & BigData HPCC Systems
