A survey on NoSQL database integration
Luiz Henrique Zambom Santana
Prof. Dr. Ronaldo dos Santos Mello
Profa. Dra. Carina Dorneles
Agenda
• Background
• NoSQL
• Global vs. Local
• Model
• Related Works
• Comparison
• Taxonomy
• Conclusions
Background: NoSQL
Sadalage and Fowler, 2012
(http://martinfowler.com/books/nosql.html)
Not only SQL
Nathan Marz, 2014
(http://www.slideshare.net/nathanmarz/runaway-complexity-in-big-data-and-a-plan-to-stop-it)
Relational databases will be a footnote in history
Background: Global-as-view vs. Local-as-view
● GAV (Global-as-view)
○ mapping from entities in the mediated schema to entities in the original sources
● LAV (Local-as-view)
○ mapping from entities in the original sources to the mediated schema
● The latter approach requires more sophisticated inferences to resolve a query on the mediated schema, but makes it easier to add new data sources to a (stable) mediated schema.
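The difference between the two mappings can be illustrated with a minimal sketch. Everything here is hypothetical and not taken from any of the surveyed works: the two sources (`doc_store`, `kv_store`), the mediated relation `person(name, city)`, and the helper names. Under GAV the mediated relation is written directly as a view over the sources; under LAV each source is only described in terms of the mediated schema, and the mediator has to work out which sources can answer a query.

```python
# Hypothetical sources: a document store and a key-value store.
doc_store = [{"full_name": "Ana", "address": {"city": "Florianopolis"}}]
kv_store = {"user:Bob": "Curitiba"}


# GAV: the mediated relation person(name, city) is defined as a view over
# the sources. A query on the mediated schema just unfolds this definition.
def gav_person():
    for doc in doc_store:
        yield (doc["full_name"], doc["address"]["city"])
    for key, city in kv_store.items():
        yield (key.split(":", 1)[1], city)


# LAV: each source is described as a view over the mediated schema.
# "attributes" says which mediated attributes the source can provide,
# "scan" returns its content expressed in mediated-schema terms.
lav_descriptions = [
    {
        "attributes": {"name", "city"},
        "scan": lambda: (
            {"name": d["full_name"], "city": d["address"]["city"]}
            for d in doc_store
        ),
    },
    {
        "attributes": {"name", "city"},
        "scan": lambda: (
            {"name": k.split(":", 1)[1], "city": v} for k, v in kv_store.items()
        ),
    },
]


def lav_query(wanted):
    """Answer a projection query using only sources whose description covers it."""
    rows = []
    for source in lav_descriptions:
        if wanted <= source["attributes"]:  # the (very simplified) inference step
            rows.extend({a: r[a] for a in wanted} for r in source["scan"]())
    return rows
```

Adding a new source under LAV only means appending one more description to `lav_descriptions`; under GAV the body of `gav_person` itself would have to be rewritten. Real LAV mediators need far more elaborate query rewriting than the subset test used here.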
Model
• Hypothesis:
Dey, Akon, Alan Fekete, and Uwe Röhm. "Scalable transactions across heterogeneous NoSQL key-value data stores." Proceedings of the VLDB Endowment 6.12 (2013): 1434-1439.
• VLDB Endowment
• Qualis A1
• Impact Factor 1.568
• Why is it important?
• Seminal
• Transactions
• “Weak” global-as-view
Zhang, Duo, Benjamin Rubinstein, and Jim Gemmell. "Principled graph matching algorithms for integrating multiple data sources." (2014).
• IEEE Transactions on Knowledge and Data Engineering (TKDE)
• Qualis A1
• Impact Factor 2.067
• Why is it important?
• Graph matching algorithms
• Entity resolution
• Shows that integration is far more complicated in NoSQL applications
• Local-as-view
Da Silva, Daniel L., et al. "A Computational Framework for Integrating and Retrieving Biodiversity Data on a Large Scale." Big Data (BigData Congress), 2014 IEEE International Congress on. IEEE, 2014.
• IEEE International Congress on Big Data
• No Qualis (yet)
• Impact factor
• Why is it important?
• Integrating and Retrieving Biodiversity Data
• Global-as-view
• Resembles the Lambda Architecture
Kiran, V. K., and R. Vijayakumar. "Ontology based data integration of NoSQL datastores." Industrial and Information Systems (ICIIS), 2014 9th International Conference on. IEEE, 2014.
• 2014 9th International Conference on Industrial and Information Systems (ICIIS)
• Qualis B1
• Why is it important?
• Intermediate model
• Global-Local-as-view
• Information extraction may require sourcing data from multiple data sources, establishing relationships among them, and querying across these data sources together.
Kaur, Karamjit, and Rinkle Rani. "Managing Data in Healthcare Information Systems: Many Models, One Solution." Computer 3 (2015): 52-59.
• IEEE Computer
• Qualis A1
• Impact Factor 1.443
• Global-as-view
• Why is it important?
• Because healthcare data comes from multiple, vastly different sources, databases must adopt a range of models to process and store it. A polyglot-persistent framework combines relational, graph, and document data models to accommodate information variety.
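As a rough illustration of polyglot persistence, the sketch below puts one facade in front of three stand-in stores, one per data model. It is not the framework from the paper: the class `PolyglotFacade`, its methods, and the patient data are hypothetical, the relational store is an in-memory SQLite database, and the document and graph stores are plain Python dictionaries.

```python
import sqlite3


class PolyglotFacade:
    """Hypothetical single entry point over three in-memory stand-in stores."""

    def __init__(self):
        # Relational model: structured, fixed-schema patient records.
        self.relational = sqlite3.connect(":memory:")
        self.relational.execute(
            "CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT)"
        )
        # Document model: free-form clinical notes, keyed by patient id.
        self.documents = {}
        # Graph model: edges from a patient id to related entities.
        self.graph = {}

    def add_patient(self, pid, name):
        self.relational.execute("INSERT INTO patient VALUES (?, ?)", (pid, name))

    def add_note(self, pid, note):
        self.documents.setdefault(pid, []).append(note)

    def add_referral(self, pid, doctor):
        self.graph.setdefault(pid, set()).add(doctor)

    def patient_summary(self, pid):
        """Combine the three models into one answer for the caller."""
        row = self.relational.execute(
            "SELECT name FROM patient WHERE id = ?", (pid,)
        ).fetchone()
        return {
            "name": row[0] if row else None,
            "notes": self.documents.get(pid, []),
            "referred_to": sorted(self.graph.get(pid, set())),
        }
```

The caller sees only `patient_summary`; which store holds which piece of the answer stays hidden behind the facade, which is the sense in which such a framework offers "one solution" over many models.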
Duggan, Jennie, et al. "The BigDAWG Polystore System." ACM SIGMOD Record 44.2 (2015): 11-16.
• SIGMOD Record
• Qualis A1
• Impact Factor 1.05
• Global-as-view
• A polystore architecture designed to unify querying over multiple data models.
• “No one size fits all”
Duggan, Jennie, et al. "The BigDAWG Polystore
System." ACM SIGMOD Record 44.2 (2015): 11-16.
• Why is it important?
• Twitter guys and Stonebraker
• Deals with the entire complexity
• Introduces the Island abstraction
• Casts data between the models of the underlying DBMSs
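The island and cast ideas can be pictured with a toy sketch. This is not BigDAWG code or its API: the table `vitals`, the function names, and the data are all hypothetical. One "island" holds relational rows, another works on arrays, and a cast step re-shapes data from the first model into the second so that a query can span both.

```python
from statistics import mean

# Hypothetical "relational island": a table held as a list of row dictionaries.
RELATIONAL_ISLAND = {
    "vitals": [
        {"patient": 1, "heart_rate": 72.0},
        {"patient": 2, "heart_rate": 88.0},
    ]
}


def relational_scan(table, predicate=lambda row: True):
    """Island-local query: filter the rows of one relational table."""
    return [row for row in RELATIONAL_ISLAND[table] if predicate(row)]


def cast_to_array(rows, columns):
    """Cast: re-shape relational rows into the array island's model (list of lists)."""
    return [[row[c] for c in columns] for row in rows]


def array_column_mean(array, column_index):
    """Island-local query on the array island: mean of one column."""
    return mean(r[column_index] for r in array)


# A cross-island query: select in the relational island, cast, aggregate as an array.
rows = relational_scan("vitals", lambda r: r["heart_rate"] > 60)
array = cast_to_array(rows, ["patient", "heart_rate"])
average_heart_rate = array_column_mean(array, 1)
```

Each function only understands its own data model; the cast is the single place where one model is translated into another.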
Taxonomy
Comparison
Year  Main author  Summary                            NoSQL                           Taxonomy
2013  Dey          Transactional access               Key/Value                       Schema unification > Polyglot
2014  Zhang        Graph matching                     Graph                           Schema unification > Unified Language
2014  Da Silva     Biodiversity database integration  Document                        Application integration > CAP
2014  Kiran        Ontology as canonical model        Column-oriented                 Schema unification > Unified Language
2015  Kaur         Medical                            Virtually any kind of database  Application integration > CAP
2015  Duggan       BigDAWG                            Virtually any kind of database  Federation > Independent access
Conclusions
• The problem is real
• Important for many fields
• Most of the solutions use Global-as-View
• Most of the solutions expose a REST API for unified access
• Many works also cite SQL and NoSQL integration
• Concerns
• The solution has to be scalable
• The solution must not be difficult to set up
• BigDAWG is the most complete approach
