Inferring Versioned Schemas from NoSQL Databases and its Applications
ER’15
Stockholm, October 2015
[{ "id": "90234af", "value": { "author": "Diego Sevilla Ruiz",
                               "e-mail": "dsevilla@um.es",
                               "institution": "U. of Murcia"}},
 { "id": "a243bb5", "value": { "author": "Severino Feliciano Morales",
                               "e-mail": "severino.feliciano@um.es",
                               "institution": "U. of Murcia"}},
 { "id": "096705d", "value": { "author": "Jesús García Molina",
                               "e-mail": "jmolina@um.es",
                               "institution": "U. of Murcia"}}]
Motivation
NoSQL Databases are Schemaless
Benefits
▶ No need to define a Schema beforehand
▶ Non-uniform data
▶ Custom fields
▶ Non-uniform types
▶ Easier evolution
Drawbacks
▶ Harder to reason about
the DB
▶ Static checking is lost
▶ Some of the data logic is
in the application code
(more error prone)
▶ Some utilities need
Schema information to
work
Schemas for NoSQL Databases
▶ How to alleviate the problems of schemaless
databases? ⇒ Inferring a Schema
▶ The Schema Model contains information about
Entities and Relationships
▶ Take into account the different Entity Versions in
the Database
▶ Heterogeneity is usually due to slight variations of Entities
▶ We obtain a precise database model
▶ The Schema allows us to automate the construction
of tools:
▶ migration, refactoring, visualization, …
Related Work
▶ JSON Schema
▶ Object versions and relationships are not considered
▶ Apache Spark SQL/Drill: SQL-like schemas
▶ Union of all fields, nullable ⇒ incorrect combinations
▶ Over-generalization to String
▶ Aggregations and Reference relations not considered
▶ MongoDB-Schema
▶ Prototype to infer schemas from MongoDB
collections
▶ Same limitations as Spark SQL
▶ JSON Discoverer
▶ An MDE solution to infer domain models from REST web services (i.e. JSON documents)
▶ Not database-oriented; Object versions not
considered
Spark SQL Example
{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}
{"name":"Peter", "age":"tiny"}
{"name":"Martina", "address":"home!"}
> people.printSchema
root
|-- address: string (nullable = true)
|-- age: string (nullable = true)
|-- name: string (nullable = true)
▶ age promoted to string
▶ age and address are never part of the same object
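The two problems noted above can be reproduced without Spark. The following is a minimal sketch in plain JavaScript, not Spark's actual algorithm: `unionSchema` and `versionedSchemas` are hypothetical helper names, and the rule that conflicting types become "string" is an assumption modelled on the printSchema output shown on this slide.

```javascript
// Sample objects from the slide.
const people = [
  { name: "Michael" },
  { name: "Andy", age: 30 },
  { name: "Justin", age: 19 },
  { name: "Peter", age: "tiny" },
  { name: "Martina", address: "home!" },
];

// Union-of-fields inference (assumed rule): every field seen anywhere
// becomes a column, and conflicting types are generalized to "string".
function unionSchema(objects) {
  const schema = {};
  for (const obj of objects) {
    for (const [key, value] of Object.entries(obj)) {
      const type = typeof value;
      if (!(key in schema)) schema[key] = type;
      else if (schema[key] !== type) schema[key] = "string";
    }
  }
  return schema;
}

// Versioned inference: objects are grouped by their exact field/type
// signature, so fields that never co-occur stay in separate versions.
// The map value counts how many objects share each signature.
function versionedSchemas(objects) {
  const versions = new Map();
  for (const obj of objects) {
    const signature = Object.keys(obj)
      .sort()
      .map((key) => `${key}:${typeof obj[key]}`)
      .join(",");
    versions.set(signature, (versions.get(signature) || 0) + 1);
  }
  return versions;
}

console.log(unionSchema(people));
console.log([...versionedSchemas(people).keys()]);
```

With the union approach `age` ends up as a string and `address` sits next to `age` in one schema; the grouped approach keeps four separate signatures, none of which contains both `age` and `address`.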
{
  "rows": [
    {
      "content": {"chapters": 33, "pages": 527},
      "authors": [
        {
          "company": {"country": "USA", "name": "IBM"},
          "name": "Grady Booch",
          "_id": "210"
        },
        {
          "company": {"country": "USA", "name": "IBM"},
          "name": "James Rumbaugh",
          "_id": "310"
        },
        {
          "country": "USA",
          "company": "Ivar Jacobson Consulting",
          "name": "Ivar Jacobson",
          "_id": "410"
        }
      ],
      "type": "book",
      "year": 2013,
      "publisher_id": "345679",
      "title": "The Unified Modeling Language",
      "_id": "1"
    },
    {
      "discipline": "software engineering",
      "issn": ["0098-5589", "1939-3520"],
      "name": "IEEE Trans. on Software Engineering",
      "type": "journal",
      "_id": "11"
    },
    {
      "name": "Automated Software Engineering",
      "issn": ["0928-8910", "1573-7535"],
      "discipline": "software engineering",
      "type": "journal",
      "_id": "12",
      "number": 10515
    },
    {
      "city": "Barcelona",
      "name": "Omega",
      "type": "publisher",
      "_id": "123451"
    },
    {
      "type": "publisher",
      "city": "Newton",
      "name": "O’Reilly Media",
      "_id": "928672"
    },
    {
      "type": "book",
      "author": {
        "_id": "101",
        "name": "Bradley Holt",
        "company": {"country": "USA", "name": "IBM Cloudant"}
      },
      "title": "Writing and Querying MapReduce Views in CouchDB",
      "publisher_id": "928672",
      "_id": "2"
    },
    {
      "name": "Addison-Wesley",
      "type": "publisher",
      "_id": "345679"
    },
    {
      "type": "publisher",
      "journals": ["11", "12"],
      "name": "IEEE Publications",
      "_id": "907863"
    }
  ]
}
NoSQL Database Model
▶ Objects (Entities) and Entity Versions
▶ Attributes
▶ Relationships
▶ Aggregation
▶ References
{
  "type": "publisher",
  "city": "Newton",
  "name": "O’Reilly Media",
  "_id": "928672"
},
{
  "type": "book",
  "author": {
    "_id": "101",
    "name": "Bradley Holt",
    "company": {"country": "USA", "name": "IBM Cloudant"}
  },
  "title": "Writing and Querying MapReduce Views in CouchDB",
  "publisher_id": "928672",
  "_id": "2"
}
Schema & Entity Versions Description
Entity Publisher {
  Version 1 {
    name: String
    city: String
  }
  Version 2 {
    name: String
  }
  Version 3 {
    name: String
    journal[+]: [Ref]->[Journal] (opposite=False)
  }
}
Entity Journal {
  Version 1 {
    issn: Tuple [String, String]
    name: String
    discipline: String
  }
  Version 2 {
    issn: Tuple [String, String]
    name: String
    discipline: String
    number: int
  }
}
Entity Book {
  Version 1 {
    title: String
    year: int
    publisher[1]: [Ref]->[Publisher] (opposite=False)
    content[1]: [Aggregate]Content1
    author[+]: [Aggregate]Author1
  }
  Version 2 {
    title: String
    publisher[1]: [Ref]->[Publisher] (opposite=False)
    author[1]: [Aggregate]Author1
  }
}
Entity Author {
  Version 1 {
    name: String
    company[1]: [Aggregate]Company
  }
  Version 2 {
    country: String
    company: String
    name: String
  }
}
Entity Company {
  Version 1 {
    name: String
    country: String
  }
}
Entity Content {
  Version 1 {
    chapters: int
    pages: int
  }
}
[Figure, panels (a) and (b). Relationship labels in the diagram: [1..1] company, [1..1] publisher, [1..1] content, [1..*] authors, [1..*] journals]
Solution Design Considerations
▶ We have to process all the objects in the Database
⇒ Map-Reduce
▶ Natural data processing on NoSQL databases
▶ Leverage MDE technologies
▶ Reuse EMF/Ecore tooling to show entity diagrams
▶ Automation & Code Generation by Metamodeling &
Model Transformations
Proposed MDE Architecture
[Diagram: NoSQL Database → MapReduce → Object Versions (JSON) → JSON Injection → JSON Model (instance of the JSON Metamodel) → Schema Reverse Eng → Schema Model (instance of the Schema Metamodel) → Application Generation → Applications (Schema Viewer / Data Validator / Migration Assistant)]
Reverse Engineering Process (i)
▶ Map-Reduce process
▶ Map: obtains the Raw Schema for each object
▶ Reduce: selects an archetype for each Entity Version
▶ Entity Type
▶ Root objects ⇒ “type” field or collection name
▶ Aggregated objects ⇒ key of the pair (e.g. “author”)
JSON object:
{name: "Omega", city: "Barcelona"}
Raw Schema:
{name: String, city: String}

JSON object:
{title: "Writing and...",
 publisher_id: "928672",
 author: {name: "Bradley Holt",
          company: {country: "USA",
                    name: "IBM Cloudant"}}}
Raw Schema:
{title: String,
 publisher_id: String,
 author: {name: String,
          company: {country: String,
                    name: String}}}
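The map and reduce steps described on this slide can be sketched as follows. This is an illustrative, in-memory approximation rather than the authors' MapReduce job: `rawSchema` and `inferVersions` are hypothetical names, the type names follow the schema description earlier in the deck, and dropping the top-level `type` and `_id` fields before comparing is an assumption.

```javascript
// Map step: replace every value by the name of its type, recursively.
// Keys are sorted so that field order does not affect the result.
function rawSchema(value) {
  if (Array.isArray(value)) return value.map(rawSchema);
  if (value !== null && typeof value === "object") {
    const schema = {};
    for (const key of Object.keys(value).sort()) {
      schema[key] = rawSchema(value[key]);
    }
    return schema;
  }
  if (typeof value === "string") return "String";
  if (typeof value === "number") return Number.isInteger(value) ? "int" : "float";
  if (typeof value === "boolean") return "Boolean";
  return "null"; // JSON null
}

// Reduce step: objects of the same entity with the same raw schema belong
// to one Entity Version; the first object seen is kept as its archetype.
function inferVersions(objects) {
  const entities = new Map(); // entity name -> Map(raw schema text -> archetype)
  for (const obj of objects) {
    // Assumption: "type" names the entity and "_id" is not part of the schema.
    const { type, _id, ...fields } = obj;
    const key = JSON.stringify(rawSchema(fields));
    if (!entities.has(type)) entities.set(type, new Map());
    const versions = entities.get(type);
    if (!versions.has(key)) versions.set(key, obj);
  }
  return entities;
}

// Three publisher objects from the sample database.
const rows = [
  { city: "Barcelona", name: "Omega", type: "publisher", _id: "123451" },
  { type: "publisher", city: "Newton", name: "O'Reilly Media", _id: "928672" },
  { name: "Addison-Wesley", type: "publisher", _id: "345679" },
];
const versions = inferVersions(rows).get("publisher");
console.log([...versions.keys()]);
```

The first two publishers share one raw schema and the third has its own, so two Entity Versions are found for these three objects.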
Reverse Engineering Process (ii)
▶ Attributes: primitive or tuple
▶ Aggregated Entities
▶ Value of the pair is an Object (or array of objects)
▶ Entity type inferred from the key
▶ References
▶ Heuristics/Conventions
▶ Key: <entity_name>_id
▶ Value: MongoDB’s DBRef abstraction:
{"$ref": "<entity_name>", "$id": <id_value>}
▶ Honor cardinalities (arrays)
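The rules on this slide can be sketched as a small classifier over key/value pairs. This is only an illustration under the two conventions listed above; `classifyProperty` and the shape of its result are hypothetical, and an empty array is treated as an attribute for lack of evidence.

```javascript
// Classify one key/value pair of a JSON object.
// Returns { kind, target, many }, where "many" reflects array cardinality.
function classifyProperty(key, value) {
  const many = Array.isArray(value);
  const sample = many ? value[0] : value;
  const isObject = sample !== null && typeof sample === "object";

  // Reference by key convention: <entity_name>_id
  if (key.endsWith("_id") && key.length > 3) {
    return { kind: "reference", target: key.slice(0, -3), many };
  }
  // Reference by value convention: DBRef-style { $ref, $id }
  if (isObject && "$ref" in sample && "$id" in sample) {
    return { kind: "reference", target: sample.$ref, many };
  }
  // Any other object (or array of objects) is an aggregated entity,
  // whose entity type is inferred from the key.
  if (isObject) {
    return { kind: "aggregate", target: key, many };
  }
  // Primitives and arrays of primitives (tuples) are attributes.
  return { kind: "attribute", target: null, many };
}

console.log(classifyProperty("publisher_id", "928672"));
console.log(classifyProperty("authors", [{ name: "Grady Booch" }]));
console.log(classifyProperty("issn", ["0098-5589", "1939-3520"]));
```

The `journals` array of ids in the sample data matches neither convention, so this sketch would report it as a tuple attribute; the slides do not say which additional rule identifies it as a reference.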
Example NoSQL Applications
▶ From the DBSchema model, using Model
Transformations and Model-to-Text transformations
(Code Generation), we can:
▶ Generate models that Characterize each Entity
Version
▶ That characterization can be used to Visualize the
Database
▶ And also to generate code to Validate objects
entering the Database
▶ Generate models that allow Database Migration to
the desired Entity Versions
Type Discrimination/Characterization Metamodel
function isOfExactTypeBook_2(obj) {
  // The object must declare its entity type, and it must be Book.
  if (!("type" in obj)) {
    return false;
  }
  if (obj["type"] !== "Book") {
    return false;
  }
  // Fields that must be present.
  if (!("title" in obj)) {
    return false;
  }
  if (!("author" in obj)) {
    return false;
  }
  // Fields that must be absent.
  if ("publisher" in obj) {
    return false;
  }
  if ("content" in obj) {
    return false;
  }
  if ("year" in obj) {
    return false;
  }
  return true;
}
Generated using a Model-to-Text transformation from an instance of the previous Type Discrimination Metamodel
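A Model-to-Text step of this kind can be sketched as a function that emits the source of such a check from a small description of the version. The object `book2` below is a hypothetical, much simplified stand-in for an instance of the Type Discrimination Metamodel, whose real structure is not given in the slides; `generateCheck` is likewise an illustrative name, and the type value "book" is taken from the sample data.

```javascript
// Hypothetical description of one Entity Version: the fields an object
// must have and the fields it must not have.
const book2 = {
  entity: "Book",
  version: 2,
  typeValue: "book",
  required: ["title", "author"],
  forbidden: ["publisher", "content", "year"],
};

// Model-to-Text step: emit the source code of the discriminating function.
function generateCheck(model) {
  const lines = [];
  lines.push(`function isOfExactType${model.entity}_${model.version}(obj) {`);
  lines.push(`  if (!("type" in obj)) return false;`);
  lines.push(`  if (obj["type"] !== ${JSON.stringify(model.typeValue)}) return false;`);
  for (const field of model.required) {
    lines.push(`  if (!(${JSON.stringify(field)} in obj)) return false;`);
  }
  for (const field of model.forbidden) {
    lines.push(`  if (${JSON.stringify(field)} in obj) return false;`);
  }
  lines.push("  return true;");
  lines.push("}");
  return lines.join("\n");
}

const source = generateCheck(book2);
console.log(source);

// Evaluate the generated text to obtain a callable function.
const isBook2 = new Function(
  `${source}; return isOfExactType${book2.entity}_${book2.version};`
)();
console.log(isBook2({ type: "book", title: "Writing and Querying...", author: {} }));
```

Keeping the required and forbidden field lists in a model, rather than writing each check by hand, is what lets the same generator serve every Entity Version.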
Entity Versions
Alternate: D3.js Treemap
Type Transformation Metamodel
db.<collection>.update(
  <query>,
  <update>,
  {
    multi: true
  }
)
<query>: obtained by Entity Type Characterization.
<update>: generate the correct MongoDB update statement using $set, $push, etc., maybe via user assistance through a DSL.
For example, for Journal_1 to Journal_2:
$set: { "number": 1 }
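For the simplest migrations, the query and update documents can be derived from the field lists of the two Entity Versions. The sketch below is an assumption about how such a generator might look, not the authors' Type Transformation Metamodel: `buildUpdate` and `buildQuery` are hypothetical names, only added and removed top-level fields are handled, and default values for new fields must be supplied by the caller. It builds plain objects and does not connect to MongoDB.

```javascript
// Field lists of two Entity Versions, taken from the schema description.
const journal1 = ["issn", "name", "discipline"];
const journal2 = ["issn", "name", "discipline", "number"];

// Build the update document: fields only in the target version are set to
// a caller-supplied default; fields only in the source version are removed.
function buildUpdate(fromFields, toFields, defaults = {}) {
  const update = {};
  const added = toFields.filter((f) => !fromFields.includes(f));
  const removed = fromFields.filter((f) => !toFields.includes(f));
  if (added.length > 0) {
    update.$set = {};
    for (const f of added) {
      if (!(f in defaults)) throw new Error(`no default value for "${f}"`);
      update.$set[f] = defaults[f];
    }
  }
  if (removed.length > 0) {
    update.$unset = {};
    for (const f of removed) update.$unset[f] = "";
  }
  return update;
}

// Build the query document: select objects of the source version, i.e.
// all source fields present and all target-only fields absent.
function buildQuery(typeValue, fromFields, toFields) {
  const query = { type: typeValue };
  for (const f of fromFields) query[f] = { $exists: true };
  for (const f of toFields) {
    if (!fromFields.includes(f)) query[f] = { $exists: false };
  }
  return query;
}

console.log(JSON.stringify(buildUpdate(journal1, journal2, { number: 1 })));
// → {"$set":{"number":1}}
console.log(JSON.stringify(buildQuery("journal", journal1, journal2)));
```

Requiring an explicit default for each added field reflects the slide's remark that user assistance may be needed: the schema says a field is missing, but not what value it should receive.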
Conclusions & Future work
▶ A process for obtaining Conceptual Model Schemas
for NoSQL Databases is shown
▶ The process takes into account the different Entity
Versions present in the Database
▶ An MDE process allows us to automate the production of several applications from the Schemas
▶ Example applications that allow Database
Visualization and Migration are shown
Conclusions & Future work (ii)
▶ Future work includes:
▶ Building a NoSQL Database Tool Set (NoSQL Data
Engineering)
▶ DSL for Entity Version migration
▶ Refining the Schema to allow a richer Type System
▶ Allow value ranges or enumerated sets
▶ Infer attribute dependencies (derived attributes,
i.e. the value of an attribute dictates the value of
another attribute)
▶ etc.
