Big Data and the Multi-model Database
Heli Helskyaho
DB Tech Showcase Tokyo, 2018
Copyright © Miracle Finland Oy
Introduction, Heli
• Graduated from the University of Helsinki (Master of Science, computer science); currently a doctoral student, researcher and lecturer (databases, Big Data, multi-model databases, methods and tools for utilizing semi-structured data for decision making) at the University of Helsinki
• Has worked with Oracle products since 1993 and in IT since 1990
• Data and Database!
• CEO of Miracle Finland Oy
• Oracle ACE Director
• Ambassador for EOUC (EMEA Oracle Users Group Community)
• Listed as one of the top 100 influencers in the IT sector in Finland (2015, 2016, 2017)
• Public speaker and author
• Author of the book Oracle SQL Developer Data Modeler for Database Design Mastery (Oracle Press, 2015), co-author of Real World SQL and PL/SQL: Advice from the Experts (Oracle Press, 2016)
References
• [1] Marcello Buoncristiano, Giansalvatore Mecca, Elisa Quintarelli, Manuel Roveri, Donatello Santoro, Letizia Tanca: Database Challenges for Exploratory Computing. SIGMOD Record 44(2): 17-22 (2015).
• [2] Stratos Idreos, Olga Papaemmanouil, Surajit Chaudhuri: Overview of Data Exploration Techniques. SIGMOD Conference 2015: 277-281.
• [3] Zhen Hua Liu, Dieter Gawlick: Management of Flexible Schema Data in RDBMSs - Opportunities and Limitations for NoSQL. 7th Biennial Conference on Innovative Data Systems Research (CIDR ’15), January 4-7, 2015, Asilomar, California, USA.
• [4] Z. H. Liu, B. Hammerschmidt, D. McMahon, Y. Liu, H. J. Chang: Closing the functional and Performance Gap between SQL and NoSQL. SIGMOD Conference 2016: 227-238.
• [5] Z. H. Liu, B. Christoph Hammerschmidt, D. McMahon: JSON data management: supporting schema-less development in RDBMS. SIGMOD Conference 2014: 1247-1258.
• [6] https://docs.oracle.com/database/122/ADJSN/json-dataguide.htm#ADJSN-GUID-219FC30E-89A7-4189-BC36-7B961A24067C
The history of data models, briefly
• Solutions for storing and retrieving data
• The network data model, the hierarchical data model, …
• The relational data model
• The relational database management system (RDBMS)
   • has been the de facto data model solution for decades
   • based on solid theory
   • standardized environment for storing and retrieving data
The evolution…
• … of software development and programming languages leads to new demands for data models
Objects
• The need to save and retrieve objects easily using object-oriented programming languages
• -> Object databases
• -> Later these features were added to the RDBMS, making it an ORDBMS (Object-Relational Database Management System)
• Supported in Oracle Database
XML (Extensible Markup Language)
• To transfer data in a standardized way
• The need to save and retrieve XML documents easily
• XML databases
• Later these features were added to the RDBMS
• Supported in Oracle Database (XMLType and its methods, SQL functions, indexing, registering XML Schema, …)
Spatial
• The need to save and retrieve spatial data easily
• Spatial databases
• Later these features were added to the RDBMS
• Supported in Oracle Database
New demand for a Data Model: FSD
• Relational data: “schema first, data later”
• Flexible Schema Data (FSD): “data first, schema later/never”
NoSQL
• An “umbrella” for different solutions, each for a certain purpose/problem
   • Document stores
   • Key-value pair stores
   • Column stores
   • Graph stores
• No ACID (Atomicity, Consistency, Isolation, Durability) but maybe BASE (Basically Available, Soft state, Eventual consistency)
• Scale-up vs. scale-out
Document Stores
• JSON for JavaScript programming, a lighter alternative to XML
• The need to save and retrieve JSON (JavaScript Object Notation) documents easily
• Supported in Oracle Database (stored in VARCHAR2, CLOB or BLOB columns)
Key-value pair stores
• Something between semi-structured data and text: the key is structured and the value is text
• Supported in Oracle Database (+ Oracle NoSQL)
Column stores
• Data stored in columns, not rows
• Supported in Oracle Database (in-memory column store)
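As a rough illustration of "columns, not rows", the following minimal Python sketch pivots row records into one list per column, so that an aggregate only has to touch the column it needs. The function names are hypothetical and this is not how Oracle's in-memory column store is implemented; it only shows the layout idea.

```python
from typing import Any


def rows_to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot a list of row dicts into one value list per column."""
    columns: dict[str, list[Any]] = {}
    for i, row in enumerate(rows):
        for name in row:
            if name not in columns:
                # Back-fill the rows seen before this column first appeared.
                columns[name] = [None] * i
        for name, values in columns.items():
            values.append(row.get(name))
    return columns


def column_sum(columns: dict[str, list[Any]], name: str) -> float:
    """Aggregate a single column without reading any other column."""
    return sum(v for v in columns[name] if v is not None)


rows = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 5, "note": "x"},
]
columns = rows_to_columns(rows)
total = column_sum(columns, "amount")
```

Here `columns["amount"]` is a contiguous list, which is what makes scans and aggregates over a single attribute cheap in a columnar layout.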
Graph stores
• About relationships
• Nodes/vertices, links/edges and properties
• Each vertex and each edge has a collection of key-value properties
• A graph can be created from a relational model quite easily
• Supported in Oracle Database
• To me the most interesting of these NoSQL solutions…
• But worth a presentation of its own
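To make the property graph idea concrete, here is a minimal Python sketch that builds vertices and edges, each with key-value properties, from two relational-style tables. The table and column names (`employees`, `departments`, `manager_id`, and so on) are invented for illustration, and the class is an assumption of this sketch rather than any product's API.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    vertices: dict = field(default_factory=dict)  # vertex id -> properties
    edges: list = field(default_factory=list)     # (source, target, properties)

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props

    def add_edge(self, src, dst, **props):
        self.edges.append((src, dst, props))

    def neighbours(self, vid, label=None):
        """Targets of outgoing edges, optionally filtered by edge label."""
        return [dst for src, dst, props in self.edges
                if src == vid and (label is None or props.get("label") == label)]


def graph_from_tables(employees, departments):
    """Rows become vertices; foreign keys become labelled edges."""
    g = PropertyGraph()
    for d in departments:
        g.add_vertex(("dept", d["dept_id"]), kind="department", name=d["name"])
    for e in employees:
        g.add_vertex(("emp", e["emp_id"]), kind="employee", name=e["name"])
        g.add_edge(("emp", e["emp_id"]), ("dept", e["dept_id"]), label="works_in")
        if e.get("manager_id") is not None:
            g.add_edge(("emp", e["emp_id"]), ("emp", e["manager_id"]),
                       label="reports_to")
    return g


departments = [{"dept_id": 10, "name": "R&D"}]
employees = [
    {"emp_id": 1, "name": "Ann", "dept_id": 10, "manager_id": None},
    {"emp_id": 2, "name": "Bob", "dept_id": 10, "manager_id": 1},
]
g = graph_from_tables(employees, departments)
```

The mapping is mechanical, which is one way to read the claim that a graph can be derived from a relational model quite easily.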
NoSQL
• Slowly these features have been added to the RDBMS…
• But can this continue?
• New feature -> new DB -> added to the RDBMS
Key Findings
 Operational DBMS use cases have expanded beyond traditional transactions
— still the leading revenue generator — to encompass three additional use
cases: distributed variable data; lightweight events and observations; and
hybrid transactional/analytical processing (HTAP).
 Relational operational DBMS products and emerging NoSQL and multimodel
offerings are entering the market to address some of these new use cases,
creating opportunities for best-fit engineering, but threatening established
company standards for information management.
 Incumbent vendors are responding by expanding the capabilities of their
relational database management systems (RDBMSs) into multimodel
territory, but hedging their bets by also adding specialized products to their
portfolios, creating both multimodel and multiproduct choices — and
confusion in the minds of buyers.
Gartner, Critical Capabilities for Operational Database
Management Systems, Dec 9th 2015
Recommendations
 Identify the use cases you have under consideration, and assess your current
operational DBMS products' fit, costs, deployment options and skills
requirements.
 Evaluate potential alternative product selections based on best-fit
engineering that maps to your use cases — don't pay for features you don't
and will not need. Use this Critical Capabilities document alongside its
companion Magic Quadrant to identify candidates.
 Issue RFPs (using Gartner's RFP Toolkit templates) and conduct proof of
concept (POC) exercises to confirm and validate candidates.
 Update your standards, training and support organizations to manage, qualify
and accommodate additional requirements if your technology portfolio is
expanding.
Gartner, Critical Capabilities for Operational Database
Management Systems, Dec 9th 2015
Strategic Planning Assumptions
 By 2017, all leading operational DBMSs will offer multiple data
models, relational and NoSQL, in a single DBMS platform.
 Through 2018, 50% of small operational DBMS vendors will
disappear due to acquisitions, mergers or business failures.
 By 2017, the "NoSQL" label will cease to distinguish DBMSs,
reducing its value and resulting in the label falling out of use.
Gartner, Critical Capabilities for Operational Database
Management Systems, Dec 9th 2015
The Big Data…
• … brings even more than just the need for new data models…
What is Big Data?
• There is no fixed size that makes data “Big Data”; it always depends on the capabilities available
• The data is “Big” when traditional processing with traditional tools is not possible due to the amount or the complexity of the data
   • You cannot open an attachment in email
   • You cannot edit a photo
   • etc.
The three V’s
• Volume: the size/scale of the data
• Velocity: the speed of change, analysis of streaming data
• Variety: different formats of data sources, different forms of data (structured, semi-structured, unstructured)
The other V’s
• Veracity: the uncertainty of the data; the data is worthless or harmful if it is not accurate
• Viability: validate the hypothesis before taking further action (and, in the process of determining the viability of a variable, we can expand our view to determine other variables)
• Value: the potential value
• Variability: refers to data whose meaning is constantly changing, inconsistency of data; for example words and context
• Visualization: a way of presenting the data in a manner that is readable and accessible
Volume and Velocity
• … are easy to understand and there are plenty of technical solutions to solve the problems, but…
Variety leads to…
• -> diversity, complexity, …
• Users cannot query all of their data using a single high-level declarative query language. They have to
   • use different query languages for querying different data
   • implement their own join algorithms to join between relational data and FSD
• Also, many specialized systems lack essential advanced functionalities, such as ACID and fine-grained security, that are the “norm” in RDBMSs
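The burden of hand-written joins can be sketched as follows: a minimal application-side hash join in Python between relational rows and JSON documents. The names (`customers`, `json_orders`, `customer_id`) are hypothetical, and a real system would also have to deal with memory limits, type mismatches and security, which this sketch ignores.

```python
import json


def hash_join(customers, json_orders, key="customer_id"):
    """Join relational rows with JSON documents on a shared key."""
    # Build phase: index the relational side by the join key.
    by_id = {row[key]: row for row in customers}
    joined = []
    # Probe phase: parse each document and look up its key.
    for raw in json_orders:
        doc = json.loads(raw)
        row = by_id.get(doc.get(key))
        if row is not None:
            joined.append({**row, "order": doc})
    return joined


customers = [
    {"customer_id": 1, "name": "Ann"},
    {"customer_id": 2, "name": "Bob"},
]
json_orders = [
    '{"customer_id": 1, "items": ["a"]}',
    '{"customer_id": 3, "items": []}',   # no matching customer
    '{"items": ["x"]}',                  # flexible schema: key is missing
]
result = hash_join(customers, json_orders)
```

Documents without the key, or with a key that has no relational match, silently drop out, which is exactly the kind of decision each application ends up making on its own.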
Data Exploration
• Data exploration is about efficiently extracting knowledge from data, even though you do not know what you are looking for.
• Exploratory computing: a conversation between a user and a computer.
• The system provides active support
• The exploration process is investigation, exploration-seeking, comparison-making, and learning
• A wide range of different techniques is needed (the use of statistics, data analysis, query suggestion, advanced visualization tools, etc.)
Not a new thing…
• J. W. Tukey. Exploratory data analysis. Addison-Wesley, Reading, MA, 1977:
• “with exploratory data analysis the researcher explores the data in many possible ways, including the use of graphical tools like boxplots or histograms, gaining knowledge from the way data are displayed”
Old problems and new problems
• More and more data (volume)
• Different data models and formats (variety)
• Loading is in progress while data exploration is going on (velocity)
• Not all data is reliable (veracity)
• We do not know what we are looking for (value, viability, variability)
• Must also support non-technical users (journalists, investors, politicians, …) (visualization)
• All must be done efficiently and fast
Different Layers
 the user interaction/user interface
 the middleware
 the database layer/engine
Challenges for the User Interaction
• The user interface should help the user to find information she/he was not aware of and to find even more information based on the information already found.
• The data is in different formats; there is a large amount of different kinds of data, and analyzing it takes time.
• The conversation must be fluent and responsive.
• Not all users are IT professionals
• Visualizing query results would be valuable, both for understanding the data and for communicating with the exploration software
User Interaction, Query Visualization
• The role of visualization in data analytics is huge (a picture is worth more than 1,000 words)
• Visualization tools that assist users in navigating the underlying data structures
• Tools that incorporate new types of interactions such as collaborative annotations and searches
• Query result reduction (aggregation, sampling and/or filtering operations)
• Query result visualization
User Interaction, User Involvement
• The relevance of an answer for a specific user
• Predict users’ interests and anticipations in order to issue the most relevant answer
• Some possible techniques
   • grouping the users (subgroups)
   • classifying different notions of relevance
   • devising strategies to customize the system response to the user behavior
Middleware
• Building various optimizations on top of the database layer/engine
• Can be used to improve the efficiency of data exploration without changing the underlying architecture
• The exploration tasks are computationally heavy and can be sped up in the middleware layer with
   • Data Prefetching (and background execution)
   • Query Approximation
Middleware, Data Prefetching
 caching data sets which are likely to be used
 the trade-off between introducing new results and
re-using cached ones
 the main challenge is identifying the data set with
the highest utility
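One way to picture the trade-off between admitting new results and re-using cached ones is a small utility-based cache. The Python sketch below is an assumption-laden illustration: the class name, the "utility" score and the rule that reuse raises utility by one are all invented here, not taken from a specific prefetching system.

```python
class UtilityCache:
    """Keeps at most `capacity` prefetched results, evicting the least useful."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = {}  # key -> [value, utility]

    def __contains__(self, key):
        return key in self._items

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        entry[1] += 1.0  # every reuse makes the cached result more valuable
        return entry[0]

    def put(self, key, value, utility=1.0):
        """Admit a result only if it beats the least useful cached one."""
        if key not in self._items and len(self._items) >= self.capacity:
            victim = min(self._items, key=lambda k: self._items[k][1])
            if self._items[victim][1] >= utility:
                return False  # keep re-using what is already cached
            del self._items[victim]
        self._items[key] = [value, utility]
        return True


cache = UtilityCache(capacity=2)
cache.put("q1", "result 1", utility=1.0)
cache.put("q2", "result 2", utility=2.0)
cache.get("q1")                                  # q1 is reused, utility rises to 2.0
admitted_q3 = cache.put("q3", "result 3", utility=1.5)
admitted_q4 = cache.put("q4", "result 4", utility=5.0)
```

The hard part in practice is estimating the utility of a data set before anyone has asked for it; the sketch simply takes that estimate as an input.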
Middleware, Query Approximation
 The system offers approximate answers
 Allows users to get a quick sense of whether a
particular query reveals anything interesting about
the data
 Need solutions that process queries on sampled
data sets to provide fast query response times
 the trade-off between results accuracy and query
performance
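The accuracy versus performance trade-off can be sketched with a sampled aggregate: the Python example below estimates a mean from a uniform random sample and reports an approximate 95% margin of error based on the normal approximation. The function name and the fixed seed are assumptions for illustration; real approximate query engines use more careful sampling and error estimation.

```python
import math
import random
import statistics


def approximate_mean(values, sample_size, seed=0):
    """Estimate the mean from a sample; returns (estimate, margin_of_error)."""
    rng = random.Random(seed)
    n = min(sample_size, len(values))
    sample = rng.sample(values, n)          # uniform sample without replacement
    estimate = statistics.fmean(sample)
    if n < 2:
        return estimate, float("inf")       # no spread information available
    std_error = statistics.stdev(sample) / math.sqrt(n)
    return estimate, 1.96 * std_error       # approx. 95% under normality


values = list(range(1, 100_001))
estimate, margin = approximate_mean(values, sample_size=1_000)
exact = statistics.fmean(values)
```

A larger `sample_size` shrinks the margin but costs more work, which is the same trade-off the slide describes.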
Database Engine/Layer, Solutions
1. Adaptive Indexing
2. Adaptive Loading
3. Adaptive Storage
4. Flexible Architecture
5. Architectures tailored for approximation processing
Adaptive Indexing
 creating indexes incrementally and adaptively during
query processing based on the columns, tables and
value ranges that queries request
 Indexes are built gradually; as more queries arrive
indexes are continuously fine-tuned
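One well-known form of adaptive indexing is database cracking, where each range query partially reorders the column around its bounds. The Python sketch below is a simplified, single-column illustration under that assumption; the class and method names are hypothetical, and it rebuilds pieces with list comprehensions rather than partitioning in place.

```python
import bisect


class CrackedColumn:
    """A column that gets a little more ordered with every range query."""

    def __init__(self, values):
        self.data = list(values)
        self.pivots = []     # sorted crack values
        self.positions = []  # positions[i]: first index holding a value >= pivots[i]

    def _crack(self, pivot):
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]            # already cracked on this value
        # Only the piece that can contain the pivot is reorganized.
        lo = self.positions[i - 1] if i > 0 else 0
        hi = self.positions[i] if i < len(self.positions) else len(self.data)
        piece = self.data[lo:hi]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        self.data[lo:hi] = smaller + larger
        pos = lo + len(smaller)
        self.pivots.insert(i, pivot)
        self.positions.insert(i, pos)
        return pos

    def range_query(self, low, high):
        """Return all values v with low <= v < high."""
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]


col = CrackedColumn([13, 4, 55, 9, 27, 1, 42, 8])
first = col.range_query(5, 30)
second = col.range_query(9, 42)
```

After the two queries the column holds four crack points, so later queries in those value ranges touch ever smaller pieces. No index was built up front; the queries themselves paid for it incrementally.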
Adaptive Loading
 Not all data is needed; users might want to leave
some parts of the data unloaded
 Not all data is available; users might start querying
a database system before all data is loaded
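A minimal way to picture adaptive loading is a table that keeps the raw file as it is and parses a column only when a query first asks for it. The Python sketch below assumes CSV input and a hypothetical `LazyTable` class; it leaves values as strings and ignores caching policies and partial files.

```python
import csv
import io


class LazyTable:
    """Holds raw CSV text and loads each column on first access only."""

    def __init__(self, raw_csv: str):
        self._raw = raw_csv
        self._columns = {}
        self.loaded = []  # names of columns parsed so far, in load order

    def column(self, name):
        if name not in self._columns:
            # First touch: scan the raw data and keep just this column.
            reader = csv.DictReader(io.StringIO(self._raw))
            self._columns[name] = [row[name] for row in reader]
            self.loaded.append(name)
        return self._columns[name]


raw = "id,city,amount\n1,Helsinki,10\n2,Tokyo,20\n"
table = LazyTable(raw)
loaded_before = list(table.loaded)
amounts = table.column("amount")
amounts_again = table.column("amount")  # served from the loaded copy
```

Columns that no query ever touches (here `id` and `city`) are never parsed, which matches the observation that not all data is needed.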
Adaptive Storage
 There is no one perfect solution for storage layout for all
the data.
 Adaptive Storage can be used for different needs of
storage layouts
 Examples
 OctopusDB (a central log and Storage Views)
 H2O (supports multiple storage layouts and data access
patterns, decides during query processing, which design is
best for classes of queries)
 In-memory columnar stores (for example Oracle)
 real-time materialized views (Oracle 12c rel 2)
Flexible Architecture
 can tune the architecture to the task at hand
 by having a declarative interface for the physical data
layouts
 by having a declarative interface for the whole engine
 using organic databases to continuously match incoming
data and queries
 Start with a schema that is ”good-enough”, schema
evolution while loading and querying
Architectures tailored for approximation processing
• An architecture where storage and access patterns are tailored to efficiently support sampling-based query processing
• Efficient access and updates of sampled data sets
• In the DB core
• For example Oracle Database 12c: Adaptive Dynamic Sampling (for the optimizer)
• DICE (Distributed and Interactive Cube Exploration): faceted exploration of data cubes; combines sampling with speculative execution in the same engine
• Designing new engines is one of the open challenges.
Techniques for Large Data Size
 Summarizing: fully pre-computing is not possible and not smart
 The size of data summarized must be large enough to estimate
statistical parameters and distributions, but manageable from the
computation viewpoint
 Data mining techniques
 Histograms, presenting a histogram algebra for efficiently
executing queries on the histograms instead of on the data,
and estimating the quality of the approximate answers
 Effective pruning techniques, not all node-paths are interesting
from the user viewpoint, and the ones that fail to satisfy this
requirement should be discarded as soon as possible.
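To illustrate answering a query from a summary instead of the data, the Python sketch below builds an equi-width histogram and estimates a range count from it, assuming values are spread uniformly inside each bucket. The function names and the equi-width choice are assumptions of this sketch; it is not the histogram algebra referred to above.

```python
def build_histogram(values, lo, hi, buckets):
    """Count values in `buckets` equal-width buckets covering [lo, hi)."""
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), buckets - 1)] += 1
    return counts, lo, width


def estimate_range_count(hist, low, high):
    """Estimate how many values fall in [low, high) using only the histogram."""
    counts, lo, width = hist
    total = 0.0
    for i, count in enumerate(counts):
        b_lo = lo + i * width
        b_hi = b_lo + width
        overlap = max(0.0, min(high, b_hi) - max(low, b_lo))
        total += count * overlap / width  # uniformity assumption inside the bucket
    return total


values = list(range(100))
hist = build_histogram(values, lo=0, hi=100, buckets=10)
estimate = estimate_range_count(hist, 25, 35)
exact = sum(1 for v in values if 25 <= v < 35)
```

With perfectly uniform data the estimate matches the exact count; with skewed data the error grows, which is why estimating the quality of the approximate answer matters.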
Different Data Models
• Two possibilities:
   • Polyglot persistence
   • Multimodel database
Polyglot Persistence
• Different kinds of data models are best dealt with using different data stores
• Data is stored using multiple data storage technologies to better support the usage of the data (applications, components)
Oracle and Polyglot Persistence
• Even using Oracle technology you can build a polyglot solution:
   • Berkeley DB as a key-value store
   • Oracle NoSQL Database as a key-value and sharded database
   • Oracle TimesTen as an in-memory database
   • Essbase for analytic processing
   • …
What’s wrong with polyglot?
• Different query languages
• Different frontends/user interfaces
• How to join data?
• Is data consistent?
• Is data secured?
• Many different skills needed
• …
Polyglot vs. Multimodel
Multi-model database
• A multi-model database is designed to support multiple data models against a single, integrated backend
• One query language
• All data available
• ACID (or any level needed), security, backup/recovery, …
• All the “good” from relational combined with all the “good” from everywhere else
How to save and how to retrieve the data?
Database Engine/Layer
• A database is all about saving and retrieving data
• The way the data is stored defines the best possible ways of accessing it; the way data is accessed defines the best way to store it…
• We might not know what data we will be loading
• We do not know what we are looking for
-> therefore we are unable to define the “best” way of storing it
• How can the architecture of database systems be redesigned to aid data exploration tasks?
Saving
• Efficiently, reliably
• Understanding the data while saving
   • finding the PK to be able to join to other data
   • taxonomies to understand that first name = fname?
   • metadata management
   • …
• Categorizing
   • some data is more valuable than other data
   • some data is more reliable than other data
   • not all data should be stored in the same operational database
   • data classification (reliable/not reliable)
Saving
• The data needs to be classified before storing, and the classification might affect where and in which format the data is stored.
• Maybe transforming?
• Usually the disk space in an RDBMS is expensive
• -> new ways to store the data (Hadoop, …)
• What data to save close to what, and why
• Today’s spam is tomorrow’s ham (possibility to re-categorize)
• How to handle the date datatype?
• …
Oracle Database
• Quite often the solution for saving a new data model is CLOB, BLOB or an external table
• But we need more than that…
Interesting solutions in Oracle Database
JSON DataGuide in Oracle Database 12cR2
• Save as it is, use it with SQL
• Now only JSON, something more in the future?
• “write without schema, read with schema”
JSON DataGuide in Oracle Database 12cR2 and on
• Enabling FSDM (Flexible Schema Data Management) in an RDBMS
• Storing JSON objects as aggregated, self-described entities without shredding them into relational rows and columns
• Indexing JSON using a schema-agnostic strategy to support ad-hoc queries that search both schema and values together
• Querying JSON using SQL as the inter-document query language and SQL/JSON path language as the intra-document query language
Creating a DataGuide, Persistent
• DataGuide information is part of the JSON search index infrastructure
• CREATE SEARCH INDEX FOR JSON
   • Can be created for index only, DataGuide only, or both (the default is both)
   • database privilege CTXAPP needed
• The content of the DataGuide is automatically updated whenever the index is synchronized.
• To obtain the DataGuide information stored in a JSON search index, use the PL/SQL function DBMS_JSON.get_index_dataguide
JSON search index and Data Guide
• When the index is synced, additive updates to the document set are automatically reflected in the persisted DataGuide information
• Deletions are not reflected
   • you must recreate the index if the effect of deletions needs to be visible
• Can also include statistics (how frequently each JSON field is used in the document set)
   • Explicitly gathered (EXEC DBMS_STATS.gather_index_stats(username, 'json_doc_search_idx', estimate_percent => 90);)
   • Not updated automatically
Formats of DataGuide
• There are two formats for a DataGuide (both defined as CLOB):
   • Flat
      • to query DataGuide information such as field frequencies and types
   • Hierarchical
      • to add virtual columns, or to create a view using particular fields that you choose on the basis of DataGuide information
• (DBMS_JSON.FORMAT_FLAT or DBMS_JSON.FORMAT_HIERARCHICAL)
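What a flat data guide summarizes can be approximated in a few lines of Python: for every path found in a set of JSON documents, record its type and the percentage of documents that contain it. This is only a conceptual sketch; the function names, the `[*]` notation for array elements and the output keys are assumptions and do not reproduce the format returned by Oracle's DBMS_JSON functions.

```python
import json
from collections import defaultdict


def json_type(value):
    """Map a parsed JSON value to a simple type name."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    if isinstance(value, list):
        return "array"
    return "object"


def _walk(value, path, seen):
    """Collect (path, type) pairs for everything below the document root."""
    if path != "$":
        seen.add((path, json_type(value)))
    if isinstance(value, dict):
        for key, child in value.items():
            _walk(child, f"{path}.{key}", seen)
    elif isinstance(value, list):
        for item in value:
            _walk(item, path + "[*]", seen)


def flat_data_guide(documents):
    """Return one entry per (path, type) with its document frequency in percent."""
    if not documents:
        return []
    doc_counts = defaultdict(int)
    for raw in documents:
        seen = set()
        _walk(json.loads(raw), "$", seen)
        for entry in seen:          # count each path once per document
            doc_counts[entry] += 1
    total = len(documents)
    entries = [{"path": p, "type": t, "frequency": 100.0 * c / total}
               for (p, t), c in doc_counts.items()]
    return sorted(entries, key=lambda e: (e["path"], e["type"]))


docs = [
    '{"name": "Ann", "age": 30}',
    '{"name": "Bob", "email": "b@x.fi"}',
]
guide = flat_data_guide(docs)
```

The result is a schema derived after the fact from the stored documents, which is the "write without schema, read with schema" idea in miniature.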
DataGuide and a Virtual Column
• Based on DataGuide information for a JSON column, you can project scalar fields as virtual columns to the same table that contains this JSON column
• From
   • a hierarchical DataGuide or
   • a DataGuide-enabled JSON search index
• DBMS_JSON.add_virtual_columns, DBMS_JSON.drop_virtual_columns, DBMS_JSON.rename_column
• You can decide which columns should be projected based on the frequency of the elements in the documents stored in the JSON column (frequency => )
DataGuide and a Virtual Column
• To improve the performance you can
   • build an index on it
   • gather statistics on it for the optimizer
   • load it into the In-Memory Column Store
JSON DataGuide, views
• Relational views on top of a set of JSON files for querying the data as if it were relational
• The data is still saved only once, as a JSON document
• You can create multiple views based on the same JSON document set, projecting different fields
Creating views over JSON data
• Based on a hierarchical DataGuide
   • The fields projected are those in the DataGuide
   • You can edit the DataGuide to include only the fields that you want to project.
   • DBMS_JSON.create_view, DBMS_JSON.create_view_on_path (create the view based on the frequency of the elements)
• Based on a path expression
   • You can use the information in a DataGuide-enabled JSON search index to create a database view
   • Columns project JSON fields from your documents
   • The fields projected are the scalar fields not under an array, plus the scalar fields in the data targeted by a specified SQL/JSON path expression.
Creating views over JSON data, the table
CREATE TABLE json_documents (
  id   RAW(16) NOT NULL,
  data CLOB,
  CONSTRAINT json_documents_pk PRIMARY KEY (id),
  CONSTRAINT json_documents_json_chk CHECK (data IS JSON)
);
Creating views over JSON data
BEGIN
  DBMS_JSON.create_view_on_path(
    viewname  => 'json_documents_view',
    tablename => 'json_documents',
    jcolname  => 'data',
    path      => '$',
    frequency => 50);
END;
/
Creating views over JSON data, the view
DESC json_documents_view
Name        Null?     Type
----------- --------  --------------
ID          NOT NULL  RAW(16)
FIRST_NAME            VARCHAR2(30)
LAST_NAME             VARCHAR2(30)
STREET                VARCHAR2(50)
POSTCODE              VARCHAR2(10)
CITY                  VARCHAR2(30)
COUNTRY               VARCHAR2(100)
EMAIL                 VARCHAR2(100)
PHONE                 VARCHAR2(20)
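The effect of the frequency threshold can be mimicked outside the database. The Python sketch below projects only those top-level scalar fields that occur in at least a given percentage of the documents, returning column names and relational-style rows. It is a conceptual stand-in with hypothetical names, not the behaviour of DBMS_JSON.create_view_on_path itself, and it ignores nested objects and arrays.

```python
import json
from collections import Counter

SCALARS = (str, int, float, bool)


def view_on_frequency(documents, frequency):
    """Project scalar fields present in at least `frequency` percent of documents."""
    docs = [json.loads(raw) for raw in documents]
    if not docs:
        return [], []
    counts = Counter(name
                     for doc in docs
                     for name, value in doc.items()
                     if isinstance(value, SCALARS))
    columns = sorted(name for name, count in counts.items()
                     if 100.0 * count / len(docs) >= frequency)

    def scalar_or_none(doc, name):
        value = doc.get(name)
        return value if isinstance(value, SCALARS) else None

    rows = [{name: scalar_or_none(doc, name) for name in columns} for doc in docs]
    return columns, rows


docs = [
    '{"first_name": "Ann", "city": "Helsinki", "tags": ["a"]}',
    '{"first_name": "Bob", "city": "Tokyo", "fax": "123"}',
    '{"first_name": "Cid"}',
]
columns, rows = view_on_frequency(docs, frequency=50)
```

With a threshold of 50, the rarely used `fax` field is left out of the projection while `city` is kept, and documents that lack a projected field simply show an empty value in that column.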
Why is DataGuide interesting?
• It enables FSDM (Flexible Schema Data Management) in an RDBMS
• And as a solution it might be used in other cases for different semi-structured data models in the future…
In-Memory
• Save as it is, copy to a faster data model kept in-memory
• Now only columnar store
In-Memory
• Very interesting for retrieving data…
• Saving data in its original data model and showing a copy of it in an in-memory version of another data model when needed
• Original data as it is, and a copy of it created in another data model when needed
• Fast (relative term ☺)
• Synchronized automatically
• No additional storage needed
• The version of the data model that works best for the need is used
• Maybe other data models could be stored in-memory too? Graph?
Conclusion
• So many data models
   • The network data model
   • The hierarchical data model
   • The relational data model
   • Objects
   • XML
   • Spatial
   • NoSQL models
   • …
• Flexible Schema Data (FSD): “data first, schema later/never”
Conclusion
• Several V’s related to Big Data…
   • Volume
   • Velocity
   • Variety
   • Veracity
   • Viability
   • Value
   • Variability
   • Visualization
   • …
Conclusion
• Data exploration is about finding new information and knowledge from the data without knowing what you are looking for. And it puts a lot of pressure on performance… at all levels:
   • the user interaction/user interface
   • the middleware
   • the database layer/engine
Conclusion
 Various data models, solutions
 Polyglot
 Multimodel database
 Promising solutions in Oracle Database
 JSON DataGuide in Oracle Database 12cR2
 In-Memory
 Plenty of research going on in this area and results
needed soon…
THANK YOU!
QUESTIONS?
Email: heli@miracleoy.fi
Twitter: @HeliFromFinland
Blog: Helifromfinland.com

More Related Content

PDF
Minimizing the Complexities of Machine Learning with Data Virtualization
PDF
Introduction to Data Virtualization (session 1 from Packed Lunch Webinar Series)
PDF
Getting Started with Data Virtualization – What problems DV solves
PDF
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
PDF
Data Virtualization: Introduction and Business Value (UK)
PDF
Best Practices: Data Virtualization Perspectives and Best Practices
PDF
Enabling Cloud Data Integration (EMEA)
PDF
The Role of the Logical Data Fabric in a Unified Platform for Modern Analytics
Minimizing the Complexities of Machine Learning with Data Virtualization
Introduction to Data Virtualization (session 1 from Packed Lunch Webinar Series)
Getting Started with Data Virtualization – What problems DV solves
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
Data Virtualization: Introduction and Business Value (UK)
Best Practices: Data Virtualization Perspectives and Best Practices
Enabling Cloud Data Integration (EMEA)
The Role of the Logical Data Fabric in a Unified Platform for Modern Analytics

What's hot (20)

PDF
Data Virtualization: From Zero to Hero
PDF
Data Integration Alternatives: When to use Data Virtualization, ETL, and ESB
PDF
Data Warehouse Logical Design Guide
PPTX
Technical Demonstration - Denodo Platform 7.0
PDF
Logical Data Fabric: Architectural Components
PDF
Extended Data Warehouse - A New Data Architecture for Modern BI with Claudia ...
PPTX
Data management
PPTX
A Big Data Journey
PDF
Performance Acceleration: Summaries, Recommendation, MPP and more
PDF
Maximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
PDF
Flash session -streaming--ses1243-lon
PDF
Future of Data Strategy
PDF
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
PPT
Why Data Virtualization? An Introduction by Denodo
PDF
Ten Pillars of World Class Data Virtualization
PPTX
The Need to Know for Information Architects: Big Data to Big Information
PPTX
Big MDM Part 2: Using a Graph Database for MDM and Relationship Management
PDF
Supporting Data Services Marketplace using Data Virtualization
PDF
Denodo DataFest 2016: What’s New in Denodo Platform – Demo and Roadmap
PPT
Data Modeling Presentations I
Data Virtualization: From Zero to Hero
Data Integration Alternatives: When to use Data Virtualization, ETL, and ESB
Data Warehouse Logical Design Guide
Technical Demonstration - Denodo Platform 7.0
Logical Data Fabric: Architectural Components
Extended Data Warehouse - A New Data Architecture for Modern BI with Claudia ...
Data management
A Big Data Journey
Performance Acceleration: Summaries, Recommendation, MPP and more
Maximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Flash session -streaming--ses1243-lon
Future of Data Strategy
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
Why Data Virtualization? An Introduction by Denodo
Ten Pillars of World Class Data Virtualization
The Need to Know for Information Architects: Big Data to Big Information
Big MDM Part 2: Using a Graph Database for MDM and Relationship Management
Supporting Data Services Marketplace using Data Virtualization
Denodo DataFest 2016: What’s New in Denodo Platform – Demo and Roadmap
Data Modeling Presentations I
Ad

Similar to [db tech showcase Tokyo 2018] #dbts2018 #B38 『Big Data and the Multi-model Database』 (20)

PDF
[db tech showcase Tokyo 2018] #dbts2018 #B36 『Design Your Databases straight ...
PPTX
Master Meta Data
PDF
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...
PDF
Big Data: Its Characteristics And Architecture Capabilities
PDF
Modern Data Management for Federal Modernization
PDF
Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...
PPTX
Insights into Real-world Data Management Challenges
PDF
Data Ninja Webinar Series: Realizing the Promise of Data Lakes
PPTX
Insights into Real World Data Management Challenges
PDF
Unlock Your Data for ML & AI using Data Virtualization
PPTX
Extreme SSAS- SQL 2011
PDF
Oracle databáze – Konsolidovaná Data Management Platforma
PPTX
Fast Data Strategy Houston Roadshow Presentation
PDF
Benefits of a data lake
PDF
Big Data using NoSQL Technologies
PPTX
Exploiting Data Lakes: Architecture, Capabilities & Future
PDF
Data Virtualization. An Introduction (ASEAN)
PDF
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
PPTX
Choosing technologies for a big data solution in the cloud
PDF
Jak konsolidovat Vaše databáze s využitím Cloud služeb?
[db tech showcase Tokyo 2018] #dbts2018 #B36 『Design Your Databases straight ...
Master Meta Data
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...
Big Data: Its Characteristics And Architecture Capabilities
Modern Data Management for Federal Modernization
Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...
Insights into Real-world Data Management Challenges
Data Ninja Webinar Series: Realizing the Promise of Data Lakes
Insights into Real World Data Management Challenges
Unlock Your Data for ML & AI using Data Virtualization
Extreme SSAS- SQL 2011
Oracle databáze – Konsolidovaná Data Management Platforma
Fast Data Strategy Houston Roadshow Presentation
Benefits of a data lake
Big Data using NoSQL Technologies
Exploiting Data Lakes: Architecture, Capabilities & Future
Data Virtualization. An Introduction (ASEAN)
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Choosing technologies for a big data solution in the cloud
Jak konsolidovat Vaše databáze s využitím Cloud služeb?
Ad

[db tech showcase Tokyo 2018] #dbts2018 #B38 『Big Data and the Multi-model Database』

  • 1. Big Data and the Multi-model Database. Heli Helskyaho, DB Tech Showcase Tokyo, 2018
  • 2. Introduction, Heli (Copyright © Miracle Finland Oy)
      - Graduated from University of Helsinki (Master of Science, computer science); currently a doctoral student, researcher and lecturer (databases, Big Data, Multi-model Databases, methods and tools for utilizing semi-structured data for decision making) at University of Helsinki
      - Worked with Oracle products since 1993, worked in IT since 1990
      - Data and Database!
      - CEO for Miracle Finland Oy
      - Oracle ACE Director
      - Ambassador for EOUC (EMEA Oracle Users Group Community)
      - Listed as one of the TOP 100 influencers in the IT sector in Finland (2015, 2016, 2017)
      - Public speaker and an author
      - Author of the book Oracle SQL Developer Data Modeler for Database Design Mastery (Oracle Press, 2015), co-author of Real World SQL and PL/SQL: Advice from the Experts (Oracle Press, 2016)
  • 4. References
      - [1] Marcello Buoncristiano, Giansalvatore Mecca, Elisa Quintarelli, Manuel Roveri, Donatello Santoro, Letizia Tanca: Database Challenges for Exploratory Computing. SIGMOD Record 44(2): 17-22 (2015).
      - [2] Stratos Idreos, Olga Papaemmanouil, Surajit Chaudhuri: Overview of Data Exploration Techniques. SIGMOD Conference 2015: 277-281.
      - [3] Zhen Hua Liu, Dieter Gawlick: Management of Flexible Schema Data in RDBMSs - Opportunities and Limitations for NoSQL. 7th Biennial Conference on Innovative Data Systems Research (CIDR '15), January 4-7, 2015, Asilomar, California, USA.
      - [4] Z. H. Liu, B. Hammerschmidt, D. McMahon, Y. Liu, H. J. Chang: Closing the Functional and Performance Gap between SQL and NoSQL. SIGMOD Conference 2016: 227-238.
      - [5] Z. H. Liu, B. Christoph Hammerschmidt, D. McMahon: JSON Data Management: Supporting Schema-less Development in RDBMS. SIGMOD Conference 2014: 1247-1258.
      - [6] https://docs.oracle.com/database/122/ADJSN/json-dataguide.htm#ADJSN-GUID-219FC30E-89A7-4189-BC36-7B961A24067C
  • 5. The history of data models, briefly
      - Solutions for storing and retrieving data
      - The network data model, the hierarchical data model, …
      - The relational data model
      - The relational database management system (RDBMS)
          - has been the de facto standard data model solution for decades
          - is based on solid theory
          - provides a standardized environment for storing and retrieving data
  • 6. The evolution…
      - …of software development and programming languages leads to new demands for data models
  • 7. Objects
      - The need to save and retrieve objects easily using object-oriented programming languages
      - -> Object databases
      - -> Later these features were added to the RDBMS, making it an ORDBMS (Object-Relational Database Management System)
      - Supported in Oracle Database
  • 8. XML (Extensible Markup Language)
      - To transfer data in a standardized way
      - The need to save and retrieve XML documents easily
      - XML Databases
      - Later these features were added to the RDBMS
      - Supported in Oracle Database (XMLType and its methods, SQL functions, indexing, registering XML Schema, …)
  • 9. Spatial
      - The need to save and retrieve spatial data easily
      - Spatial Databases
      - Later these features were added to the RDBMS
      - Supported in Oracle Database
  • 10. New demand for a Data Model: FSD
      - Relational Data: "schema first, data later"
      - Flexible Schema Data (FSD): "data first, schema later/never"
  • 11. NoSQL
      - An "umbrella" for different solutions, each for a certain purpose/problem
          - Document stores
          - Key-value pair stores
          - Column stores
          - Graph stores
      - No ACID (Atomicity, Consistency, Isolation, Durability) but maybe BASE (Basically Available, Soft state, Eventual consistency)
      - Scale-up vs scale-out
  • 12. Document Stores
      - JSON for JavaScript programming, a lighter version of XML
      - The need to save and retrieve JSON (JavaScript Object Notation) documents easily
      - Supported in Oracle Database (CLOB, NCLOB)
  • 13. Key-value pair stores
      - Something between semi-structured data and text: the key is structured and the value is text
      - Supported in Oracle Database (+ Oracle NoSQL)
  • 14. Column stores
      - Data stored in columns, not rows
      - Supported in Oracle Database (in-memory column store)
  • 15. Graph stores
      - About relationships
      - Nodes/vertices, links/edges and properties
      - Each vertex and each edge has a collection of key-value properties
      - A graph can be created from a relational model quite easily
      - Supported in Oracle Database
      - To me the most interesting of these NoSQL solutions…
      - But worth a presentation of its own
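To illustrate the claim that a graph can be created from a relational model quite easily, here is a minimal Python sketch. The `customers` and `orders` rows, the `PLACED` edge label and the `rows_to_graph` helper are all hypothetical, not taken from the talk or from any Oracle feature: each row becomes a vertex with key-value properties, and each foreign key becomes an edge.

```python
# Hypothetical relational rows; table and column names are illustrative only.
customers = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
]
orders = [
    {"id": 10, "customer_id": 1, "total": 25.0},
    {"id": 11, "customer_id": 1, "total": 40.0},
    {"id": 12, "customer_id": 2, "total": 15.0},
]


def rows_to_graph(customers, orders):
    """Turn each row into a vertex and each foreign key into an edge."""
    vertices = {}  # (label, primary key) -> key-value properties
    edges = []
    for row in customers:
        vertices[("customer", row["id"])] = {"name": row["name"]}
    for row in orders:
        vertices[("order", row["id"])] = {"total": row["total"]}
        # The foreign key orders.customer_id becomes a directed edge.
        edges.append({
            "source": ("customer", row["customer_id"]),
            "target": ("order", row["id"]),
            "label": "PLACED",
            "properties": {},
        })
    return vertices, edges


vertices, edges = rows_to_graph(customers, orders)
# Traversal: follow the edges that start at customer 1.
alice_orders = [e["target"] for e in edges if e["source"] == ("customer", 1)]
```

A real graph store would add indexes on the edge endpoints; the list scan here only keeps the sketch short.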
  • 16. NoSQL
      - Slowly these features have been added to the RDBMS…
      - But can this continue?
      - New feature -> new db -> added to RDBMS
  • 17. Key Findings (Gartner, Critical Capabilities for Operational Database Management Systems, Dec 9th 2015)
      - Operational DBMS use cases have expanded beyond traditional transactions (still the leading revenue generator) to encompass three additional use cases: distributed variable data; lightweight events and observations; and hybrid transactional/analytical processing (HTAP).
      - Relational operational DBMS products and emerging NoSQL and multimodel offerings are entering the market to address some of these new use cases, creating opportunities for best-fit engineering, but threatening established company standards for information management.
      - Incumbent vendors are responding by expanding the capabilities of their relational database management systems (RDBMSs) into multimodel territory, but hedging their bets by also adding specialized products to their portfolios, creating both multimodel and multiproduct choices, and confusion in the minds of buyers.
  • 18. Recommendations (Gartner, Critical Capabilities for Operational Database Management Systems, Dec 9th 2015)
      - Identify the use cases you have under consideration, and assess your current operational DBMS products' fit, costs, deployment options and skills requirements.
      - Evaluate potential alternative product selections based on best-fit engineering that maps to your use cases; don't pay for features you don't and will not need. Use this Critical Capabilities document alongside its companion Magic Quadrant to identify candidates.
      - Issue RFPs (using Gartner's RFP Toolkit templates) and conduct proof of concept (POC) exercises to confirm and validate candidates.
      - Update your standards, training and support organizations to manage, qualify and accommodate additional requirements if your technology portfolio is expanding.
  • 19. Strategic Planning Assumptions (Gartner, Critical Capabilities for Operational Database Management Systems, Dec 9th 2015)
      - By 2017, all leading operational DBMSs will offer multiple data models, relational and NoSQL, in a single DBMS platform.
      - Through 2018, 50% of small operational DBMS vendors will disappear due to acquisitions, mergers or business failures.
      - By 2017, the "NoSQL" label will cease to distinguish DBMSs, reducing its value and resulting in the label falling out of use.
  • 20. The Big Data…
      - …brings even more than just the need for new data models
  • 21. What is Big Data?
      - No particular size makes data "Big Data"; it always depends on the capabilities available
      - Data is "Big" when traditional processing with traditional tools is not possible because of the amount or the complexity of the data
          - You cannot open an attachment in email
          - You cannot edit a photo
          - etc.
  • 22. The three V's
      - Volume: the size/scale of the data
      - Velocity: the speed of change, analysis of streaming data
      - Variety: different formats of data sources and different forms of data (structured, semi-structured, unstructured)
  • 23. The other V's
      - Veracity: the uncertainty of the data; the data is worthless or harmful if it is not accurate
      - Viability: validate the hypothesis before taking further action (and, in the process of determining the viability of a variable, we can expand our view to determine other variables)
      - Value: the potential value
      - Variability: data whose meaning is constantly changing; inconsistency of data (for example, words and their context)
      - Visualization: a way of presenting the data in a manner that is readable and accessible
  • 24. Volume and Velocity
      - Are easy to understand and there are plenty of technical solutions to solve the problems, but…
  • 25. Variety leads to…
      - -> diversity, complexity, …
      - Users cannot query all of their data using a single high-level declarative query language. They have to
          - use different query languages for querying different data
          - implement their own join algorithms to join relational data with FSD
      - Also, many specialized systems lack essential advanced functionality, such as ACID and fine-grained security, that is the "norm" in RDBMSs
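To make the point about hand-written joins concrete, here is a minimal Python sketch of what an application ends up doing when relational rows and flexible-schema documents live in separate systems. The data, the `hash_join` helper and the `customer_id` key are assumptions made for illustration only.

```python
import json

# Hypothetical relational rows: (customer_id, name).
customers = [(1, "Alice"), (2, "Bob")]

# Hypothetical flexible-schema documents; not every document has the join key.
events_json = [
    '{"customer_id": 1, "action": "login"}',
    '{"customer_id": 2, "action": "purchase", "amount": 9.5}',
    '{"action": "anonymous_visit"}',
]


def hash_join(rows, documents, key):
    """Join relational rows to JSON documents on `key` with a hand-built hash table."""
    index = {}
    for doc_text in documents:
        doc = json.loads(doc_text)
        if key in doc:  # documents without the key simply cannot be joined
            index.setdefault(doc[key], []).append(doc)
    joined = []
    for customer_id, name in rows:
        for doc in index.get(customer_id, []):
            joined.append((name, doc))
    return joined


joined = hash_join(customers, events_json, "customer_id")
```

Everything a query optimizer would normally decide (join order, join method, handling of missing keys) is now the application's problem, which is the cost the slide points at.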
  • 26. Data Exploration
      - Data exploration means efficiently extracting knowledge from data even though you do not know what you are looking for.
      - Exploratory computing: a conversation between a user and a computer.
          - The system provides active support
          - The exploration process is investigation, exploration-seeking, comparison-making, and learning
          - A wide range of different techniques is needed (statistics, data analysis, query suggestion, advanced visualization tools, etc.)
  • 27. Not a new thing…
      - J. W. Tukey. Exploratory Data Analysis. Addison-Wesley, Reading, MA, 1977:
      - "with exploratory data analysis the researcher explores the data in many possible ways, including the use of graphical tools like boxplots or histograms, gaining knowledge from the way data are displayed"
  • 28. Old problems and new problems
      - More and more data (volume)
      - Different data models and formats (variety)
      - Loading in progress while data exploration is going on (velocity)
      - Not all data is reliable (veracity)
      - We do not know what we are looking for (value, viability, variability)
      - Must also support non-technical users (journalists, investors, politicians, …) (visualization)
      - All must be done efficiently and fast
  • 29. Different Layers
      - the user interaction/user interface
      - the middleware
      - the database layer/engine
  • 30. Challenges for the User Interaction
      - The user interface should help the user find information she/he was not aware of, and find even more information based on the information already found.
      - The data comes in different formats and in large amounts, and analyzing it takes time.
      - The conversation must be fluid and responsive.
      - Not all users are IT professionals
      - Visualizing query results would be valuable, both for understanding the data and for communicating with the exploration software
  • 31. User Interaction, Query Visualization
      - The role of visualization in data analytics is huge (a picture tells more than 1000 words)
      - visualization tools that assist users in navigating the underlying data structures
      - tools that incorporate new types of interactions such as collaborative annotations and searches
      - Query result reduction (aggregation, sampling and/or filtering operations)
      - Query result visualization
  • 32. User Interaction, User Involvement
      - the relevance of an answer for a specific user
      - predict users' interests and anticipations in order to issue the most relevant answer
      - Some possible techniques
          - grouping the users (subgroups)
          - classifying different notions of relevance
          - devising strategies to customize the system response to the user behavior
  • 33. Middleware
      - Building various optimizations on top of the database layer/engine
      - Can be used to improve the efficiency of data exploration without changing the underlying architecture
      - The exploration tasks are computationally heavy and can be sped up in the middleware layer with
          - Data Prefetching (and background execution)
          - Query Approximation
  • 34. Middleware, Data Prefetching
      - caching data sets which are likely to be used
      - the trade-off between introducing new results and re-using cached ones
      - the main challenge is identifying the data set with the highest utility
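One way to picture the utility trade-off is a small cache that keeps only the data sets with the highest predicted utility. This is a sketch under stated assumptions: the `PrefetchCache` class is hypothetical, and the utility scores are simply given, whereas predicting them is the hard part the slide refers to.

```python
class PrefetchCache:
    """Keeps at most `capacity` prefetched data sets, evicting the least useful one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # key -> (predicted utility, data)

    def prefetch(self, key, data, utility):
        self.entries[key] = (utility, data)
        if len(self.entries) > self.capacity:
            # Evict the entry whose predicted utility is lowest.
            lowest = min(self.entries, key=lambda k: self.entries[k][0])
            del self.entries[lowest]

    def get(self, key):
        entry = self.entries.get(key)
        return None if entry is None else entry[1]


cache = PrefetchCache(capacity=2)
cache.prefetch("sales_2017", [1, 2, 3], utility=0.9)
cache.prefetch("sales_2016", [4, 5], utility=0.2)
cache.prefetch("sales_2018", [6], utility=0.6)  # pushes out sales_2016
```

A cache miss (`get` returning `None`) is where the middleware would fall back to the database engine.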
  • 35. Middleware, Query Approximation
      - The system offers approximate answers
      - Allows users to get a quick sense of whether a particular query reveals anything interesting about the data
      - Solutions are needed that process queries on sampled data sets to provide fast query response times
      - the trade-off between result accuracy and query performance
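The accuracy/performance trade-off can be sketched with a sampled aggregate. The `approximate_sum` function below is a hypothetical illustration using uniform random sampling, not a technique attributed to any particular product: it reads only a fraction of the values and scales the result up.

```python
import random


def approximate_sum(values, sample_fraction, seed=0):
    """Estimate sum(values) from a uniform random sample, scaled to the full size."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample_size = max(1, int(len(values) * sample_fraction))
    sample = rng.sample(values, sample_size)
    return sum(sample) * len(values) / sample_size


values = list(range(1, 10001))
exact = sum(values)
estimate = approximate_sum(values, sample_fraction=0.1)  # touches 10% of the data
relative_error = abs(estimate - exact) / exact
```

Raising `sample_fraction` lowers the expected error and raises the cost; at 1.0 the answer is exact.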
  • 36. Database Engine/Layer, Solutions
      1. Adaptive Indexing
      2. Adaptive Loading
      3. Adaptive Storage
      4. Flexible Architecture
      5. Architectures tailored for approximation processing
  • 37. Adaptive Indexing
      - creating indexes incrementally and adaptively during query processing, based on the columns, tables and value ranges that queries request
      - Indexes are built gradually; as more queries arrive, indexes are continuously fine-tuned
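A well-known instance of adaptive indexing in the research literature is database cracking, where each range query partially reorders the column around its own bounds. The `CrackedColumn` class below is a simplified sketch of that idea, swapped in here for illustration (the talk does not name a specific algorithm); it rebuilds pieces with list copies rather than partitioning in place.

```python
import bisect


class CrackedColumn:
    """A column that gets more ordered with every range query it answers."""

    def __init__(self, values):
        self.data = list(values)
        self.pivots = []     # sorted pivot values seen so far
        self.positions = []  # positions[i]: first index holding a value >= pivots[i]

    def _crack(self, pivot):
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]  # this boundary already exists
        # Only the piece between the neighbouring pivots needs reorganizing.
        start = self.positions[i - 1] if i > 0 else 0
        end = self.positions[i] if i < len(self.pivots) else len(self.data)
        piece = self.data[start:end]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        self.data[start:end] = smaller + larger
        position = start + len(smaller)
        self.pivots.insert(i, pivot)
        self.positions.insert(i, position)
        return position

    def range_query(self, low, high):
        """Return all values v with low <= v < high (assumes low <= high)."""
        lo = self._crack(low)
        hi = self._crack(high)
        return self.data[lo:hi]


column = CrackedColumn([13, 4, 55, 9, 27, 1, 42, 8])
first = column.range_query(5, 30)    # cracks the whole column around 5 and 30
second = column.range_query(10, 20)  # only touches the piece between 5 and 30
```

No index is built up front; the pivots accumulated from past queries are the index, and later queries scan ever smaller pieces.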
  • 38. Adaptive Loading
      - Not all data is needed; users might want to leave some parts of the data unloaded
      - Not all data is available; users might start querying a database system before all data is loaded
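The first point (leave unneeded data unloaded) can be sketched as lazy, per-column loading from a raw file. The `LazyTable` class and the CSV content are hypothetical; a column is parsed only the first time a query asks for it.

```python
import csv
import io


class LazyTable:
    """Parses a column out of the raw CSV text only when it is first requested."""

    def __init__(self, raw_csv):
        self.raw_csv = raw_csv
        self.loaded_columns = {}  # column name -> list of string values

    def column(self, name):
        if name not in self.loaded_columns:
            reader = csv.DictReader(io.StringIO(self.raw_csv))
            self.loaded_columns[name] = [row[name] for row in reader]
        return self.loaded_columns[name]


raw = "id,city,amount\n1,Helsinki,10\n2,Tokyo,25\n3,Tokyo,5\n"
table = LazyTable(raw)
# This query needs only two of the three columns, so "id" is never loaded.
tokyo_total = sum(
    int(amount)
    for city, amount in zip(table.column("city"), table.column("amount"))
    if city == "Tokyo"
)
```

The sketch re-reads the raw text once per column; a real system would amortize that cost, for example by remembering field offsets.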
  • 39. Adaptive Storage
      - There is no single storage layout that is perfect for all data.
      - Adaptive storage can serve different storage layout needs
      - Examples
          - OctopusDB (a central log and Storage Views)
          - H2O (supports multiple storage layouts and data access patterns; decides during query processing which design is best for classes of queries)
          - In-memory columnar stores (for example Oracle)
          - real-time materialized views (Oracle 12c rel 2)
  • 40. Flexible Architecture
      - can tune the architecture to the task at hand
          - by having a declarative interface for the physical data layouts
          - by having a declarative interface for the whole engine
          - by using organic databases to continuously match incoming data and queries
      - Start with a schema that is "good enough"; the schema evolves while loading and querying
  • 41. Architectures tailored for approximation processing
      - An architecture where storage and access patterns are tailored to efficiently support sampling-based query processing
      - efficient access and updates of sampled data sets
      - In the DB core
          - For example Oracle Database 12c: Adaptive Dynamic Sampling (for the optimizer)
      - DICE (Distributed and Interactive Cube Exploration): faceted exploration of data cubes; combines sampling with speculative execution in the same engine
      - Designing new engines is one of the open challenges.
  • 42. Techniques for Large Data Size
      - Summarizing: fully pre-computing everything is neither possible nor smart
      - The size of the summarized data must be large enough to estimate statistical parameters and distributions, but manageable from the computation viewpoint
      - Data mining techniques
      - Histograms: a histogram algebra for efficiently executing queries on the histograms instead of on the data, and for estimating the quality of the approximate answers
      - Effective pruning techniques: not all node-paths are interesting from the user viewpoint, and the ones that are not should be discarded as soon as possible
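The idea of running queries on histograms instead of on the data can be sketched with an equi-width histogram and a range-count estimate. The functions below are hypothetical and assume values are spread uniformly inside each bucket; they are not the histogram algebra from the cited literature.

```python
def build_histogram(values, low, high, buckets):
    """Count values (all assumed to lie in [low, high]) into equal-width buckets."""
    width = (high - low) / buckets
    counts = [0] * buckets
    for v in values:
        index = min(int((v - low) / width), buckets - 1)  # v == high goes to the last bucket
        counts[index] += 1
    return counts, width


def estimate_count(counts, width, low, query_low, query_high):
    """Estimate how many values fall in [query_low, query_high) from the histogram alone."""
    total = 0.0
    for i, count in enumerate(counts):
        bucket_low = low + i * width
        bucket_high = bucket_low + width
        overlap = min(query_high, bucket_high) - max(query_low, bucket_low)
        if overlap > 0:
            # Uniformity assumption: a partially covered bucket contributes proportionally.
            total += count * overlap / width
    return total


values = list(range(100))  # 0..99
counts, width = build_histogram(values, low=0, high=100, buckets=10)
estimate = estimate_count(counts, width, 0, 25, 45)
exact = sum(1 for v in values if 25 <= v < 45)
```

The query touches ten counters instead of one hundred values; on skewed data the uniformity assumption is where the approximation error comes from.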
  • 43. Different Data Models
      - Two possibilities:
          - Polyglot persistence
          - Multimodel database
  • 44. Polyglot Persistence
      - Different kinds of data models are best handled by different data stores
      - Data is stored using multiple data storage technologies to better support how the data is used (applications, components)
  • 45. Oracle and Polyglot Persistence
      - Even using Oracle technology you can build a polyglot solution:
          - Berkeley DB as a key-value store
          - Oracle NoSQL Database as a key-value and sharded database
          - Oracle TimesTen as an in-memory database
          - Essbase for analytic processing
          - …
  • 46. What's wrong with polyglot
      - Different query languages
      - Different frontends/user interfaces
      - How to join data?
      - Is the data consistent?
      - Is the data secured?
      - Many different skills needed
      - …
  • 47. Polyglot vs. Multimodel
  • 48. Multi-model database
      - A multi-model database is designed to support multiple data models against a single, integrated backend
      - One query language
      - All data available
      - ACID (or any level needed), security, backup/recovery, …
      - All the "good" from relational combined with all the "good" from everywhere else
  • 49. How to save and how to retrieve the data?
  • 50. Database Engine/Layer
      - A database is all about saving and retrieving data
      - The way the data is stored defines the best possible ways of accessing it; the way the data is accessed defines the best way to store it…
      - We might not know what data we will be loading
      - We do not know what we are looking for -> therefore we are unable to define the "best" way of storing
      - How can the architecture of database systems be redesigned to aid data exploration tasks?
  • 51. Saving
      - Efficiently, reliably
      - Understanding the data while saving
          - finding the PK to be able to join to other data
          - taxonomies to understand that first name = fname?
          - metadata management
          - …
      - Categorizing
          - some data is more valuable than other data
          - some data is more reliable than other data
          - not all data should be stored in the same operational database
          - data classification (reliable/not reliable)
  • 52. Saving
      - The data needs to be classified before storing, and the classification may affect where and in which format the data is stored.
      - Maybe transforming?
      - Disk space in an RDBMS is usually expensive
          - -> new ways to store the data (Hadoop, …)
      - What data to save close to what, and why
      - Today's spam is tomorrow's ham (possibility to re-categorize)
      - How to handle the date datatype?
      - …
  • 53. Oracle Database
      - Quite often the solution for saving a new data model is a CLOB, a BLOB or an external table
      - But we need more than that…
  • 54. Interesting solutions in Oracle Database
  • 55. JSON DataGuide in Oracle Database 12cR2
      - Save as it is, use it with SQL
      - Now only JSON; something more in the future?
      - "write without schema, read with schema"
  • 56. JSON DataGuide in Oracle Database 12cR2 and on
      - Enabling FSDM (Flexible Schema Data Management) in an RDBMS
      - Storing JSON objects as aggregated, self-described entities without shredding them into relational rows and columns
      - Indexing JSON using a schema-agnostic strategy to support ad-hoc queries that search both schema and values together
      - Querying JSON using SQL as the inter-document query language and the SQL/JSON path language as the intra-document query language
  • 57. Creating a DataGuide, Persistent
      - DataGuide information is part of the JSON search index infrastructure
      - CREATE SEARCH INDEX … FOR JSON
      - Can be created for index only, DataGuide only, or both (the default is both)
      - the CTXAPP role is needed
      - The content of the DataGuide is automatically updated whenever the index is synchronized.
      - To obtain the DataGuide information stored in a JSON search index, use the PL/SQL function DBMS_JSON.get_index_dataguide
  • 58. JSON search index and Data Guide
      - When the index is synced, additive updates to the document set are automatically reflected in the persisted DataGuide information
      - Deletions are not reflected
          - you must create the index again if the DataGuide needs to reflect deletions
      - can also include statistics (how frequently each JSON field is used in the document set)
          - Explicitly gathered (EXEC DBMS_STATS.gather_index_stats(username, 'json_doc_search_idx', estimate_percent => 90);)
          - Not updated automatically
  • 59. Formats of DataGuide
      - There are two formats for a DataGuide (both defined as CLOB):
          - Flat
              - to query DataGuide information such as field frequencies and types
          - Hierarchical
              - to add virtual columns, or to create a view using particular fields that you choose on the basis of DataGuide information
          - (DBMS_JSON.FORMAT_FLAT or DBMS_JSON.FORMAT_HIERARCHICAL)
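To give a feel for what a flat data guide carries (paths, types and field frequencies), here is a small Python sketch that derives that kind of summary from a set of JSON documents. It is an illustration of the "write without schema, read with schema" idea only: the `flat_data_guide` function and its output shape are assumptions, not Oracle's implementation or its DataGuide document format, and arrays are treated as leaves for brevity.

```python
import json
from collections import defaultdict


def json_type(value):
    """Map a parsed JSON value to a type name (bool is checked before number)."""
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if value is None:
        return "null"
    return "string"


def flat_data_guide(documents):
    """Return one entry per path: the types seen and the share of documents containing it."""
    seen = defaultdict(lambda: {"types": set(), "docs": 0})

    def walk(obj, path, paths_in_doc):
        for key, child in obj.items():
            child_path = f"{path}.{key}"
            seen[child_path]["types"].add(json_type(child))
            paths_in_doc.add(child_path)
            if isinstance(child, dict):  # arrays are not descended into in this sketch
                walk(child, child_path, paths_in_doc)

    for text in documents:
        paths_in_doc = set()
        walk(json.loads(text), "$", paths_in_doc)
        for path in paths_in_doc:
            seen[path]["docs"] += 1

    total = len(documents)
    return [
        {"path": path, "types": sorted(info["types"]),
         "frequency": 100.0 * info["docs"] / total}
        for path, info in sorted(seen.items())
    ]


documents = [
    '{"first_name": "Ann", "address": {"city": "Helsinki"}}',
    '{"first_name": "Bob", "email": "bob@example.com"}',
]
guide = flat_data_guide(documents)
```

The documents were written with no schema declared; the guide is the schema recovered afterwards, which is what makes choosing fields for virtual columns or views possible.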
  • 60. DataGuide and a Virtual Column
      - Based on DataGuide information for a JSON column, you can project scalar fields as virtual columns in the same table that contains this JSON column
      - From
          - a hierarchical DataGuide or
          - a DataGuide-enabled JSON search index
      - DBMS_JSON.add_virtual_columns, DBMS_JSON.drop_virtual_columns, DBMS_JSON.rename_column
      - You can decide which columns should be projected based on the frequency of the elements in the documents stored in the JSON column (frequency => )
  • 61. DataGuide and a Virtual Column
      - To improve performance on a virtual column you can
          - build an index on it
          - gather statistics on it for the optimizer
          - load it into the In-Memory Column Store
  • 62. JSON DataGuide, views
      - Relational views on top of a set of JSON documents, for querying the data as if it were relational
      - The data is still saved only once, as JSON documents
      - You can create multiple views based on the same JSON document set, projecting different fields
  • 63. Creating views over JSON data
      - Based on a hierarchical DataGuide
          - The fields projected are those in the DataGuide
          - You can edit the DataGuide to include only the fields that you want to project.
      - DBMS_JSON.create_view, DBMS_JSON.create_view_on_path (create the view based on the frequency of the elements)
      - Based on a path expression
          - You can use the information in a DataGuide-enabled JSON search index to create a database view
          - Columns project JSON fields from your documents
          - The fields projected are the scalar fields and the scalar fields in the data targeted by a specified SQL/JSON path expression.
  • 64. Creating views over JSON data, the table
      CREATE TABLE json_documents (
        id    RAW(16) NOT NULL,
        data  CLOB,
        CONSTRAINT json_documents_pk PRIMARY KEY (id),
        CONSTRAINT json_documents_json_chk CHECK (data IS JSON)
      );
  • 65. Creating views over JSON data
      BEGIN
        DBMS_JSON.create_view_on_path(
          viewname  => 'json_documents_view',
          tablename => 'json_documents',
          jcolname  => 'data',
          path      => '$',
          frequency => 50);
      END;
      /
  • 66. Creating views over JSON data, the view
      DESC json_documents_view
      Name         Null?     Type
      -----------  --------  --------------
      ID           NOT NULL  RAW(16)
      FIRST_NAME             VARCHAR2(30)
      LAST_NAME              VARCHAR2(30)
      STREET                 VARCHAR2(50)
      POSTCODE               VARCHAR2(10)
      CITY                   VARCHAR2(30)
      COUNTRY                VARCHAR2(100)
      EMAIL                  VARCHAR2(100)
      PHONE                  VARCHAR2(20)
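The effect of a frequency threshold on which fields become view columns can be mimicked in a few lines of Python. This is a hedged sketch of the idea only: `project_frequent_fields` is a hypothetical helper, it looks at top-level scalar fields only, and it reads the threshold as "present in at least that percentage of documents", which is an assumption about the semantics rather than a statement of what DBMS_JSON does internally.

```python
import json


def is_scalar(value):
    return not isinstance(value, (dict, list))


def project_frequent_fields(documents, min_frequency):
    """Project top-level scalar fields present in >= min_frequency percent of documents."""
    parsed = [json.loads(text) for text in documents]
    counts = {}
    for doc in parsed:
        for key, value in doc.items():
            if is_scalar(value):
                counts[key] = counts.get(key, 0) + 1
    columns = sorted(
        key for key, n in counts.items()
        if 100.0 * n / len(parsed) >= min_frequency
    )
    rows = []
    for doc in parsed:
        # Missing or non-scalar fields surface as None, like NULLs in a view.
        rows.append(tuple(
            doc[col] if col in doc and is_scalar(doc[col]) else None
            for col in columns
        ))
    return columns, rows


documents = [
    '{"first_name": "Ann", "city": "Oslo", "fax": "123"}',
    '{"first_name": "Bob", "city": "Turku"}',
    '{"first_name": "Cid"}',
]
columns, rows = project_frequent_fields(documents, min_frequency=50)
```

Here `fax` appears in only one document out of three, so it falls below the threshold and is left out of the projected columns, while the documents themselves stay untouched.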
  • 67. Why is DataGuide interesting?
      - It enables FSDM (Flexible Schema Data Management) in an RDBMS
      - And the same approach might be used for other semi-structured data models in the future…
  • 68. In-Memory
      - Save as it is, copy to a faster data model kept in memory
      - Now only a columnar store
  • 69. In-Memory
      - Very interesting for retrieving data…
      - The data is saved in its original data model, and a copy of it is presented in memory in another data model when needed
      - Fast (relative term ☺)
      - Synchronized automatically
      - No additional storage needed
      - The data model that works best for the need is the one used
      - Maybe other data models could be stored in-memory too? Graph?
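The idea of keeping the original data as it is while a synchronized copy in another data model serves the queries it suits best can be sketched as follows. The `DualFormatTable` class is a toy illustration, not a description of how Oracle Database In-Memory works: rows are the stored format, and an in-memory columnar copy is maintained on every insert.

```python
class DualFormatTable:
    """Row-oriented original plus a columnar copy that is kept in sync on insert."""

    def __init__(self, column_names):
        self.column_names = list(column_names)
        self.rows = []  # original, row-oriented data
        self.columns = {name: [] for name in self.column_names}  # columnar copy

    def insert(self, row):
        self.rows.append(tuple(row))
        for name, value in zip(self.column_names, row):
            self.columns[name].append(value)  # synchronization happens here

    def get_row(self, index):
        # Point lookups are served from the row format.
        return self.rows[index]

    def column_sum(self, name):
        # Analytic scans are served from the columnar copy.
        return sum(self.columns[name])


table = DualFormatTable(["id", "city", "amount"])
table.insert((1, "Helsinki", 10))
table.insert((2, "Tokyo", 25))
table.insert((3, "Tokyo", 5))
total_amount = table.column_sum("amount")
second_row = table.get_row(1)
```

Each access path reads the representation that fits it, and the caller never has to know which one was used.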
  • 70. Conclusion
      - So many data models
          - The network data model
          - The hierarchical data model
          - The relational data model
          - Objects
          - XML
          - Spatial
          - NoSQL models
          - …
      - Flexible Schema Data (FSD): "data first, schema later/never"
  • 71. Conclusion
      - Several V's related to Big Data…
          - Volume
          - Velocity
          - Variety
          - Veracity
          - Viability
          - Value
          - Variability
          - Visualization
          - …
  • 72. Conclusion
      - Data exploration is about finding new information and knowledge from the data without knowing what you are looking for. And it puts a lot of pressure on performance at all levels:
          - the user interaction/user interface
          - the middleware
          - the database layer/engine
  • 73. Conclusion
      - Various data models, solutions
          - Polyglot
          - Multimodel database
      - Promising solutions in Oracle Database
          - JSON DataGuide in Oracle Database 12cR2
          - In-Memory
      - Plenty of research going on in this area, and results are needed soon…
  • 74. THANK YOU! QUESTIONS?
      - Email: heli@miracleoy.fi
      - Twitter: @HeliFromFinland
      - Blog: Helifromfinland.com