© COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Matt Turner, CTO Media & Entertainment
Introduction to MarkLogic NoSQL
SLIDE: 2 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Outline
• Something’s Happening Here
• The Old and the New
  – Data models
  – Data access
• Discussion
Analysis Operations Access
DATA MAKES AN IMPACT
SLIDE: 4 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Stress on Traditional Data Approaches
Big Data: Gartner coined the “three V’s” description
• Data: Petabyte scale
• Nodes: Thousands
Complexity
• Structured
• Unstructured
• Semi-structured
• Raw
• Streams of data
• Constant change
• Agile analytics
• Fail-fast
Examples
Volume
• Many months of system log files
• Every tweet
• Years of articles
• Relative to current size of operation
Velocity
• Streams of customer feedback to determine sentiment
• Real-time risk analysis
• Real-time Business Intelligence
Variety
• Database feeds
• Raw logs
• Web crawl data
• Articles
• Multi-media
• ALSO: questions!
SLIDE: 7 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Leader Quadrant
Online Transaction Processing RDBS (May 2002)
SLIDE: 8 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Leader Quadrant
Operational DBMS (Oct 2014)
Traditional Mainstays
Upstarts Storm the Field
SLIDE: 9 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
MarkLogic: Best Operational Data Warehouse (Aug 2014)
SLIDE: 10 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
WHAT BUSINESSES WANT
A Unified, Actionable 360 View of Data
Northeastern DB Class Introduction to Marklogic NoSQL april 2016
SLIDE: 13 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
THE REALITY
Data Is In Silos
• Data is spread across disconnected databases
• M&A outpaces the speed of data integration
• Data needs to be delivered in real time
SLIDE: 14 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
The Massive Cost of Integrating Data From Silos
• 80% OF TIME WASTED by data scientists just wrangling data
• $36 BILLION IN SPENDING in 2015 on creating relational data silos
• 60% OF THE COST of data warehouse projects is on ETL
SLIDE: 15 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
THE IT CHALLENGE
Relational Databases with ETL Sacrifice Agility, Timeliness, and Cost
• All future data needs must be predictable
• New SQL queries require database re-indexing
• Siloed database changes require ETL re-writes
[Diagram: OLTP, ARCHIVES, DATA MARTS, WAREHOUSE, and REFERENCE DATA, with ETL steps between them]
SLIDE: 16 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
It’s Complicated
[Diagram labels: OLTP; Warehouse; Data Marts; Archives; “Unstructured”; Video; Audio; Signals, Logs, Streams; Social; Documents, Messages; Metadata; Search; Reference Data]
The OLD:
Let’s Design the Application
(And pretend it’s the 80s)
SLIDE: 18 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Let’s Begin… Cast Members
Name   Hair Colour   Fulltime Employee?   Car type
Paul   Blond         Y
Alex   Auburn        Y                    Porsche
Dom    Black         Y                    Hummer
Column names: name, hr_colr, flltme_empl, car_tp
How many characters wide should this be? 8? 16? 32?
SLIDE: 19 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
New Schema – Extend Ours!
name hr_colr flltme_empl car_tp
Paul Blond Y
Alex Auburn Y Porsche
Dom Black Y Hummer
house_road town city postcode
11d Yonge Pk Finsbury London N4 3NU
Reading
London N43
• Hang on
• If this table had 10k rows, issues?
• First create new big schema
• Then import rows across
• Delete old table?
• Maybe not, legacy programs might use it!
• What if we want to select “Road” only?
• Split out again
• More extensions?
• House name and number?
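The migration steps in the bullets above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table names `cast_members` and `cast_members_v2` and the column widths are made up for the example, and SQLite itself does not enforce declared widths.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Original narrow table (column names as on the slide; widths are assumed).
cur.execute("""
    CREATE TABLE cast_members (
        name        VARCHAR(16),
        hr_colr     VARCHAR(8),
        flltme_empl CHAR(1),
        car_tp      VARCHAR(16)
    )
""")
cur.executemany(
    "INSERT INTO cast_members VALUES (?, ?, ?, ?)",
    [("Paul", "Blond", "Y", None),
     ("Alex", "Auburn", "Y", "Porsche"),
     ("Dom", "Black", "Y", "Hummer")],
)

# Step 1: create the new, bigger schema with the address columns.
cur.execute("""
    CREATE TABLE cast_members_v2 (
        name TEXT, hr_colr TEXT, flltme_empl TEXT, car_tp TEXT,
        house_road TEXT, town TEXT, city TEXT, postcode TEXT
    )
""")

# Step 2: import rows across; the new address columns start out NULL.
cur.execute("""
    INSERT INTO cast_members_v2 (name, hr_colr, flltme_empl, car_tp)
    SELECT name, hr_colr, flltme_empl, car_tp FROM cast_members
""")
conn.commit()

# Step 3 (dropping the old table) is skipped: legacy programs might still use it.
migrated = cur.execute(
    "SELECT name, postcode FROM cast_members_v2 ORDER BY name"
).fetchall()
```

With three rows this is instant. With 10k rows and live readers, every such change becomes a copy, a cut-over, and two tables to keep in sync.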
SLIDE: 20 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
There is another way!
Create a new table and point to it from the old one!
name hr_colr flltme_empl car_tp Address
Paul Blond Y
Alex Auburn Y Porsche
Dom Black Y Hummer
house_road town city postcode
11d Yonge Pk Finsbury London N4 3NU
Reading
London N43
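A minimal sketch of the "new table plus pointer" approach, again using Python's sqlite3 module. The `id` key and the choice of which cast member owns the full address row are assumptions made for illustration; the slide only shows an `Address` column pointing at a second table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE address (
        id INTEGER PRIMARY KEY,
        house_road TEXT, town TEXT, city TEXT, postcode TEXT
    )
""")
# The address column is the "pointer" from the old table to the new one.
conn.execute("""
    CREATE TABLE cast_members (
        name TEXT, hr_colr TEXT, flltme_empl TEXT, car_tp TEXT,
        address INTEGER REFERENCES address(id)
    )
""")
conn.execute(
    "INSERT INTO address VALUES (1, '11d Yonge Pk', 'Finsbury', 'London', 'N4 3NU')"
)
conn.executemany(
    "INSERT INTO cast_members VALUES (?, ?, ?, ?, ?)",
    [("Paul", "Blond", "Y", None, 1),  # assumption: this address belongs to Paul
     ("Alex", "Auburn", "Y", "Porsche", None),
     ("Dom", "Black", "Y", "Hummer", None)],
)

# Reading a person together with their road now needs a join.
rows = conn.execute("""
    SELECT c.name, a.house_road
    FROM cast_members AS c
    LEFT JOIN address AS a ON a.id = c.address
    ORDER BY c.name
""").fetchall()
```

The old table keeps its shape, but every query that wants the whole record has to know about the second table and how to join it.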
SLIDE: 21 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
…
SLIDE: 22 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Now Let’s Store Something More . . . Complicated
Transcript / Book
  Info
    Title = “NL April 14”
    Author = “SNL Cast”
  Section
    • Chapter
        Page
          Paragraph = “I love penguins because…”
        Page
          Paragraph = “On the subject of food…”
    • Chapter
        Page
  Section
    • Chapter
    • Chapter
    • Chapter
        • Paragraph
        • Paragraph
title             author    Section
I love Penguins   S. Lion
Issues with Sections? How many columns?
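To see why this content resists a fixed set of columns, here is a sketch of the same transcript as one nested document in Python. The key names (`info`, `sections`, `chapters`, `pages`, `paragraphs`) are assumptions chosen for the example, not a format taken from the slides.

```python
import json

# The hierarchy from the slide, kept as one nested document instead of columns.
transcript = {
    "info": {"title": "NL April 14", "author": "SNL Cast"},
    "sections": [
        {"chapters": [
            {"pages": [
                {"paragraphs": ["I love penguins because…"]},
                {"paragraphs": ["On the subject of food…"]},
            ]},
            {"pages": [{"paragraphs": []}]},
        ]},
    ],
}

def paragraphs(node):
    """Yield every paragraph string, however deeply it is nested."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "paragraphs":
                yield from value
            else:
                yield from paragraphs(value)
    elif isinstance(node, list):
        for item in node:
            yield from paragraphs(item)

all_paragraphs = list(paragraphs(transcript))
serialized = json.dumps(transcript, ensure_ascii=False)
```

Adding a third section or a tenth chapter changes the data, not the schema, which is exactly what the "How many columns?" question cannot accommodate.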
SLIDE: 23 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Don’t Forget Taxonomies
Hierarchical levels of metadata
Fixed to a specific business purpose
• Can’t be re-used in new contexts
Each record can only be associated with one level
• How many category fields?
[Taxonomy diagram: Category branches into Feature and Series; lower levels list Action, Drama, Comedy, Documentary, …; Cable, Broadcast; Drama, Comedy, …; Action, Drama, Family, Documentary, …]
SLIDE: 24 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Result
Requires everything to be defined up front
Data to be transformed and processed to fit the system
Needs to be redone as information changes
Costly to create and maintain, and only captures part of the data!
Title   ProductionDate   Category   AssetType   Length
Film1   3/1/14           Feature    HD Master   2:40
Show1   6/4/13           Series     HD720       0:40
Film2   6/4/05           Feature    Archive     1:55
[Taxonomy diagram, marked “?”: Category branches into Feature and Series; lower levels list Action, Drama, Comedy, Documentary, …; Cable, Broadcast; Drama, Comedy, …; Action, Drama, Family, Documentary, …]
Traditional Technology
SLIDE: 26 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
[Diagram labels: OLTP; Warehouse; Data Marts; Archives; “Unstructured”; Video; Audio; Signals, Logs, Streams; Social; Documents, Messages; Metadata; Search; Reference Data]
*NOTE: We only did this little bit! Remember?
SLIDE: 27 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
The NEW! Enter NoSQL
Category: Key-value
• Persistent hash-table “on steroids”
• Typically no single modeling paradigm (e.g. columns can be primitives, data structures, binaries, etc.)
Examples: Amazon DynamoDB, Redis, Riak
Category: Columnar
• Similar to K-V in some ways
• Columns may be arranged in groups (families)
• Data types are usually the expected “primitives”
• Works well with “value crunching” (e.g. time series)
Examples: HBase, Cassandra
Category: Document
• URI-mapped (i.e. keyed) documents in lieu of rows
• Supports structured and unstructured content
• Nested context
Examples: MarkLogic, MongoDB, Couchbase
Category: Graph
• Deals with inter-object graphs
• Relationship oriented
• Think object cache (with pointers) “on steroids”
Examples: Neo4J, AllegroGraph, InfoGrid
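As a rough illustration of how the four categories shape the same fact, the sketch below models one cast member in plain Python structures. None of this is any product's API. The keys, the URI, and the `drives` and `worksWith` relationships are invented for the example.

```python
import json

# Key-value: the store sees only an opaque value behind each key.
key_value = {"cast:alex": b'{"hr_colr": "Auburn", "car_tp": "Porsche"}'}

# Columnar: each row key holds column families, each a group of primitive columns.
columnar = {
    "alex": {
        "profile": {"hr_colr": "Auburn", "flltme_empl": "Y"},
        "car": {"car_tp": "Porsche"},
    }
}

# Document: a URI maps to a nested, self-describing document.
documents = {
    "/cast/alex.json": {
        "name": "Alex",
        "hair": "Auburn",
        "car": {"type": "Porsche"},
    }
}

# Graph: relationships are first-class (subject, relationship, object) edges.
graph = {("Alex", "drives", "Porsche"), ("Alex", "worksWith", "Dom")}

# The same question, "what does Alex drive?", asked of each model.
from_key_value = json.loads(key_value["cast:alex"])["car_tp"]
from_columnar = columnar["alex"]["car"]["car_tp"]
from_document = documents["/cast/alex.json"]["car"]["type"]
from_graph = {o for (s, p, o) in graph if s == "Alex" and p == "drives"}
```

The key-value lookup has to decode the value in application code, while the document and graph forms expose structure the database itself can index and query.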
SLIDE: 28 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
THE DESIRED SOLUTION
A Database That Integrates Data Better, Faster, with Less Cost
SLIDE: 29 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
The MarkLogic Alternative
An Operational and Transactional Enterprise NoSQL Database
EASY TO GET DATA IN: Flexible Data Model
• Data ingested as is (no ETL)
• Structured and unstructured data
• Data and metadata together
• Adapts to changing data and changing data structures
EASY TO GET DATA OUT: Ask Anything Universal Index
• Index once and query endlessly
• Real-time and lightning fast
• Query across JSON, XML, text, geospatial, and semantic triples in one database
100% TRUSTED: Enterprise Ready
• Reliable data and transactions (100% ACID compliant)
• Out-of-the-box automatic failover, replication, and backup/recovery
• Enterprise-grade security and Common Criteria certified
SLIDE: 30 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
The SNL App
SLIDE: 31 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Flexible Data Model
Schema-agnostic, structure-aware
No need to define up front
Matched to complex content and metadata modeling
Data is managed in its most accessible, natural form
XML, JSON, RDF, geospatial
SLIDE: 32 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Instead of THIS
SLIDE: 33 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Do it like THIS!
SLIDE: 34 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Search and Query
Search to find answers in documents, relationships, and metadata
• Automatic indexing of every data value, text and data structure
• Specialized indexes for data values (analytics, facets, sorting), geospatial and triples
• All updated in the context of ACID transactions to ensure data integrity and real-time access
• Accessible via fully programmable search API with full-text search, type-ahead suggestions, facets, snippeting, highlighted search terms, proximity boosting, relevance ranking, and language support
[Diagram labels: JavaScript, XQuery, SPARQL; Rich Query Capability; In-database MapReduce; Full-text Search; Semantic Search; Geospatial Search]
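The idea of indexing every value, word, and structure without a declared schema can be sketched with a toy inverted index. This is a teaching model only, not MarkLogic's implementation. The function names, index key shapes, and document URIs are assumptions made for the example.

```python
from collections import defaultdict

def walk(node, path=()):
    """Yield (path, leaf_value) pairs for every leaf in a nested document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, path + (key,))
    elif isinstance(node, list):
        for item in node:
            yield from walk(item, path)
    else:
        yield path, node

def build_index(docs):
    """Index every word, every value, and every path-plus-value; no schema is declared."""
    index = defaultdict(set)
    for uri, doc in docs.items():
        for path, value in walk(doc):
            index[("value", value)].add(uri)
            index[("path", path, value)].add(uri)
            for word in str(value).lower().split():
                index[("word", word)].add(uri)
    return index

# Two differently shaped documents share one index.
docs = {
    "/cast/alex.json": {"name": "Alex", "hair": "Auburn", "car": {"type": "Porsche"}},
    "/scripts/1.json": {"title": "I love Penguins", "author": "S. Lion"},
}
index = build_index(docs)

penguin_docs = index[("word", "penguins")]                   # full-text style lookup
porsche_docs = index[("path", ("car", "type"), "Porsche")]   # structure-aware lookup
```

Because the index is derived from whatever structure each document happens to have, a new field becomes queryable as soon as a document containing it is loaded.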
Timing
Context
Who’s Smarter?
VS
Do domestic dogs interpret pointing as a command?
Animal Cognition (2012): 1-12, November 09, 2012
By Scheider, Linda; Kaminski, Juliane; Call, Josep; Tomasello, Michael
Context!
SLIDE: 39 © COPYRIGHT 2015 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Machines Don’t Get Context . . .
Manu Sporny Founder/CEO - Digital Bazaar, Inc.
http://www.cambridgesemantics.com/semantic-university/what-is-linked-data
SLIDE: 40 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Enter Semantics!
Manu Sporny Founder/CEO - Digital Bazaar, Inc.
http://www.cambridgesemantics.com/semantic-university/what-is-linked-data
SLIDE: 41 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Semantics
Enterprise triple store, document store, and database combined
• Store and query billions of facts and relationships
• Leverage ontologies for domain and role specific context access to data and documents
• Efficient metadata management with relationships to ontologies
• Standards-based for ease of use and integration
  – RDF, SPARQL, and standard REST interfaces
SLIDE: 42 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Semantics to Model Relationships
Data model to manage relationships and link together data
‘Triples’ describe single facts
Collections of facts describe complex real-world scenarios
[Diagram: “Chevy”, “SNL”, and “NBC” connected by three isOn links, one of them marked “!”]
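A small sketch of triples as data, in plain Python rather than RDF or SPARQL. Reading the diagram as "Chevy isOn SNL" and "SNL isOn NBC", with the third link derived from those two, is an assumption. The helpers `match` and `infer_transitive` are hypothetical names for this example.

```python
# Each fact is one (subject, predicate, object) triple.
triples = {
    ("Chevy", "isOn", "SNL"),
    ("SNL", "isOn", "NBC"),
}

def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return {
        (ts, tp, to) for (ts, tp, to) in store
        if s in (None, ts) and p in (None, tp) and o in (None, to)
    }

def infer_transitive(store, predicate):
    """Add (a, p, c) whenever (a, p, b) and (b, p, c) exist, until nothing new appears."""
    closed = set(store)
    while True:
        new = {
            (a, predicate, c)
            for (a, p1, b) in closed if p1 == predicate
            for (b2, p2, c) in closed if p2 == predicate and b2 == b
        } - closed
        if not new:
            return closed
        closed |= new

inferred = infer_transitive(triples, "isOn")
on_nbc = {s for (s, _, _) in match(inferred, p="isOn", o="NBC")}
```

Two stored facts are enough to answer a question nobody recorded directly: who is on NBC.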
SLIDE: 43 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Ontologies Instead of Categories
Actually model information as it is in the real world
Not limited to a single purpose
• Ontologies for all categories of metadata
• Even ‘impossible’ categories like fictional worlds
SLIDE: 44 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
NoSQL and Semantics!
SLIDE: 45 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Real-Time Analytics
Range indexes can be used for
• Faceted search
• Aggregation and visualization
• Analytics… including custom user-defined functions
• Co-occurrence
• SQL, ODBC, and BI integration
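A minimal sketch of what a range index buys, assuming it can be modeled as a field's values kept in sorted order. It reuses the three asset rows from the earlier Result slide, reduced to a production year and a category. The field names and helper functions are invented for the example and are not MarkLogic APIs.

```python
from bisect import bisect_left, bisect_right
from collections import Counter

# Assets from the earlier table; only the year of ProductionDate is kept.
assets = {
    "Film1": {"year": 2014, "category": "Feature"},
    "Show1": {"year": 2013, "category": "Series"},
    "Film2": {"year": 2005, "category": "Feature"},
}

def build_range_index(docs, field):
    """Model a range index as (value, uri) pairs kept in sorted order."""
    return sorted((doc[field], uri) for uri, doc in docs.items())

def range_query(index, low, high):
    """Return URIs whose indexed value lies in [low, high], found by binary search."""
    keys = [value for value, _ in index]
    start = bisect_left(keys, low)
    end = bisect_right(keys, high)
    return [uri for _, uri in index[start:end]]

year_index = build_range_index(assets, "year")
category_index = build_range_index(assets, "category")

recent = range_query(year_index, 2010, 2016)               # range constraint
facets = Counter(value for value, _ in category_index)     # facet counts
```

Facet counts and range constraints both come straight from the sorted values, without touching the documents themselves.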
SLIDE: 46 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Scalability, Elasticity and Cloud
Massive enterprise scalability and elasticity
• Scale horizontally in clusters on commodity hardware to hundreds of nodes, petabytes of data, and billions of documents
• Process thousands of multi-document multi-statement transactions per second
• Start small and scale up or down to meet capacity and performance demands without over-provisioning or over-spending
• Fully cloud enabled for automated deployment and management on EC2
• Leverage dynamic configurations with Tiered Storage
[Diagram: two E-NODEs and three D-NODEs]
Result: Enterprise-ready to power mission critical products
SLIDE: 47 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Use Case: Deliver Better Information
Present information based on relationships
Go beyond traditional technology with depth of content
Drive efficiency using semantic approach to tagging
SLIDE: 48 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Use Case: Go Beyond Search
• Concept instead of keyword search
• Related content and information drive the content discovery and new interactions
  – SNL40 continuous viewing
• Dynamically tailored to the user’s specific attributes or activity
SLIDE: 49 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Use Case: Integrate Data
• Integrate data across the automoti
Bob Pilz
Taxonomy Manager
Mitchell1
SLIDE: 51 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Semantics-driven search
Talent
Kristen Wiig
Acted in
Episode 4
Anne Hathaway and Killers
Part of
Played
Character
Maharelle Sister
Season 34
Segment
The Lawrence Welk Show
Aired on
Date
10/4/08
Era
Acted in
Includes
Part of
SLIDE: 52 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Intelligent recommendation engine

More Related Content

PPTX
MarkLogic Overview, Ron Avnur, MarkLogic
PPTX
Mark logic Corporate Overview
PPTX
MarkLogic and The Universal Index
PDF
MarkLogic Overview and Use Cases
PDF
The New Database Frontier: Harnessing the Cloud
PDF
PoolParty 6.0 - Climbing the Semantic Ladder
PDF
GraphDB Cloud: Enterprise Ready RDF Database on Demand
PDF
PoolParty Semantic Suite - Release 6.0 (Technical Overview)
MarkLogic Overview, Ron Avnur, MarkLogic
Mark logic Corporate Overview
MarkLogic and The Universal Index
MarkLogic Overview and Use Cases
The New Database Frontier: Harnessing the Cloud
PoolParty 6.0 - Climbing the Semantic Ladder
GraphDB Cloud: Enterprise Ready RDF Database on Demand
PoolParty Semantic Suite - Release 6.0 (Technical Overview)

What's hot (20)

PDF
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
PDF
Smarter content with a Dynamic Semantic Publishing Platform
PDF
Vital.AI Creating Intelligent Apps
PDF
PoolParty GraphSearch - The Fusion of Search, Recommendation and Analytics
KEY
Linking Open, Big Data Using Semantic Web Technologies - An Introduction
PDF
Solutions Linux 2013: SpagoBI and Talend jointly support Big Data scenarios
PDF
One Ontology, One Data Set, Multiple Shapes with SHACL
PDF
Vital AI: Big Data Modeling
PDF
How Graphs Continue to Revolutionize The Prevention of Financial Crime & Frau...
PDF
Schneller Nutzen mit Neo4j: das Beispiel Panama Papers
PDF
Running complex data queries in a distributed system
PDF
Paris Spark Meetup - Trifacta - 03_04_2017
PDF
Stephen Buxton: When RDF alone is not enough - triples, documents, and data i...
PDF
II-SDV 2017: Custom Open Source Search Engine with Drupal 8 and Solr at Frenc...
PDF
Hadoop,Big Data Analytics and More
PDF
An introduction to multi-model databases
PDF
Supporting GDPR Compliance through effectively governing Data Lineage and Dat...
PDF
Rob peglar introduction_analytics _big data_hadoop
PDF
Leveraging Taxonomy Management with Machine Learning
PDF
II-SDV 2017: Approaches of Web Information Analysis in a Day to Day Work Envi...
Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
Smarter content with a Dynamic Semantic Publishing Platform
Vital.AI Creating Intelligent Apps
PoolParty GraphSearch - The Fusion of Search, Recommendation and Analytics
Linking Open, Big Data Using Semantic Web Technologies - An Introduction
Solutions Linux 2013: SpagoBI and Talend jointly support Big Data scenarios
One Ontology, One Data Set, Multiple Shapes with SHACL
Vital AI: Big Data Modeling
How Graphs Continue to Revolutionize The Prevention of Financial Crime & Frau...
Schneller Nutzen mit Neo4j: das Beispiel Panama Papers
Running complex data queries in a distributed system
Paris Spark Meetup - Trifacta - 03_04_2017
Stephen Buxton: When RDF alone is not enough - triples, documents, and data i...
II-SDV 2017: Custom Open Source Search Engine with Drupal 8 and Solr at Frenc...
Hadoop,Big Data Analytics and More
An introduction to multi-model databases
Supporting GDPR Compliance through effectively governing Data Lineage and Dat...
Rob peglar introduction_analytics _big data_hadoop
Leveraging Taxonomy Management with Machine Learning
II-SDV 2017: Approaches of Web Information Analysis in a Day to Day Work Envi...
Ad

Viewers also liked (19)

PDF
Warum NoSQL Datenbanken auf dem Vormarsch sind
PPTX
Warum NoSQL? Wann macht der Einsatz von NoSQL Datenbanken Sinn?
PPTX
Marklogic and the Linked Data Connection
PDF
Cassandra Consistency: Tradeoffs and Limitations
PPTX
SharePoint 2013 Javascript Object Model
PPTX
Share point hosted add ins munich
PPTX
Essential Knowledge for SharePoint Add-Ins
PPTX
Chris O'Brien - Comparing SharePoint add-ins (apps) with Office 365 apps
PPTX
Rev Your Engines - SharePoint Performance Best Practices
PPTX
Real World SharePoint Add-In Development
PPTX
Develop a SharePoint App in 45 Minutes
PPTX
Top 10 sharepoint interview questions with answers
PDF
10 Reasons your SharePoint Migration Failed
PPTX
10 Reasons to Avoid Folders in SharePoint 2013/2010
PPTX
10 Best SharePoint Features You’ve Never Used (But Should)
PDF
Databases, CAP, ACID, BASE, NoSQL... oh my!
PPTX
SharePoint Permissions Worst Practices
PPTX
10 Best Productivity Features in SharePoint 2013
Warum NoSQL Datenbanken auf dem Vormarsch sind
Warum NoSQL? Wann macht der Einsatz von NoSQL Datenbanken Sinn?
Marklogic and the Linked Data Connection
Cassandra Consistency: Tradeoffs and Limitations
SharePoint 2013 Javascript Object Model
Share point hosted add ins munich
Essential Knowledge for SharePoint Add-Ins
Chris O'Brien - Comparing SharePoint add-ins (apps) with Office 365 apps
Rev Your Engines - SharePoint Performance Best Practices
Real World SharePoint Add-In Development
Develop a SharePoint App in 45 Minutes
Top 10 sharepoint interview questions with answers
10 Reasons your SharePoint Migration Failed
10 Reasons to Avoid Folders in SharePoint 2013/2010
10 Best SharePoint Features You’ve Never Used (But Should)
Databases, CAP, ACID, BASE, NoSQL... oh my!
SharePoint Permissions Worst Practices
10 Best Productivity Features in SharePoint 2013
Ad

Similar to Northeastern DB Class Introduction to Marklogic NoSQL april 2016 (20)

PDF
Data Lake, Virtual Database, or Data Hub - How to Choose?
PDF
The Value of Metadata
PPTX
Operational Analytics Using Spark and NoSQL Data Stores
PDF
Cwin16 - Lyon - partner mark logic - the rise of nosql
PPTX
Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Tr...
PDF
The Impact of Smart Content
PPTX
Enabling the Real Time Analytical Enterprise
PDF
A6 big data_in_the_cloud
PDF
A New Way of Thinking About MDM
PDF
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
PDF
The Maturity Model: Taking the Growing Pains Out of Hadoop
PDF
Insight Platforms Accelerate Digital Transformation
PPTX
MongoDB & Hadoop - Understanding Your Big Data
PDF
Data-Centric Infrastructure for Agile Development
PDF
TIBCO Advanced Analytics Meetup (TAAM) - June 2015
PDF
Foundation for Success: How Big Data Fits in an Information Architecture
PDF
Future of Data Strategy (ASEAN)
PDF
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
PDF
Mastering Your Customer Data on Apache Spark by Elliott Cordo
PDF
Key Methodologies for Migrating from Oracle to Postgres
 
Data Lake, Virtual Database, or Data Hub - How to Choose?
The Value of Metadata
Operational Analytics Using Spark and NoSQL Data Stores
Cwin16 - Lyon - partner mark logic - the rise of nosql
Stephen Buxton | Data Integration - a Multi-Model Approach - Documents and Tr...
The Impact of Smart Content
Enabling the Real Time Analytical Enterprise
A6 big data_in_the_cloud
A New Way of Thinking About MDM
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
The Maturity Model: Taking the Growing Pains Out of Hadoop
Insight Platforms Accelerate Digital Transformation
MongoDB & Hadoop - Understanding Your Big Data
Data-Centric Infrastructure for Agile Development
TIBCO Advanced Analytics Meetup (TAAM) - June 2015
Foundation for Success: How Big Data Fits in an Information Architecture
Future of Data Strategy (ASEAN)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Mastering Your Customer Data on Apache Spark by Elliott Cordo
Key Methodologies for Migrating from Oracle to Postgres
 

More from Matt Turner (20)

PPTX
MarkLogic MESA Smart Content Feb 2016 FINAL.pptx
PPTX
How a Major Bank modernized wholesale banking to deliver self-service with ...
PPTX
Presented at CDOIQ 2024: How to Unlock Data for AI by Breaking Through the Da...
PPTX
Data In Action: Business Value of Data
PPTX
Data2030 Summit MEA: Data Chaos to Data Culture March 2023
PPTX
Data2030 Summit Data Megatrends Turner Sept 2022.pptx
PPTX
From Data Chaos to Data Culture
PPTX
How Data is Driving AI Innovation
PPTX
Principles of Information Access
PPTX
Securing the Right Metadata and Making it Work for You
PPTX
Operationalize Your Data and Lead Your Business Transformation
PPTX
Three Cool Things You Can Do with Standards
PPTX
Mark logic Industrialize Your Data IOT Berlin Sept 2019
PPTX
BBC olympics 2012 experience oct18
PPTX
Operationalize Your Linked Data
PPTX
Smart Content Summit: Unlock the Value with the Right Data Pattern
PPTX
Data Security and the Hard Outer Shell
PPTX
Media publishing meetup ocean of data july 2016
PPTX
Metadata Madness: Semantics Takes Center Stage
PPTX
New Trends in Data Management in the Information Industries
MarkLogic MESA Smart Content Feb 2016 FINAL.pptx
How a Major Bank modernized wholesale banking to deliver self-service with ...
Presented at CDOIQ 2024: How to Unlock Data for AI by Breaking Through the Da...
Data In Action: Business Value of Data
Data2030 Summit MEA: Data Chaos to Data Culture March 2023
Data2030 Summit Data Megatrends Turner Sept 2022.pptx
From Data Chaos to Data Culture
How Data is Driving AI Innovation
Principles of Information Access
Securing the Right Metadata and Making it Work for You
Operationalize Your Data and Lead Your Business Transformation
Three Cool Things You Can Do with Standards
Mark logic Industrialize Your Data IOT Berlin Sept 2019
BBC olympics 2012 experience oct18
Operationalize Your Linked Data
Smart Content Summit: Unlock the Value with the Right Data Pattern
Data Security and the Hard Outer Shell
Media publishing meetup ocean of data july 2016
Metadata Madness: Semantics Takes Center Stage
New Trends in Data Management in the Information Industries

Recently uploaded (20)

PDF
Softaken Excel to vCard Converter Software.pdf
PDF
AI in Product Development-omnex systems
PDF
medical staffing services at VALiNTRY
PPTX
CHAPTER 2 - PM Management and IT Context
PDF
Claude Code: Everyone is a 10x Developer - A Comprehensive AI-Powered CLI Tool
PPTX
VVF-Customer-Presentation2025-Ver1.9.pptx
PDF
Upgrade and Innovation Strategies for SAP ERP Customers
PPTX
Odoo POS Development Services by CandidRoot Solutions
PDF
Design an Analysis of Algorithms I-SECS-1021-03
PPTX
Operating system designcfffgfgggggggvggggggggg
PPTX
ai tools demonstartion for schools and inter college
PDF
Audit Checklist Design Aligning with ISO, IATF, and Industry Standards — Omne...
PPTX
Transform Your Business with a Software ERP System
PDF
Odoo Companies in India – Driving Business Transformation.pdf
PDF
2025 Textile ERP Trends: SAP, Odoo & Oracle
PPT
Introduction Database Management System for Course Database
PDF
Nekopoi APK 2025 free lastest update
PDF
T3DD25 TYPO3 Content Blocks - Deep Dive by André Kraus
PPTX
Lecture 3: Operating Systems Introduction to Computer Hardware Systems
PDF
How to Migrate SBCGlobal Email to Yahoo Easily
Softaken Excel to vCard Converter Software.pdf
AI in Product Development-omnex systems
medical staffing services at VALiNTRY
CHAPTER 2 - PM Management and IT Context
Claude Code: Everyone is a 10x Developer - A Comprehensive AI-Powered CLI Tool
VVF-Customer-Presentation2025-Ver1.9.pptx
Upgrade and Innovation Strategies for SAP ERP Customers
Odoo POS Development Services by CandidRoot Solutions
Design an Analysis of Algorithms I-SECS-1021-03
Operating system designcfffgfgggggggvggggggggg
ai tools demonstartion for schools and inter college
Audit Checklist Design Aligning with ISO, IATF, and Industry Standards — Omne...
Transform Your Business with a Software ERP System
Odoo Companies in India – Driving Business Transformation.pdf
2025 Textile ERP Trends: SAP, Odoo & Oracle
Introduction Database Management System for Course Database
Nekopoi APK 2025 free lastest update
T3DD25 TYPO3 Content Blocks - Deep Dive by André Kraus
Lecture 3: Operating Systems Introduction to Computer Hardware Systems
How to Migrate SBCGlobal Email to Yahoo Easily

Northeastern DB Class Introduction to Marklogic NoSQL april 2016

  • 1. © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Matt Turner, CTO Media & Entertainment Introduction to MarkLogic NoSQL
  • 2. SLIDE: 2 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Outline • Something’s Happening Here • The Old and the New  Data models  Data access • Discussion
  • 4. SLIDE: 4 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Stress on Traditional Data Approaches Complexity  Structured  Unstructured  Semi-structured  Raw  Streams of data  Constant change  Agile analytics  Fail-fast Volume Velocity Variety Volume • Many months of system log files • Every tweet • Years of articles • Relative to current size of operation Velocity • Streams of customer feedback to determine sentiment • Real-time risk analysis • Real-time Business Intelligence Variety • Database feeds • Raw logs • Web crawl data • Articles • Multi-media • ALSO: questions! Examples Big Data: Gartner coined the “three V’s” description  Data: Petabyte scale  Nodes: Thousands
  • 5. SLIDE: 5 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
  • 6. SLIDE: 6 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Stress on Traditional Data Approaches Complexity  Structured  Unstructured  Semi-structured  Raw  Streams of data  Constant change  Agile analytics  Fail-fast Volume Velocity Variety Volume • Many months of system log files • Every tweet • Years of articles • Relative to current size of operation Velocity • Streams of customer feedback to determine sentiment • Real-time risk analysis • Real-time Business Intelligence Variety • Database feeds • Raw logs • Web crawl data • Articles • Multi-media • ALSO: questions! Examples Big Data: Gartner coined the “three V’s” description  Data: Petabyte scale  Nodes: Thousands
  • 7. SLIDE: 7 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Leader Quadrant Online Transaction Processing RDBS (May 2002)
  • 8. SLIDE: 8 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Leader Quadrant Operational DBMS (Oct 2014) Traditional Mainstays Upstarts Storm the Field
  • 9. SLIDE: 9 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. MarkLogic: Best Operational Data Warehouse (Aug 2014)
  • 10. SLIDE: 10 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. A Unified, Actionable 360 View of Data WHAT BUSINESSES WANT
  • 13. SLIDE: 13 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Data Is In Silos  Data is spread across disconnected databases  M&A outpaces the speed of data integration  Data needs to be delivered in real time THE REALITY
  • 14. SLIDE: 14 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. 80% OF TIME By data scientists just wrangling data WASTED In 2015 on creating relational data silos Of data warehouse projects is on ETL The Massive Cost of Integrating Data From Silos 36BILLION IN SPENDING $% OF THE COST60
  • 15. SLIDE: 15 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Relational Databases with ETL Sacrifice Agility, Timeliness, and Cost  All future data needs must be predictable  New SQL queries require database re-indexing  Siloed database changes require ETL re-writes THE IT CHALLENGE ETL OLTP ARCHIVES ETL ETL ETL DATA MARTS ETL WAREHOUSE REFERENCE DATA
  • 16. SLIDE: 16 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. OLTP Warehouse Data MartsArchives “Unstructured” “ ” Video Audio Signals, Logs, Streams Social Documents, Messages { } Metadata Search🔍 Reference Data It’s Complicated
  • 17. The OLD: Let’s Design the Application (And pretend it’s the 80s)
  • 18. SLIDE: 18 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Name Hair Colour Fulltime Employee? Car type Paul Blond Y Alex Auburn Y Porsche Dom Black Y Hummer name hr_colr flltme_empl car_tp Let’s Begin… Cast Members { How many characters wide should this be? 8? 16? 32? { { {
  • 19. SLIDE: 19 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. New Schema – Extend Ours! name hr_colr flltme_empl car_tp Paul Blond Y Alex Auburn Y porsche Dom Black Y Hummer house_road town city postcode 11d Yonge Pk Finsbury London N4 3NU Reading London N43 • Hang on • If this table had 10k rows, issues? • First create new big schema • Then import rows across • Delete old table? • Maybe not, legacy programs might use it! • What if we want to select “Road” only? • Split out again • More extensions? • House name and number?
  • 20. SLIDE: 20 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. There is another way! Create a new table and point to it from the old one! name hr_colr flltme_empl car_tp Address Paul Blond Y Alex Auburn Y porsche Dom Black Y Hummer house_road town city postcode 11d Yonge Pk Finsbury London N4 3NU Reading London N43
  • 21. SLIDE: 21 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. …
  • 22. SLIDE: 22 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Now Let’s Store Something More . . . Complicated Transcript / Book Info Title = “NL April 14” Author = “SNL Cast” Section • Chapter Page Paragraph = “I love penguins because…” Page Paragraph = “On the subject of food…” • Chapter Page Section • Chapter • Chapter • Chapter • Paragraph • Paragraph title author Section I love Penguins S. Lion Issues with Sections? How many columns?
  • 23. SLIDE: 23 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Don’t Forget Taxonomies Hierarchical levels of metadata Fixed to a specific business purpose  Can’t be re-used in new contexts Each record can only be associated with one level  How many category fields? Category Feature Series Action Drama Comedy Documentary … Cable Broadcast Drama Comedy … Action Drama Family Documentary …
  • 24. SLIDE: 24 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Result Requires everything to be defined up front Data to be transformed and processed to fit the system Needs to be redone as information changes Costly to create, maintain and only captures part of the data! Title ProductionDate Category AssetType Length Film1 3/1/14 Feature HD Master 2:40 Show1 6/4/13 Series HD720 0:40 Film2 6/4/05 Feature Archive 1:55 Category Feature Series Action Drama Comedy Documentary … Cable Broadcast Drama Comedy … Action Drama Family Documentary … ?
  • 26. SLIDE: 26 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. OLTP Warehouse Data MartsArchives “Unstructured” “ ” Video Audio Signals, Logs, Streams Social Documents, Messages { } Metadata Search🔍 Reference Data *NOTE: We only did this little bit! Remember?
  • 27. SLIDE: 27 © COPYRIGHT 2016 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. The NEW! Enter NoSQL Category Description Examples Key-value • Persistent hash-table “on steroids” • Typically no single modeling paradigm (e.g. columns can be primitives, data structures, binaries, etc.) • Amazon DynamoDB • Redis • Riak Columnar • Similar to K-V in some ways • Column may be arranged in groups (families) • Data types are usually the expected “primitives” • Works well with “value crunching” (e.g. time series) • HBase • Cassandra Document • URI-mapped (i.e. keyed) documents in lieu of rows • Supports structured and unstructured content • Nested context • MarkLogic • MongoDB • Couchbase Graph • Deals with inter-object graphs • Relationship oriented • Think object cache (with pointers) “on steroids” • Neo4J • AllegroGraph • InfoGrid
  • 28. A Database That Integrates Data Better, Faster, with Less Cost THE DESIRED SOLUTION
  • 29. The MarkLogic Alternative
    An Operational and Transactional Enterprise NoSQL Database
    EASY TO GET DATA IN: Flexible Data Model
     Data ingested as is (no ETL)
     Structured and unstructured data
     Data and metadata together
     Adapts to changing data and changing data structures
    EASY TO GET DATA OUT: Ask Anything Universal Index
     Index once and query endlessly
     Real-time and lightning fast
     Query across JSON, XML, text, geospatial, and semantic triples in one database
    100% TRUSTED: Enterprise Ready
     Reliable data and transactions (100% ACID compliant)
     Out-of-the-box automatic failover, replication, and backup/recovery
     Enterprise-grade security and Common Criteria certified
  • 30. The SNL App
  • 31. Flexible Data Model
    Schema-agnostic, structure-aware
    No need to define up front
    Matched to complex content and metadata data modeling
    Data is managed in its most accessible, natural form
    XML, JSON, RDF, geospatial
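As a rough illustration of "schema-agnostic, structure-aware", the sketch below stores two differently shaped JSON documents without declaring a schema, then finds a field wherever it occurs in the structure. It is plain Python, not the MarkLogic API; the `ingest` and `find_paths` helpers, the URIs and the "Pilot" episode are hypothetical.

```python
import json

documents = {}  # URI -> parsed document; a stand-in for a document database


def ingest(uri, raw_json):
    """Store a document as is, with no schema declared up front."""
    documents[uri] = json.loads(raw_json)


def find_paths(node, key, path=""):
    """Yield (path, value) for every occurrence of key at any depth."""
    if isinstance(node, dict):
        for k, v in node.items():
            child = f"{path}/{k}"
            if k == key:
                yield child, v
            yield from find_paths(v, key, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_paths(item, key, f"{path}[{i}]")


# Two documents with different shapes live side by side.
ingest("/assets/film1.json",
       '{"title": "Film1", "category": "Feature", "length": "2:40"}')
ingest("/assets/show1.json",
       '{"title": "Show1", "episodes": [{"title": "Pilot", "length": "0:40"}]}')

titles = {uri: [v for _, v in find_paths(doc, "title")]
          for uri, doc in documents.items()}
print(titles)
```

Neither document had to be transformed to fit the other's shape, yet the lookup still knows where in each structure a value was found.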
  • 32. Instead of THIS
  • 33. Do it like THIS!
  • 34. Search and Query
    Search to find answers in documents, relationships, and metadata
     Automatic indexing of every data value, text and data structure
     Specialized indexes for data values (analytics, facets, sorting), geospatial and triples
     All updated in the context of ACID transactions to ensure data integrity and real-time access
     Accessible via fully programmable search API with full-text search, type-ahead suggestions, facets, snippeting, highlighted search terms, proximity boosting, relevance ranking, and language support
    Rich Query Capability: JavaScript, XQuery, SPARQL, In-database MapReduce, Full-text Search, Semantic Search, Geospatial Search
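The idea of indexing every word and every data value at ingest, then answering queries by intersecting index entries, can be sketched with a toy inverted index. This is an assumption-laden illustration in plain Python, not MarkLogic's index implementation; the function names, URIs and summary text are hypothetical, and only flat documents are handled.

```python
from collections import defaultdict

word_index = defaultdict(set)    # word -> URIs containing it
value_index = defaultdict(set)   # (field, value) -> URIs with that exact value


def index_document(uri, doc):
    """Index every field value and every word at ingest time."""
    for field, value in doc.items():
        value_index[(field, value)].add(uri)
        for word in str(value).lower().split():
            word_index[word].add(uri)


def search(words=(), **field_values):
    """Intersect the posting sets for all words and field constraints."""
    postings = [word_index.get(w.lower(), set()) for w in words]
    postings += [value_index.get(fv, set()) for fv in field_values.items()]
    return set.intersection(*postings) if postings else set()


# Hypothetical documents; the summaries are invented for the example.
index_document("/assets/film1.json",
               {"title": "Film1", "category": "Feature", "summary": "A family drama"})
index_document("/assets/show1.json",
               {"title": "Show1", "category": "Series", "summary": "A workplace drama"})

print(search(["drama"]))
print(search(["drama"], category="Series"))
```

Because both text and values are indexed up front, a full-text term and a structured constraint combine into one query without any per-query scan of the documents.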
  • 38. Do domestic dogs interpret pointing as a command? Animal Cognition (2012): 1-12, November 09, 2012 By Scheider, Linda; Kaminski, Juliane; Call, Josep; Tomasello, Michael Context!
  • 39. Machines Don’t Get Context . . . Manu Sporny Founder/CEO - Digital Bazaar, Inc. http://www.cambridgesemantics.com/semantic-university/what-is-linked-data
  • 40. Enter Semantics! Manu Sporny Founder/CEO - Digital Bazaar, Inc. http://www.cambridgesemantics.com/semantic-university/what-is-linked-data
  • 41. Semantics
    Enterprise triple store, document store, and database combined
     Store and query billions of facts and relationships
     Leverage ontologies for domain and role specific context access to data and documents
     Efficient metadata management with relationships to ontologies
     Standards-based for ease of use and integration: RDF, SPARQL, and standard REST interfaces
  • 42. Semantics to Model Relationships
    Data model to manage relationships and link together data
    ‘Triples’ describe single facts
    Collections of facts describe complex real-world scenarios
    [Diagram: “Chevy”, “SNL” and “NBC” linked by three isOn relationships!]
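A minimal sketch of triples as single facts, and of how a collection of facts can yield a new one. The direction of the two starting isOn facts is an assumed reading of the slide diagram, and `match` and `infer_transitive` are hypothetical helpers in plain Python, not a SPARQL engine.

```python
# Each triple is one fact: (subject, predicate, object).
# The two starting facts are an assumed reading of the slide diagram.
triples = {("Chevy", "isOn", "SNL"), ("SNL", "isOn", "NBC")}


def match(store, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return {t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}


def infer_transitive(store, predicate):
    """Add (a, p, c) whenever (a, p, b) and (b, p, c) exist, until stable."""
    closed = set(store)
    while True:
        new = {(a, predicate, c)
               for a, p1, b in closed if p1 == predicate
               for b2, p2, c in closed if p2 == predicate and b2 == b}
        if new <= closed:
            return closed
        closed |= new


print(match(triples, o="NBC"))
inferred = infer_transitive(triples, "isOn")
print(match(inferred, s="Chevy"))
```

Treating isOn as transitive is a modeling choice made here for illustration; in a real triple store that choice would be expressed in an ontology or inference rule rather than in application code.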
  • 43. Ontologies Instead of Categories
    Actually model information as it is in the real world
    Not limited to a single purpose
     Ontologies for all categories of metadata
     Even ‘impossible’ categories like fictional worlds
  • 44. NoSQL and Semantics!
  • 45. Real-Time Analytics
    Range indexes can be used for
     Faceted search
     Aggregation and visualization
     Analytics, including custom user-defined functions
     Co-occurrence
     SQL, ODBC, and BI integration
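The uses listed above all rest on keeping a field's values in sorted order alongside the documents that hold them. The sketch below is a toy `RangeIndex` in plain Python, not MarkLogic's implementation; the class, its methods and the URIs are hypothetical, and the years are assumed from the two-digit production dates in the earlier table.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


class RangeIndex:
    """Values of one field kept in sorted order, with the URI for each value."""

    def __init__(self):
        self.values = []
        self.uris = []

    def add(self, value, uri):
        i = bisect_right(self.values, value)
        self.values.insert(i, value)
        self.uris.insert(i, uri)

    def between(self, low, high):
        """URIs whose value lies in [low, high], found by binary search."""
        return self.uris[bisect_left(self.values, low):bisect_right(self.values, high)]

    def facets(self):
        """Count of documents per distinct value (faceted search)."""
        return Counter(self.values)


def co_occurrence(a, b):
    """Count how often a value in index a shares a document with a value in b."""
    by_uri = {}
    for value, uri in zip(b.values, b.uris):
        by_uri.setdefault(uri, []).append(value)
    return Counter((va, vb)
                   for va, uri in zip(a.values, a.uris)
                   for vb in by_uri.get(uri, []))


# Rows from the earlier table; years assume 14 means 2014, and so on.
assets = {
    "film1": {"category": "Feature", "asset_type": "HD Master", "year": 2014},
    "show1": {"category": "Series", "asset_type": "HD720", "year": 2013},
    "film2": {"category": "Feature", "asset_type": "Archive", "year": 2005},
}
category, asset_type, year = RangeIndex(), RangeIndex(), RangeIndex()
for uri, doc in assets.items():
    category.add(doc["category"], uri)
    asset_type.add(doc["asset_type"], uri)
    year.add(doc["year"], uri)

print(category.facets())
print(year.between(2010, 2014))
print(co_occurrence(category, asset_type))
```

Facet counts and range queries read straight off the sorted values, so they need no scan of the underlying documents.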
  • 46. Scalability, Elasticity and Cloud
    Massive enterprise scalability and elasticity
     Scale horizontally in clusters on commodity hardware to hundreds of nodes, petabytes of data, and billions of documents
     Process thousands of multi-document multi-statement transactions per second
     Start small and scale up or down to meet capacity and performance demands without over-provisioning or over-spending
     Fully cloud enabled for automated deployment and management on EC2
     Leverage dynamic configurations with Tiered Storage
    [Diagram: cluster of E-NODEs and D-NODEs]
    Result: Enterprise-ready to power mission critical products
  • 47. Use Case: Deliver Better Information
    Present information based on relationships
    Go beyond traditional technology with depth of content
    Drive efficiency using semantic approach to tagging
  • 48. Use Case: Go Beyond Search
    • Concept instead of keyword search
    • Related content and information drive the content discovery and new interactions
     SNL40 continuous viewing
    • Dynamically tailored to the user’s specific attributes or activity
  • 49. Use Case: Integrate Data • Integrate data across the automoti
  • 51. Semantics-driven search
    [Diagram: nodes Talent: Kristen Wiig; Episode 4: Anne Hathaway and Killers; Character: Maharelle Sister; Season 34; Segment: The Lawrence Welk Show; Date: 10/4/08; Era. Edge labels: Acted in, Played, Part of, Includes, Aired on]
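Semantics-driven search amounts to following relationships from one entity to the next. The sketch below walks a small edge list built from the labels on this slide. Which node each edge connects is an assumption, since the flattened diagram does not preserve it, and the predicate names and `follow` helper are hypothetical.

```python
# Edges assumed from the slide's labels; the endpoints are a guess at the diagram.
edges = [
    ("Kristen Wiig", "actedIn", "Episode 4"),
    ("Kristen Wiig", "played", "Maharelle Sister"),
    ("Episode 4", "partOf", "Season 34"),
    ("Episode 4", "includes", "The Lawrence Welk Show"),
    ("Episode 4", "airedOn", "10/4/08"),
]


def objects(subject, predicate):
    """Objects reachable from subject over one edge with the given predicate."""
    return [o for s, p, o in edges if s == subject and p == predicate]


def follow(start, *predicates):
    """Follow a chain of predicates from a start node; return the end nodes."""
    frontier = [start]
    for predicate in predicates:
        frontier = [o for node in frontier for o in objects(node, predicate)]
    return frontier


# "Which segments appear in episodes Kristen Wiig acted in?"
print(follow("Kristen Wiig", "actedIn", "includes"))
# "Which seasons has she appeared in?"
print(follow("Kristen Wiig", "actedIn", "partOf"))
```

A keyword search for "Kristen Wiig" would only find documents that mention her name, whereas the traversal reaches the segment and season through the episode that links them.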
  • 52. Intelligent recommendation engine

Editor's Notes

  • #15: Sources:
    80% of time spent by data scientists on just wrangling data: “Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in this more mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets.” Steve Lohr. “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights.” The New York Times. August 17, 2014. <http://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html>
    60% of the cost of data warehouse projects is on ETL: “In a report sponsored by Informatica, analysts at TDWI estimate between 60% and 80% of the total cost of a data warehouse project may be taken up by ETL software and processes.”
    $36 Billion in spending on database management systems in 2015: Gartner. Forecast: Enterprise Software Markets, Worldwide, 2011-2018, 4Q14. 2014. <https://www.gartner.com/doc/2944023/forecast-enterprise-software-markets-worldwide>