New Trends in Data Management in the Information Industries

The document discusses new trends and challenges in data management within the information industries, emphasizing the shift towards schema-agnostic, flexible data models that can handle various data types at scale. It highlights the importance of improving data insight accessibility, reducing operational risks, and optimizing decision-making through innovative applications. Additionally, it outlines the capabilities of MarkLogic's enterprise NoSQL database platform, which integrates search and analytics to enhance data usability and management.

© COPYRIGHT 2015 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
New Trends in Data Management in the Information Industries
Presented by: Matt Turner, CTO, Media and Publishing
February 2015

Agenda
• Introduction
• Information Industries Trends
• Top 5 Challenges in the Industry
• New Approaches and Solutions

Data Drives the Need for a New Generation Database
Hierarchical Era
“For your application data!”
• Application- and hardware-specific
Relational Era
“For all your structured data!”
• Normalized, tabular model
• Application-independent query
• User control
Any Structure Era
“For all your data!”
• Schema-agnostic
• Massive scale
• Query and search
• Analytics
• Heterogeneous data
• Faster time-to-results

Harnessing Data & Reimagining Applications
• Reduce Risk
• Manage Compliance
• Create New Value from Data
• Optimize Operations
• Lower TCO / Better IT Economics
• Better Decision-making

MarkLogic: Best Operational Data Warehouse (Aug 2014)

Enterprise NoSQL Database Platform
• Flexible Data Model: Store and manage JSON, XML, RDF, and geospatial data with a document-centric, schema-agnostic database
• Scalability and Elasticity: Scale to petabytes of data without over-provisioning or over-spending
• ACID Transactions: Avoid data loss, data corruption, and stale reads, even at speed and scale
• Search and Query: Lightning-fast, sophisticated, sub-second search and query across all of your data
• Semantics: Store linked data as RDF and query it with SPARQL
• Certified Security: Government-grade, granular, role-based security
• Hadoop Integration: Make your Hadoop better by connecting it to MarkLogic

DECADE+ OF INNOVATION
Working Together To Reimagine Applications

FROM PUBLISHERS TO INFORMATION PROVIDERS

TRADITIONAL PUBLISHING
• FORM-BASED PRODUCTS
• DEDICATED PRODUCT INFRASTRUCTURE
Product A, Product B, Product C, each on dedicated infrastructure (database + search engine)
Content: Company Data, Industry Data, Filings, Reports

INFORMATION DELIVERY PLATFORM
• FORMAT-INDEPENDENT
• INFORMATION-CENTRIC
• DYNAMIC DELIVERY
Content: Company Data, Industry Data, Filings, Reports
Deliver the right content, to the right user, in the right format, in real time

Top 5 Requirements for Information Providers
1. Getting data IN fast isn’t the problem – it’s getting insights OUT faster!
2. Data is complex – but users want complexity hidden!
3. Not everyone has permission to access all the data…
4. Repurpose, repurpose, repurpose. Repeat.
5. Once you attract them – you must be reliable.

Traditional Technology
• Forcing content into rows and columns strips out information
Title | Publication Date | Category | Abstract | Section | Section 2?
Science Article 1 | 3/1/14 | Biology | Abstract text . . . | Section text | Section text
Research Book | 6/4/13 | Surgery | Abstract text . . . | Section text | Section text
Science Article 2 | 6/4/05 | Chemistry | Abstract text . . . | Section text | Section text

Traditional Technology
• Forcing content into rows and columns strips out information
• Hierarchical taxonomies overlap and don’t capture the complexity
Title | Publication Date | Category | Abstract | Section | Section 2?
Science Article 1 | 3/1/14 | Biology | Abstract text . . . | Section text | Section text
Research Book | 6/4/13 | Surgery | Abstract text . . . | Section text | Section text
Science Article 2 | 6/4/05 | Chemistry | Abstract text . . . | Section text | Section text
[Diagram: two overlapping taxonomies]
Taxonomy A: Research, Medicine, Science, Surgery, Orthopedics, Cell Biology, Biochemistry, …
Taxonomy B: Life Sciences, Biomedical Sciences, Cell Biology, Biology, Biochemistry, Chemistry, Microbiology, Biochemistry, …

The Functional Solution Silos & Treadmill
1. Design infrastructure, services & applications
2. Analyze data formats: Articles, Books, Industry Data, Reports
3. Define queries & service APIs
4. Define schemas, indexes and services
5. Build databases, middleware and services infrastructure
6. Define & implement ETL processes
7. Load and normalize data
8. Develop, integrate and test infrastructure & applications

Data Drives the Need for a New Generation Database
Hierarchical Era
“For your application data!”
• Application- and hardware-specific
Relational Era
“For all your structured data!”
• Normalized, tabular model
• Application-independent query
• User control
Any Structure Era
“For all your data!”
• Schema-agnostic
• Massive scale
• Query and search
• Analytics
• Heterogeneous data
• Faster time-to-results

Flexible Data Model
Schema-agnostic, structure-aware
• No need to define a schema up front
• Matched to complex content and metadata modeling
• Data is managed in its most accessible, natural form
• XML, JSON, RDF, geospatial
Result: Product content and data from multiple sources, available to be tailored to any purpose and product
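To make “schema-agnostic, structure-aware” concrete, here is a minimal sketch that uses Python’s standard-library `xml.etree.ElementTree` rather than MarkLogic’s own APIs. The two sample documents and the helper functions are hypothetical. Each document is parsed as-is and its own element tree acts as its schema, so an article with two flat sections and a book with nested chapters can be queried with the same path expressions without defining a table first.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents with different shapes and no shared table definition.
ARTICLE = """<article>
  <title>Science Article 1</title>
  <category>Biology</category>
  <section>Section text</section>
  <section>Section text</section>
</article>"""

BOOK = """<book>
  <title>Research Book</title>
  <category>Surgery</category>
  <chapter><section>Intro</section><section>Methods</section></chapter>
  <chapter><section>Results</section></chapter>
</book>"""


def load(raw_docs):
    """Parse each document as-is; its element tree serves as its schema."""
    return [ET.fromstring(raw) for raw in raw_docs]


def sections_per_doc(docs):
    """Count <section> elements at any depth, whatever the document type."""
    return {doc.findtext("title"): len(doc.findall(".//section")) for doc in docs}


def titles_in_category(docs, category):
    """Query a structural path that both document types happen to share."""
    return [doc.findtext("title") for doc in docs if doc.findtext("category") == category]


docs = load([ARTICLE, BOOK])
print(sections_per_doc(docs))               # {'Science Article 1': 2, 'Research Book': 3}
print(titles_in_category(docs, "Surgery"))  # ['Research Book']
```

The `.//section` path matches sections at any depth, which is what lets the book’s nested chapters and the article’s flat sections answer the same question; with the fixed `Section` and `Section 2?` columns of the earlier table, the book’s third section would have nowhere to go.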

Search and Query
Search to find answers in documents, relationships, and metadata
• Automatic indexing of every data value, text, and data structure
• Specialized indexes for data values (analytics, facets, sorting), geospatial data, and triples
• All indexes updated within ACID transactions to ensure data integrity and real-time access
• Accessible via a fully programmable search API with full-text search, type-ahead suggestions, facets, snippeting, highlighted search terms, proximity boosting, relevance ranking, and language support
Languages: JavaScript, XQuery, SPARQL
Capabilities: Rich Query, In-database MapReduce, Full-text Search, Semantic Search, Geospatial Search
Result: Simplified architecture with a single component for search and database
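The “single component for search and database” idea rests on indexing every word and value as data is loaded. As a toy Python sketch of the general technique (an inverted index, not MarkLogic’s internal index design), the code below maps words to document ids for full-text search, counts field values over a result set for facets, and binary-searches a sorted term list for type-ahead suggestions. The sample records and all function names are hypothetical.

```python
import bisect
import re
from collections import Counter, defaultdict

# Hypothetical sample records; ids, titles, and categories are illustrative only.
DOCS = {
    1: {"title": "Cell biology of bone repair", "category": "Biology"},
    2: {"title": "Orthopedic surgery outcomes", "category": "Surgery"},
    3: {"title": "Biochemistry of bone cells", "category": "Chemistry"},
}


def tokenize(text):
    """Lowercase word tokens; real engines add stemming and per-language rules."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Inverted index: map every word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for token in tokenize(doc["title"]):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every query term (AND semantics)."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()


def facets(docs, doc_ids, field):
    """Count the values of one field across a result set, as a facet panel would."""
    return Counter(docs[doc_id][field] for doc_id in doc_ids)


def suggest(vocabulary, prefix, limit=5):
    """Type-ahead: binary-search a sorted term list for words starting with prefix."""
    matches = []
    for term in vocabulary[bisect.bisect_left(vocabulary, prefix):]:
        if not term.startswith(prefix) or len(matches) == limit:
            break
        matches.append(term)
    return matches


index = build_index(DOCS)
hits = search(index, "bone")
print(sorted(hits))                    # [1, 3]
print(facets(DOCS, hits, "category"))  # one hit each for Biology and Chemistry
print(suggest(sorted(index), "bio"))   # ['biochemistry', 'biology']
```

Because facets and suggestions are computed from the same index that answers the search, there is no second system to keep in sync, which is the architectural point the slide makes.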

Semantics
Enterprise triple store, document store, and database combined
• Store and query billions of facts and relationships
• Leverage ontologies for domain- and role-specific context when accessing data and documents
• Efficient metadata management with relationships to ontologies
• Standards-based for ease of use and integration: RDF, SPARQL, and standard REST interfaces
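Triples store facts as (subject, predicate, object), and a SPARQL query is essentially a set of patterns containing variables. The Python sketch below illustrates that pattern-matching idea in memory; it is not MarkLogic’s triple index or SPARQL engine, and the facts (`DrSmith`, `UniversityX`, and so on) and helper names are hypothetical. `None` plays the role of a SPARQL variable, and the second function joins two patterns on a shared variable.

```python
# Hypothetical facts linking content to people and institutions.
TRIPLES = {
    ("article1", "writtenBy", "DrSmith"),
    ("article2", "writtenBy", "DrSmith"),
    ("article1", "about", "CellBiology"),
    ("DrSmith", "affiliatedWith", "UniversityX"),
}


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts like an unbound SPARQL variable."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    )


def docs_about_from(triples, topic, institution):
    """Join two patterns on a shared variable (the author), like a two-line WHERE clause."""
    authors = {s for s, _, _ in match(triples, p="affiliatedWith", o=institution)}
    return sorted(
        doc for doc, _, author in match(triples, p="writtenBy")
        if author in authors and (doc, "about", topic) in triples
    )


# Rough analogue of: SELECT ?doc WHERE { ?doc :writtenBy :DrSmith }
docs_by_smith = [s for s, _, _ in match(TRIPLES, p="writtenBy", o="DrSmith")]
print(docs_by_smith)                                            # ['article1', 'article2']
print(docs_about_from(TRIPLES, "CellBiology", "UniversityX"))  # ['article1']
```

A production store indexes triples by subject, predicate, and object so that each pattern is a lookup rather than a scan; the linear scan here only keeps the sketch short.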

Semantics
Documents, data, and triples provide a complete picture of content
Result: Context to tailor information to your user’s role, activity, and location

Scalability, Elasticity and Cloud
Massive enterprise scalability and elasticity
• Scale horizontally in clusters on commodity hardware to hundreds of nodes, petabytes of data, and billions of documents
• Process thousands of multi-document, multi-statement transactions per second
• Start small and scale up or down to meet capacity and performance demands without over-provisioning or over-spending
• Fully cloud-enabled for automated deployment and management on Amazon EC2
• Leverage dynamic configurations with Tiered Storage
[Diagram: a cluster of D-nodes (data nodes) and E-nodes (evaluator nodes)]
Result: Enterprise-ready to power mission-critical products
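The D-NODE and E-NODE labels on this slide refer to MarkLogic’s split between data nodes, which store the documents, and evaluator nodes, which accept requests and evaluate queries. The toy Python sketch below shows the general scatter-gather pattern behind that split. The class names, the hash-based document placement, and the sequential fan-out are all illustrative assumptions, not MarkLogic’s actual placement, rebalancing, or query protocol.

```python
import zlib


class DataNode:
    """Hypothetical D-node: stores one partition of the documents and searches it locally."""

    def __init__(self, name):
        self.name = name
        self.docs = {}

    def local_search(self, term):
        return [doc_id for doc_id, text in self.docs.items() if term in text.lower().split()]


class EvaluatorNode:
    """Hypothetical E-node: stores no documents; routes writes and merges query results."""

    def __init__(self, data_nodes):
        self.data_nodes = data_nodes

    def insert(self, doc_id, text):
        # Placement by a stable hash is an assumption for this sketch; crc32 is used
        # instead of hash() because Python randomizes str hashes per process.
        node = self.data_nodes[zlib.crc32(doc_id.encode()) % len(self.data_nodes)]
        node.docs[doc_id] = text

    def search(self, term):
        # Scatter the query to every D-node, then gather and merge the partial results.
        # A real cluster would send these requests in parallel over the network.
        results = []
        for node in self.data_nodes:
            results.extend(node.local_search(term))
        return sorted(results)


cluster = EvaluatorNode([DataNode("d1"), DataNode("d2"), DataNode("d3")])
for i in range(9):
    cluster.insert(f"doc{i}", "filing report" if i % 2 else "industry data")
print(cluster.search("report"))  # ['doc1', 'doc3', 'doc5', 'doc7']
print({n.name: len(n.docs) for n in cluster.data_nodes})  # documents held per D-node
```

Because each D-node searches only its own partition, adding D-nodes spreads both storage and query work; a real system must also move existing documents onto new nodes, which this sketch omits.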

With MarkLogic…
1. Design infrastructure, services & applications
3. Define queries & service APIs
8. Develop, integrate and test infrastructure & applications
When something changes… it’s no big deal.

INFORMATION DELIVERY PLATFORM EXTENDED
• Content and Customers
• Complete Picture of Business
• Metrics Driving Product Development and Sales
Content: Company Data, Industry Data, Filings, Reports, Catalogs, Lists, Authors, Institutions, Social Media + Usage

Use Case: Master Data
• Foundational data for digital products
• Industry topology and trends to drive innovation
• User and content metrics to drive product development

Use Case: Enhance Digital Products
• Present information based on relationships
• Go beyond traditional technology with depth of content
• Drive efficiency using a semantic approach to tagging

Use Case: Go Beyond Search
• Concept search instead of keyword search
• Related content and information drive content discovery and new interactions
– SNL40 continuous viewing
• Dynamically tailored to the user’s specific attributes or activity
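Concept search, as opposed to keyword search, can be sketched as expanding a query concept to all of its narrower concepts before matching document tags. In the Python sketch below, the concept names come from the taxonomy slide earlier in this deck, but the broader/narrower links, the document tags, and the helper names are hypothetical assumptions. Because a concept may have several parents (Cell Biology sits under both Biology and Biomedical Sciences here), overlapping taxonomies stop being a problem: the hierarchy is a graph rather than a single tree.

```python
from collections import defaultdict

# Hypothetical (narrower, broader) links; a concept may have several parents,
# which a single Category column or a strict tree cannot represent.
BROADER = [
    ("Cell Biology", "Biology"),
    ("Cell Biology", "Biomedical Sciences"),
    ("Biochemistry", "Biology"),
    ("Biochemistry", "Chemistry"),
    ("Biology", "Life Sciences"),
    ("Biomedical Sciences", "Life Sciences"),
]

# Hypothetical documents tagged with concepts rather than free-text keywords.
TAGS = {"Science Article 1": {"Cell Biology"}, "Science Article 2": {"Biochemistry"}}


def narrower_closure(concept):
    """Return the concept plus every concept beneath it, at any depth."""
    children = defaultdict(set)
    for narrow, broad in BROADER:
        children[broad].add(narrow)
    seen, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children[current])
    return seen


def concept_search(concept):
    """Match documents tagged with the concept or any narrower concept."""
    wanted = narrower_closure(concept)
    return sorted(title for title, tags in TAGS.items() if tags & wanted)


print(concept_search("Life Sciences"))  # ['Science Article 1', 'Science Article 2']
print(concept_search("Chemistry"))      # ['Science Article 2']
```

Neither article contains the words “Life Sciences”, yet both are found, which is the difference between matching concepts and matching keywords.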

Use Case: ‘Everything Else’
• Tailor views and access to information with multiple ontologies
• Example: follow a scientist from research, to the workbench, to conferences, to publishing
• Content delivery tailored to the user’s role, activity, and location

Top 5 Requirements for Information Providers
1. Getting data IN fast isn’t the problem – it’s getting insights OUT faster!
2. Data is complex – but users want complexity hidden!
3. Not everyone has permission to access all the data…
4. Repurpose, repurpose, repurpose. Repeat.
5. Once you attract them – you must be reliable.

PUBLISHING: CHANGE IS THE ONLY CONSTANT

Any Questions?

Editor's Notes

#7: These are the key features to focus on when introducing MarkLogic, and each of them is covered in this deck. The previous slide showed ALL of the features that MarkLogic includes, but here we focus on the top 7 key features to help explain what MarkLogic is and what makes the technology so unique and powerful. There is no other database in the world that has this list of features. To start, if you only know 2 things about MarkLogic, they should be the flexible data model and search and query. These two features are core to how MarkLogic works, and they underpin many of the other features, such as MarkLogic’s ability to scale while still maintaining complex and consistent transactions. In MarkLogic 7 we introduced semantics. MarkLogic is a native document store and also a native triple store. Triples are stored as RDF and queried with SPARQL, both W3C standards for linked data. With semantics, you can store and query billions of facts and relationships, and even infer new facts. These facts and relationships provide context for better search and flexible data modeling to integrate and link data from different sources. Scalability and elasticity, ACID transactions, and security are three of MarkLogic’s key “enterprise” features, ensuring you can store and manage all of your data without breaking the bank, losing any data, or allowing data to get into the wrong hands. These features should not be taken for granted, because they are really hard to do right. MarkLogic has spent a decade building a hardened, trusted platform, and these features are some of the reasons why MarkLogic is the leading enterprise NoSQL database. Lastly, MarkLogic integrates easily with Hadoop and makes Hadoop better. Hadoop has gained traction lately, but most people now realize that it’s not a database. It’s a great place to put your data, and MarkLogic has many unique ways of doing more with your data if you currently keep it in Hadoop.
#8: [Celebrate the success of what our customers have been able to achieve over the last decade] MarkLogic recently celebrated 10 years on the market. It’s been 10 years working side by side with publishers and media to reimagine what publishing is. Over those 10 years, your businesses have changed dramatically, and not surprisingly: with the web, Kindles, and iPads, it was digitize or die. Because you were forced to reimagine your business and the technology that drives it, publishing and media have led the way in doing more with data. I’m often surprised at how many other industries are just waking up to the notion that there are other ways to store and use data than the traditional relational way they’ve been doing it for 30 years. Rather than treating your content as flat files or cramming it into database cells, you’ve been using the right tool for the job, which has allowed you to do more with your content, repurpose it, be agile, move quickly, and create and deliver products fast. It’s amazing to see how many organizations stick to using a hammer to get a screw out of a board. With the right tools, you can do more. You can create more. You can repurpose more. Development cycles take months, not years, with handfuls of developers, not armies. Re-emphasize the benefit of using the right tool.
#9: Or Top 10 Search Requirements we hear from today’s most successful information providers… This is a subject we love – change is the only constant and this is what we ene
#10: “New” badge: indicates features that are new in MarkLogic 8. MarkLogic 8 also includes enhancements to other features such as the REST API, Java Client API, and incremental/customizable backup. These features are currently available in the Early Access program and are not discussed in detail in this deck; they are highlighted here only for awareness. All of the other features on this slide are fully available in MarkLogic 7.
Powerful: Deliver more value, build better apps **these are all of MarkLogic’s unique features** MarkLogic is designed for today’s data, helping you find answers in documents, relationships, and metadata by storing and managing JSON, XML, RDF, geospatial data, and more. MarkLogic serves as an intelligent data layer, giving you the freedom to do more.
Agile: Prepare for and respond to change **these are all of the features that focus on ease of use and flexibility** Enjoy the flexibility of NoSQL to integrate data, and deploy in any environment, whether using Amazon Web Services, virtual machines, or on-premises hardware. With the agility and adaptability of MarkLogic, you can build applications fast.
Trusted: Enterprise-ready for mission-critical uses **these are all the features that ensure MarkLogic meets enterprise requirements** MarkLogic is a hardened platform that is trusted to run mission-critical applications. It has higher security certifications than any other NoSQL database, and uncompromised data resiliency with features that ensure you will never lose data.
#11: I think about this in terms of the move to becoming an information provider: putting the value of information ahead of the form of delivery. And I’m not alone.
#12: Same note as #10.
#13: Same note as #10.
#14: *Note: The example above shows an XML document, but in MarkLogic 8 JSON is stored natively, so we could replace it with a similar-looking JSON document. With MarkLogic, you can load all of your data as-is and define a schema only when you need it. You can even change your schema without having to redefine your entire data model. MarkLogic is also structure-aware, so you can query the structure of documents themselves. In MarkLogic, data is stored as self-contained documents, not in rows and columns, which means no foreign keys and no normalization: the data doesn’t have to be shredded across tables. Also, data often already comes in a document format, such as XML, SGML, FpML, HTML, or JSON. When handling a document, MarkLogic starts by parsing and indexing its contents, converting it from a serialized document format to a compressed binary fragment representation. Thanks to highly efficient compression, the data is much smaller than it would be as a typical file. The example above shows how MarkLogic ‘sees’ an XML document as a hierarchical tree structure. Shown like this, you can see that the document model is self-describing. This example shows a “Suspicious Activities Report”, but you could easily imagine it as a trade document, medical record, book chapter, email, or metadata file: hundreds of different things that model well in a document structure. The example also shows something else that’s unique about MarkLogic: various types of data, including values, geospatial data, unstructured full text, and semantic triples, are all indexed and can be queried. More Information on Schemas: A database schema is a blueprint, or set of constraints, that defines how data is structured and organized in the database. In the relational world, the schema is defined before data is ingested, and it has relations, tuples, and attributes represented as tables, rows, and columns.
In the non-relational world, the relational mathematics at work in SQL do not apply, and the schema is less rigid and does not have to be predefined. Well-formed XML, for example, can be parsed at ingestion, and the database will use the inherent XML structure as the schema.
#17: There is a change control cost in between each one of these steps – not just doing the same job multiple times but also incurring change costs! Bad guy = all the tools you’re using to do this – RDBMS, ETL, etc. Change control processes are what’s stopping you from being productive! Lots of paperwork involved…
#20: Short Description: MarkLogic has built-in search and query capabilities. MarkLogic’s sophisticated indexes provide the power to search and query across hundreds of terabytes worth of documents, relationships, and metadata with the flexibility of multiple query languages. *Note: Server-side JavaScript is a MarkLogic 8 feature. Longer Description: Most databases separate search and query into two distinct functions. MarkLogic changes that, starting with the idea that you should be able to ask your database what’s inside of it. This means not having to bolt-on a separate search solution, and not having to worry about when and how to build the right indexes, or how those indexes can be utilized to perform certain queries. MarkLogic is designed with over 30 sophisticated indexes that can be adjusted and tuned to make even the most complex queries as fast as possible without requiring data duplication, and data is ingested as-is and immediately searchable. The sophisticated indexes mean that developers can ask harder questions and get faster responses. MarkLogic uses multiple query languages for each data types (JavaScript for JSON, XQuery for XML, and SPARQL for RDF). These query languages enable full-text search across unstructured content, rich query capability needed to make complex queries fast, Geospatial search for multiple formats and types (including connections to ESRI ArcGIS and Google Maps), Semantic search across linked data (similar to graph search, and MarkLogic 8 even includes inferencing), and also in-database MapReduce for running massive parallelized queries. One of the unique capabilities of MarkLogic is that the indexes are designed so that developers can write complex queries that run across multiple indexes without causing a performance bottleneck. With MarkLogic, you can query data as-is, or transform and manage data in-place—all with the reliability of a transactional system that maintains full ACID properties. 
But it’s important not to overlook the enterprise search experience. Many of MarkLogic’s first customers, such as Elsevier, were publishers who just needed a way to quickly search across massive amounts of content. The user experience is not too different from that of any major Web search engine. In fact, MarkLogic’s founder Christopher Lindblad came from the search world, having been the architect of Ultraseek Server, an early enterprise search application developed at Infoseek. MarkLogic has many of the features that users now expect in an enterprise search application, such as type-ahead suggestions, relevance ranking, and snippeting. MarkLogic also includes language support for over 200 languages, with advanced support (tokenization, stemming, and collation) for some of the most common languages. And, to reiterate, all of this comes built into MarkLogic; you don’t have to bolt on any other solution. This simplifies your architecture and makes things much easier for DBAs and developers. Integrated search means one less platform to worry about. Developers don’t have to use a “lite” version of other search software during testing, and additional, unnecessary ETL procedures are eliminated, which reduces risk. System-wide settings such as security are set up once and applied everywhere. If permissions on documents are updated, those updates are reflected automatically and immediately in searches.
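Type-ahead suggestions can be pictured as a prefix lookup over a sorted list of indexed terms. The sketch below is a hypothetical helper, not a MarkLogic API, and assumes a lowercase, de-duplicated, sorted term list; a production search engine would also rank suggestions (for example by how often each term occurs), which this sketch does not attempt. `bisect_left` finds the first term that could match in O(log n), and the loop collects the following terms while they still share the prefix.

```python
import bisect


def suggest(sorted_terms, prefix, limit=5):
    """Return up to `limit` terms from `sorted_terms` that start with `prefix`.

    Assumes `sorted_terms` is a sorted list of lowercase, unique terms.
    """
    prefix = prefix.lower()
    # First position where a term >= prefix could appear.
    i = bisect.bisect_left(sorted_terms, prefix)
    out = []
    # Terms sharing a prefix are contiguous in sorted order, so stop at the first miss.
    while (i < len(sorted_terms) and len(out) < limit
           and sorted_terms[i].startswith(prefix)):
        out.append(sorted_terms[i])
        i += 1
    return out


lexicon = sorted({"market", "marketing", "marklogic", "markup", "mars",
                  "search", "semantics"})
print(suggest(lexicon, "Mark"))  # ['market', 'marketing', 'marklogic', 'markup']
```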
#21: Short Description: Store RDF triples and query them using SPARQL, providing meaning and context to your data using the only database that can handle a combination of documents, data, and triples. *Note: MarkLogic 8 extends its use of standard SPARQL so you can do analytics (aggregates) over triples, explore semantic graphs using property paths, and update semantic triples, all using the standard SPARQL 1.1 language over standard protocols. In addition, MarkLogic 8 lets you discover new facts and relationships with automatic inference.
Long Description: Semantics provides a universal framework to describe and link different data so that it can be better understood and searched holistically, allowing both people and computers to see and discover relationships in the data. MarkLogic provides the capability to store and query linked data, including a native RDF triple store for storing and managing hundreds of billions of triples that can be queried with SPARQL, all right inside MarkLogic. Not only that, but MarkLogic combines the triple store with its document store, providing the capability to store and manage documents, data, and triples together so you can discover, understand, and make decisions in context.
Script for Presenting:
Enterprise triple store, document store, database… combined
MarkLogic Semantics adds the capabilities of an enterprise triple store to its document store and database.
Store and query billions of facts and relationships; infer new facts
The triple store lets you store and query billions of facts (assertions) and relationships. Facts and relationships are represented as triples, made up of a subject, a predicate, and an object. For example, we can represent the facts "John lives in London" and "London is in England" as triples like this:
Subject   Predicate   Object
John      livesIn     London
London    isIn        England
We can also infer new facts. From what we (as humans) know about "livesIn" and "isIn", we can infer that John lives in England.
The triple store can do that too: you can specify rules that say exactly what a predicate means, and the triple store will infer new facts when querying. Many of these rules are specified in the RDFS and OWL specifications and can be applied in MarkLogic queries out of the box.
Facts and relationships provide context for better search
Imagine how much better a search application can be if the app has access to billions of facts and relationships. The app can leverage those facts in several ways (see future slide):
- Find more relevant information by expanding the terms the user typed in
- Present more or better information about whatever the user is searching for
- Publish information dynamically to web, print, or mobile
Flexible data modeling: integrate and link data from different sources
Triples are atomic and schemaless, so they are easy to share and easy to combine. When you model data as triples, it’s easy to load the data as-is and query across all your data. You can also link data from different sources by creating new triples. For example, if you have information about the same customer from two sources, and one source calls the customer "cust123" while the other calls the same customer "cus_id_456", simply add the triple cust123 sameAs cus_id_456 and you can query across all the information about that customer in a single, simple query. As well as creating and extracting your own triples, you can use the billions of triples available on the Open Linked Data web.
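The John/London inference and the sameAs link described above can be illustrated with a toy in-memory set of triples. This is a sketch under stated assumptions, not MarkLogic’s inference engine: the two rules are hard-coded (a hypothetical livesIn/isIn rule and a much-simplified sameAs rule that only copies subjects), terms are plain strings rather than IRIs, and inferred triples are materialized up front by forward chaining, whereas the note describes MarkLogic inferring new facts at query time.

```python
def infer(triples):
    """Forward-chain two hypothetical rules until no new triples appear.

    Rule 1: (x livesIn y) and (y isIn z)  =>  (x livesIn z)
    Rule 2 (simplified sameAs): (a sameAs b) => (b sameAs a), and every
            non-sameAs triple with subject a is also asserted with subject b.
    """
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p == "livesIn":
                new |= {(s, "livesIn", o2) for s2, p2, o2 in facts
                        if s2 == o and p2 == "isIn"}
            elif p == "sameAs":
                new.add((o, "sameAs", s))
                new |= {(o, p2, o2) for s2, p2, o2 in facts
                        if s2 == s and p2 != "sameAs"}
        if new <= facts:  # fixed point reached: nothing new was derived
            return facts
        facts |= new


def match(facts, s=None, p=None, o=None):
    """Triples matching a pattern; None acts as a wildcard."""
    return {t for t in facts
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}


triples = {
    ("John", "livesIn", "London"),
    ("London", "isIn", "England"),
    ("cust123", "name", "Acme Ltd"),
    ("cus_id_456", "creditLimit", "50000"),
    ("cust123", "sameAs", "cus_id_456"),
}
facts = infer(triples)
print(("John", "livesIn", "England") in facts)  # True
print(sorted(match(facts, s="cust123")))
```

After inference, a pattern match on subject `cust123` also returns the credit limit that was only recorded under `cus_id_456`, which is the “single, simple query” benefit of linking the two identifiers.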
Examples of triples you can download:
- From DBpedia (the triples version of Wikipedia): Einstein was born in Germany; Buzz Aldrin was on the crew of Apollo 11; a Labrador is a type of dog.
- From Geonames: London is in England; London has a population of 7,504,800; London is at lat/long position 51.5/-0.16667.
- From data.gov, facts about food from the Dept. of Agriculture (http://data-gov.tw.rpi.edu/wiki/Dataset_1294): pineapple juice has 140 calories per serving.
See http://www.w3.org/wiki/DataSetRDFDumps for a partial listing of RDF data available for download and ingestion into MarkLogic. See http://data-gov.tw.rpi.edu/wiki/Data.gov_Catalog_-_Complete for a listing of Open Government RDF datasets.
Standards-based for ease of use and integration
MarkLogic Semantics is based on W3C standards. RDF describes the data model for facts and relationships (http://www.w3.org/RDF/). MarkLogic can load RDF files in all the popular RDF formats: RDF/XML, Turtle, RDF/JSON, N3, N-Triples, N-Quads, and TriG (http://docs.marklogic.com/guide/semantics/loading#id_70682). SPARQL is the W3C standard language for querying RDF. MarkLogic supports SPARQL 1.1, which includes property paths, aggregates, and inserts/deletes (http://www.w3.org/TR/sparql11-query/ and http://www.w3.org/TR/sparql11-update/). MarkLogic also supports standard interfaces: http://www.w3.org/TR/sparql11-protocol/ defines a SPARQL endpoint, which is a standard REST endpoint for SPARQL queries, and http://www.w3.org/TR/sparql11-http-rdf-update/ defines the Graph Store HTTP Protocol, a standard REST endpoint for managing RDF graphs.
Even better with search, bitemporal
The real power of MarkLogic comes not from a single feature but from the ability to combine features in a single, powerful query. Semantics isn’t a product; it’s a feature of a product.
MarkLogic Semantics works particularly well with search (including geospatial search) and bitemporal. In MarkLogic, you can embed triples in XML or JSON documents and run combination queries. You can combine SPARQL and cts:query in two ways: run a SPARQL query that is filtered by a cts:query condition, or embed a cts:triple-range-query (which returns a cts:query) in a cts:search.
For example, you might want to ask "show me all the people who met with John". If you have triples of the form "john metWith X", that’s a simple SPARQL query. But if those triples are embedded in the documents where that fact was asserted or discovered (say, a police report or an e-mail exchange), you can ask much richer questions, such as "show me all the people who met with John, where the fact was discovered in the last 6 months, the source is a police report from a county in the eastern US, and that report also mentions some kind of weapon and some kind of controlled substance".
Or you might want to ask "how many emails and tweets in my sample are generally positive?" If you have triples of the form "message1002 hasSentiment +9", that’s a simple SPARQL query. But if those triples are embedded in the messages, you can ask much richer questions, such as "show me snippets of all the messages that were overwhelmingly positive, were sent by an executive of a Fortune 500 company between these dates, mention the companies ‘IBM’ and ‘Oracle’, and mention a word that has something to do with takeovers or acquisitions".
Bitemporal (MarkLogic 8 feature): Bitemporal data management handles historical data along two different timelines, making it possible to rewind the information “as it actually was” in combination with “as it was recorded” at some point in time. It facilitates the creation of a complete audit trail of data. Since you can compose SPARQL and cts:query, you can do a bitemporal SPARQL query! Simply run the SPARQL query with a cts:query constraint over one or both bitemporal axes.
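The “people who met with John” example can be sketched as a pattern match over triples, restricted by conditions on the documents that contain them. The documents, the field names (`kind`, `found`, `text`, `triples`), and the 183-day approximation of “the last 6 months” are all hypothetical; in MarkLogic itself the document condition would be a cts:query combined with SPARQL, as the note describes, not a Python predicate.

```python
from datetime import date, timedelta

# Hypothetical documents: metadata plus the triples asserted in each one.
DOCS = [
    {"id": "r1", "kind": "police report", "found": date(2015, 1, 10),
     "text": "Officer noted a weapon at the scene.",
     "triples": [("John", "metWith", "Alice")]},
    {"id": "e7", "kind": "email", "found": date(2015, 1, 20),
     "text": "Lunch with John went fine.",
     "triples": [("John", "metWith", "Bob")]},
    {"id": "r2", "kind": "police report", "found": date(2013, 6, 1),
     "text": "Weapon recovered from vehicle.",
     "triples": [("John", "metWith", "Carol")]},
]


def met_with(docs, person, doc_filter=lambda d: True):
    """People X such that (person metWith X) is asserted in a document passing doc_filter."""
    return {o for d in docs if doc_filter(d)
            for s, p, o in d["triples"]
            if s == person and p == "metWith"}


def recent_police_report_mentioning(word, as_of, window=timedelta(days=183)):
    """Build a document filter; 183 days stands in for "the last 6 months"."""
    def accept(d):
        return (d["kind"] == "police report"
                and as_of - window <= d["found"] <= as_of
                and word in d["text"].lower())
    return accept


print(sorted(met_with(DOCS, "John")))  # ['Alice', 'Bob', 'Carol']
print(met_with(DOCS, "John",
               recent_police_report_mentioning("weapon", date(2015, 2, 1))))  # {'Alice'}
```

Without the document condition, all three metWith facts come back; with it, only the one asserted in a recent police report that mentions a weapon survives. Because the condition is just a function, a time-window constraint could be swapped in the same way, which is loosely analogous to constraining a SPARQL query on a bitemporal axis.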
#23: Short Description: MarkLogic scales horizontally in clusters on commodity hardware to hundreds of nodes, petabytes of data, and billions of documents, and still processes thousands of transactions per second.
Longer Description: Elasticity and scalability are critical to address growing volumes of data. By 2020, the digital universe will grow to 40,000 exabytes, or 40 trillion gigabytes (more than 5,200 gigabytes for every man, woman, and child). The need already exists to process petabytes’ worth of data quickly and with low overhead. MarkLogic allows you to start small or go big: from 3-node clusters to 250+ node clusters, or from 10,000 documents to 1 billion, MarkLogic scales horizontally as your data grows or shrinks. You can add or remove nodes easily, keeping the database in line with performance needs without over-provisioning. And MarkLogic doesn’t need “big iron.” Run it on cost-effective commodity hardware in any environment: in the cloud, virtualized, or on-premises. MarkLogic also handles thousands of transactions per second, even at scale, while maintaining full ACID properties. This unique capability positioned MarkLogic as the best choice to run healthcare.gov and a large operational trade store at a top investment bank. Performance usually suffers at scale with most databases, but MarkLogic scales easily to handle hundreds of terabytes using a shared-nothing architecture. Data partitions are completely independent of each other and can act independently, so when you need more partitions, you just add more, and queries run just as efficiently as they did before. Changing cluster configurations is a pain with most databases, but MarkLogic provides easy administration for changing them. Another feature that helps you manage your data at scale is tiered storage.
MarkLogic tiered storage provides the ability to store and manage data in different tiers based on cost and performance trade-offs, whether it’s flash storage, traditional local or shared disk storage, HDFS, or Amazon cloud storage. With tiered storage, data is easily migrated between these tiers without any ETL, additional software, or expensive infrastructure changes. Organizations can easily balance performance and capacity through the information lifecycle, meeting performance SLAs and making data governance easy.
MarkLogic Large Deployment Example:
- 4 clusters
- 16 databases
- 200 D-Nodes
- 50 E-Nodes
- 800 forests
- 1.2B+ documents
- 22K QPS
- 45 racks
- 1 PB of storage
- 57 TB of RAM
- 15K cores of compute
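The shared-nothing idea (independent partitions, each answering for its own data, with results merged afterwards) can be sketched as a scatter-gather search. Everything here is a hypothetical simplification: documents are placed by a stable hash of their id modulo the partition count, which is an assumption rather than how MarkLogic assigns documents to forests, and adding partitions in this toy would require moving documents, which the sketch does not handle. Threads are used only to show that partitions can be queried without coordinating with one another.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


class Partition:
    """Owns a subset of documents and searches only those; shares no state."""

    def __init__(self):
        self.docs = {}

    def search(self, word):
        # Linear scan is fine for a toy; a real partition would use its own index.
        return [doc_id for doc_id, text in self.docs.items()
                if word in text.lower().split()]


def partition_for(doc_id, n):
    """Stable placement: the same id always maps to the same partition."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n


class Cluster:
    def __init__(self, n_partitions):
        self.partitions = [Partition() for _ in range(n_partitions)]

    def insert(self, doc_id, text):
        target = self.partitions[partition_for(doc_id, len(self.partitions))]
        target.docs[doc_id] = text

    def search(self, word):
        word = word.lower()
        # Scatter: every partition answers independently.
        with ThreadPoolExecutor() as pool:
            partials = list(pool.map(lambda p: p.search(word), self.partitions))
        # Gather: merge the partial results into one sorted answer.
        return sorted(doc_id for part in partials for doc_id in part)


if __name__ == "__main__":
    cluster = Cluster(4)
    for i in range(20):
        cluster.insert(f"doc{i}", "alpha report" if i % 2 == 0 else "beta report")
    print(cluster.search("alpha"))
    print([len(p.docs) for p in cluster.partitions])
```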
#24: With MarkLogic, keep going: no traffic lights!
- We’ve got a single platform with database, built-in search, and application services, so there’s less work up front.
- We don’t analyze data formats; just load ’em in! When it comes to schemas, it’s evolution, not revolution: you don’t have to stop, and if you pull a wire out, the thing doesn’t break ("sustainable evolution" is a way to describe semantics).
- You’ve only got one database and infrastructure, so there’s nothing to do there.
- There’s no complicated ETL or data normalization required.
- And our robust single-stack platform of database, search, and application services means there’s less to test.
LESS TO TEST / LESS CHANGE, FASTER TESTING, LESS COST, FASTER TO VALUE: “GO FASTER” STRIPES ON HERE
#25: “New” badge: indicates features that are new in MarkLogic 8. MarkLogic 8 also includes enhancements to other features such as the REST API, Java Client API, and incremental/customizable backup. These features are currently available in the Early Access program and are not discussed in detail in this deck; they are only highlighted here for awareness. All of the other features on this slide are fully available in MarkLogic 7.
Powerful - Deliver more value, build better apps **these are all of MarkLogic’s unique features**
MarkLogic is designed for today’s data, helping you find answers in documents, relationships, and metadata by storing and managing JSON, XML, RDF, geospatial data, and more. MarkLogic serves as an intelligent data layer, giving you the freedom to do more.
Agile - Prepare for and respond to change **these are all of the features that focus on ease of use and flexibility**
Enjoy the flexibility of NoSQL to integrate data, and deploy in any environment, whether using Amazon Web Services, virtual machines, or on-premises hardware. With the agility and adaptability of MarkLogic, you can build applications fast.
Trusted - Enterprise-ready for mission-critical uses **these are all the features that ensure MarkLogic meets enterprise requirements**
MarkLogic is a hardened platform that is trusted to run mission-critical applications. It has higher security certifications than any other NoSQL database, and uncompromised data resiliency with features that ensure you will never lose data.
#30: *Note: The example above shows an XML document, but in MarkLogic 8, JSON is stored natively, so we could replace this with a similar-looking JSON document.
With MarkLogic, you can load all of your data as-is and only define a schema when you need it. You can even change your schema without having to redefine your entire data model. MarkLogic is also structure-aware, so you can even query the structure of documents. In MarkLogic, data is stored as self-contained documents, not in rows and columns, which means no foreign keys and no normalization. The data doesn’t have to be shredded across tables. Also, data is often in a document format already, such as XML, SGML, FpML, HTML, or JSON. When handling a document, MarkLogic starts by parsing and indexing the document contents, converting the document from its serialized format to a compressed binary fragment representation. Due to highly efficient compression, the stored data is much smaller than the same content in a typical file.
The example above shows how MarkLogic ‘sees’ an XML document: as a hierarchical tree structure. Shown like this, you can see how the document model is self-describing. This example shows a “Suspicious Activities Report”, but you could easily imagine it as a trade document, medical record, book chapter, email, metadata file, or any of hundreds of other things that model well in a document structure. The example also shows something else that’s unique about MarkLogic: it contains various types of data, including values, geospatial data, unstructured full text, and semantic triples. All of this is indexed and can be queried.
More Information on Schemas
A database schema is a blueprint, or set of constraints, that defines how data is structured and organized in the database. In the relational world, the schema is defined before ingesting data, and it has relations, tuples, and attributes represented as tables, rows, and columns. In the non-relational world, the relational mathematics underlying SQL does not apply, and the schema is less rigid and does not have to be predefined. Well-formed XML, for example, can be parsed at ingestion, and the database will use the inherent XML structure as the schema.
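The claim that well-formed structure can serve as the schema can be sketched by flattening a document into path/value pairs. The sample report below is a hypothetical stand-in for the slide’s example, and `xml_paths`/`json_paths` are illustrative helpers, not MarkLogic APIs. Rendering the same content as XML and as JSON produces identical paths, so a path-based query need not care which format was loaded. The sketch ignores attributes, JSON arrays, and repeated elements (a later duplicate path would overwrite an earlier one in the dict).

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample: the same report rendered as XML and as JSON.
REPORT_XML = """<report>
  <type>Suspicious Activities Report</type>
  <subject><name>J. Doe</name><city>London</city></subject>
  <narrative>Large cash deposits over several days.</narrative>
</report>"""

REPORT_JSON = """{"report": {"type": "Suspicious Activities Report",
  "subject": {"name": "J. Doe", "city": "London"},
  "narrative": "Large cash deposits over several days."}}"""


def xml_paths(elem, prefix=""):
    """Yield (path, text) for every element that carries non-blank text."""
    path = f"{prefix}/{elem.tag}"
    if elem.text and elem.text.strip():
        yield path, elem.text.strip()
    for child in elem:
        yield from xml_paths(child, path)


def json_paths(value, prefix=""):
    """Yield (path, value) for every scalar reachable through nested objects."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from json_paths(child, f"{prefix}/{key}")
    else:
        yield prefix, value


# No schema was declared: the paths come from the documents' own structure.
xml_view = dict(xml_paths(ET.fromstring(REPORT_XML)))
json_view = dict(json_paths(json.loads(REPORT_JSON)))
print(xml_view == json_view)             # True
print(xml_view["/report/subject/city"])  # London
```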
#31: (Same notes as #30.)

Editor's Notes