© 2014
The Emerging Data Lake IT Strategy
An Evolving Approach for Dealing with Big Data & Changing Environments
SPEAKERS:
Thomas Kelly, Practice Director
Cognizant Technology Solutions
Sean Martin, Founder and CTO
Cambridge Semantics
bit.ly/DataLake
We’re living in an amazing world of information sharing,
connecting with family, neighbors, vendors, and customers
all over the world
Telling the world
about what we like
and don’t like
#HIMYMfinale
@MLB
… is now following Cognizant Technology Solutions
and Cambridge Semantics
What we’re doing and how we’re succeeding
We’re deciding what advertising we want to see…
… and what we don’t
Unsubscribe
Influencing
how business
and customers
engage
Many businesses have emerged that embrace this model of
customer engagement
and we’ve said goodbye to businesses that didn’t
• 10 million stays in 2013, without owning a hotel
• Grew to nearly $75B in annual retail revenue in 2013, without opening a storefront
• Shares over 40 million photos each day
Retail
Engaging in a more
personalized shopping
experience, retailers are
building a stronger
relationship with each
customer
Customer Service
Delivering a positive and
successful experience for
each customer
Life Sciences and Healthcare
Combining health, genetic,
clinical, and public sciences
data to bring effective
therapies to patients sooner
Financial Services
Delivering innovative
products and services,
based on a 360° view of
the Customer, across all
business lines, engaging
all available data assets,
internal and external
The Challenges That We're Addressing
Onboarding and Integrating Data is Slow and Expensive
• Transforming data from a growing variety of technologies
• Custom coded ETL
• Existing ETL processes are not reusable
• Optimization for analytics is time-consuming and costly
• Teams often wait until there is a defined need for a set of data, delaying benefits
realization until the data is onboarded
Data Provenance is Often Poorly Recorded
• Data meaning is “lost in translation”
• Data transformations tracked in spreadsheets
• Maintenance and analysis costs for onboarded data are high
• Recreating data lineage is manual, time-consuming, and error-prone
The Challenges That We're Addressing
Target Data is Difficult to Consume
• Optimization favors known analytics but is not well suited to new requirements
• A one-size-fits-all canonical view is used rather than fit-for-purpose views
• Or the target data lacks a conceptual model that would make it easy to consume
• Difficult to identify what data is available, how to get access, and how to
integrate the data to answer a question
Industrializing the Big Data Environment is Difficult to Manage
• Proliferation of data silos leads to inconsistency/syncing issues
• Conflicting objectives of opening access to data assets while managing
security and privacy requirements
• Velocity of business change rapidly invalidates data organization and analytics
optimizations
• Managing the integration/interaction with the multiple data management
technologies that make up the Big Data environment
The Data Lake is made up of four key components
• Data Ingestion
• Data Lake Management
• Data Management
• Query Management
Delivering
• Low Cost, High Performance Storage
• Flexible, Easy-to-Use Data Organization
• Performance-Optimized Analytics
• Automation of most manual Development and
Query Activities
• Self-Service End-User Features
• Intelligent Processing
Data Ingestion
[Architecture diagram: data sources feeding the Data Ingestion component of the Data Lake]
Data Sources
• Linked Data
• Internet of Things (IoT)
• Desktop and Mobile
• Operational Systems
• Social Media and Cloud
Data Ingestion
• On-Demand Query
• Streaming
• Semantic Tagging
• Scheduled Batch Load
• Model-Driven
• Self-Service
Data Management
[Architecture diagram: the Data Management component, shown with the same data sources and ingestion methods]
Data Management
• Provenance
• Data Movement
• Semantic Graph
• Columnar
• In Memory
• NoSQL
• Map Reduce
• HDFS Storage (Structured and Unstructured Data)
Data Lake Management
[Architecture diagram: the Data Lake Management component, shown with the same data sources, ingestion methods, and data management technologies]
Data Assets Catalog
• Internal and External Data Assets
• Defined Data Orgs (ontologies, taxonomies, thesauri)
Data Mappings
• Source-to-Target
• Transformations
Access Management
• Authorization and Access Rules
• Rule-based Security
• Group, Role, and User Level Authorization
• Auditable Access
Workflow
• Processes
• Schedules
• Provenance Capture
Models
Business-Focused
• Business Unit Data Organization and Terms
• Optimized to Assist Analytics
Data Governance
• Focus on Shared Data
• Standard Models
• Controlled Vocabulary
• Common Definitions
• Standards-based Data Views (FIBO, CDISC/RDF)
Monitoring
• Monitor and Manage Data Lake Operations
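The access-management items on this slide (rule-based security; group, role, and user level authorization; auditable access) can be illustrated with a minimal Python sketch. It is not taken from the deck or from any product: the class names, asset names, and principals below are all hypothetical, and a real data lake would delegate this to its security layer.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessRule:
    """Grants one principal (a user, role, or group) access to one data asset."""
    level: str      # "user", "role", or "group"
    principal: str  # hypothetical names such as "alice" or "analyst"
    asset: str      # name of a catalogued data asset


@dataclass
class AccessManager:
    rules: set = field(default_factory=set)
    audit_log: list = field(default_factory=list)

    def grant(self, level, principal, asset):
        self.rules.add(AccessRule(level, principal, asset))

    def is_allowed(self, user, roles, groups, asset):
        """Check user-, role-, and group-level rules; record every decision."""
        candidates = [("user", user)]
        candidates += [("role", r) for r in roles]
        candidates += [("group", g) for g in groups]
        allowed = any(AccessRule(lvl, p, asset) in self.rules for lvl, p in candidates)
        # Auditable access: both grants and denials are logged.
        self.audit_log.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "asset": asset,
            "allowed": allowed,
        })
        return allowed


manager = AccessManager()
manager.grant("role", "analyst", "customer_orders")
manager.grant("user", "alice", "clinical_trials")

print(manager.is_allowed("bob", ["analyst"], [], "customer_orders"))  # True
print(manager.is_allowed("bob", ["analyst"], [], "clinical_trials"))  # False
print(len(manager.audit_log))                                         # 2
```

Keeping the rules as data, rather than as code paths, is what makes them easy to review and to audit alongside the access log.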
Query Management
[Architecture diagram: the Query Management component, shown with the same data sources, ingestion methods, and data management technologies]
Query Management
• Query Data, Metadata, and Provenance
• Capture and Share Analytics Expertise
• Semantic Search
• Analytics Directed to the Best Query Engine
• Data Discovery
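One way to picture "Analytics Directed to the Best Query Engine" is a small router that inspects a query's characteristics and picks one of the engine types named in the Data Management layer. The Python sketch below is only an illustration: the `QueryProfile` fields and the routing rules are assumptions made for this example, not the selection logic of any real product.

```python
from dataclasses import dataclass


@dataclass
class QueryProfile:
    """Coarse description of a query; every field is illustrative."""
    traverses_relationships: bool = False
    aggregates_columns: bool = False
    needs_low_latency: bool = False
    full_scan_of_raw_files: bool = False


def choose_engine(profile: QueryProfile) -> str:
    """Pick an engine type from the slide using simple, assumed rules."""
    if profile.traverses_relationships:
        return "semantic graph"
    if profile.needs_low_latency:
        return "in memory"
    if profile.aggregates_columns:
        return "columnar"
    if profile.full_scan_of_raw_files:
        return "map reduce"
    return "nosql"


print(choose_engine(QueryProfile(aggregates_columns=True)))
print(choose_engine(QueryProfile(traverses_relationships=True, aggregates_columns=True)))
```

The rule order encodes a priority (relationship traversal first, then latency), which is a design choice for the sketch; a production router would more likely use cost estimates and catalog metadata.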
Semantic Technology Delivers “Smart” Data
Integrates a network of internal and external data assets,
insulating end users from the details of the underlying
technologies
Captures expertise (logic, inferencing) and integrates it with
the data, delivering “smart” data to non-expert users
Manages a comprehensive inventory of the data assets
Secures access to the right data assets by the right users
Key W3C Standards in Semantic Technology
Resource Description Framework (RDF)
• Framework for storing and integrating data and data definitions in the form of subject-predicate-object expressions, or “triples”. Relationships are organized in a logical graph model.
• Benefit: Reduced development time and cost; faster time-to-business value.
Web Ontology Language (OWL)
• An ontology is a comprehensive model of data definitions and relationships that is human- and machine-readable. Ontologies are inheritable and extensible.
• Benefit: Improved application quality, flexible iterative / investigative approach, easily adapts to business change.
SPARQL Query Language
• SQL-like query language for semantic data that can leverage the ontological relationships and constructs to execute smarter queries. Access multiple internal and external databases simultaneously in a single query.
• Benefit: Access and integrate data across business silos.
Inference
• Reasoning over data through business rules. Expertise is captured and embedded in the ontology model, accessible through user queries. This is the “smart” in Smart Data.
• Benefit: Easier end user access to expertise; intelligent systems capabilities.
Linked Data
• Connects data contained in different databases, allowing queries to find, share and combine data so insights can be identified across the Web.
• Benefit: Connect disparate databases to navigate and integrate data regardless of location or technology platform.
RDB to RDF Mapping Language (R2RML)
• Preserving current investments in relational technology, R2RML maps relational data to an ontology. SPARQL can query RDF and relational databases simultaneously.
• Benefit: Low cost of entry to use Semantic Technology to deliver high-value solutions
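The ideas of triples, pattern-based querying, and inference can be shown with a small pure-Python sketch. It deliberately avoids any RDF library and does not implement SPARQL or OWL: it only mimics a wildcard triple pattern and the subclass rule from RDF Schema. All `ex:` identifiers are made up for the example.

```python
# Triples are (subject, predicate, object) tuples; all identifiers are made up.
triples = {
    ("ex:PremiumCustomer", "rdfs:subClassOf", "ex:Customer"),
    ("ex:Customer", "rdfs:subClassOf", "ex:Party"),
    ("ex:alice", "rdf:type", "ex:PremiumCustomer"),
    ("ex:alice", "ex:placed", "ex:order42"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return {
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


def infer_types(store):
    """Add rdf:type triples implied by rdfs:subClassOf until nothing new appears."""
    inferred = set(store)
    changed = True
    while changed:
        changed = False
        for (inst, _, cls) in match(inferred, p="rdf:type"):
            for (_, _, parent) in match(inferred, s=cls, p="rdfs:subClassOf"):
                new = (inst, "rdf:type", parent)
                if new not in inferred:
                    inferred.add(new)
                    changed = True
    return inferred


smart = infer_types(triples)
# Before inference, asking for ex:Customer instances finds nothing.
print(match(triples, p="rdf:type", o="ex:Customer"))
# After inference, ex:alice is returned as an ex:Customer.
print(match(smart, p="rdf:type", o="ex:Customer"))
```

The second query succeeds without the user knowing that ex:alice was recorded as a premium customer, which is the sense in which expertise embedded in the model makes the data "smart".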
The Common Model is the “Data Glue”
Source Systems
• Lead (SFA system)
• Quote (Quote system)
• Order (OMS system)
• Contract (CMS system)
Common Model (“Data Glue”)
• Different business entities in
physical systems actually share
many of the same concepts,
meanings, and relationships
• Semantic data science exposes
common business concepts and
connects them with their physical
expression in production systems
• Data is “glued” together by its
business meaning, rather than
physical structures dictated by
the underlying technologies
The conceptual model can be directly used by both business and IT users to
operationalize data services, understand the data landscape, track data lineage, and
conduct downstream analytics.
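As a rough illustration of the "data glue" idea, the Python sketch below maps physical fields from the four source systems onto shared business concepts, so one question can be asked across all of them. The field names, the mapping table, and the sample records are hypothetical; only the system and entity names come from the slide.

```python
# Hypothetical field mappings: (system, physical field) -> common-model concept.
FIELD_TO_CONCEPT = {
    ("SFA", "prospect_name"): "customer",
    ("Quote", "cust"): "customer",
    ("OMS", "buyer"): "customer",
    ("CMS", "party_name"): "customer",
    ("Quote", "quoted_price"): "amount",
    ("OMS", "order_total"): "amount",
    ("CMS", "contract_value"): "amount",
}


def to_common_model(system, entity, record):
    """Re-express one source record in common-model terms, keeping its origin."""
    common = {"entity": entity, "source_system": system}
    for physical_field, value in record.items():
        concept = FIELD_TO_CONCEPT.get((system, physical_field))
        if concept is not None:
            common[concept] = value
    return common


records = [
    to_common_model("SFA", "Lead", {"prospect_name": "Acme Corp"}),
    to_common_model("Quote", "Quote", {"cust": "Acme Corp", "quoted_price": 1200}),
    to_common_model("OMS", "Order", {"buyer": "Acme Corp", "order_total": 1150}),
    to_common_model("CMS", "Contract", {"party_name": "Acme Corp", "contract_value": 1150}),
]

# One question asked in business terms spans all four systems.
history = [r["entity"] for r in records if r.get("customer") == "Acme Corp"]
print(history)  # ['Lead', 'Quote', 'Order', 'Contract']
```

Because each mapped record keeps its `source_system`, the same structure also gives a starting point for tracking lineage back to the physical system.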
Semantic Models Relate Data by Business Meaning
[Diagram of related business concepts]
• Customer
• Life Events
• Life Style
• Preferences
• Interests
• Music
• Purchasing
• Personal Network
• Entertainment
• Profession
Implications to the Existing IT Architecture and Practices
• User Tools to Discover and Optimize Data Relationships
• Structured and Unstructured Data, Voice, and Video
• Data Analysis Automation
• Extends Existing Investments in IT Architecture
• Manages Secure Access
• Builds Out Enterprise Data Models, with Integration Hub Capabilities
• Self-Service Data Feeds and Analytics
• Infrastructure Capacity Elasticity
• Reduction of Data Mart Silos
• Easier Access to External Data
Data Lake Approach to Meeting Business Needs
Business Need: Onboard New Data
Traditional Technologies and Practices
• Comprehensive analysis creates a rigid structure that is difficult to change, or
• Minimal definition of data organization requires detailed understanding of data contents
Data Lake Technologies and Practices
• Flexible data model can be revised or extended without redesign of the database
• Agile, evolutionary refinement of the data organization, leveraging new insights as users work with the data
Business Need: Connect External Data
Traditional Technologies and Practices
• External data is collected and loaded into the analytics repository.
• Data is streamed, or is refreshed on a scheduled frequency.
Data Lake Technologies and Practices
• External data can be sourced from databases, spreadsheets, Web pages, news feeds, and more; data is queried through common methods, without regard to location, with real-time values delivered at query time.
Business Need: Integrate Data between Business Units or Business Partners
Traditional Technologies and Practices
• Governance activities establish common vocabulary and data definitions
• Shared data is copied to an integrated database.
• Organization-specific definitions may require duplicating certain data in marts
Data Lake Technologies and Practices
• Systems of record also publish their existing data specifications or ontology model; each organization defines data in the manner best suited to its business.
• Federation and virtualization features provide choices in which data to copy and which data to retain in the system(s) of record
• All models can be supported through a single copy of the data, maintained in the data lake or system of record.
Business Need: Capture and Embed Expertise
Traditional Technologies and Practices
• Expertise is often captured in the reporting and analytics, which creates a change management challenge when updates are required.
Data Lake Technologies and Practices
• Expertise is captured in the data definitions; a single, shared definition minimizes change management efforts
Lessons learned from early adopters
• Prioritize: Prioritize data onboarding by the data’s ability to contribute to customer engagement
• Onboard: Onboard data assets as they become available
• Connect: Connect to available internal and external data assets
• Load: Load the data unfiltered/untransformed
• Organize: Use models to provide organization to the data
• Customize: Create models that are tailored to the needs of the business groups
• Search: Make it easy to find data
• Secure: Manage security and privacy, but make it easy to authorize access to data that users need
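The Load, Organize, and Customize lessons together describe what is often called schema-on-read: keep the raw data as received and apply a model only when reading. The Python sketch below illustrates that pattern under stated assumptions; the sample records, field names, and the two models are invented for the example.

```python
import json

# Raw records are stored exactly as received (the "Load" lesson).
raw_lake = [
    '{"cust_id": "C1", "amt": "19.99", "channel": "web", "debug": "x"}',
    '{"cust_id": "C2", "amt": "5.00", "channel": "store"}',
]

# Two hypothetical models over the same raw data: field -> (raw key, converter).
MARKETING_MODEL = {"customer": ("cust_id", str), "channel": ("channel", str)}
FINANCE_MODEL = {"customer": ("cust_id", str), "revenue": ("amt", float)}


def read_with_model(raw_records, model):
    """Apply a model at read time; the stored raw records are never modified."""
    for line in raw_records:
        raw = json.loads(line)
        yield {name: convert(raw[key]) for name, (key, convert) in model.items()}


marketing_view = list(read_with_model(raw_lake, MARKETING_MODEL))
finance_view = list(read_with_model(raw_lake, FINANCE_MODEL))
print(marketing_view)
print(sum(row["revenue"] for row in finance_view))
```

Each business group gets a view tailored to its needs, while both views are served from a single copy of the data.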
Addressing Challenges
- Privacy vs Personal Value
- Granularity of customer understanding
- Delivering strategic objectives when projects tend
to have a technical focus
- Opening access to data
- Need for executive sponsorship
- Access to external data
- Establishing firewalls
- Persistent, pervasive data quality issues
Clues to better customer engagement will be
found in the ever-growing volume of data that
we’re creating
A Data Lake Strategy helps you to create a
personalized, engaging experience with each
customer
• Visibility
• Self-Service
• Smart
• Provenance
• Open, yet Secure
• Internet Scale
• Agile
• Adaptable
• Universal Data Access
Questions?
Thank you!
