Who changed my data?
Need for data governance and
provenance in a streaming world
Digital capability requires granular control of all data assets.
Dinesh Chandrasekhar
Director, Product Marketing
Paige Bartley
Senior Analyst, Data and
Enterprise Intelligence
Ovum | TMT intelligence | informa | Copyright © Informa PLC
Digital Capability Depends on Full Control of Data
Business challenges in achieving digital capability include:
• Reproducibility of analytics results
• Debugging of models and algorithms
• Ensuring correct access rights to data
• Consistent application of data policies
• Meeting regulatory compliance requirements
• Unifying data across repositories and silos
• Finding the right data at the right time
Addressing these challenges requires understanding how data changes over time.
Governance and Transparency of Data Assets Are More Important than Ever
Factors Within the Enterprise
More Data:
• Economics of storage have made keeping data cheap.
• New data types – sensor data, etc. – need to be combined with historical data.
More Users:
• Self-service era means more data consumers and more frequent data access.
• Varying users have varying access rights and privileges.
• More users means more proliferation of data versions.
More Complexity:
• Data repositories have become more distributed, and data sources more varied.
• Data resides in more locations than ever before, in the cloud and on-prem.
Factors External to the Enterprise
More Regulatory Pressure
• Regulations such as GDPR have indirect requirements for tracking lineage.
• Article 30 requirements for record keeping necessitate knowledge of provenance.
More Competitive Pressure
• Leverage of data is increasingly a competitive differentiator.
• Pace of change is accelerating, and comprehensive understanding of data is critical.
• Disruptors are emerging from unlikely industries, using data to their advantage.
GDPR’s Specific Requirements for Data
• Article 4: Definition of Personal Data
- A person can be identified indirectly or directly
- Data sources can be combined to make personal data
• Article 9: Processing of Special Categories of Personal Data
- Processing of biometric data is highly restricted
- Many types of sensors produce biometric data
• Article 30: Records of Processing Activities
- “Who, what, when, where, and why” of processing
- Need deep understanding of metadata and data lineage.
GDPR doesn’t differentiate between data-in-motion and data-at-rest! “Who changed what” is critical.
Lineage and provenance, while not directly required by GDPR, are critical to meeting requirements.
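The "who, what, when, where, and why" of processing can be pictured as an append-only log of records. The following is a minimal sketch under stated assumptions, not a compliance implementation: the `ProcessingRecord` and `ProcessingLog` names, their fields, and the sample actors and datasets are all hypothetical and are not taken from GDPR or from any product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class ProcessingRecord:
    """One hypothetical 'who, what, when, where, why' entry."""
    who: str      # actor (user or service) that touched the data
    what: str     # operation performed, e.g. "collect" or "aggregate"
    where: str    # system or location where the processing happened
    why: str      # documented purpose of the processing
    dataset: str  # identifier of the affected data
    when: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class ProcessingLog:
    """Append-only log: records are added, never updated or deleted."""

    def __init__(self) -> None:
        self._records: List[ProcessingRecord] = []

    def record(self, **fields) -> ProcessingRecord:
        entry = ProcessingRecord(**fields)
        self._records.append(entry)
        return entry

    def who_changed(self, dataset: str) -> List[str]:
        """Return the actors that touched a dataset, in processing order."""
        return [r.who for r in self._records if r.dataset == dataset]


# Hypothetical sample data: the same log covers data in motion and at rest.
log = ProcessingLog()
log.record(who="ingest-service", what="collect", where="edge-gateway",
           why="device telemetry", dataset="wearable-heart-rate")
log.record(who="analyst-42", what="aggregate", where="data-lake",
           why="fitness trend report", dataset="wearable-heart-rate")
actors = log.who_changed("wearable-heart-rate")
print(actors)
```

Keeping the records immutable and the log append-only is one way to make "who changed what" answerable after the fact; a real deployment would also need durable storage and access controls on the log itself.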
“From an analytics standpoint, reaping the benefits of big data means investing in data management and governance. Without the correct people, processes, and infrastructure, more casual business users will likely struggle to see the benefits of big data technologies.”
Laurent-Olivier Lioté, Analyst, Data and Enterprise Intelligence, Ovum
A Holistic View of Data Requires Both Data-in-Motion and Data-at-Rest
[Diagram labels: Data at Rest; Data in Motion; Contextual Understanding of Data]
How Do We Do This? Metadata Management is Necessary for Governance
Having a common enterprise metadata framework allows data of different types and from different sources to be managed consistently.
A common metadata framework allows for:
• Common search and lineage for datasets
• Lifecycle management from ingestion to disposition
• Metadata exchange with other metadata tools
• Analysis of data usage and access trends
• Consistent application of access rights
• Analysis of behavior and anomalies
[Diagram labels: Metadata Creation; Metadata Enrichment; Metadata Analysis]
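To make "common search and lineage for datasets" concrete, here is a small sketch of a shared catalog that holds metadata for datasets regardless of where they live. Everything in it is an assumption for illustration: the `DatasetMetadata` and `MetadataCatalog` names, the repository labels, and the sample datasets are hypothetical and do not describe any specific metadata tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DatasetMetadata:
    name: str
    repository: str                                         # "home" system of the data
    tags: Set[str] = field(default_factory=set)             # descriptive labels
    derived_from: List[str] = field(default_factory=list)   # direct upstream datasets


class MetadataCatalog:
    """Single place to search and trace datasets across repositories."""

    def __init__(self) -> None:
        self._entries: Dict[str, DatasetMetadata] = {}

    def register(self, entry: DatasetMetadata) -> None:
        self._entries[entry.name] = entry

    def search(self, tag: str) -> List[str]:
        """Return names of all datasets carrying the tag, sorted by name."""
        return sorted(e.name for e in self._entries.values() if tag in e.tags)

    def lineage(self, name: str) -> List[str]:
        """Return all upstream datasets, nearest first (breadth-first walk)."""
        seen: List[str] = []
        queue = list(self._entries[name].derived_from)
        while queue:
            parent = queue.pop(0)
            if parent in seen:
                continue
            seen.append(parent)
            if parent in self._entries:
                queue.extend(self._entries[parent].derived_from)
        return seen


catalog = MetadataCatalog()
catalog.register(DatasetMetadata("sensor-stream", "message-bus",
                                 {"in-motion", "telemetry"}))
catalog.register(DatasetMetadata("sensor-history", "data-lake",
                                 {"at-rest", "telemetry"}, ["sensor-stream"]))
catalog.register(DatasetMetadata("maintenance-report", "warehouse",
                                 {"at-rest"}, ["sensor-history"]))

telemetry = catalog.search("telemetry")
upstream = catalog.lineage("maintenance-report")
print(telemetry, upstream)
```

Because the streaming source and the stored history sit in the same catalog, one lineage query walks from a report back through data at rest to data in motion.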
The Managed Data Lake Can Support a Common Metadata Framework
The data lake, if properly managed, can support a common metadata framework which underpins enterprise data.
• Data-in-motion
• Data-at-rest
• Structured data
• Unstructured data
All metadata, managed together.
Common management of metadata allows for streamlined control and visibility into data. Better control of data results in better business outcomes.
Governance Standards Need to be Equal
The enterprise increasingly wants to analyze all data, both in-motion and at-rest, in context with each other.
Governance and lineage for data-in-motion allows for:
• Audit and regulatory compliance
• Insight into data history and provenance
• Comprehensive lifecycle management
• Security and access controls
• Better quality data = better analytics
Governance standards for data-in-motion need to match those for data-at-rest.
[Diagram labels: Common Metadata Framework; Data-in-Motion; Data-at-Rest; Data Management Platform]
© Hortonworks Inc. 2011–2018. All rights reserved
Changing face of data
Challenges and Solutions
The New Way of Business Is Fueled By Connected Data
DEVELOPMENT
• Connected Customers, Vehicles, Devices
• Socially crowd-sourced requirements
• Digital design and analysis
• Digital prototypes and tests (simulations)
MANUFACTURING
• Connected Factories, Sensors, Devices
• Human-robotic interaction
• 3D-printing on demand
DISTRIBUTION
• Connected Trucks, Inventory
• Location, traffic, weather-aware distribution
• Real-time inventory visibility
• Dynamic rerouting
MARKETING/SALES
• Connected Customers, Devices
• Omni-channel demand sensing
• Real-Time Recommendations
SERVICE
• Connected Assets
• Remote service monitoring & delivery
• Predictive maintenance
• OTA Updates
Today’s Digital Enterprises
• RFID TRACKERS AND NANO-DEVICES to give you visibility into movement of your goods
• MOBILE NOTIFICATIONS to inform you of shipment delay from a supplier
• BLOCKCHAINS to give complete trust and provenance in your supply chain
• VIRTUAL ASSISTANTS to enhance your customer experience
• AI-POWERED CHATBOTS to improve your customer support functions
• ELECTRONIC B2B EXCHANGES to streamline order processing with partners
© Hortonworks, Inc. 2011-2018. All rights reserved. | Hortonworks confidential and proprietary information.
Modern Data Architecture
[Diagram labels: DATA CENTER; CLOUD; Machine Learning/Artificial Intelligence; Telemetry – Connected Devices; Time Series Databases; Stream Analytics; Deep Historical Analysis; Exception Monitoring; Legacy/Operational Data; Sensors, Control Systems; Cyber Security; Edge Analytics; Social; Mobile; IoT; Geo Location]
Data Challenges
• Cannot get a 360 VIEW of your customer?
• DROWNING in data lakes?
• TOO MUCH DATA coming in from TOO MANY SOURCES and devices?
• New business initiatives leading to EXCESSIVE IT COSTS?
MOST IMPORTANTLY…
Don’t have the right data at the right time to make the right decision?
Global Data Management Enables Modern Data Architecture
GLOBAL DATA MANAGEMENT
[Diagram labels: DATA SOURCES; DATA CENTER; CLOUD; EDGE; Exception Monitoring; 360 View of Operations; Cyber Security; Telemetry – Connected Devices; Time Series; Sensors, Control Systems; Legacy/Operational Data]
Data Management Challenges
• Dealing with multi-clouds
• Avoiding cloud/vendor lock-in
• Future proofing your architecture
• Common view of security, governance
• Manage all data, regardless of type or location
• Maximize data re-use for multiple workloads
[Diagram labels: DATA SOURCES; DATA CENTER; CLOUD; EDGE; Exception Monitoring; 360 View of Operations; Cyber Security; Telemetry – Connected Devices; Time Series; Sensors, Control Systems]
Global Data Management Platform
[Diagram labels: DATA SOURCES; DATA CENTER; CLOUD; EDGE; Exception Monitoring; 360 View of Operations; Cyber Security; Telemetry – Connected Devices; Time Series; Sensors, Control Systems; DATA-IN-MOTION; DATA-AT-REST; MANAGE, SECURE, GOVERN, CONSUME]
Global Data Management - Powering Innovation
MODERN DATA USE CASES
• EDW OPTIMIZATION
• CYBERSECURITY
• DATA SCIENCE
• ADVANCED ANALYTICS
• IOT/STREAMING ANALYTICS
[Diagram labels: DATA SOURCES; DATA CENTER; CLOUD; EDGE; Exception Monitoring; 360 View of Operations; Cyber Security; Telemetry – Connected Devices; Time Series; Sensors, Control Systems; DATA-IN-MOTION; DATA-AT-REST; MANAGE, SECURE, GOVERN, CONSUME]
Apache NiFi Overview
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Supports push and pull models
• Recovery/recording a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
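Two of the ideas above, backpressure and prioritized queuing, can be illustrated with a bounded priority queue. This is a conceptual sketch only: it does not use or reproduce any Apache NiFi API, and the `Connection` class, its threshold parameter, and the sample items are hypothetical names chosen for illustration.

```python
import heapq
import itertools


class Connection:
    """Bounded, prioritized queue between two hypothetical processing steps."""

    def __init__(self, backpressure_threshold):
        self.backpressure_threshold = backpressure_threshold
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order per priority

    def is_full(self):
        return len(self._heap) >= self.backpressure_threshold

    def offer(self, item, priority):
        """Enqueue an item; return False (backpressure) when the queue is full."""
        if self.is_full():
            return False
        heapq.heappush(self._heap, (priority, next(self._order), item))
        return True

    def poll(self):
        """Dequeue the item with the lowest priority number, or None when empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


conn = Connection(backpressure_threshold=2)
accepted = [
    conn.offer("routine-reading", priority=5),
    conn.offer("alarm", priority=1),
    conn.offer("late-reading", priority=5),  # refused: threshold reached
]
first = conn.poll()                               # the alarm jumps the queue
retry = conn.offer("late-reading", priority=5)    # room again after the poll
print(accepted, first, retry)
```

The upstream step sees `False` and is expected to slow down or retry rather than drop data, which is the essence of backpressure; "pressure release" (discarding or aging off data) is deliberately not modeled here.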
Watch the real-time flow of data: Data Provenance in Apache NiFi
[Screenshot callout: Select Data Provenance]
Easily access and trace changes to dataflow in Apache NiFi
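Tracing changes to a piece of data amounts to walking a log of provenance events back through its ancestors. The sketch below shows that idea in isolation; it is not NiFi's provenance API. The `ProvenanceEvent` fields, the event type strings, and the component names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ProvenanceEvent:
    flowfile_id: str                 # identifier of the piece of data
    event_type: str                  # e.g. "RECEIVE", "CONTENT_MODIFIED", "FORK", "SEND"
    component: str                   # hypothetical name of the step that acted
    parent_id: Optional[str] = None  # set when this data was derived from another


def trace(events: List[ProvenanceEvent], flowfile_id: str) -> List[ProvenanceEvent]:
    """Return the events for a piece of data and all its ancestors, oldest first."""
    parents: Dict[str, str] = {
        e.flowfile_id: e.parent_id for e in events if e.parent_id is not None
    }
    chain = {flowfile_id}
    current = flowfile_id
    # Follow parent links upward; stop on a cycle or when there is no parent.
    while current in parents and parents[current] not in chain:
        current = parents[current]
        chain.add(current)
    return [e for e in events if e.flowfile_id in chain]


# Hypothetical event log, already in chronological order.
events = [
    ProvenanceEvent("ff-1", "RECEIVE", "receive-http"),
    ProvenanceEvent("ff-1", "CONTENT_MODIFIED", "mask-pii"),
    ProvenanceEvent("ff-2", "FORK", "split-records", parent_id="ff-1"),
    ProvenanceEvent("ff-9", "RECEIVE", "receive-http"),
    ProvenanceEvent("ff-2", "SEND", "publish-to-lake"),
]

history = trace(events, "ff-2")
modifiers = [e.component for e in history if e.event_type == "CONTENT_MODIFIED"]
print([e.event_type for e in history], modifiers)
```

Answering "who changed my data?" for `ff-2` requires the events of its parent `ff-1` as well, which is why the trace follows derivation links instead of filtering on a single identifier.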
Apache Atlas
• Enterprise data governance
• Integration with Apache NiFi
• Integration with Apache Ranger
[Diagram: Apache Atlas]
• Knowledge Store; Audit Store
• Models; Type-System; Policy Rules; Taxonomies
• Tag Based Policies; Data Lifecycle Management; Real Time Tag Based Access Control
• REST API
• Services: Search, Lineage, Exchange
• Healthcare: HIPAA, HL7
• Financial: SOX, Dodd-Frank
• Energy: PPDM
• Retail: PCI, PII
• Other: CWM
[Diagram: SERVICE: DATA STEWARD STUDIO (DSS)]
• Discover & Fingerprint Data
• Smart Enterprise Search
• Data & Metadata Security
• Data Lineage & Impact Analysis
• Enterprise Data Catalog
• Organize & Curate Data
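Tag-based policies attach access rules to metadata tags instead of to individual datasets. The following is a rough sketch of that idea and is not the Apache Atlas or Apache Ranger API: the dictionaries, the role names, the dataset names, and the `can_read` function are hypothetical.

```python
from typing import Dict, Set

# Hypothetical tags attached to datasets in a metadata catalog.
DATASET_TAGS: Dict[str, Set[str]] = {
    "patient-vitals-stream": {"PII", "HIPAA"},
    "factory-sensor-history": {"telemetry"},
}

# Hypothetical policies: which roles may read data carrying each tag.
TAG_POLICIES: Dict[str, Set[str]] = {
    "PII": {"privacy-officer", "clinician"},
    "HIPAA": {"clinician"},
    "telemetry": {"analyst", "clinician"},
}


def can_read(role: str, dataset: str) -> bool:
    """Allow access only if the role is permitted by every tag on the dataset."""
    tags = DATASET_TAGS.get(dataset)
    if tags is None:
        return False  # unknown datasets are denied by default
    return all(role in TAG_POLICIES.get(tag, set()) for tag in tags)


checks = {
    "clinician reads vitals": can_read("clinician", "patient-vitals-stream"),
    "privacy officer reads vitals": can_read("privacy-officer", "patient-vitals-stream"),
    "analyst reads sensors": can_read("analyst", "factory-sensor-history"),
    "analyst reads unknown": can_read("analyst", "unregistered-dataset"),
}
print(checks)
```

Because the rule is keyed on the tag, tagging a new stream as `PII` applies the same restriction to it immediately, whether the data is in motion or at rest.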
Thank you


Who changed my data? Need for data governance and provenance in a streaming world

  • 1. Who changed my data? Need for data governance and provenance in a streaming world Digital capability requires granular control of all data assets. Dinesh Chandrasekhar Director, Product Marketing Paige Bartley Senior Analyst, Data and Enterprise Intelligence
  • 2. Ovum | TMT intelligence | informa2 Copyright © Informa PLC
  • 3. Ovum | TMT intelligence | informa3 Copyright © Informa PLC Business challenges in achieving digital capability include:  Reproducibility of analytics results  Debugging of models and algorithms  Ensuring correct access rights to data  Consistent application of data policies  Meeting regulatory compliance requirements  Unifying data across repositories and silos  Finding the right data at the right time Digital Capability Depends on Full Control of Data Addressing these challenges requires understanding how data changes over time.
  • 4. Ovum | TMT intelligence | informa4 Copyright © Informa PLC Governance and Transparency of Data Assets is More Important than Ever
  • 5. Ovum | TMT intelligence | informa5 Copyright © Informa PLC More Data:  Economics of storage have made keeping data cheap.  New data types – sensor data, etc. – need to be combined with historical data. More Users:  Self-service era means more data consumers and more frequent data access.  Varying users have varying access rights and privileges.  More users means more proliferation of data versions. More Complexity:  Data repositories have become more distributed, and data sources more varied.  Data resides in more locations than ever before, in the cloud and on-prem. Factors Within the Enterprise
  • 6. Ovum | TMT intelligence | informa6 Copyright © Informa PLC More Regulatory Pressure Regulations such as GDPR have indirect requirements for tracking lineage.  Article 30 requirements for record keeping necessitate knowledge of provenance. More Competitive Pressure  Leverage of data is increasingly a competitive differentiator.  Pace of change is accelerating, and comprehensive understanding of data is critical.  Disruptors are emerging from unlikely industries, using data to their advantage. Factors External to the Enterprise
  • 7. Ovum | TMT intelligence | informa7 Copyright © Informa PLC  Article 4: Definition of Personal Data A person can be identified indirectly or directly Data sources can be combined to make personal data  Article 9: Processing of Special Categories of Personal Data Processing of biometric data is highly restricted Many types of sensors produce biometric data  Article 30: Records of Processing Activities “Who, what, when, where, and why” of processing Need deep understanding of metadata and data lineage. GDPR doesn’t differentiate between data-in-motion and data-at-rest! “Who changed what” is critical. Lineage and provenance, while not directly required by GDPR, are critical to meeting requirements. GDPR’s Specific Requirements for Data
  • 8. Ovum | TMT intelligence | informa8 Copyright © Informa PLC << << From an analytics standpoint, reaping the benefits of big data means investing in data management and governance. Without the correct people, processes, and infrastructure, more casual business users will likely struggle to see the benefits of big data technologies. Laurent-Olivier Lioté Analyst, Data and Enterprise Intelligence, Ovum
  • 9. Ovum | TMT intelligence | informa9 Copyright © Informa PLC A Holistic View of Data Requires Both Data-in-Motion and Data-at-Rest Data at Rest Data in Motion Contextual Understanding of Data
  • 10. Ovum | TMT intelligence | informa10 Copyright © Informa PLC Having a common enterprise metadata framework allows data of different types and from different sources to be managed consistently. A common metadata framework allows for:  Common search and lineage for datasets  Lifecycle management from ingestion to disposition  Metadata exchange with other metadata tools  Analysis of data usage and access trends  Consistent application of access rights  Analysis of behavior and anomalies How Do We Do This? Metadata Management is Necessary for Governance Metadata Creation Metadata Enrichment Metadata Analysis
  • 11. Ovum | TMT intelligence | informa11 Copyright © Informa PLC The data lake, if properly managed, can support a common metadata framework which underpins enterprise data.  Data-in-motion  Data-at-rest  Structured data  Unstructured data Common management of metadata allows for streamlined control and visibility into data. Better control of data results in better business outcomes. The Managed Data Lake Can Support a Common Metadata Framework All metadata, managed together.
  • 12. Ovum | TMT intelligence | informa12 Copyright © Informa PLC The enterprise increasingly wants to analyze all data, both in-motion and at-rest, in context with each other. Governance and lineage for data-in-motion allows for:  Audit and regulatory compliance  Insight into data history and provenance  Comprehensive lifecycle management  Security and access controls  Better quality data = better analytics Governance standards for data-in-motion need to match those for data-at-rest. Governance Standards Need to be Equal Common Metadata Framework Data-in-Motion Data-at-Rest Data Management Platform
  • 13. 13 © Hortonworks Inc. 2011–2018. All rights reserved Changing face of data Challenges and Solutions
  • 14. 14 © Hortonworks Inc. 2011–2018. All rights reserved The New Way of Business Is Fueled By Connected Data • Connected Customers, Vehicles, Devices • Socially crowd-sourced requirements • Digital design and analysis • Digital prototypes and tests (simulations) • Connected Factories, Sensors, Devices • Human-robotic interaction • 3D-printing on demand • Connected Trucks, Inventory • Location, traffic, weather-aware distribution • Real-time inventory visibility • Dynamic rerouting • Connected Customers, Devices • Omni- channel demand sensing • Real-Time Recommendations • Connected Assets • Remote service monitoring & delivery • Predictive maintenance • OTA Updates DEVELOPMENT MANUFACTURING DISTRIBUTION MARKETING/SALES SERVICE
  • 15. 15 © Hortonworks Inc. 2011–2018. All rights reserved Today’s Digital Enterprises RFID TRACKERS AND NANO-DEVICES to give you visibility into movement of your goods MOBILE NOTIFICATIONS to inform you of shipment delay from a supplier BLOCKCHAINS to give complete trust and provenance in your supply chain VIRTUAL ASSISTANTS to enhance your customer experience AI-POWERED CHATBOTS to improve your customer support functions ELECTRONIC B2B EXCHANGES to streamline order processing with partners
  • 16. 16 © Hortonworks, Inc. 2011-2018. All rights reserved. | Hortonworks confidential and proprietary information. Modern Data Architecture DATA CENTER Machine Learning/ Artificial Intelligence Telemetry – Connected Devices Time Series Databases Stream Analytics Deep Historical Analysis Exception Monitoring Legacy/ Operational Data Sensors, Control Systems Cyber Security Edge Analytics Social Mobile IoT IoT CLOUD Geo Location
  • 17. 17 © Hortonworks, Inc. 2011-2018. All rights reserved. | Hortonworks confidential and proprietary information. Data Challenges Cannot get a 360 VIEW of your customer? DROWNING in data lakes? TOO MUCH DATA coming in from TOO MANY SOURCES and devices? New business initiatives leading to EXCESSIVE IT COSTS?` MOST IMPORTANTLY… Don’t have the right data at the right time to make the right decision?
  • 18. 18 © Hortonworks, Inc. 2011-2018. All rights reserved. | Hortonworks confidential and proprietary information. G L O B A L D ATA M A N A G E M E N T DATA SOURCES DATA CENTER CLOUD EDGE Exception Monitoring 360 View of Operations Cyber Security Telemetry – Connected Devices Time Series Sensors, Control Systems Telemetry – Connected Devices Sensors, Control Systems Time Series Exception Monitoring Cyber Security Legacy/ Operational Data Global Data Management Enables Modern Data Architecture
  • 19. 19 © Hortonworks, Inc. 2011-2018. All rights reserved. | Hortonworks confidential and proprietary information. Data Management Challenges • Dealing with multi-clouds • Avoiding cloud/ vendor lock-in • Future proofing your architecture • Common view of security, governance • Manage all data, regardless of type or location • Maximize data re-use for multiple workloads DATA SOURCES DATA CENTER CLOUD EDGE Exception Monitoring 360 View of Operations Cyber Security Telemetry – Connected Devices Time Series Sensors, Control Systems
  • 20. 20 © Hortonworks, Inc. 2011-2018. All rights reserved. | Hortonworks confidential and proprietary information. Global Data Management Platform DATA SOURCES DATA CENTER CLOUD EDGE Exception Monitoring 360 View of Operations Cyber Security Telemetry – Connected Devices Time Series Sensors, Control Systems DATA-IN-MOTION DATA-AT-REST MANAGE, SECURE, GOVERN, CONSUME
  • 21. 21 © Hortonworks, Inc. 2011-2018. All rights reserved. | Hortonworks confidential and proprietary information. Global Data Management - Powering Innovation MODERN DATA USE CASES EDW OPTIMIZATION CYBERSECURITY DATA SCIENCE ADVANCED ANALYTICS IOT/ STREAMING ANALYTICS DATA SOURCES DATA CENTER CLOUD EDGE Exception Monitoring 360 View of Operations Cyber Security Telemetry – Connected Devices Time Series Sensors, Control Systems DATA-IN-MOTION DATA-AT-REST MANAGE, SECURE, GOVERN, CONSUME
  • 22. 22 © Hortonworks, Inc. 2011-2018. All rights reserved. | Hortonworks confidential and proprietary information. Apache NiFi Overview • Guaranteed delivery • Data buffering - Backpressure - Pressure release • Prioritized queuing • Flow specific QoS - Latency vs. throughput - Loss tolerance • Data provenance • Supports push and pull models • Recovery/recording a rolling log of fine- grained history • Visual command and control • Flow templates • Pluggable/multi-role security • Designed for extension • Clustering
  • 23. 23 © Hortonworks Inc. 2011–2018. All rights reserved Watch real time flow of data: Data Provenance in Apache NiFi Select Data Provenance
  • 24. 24 © Hortonworks Inc. 2011–2018. All rights reserved Easily access and trace changes to dataflow in Apache NiFi
  • 25. 25 © Hortonworks Inc. 2011–2018. All rights reserved Apache Atlas • Enterprise data governance • Integration with Apache NiFi • Integration with Apache Ranger Apache Atlas Knowledge Store Audit Store ModelsType-System Policy RulesTaxonomies Tag Based Policies Data Lifecycle Management Real Time Tag BasedAccess Control REST API Services Search Lineage Exchange Healthcare HIPAA HL7 Financial SOX Dodd-Frank Energy PPDM Retail PCI PII Other CWM SERVICE: DATA STEWARD STUDIODSS Discover& Fingerprint Data Smart Enterprise Search Data & Metadata Security Data Lineage & Impact Analysis Enterprise Data Catalog Organize& CurateData
  • 26. Thank you

Editor's Notes

  • #3: Let’s step away from compliance, regulation, and requirements, and look at the major trends and drivers within the enterprise. Governance and provenance are often discussed as “checkbox” requirements, rather than as enablers. The ICT Enterprise Insights survey identified “create digital capability” and “manage security, identity, and privacy” as the top two IT trends in the enterprise. What do these trends have in common?
  • #4: There are three pillars to creating digital capability. The first pillar is the creation of the digital platform and infrastructure itself. The second pillar is the creation of the ability to effectively exploit and utilize data. The third pillar is the development of the enterprise's innovation process and methodology for the digital age. All three are underpinned by a clearly articulated digital strategy.
  • #8: Article 4: Personal data is any information relating to an identified or identifiable natural person. A natural person can be identified directly or indirectly, so the enterprise needs to be cautious when combining data sources to ensure that innocuous information doesn’t become personal information. Article 9: Processing of biometric data for the purpose of uniquely identifying a person is prohibited unless certain conditions are met, and this applies to several types of data in motion: sensor data from wearables, medical devices, and fitness devices. Article 30: The enterprise must document the purposes of processing, transfers of data to non-EU countries, and the envisaged time limits for erasure of the data.
  • #11: Data policies are applied and encoded at the metadata level. Metadata, or data about data, is critical to providing a common foundation for understanding the qualities of data residing in different systems and to provide lineage and cataloging capabilities. A shared or common metadata framework, where all metadata is managed together, allows data to be centrally searched, tracked, and monitored regardless of its "home" repository.
  • #13: To make this a reality, the same governance standards need to be applied to all enterprise data equally. There needs to be a single platform environment where data-in-motion and data-at-rest can be managed together, with a common metadata framework. All data-in-motion sources need a way to be ingested into this platform, with provenance and lineage tracked as they flow in.
  • #14: TALK TRACK Hortonworks Powers the Future of Data: data-in-motion, data-at-rest, and Modern Data Applications. [NEXT SLIDE]
  • #15: Data is often referred to as the fuel of today’s businesses. In reality, every business has data and can probably access the same types of data as most of its competitors. The real differentiator is not the data itself but who uses it more intelligently and to greater effect. That usage often relies on connecting the data dots across your organization. By connecting customers to products to the channels through which they interact or prefer to interact, we can drive better customer experiences, resulting in better loyalty and hopefully better revenues. Every industry is being transformed through these connected use cases.
  • #19: 1) Data is in multiple places (data centers that the company owns, the cloud, or owned by a third party). 2) Different data is in different places (data in your databases, such as numbers; data from sensors in a connected product, not arranged in a database). 3) Data flows back and forth between data center and cloud. Talking points: There is an entire new world being created by combining lots of data with breakthrough tools. Data could be on-premises and in the cloud. Data is moving from sensors in real time across our data fabric, giving us precise instrumentation of what happened just before an event as well as after it. This is true for customers buying on the web as well as for products that might fail. We can run our machine learning and deep learning on these vast repositories of data, and we can push these models down to the edges to automate decisions. Note: For us as a community and as a company, we need to continue to innovate around the core technology, while thinking about how we enable 3 personas to be successful. This is the logical evolution and transformation that’s happening now.
  • #20: You need to holistically manage all the data in all places, then begin to move our platform into place
  • #21: You need to holistically manage all the data in all places, then begin to move our platform into place
  • #22: You need to holistically manage all the data in all places, then begin to move our platform into place
  • #24: HDF provides very fine-grained, high-fidelity reporting about the origins of data, how it was used, who used it, and so on.
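
To make the idea in note #24 concrete, here is a minimal sketch of what fine-grained provenance tracking could look like. It is not the NiFi or HDF API: the `ProvenanceEvent` and `ProvenanceLog` classes, the event-type labels, and the actor names are all hypothetical, chosen only to illustrate recording where data came from, what happened to it, and who did it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class ProvenanceEvent:
    """One hypothetical provenance record for a single data item."""
    item_id: str
    event_type: str                   # illustrative labels: "RECEIVE", "MODIFY"
    actor: str                        # user or component responsible
    parent_id: Optional[str] = None   # item this one was derived from, if any
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class ProvenanceLog:
    """Append-only, in-memory event log (an assumption; real stores persist)."""

    def __init__(self) -> None:
        self._events: List[ProvenanceEvent] = []

    def record(self, event: ProvenanceEvent) -> None:
        self._events.append(event)

    def history(self, item_id: str) -> List[ProvenanceEvent]:
        """All events recorded for one item, in insertion order."""
        return [e for e in self._events if e.item_id == item_id]

    def lineage(self, item_id: str) -> List[str]:
        """Walk parent links from an item back to its original source."""
        parents = {e.item_id: e.parent_id for e in self._events if e.parent_id}
        chain = [item_id]
        # Stop at an item with no parent; the membership check guards cycles.
        while chain[-1] in parents and parents[chain[-1]] not in chain:
            chain.append(parents[chain[-1]])
        return chain


log = ProvenanceLog()
log.record(ProvenanceEvent("a", "RECEIVE", "sensor-gateway"))
log.record(ProvenanceEvent("b", "MODIFY", "alice", parent_id="a"))
log.record(ProvenanceEvent("c", "MODIFY", "bob", parent_id="b"))

print(log.lineage("c"))                         # origin chain for item "c"
print([e.actor for e in log.history("b")])      # who touched item "b"
```

Answering "who changed my data?" then comes down to two lookups: `history` for the events on an item and `lineage` for the chain of items it was derived from.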