Operationalizing your Data Lake:
Get Ready for Advanced Analytics
October 22nd, 2017
Parth Patel | Big Data Solutions Engineer
ppatel1@zaloni.com
Enabling the data-powered enterprise
• Industry-leading enterprise data lake management, governance and self-service platform
• Expert data lake professional services (Design, Implementation, Workshops, Training)
• Solutions to simplify implementation and reduce business risk
Zaloni Confidential and Proprietary
Zaloni proprietary – do not duplicate without permission
Data lakes are central to the modern data architecture
• Increased Agility
• New Insights
• Improved Scalability
Data architecture modernization
[Diagram: Traditional vs. Modern]
• Traditional: Sources, ETL, EDW; consumed through Data Discovery, Analytics, BI
• Modern: Various Sources, Streaming, Unstructured Data; Data Lake with Derived (Transformed) data and a Discovery Sandbox, plus the EDW; consumed through Data Discovery, Analytics, BI, Data Science
Zaloni Big Data Maturity Model
(Value realized increases from Ignore to Optimize.)

Stage: Ignore
Descriptor: Data Warehouse
Stage Today: 66% of market
Characteristics:
• Emphasis on structured data
• Limited ability to leverage data at scale
• Business emphasis on retrospective reporting and analysis
• Strong governance and security policies
Business Impact:
• Slow to accommodate business changes

Stage: Store
Descriptor: Data Swamp
Stage Today: 22% of market
Characteristics:
• Hadoop on premises or in the Cloud
• Limited visibility and usability of data
• Limited corporate oversight & governance
• Sandbox or Dev Environments
• Ad hoc and incremental growth of big data applications
Business Impact:
• Ad hoc and exploratory insights for individual use cases

Stage: Manage
Descriptor: Managed Data Lake
Stage Today: 10% of market
Characteristics:
• Acquire useful data from across the enterprise
• Improved visibility and understanding via managed ingestion of data and metadata
• Ensure security and privacy of sensitive data
• Operationalize data at scale
• Leverage enterprise governance & security policies
Business Impact:
• Scalable production data lake for new and improved business insights

Stage: Automate
Descriptor: Responsive Data Lake
Stage Today: 2% of market
Characteristics:
• Self-Service Ingestion & Provisioning
• 360 View of Customer, Product, etc.
• Enterprise Data Discovery
• Operationalize analytical models into business fabric
Business Impact:
• Enables immediate data impact on business operations

Stage: Optimize
Descriptor: Self-Organizing Data Lake
Stage Today: 0% of market
Characteristics:
• Self-improving data lake via machine learning algorithms
• True democratization of big data and analytics
• Intelligent data remediation and curation
• Recommended Data Security and Governance policies
Business Impact:
• Lights out business operations optimized for business success
Managing the Data Supply Chain from Source to Consumer
• Data Lake Management Platform
• A software solution for data lake management that enables enterprise-wide scalability
• Provides end-to-end capabilities

PRODUCERS: File Data, Streaming, Relational, On-premise

Data Lake Management Platform (Enable, Govern, Engage):
• Batch Ingestion
• Streaming Ingestion
• Auto Discovery
• Data Quality
• Data Privacy and Security
• Data Lifecycle Management
• Catalog
• Metadata Management
• Operationalize Transformations
• Self-Service Data Preparation

CONSUMERS (Self-Service Data): Business Analysts, Researchers, Data Scientists, Applications
Data Lake Reference Architecture

Sources:
• Sensors (or other time series data)
• Relational Data Stores (OLTP/ODS/DW)
• Logs (or other unstructured data)
• Social and shared data

Data Lake zones:

Transient Landing Zone
• Temporary store of source data
• Consumers are IT, Data Stewards
• Implemented in highly regulated industries

Raw Zone
• Original source data ready for consumption
• Consumers are ETL developers, data stewards, some data scientists
• Single source of truth with history

Refined Zone
• Data required for LOB-specific views, transformed from existing certified data
• Consumers are anyone with appropriate role-based access

Trusted Zone
• Standardized on corporate governance/quality policies
• Consumers are anyone with appropriate role-based access
• Single version of truth

Sandbox
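
The zone layout above can be sketched as a small lookup structure. This is only an illustration, not Zaloni's implementation: the role names (`it`, `data_steward`, `etl_developer`, `data_scientist`, `business_analyst`), the zone keys, and the `can_read` and `readable_zones` helpers are all hypothetical, and "anyone with appropriate role-based access" is modeled here as an explicit set of granted roles.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Zone:
    name: str
    purpose: str
    consumers: FrozenSet[str]  # hypothetical role names granted read access


# Roles that are assumed to hold "appropriate role-based access"
# to the governed zones (an assumption for this sketch).
GRANTED = frozenset(
    {"it", "data_steward", "etl_developer", "data_scientist", "business_analyst"}
)

ZONES: Dict[str, Zone] = {
    "transient": Zone(
        "Transient Landing Zone",
        "Temporary store of source data",
        frozenset({"it", "data_steward"}),
    ),
    "raw": Zone(
        "Raw Zone",
        "Original source data; single source of truth with history",
        frozenset({"etl_developer", "data_steward", "data_scientist"}),
    ),
    "trusted": Zone(
        "Trusted Zone",
        "Standardized on corporate governance/quality policies",
        GRANTED,
    ),
    "refined": Zone(
        "Refined Zone",
        "LOB-specific views transformed from certified data",
        GRANTED,
    ),
}


def can_read(role: str, zone_key: str) -> bool:
    """Return True if the role is listed as a consumer of the zone."""
    return role in ZONES[zone_key].consumers


def readable_zones(role: str) -> List[str]:
    """Return the sorted keys of every zone the role may read."""
    return sorted(key for key in ZONES if can_read(role, key))


if __name__ == "__main__":
    for role in ("data_steward", "business_analyst"):
        print(role, "->", readable_zones(role))
```

Keeping the consumer list on the zone itself mirrors how the slide describes each zone by its audience; a real deployment would more likely delegate this to the platform's own access-control policies.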
Machine learning for data lake implementations

Integrate Data Silos: Loyalty, Customer Service, Transactions, Marketing, 3rd Party
● Easily integrate data silos
● Probabilistic data matching and record linkage
● Automatically classify, encrypt/mask PII/Sensitive data for regulatory compliance
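
To make "probabilistic data matching and record linkage" concrete, here is a minimal sketch in the style of Fellegi-Sunter match weights. The slides do not say which method the platform uses, so this is a stand-in technique. The field names, the m/u probabilities, the threshold, and the sample records are all made-up assumptions for illustration.

```python
import math

# Hypothetical per-field probabilities (assumed values, not from the slides):
#   m = P(field agrees | records refer to the same entity)
#   u = P(field agrees | records refer to different entities)
FIELD_PARAMS = {
    "email": (0.95, 0.001),
    "last_name": (0.90, 0.05),
    "zip": (0.85, 0.10),
}


def normalize(value):
    """Lower-case and strip whitespace so trivial differences do not count."""
    return "".join(str(value).lower().split())


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum log2 likelihood ratios over the fields present in both records."""
    total = 0.0
    for field, (m, u) in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if normalize(a) == normalize(b):
            total += math.log2(m / u)  # agreement raises the score
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def link(records_a, records_b, threshold=5.0):
    """Return (index_a, index_b, weight) for pairs scoring at or above threshold."""
    pairs = []
    for i, a in enumerate(records_a):
        for j, b in enumerate(records_b):
            weight = match_weight(a, b)
            if weight >= threshold:
                pairs.append((i, j, round(weight, 2)))
    return pairs


# Made-up records standing in for two silos (e.g. Loyalty and Marketing).
loyalty = [{"email": "Ann@Example.com", "last_name": "Lee", "zip": "27701"}]
marketing = [
    {"email": "ann@example.com", "last_name": "LEE", "zip": "27703"},
    {"email": "bob@example.com", "last_name": "Lee", "zip": "27701"},
]

if __name__ == "__main__":
    print(link(loyalty, marketing))
```

With these assumed parameters, the first marketing record links to the loyalty record despite the differing ZIP code, because the email agreement carries most of the weight. The second record shares a surname and ZIP but falls below the threshold. The nested loop compares every pair, so a larger dataset would typically need a blocking step to limit the candidate pairs.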
Increasing data lake adoption through self-service

Self-Service Data Preparation
• Extend data lake beyond Hadoop
• Catalog traditional sources
• Ingest datasets without IT
• Prepare & provision data to your tool of choice
DON’T GO IN THE DATA LAKE WITHOUT US
Questions?
