Cloud computing and big data
September 27, 2016
Ben Sharma | CEO
ben@zaloni.com
•  Award-winning provider of enterprise data lake management solutions:
§  Integrated data lake management platform
§  Self-service data preparation
•  Data Lake Design and Implementation Services: POC, Pilot, Production, Operations, Training
•  Data Science Professional Services
Zaloni Proprietary
Why cloud now?
“By 2018, at least half of IT spending will be cloud-based, reaching 60% of all IT infrastructure”
From IDC Research:
“By 2018, cloud becomes a preferred delivery mechanism for analytics, increasing public information consumption by 150%”
Zaloni Confidential and Proprietary
Why are companies moving to a cloud-based platform?
Infrastructure Drivers
•  Infrastructure agility
•  Cost
•  Compute and storage elasticity
•  Heterogeneous compute and storage platforms
•  Converged architectures for various workloads
Data Locality
•  Data Gravity
•  Compliance and regulatory requirements (international)
•  Keep data close to where it is generated
New Requirements
•  A lot of data is generated externally
•  Need to handle all types of data: structured, unstructured, images, etc.
•  Latency and Currency
IoT volume driving cloud adoption
IoT: THE NEXT BIG THING
By 2020 there will be 5.4 BILLION connected devices, like smart meters and connected cars. This is the Internet of Things. And it’s going to be big…
Cloud computing is required to provide the virtual infrastructure needed to process the enormous volume of data from the IoT.
[Chart: Exponential growth in connected devices; x-axis 2011, 2014, 2020; labeled values 1.2B and 5.4B. Source: ABI Research]
Hadoop deployment trends
•  On-Premises: 32%
•  Cloud Only: 23%
•  Cloud Plus On-Premises: 29%
Gartner’s Sept 2015 report: Hadoop Expansion Boosts Cloud and Unsupported On-Premises Deployment
Cloud big data use case: Real-time data processing
[Diagram labels: Fleet Data Collection; On-board Unit; Data Collectors; Ingestion; Collectors; Queue; Streaming Analytics; Idle Time Calculation; Idle Time reporting; Data-driven Apps; Dispatchers]
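The slide names an idle time calculation over streaming fleet data but does not define it. The following is a minimal sketch, assuming that a vehicle counts as idle between two consecutive readings when the earlier reading shows the engine on and a speed of zero. The `Reading` fields and the `idle_seconds` function are hypothetical names chosen for illustration, not part of the original deck.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Reading:
    # Hypothetical on-board unit fields; the slides do not specify a schema.
    vehicle_id: str
    timestamp: float  # seconds
    engine_on: bool
    speed_kmh: float


def idle_seconds(readings: Iterable[Reading]) -> Dict[str, float]:
    """Accumulate idle time per vehicle from time-ordered readings.

    Assumption: the interval between two consecutive readings is idle when
    the earlier reading has the engine on and a speed of zero.
    """
    last: Dict[str, Reading] = {}
    totals: Dict[str, float] = {}
    for r in readings:
        totals.setdefault(r.vehicle_id, 0.0)
        prev = last.get(r.vehicle_id)
        if prev is not None and prev.engine_on and prev.speed_kmh == 0.0:
            totals[r.vehicle_id] += r.timestamp - prev.timestamp
        last[r.vehicle_id] = r
    return totals


sample = [
    Reading("truck-1", 0.0, True, 0.0),
    Reading("truck-2", 0.0, True, 40.0),
    Reading("truck-1", 60.0, True, 0.0),
    Reading("truck-2", 60.0, True, 0.0),
    Reading("truck-1", 120.0, True, 30.0),
    Reading("truck-1", 180.0, False, 0.0),
]
idle_report = idle_seconds(sample)
```

Because the function keeps only the last reading per vehicle, it can consume an unbounded stream one event at a time, which is the property a streaming analytics stage would need.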
Data Lake in the Cloud
[Diagram; labels as extracted]
•  Source System: File Data, DB Data, ETL Extracts, Streaming
•  Data Lake: Transient Loading Zone, Raw Data, Refined Data, Trusted Data, Discovery Sandbox
•  Annotations: Original unaltered data attributes; Tokenized Data; APIs; Reference Data; Master Data; Data Wrangling; Data Discovery; Exploratory Analytics; Integrate to common format; Data Validation; Data Cleansing; Aggregations
•  Metadata, Data Quality, Data Catalog, Security
•  OLTP or ODS; Enterprise Data Warehouse; Logs (or other unstructured data); Data Services
•  Consumption Zone: Business Analysts, Researchers, Data Scientists
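The diagram lists data cleansing, data validation, and integration to a common format as steps inside the lake, while raw data keeps its original unaltered attributes. A minimal sketch of such a promotion step follows. The record schema (`device_id`, `reading`), the function names, and the choice of which zone each step belongs to are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

RawRecord = Dict[str, str]


def cleanse(record: RawRecord) -> RawRecord:
    """Data cleansing: normalize key case and trim whitespace."""
    return {k.strip().lower(): v.strip() for k, v in record.items()}


def validate(record: RawRecord) -> Optional[str]:
    """Data validation: return an error message, or None if the record passes."""
    if not record.get("device_id"):
        return "missing device_id"
    try:
        float(record.get("reading", ""))
    except ValueError:
        return "reading is not numeric"
    return None


def to_common_format(record: RawRecord) -> Dict[str, object]:
    """Integrate to a common format: typed fields with fixed names."""
    return {"device_id": record["device_id"], "reading": float(record["reading"])}


def promote(
    raw: List[RawRecord],
) -> Tuple[List[Dict[str, object]], List[Tuple[RawRecord, str]]]:
    """Build refined records from raw ones without modifying the raw input."""
    refined: List[Dict[str, object]] = []
    rejected: List[Tuple[RawRecord, str]] = []
    for original in raw:
        cleaned = cleanse(original)
        error = validate(cleaned)
        if error is not None:
            rejected.append((original, error))
        else:
            refined.append(to_common_format(cleaned))
    return refined, rejected


raw_zone = [
    {" Device_ID ": " d1 ", "Reading": " 3.5 "},
    {"device_id": "", "reading": "1"},
    {"device_id": "d2", "reading": "n/a"},
]
refined_zone, rejects = promote(raw_zone)
```

Keeping the rejected records alongside the reason they failed gives a data quality process something to report on, and leaving `raw_zone` untouched mirrors the idea of a raw zone that is never rewritten.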
Cloud Data Lake options
•  Storage – Block, object and file level abstractions, with different degrees of redundancy, availability and consistency guarantees, and cost considerations.
•  Compute – A variety of compute server types are possible, optimized for different types of memory and processing requirements depending on the workload.
•  Cloud native services – Higher levels of platform abstractions such as cloud provider managed Hadoop clusters, managed databases, warehouses, messaging services, etc.
•  Data Management, Governance, Entitlements and Security
Cloud Data Lake Maturity model
•  Lift and Shift: Replicate on-premise Data Lake in the cloud
•  Cloud Native features: Leverage Object stores, Transient compute platforms, Messaging systems
•  Multi and Hybrid Cloud: Abstraction over multiple clouds, consistent Data Management and Governance
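The last maturity stage calls for an abstraction over multiple clouds. One way to picture that, as a sketch only, is a small storage interface that pipeline code depends on instead of any one provider's SDK. `ObjectStore`, `InMemoryStore`, and `copy_object` are hypothetical names; a real deployment would back the interface with each provider's object storage client.

```python
from typing import Dict, Protocol


class ObjectStore(Protocol):
    """Hypothetical provider-neutral interface for object storage."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in for a cloud object store, used here so the sketch runs locally."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def copy_object(src: ObjectStore, dst: ObjectStore, key: str) -> None:
    """Copy one object between stores using only the shared interface."""
    dst.put(key, src.get(key))


cloud_a = InMemoryStore("cloud-a")
cloud_b = InMemoryStore("cloud-b")
cloud_a.put("raw/events.json", b'{"id": 1}')
copy_object(cloud_a, cloud_b, "raw/events.json")
```

Code written against `ObjectStore` does not change when a second cloud or an on-premise store is added, which is the point of the abstraction layer.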
Patterns and Anti-patterns
•  Patterns:
§  Implement Data Lake in the cloud using elastic compute and cloud optimized storage
§  Use Data Lake provided as a cloud service that is managed and optimized by the cloud provider
§  Data pipelines with processing components decoupled by queuing services
§  Leaving the heavy lifting to cloud provider services, for example elastic clusters, streaming, analytics and machine learning
§  Using cloud storage rather than ephemeral storage with data lifecycle management
§  Real time processing with event driven architectures for streaming data
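The pattern of decoupling pipeline components with a queuing service can be sketched in a few lines. This is an illustration under stated assumptions: Python's in-process `queue.Queue` stands in for a managed cloud queue, and the stage names (`ingest_stage`, `process_stage`, `run_pipeline`) are hypothetical.

```python
import queue
import threading
from typing import List, Optional

SENTINEL: Optional[str] = None  # marks the end of the stream


def ingest_stage(events: List[str], out_q: "queue.Queue[Optional[str]]") -> None:
    """Producer: knows only about the queue, not about the consumer."""
    for event in events:
        out_q.put(event)  # blocks when the queue is full (back-pressure)
    out_q.put(SENTINEL)


def process_stage(in_q: "queue.Queue[Optional[str]]", results: List[str]) -> None:
    """Consumer: reads until the sentinel arrives."""
    while True:
        event = in_q.get()
        if event is SENTINEL:
            break
        results.append(event.upper())


def run_pipeline(events: List[str]) -> List[str]:
    q: "queue.Queue[Optional[str]]" = queue.Queue(maxsize=2)
    results: List[str] = []
    producer = threading.Thread(target=ingest_stage, args=(events, q))
    consumer = threading.Thread(target=process_stage, args=(q, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results


processed = run_pipeline(["gps", "fuel", "engine"])
```

Neither stage calls the other directly, so either side can be scaled, replaced, or restarted independently as long as the queue contract holds.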
•  Anti-Patterns:
§  Forklift migration of on-premise Data Lake to the cloud.
§  Unmanaged, unmonitored, long term usage of resources such as persistent on-demand compute instances.
§  Dedicating cloud resources for service peaks rather than using auto scaling cloud services
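The last anti-pattern, sizing for peak instead of auto scaling, can be made concrete with a little arithmetic. The sketch below compares instance-hours under both approaches. The load profile and per-instance capacity are made-up numbers for illustration, and the function names are hypothetical.

```python
import math
from typing import Sequence


def instances_needed(load: float, capacity_per_instance: float) -> int:
    """Instances required for one hour of load, with a floor of one instance."""
    return max(1, math.ceil(load / capacity_per_instance))


def peak_provisioned_hours(hourly_load: Sequence[float], capacity: float) -> int:
    """Instance-hours when the cluster is sized for the busiest hour all day."""
    peak = max(instances_needed(load, capacity) for load in hourly_load)
    return peak * len(hourly_load)


def autoscaled_hours(hourly_load: Sequence[float], capacity: float) -> int:
    """Instance-hours when the cluster is resized every hour to fit the load."""
    return sum(instances_needed(load, capacity) for load in hourly_load)


# Illustrative profile: requests per hour, with one short spike.
load_profile = [100, 100, 900, 100]
fixed = peak_provisioned_hours(load_profile, capacity=100)
scaled = autoscaled_hours(load_profile, capacity=100)
```

With this profile the peak-sized cluster uses three times the instance-hours of the auto scaled one. Real savings depend on how spiky the workload is and on scaling latency, which this sketch ignores.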
Governance considerations within cloud/hybrid environments
•  Repeatable ingestion of vast amounts of data from a wide variety of sources and formats (streaming, files, custom)
•  Data visibility across hybrid cloud environments with proper security and access control. Data masking and encryption of sensitive data
•  Need to capture operational metadata implicitly during ingestion and processing. Metadata must persist across cluster instances
•  Reusable managed data pipelines for processing: Validation, Standardization, Enrichments
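Two of these considerations, masking sensitive data and capturing operational metadata implicitly during ingestion, can be combined in one ingestion wrapper. The sketch below is an assumption-laden illustration: the function names, the metadata fields, and the use of a salted SHA-256 digest as the masking method are all choices made here, not something the slides prescribe. Production masking would typically use a managed tokenization or encryption service.

```python
import hashlib
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Set, Tuple


def mask(value: str, salt: str) -> str:
    """One-way masking with a salted SHA-256 digest (illustrative only)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def ingest_with_metadata(
    records: Iterable[Dict[str, str]],
    source: str,
    sensitive_fields: Set[str],
    salt: str,
) -> Tuple[List[Dict[str, str]], Dict[str, object]]:
    """Mask sensitive fields and record operational metadata as a side effect."""
    started = datetime.now(timezone.utc)
    stored: List[Dict[str, str]] = []
    for rec in records:
        stored.append(
            {k: mask(v, salt) if k in sensitive_fields else v for k, v in rec.items()}
        )
    metadata: Dict[str, object] = {
        "source": source,
        "record_count": len(stored),
        "masked_fields": sorted(sensitive_fields),
        "started_at": started.isoformat(),
        "finished_at": datetime.now(timezone.utc).isoformat(),
    }
    return stored, metadata


rows, run_metadata = ingest_with_metadata(
    [{"name": "Ann", "ssn": "123-45-6789"}],
    source="crm_export.csv",  # hypothetical source name
    sensitive_fields={"ssn"},
    salt="demo-salt",
)
```

Writing `run_metadata` to a store that lives outside the compute cluster is what would let it persist when transient clusters are shut down.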
Assessing Cloud Data providers
•  Data Lake on IaaS with bare metal or virtualized infrastructures.
•  PaaS layers – managed data platforms that include various options for event based data ingestion, data processing and serving layers.
•  Several cloud providers are also starting to offer Analytics as a Service with Machine Learning offerings built on top of their IaaS and PaaS layers.
•  Geographical coverage due to any local in-country data requirements.
•  Cost, TCO for Cloud Data Lake
Cloud options in the context of big data and data science
[Diagram: layers IaaS, Platform, Streaming, Analytics, Machine Learning; at each layer a choice (OR) between Cloud Providers and the Hadoop Ecosystem. Products named: Amazon EMR, HDInsight, Dataproc, AWS Lambda, Streams, Streaming Analytics, Dataflow, Cortana, Cloud Machine Learning, MLlib]
DATA LAKE MANAGEMENT AND GOVERNANCE PLATFORM
SELF-SERVICE DATA PREPARATION
Visit Booth #644 for these giveaways!
•  FREE T-SHIRT!
•  Demo and FREE copy of book “Architecting Data Lakes”
Speaking Sessions:
•  Cloud Computing and Big Data. Ben Sharma, CEO and Founder, Zaloni. Tuesday, 9:30 a.m. – 1B 01/02
•  Building a Modern Data Architecture. Ben Sharma, CEO and Founder, Zaloni. Wednesday, 2:05 p.m. – 1 E 09
