Databases vs Hadoop vs Cloud Storage
Presented by: William McKnight
President, McKnight Consulting Group
williammcknight
www.mcknightcg.com
(214) 514‐1444
William McKnight
President, McKnight Consulting Group 
• Frequent keynote speaker and trainer internationally 
• Consulted to many Global 1000 companies
• Hundreds of articles, blogs, white papers, field tests, etc.
in publication
• Focused on delivering business value and solving business 
problems utilizing proven, streamlined approaches to 
information management
• Former Database Engineer, Fortune 50 Information 
Technology executive and Ernst & Young Entrepreneur of 
the Year Finalist
• Owner/consultant: Data strategy and implementation 
consulting firm
• 25+ years of information management and data 
experience
McKnight Consulting Group Offerings
Strategy
• Trusted Advisor
• Action Plans
• Roadmaps
• Tool Selections
• Program Management
Training
• Classes
• Workshops
Implementation
• Data/Data Warehousing/Business Intelligence/Analytics
• Master Data Management
• Governance/Quality
• Big Data
• 1990’s: Just Give Me Some Data and Fast!
• 2000’s: Give Me Good Data But Do It Efficiently!
• 2010’s: Give Me All Data Fast & Effectively!
All Data!
This guy has nothing on us
AI Data
• Call center recordings and chat logs
• Streaming sensor data, historical maintenance records and 
search logs
• Customer account data and purchase history 
• Email response metrics 
• Product catalogs and data sheets 
• Public references 
• YouTube video content audio tracks 
• User website behaviors 
• Sentiment analysis, user‐generated content, social graph data, 
and other external data sources 
Priorities
Increasing Probability that Platform Selection Leads to Success
• Best Category and Top Tool Picked: 80%
• Best Category Picked: 70%
• Top 2 Category Picked: 60%
• Same Ol’ Platform: 50%
What is it for?
• Operational Database 
• Operational Real‐Time
• Operational Big Data
• Operational Data Hub
• Master Data Management
• A Data Warehouse
• A Data Mart 
– Dependent
– Independent  
• A Data Lake
• Analytic Big Data Application
• Archive Storage
• A Staging Area
3 Major Decisions (and Price)
• Decision #1: The Data Store Type
– The largest factor in choosing between a database and a file‐based scale‐out system is the 
data profile. The latter is best for data that fits the loose label of 'unstructured' (or semi‐structured) 
data, while more traditional data, and smaller volumes of all data, still belong in a relational 
database.
• Decision #2: Data Store Placement
– You must also decide where to place your data store: on‐premises or in the cloud (and which cloud). In 
the past, the only clear choice for most organizations was on‐premises. However, the costs of scale 
are gnawing away at the notion that this remains the best approach for a data platform. For more on 
why databases are moving to the cloud, please read this article. 
• Decision #3: The Workload Architecture
– Finally, you must keep in mind the distinction between operational and analytical workloads. Short 
transactional requests and more complex (often longer) analytics requests demand different 
architectures. Analytics databases, though quite diverse, are the preferred platforms for the analytics 
workload.
Data Warehouses, Data Marts, 
Data Lakes, Big Data
Data Warehousing
• Data Warehouses (still) have a lower 
total cost of ownership than data 
marts
• A data warehouse is a SHARED 
platform
– Build once, use many
– Access at Data Warehouse 
– Access by creating a mart off the DW
• Still A LOT cheaper than building from scratch
“… a subject‐oriented, integrated, non‐volatile, time‐variant collection of 
data, organized to support management needs.” (Bill Inmon)
The Data Warehouse Ecosystem
[Diagram: Hadoop, DW, and two DMs]
Data Warehouses Have Flavors
● The Customer Experience Transformation Data Warehouse focuses on 
customer attributes and touchpoints to improve the value of 
customers.
● The Asset Maximization with IoT Data Warehouse deals with the high 
volume of edge data tracking the physical assets of the organization.
● The Operational Extension Data Warehouse supports company 
operations directly with real‐time analytics.
● The Risk Management Data Warehouse supports the ever‐growing 
compliance and reporting requirements and corporate risk.
● The Finance Modernization Data Warehouse handles the voluminous 
financial reporting and ensures the bottom line is considered in every 
aspect of the business.
● The Product Innovation Data Warehouse delivers all product‐related 
information into the decisions of the product life cycle.
Required for Modern Analytics
• In‐database analytics
• In‐memory capabilities
• Columnar orientation
• Modern programming languages
• New data types
Decisions
Stanford Study
“The Case for RAMClouds: Scalable High-Performance Storage Entirely in DRAM”
Columnar Orientation
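A minimal sketch of what columnar orientation means for an analytic query, assuming a tiny in-memory table. The sample rows and the function names (to_columns, total_row_store, total_column_store) are hypothetical and only illustrate the layout difference; real columnar databases add compression and vectorized execution on top.

```python
from typing import Dict, List

# Hypothetical sample rows (illustrative data, not from the slides).
rows: List[dict] = [
    {"order_id": 1, "region": "east", "amount": 120.0},
    {"order_id": 2, "region": "west", "amount": 75.5},
    {"order_id": 3, "region": "east", "amount": 42.0},
]


def to_columns(records: List[dict]) -> Dict[str, list]:
    """Pivot a row-oriented list of records into one list per column."""
    columns: Dict[str, list] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_row_store(records: List[dict], field: str) -> float:
    """Row layout: every whole record is visited to read a single field."""
    return sum(record[field] for record in records)


def total_column_store(columns: Dict[str, list], field: str) -> float:
    """Column layout: only the one requested column is scanned."""
    return sum(columns[field])


columns = to_columns(rows)
```

Both functions return the same total; the point is that the column layout reads one contiguous list instead of touching every field of every row, which is why analytic scans over a few columns tend to favor it.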
Cloud Analytic Databases
Disruption Vectors
• Robustness of SQL
• Built‐in optimization
• On‐the‐fly elasticity
• Dynamic Environment Adaptation
• Separation of compute from storage
• Support for diverse data
Cloud Analytic Databases in the Enterprise
• Can be used for test/dev or prod; disaster recovery; bursting
• CAPEX accounting
• The cloud now offers attractive options with better 
economics, such as pay‐as‐you‐go which is easier to justify 
and budget, better logistics (streamlined administration and 
management), and better scale (elasticity and the ability to 
expand a cluster within minutes). 
• While on‐premises‐first development brings a robust 
database to the table, not all functions are always part of the 
cloud solution and not all of the organizations behind them 
have made the transition to cloud. 
• Data gravity in the cloud.
Performance
• Managed cloud databases are the winner for 
performance
• Querying cloud storage directly is inefficient, and 
bringing subsets of data down for on‐premises 
processing takes time and incurs egress fees
• Performance testing on Hadoop engines like Hive, 
Spark, and Impala has shown improvements in 
performance, but they still lag significantly behind 
the performance and power of a solid relational 
cloud database/data warehouse
Administration
• Managed cloud databases win this category too. 
• Many of the latest and greatest fully‐managed cloud 
database platforms are streamlining and subsuming 
much of the DBA work these days. Things like indexes, 
constraints, partitioning, and other DBA‐level 
performance tuning are fading away.
• Second is cloud storage, because of its very simple 
architecture.
• Last place in Administration is Hadoop. You will still need 
expertise to help diagnose why Spark executors fail or 
Hive throws an exception or why troublesome queries 
never finish.
However… Why Big Data Technologies for Big 
Data
• New Data Types
• Schemaless
• Relaxed ACID
• Faster, Less Expensive Provisioning
• Programmer Freedoms
• Fault‐Tolerant Redundancy
• Scale Out (to Webscale)
• Automatic Sharding
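A minimal sketch of the idea behind automatic sharding, assuming simple hash-based placement. The function names (shard_for, distribute) and the key format are hypothetical; production systems often use consistent hashing or range partitioning instead, so treat this as an illustration rather than how any particular product works.

```python
import hashlib
from collections import Counter
from typing import Iterable


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard number using a stable hash of the key."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


def distribute(keys: Iterable[str], num_shards: int) -> Counter:
    """Count how many keys land on each shard."""
    counts: Counter = Counter()
    for key in keys:
        counts[shard_for(key, num_shards)] += 1
    return counts


# Hypothetical keys: 1,000 customer identifiers spread over 4 nodes.
placement = distribute((f"customer-{i}" for i in range(1000)), 4)
```

Because the hash is stable, the same key always routes to the same shard, and adding nodes to scale out means choosing a larger num_shards (at the cost of moving keys, which is the problem consistent hashing addresses).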
Data Lake
Data Scientist Workbench and Data Warehouse 
Staging 
[Diagram labels: OLTP Systems (ERP, CRM, Supply Chain, MDM, …); Stream or Batch Updates; Real-Time, Event-Driven Apps; Data Lake; Data Scientists; DI; Data Warehouse; Data Mart]
HDFS vs Cloud Storage
• Cloud Storage is more scalable and persistent
• Cloud Storage is backed up and supports 
compression, which lowers the cost of big data
• HDFS has 2‐3x better query performance
• Cloud Storage has object size and single PUT 
limits that need workarounds
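One common workaround for a single-PUT limit is to upload a large object in parts. The sketch below only plans the byte ranges and makes no cloud API calls; the function name plan_parts and the part size are assumptions, and the actual limits depend on the storage provider.

```python
from typing import List, Tuple


def plan_parts(object_size: int, part_size: int) -> List[Tuple[int, int]]:
    """Split an object of object_size bytes into (offset, length) parts.

    Every part is part_size bytes except possibly the last one.
    """
    if object_size < 0:
        raise ValueError("object_size must not be negative")
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    parts: List[Tuple[int, int]] = []
    offset = 0
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((offset, length))
        offset += length
    return parts


# Hypothetical sizes: a 10-byte object uploaded in 4-byte parts.
example_plan = plan_parts(10, 4)
```

Each (offset, length) pair would then be read from the source file and sent as one part by whatever multipart mechanism the chosen storage service provides.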
Data Lakes with Analytic Access Pricing
• Pair a lake with an analytical engine that charges 
only for what you use
• If you have a ton of data that can sit in cold storage 
and only needs to be accessed or analyzed 
occasionally, store it in Amazon S3/Azure Blob 
Storage/Google Cloud Storage
– Use a database (on‐premises or in the cloud) that can 
create external tables that point at the storage 
– Analysts can query directly against it, or draw down a 
subset for some deeper/intensive analysis
– The GB/month storage fee plus data transfer/egress 
fees will be much cheaper than leaving it in a data 
warehouse
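The cost comparison above can be sketched as simple arithmetic. All rates below are hypothetical placeholders, not vendor prices, and the function names are assumptions; substitute your own provider's figures before drawing conclusions.

```python
def lake_monthly_cost(stored_gb: float, egress_gb: float,
                      storage_rate: float, egress_rate: float) -> float:
    """Monthly cost of cold data in object storage: storage plus egress."""
    return stored_gb * storage_rate + egress_gb * egress_rate


def warehouse_monthly_cost(stored_gb: float, warehouse_rate: float) -> float:
    """Monthly cost of keeping the same data inside a data warehouse."""
    return stored_gb * warehouse_rate


# Hypothetical inputs: 1,000 GB stored, 100 GB drawn down per month.
# Rates are in currency units per GB per month and are purely illustrative.
lake = lake_monthly_cost(1000, 100, storage_rate=0.02, egress_rate=0.10)
warehouse = warehouse_monthly_cost(1000, warehouse_rate=0.25)
savings = warehouse - lake
```

With these placeholder rates the lake comes out cheaper, but the result flips if analysts pull large subsets often, since the egress term grows with every draw-down.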
Leveraging Cloud Storage for Data Lakes
• A more achievable separation of compute and storage
• Compute resources (Map/Reduce, Hive, Spark, etc.) can be taken 
down, scaled up or out, or interchanged without data movement
• Storage can be centralized, but compute can be distributed
• Major players have mechanisms to ensure consistency and achieve 
ACID‐like compliance for remote data changes
• Some vendors also have remote data replication to ensure 
redundancy and recovery
• Most of the query execution is processing time, and not data 
transport, so if cloud compute and storage are in the same cloud 
vendor region, performance is hardly impacted
Graph Databases
How to Identify a Graph Workload
• Workload is identified by “network, hierarchy, 
tree, ancestry, structure” words
• You are planning to use the relational 
performance tricks
• Your queries will be about pathing
• You are limiting queries by their complexity
• A quick POC with a graph database impresses
• You are looking for “non‐obvious” patterns in 
the data
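A minimal sketch of a "pathing" query, assuming the graph is held as a plain adjacency list. The function name shortest_path and the sample people are hypothetical; a graph database would express this in its own query language, and the breadth-first search here only illustrates the kind of question being asked.

```python
from collections import deque
from typing import Dict, List, Optional


def shortest_path(graph: Dict[str, List[str]], start: str,
                  goal: str) -> Optional[List[str]]:
    """Return a path with the fewest hops from start to goal, or None."""
    if start == goal:
        return [start]
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor in previous:
                continue  # already reached by a path at least as short
            previous[neighbor] = node
            if neighbor == goal:
                # Walk the predecessor links back to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical "who knows whom" network with directed edges.
network = {
    "ann": ["bob", "cara"],
    "bob": ["dev"],
    "cara": ["dev"],
    "dev": ["eve"],
}
```

In a relational schema the same question typically needs a recursive query or one self-join per hop, which is the kind of workaround the checklist above treats as a sign of a graph workload.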
Graph Databases
[Figure: example graph with two bridge vertices labeled]
Future
GPU Databases
• A GPU Database performs at least some 
operations using the GPU
• Uses SQL
• Uses each GPU's local memory store 
as a data cache, which the slide describes 
as operating many times faster than the CPU 
cache or main memory itself
Hybrid Databases
• Combine row‐based storage for transactions with 
column‐based storage for analytics
• Can process both orders and machine learning 
models simultaneously with fast performance 
and reduced complexity
ADV Slides: Databases vs Hadoop vs Cloud Storage