DATA LAKEHOUSE – THE BASIC ELEMENTS
A presentation by W H Inmon
All data in the corporation falls into three types: structured data, textual data, and analog/IoT data.
Each type of data has its own unique characteristics.
Structured data
Structured data is usually transaction based and is organized into records, with keys, attributes, and indexes. Examples include bank transactions, point-of-sale transactions, telephone calls, payments made, payments received, and so on.
In structured data, the same record type is repeated over and over, and each record has different contents.
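A minimal Python sketch of that shape, assuming a hypothetical `BankTransaction` record whose field names and values are invented for illustration: one record type repeated many times, a key that identifies each record, attributes whose contents vary, and indexes built over the key and over one attribute.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BankTransaction:
    """Hypothetical record type; every structured record shares this shape."""
    txn_id: str        # key: uniquely identifies the record
    account: str       # attribute
    amount_cents: int  # attribute
    date: str          # attribute (ISO date)


# The same record type repeated; only the contents differ.
records = [
    BankTransaction("T001", "ACC-1", 12_500, "2021-09-02"),
    BankTransaction("T002", "ACC-2", -4_000, "2021-09-02"),
    BankTransaction("T003", "ACC-1", 7_250, "2021-09-03"),
]

# Primary index: key -> record, for direct lookup.
index = {r.txn_id: r for r in records}

# Secondary index on a non-key attribute: account -> records.
by_account = defaultdict(list)
for r in records:
    by_account[r.account].append(r)

print(index["T002"].amount_cents)
print([r.txn_id for r in by_account["ACC-1"]])
```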
Textual data
Textual data comes from medical records, contracts, the Internet, call centers, warranty claims, insurance claims, email, and more.
Text is found everywhere: in many languages (English, Spanish, Portuguese, French, Mandarin, Korean, German, and more), in formal language, slang, and acronyms, and in many forms (voice, written, Internet, video, and more).
Textual data is run through textual ETL, which uses taxonomies to transform the text into a structured format.
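Textual ETL is far richer than this, but a minimal sketch of the core idea, assuming a tiny hypothetical taxonomy and plain word-by-word lookup, shows how free text can come out as structured rows of (document, category, term, word position):

```python
import re

# Hypothetical taxonomy: lowercase word -> (category, normalized term).
TAXONOMY = {
    "prolia": ("medication", "Prolia"),
    "estrogen": ("medication", "estrogen"),
    "fracture": ("diagnosis", "fracture"),
    "osteoporosis": ("diagnosis", "osteoporosis"),
    "xray": ("test", "X-ray"),
}


def textual_etl(doc_id: str, text: str) -> list:
    """Turn free text into structured rows: (doc_id, category, term, word_position)."""
    rows = []
    words = re.findall(r"[a-z]+", text.lower())
    for position, word in enumerate(words):
        if word in TAXONOMY:
            category, term = TAXONOMY[word]
            rows.append((doc_id, category, term, position))
    return rows


note = "Patient with osteoporosis, prior fracture. Started Prolia; xray ordered."
for row in textual_etl("note-17", note):
    print(row)
```

A real taxonomy has to handle multi-word terms, synonyms, negation, and context; the hypothetical `TAXONOMY` dict here matches single words only.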
Analog/IoT data
Analog/IoT data is machine generated, for example by drones, electric eyes, and temperature gauges, and includes speed, mechanical, and telemetry measurements, among others.
Analog/IoT telemetry example
Date: Sept 2, 2021
Time: 11:21 am
Location from: Denver
Location to: Co Spgs
Elevation readings: 786, 792, 812, 854, 901, 978, 1012, 1256, 1469, 1672, 2018, 2259, 2871, ...
Speed readings: 0, 35, 79, 124, 197, 276, 367, 416, 521, 702, 835, 915, ...
Telemetry data is generated as the rocket is launched and is measured throughout the flight.
The data lake is created by throwing all the data (structured, textual, and analog/IoT) into the lake.
Soon the data lake turned into a swamp.
The data swamp was not good for anyone.
The data lake needs to be turned into a lakehouse.
Data scientist, working in the swamp: "All this education, and 95% of my job is being a data garbageman."
Data scientist, once the lakehouse infrastructure is in place: "Ah, that’s more like it."
Analog/IoT data is machine generated: basic, raw measurements, each stamped with a time (0912, 0916, 1002, 1008, 1017, and so on).
Analog/IoT data is often segmented by its probability of access:
High probability of access, kept in high-performance storage: date of launch, ultimate speed, ultimate height, final landing point.
Low probability of access, kept in bulk storage: second-by-second measurements.
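A minimal sketch of that segmentation, assuming hypothetical one-per-second readings (the first few elevation and speed values from the telemetry example, paired position by position purely for illustration) and treating the destination as the final landing point. The few summary values go to a high-performance store and the raw measurements to bulk storage; `Reading`, `segment`, and both stores are illustrative names, not part of any product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    second: int     # seconds since launch (assumed one reading per second)
    elevation: int
    speed: int


def segment(launch_date, readings, landing_point):
    """Split telemetry by probability of access.

    Returns (high_performance, bulk): a one-row summary that is read often,
    and the full second-by-second detail that is read rarely.
    """
    summary = {
        "date_of_launch": launch_date,
        "ultimate_speed": max(r.speed for r in readings),
        "ultimate_height": max(r.elevation for r in readings),
        "final_landing_point": landing_point,
    }
    high_performance = [summary]   # small and frequently queried
    bulk = list(readings)          # large and rarely queried
    return high_performance, bulk


readings = [Reading(0, 786, 0), Reading(1, 792, 35), Reading(2, 812, 79),
            Reading(3, 854, 124), Reading(4, 901, 197)]
hot, cold = segment("2021-09-02", readings, "Co Spgs")
print(hot[0])
print(len(cold), "raw readings in bulk storage")
```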
Figure: Relative volumes of data in each sector (structured, textual, analog/IoT)
Figure: Business value and the volumes of data (structured, textual, analog/IoT)
Format compatibility
Structured and textual data are held in relational format; analog/IoT data is held in raw data format. From a format standpoint, the structured and textual environments are very different from the analog/IoT environment.
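To make the format difference concrete, here is a minimal sketch that assumes a hypothetical raw, pipe-delimited reading format (the times echo the machine-generated example above; the temperature and speed values are invented) and loads it into a relational table with Python's built-in `sqlite3`, after which the readings can be queried like any structured table:

```python
import sqlite3

# Hypothetical raw analog/IoT format: one pipe-delimited line per reading.
raw_lines = [
    "0912|temp=71.2|speed=35",
    "0916|temp=71.9|speed=79",
    "1002|temp=73.4|speed=124",
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reading (read_time TEXT PRIMARY KEY, temp REAL, speed INTEGER)"
)

for line in raw_lines:
    read_time, *fields = line.split("|")
    # e.g. ['temp=71.2', 'speed=35'] -> {'temp': '71.2', 'speed': '35'}
    values = dict(field.split("=", 1) for field in fields)
    conn.execute(
        "INSERT INTO reading (read_time, temp, speed) VALUES (?, ?, ?)",
        (read_time, float(values["temp"]), int(values["speed"])),
    )

# In relational form the readings can be filtered and joined like structured data.
rows = conn.execute(
    "SELECT read_time, speed FROM reading WHERE temp > 71.5 ORDER BY read_time"
).fetchall()
print(rows)
```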
Key compatibility: the keys of the structured, textual, and analog/IoT environments are very unintegrated.
Content compatibility
To do analytics across the structured, textual, and analog/IoT environments, there must be some common data on which to base a comparison. Without common data, it is very difficult to do a meaningful comparison.
The problem is that there may be no obvious, easy way to isolate common identifiers across the three types of data.
Fortunately, there are such things as universal common connectors.
Universal common connectors exist regardless of the way the data has been collected.
General common connectors
The universal common connectors for anything are geography, time, and dollar amount.
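A minimal sketch of connecting two sources through universal connectors, assuming hypothetical sales records and hypothetical rows that textual ETL produced from complaint emails. The two share no key, but both carry geography (state) and time (date), so they can be lined up by state and month, with dollar amount aggregated alongside:

```python
from collections import defaultdict

# Hypothetical structured sales records.
sales = [
    {"sale_id": 1, "state": "UT", "date": "2021-09-02", "amount": 120.0},
    {"sale_id": 2, "state": "OR", "date": "2021-09-15", "amount": 80.0},
    {"sale_id": 3, "state": "UT", "date": "2021-10-01", "amount": 45.0},
]
# Hypothetical rows extracted from complaint emails; note there is no sale_id.
complaints = [
    {"email_id": "e-9", "state": "UT", "date": "2021-09-20"},
    {"email_id": "e-4", "state": "UT", "date": "2021-09-03"},
    {"email_id": "e-7", "state": "OR", "date": "2021-10-11"},
]


def connector(row):
    """Universal common connector: geography (state) plus time (year-month)."""
    return row["state"], row["date"][:7]


revenue = defaultdict(float)          # dollar amount per connector value
for sale in sales:
    revenue[connector(sale)] += sale["amount"]

complaint_count = defaultdict(int)    # complaints per connector value
for email in complaints:
    complaint_count[connector(email)] += 1

combined = {
    key: (revenue.get(key, 0.0), complaint_count.get(key, 0))
    for key in sorted(set(revenue) | set(complaint_count))
}
for (state, month), (dollars, n_complaints) in combined.items():
    print(state, month, dollars, n_complaints)
```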
Common connectors for humans
The universal common connectors for humans are gender, age, and race.
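Connectors only help once every source expresses them the same way. A minimal sketch covering gender and age, assuming hypothetical encodings: a sales record carries a gender code and a birth date, while X-ray metadata carries a spelled-out gender and the patient's age. Both are normalized to the same (gender, age band) connector; the ten-year band width is an arbitrary choice.

```python
from datetime import date

# Hypothetical source encodings mapped to one canonical gender value.
GENDER_CODES = {"f": "F", "female": "F", "m": "M", "male": "M"}


def age_on(birth: date, on: date) -> int:
    """Whole years of age on a given date."""
    before_birthday = (on.month, on.day) < (birth.month, birth.day)
    return on.year - birth.year - int(before_birthday)


def age_band(age: int, width: int = 10) -> str:
    """Bucket an exact age into a band such as '60-69'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def connector_from_sale(row: dict) -> tuple:
    """Structured source: gender code plus birth date and sale date."""
    age = age_on(row["birth_date"], row["sale_date"])
    return GENDER_CODES[row["sex"].lower()], age_band(age)


def connector_from_xray(row: dict) -> tuple:
    """Analog/IoT source: X-ray metadata that already records patient age."""
    return GENDER_CODES[row["patient_gender"].lower()], age_band(row["patient_age"])


sale = {"sex": "F", "birth_date": date(1956, 5, 20), "sale_date": date(2021, 9, 2)}
xray = {"patient_gender": "female", "patient_age": 64}
print(connector_from_sale(sale), connector_from_xray(xray))
```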
Common connectors for objects
The universal common connectors for physical objects are weight, color, cost, size, and shape.
SOME EXAMPLES
Healthcare – outcomes analysis
Outcome analysis asks: Did the medicine work? Did the vaccination work? Did the operation have the right effect?
Structured data: sales of Prolia, estrogen, vitamin D, Algaecal, and calcitonin.
Textual data: doctor’s notes, covering tests, diagnosis, procedure, medication, history, and more.
Analog/IoT data: X-rays, with date, location, patient age, and examination results.
Each sector can be organized by state, age, and gender:
Structured data: what medicines have been purchased, by state, age, and gender.
Textual data: what medicines have been prescribed and/or discussed with doctors, by state, age, and gender.
Analog/IoT data: what outcomes have been achieved, by state, age, and gender.
Analyses:
How does treatment in Utah vary from treatment in Oregon?
Is Prolia more effective than estrogen?
When patients are treated with Algaecal, what other side effects are noticed?
Do women have better results than men?
How much does age affect the types of treatment for osteoporosis, the effectiveness of treatment, and whether men react differently than women?
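A minimal sketch of how the three sectors could be lined up on the connectors, assuming hypothetical rows that have already been reduced to (state, age band, gender) plus a medicine or an outcome. No patient key is used; the sectors meet only on connectors, so every result is a group-level comparison. `profile` and all of the data are illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical rows, each already reduced to the connectors
# (state, age band, gender) plus a medicine or an outcome.
purchases = [  # structured: medicines purchased
    ("UT", "60-69", "F", "Prolia"), ("UT", "60-69", "F", "Vitamin D"),
    ("OR", "60-69", "F", "estrogen"), ("OR", "70-79", "M", "Prolia"),
]
prescriptions = [  # textual: medicines found in doctors' notes
    ("UT", "60-69", "F", "Prolia"), ("OR", "60-69", "F", "estrogen"),
    ("OR", "70-79", "M", "Prolia"), ("OR", "70-79", "M", "Prolia"),
]
outcomes = [  # analog/IoT: did the follow-up X-ray show improvement?
    ("UT", "60-69", "F", True), ("UT", "60-69", "F", True),
    ("OR", "60-69", "F", False), ("OR", "70-79", "M", True),
    ("OR", "70-79", "M", False),
]


def profile(group_key):
    """Line up all three sectors on whatever connectors group_key selects."""
    view = defaultdict(lambda: {"purchased": Counter(),
                                "prescribed": Counter(),
                                "improved": [0, 0]})  # [improved, total]
    for *connectors, drug in purchases:
        view[group_key(*connectors)]["purchased"][drug] += 1
    for *connectors, drug in prescriptions:
        view[group_key(*connectors)]["prescribed"][drug] += 1
    for *connectors, improved in outcomes:
        tally = view[group_key(*connectors)]["improved"]
        tally[0] += int(improved)
        tally[1] += 1
    return dict(view)


# How does treatment in Utah vary from treatment in Oregon?
by_state = profile(lambda state, age_band, gender: state)
# Do women have better results than men?
by_gender = profile(lambda state, age_band, gender: gender)

for state, v in sorted(by_state.items()):
    improved, total = v["improved"]
    print(state, dict(v["prescribed"]), f"{improved}/{total} improved")
for gender, v in sorted(by_gender.items()):
    improved, total = v["improved"]
    print(gender, f"{improved}/{total} improved")
```

Changing the `group_key` function changes the question being asked; grouping by age band instead would address how much age affects treatment and its effectiveness.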
When you have both treatment and outcome data together, you can answer, for the first time, important questions about treatment, medication, dosage, side effects, and the demographics of treatment.
You can match outcome with treatment.
The result is healthier people, longer lives, and a better quality of life.
Manufacturing
Structured data: sales data (unit sold, date of sale, location of sale, customer address).
Textual data: warranty claims (unit, unit type, defect, severity, in-use description).
Analog/IoT data: manufacturing data (unit id, lot id, date of manufacture, machine used, operator).
The unit id connects the three sectors:
Structured data: units sold, date of sale, location of sale, unit id.
Textual data: defect description, date of warranty, unit id.
Analog/IoT data: machine used for manufacture, date of manufacture, operator, lot id, manufacture telemetry, unit id.
Analyses, among others:
What manufacturing machines are producing defects?
What manufacturing machines are not producing defects?
What operators are producing defects?
What operators are not producing defects?
What telemetry needs to be adjusted?
Under what conditions are defects created?
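A minimal sketch of the first few analyses, assuming hypothetical sales, warranty-claim, and manufacturing rows (the claims shaped as textual ETL might emit them) that share the unit id. For each machine, operator, or lot, it computes the share of sold units that came back with a warranty claim; `defect_rate` and all of the data are illustrative.

```python
from collections import Counter

# Hypothetical rows from each sector; the unit id is the shared connector.
sales = {  # structured: unit id -> (date of sale, location of sale)
    "U1": ("2021-03-01", "Denver"), "U2": ("2021-03-04", "Boise"),
    "U3": ("2021-03-09", "Denver"), "U4": ("2021-03-12", "Reno"),
}
warranty_claims = [  # textual, after textual ETL: (unit id, defect description)
    ("U1", "cracked housing"), ("U3", "cracked housing"), ("U4", "loose fitting"),
]
manufacturing = {  # analog/IoT: unit id -> how the unit was made
    "U1": {"machine": "M-7", "operator": "Lee", "lot": "L100"},
    "U2": {"machine": "M-2", "operator": "Diaz", "lot": "L100"},
    "U3": {"machine": "M-7", "operator": "Diaz", "lot": "L101"},
    "U4": {"machine": "M-2", "operator": "Lee", "lot": "L101"},
}

claimed_units = {unit for unit, _defect in warranty_claims}


def defect_rate(attribute: str) -> dict:
    """Share of sold units with a warranty claim, grouped by one manufacturing attribute."""
    sold, defective = Counter(), Counter()
    for unit, made in manufacturing.items():
        if unit not in sales:          # only units that reached customers can be claimed
            continue
        sold[made[attribute]] += 1
        if unit in claimed_units:
            defective[made[attribute]] += 1
    return {value: defective[value] / sold[value] for value in sorted(sold)}


print("by machine:", defect_rate("machine"))
print("by operator:", defect_rate("operator"))
```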
With all of this data together and able to be analyzed, you can now tell which defects can be corrected and what conditions cause defects to occur. The manufacturing process can be materially improved.
Now manufacturing can be done efficiently and in a cost-effective manner.
With analytics from the data lakehouse, you can improve the lives and livelihoods of many people.

Data Lakehouse Symposium | Day 1 | Part 1