Copyright Third Nature, Inc.
Everything has 
changed except us
February, 2015
Mark Madsen
www.ThirdNature.net
@markmadsen
The DW group as the crazy 
uncle of the organization
Madness: doing more of what 
you already did and expecting 
different results.
We’ve been struggling with 
shrinking load windows, 
performance problems, and 
most important, inability to 
quickly meet data needs, for a 
decade, yet we keep doing the 
same things to try to fix them.
I never said the
“E” in EDW meant
“everything”…
What do you
mean, “Just
tables?”
It’s going to get a lot worse
Not E
E
Conclusion: any methodology built on the premise that you 
must know and model all the data first is untenable 
The good news is: we solved the bigness problem
Source: Noumenal, Inc.
Now, analytics embiggens the data volume problem
Many of the processing problems are O(n²) or worse, so 
small data can be a problem for DB‐based platforms
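As a rough illustration of the quadratic growth mentioned above, the sketch below counts the comparisons a naive all-pairs analysis (for example, scoring the similarity of every pair of records) would need. The function name `pairwise_comparisons` and the row counts are hypothetical, chosen only for illustration; they are not from the talk.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of unordered pairs among n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Illustrative row counts; each 10x increase in rows costs roughly 100x the work.
    for rows in (1_000, 10_000, 100_000):
        print(f"{rows:>7} rows -> {pairwise_comparisons(rows):>13,} comparisons")
```

A table of 100,000 rows is small by warehouse standards, yet an all-pairs pass over it already needs about five billion comparisons.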
What makes data “big”?
Aside from very large amounts:
Hierarchical structures
Nested structures
Linked structures
Encoded values
Non‐standard (for a database) 
types
Deep structure
Human authored text
“big” is better off being defined as “complex” or “hard to manage”
Datasets today: Interconnection and Dependency
Dynamic models are 
missing from most 
data systems today. 
These drive new 
workloads, generate 
different data, need 
new techniques. 
Hierarchical Edge Bundles: Visualization of Adjacency Relations in Hierarchical Data,
Danny Holten
It’s not the number of genes
that determines complexity, it’s
the interactions between them.
Source: M. Pertea and S. Salzberg/Genome Biology 2010
Categorizing the measurement data we collect
The convenient data is the 
transactional data.
▪ Goes in the DW and is used, even 
if it isn’t the right measurement.
The inconvenient data is 
observational data.
▪ It’s not neat, clean, or designed 
into most systems of operation.
The difficult and misleading data 
is declarative data.
▪ What people say and what they 
do require ground truth.
We need an architecture that 
supports all three categories.
Observations
Sensor data doesn’t fit well with current methods of collection and
storage, or with the technology to process and analyze it.
Declarations
Unstructured is Not Really Unstructured
Unstructured data isn’t 
really unstructured: objects 
have structure, language 
has structure. Text can 
contain traditional 
structured data elements. 
The problem is that the 
content is unmodeled.
Our real problem is making 
implicit structure explicit.
Conclusion: the data warehouse must cope with more
complex data structures, storage and processing.
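One way to picture "making implicit structure explicit" is pulling the structured elements out of a piece of human-authored text. The sketch below is a minimal example under stated assumptions: the patterns, the field names, and the sample note are all hypothetical, and real text would need far more robust parsing than three regular expressions.

```python
import re

# Hypothetical patterns for illustration only.
DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
AMOUNT = re.compile(r"\$(\d+(?:\.\d{2})?)")
ORDER_ID = re.compile(r"\border\s+#?(\d+)\b", re.IGNORECASE)


def extract_fields(text: str) -> dict:
    """Pull the structured elements hidden in free text into named fields."""
    date = DATE.search(text)
    amount = AMOUNT.search(text)
    order = ORDER_ID.search(text)
    return {
        "order_id": int(order.group(1)) if order else None,
        "date": date.group(0) if date else None,
        "amount": float(amount.group(1)) if amount else None,
    }


note = "Customer called on 2015-02-03 about order #4417; refunded $25.00."
print(extract_fields(note))
```

The text was never unstructured; the date, order number, and amount were there all along, just unmodeled.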
The creation, flow and use of data are different for 
transactions and machine‐generated events
Transactions: Data entry, Extract, Cleanse, Load, Store, Use, with MDM
This runs at human speed
Machine‐generated events: Program, Generate, Capture, Store, Cleanse, Use
This runs at machine speed, with slower feedback cycle
We’re moving BI from information to actuation
This means 
monitoring as 
data flows, 
detecting rather 
than querying, as 
well as feedback 
to the sources.
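The difference between detecting and querying can be sketched with a small streaming check: instead of storing readings and querying them later, each value is examined as it arrives. This is a minimal sketch, assuming a simple rule (a reading more than `factor` times the average of the last `window` readings counts as an exception); the function name, parameters, and sample stream are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def detect_spikes(readings: Iterable[float], window: int = 3,
                  factor: float = 2.0) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) when a reading exceeds factor x the recent average."""
    recent = deque(maxlen=window)  # holds only the last `window` readings
    for i, value in enumerate(readings):
        if len(recent) == window and value > factor * (sum(recent) / window):
            yield i, value
        recent.append(value)


stream = [10, 11, 9, 10, 35, 10, 11]
print(list(detect_spikes(stream)))
```

Because the detector is a generator, the exception surfaces while the data is still flowing, which is the point at which feedback to the source is still useful.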
The architecture we’ve been using.
The general concept of a 
separate architecture for BI 
has been around longer, but 
this paper by Devlin and 
Murphy is the first formal 
data warehouse architecture 
and definition published.
“An architecture for a business and
information system”, B. A. Devlin,
P. T. Murphy, IBM Systems Journal,
Vol.27, No. 1, (1988)
Origins: in 1988 there was only big hair.
▪ No real commercial email, public internet barely started
▪ Storage state of the art: 100MB, cost $10,000/GB
▪ Oracle Applications v1 GL released; SAP goes public, 
enters US market
▪ Unix is mostly run by long‐haired freaks
▪ Mobile was this
This is the context: scarcity of data, of system resources, of automated 
systems outside core financials, of money to pay for storage.
We think of BI as publishing, an old metaphor.
Publishing has value, but may not be actionable.
Data strategy means understanding the context of 
data use so we can build the right infrastructure
Collect new data → Monitor → Analyze Exceptions → Analyze Causes → Decide → Act
Act on the process
Act within the process
We need to focus on what people do with information as
the primary task, not on the data or the technology.
The usage models for conventional BI
Collect new data → Monitor → Analyze Exceptions → Analyze Causes → Decide → Act
No problem / No idea / Do nothing
Act on the process: usually days/longer timeframe
Act within the process: usually real-time to daily
This is what we’ve been
doing with BI so far: static
reporting, dashboards,
ad-hoc query, OLAP
The usage models for analytics and “big data” 
Collect new data → Monitor → Analyze Exceptions → Analyze Causes → Decide → Act
No problem / No idea / Do nothing
Act on the process: usually days/longer timeframe
Act within the process: usually real-time to daily
Analytics and big data are
focused on new use
cases: deeper analysis,
causes, prediction,
optimizing decisions
This isn’t ad-hoc query,
reporting, or OLAP.
As practices evolve based on new capabilities…
A new level of 
complexity 
develops on 
top of the 
older, now 
better 
understood 
processes, 
leading to new 
data and 
analysis needs.
Growing complexity has changed our context
Internal 3rd party & custom applications, logs, event 
streams, hosted & external apps, 3rd party datasets… 
Enterprise architecture changes
External = no data layer access
SOA and REST = no data layer access
Streams and messages are becoming the norm
Observations and Transactions
Reality: continuous change in the DW
You can’t keep up with source changes
You can’t keep up with new data requests
You are already limited by scale, performance, and latency
But:
Many parts of the organization need current operational data
The emerging big data market has an answer…
Centralize: that solves all problems!
Creates bottlenecks
Causes scale problems
Enforces a single model
Data quality and definitions in a single schema are 
based on the strictest requirement, reducing flexibility
The data warehouse vs business agility
All the data
Common, typed, tabular data
The bottleneck is you
We have a design for stability. We need one for adaptability
Which is best, 3NF or dimensional?
The core assumption that
there can be just one big
schema model on one big
platform is flawed.
Answer: neither.
We think we can model all
the data before use, but
that’s a bottleneck. Current
techniques for modeling and
managing data are too rigid
and incapable of describing
all the possible relationships.
A core problem with one big schema is change
Big data answer?
Schema‐on‐read!
There’s a price to pay 
with using “schema‐on‐
read” for everything.
You won’t see the 
problems with this until 
you add a second 
application, and a third.
“One writer‐many 
readers” kills schema‐on‐read 
benefits.
Why is the choice no schema or hard schema?
Simple key‐value files give you flexibility in some 
areas. Tables give you flexibility in other areas.
Which area do you need flexibility in and why?
Files or tables for:
▪ Programs writing data?
▪ Programs processing data?
▪ Programs reading data?
Why not flexible schemas instead of either-or?
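One possible reading of a "flexible schema" is a small enforced core plus open-ended extras: writers must supply the fields every reader depends on, and anything else passes through untouched. The sketch below assumes that reading; `CORE_FIELDS`, `validate`, and the sample record are hypothetical names introduced for illustration, not something the talk specifies.

```python
from typing import Any, Dict

# Hypothetical core contract: fields every writer must supply, with their types.
CORE_FIELDS = {"event_id": int, "timestamp": str}


def validate(record: Dict[str, Any]) -> Dict[str, Any]:
    """Enforce the core fields and pass everything else through untouched."""
    for name, expected in CORE_FIELDS.items():
        if name not in record:
            raise ValueError(f"missing core field: {name}")
        if not isinstance(record[name], expected):
            raise TypeError(f"{name} must be {expected.__name__}")
    core = {name: record[name] for name in CORE_FIELDS}
    extras = {k: v for k, v in record.items() if k not in CORE_FIELDS}
    return {"core": core, "extras": extras}


print(validate({"event_id": 1, "timestamp": "2015-02-01T00:00:00", "temp_c": 21.5}))
```

Writers keep the freedom to add fields such as `temp_c`, while a second or third reader can still rely on the core without re-deriving the schema for itself.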
“We can't solve problems by using the 
same kind of thinking we used when 
we created them.”
Albert Einstein
With too much data the approach has to be inverted
The process we still use:
1. Model
2. Collect
3. Analyze
The new process is:
1. Collect
2. Analyze
3. Model
4. Promote
This is a shift from
planned design to
evolutionary design for
the data warehouse
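The inverted process above can be sketched in a few lines: collect records as they come, analyze what fields and types actually appear, and only then model and promote the fields that prove stable. This is a minimal sketch under assumptions: the 80% coverage threshold, the function names `profile` and `promote`, and the sample records are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, List


def profile(records: List[Dict[str, Any]]) -> Dict[str, Counter]:
    """Analyze: count the Python types observed for each field."""
    seen: Dict[str, Counter] = defaultdict(Counter)
    for record in records:
        for field, value in record.items():
            seen[field][type(value).__name__] += 1
    return seen


def promote(records: List[Dict[str, Any]], min_coverage: float = 0.8) -> Dict[str, str]:
    """Model and promote: keep fields that are common and consistently typed."""
    schema = {}
    for field, types in profile(records).items():
        coverage = sum(types.values()) / len(records)
        if len(types) == 1 and coverage >= min_coverage:
            schema[field] = next(iter(types))
    return schema


# Collect: records are stored as received, before any model exists.
collected = [
    {"id": 1, "amount": 9.5, "note": "ok"},
    {"id": 2, "amount": 3.0},
    {"id": 3, "amount": 7.25, "debug": True},
    {"id": 4, "amount": 1.0},
    {"id": 5, "amount": 2.0},
]
print(promote(collected))
```

Only `id` and `amount` are promoted here; the rarely seen `note` and `debug` fields stay in the collected data until usage justifies modeling them.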
The solution to our problems isn’t 
necessarily technology, it’s architecture.
Workloads
Workload | OLTP | BI | Analytics
Access | Read‐Write | Read‐only | Read‐mostly
Predictability | Predictable | Unpredictable | Fixed path
Selectivity | High | Low | Low
Retrieval | Low | Low | High
Latency | Milliseconds | < seconds | msecs to days
Concurrency | Huge | Moderate | 1 to huge
Model | 3NF, nested object | Dim, denorm | BWT
Task size | Small | Large | Small to huge
DATA ARCHITECTURE
We’re so focused on the light switch that we’re not 
talking about the light
Decoupled Data Architecture
The core of the data warehouse isn’t the 
database, it’s the data architecture that the 
database and tools implement.
We need a data architecture that is not limiting:
▪ Deals with data and schema change easily
▪ Does not always require up front modeling
▪ Does not limit the format or structure of data
▪ Assumes a full range of data latencies, from 
streaming to one‐time bulk loads, both in and out
Food supply chain: an analogy for data
Multiple contexts of use, differing quality levels
Integrate
Manage
Decouple data architecture layers
Use
This implies a new warehouse architecture and data modeling approaches
Collect
Transactions Observations Declarations
Break down the monolithic architecture
The technology architecture 
must change, based on work 
done with the data:
▪ Collection separate from
▪ Data management separate from
▪ Data delivery and use
Data may live in more than 
one place because it may have 
more than one model, for 
more than one use, using 
more than one engine
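The idea that data may have more than one model for more than one use can be shown with a toy example: the collected events are kept once, as received, and each use gets its own derived shape. This is only a sketch; the sample events and the two function names are hypothetical, and in practice each model might live in a different engine.

```python
from collections import defaultdict
from typing import Dict, List

# Collection layer: raw events kept as received (hypothetical sample data).
raw_events: List[dict] = [
    {"store": "A", "sku": "x1", "qty": 2},
    {"store": "A", "sku": "x2", "qty": 1},
    {"store": "B", "sku": "x1", "qty": 5},
]


def totals_by_store(events: List[dict]) -> Dict[str, int]:
    """Delivery model 1: an aggregate shaped for reporting."""
    totals: Dict[str, int] = defaultdict(int)
    for e in events:
        totals[e["store"]] += e["qty"]
    return dict(totals)


def stores_by_sku(events: List[dict]) -> Dict[str, List[str]]:
    """Delivery model 2: a lookup shaped for a different question."""
    index: Dict[str, List[str]] = defaultdict(list)
    for e in events:
        index[e["sku"]].append(e["store"])
    return dict(index)


print(totals_by_store(raw_events))
print(stores_by_sku(raw_events))
```

Neither derived model constrains how the events were collected, so adding a third use means adding a third function rather than changing the collection layer.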
Reinforcing relationships keep architectures from 
changing, despite radical technology shifts
Note how only one third is tech
Architectural Regime: Methodology, Technology, Organization
Organization defines where the work is done and the roles.
Technology defines what work can be done in a given area.
Methodology defines how work is done and what that work is.
Agile architectures without agile methods fail
How can you move to a more agile architecture?
Start by deploying faster.
Things will break.
You will fix them.
You will get better.
So will your architecture.
The geography we have been using is out of date
The box we created:
• not any data, rigidly typed data
• not any form, tabular rows and 
columns of typed data
• not any latency, persist what the 
DB can keep up with
• not any process, only queries
The digital world was diminished 
to only what’s inside the box until 
we forgot the box was there.
Data infrastructure is a platform
▪ Any data – structures, forms
▪ Any latency – in motion, at rest
▪ Any process – query, algorithm, transform
▪ Any access – SQL, API, queue, file movement
Don’t follow the market
Some people can’t resist 
getting the next new thing 
because it’s new and new is 
always better.
Many IT organizations are like 
this, promoting a solution and 
hunting for the problem that 
matches it.
Better to ask “What is the 
problem for which this 
technology is the answer?”
Think like an architect, 
not like a consumer
No more “enterprise 
standard” ‐ now “what 
works”
The technology providers 
are selling you what they 
have, not what you need.
Follow the goals of the 
business.
Translate the goals into 
capabilities and match 
those to the architecture 
required.
“The future, according to some scientists, will be exactly like 
the past, only far more expensive.” ~ John Sladek
CC Image Attributions
Thanks to the people who supplied the creative commons licensed images used in this presentation:
round hole square peg ‐ https://www.flickr.com/photos/epublicist/3546059144
firemen not noticing fire.jpg ‐ http://flickr.com/photos/oldonliner/1485881035/
pyramid_camel_rider.jpg ‐ http://www.flickr.com/photos/khalid-almasoud/1528054134/
House on fire ‐ http://flickr.com/photos/oldonliner/1485881035/
glass_buildings.jpg ‐ http://www.flickr.com/photos/erikvanhannen/547701721
Circos, Hierarchical Edge Bundles: Visualization of Adjacency Relations in Hierarchical Data, Danny Holten
text composition ‐ http://flickr.com/photos/candiedwomanire/60224567/
Building demolition ‐ https://www.flickr.com/photos/gregpc/4429888820
peek_fence_dog.jpg ‐ http://www.flickr.com/photos/webwalker/114998078/
donuts_4_views.jpg ‐ http://www.flickr.com/photos/le_hibou/76718773/
shady_puppy_sales.jpg ‐ http://www.flickr.com/photos/brizzlebornandbred/5001120150
subway dc metro ‐ http://flickr.com/photos/musaeum/509899161/
About the Presenter
Mark Madsen is president of Third 
Nature, a technology research and 
consulting firm focused on business 
intelligence, data integration and data 
management. Mark is an award‐winning 
author, architect and CTO whose work 
has been featured in numerous industry 
publications. Over the past ten years 
Mark received awards for his work from 
the American Productivity & Quality 
Center, TDWI, and the Smithsonian 
Institution. He is an international speaker, 
a contributor to Forbes Online and on 
the O’Reilly Strata program committee. 
For more information or to contact 
Mark, follow @markmadsen on Twitter 
or visit http://ThirdNature.net 
About Third Nature
Third Nature is a research and consulting firm focused on new and
emerging technology and practices in analytics, business intelligence,
information strategy and data management. If your question is related to
data, analytics, information strategy and technology infrastructure then
you’re at the right place.
Our goal is to help organizations solve problems using data. We offer
education, consulting and research services to support business and IT
organizations as well as technology vendors.
We fill the gap between what the industry analyst firms cover and what IT
needs. We specialize in product and technology analysis, so we look at
emerging technologies and markets, evaluating technology and how it is
applied rather than vendor market positions.
