Let’s solve user problems
(data architecture for humans)
March, 2021
Mark Madsen - @markmadsen - https://www.linkedin.com/in/markmadsen/
Where I am in my career
(and number of
mistakes I make)
© Third Nature Inc.
This talk will not be about best practices
Best practice in the early
market is usually a euphemism
for “workaround”
What the innovator did may
not be right, it may just be
not wrong
What the analyst firms call
best practice is often better
described as survivorship bias
There’s a difference between having no past and actively ignoring it
A HISTORY OF REINVENTION
“Those who cannot remember
the past are condemned to
repeat it.”
George Santayana
If there’s one lesson we can take from history, it’s that nobody
learns any lessons from history.
Online
Realtime
For decision making
Today we’ll call it streaming
Technology patterns
New “Data Bases” ™
Storage virtualization
Separation of storage and
compute
What year was it?
Welcome to 1975
New is BETTER
Our core beliefs in software are based
on this. Progress is not a promise.
New is BETTER?
This is fundamentally a belief in leading
with technology to solve problems…
Technology Adoption
Some people can’t resist getting the
next new thing because it’s new.
Many IT organizations are like this,
promoting a solution and hunting for
the problem that matches it.
Better to ask “What is the problem
for which this technology is the
answer?”
The solution to a puppy problem is not to add more puppies
Marketing and case studies: what people say vs reality
Beware the case study
unless the speakers talk about it
from first-hand experience,
in production, and say what
did not work, which should
be most of it. Most case studies and
vendor testimonials are:
▪ Aspirational
▪ Immature
▪ Applicable to 10 companies
worldwide
Design tip:
Be skeptical about
anything you hear
regarding new data
platform technology.
What you hear usually reflects:
• Optimism
• Ignorance
• A lack of information on
“what it does
poorly”, which you
know very well about
your existing vendors
Be skeptical because technology
has a tendency to solve a problem
with a problem.
Solve scalability with brute
force parallelism. Now you
have an availability problem.
Solve that with redundant
copies. Now you have a
consistency problem…
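The chain of trade-offs above can be made concrete with a little arithmetic. This is a minimal sketch, assuming every node fails independently and has the same availability; both are simplifying assumptions, and the 0.999 figure is illustrative rather than taken from the talk.

```python
def all_nodes_up(node_availability: float, nodes: int) -> float:
    """Probability that every one of `nodes` independent nodes is up.

    Models brute-force parallelism where a query needs all nodes.
    """
    return node_availability ** nodes


def at_least_one_replica_up(node_availability: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent copies is up."""
    return 1.0 - (1.0 - node_availability) ** replicas


if __name__ == "__main__":
    # More nodes: more scale, lower chance that all of them are available.
    for n in (1, 10, 100):
        print(n, "nodes:", round(all_nodes_up(0.999, n), 4))
    # Redundant copies restore availability for each piece of data.
    print("2 replicas:", round(at_least_one_replica_up(0.999, 2), 6))
```

Under these assumptions, adding nodes lowers the chance that all of them are up at once, and adding replicas raises it again for each piece of data. The copies then have to be kept in sync, which is the consistency problem the slide ends on.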
THE EVOLUTION OF
ORGANIZATIONAL DATA USE
History: This is how BI was done through the 80s
First there were files and reporting programs.
Application files feed through a data processing pipeline to generate an
output file. The file is used by a report formatter for print/screen. Files are
largely single-purpose use.
Every report is a program written by a developer.
Data pipeline
code
History: This is how BI ended the 80s
The inevitable situation was...
Data pipeline
code
History: This is how we started the 90s
Collect data in a database. Queries replaced a LOT of application code
because much of it was just joins. We learned about “dead code”.
Pragmatism and Data
Lessons learned during the ad-hoc
SQL era of the DW market:
When the technology is awkward for
the users, the users will stop trying
to use it.
Even “simple” schemas weren’t
enough for anyone other than
analysts and their Brio…
This led to the evolution of
metadata-driven, SQL-generating
BI tools and ETL tools.
BI evolved to hiding query generation for end users
With more regular schema models, in particular dimensional models that
didn’t contain cyclic join paths, it was possible to automate SQL generation
via semantic mapping layers created by analysts.
We developed data pipeline building tools (aka ETL).
Query via business terms made BI usable by non-technical people.
ETL
SQL
Life got much easier…for a while
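The semantic mapping layer described on this slide can be sketched in a few lines. This is a hypothetical illustration, not any particular BI tool's design: the class names, table names, and business terms are all invented, and it assumes a simple star schema in which each dimension joins directly to one fact table.

```python
from dataclasses import dataclass


@dataclass
class Dimension:
    table: str      # physical dimension table (hypothetical name)
    key: str        # join key shared with the fact table
    columns: dict   # business term -> physical column


@dataclass
class SemanticModel:
    fact_table: str
    measures: dict    # business term -> SQL aggregate expression
    dimensions: list  # Dimension objects, each joined directly to the fact

    def build_query(self, measure: str, by: str) -> str:
        """Generate SQL for one measure grouped by one business term."""
        if measure not in self.measures:
            raise KeyError(f"unknown measure: {measure}")
        for dim in self.dimensions:
            if by in dim.columns:
                column = f"{dim.table}.{dim.columns[by]}"
                return "\n".join([
                    f'SELECT {column} AS "{by}", '
                    f'{self.measures[measure]} AS "{measure}"',
                    f"FROM {self.fact_table}",
                    f"JOIN {dim.table} ON "
                    f"{self.fact_table}.{dim.key} = {dim.table}.{dim.key}",
                    f"GROUP BY {column}",
                ])
        raise KeyError(f"unknown business term: {by}")


# Mapping maintained by an analyst; users only ever see the business terms.
MODEL = SemanticModel(
    fact_table="sales_fact",
    measures={"Revenue": "SUM(sales_fact.amount)"},
    dimensions=[Dimension("store_dim", "store_id", {"Region": "region_name"})],
)

if __name__ == "__main__":
    print(MODEL.build_query("Revenue", by="Region"))
```

Because each dimension has exactly one join path to the fact table, the generator never has to choose between alternative joins. That is why models without cyclic join paths made this kind of automation practical.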
Today’s model: Lake + data engineers, looks familiar…
The Lake with data pipelines to files or Hive tables is
exactly the same pattern as the COBOL batch
Pipeline
code
We already know that people don’t scale. Don’t do this
DESIGN AND COMPLEXITY TODAY
"Always design a thing by considering it in its next larger
context - a chair in a room, a room in a house, a house in
an environment, an environment in a city plan." – Eliel
Saarinen
[Diagram: Order Entry, Customer Service, Distribution, and Accounts Receivable applications; Order, Inventory, and Receivables databases; integration programs; a Data Warehouse; analysts and users.]
This is the simplistic view people have of IT, if they see
even this level of detail
Real complexity is based on communication, which is data flows
Internal 3rd party & custom applications, event streams, logs, external & SaaS
applications, 3rd party datasets… – this is the reality
[Figure: product and information flows across a cheese supply chain, from farms through the cheese processor to the retailer. Key: shaded boxes = product flow system; un-shaded boxes = information flow system.]
Source: IGD Food Chain Centre, February 2008
All companies operate in the context of an industry. The external data
interchanges and market signals are today as important as the internal
data, for both strategic and operational decision making.
Gray = companies in value chain
Red = information flows and systems
The real data context of the
organization, as assembled
by the data platforms, consists of
subsets of all of these systems.
The complexity of a DW is a
function of the complexity of
the organization and all the
integration points.
There’s more to it than just the
systems and technologies…
Data
Warehouse
Data is transformed, cleaned, integrated, and new data is
derived. This adds a level of temporal and semantic
complexity to data management, and it’s always hidden.
Machine learning won’t “solve” data integration. It will
help in some areas, mostly with augmenting simpler tasks.
Data flows – the dark matter of your architecture diagrams
Data Complexity, one application
Meanwhile, people complain when a data model looks like this
This is a map of one
organization’s analytic data,
showing the dataset
complexity inherent in a
mid-sized organization.
Different views of
data complexity
Data complexity is not just
based on the number of
datasets, or the number of
tables.
It is based on the number of
connections. This is an order
of magnitude higher than
the number of objects.
Organizational complexity
drives communication
complexity drives data
complexity.
Different views of
data complexity
This view is only
showing connections
between objects in
data sets based on
data relationships. All
these connections are
joins you must take
care with in a well
managed platform.
Different views of
data complexity
A reverse gravity view, showing
the mass of reused / replicated
information at the center and
the nodes where large
interchanges occur.
These different views show
how complex an organization’s
data really is, rather than the
abstract list of sources and
terabytes stored.
This is why managing data is a
difficult job
Different views –
data and use
The value of data is tied to its
use. This shows relationships
between people and data used.
This and the prior diagram
show an important point: 70%
of the data is used and reused
constantly. 30% of the data is
used by one or a few people,
often new data with
undetermined value.
This information can be used to
determine where and how you
should spend your limited
resources and money.
Connections and uses of data are scale free networks
The connections in the data follow a power-law distribution, the defining property of a scale-free
network: a few datasets hold most of the connections. Each new copy
of data (or derived set, subset, aggregate) adds N-1 possible connections.
You can’t manage all the data. Which data do you spend time on?
Nodes (tables)
Number of
connections
Used often: 70% of
the connections tie to
a small set of data,
the core of reuse.
Centrally manage this
Used seldom: 10% of the
connections go to a large
number of objects (new, low
value, narrow). Locally
curate this
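One way to act on "which data do you spend time on?" is to count connections per table and find the small core that most connections touch. The sketch below is illustrative only: the table names are invented, the edge list stands in for lineage or join metadata you would have to collect yourself, and the 0.7 threshold simply mirrors the 70% figure on the slide.

```python
from collections import Counter


def connection_counts(edges):
    """Count connections per table from a list of (table_a, table_b) pairs."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


def core_tables(edges, share=0.7):
    """Most-connected tables that together touch `share` of all connections.

    Tables are added in descending order of connection count until the
    fraction of edges with at least one endpoint in the core reaches `share`.
    """
    core = []
    for table, _ in connection_counts(edges).most_common():
        touched = sum(1 for a, b in edges if a in core or b in core)
        if touched >= share * len(edges):
            break
        core.append(table)
    return core


# Hypothetical join/lineage relationships between datasets.
EDGES = [
    ("customer", "orders"),
    ("customer", "returns"),
    ("customer", "web_visits"),
    ("customer", "campaign_test"),
    ("orders", "returns"),
    ("orders", "products"),
    ("products", "scratch_extract"),
]

if __name__ == "__main__":
    print(core_tables(EDGES))
```

In this toy data, two of the seven tables touch six of the seven connections. Those two are the candidates for central management, and the long tail is what you would leave to local curation.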
The reality of data availability is that it can only be a subset
The rest of the data
is still here…
There will always be more data
available than ability to analyze
it. Some judgement must be
applied to sort the more from
the less important
In an expanded ecosystem of data, curation processes are
needed to address quality, definition and structure
Closely managed data: high quality, well-known
Loosely managed data: directional quality
User managed data: unknown / low quality
Curation is directly attached to your data architecture...
Data curation is an undeveloped practice
The problem with so many sources,
types, formats and latencies of data
is that it is now impossible to create
in advance one model for all of it.
Data modeling is about the inside of
a dataset. Curation is about the
entire dataset.
It isn’t development. It’s about:
creating, labeling, organizing, finding,
navigating, archiving.
Data curation, rather than data
modeling, is becoming the most
important data management
practice.
The real purpose of this work is not to help IT be more
productive. IT exists to help users be more productive
Starting with technology is like getting excited by a new chisel.
Today’s market “solution”:
Replace the data
warehouse with the data
lake and self-service*
*Picture for illustrative purposes
only, no warranty express or
implied, actual system you
receive may vary.
By a lot.
Today’s market solution:
the Data Lake to replace
the Data Warehouse
Data hoarding is not a data
management strategy
The solution to one technology problem is another technology
Buy a catalog!
Just add more
technology to solve
your non-technical
problem.
Now that you know
what data is there -
how do you find it?
How do you get it?
Practices need to catch up to technologies
A catalog is a
useful, necessary
component.
It is useless
without
organizing
principles and
practices.
AKA data
curation and
data architecture
So who maintains the catalog now?
IT is already viewed as a bottleneck.
Many organizations do not have full-time data administrators,
and the DW team is already overtaxed.
Why not let the user drive?
Have you ever looked at user generated taxonomies?
Users also have a job to do and won’t welcome more administrative work
Developers think of self-service as data access – the user must be self-reliant
Users think of self-service in terms of a finished data product
<Problem>
creates
<Opportunity for new technology solution>
creates
<Different problem>
repeat
Seems
familiar…
WHAT CAN WE DO ABOUT THESE
PROBLEMS?
Value is not in the product, it’s in the practice
The poor carpenter blames his tools
You are a designer. You need to think like one.
“Everyone designs who devises
courses of action aimed at changing
existing situations into preferred
ones.” ~ Herbert Simon
We seldom think
systemically. It’s time to
start (again)
Often we got here because of bad policy, not technology – people would
rather work around their data teams than work with them on data initiatives
http://akvkbi.blogspot.com/2017/06/dwh-development-related-survey-results.html
Design tip: any time you deny a behavior or a request, ask yourself “how
will they do this on their own? What do they do instead?”
Bad policy causes more problems than bad technology
Shape the architecture for the people, don’t try to shape people
Data should be governed by policy, e.g. zoning
http://welcometocup.org/file_columns/0000/0530/cup-whatiszoning-guidebook.pdf
We need to do today what we were doing 30 years ago
We spent time then to understand the users, what they wanted,
the needs, and found ways to justify the work to meet those needs.
We don’t do enough of this
We over-emphasize this
Technology is a tool
Tools enable you to build things
You build things for people
So start here
The primary focus should be on goals, specifically of the users
“The engineer, and more generally
the designer, is concerned with how
things ought to be - how they ought
to be in order to attain goals, and to
function.” ~ Herbert Simon
What organizations say they want
Time to value
Ability to do new things more
easily, aka innovate
• with data
• with technology
Efficiency, aka reduce costs
What organizations say sounds like “do more with less”
Time to value
Ability to do new things more
easily, aka innovate
• with data
• with technology
Efficiency, aka reduce costs
I end up with more questions:
• TTV for whom?
• TTV for what?
• New thing for whom?
• One time or recurring (cost
or TTV)?
• TTV as latency or
throughput?
• Local cost or global cost?
• More efficient vs less
flexible?
This is manager-speak – we need to talk about users
The questions for the
data ecosystem from an
architect’s perspective
What people?
What goals?
What uses?
What time frames?
Mapping User Need: don’t work on assumptions
Work-as-Prescribed
Work-as-Imagined
Work-as-Done
Work-as-Disclosed
We are oriented here. But people encounter obstacles to their work. They
create solutions. The solutions become part of the work, how the work is
done. Work-as-done diverges from the official definition of the work, as
prescribed or as disclosed. Most tech startups have no real idea about the
work – they believe they are solving technical problems and the user is IT.
We need to
focus here
https://safetydifferently.com/the-varieties-of-human-work/
Starting with technology is starting in the solution space, not the problem space
https://indiyoung.com/about-problem-space/
Analysis and data science workflows are generally poorly understood
An analyst trying to answer a question has highs and lows along their workflow.
The environment is defined by independent, often mismatched tools, some fit for
purpose and others not, with no single product capable of meeting their needs.
Each usage model has several of these maps tied to different roles
Where is
data? Can I
access new
data?
Why does IT
have to be
involved?
Green = solved
Yellow = gap, possible opportunity
Red = obstacle, opportunity
Why can’t I
store data
I’m working
on?
How do I link
new data to
existing
data?
How do I
share
information
with others?
User goals: more than accessing the data
Explore and Understand
Inform and Explain
Convince and Decide
Deliver
Process
The real design criteria:
context and point of use
Information use is diverse and varies
based on context:
▪ Get a quick answer
▪ Solve a one-off problem
▪ Analyze causes
▪ Do experiments
▪ Make repetitive decisions
▪ Use data in routine processes
▪ Make complex decisions
▪ Choose a course of action
▪ Convince others to take action
One size doesn’t fit all.
Data architecture requires understanding data use so we
can build the right infrastructure
Monitor
Analyze
Exceptions
Analyze
Causes
Decide Act
No problem No idea Do nothing
Understanding the details of uses, workflows, tasks, and
activities allows us to look at the higher organizational level again
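The monitor-to-act flow on this slide can be read as a small workflow with early exits. The sketch below is one possible reading, not the author's model: the slide lists the exits ("No problem", "No idea", "Do nothing") but the extracted text does not say which stage each belongs to, so the pairing here is an assumption, as are the function and variable names.

```python
STAGES = ["Monitor", "Analyze exceptions", "Analyze causes", "Decide", "Act"]

# Assumed pairing of early exits with stages (not stated in the source text).
EARLY_EXITS = {
    "Monitor": "No problem",
    "Analyze causes": "No idea",
    "Decide": "Do nothing",
}


def run_loop(continue_after):
    """Walk the stages in order and report where the workflow ends.

    `continue_after` maps a stage name to False when the user stops there;
    stages that are not listed are assumed to continue.
    Returns (stages visited, outcome).
    """
    visited = []
    for stage in STAGES:
        visited.append(stage)
        if stage in EARLY_EXITS and not continue_after.get(stage, True):
            return visited, EARLY_EXITS[stage]
    return visited, "Acted"


if __name__ == "__main__":
    print(run_loop({"Monitor": False}))          # nothing unusual was seen
    print(run_loop({"Analyze causes": False}))   # cause could not be found
    print(run_loop({}))                          # full path through to action
```

Each stage places different demands on the data platform, which is the slide's point: a dashboard serves "Monitor", but it does little for "Analyze causes".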
This is part of a larger system. Feedback loops exist and
operate at different frequencies.
Collect
new data
Monitor
Analyze
Exceptions
Analyze
Causes
Decide Act
Act on the process
Act within the process
Data platforms are the most complex in the organization, far
more complex than any web application or ERP system.
Manage your data
(or it will manage you)
Data management is where
developers are weakest.
Modern engineering practices
are where data management is
weakest.
Users care about their tasks.
You need to bridge these
groups and practices in the
organization if you want to do
meaningful work with data.
About the Presenter
Mark spent most of the past 25 years working in
the analytics field, starting in AI at the University
of Pittsburgh and autonomous robotics at
Carnegie Mellon University before moving into
technology management. Today he is a Fellow in
the Technology & Innovation Office at Teradata.
Previously, he was president of Third Nature, an
advisory firm focused on services for analytics and
technology strategy, and product design.
Mark is an award-winning author, architect and
CTO who has received awards for his work from
the American Productivity & Quality Center,
Smithsonian Institution, and industry associations.
He is an international speaker, and chairs several
conferences and program committees. You can
find him on LinkedIn at
https://www.linkedin.com/in/markmadsen
Further Reading
Thinking in Systems, Donella Meadows
An Introduction to Systems Thinking, Gerald Weinberg
Contextual Design, Beyer & Holtzblatt
Badass: Making Users Awesome, Kathy Sierra
Information Design, Jacobsen
Data: A Guide to Humans, Phil Harvey
In Search of Certainty, Burgess
https://indiyoung.com/about-problem-space/
http://welcometocup.org/file_columns/0000/0530/cup-whatiszoning-guidebook.pdf
CC Image Attributions
Thanks to the people who supplied the creative commons licensed images used in this presentation:
well town hall.jpg - http://flickr.com/photos/tuinkabouter/1135560976/
seattle library 1 - http://www.flickr.com/photos/thomashawk/2671536366/
chicken_head2.jpg - http://www.flickr.com/photos/coycholla/4901760905
egg_face1.jpg - http://www.flickr.com/photos/sally_monster/3228248457
indonesian angry mask phone - Erik De Castro Reuters.jpg

More Related Content

PDF
Data Architecture: OMG It’s Made of People
PDF
Pay no attention to the man behind the curtain - the unseen work behind data ...
PDF
Architecting a Platform for Enterprise Use - Strata London 2018
PDF
How to understand trends in the data & software market
PDF
Architecting a Data Platform For Enterprise Use (Strata NY 2018)
PDF
Assumptions about Data and Analysis: Briefing room webcast slides
PDF
The Black Box: Interpretability, Reproducibility, and Data Management
PDF
Operationalizing Machine Learning in the Enterprise
Data Architecture: OMG It’s Made of People
Pay no attention to the man behind the curtain - the unseen work behind data ...
Architecting a Platform for Enterprise Use - Strata London 2018
How to understand trends in the data & software market
Architecting a Data Platform For Enterprise Use (Strata NY 2018)
Assumptions about Data and Analysis: Briefing room webcast slides
The Black Box: Interpretability, Reproducibility, and Data Management
Operationalizing Machine Learning in the Enterprise

What's hot (20)

PDF
Building a Data Platform Strata SF 2019
PDF
Everything Has Changed Except Us: Modernizing the Data Warehouse
PDF
Disruptive Innovation: how do you use these theories to manage your IT?
PPTX
Strata Data Conference 2019 : Scaling Visualization for Big Data in the Cloud
PDF
Bi isn't big data and big data isn't BI (updated)
PDF
Building Data Science Teams
 
PDF
Briefing room: An alternative for streaming data collection
PPTX
Machine Learning in Big Data
PDF
Everything has changed except us
PDF
Analytics 3.0 Measurable business impact from analytics & big data
PDF
Big dataplatform operationalstrategy
PDF
Intro to Data Science for Non-Data Scientists
PDF
How to Build Data Science Teams
PDF
Big Data and Bad Analogies
PDF
Embracing data science
PPTX
Managing Data Science | Lessons from the Field
PDF
Wake up and smell the data
PDF
Democratizing Advanced Analytics Propels Instant Analysis Results to the Ubiq...
PPTX
Idiots guide to setting up a data science team
PPTX
Building Data Science Teams: A Moneyball Approach
Building a Data Platform Strata SF 2019
Everything Has Changed Except Us: Modernizing the Data Warehouse
Disruptive Innovation: how do you use these theories to manage your IT?
Strata Data Conference 2019 : Scaling Visualization for Big Data in the Cloud
Bi isn't big data and big data isn't BI (updated)
Building Data Science Teams
 
Briefing room: An alternative for streaming data collection
Machine Learning in Big Data
Everything has changed except us
Analytics 3.0 Measurable business impact from analytics & big data
Big dataplatform operationalstrategy
Intro to Data Science for Non-Data Scientists
How to Build Data Science Teams
Big Data and Bad Analogies
Embracing data science
Managing Data Science | Lessons from the Field
Wake up and smell the data
Democratizing Advanced Analytics Propels Instant Analysis Results to the Ubiq...
Idiots guide to setting up a data science team
Building Data Science Teams: A Moneyball Approach
Ad

Similar to Solve User Problems: Data Architecture for Humans (20)

PPTX
Data Mining and Data Warehouse
PDF
The 3 Key Barriers Keeping Companies from Deploying Data Products
PDF
Facilitating Collaborative Life Science Research in Commercial & Enterprise E...
PDF
Overview of mit sloan case study on ge data and analytics initiative titled g...
PPTX
2018 10 igneous
PDF
S ba0881 big-data-use-cases-pearson-edge2015-v7
PDF
Gerenral insurance Accounts IT and Investment
PDF
What makes an effective data team?
PDF
Expert Big Data Tips
PDF
Challenges of Big Data Research
PDF
The Data Lake: Empowering Your Data Science Team
PDF
The Evolving Role of the Data Engineer - Whitepaper | Qubole
PDF
What Managers Need to Know about Data Science
PDF
The Cloud Data Lake Early Release Rukmani Gopalan
PPTX
Big Data Analytics with Microsoft
PDF
EVAIN Artificial intelligence and semantic annotation: are you serious about it?
PPTX
Top Business Intelligence Trends for 2016 by Panorama Software
PDF
Collections Databases; Making the system work for you
PPTX
Just ask Watson Seminar
PDF
Big data rmoug
Data Mining and Data Warehouse
The 3 Key Barriers Keeping Companies from Deploying Data Products
Facilitating Collaborative Life Science Research in Commercial & Enterprise E...
Overview of mit sloan case study on ge data and analytics initiative titled g...
2018 10 igneous
S ba0881 big-data-use-cases-pearson-edge2015-v7
Gerenral insurance Accounts IT and Investment
What makes an effective data team?
Expert Big Data Tips
Challenges of Big Data Research
The Data Lake: Empowering Your Data Science Team
The Evolving Role of the Data Engineer - Whitepaper | Qubole
What Managers Need to Know about Data Science
The Cloud Data Lake Early Release Rukmani Gopalan
Big Data Analytics with Microsoft
EVAIN Artificial intelligence and semantic annotation: are you serious about it?
Top Business Intelligence Trends for 2016 by Panorama Software
Collections Databases; Making the system work for you
Just ask Watson Seminar
Big data rmoug
Ad

More from mark madsen (14)

PDF
A Brief Tour through the Geology & Endemic Botany of the Klamath-Siskiyou Range
PDF
A Pragmatic Approach to Analyzing Customers
PDF
Building the Enterprise Data Lake: A look at architecture
PDF
Briefing Room analyst comments - streaming analytics
PDF
On the edge: analytics for the modern enterprise (analyst comments)
PDF
Crossing the chasm with a high performance dynamically scalable open source p...
PDF
Don't let data get in the way of a good story
PDF
Don't follow the followers
PDF
Exploring cloud for data warehousing
PDF
Open Data: Free Data Isn't the Same as Freeing Data
PDF
Exploring cloud for data warehousing
PDF
Big Data Wonderland: Two Views on the Big Data Revolution
PDF
Using Data Virtualization to Integrate With Big Data
PDF
One Size Doesn't Fit All: The New Database Revolution
A Brief Tour through the Geology & Endemic Botany of the Klamath-Siskiyou Range
A Pragmatic Approach to Analyzing Customers
Building the Enterprise Data Lake: A look at architecture
Briefing Room analyst comments - streaming analytics
On the edge: analytics for the modern enterprise (analyst comments)
Crossing the chasm with a high performance dynamically scalable open source p...
Don't let data get in the way of a good story
Don't follow the followers
Exploring cloud for data warehousing
Open Data: Free Data Isn't the Same as Freeing Data
Exploring cloud for data warehousing
Big Data Wonderland: Two Views on the Big Data Revolution
Using Data Virtualization to Integrate With Big Data
One Size Doesn't Fit All: The New Database Revolution

Recently uploaded (20)

PPT

Solve User Problems: Data Architecture for Humans

  • 1. Let’s solve user problems (data architecture for humans) March, 2021 Mark Madsen - @markmadsen - https://www.linkedin.com/in/markmadsen/
  • 2. Where I am in my career (and number of mistakes I make)
  • 3. © Third Nature Inc. This talk will not be about best practices. Best practice in the early market is usually a euphemism for “workaround”. What the innovator did may not be right; it may just be not wrong. What the analyst firms call best practice is often better described as survival bias.
  • 4. There’s a difference between having no past and actively ignoring it
  • 5. A HISTORY OF REINVENTION
  • 6. “Those who cannot remember the past are condemned to repeat it.” George Santayana. If there’s one lesson we can take from history, it’s that nobody learns any lessons from history.
  • 7. Online Realtime For decision making Today we’ll call it streaming
  • 8. Technology patterns New “Data Bases” ™ Storage virtualization Separation of storage and compute
  • 9. Technology patterns New “Data Bases” ™ Storage virtualization Separation of storage and compute What year was it?
  • 10. Technology patterns New “Data Bases” ™ Storage virtualization Separation of storage and compute Welcome to 1975
  • 11. BETTER is New Our core beliefs in software are based on this. Progress is not a promise.
  • 12. BETTER ? is New This is fundamentally a belief in leading with technology to solve problems…
  • 13. Technology Adoption Some people can’t resist getting the next new thing because it’s new. Many IT organizations are like this, promoting a solution and hunting for the problem that matches it. Better to ask “What is the problem for which this technology is the answer?”
  • 14. The solution to a puppy problem is not to add more puppies
  • 15. Marketing and case studies: what people say vs reality. Beware the case study, unless they talk about it from first-hand experience, in production, and say what did not work. Which should be most of it. Most cases and vendor testimonials are: ▪ Aspirational ▪ Immature ▪ Applicable to 10 companies worldwide
  • 16. Design tip: Be skeptical about anything you hear regarding new data platform technology • Optimism • Ignorance • Lacking info on “what it does poorly”, which you know very well about your existing vendors
  • 17. Be skeptical because technology has a tendency to solve a problem with a problem. Solve scalability with brute force parallelism. Now you have an availability problem. Solve that with redundant copies. Now you have a consistency problem…
  • 18. THE EVOLUTION OF ORGANIZATIONAL DATA USE
  • 19. History: This is how BI was done through the 80s First there were files and reporting programs. Application files feed through a data processing pipeline to generate an output file. The file is used by a report formatter for print/screen. Files are largely single-purpose use. Every report is a program written by a developer. Data pipeline code
  • 20. History: This is how BI ended the 80s The inevitable situation was... Data pipeline code
  • 21. History: This is how we started the 90s Collect data in a database. Queries replaced a LOT of application code because much was just joins. We learned about “dead code” SQL SQL SQL SQL SQL
  • 22. Pragmatism and Data. Lessons learned during the ad hoc SQL era of the DW market: When the technology is awkward for the users, the users will stop trying to use it. Even “simple” schemas weren’t enough for anyone other than analysts and their Brio… This led to the evolution of metadata-driven, SQL-generating BI tools and ETL tools.
  • 23. BI evolved to hiding query generation for end users With more regular schema models, in particular dimensional models that didn’t contain cyclic join paths, it was possible to automate SQL generation via semantic mapping layers created by analysts. We developed data pipeline building tools (aka ETL). Query via business terms made BI usable by non-technical people. ETL SQL Life got much easier…for a while
  • 24. Today’s model: Lake + data engineers, looks familiar… The Lake with data pipelines to files or Hive tables is exactly the same pattern as the COBOL batch Pipeline code We already know that people don’t scale. Don’t do this
  • 25. DESIGN AND COMPLEXITY TODAY
  • 26. "Always design a thing by considering it in its next larger context - a chair in a room, a room in a house, a house in an environment, an environment in a city plan." – Eliel Saarinen
  • 27. Order Entry Order Database Customer Service Integration Program Inventory Database Distribution Integration Program Receivables Database Accounts Receivable Data Warehouse Analysts & users This is the simplistic view people have of IT, if they see even this level of detail
  • 28. Real complexity is based on communication, which is data flows Internal 3rd party & custom applications, event streams, logs, external & SaaS applications, 3rd party datasets… – this is the reality
  • 29. [Diagram: product and information flows across a cheese supply chain, from farms through the cheese processor (plants, stores, packing, national distribution centre, HQ planning teams) to the retailer (HQ, regional distribution centre, 550 stores). Key: shaded boxes = product flow system; unshaded boxes = information flow system. Source: IGD Food Chain Centre, February 2008.] All companies operate in the context of an industry. The external data interchanges and market signals are today as important as the internal data, for both strategic and operational decision making. Gray = companies in value chain. Red = information flows and systems.
  • 30. [Diagram: the cheese supply-chain product and information flow diagram (farms, cheese processor, retailer), here labeled “Data Warehouse”.] The real data context of the organization that is assembled by the data platforms is subsets of all of these systems. The complexity of a DW is a function of the complexity of the organization and all the integration points. There’s more to it than just the systems and technologies…
  • 31. Data is transformed, cleaned, integrated, and new data is derived. This adds a level of temporal and semantic complexity to data management, and it’s always hidden. Machine learning won’t “solve” data integration. It will help in some areas, mostly with augmenting simpler tasks. Data flows – the dark matter of your architecture diagrams
  • 32. Data Complexity, one application. Meanwhile, people complain when a data model looks like this
  • 33. Different views of data complexity. This is a map of one organization’s analytic data, showing the dataset complexity inherent in a mid-sized organization.
  • 34. Data complexity is not just based on the number of datasets, or the number of tables. It is based on the number of connections. This is an order of magnitude higher than number of objects. Organizational complexity drives communication complexity drives data complexity. Different views of data complexity
  • 35. This view is only showing connections between objects in data sets based on data relationships. All these connections are joins you must take care with in a well managed platform.
  • 36. Different views of data complexity A reverse gravity view, showing the mass of reused / replicated information at the center and the nodes where large interchanges occur. These different views show how complex an organization’s data really is, rather than the abstract list of sources and terabytes stored. This is why managing data is a difficult job
  • 37. Different views – data and use The value of data is tied to its use. This shows relationships between people and data used. This and the prior diagram show an important point: 70% of the data is used and reused constantly. 30% of the data is used by one or a few people, often new data with undetermined value. This information can be used to determine where and how you should spend your limited resources and money.
  • 38. Connections and uses of data are scale-free networks. The connections in the data have a power-law distribution. Each new copy of data (or derived set, subset, aggregate) adds N-1 possible connections. You can’t manage all the data. Which data do you spend time on? [Chart: nodes (tables) vs. number of connections.] Used often: 70% of the connections tie to a small set of data, the core of reuse. Centrally manage this. Used seldom: 10% of the connections go to a large number of objects (new, low value, narrow). Locally curate this.
  • 39. The reality of data availability is that it can only be a subset The rest of the data is still here… There will always be more data available than ability to analyze it. Some judgement must be applied to sort the more from the less important
  • 40. Loosely managed data User managed data In an expanded ecosystem of data, curation processes are needed to address quality, definition and structure Closely managed data High quality, well-known Directional quality Unknown / low quality Curation is directly attached to your data architecture...
  • 41. Data curation is an undeveloped practice The problem with so many sources, types, formats and latencies of data is that it is now impossible to create in advance one model for all of it. Data modeling is about the inside of a dataset. Curation is about the entire dataset. It isn’t development. It’s about: creating, labeling, organizing, finding, navigating, archiving. Data curation, rather than data modeling, is becoming the most important data management practice.
  • 42. The real purpose of this work is not to help IT be more productive. IT exists to help users be more productive Starting with technology is like getting excited by a new chisel.
  • 43. Today’s market “solution”: Replace the data warehouse with the data lake and self-service* *Picture for illustrative purposes only, no warranty express or implied, actual system you receive may vary. By a lot.
  • 44. Today’s market solution: the Data Lake to replace the Data Warehouse Data hoarding is not a data management strategy
  • 45. The solution to one technology problem is another technology Buy a catalog! Just add more technology to solve your non-technical problem. Now that you know what data is there - how do you find it? How do you get it?
  • 46. Practices need to catch up to technologies A catalog is a useful, necessary component. It is useless without organizing principles and practices. AKA data curation and data architecture
  • 47. So who maintains the catalog now? IT is already viewed as a bottleneck. Many organizations do not have full-time data administrators, and the DW team is already overtaxed.
  • 48. Why not let the user drive?
  • 49. Have you ever looked at user generated taxonomies? Users also have a job to do and won’t welcome more administrative work
  • 50. Developers think of self-service as data access: the user must be self-reliant
  • 51. Users think of self-service in terms of a finished data product
  • 52. <Problem> creates <Opportunity for new technology solution> creates <Different problem> repeat Seems familiar…
  • 53. WHAT CAN WE DO ABOUT THESE PROBLEMS?
  • 54. Value is not in the product, it’s in the practice The poor carpenter blames his tools
  • 55. You are a designer. You need to think like one. “Everyone designs who devises courses of action aimed at changing existing situations into preferred ones.” ~ Herbert Simon
  • 56. We seldom think systemically. It’s time to start (again)
  • 57. Often we got here because of bad policy, not technology – people would rather work around their data teams than work with them on data initiatives http://akvkbi.blogspot.com/2017/06/dwh-development-related-survey-results.html
  • 58. Design tip: any time you deny a behavior or a request, ask yourself “how will they do this on their own? What do they do instead?” Bad policy causes more problems than bad technology
  • 59. Shape the architecture for the people, don’t try to shape people
  • 60. Data should be governed by policy, e.g. zoning http://welcometocup.org/file_columns/0000/0530/cup-whatiszoning-guidebook.pdf
  • 61. We need to do today what we were doing 30 years ago. We spent time then to understand the users, what they wanted, the needs, and found ways to justify the work to meet those needs. [Diagram labels: “We don’t do enough of this”; “We overemphasize this”.]
  • 62. Technology is a tool Tools enable you to build things You build things for people So start here
  • 63. The primary focus should be on goals, specifically of the users “The engineer, and more generally the designer, is concerned with how things ought to be - how they ought to be in order to attain goals, and to function.” ~ Herbert Simon
  • 64. What organizations say they want: Time to value. Ability to do new things more easily, aka innovate • with data • with technology. Efficiency, aka reduce costs.
  • 65. What organizations say sounds like “do more with less”: Time to value. Ability to do new things more easily, aka innovate • with data • with technology. Efficiency, aka reduce costs. I end up with more questions: • TTV for whom? • TTV for what? • New thing for whom? • One time or recurring (cost or TTV)? • TTV as latency or throughput? • Local cost or global cost? • More efficient vs less flexible? This is manager-speak – we need to talk about users
  • 66. The questions for the data ecosystem from an architect’s perspective: What people? What goals? What uses? What time frames?
  • 67. Mapping User Need: don’t work on assumptions. Work-as-Prescribed, Work-as-Imagined, Work-as-Done, Work-as-Disclosed. We are oriented here [toward the prescribed and imagined work]. But people encounter obstacles to their work. They create solutions. The solutions become part of the work, how the work is done. Work-as-done diverges from the official definition of the work-as-prescribed or work-as-disclosed. Most tech startups have no real idea about the work – they believe they are solving technical problems and the user is IT. We need to focus here [on work-as-done]. https://safetydifferently.com/the-varieties-of-human-work/
  • 68. Starting with technology is starting in the solution space, not the problem space https://indiyoung.com/about-problem-space/
  • 69. Analysis and data science workflows are generally poorly understood. An analyst trying to answer a question has highs and lows along their workflow. The environment is defined by independent, often mismatched tools, some fit for purpose and others not, with no single product capable of meeting their needs. Each usage model has several of these maps tied to different roles. Where is data? Can I access new data? Why does IT have to be involved? Why can’t I store data I’m working on? How do I link new data to existing data? How do I share information with others? Key: Green = solved; Yellow = gap, possible opportunity; Red = obstacle, opportunity.
  • 70. User goals: more than accessing the data. Explore and Understand; Inform and Explain; Convince and Decide; Deliver; Process
  • 71. The real design criteria: context and point of use Information use is diverse and varies based on context: ▪ Get a quick answer ▪ Solve a one-off problem ▪ Analyze causes ▪ Do experiments ▪ Make repetitive decisions ▪ Use data in routine processes ▪ Make complex decisions ▪ Choose a course of action ▪ Convince others to take action One size doesn’t fit all.
  • 72. Data architecture requires understanding data use so we can build the right infrastructure Monitor Analyze Exceptions Analyze Causes Decide Act No problem No idea Do nothing Understanding the details of uses, workflows, tasks, and activities allows us to look at the higher organizational level again
  • 73. This is part of a larger system. Feedback loops exist and operate at different frequencies. Collect new data Monitor Analyze Exceptions Analyze Causes Decide Act Act on the process Act within the process
  • 74. Data platforms are the most complex in the organization, far more complex than any web application or ERP system.
  • 75. Manage your data (or it will manage you) Data management is where developers are weakest. Modern engineering practices are where data management is weakest. Users care about their tasks. You need to bridge these groups and practices in the organization if you want to do meaningful work with data.
  • 76. About the Presenter. Mark spent most of the past 25 years working in the analytics field, starting in AI at the University of Pittsburgh and autonomous robotics at Carnegie Mellon University before moving into technology management. Today he is a Fellow in the Technology & Innovation Office at Teradata. Previously, he was president of Third Nature, an advisory firm focused on services for analytics and technology strategy, and product design. Mark is an award-winning author, architect and CTO who has received awards for his work from the American Productivity & Quality Center, Smithsonian Institution, and industry associations. He is an international speaker, and chairs several conferences and program committees. You can find him on LinkedIn at https://www.linkedin.com/in/markmadsen
  • 77. Further Reading: Thinking in Systems, Donella Meadows; An Introduction to Systems Thinking, Gerald Weinberg; Contextual Design, Beyer & Holtzblatt; Badass: Making Users Awesome, Kathy Sierra; Information Design, Jacobsen; Data: A Guide to Humans, Phil Harvey; In Search of Certainty, Burgess; https://indiyoung.com/about-problem-space/ ; http://welcometocup.org/file_columns/0000/0530/cup-whatiszoning-guidebook.pdf
  • 78. CC Image Attributions Thanks to the people who supplied the creative commons licensed images used in this presentation: well town hall.jpg - http://flickr.com/photos/tuinkabouter/1135560976/ seattle library 1 - http://www.flickr.com/photos/thomashawk/2671536366/ chicken_head2.jpg - http://www.flickr.com/photos/coycholla/4901760905 egg_face1.jpg - http://www.flickr.com/photos/sally_monster/3228248457 indonesian angry mask phone - Erik De Castro Reuters.jpg
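
Slide 23 describes BI tools that generate SQL from a semantic mapping layer over a dimensional model. The following is a minimal Python sketch of that idea, not a description of any particular product. The table names, column names, business terms, and the `generate_sql` helper are all hypothetical, and the sketch assumes a single fact table with no cyclic join paths, which is the condition the slide names for automation to be possible.

```python
# Hypothetical star schema: one fact table joined to each dimension by a key.
FACT_TABLE = "sales_fact"

# Semantic layer: business term -> (owning table, SQL expression).
# An analyst maintains this mapping; end users only see the business terms.
SEMANTIC_LAYER = {
    "Revenue": ("sales_fact", "SUM(sales_fact.amount)"),
    "Region": ("store_dim", "store_dim.region"),
    "Month": ("date_dim", "date_dim.month"),
}

# Dimension table -> join condition against the fact table.
# Each dimension has exactly one join path, so there is no ambiguity.
JOINS = {
    "store_dim": "sales_fact.store_id = store_dim.store_id",
    "date_dim": "sales_fact.date_id = date_dim.date_id",
}


def generate_sql(measures, dimensions):
    """Build a SELECT statement from business terms only."""
    select_parts = []
    group_parts = []
    tables = []
    for term in dimensions:
        table, expr = SEMANTIC_LAYER[term]
        select_parts.append(f'{expr} AS "{term}"')
        group_parts.append(expr)
        if table != FACT_TABLE and table not in tables:
            tables.append(table)
    for term in measures:
        _table, expr = SEMANTIC_LAYER[term]
        select_parts.append(f'{expr} AS "{term}"')
    sql = "SELECT " + ", ".join(select_parts) + f" FROM {FACT_TABLE}"
    for table in tables:
        sql += f" JOIN {table} ON {JOINS[table]}"
    if group_parts:
        sql += " GROUP BY " + ", ".join(group_parts)
    return sql


if __name__ == "__main__":
    # The user asks for "Revenue by Region" without writing a join.
    print(generate_sql(["Revenue"], ["Region"]))
```

The user-facing request contains no table or join knowledge; that knowledge lives in the mapping, which is why someone still has to curate it.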
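
Slides 34 and 38 argue that data complexity follows the number of connections rather than the number of objects, and that each new copy of data adds N-1 possible connections. As a rough illustration of that arithmetic, the sketch below assumes every pair of datasets could in principle be connected. That is an upper bound chosen for illustration, not a measurement of any real platform, and the function names are hypothetical.

```python
def possible_connections(n):
    """Upper bound on pairwise connections among n datasets: n choose 2."""
    return n * (n - 1) // 2


def added_by_new_copy(n_after):
    """Possible connections added when a new copy brings the total to n_after.

    The new dataset can connect to each of the other n_after - 1 datasets.
    """
    return possible_connections(n_after) - possible_connections(n_after - 1)


if __name__ == "__main__":
    for n in (10, 21, 100):
        print(n, "datasets ->", possible_connections(n), "possible connections")
    print("10th dataset adds", added_by_new_copy(10), "possible connections")
```

Under this assumption, 21 datasets already allow 210 possible connections, ten times the number of objects, which is consistent with the slide's point that connections outnumber objects by roughly an order of magnitude.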