Integrate Information Quality in your Data Warehouse Architecture
Data Warehouse Automation Day
Feb 13, 2020
Ivan Schotsmans ©2019
DWA-Day
February 13, Belgium
About Us
DV-Community: a meeting place for Data Warehouse Automation
addicts to get information, share resources and solutions,
increase networking and expand DWA expertise.
Data Warehouse Automation Special Interest Group
» Information Hub for Data Vault
» DWA – events
» Training
» Webinars
» Software / Application information
Ivan Schotsmans
» Data Evangelist with 30+ years of experience
» (Co-)founder of the local chapters of TDWI, DAMA, BI-Community,
DV-Community, IAIDQ
» Data Warehouse – Business Intelligence – Data Governance
» NOW: Master Data Officer
Agenda
» Business Case
» Data Challenges
» Data Strategy
» Data Quality
» Data Architecture
Customer Case
Scope: Don’t boil the ocean
» Start with critical applications
» Parameters
• Criticality
• Impacts
• Depreciation
Business Requirements
» Data Quality Audit starts from a MASTER application (reference table)
• Starting point: the reference table
• Compare against the reference table
[Diagram: the Master record for Customer 1 is compared against the Customer 1 records held in APPL01 … APPL20, APPL21. Values shown for Customer 1: AAA, Product XXX, YYY, ZZZ, NNN.]
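The comparison against the reference table could be sketched roughly as follows. This is a minimal illustration, not the audit program used in the customer case: the function name `audit_against_master`, the `Mismatch` record and the assignment of values to applications are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mismatch:
    application: str
    customer: str
    expected: str
    found: Optional[str]  # None means the application has no record at all


def audit_against_master(master, applications):
    """Compare each application's value per customer with the master value."""
    mismatches = []
    for app_name, records in applications.items():
        for customer, expected in master.items():
            found = records.get(customer)
            if found != expected:
                mismatches.append(Mismatch(app_name, customer, expected, found))
    return mismatches


# Hypothetical data: which value lives in which application is assumed here.
master = {"Customer 1": "Product XXX"}
applications = {
    "APPL01": {"Customer 1": "Product XXX"},  # agrees with the master
    "APPL20": {"Customer 1": "YYY"},          # deviates from the master
    "APPL21": {},                             # customer missing entirely
}

result = audit_against_master(master, applications)
for mismatch in result:
    print(mismatch)
```

Starting from the master keeps the audit one-directional: every application is judged against a single reference instead of against every other application.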
Data-Driven Business Rules
Root Product  Product Type  Key value  Application 1  Application 2  Condition  Old Product number
Product 1     Access        Value 1    PTXGI          FFTH           AND        123812
Product 1     Access        Value 1    PTXGI          GTFR           AND        89103
Product 1     Access        Value 1    PTXGI          DHFD           NA         180153
Product 2     Cable         Value 1    PTXGI          PFDR           OR         115976
Product 2     Cable         Value 1    PTXGI          WSHN           OR         100153
Product 2     Cable         Value 1    PTXGI          AZFD           NA         100152
Data Quality Checks
Prepare: Master Reference Table, Support Mapping Table, APPL01, APPL02, APPL03, APPL04, APPL…
Execute: Read, Mapping, Join, Error Flags (mapping process, error checking, flag setting)
Report: XLS reporting
• One outcome table per application
• Outcome in one big XLS file
• Source for different dashboards
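The prepare, execute and report steps might look like the sketch below. It is an illustration under assumptions: the flag names (`F_NO_MAPPING`, `F_NOT_IN_MASTER`, `F_TYPE_MISMATCH`), the field names and the sample data are hypothetical, and CSV stands in for the XLS output mentioned on the slide.

```python
import csv
import io

# Prepare: hypothetical master reference table, keyed by master product number.
MASTER = {
    "123812": {"product_type": "Access"},
    "115976": {"product_type": "Cable"},
}

# Prepare: hypothetical support mapping table (application-local id -> master key).
MAPPING = {"A-1": "123812", "A-2": "115976", "A-3": "999999"}


def run_checks(app_records, mapping, master):
    """Execute: map, join and set error flags; returns one outcome row per record."""
    outcome = []
    for record in app_records:
        master_key = mapping.get(record["local_id"])                   # mapping
        master_row = master.get(master_key) if master_key else None   # join
        outcome.append({
            "local_id": record["local_id"],
            "F_NO_MAPPING": int(master_key is None),
            "F_NOT_IN_MASTER": int(master_key is not None and master_row is None),
            "F_TYPE_MISMATCH": int(
                master_row is not None
                and master_row["product_type"] != record["product_type"]
            ),
        })
    return outcome


def to_csv(rows):
    """Report: serialize one outcome table (CSV used here instead of XLS)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


appl01 = [
    {"local_id": "A-1", "product_type": "Access"},
    {"local_id": "A-2", "product_type": "Access"},
    {"local_id": "A-3", "product_type": "Cable"},
    {"local_id": "A-4", "product_type": "Cable"},
]
outcome_appl01 = run_checks(appl01, MAPPING, MASTER)
report = to_csv(outcome_appl01)
print(report)
```

Keeping the flags as 0/1 columns makes each outcome table easy to aggregate later for a dashboard.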
Cleanup Status
Total Products 79.730
Sales 7.696

Customers                        4.642    72.034
Products Maintenance Fee         1.908         0
  Active Products                1.649         0
  Suspended                        257         0
  New                                0         0
  Out of Service                     2         0
  Unknown                            0         0
Products without Maintenance     3.054    72.034
  Active Products                2.237    29.255
  Suspended                        323     6.843
  New                                9       740
  Out of Service                   485    35.196
  Unknown                            0         0
Raw Data Quality Analysis
Product  SAP     Latest        F_Clean_  Begin_    Last_       Total_   Nbr_   Appl_  Appl_  Appl_  Last_
Number   Code    Version Date  OK        Date      Usage_Date  Revenue  custs  01     02     ---    Invoice Date
65               20041128      0         19960104  19981020    0        Zero   0      0      0
66       680039  20041128      0         19963112  20011017    0        Zero   0      1      0
67       680013  20041128      0         2000101   20010131    0        Zero   0      0      0
68       680044  20060315      0         19960101  20050514    0        Zero   0      0      0
69       680034  20060315      0         19971020  20050514    1.250    LT10   4      3      6
70               20060315      0         20050701  20070514    0        Zero   0      0      1      20070531
71       70310   20060315      0         20050514  20060909    0        Zero   0      0      0
72       896401  20060315      1         20050701  20060101    0        Zero   0      2      0      20060201
73               20060315      0         20050514  20070112    0        Zero   0      0      0
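One check that a raw analysis like this lends itself to is date validation. The sketch below assumes the date columns are meant to hold YYYYMMDD strings (an assumption, the slide does not state the format) and applies a hypothetical helper `is_valid_yyyymmdd` to four Begin Date values copied from the rows above.

```python
from datetime import datetime


def is_valid_yyyymmdd(value: str) -> bool:
    """True only for an 8-digit string that is a real calendar date."""
    # The length check matters: strptime alone would read "2000101"
    # as 2000-10-01 instead of rejecting it.
    if len(value) != 8 or not value.isdigit():
        return False
    try:
        datetime.strptime(value, "%Y%m%d")
    except ValueError:
        return False
    return True


# Begin Date per product number, taken from the sample rows.
begin_dates = {
    "65": "19960104",
    "66": "19963112",
    "67": "2000101",
    "68": "19960101",
}

invalid = {
    product: value
    for product, value in begin_dates.items()
    if not is_valid_yyyymmdd(value)
}
print(invalid)
```

Under the YYYYMMDD assumption, products 66 (month 31) and 67 (only seven digits) are reported as invalid.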
Data Challenges
Our data status was a “Disparate Data Cycle”, …
[Diagram: The Disparate Data Cycle (Michael Brackett). Stages shown: People come looking for data; Can’t find, don’t trust, can’t access data; People uncertain about the data; People create their own data; People come with own data; Data not integrated or documented.]
…but we needed to transform to a Comparate Data Cycle.
[Diagram: The Comparate Data Cycle (Michael Brackett). Stages shown: People come looking for data; People find, trust and access data; Existing data resource readily shared; People with new data check first; New data created when necessary; New data integrated and documented.]
A challenging data strategy will ensure that our organization is better placed to
meet its challenges in a fast-changing environment.
DG VISION
Improve efficiency, increase punctuality and optimize decision making by ensuring that the highest
quality data is delivered.

CHALLENGES
» Missing key elements (taxonomies, data dictionaries, data quality metrics)
» Data duplication, overlaps
» Time to market
» Liberalization
» Legal requirements (GDPR)
» Shadow IT
» Complexity

VALUES
• Professionalism • Teamwork • Respect • Entrepreneurship

FOCUS AREAS
One central Data Governance Team; One version of the truth; Process Harmonization; Focus; Specialization; Simplification; People; Data = Asset; Customer Satisfaction
• Reference and Master Data
• Enterprise Data Model
• Clear responsibilities
• Data Scientists
• Data Stewards
• Data Curators
• One function, one tool
• IT Landscape
• Deduplication
• The right person in the right place at the right time
• Timely and relevant training
• Awareness Raising
• Data quality
Data Strategy
We defined a data strategy covering people, processes, data and technology.
People: Embedding a culture of transparency and diversity, identifying
the capabilities we need for the future, and developing better
and clearer career paths for our employees.
Processes: Simplifying processes and applying customer-centric design
and Lean principles where appropriate. Leveraging automation
to reduce manual processes and End User Computing.
Data: Better understanding of our data to enable value-added
analysis and support strategic decision making.
Technology & Tools: Making strategic investments to simplify the technology
environment and ensure that it enables our desired capabilities.
We introduced a team of data specialists with specific roles and responsibilities, …
• Data Owner: working within the business, accountable for content and quality of
an enterprise data asset.
• Data Steward: working within the business, responsible for the quality of an
information asset on a day-to-day basis.
• Data Analysts: working within the business and relying on IT to provide access
to data from different applications and systems.
• Data Scientists: working within the business and relying on IT to provide access
to data from different applications and systems.
• Data Engineers: working within IT and having a deep understanding of the
systems and infrastructure that generate and store the business data.
• Data Curators: working within IT and curating data for different analytical tasks:
allocating resources to accelerate data analysis, adding semantic meaning to
data catalogs or repositories, and blending and organizing data sets.
[Diagram: roles around the Data Asset: Data Owner, Data Steward, Information Worker, Data Analyst, Data Scientist, Data Engineer, Data Curator. Groupings shown: Data Owners, Data Consumers, Data Custodians.]
…to emphasize the importance of business commitment.
[Organization chart]
Data Management Office
  DM IT Team: Data Engineers, Data Curator
  DM Business Team: Data Scientist, Data Analyst
Business Domain: Data Owner, Data Steward, Information Worker
Business Domain: Data Owner, Data Steward, User, Data Curator
We covered the business demand for scalability and flexibility with the use of data
vault.
Data Vault Characteristics
• Agile
• Set of Best Practices
• Historization
• Logging
• Unique IDs (hash-keys)
• Reconciliation.
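To make the "Unique IDs (hash-keys)" characteristic concrete, here is a minimal sketch of how a hash key might be derived from a business key. The normalization steps (trim, upper-case), the `||` delimiter and the choice of SHA-256 are assumptions for illustration; the slides do not specify them, and real implementations settle these details in their own standards.

```python
import hashlib


def hash_key(*business_key_parts: str, delimiter: str = "||") -> str:
    """Derive a deterministic hash key from one or more business key parts."""
    # Normalize so that formatting differences between source systems
    # do not produce different keys for the same business object.
    normalized = [part.strip().upper() for part in business_key_parts]
    joined = delimiter.join(normalized)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# The same product number from two sources maps to the same key.
key_from_appl01 = hash_key(" 98105 ")
key_from_appl02 = hash_key("98105")

# A composite key (for example a customer plus a product) uses the delimiter
# so that ("AB", "C") and ("A", "BC") cannot collide by concatenation.
link_key = hash_key("Customer 1", "98105")
print(key_from_appl01, link_key)
```

Because the key is computed from the data itself, each source can be loaded independently without looking up a sequence number first.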
Due to its flexibility, data vault not only guarantees an agile approach but also a
faster time to market.
• Proven enterprise data warehouse framework
• Single version of the facts
• Business rule neutral
• Source system neutral
• Agility (case study granularity change)
• Data ingestion performance: massive parallel processing
• Auditability: full historization
• Adaptability:
• Business rules can change
• Master data management maturity can evolve
• Source system landscape can change
Data Quality
Our first challenge was improving information quality and data processes…
What is the best way to save the fish?
Filter the stream to clean the water?
or
Find and eliminate the sources of pollution?
…to reach an acceptable level.
Strategy: Defense versus Offense
Key Objectives
  Defense: Ensure data security, privacy, integrity, quality, regulatory compliance and governance
  Offense: Improve competitive position and profitability
Core Activities
  Defense: Optimize data extraction, standardization, storage, and access
  Offense: Optimize data analytics, modelling, visualization, transformation and enrichment
Data Management Orientation
  Defense: Control
  Offense: Flexibility
Enabling Architecture
  Defense: SSOT (Single source of truth)
  Offense: MVOTs (Multiple versions of the truth)
Source: “What’s Your Data Strategy?” by Leandro DalleMule and Thomas H. Davenport, May-June 2017 ©HBR.ORG
Data quality has three dimensions: definition, content and presentation.
» Data Definition Quality
• The extent to which the data definition accurately describes the real-world entity type
or fact type the data represents and meets the needs of all information users (Larry English 1999);
• Clear, precise and complete definition and business rules;
• Data definition quality is measured using metadata.
» Data Content Quality
• A measure of the quality of the data stored in systems;
• The correctness of data values. Conformance to the defined and approved business rules and the accuracy of data.
• Data content quality is measured using validation and verification checks that are
developed using the business rules and other criteria specified in the data dictionary.
» Data Presentation Quality
• A way of explaining the available data
• Transforming the data material into a useful information product, and accessible when needed.
Data profiling is an important tool organizations can use to improve data quality.
» More Complete information
» More Accurate information
» More Consistent information
» More Timely information
» More Useful information
» More Standardized Information
We measure completeness, accuracy, consistency…
Data Completeness
• Degree to which values are present in the attributes that require them.
• Metric: Percent of data fields having values entered in them
Data Accuracy
• A qualitative assessment of freedom from error
• Metric: Percent of values that are correct when compared to the actual value
Data Consistency
• Measures the degree to which a set of data satisfies a set of constraints regardless of the number of times it is
replicated across files or tables
• Metric: Percent of matching values across tables and files
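The three metrics above translate fairly directly into code. The following sketch is one possible reading, with hypothetical field names and sample rows; in particular, "the actual value" is represented by a trusted lookup dictionary, which is an assumption about how accuracy would be verified.

```python
def percent(part, whole):
    """Percentage helper that tolerates an empty denominator."""
    return 100.0 * part / whole if whole else 0.0


def completeness(rows, required_fields):
    """Percent of required data fields having values entered in them."""
    total = len(rows) * len(required_fields)
    filled = sum(
        1 for row in rows for field in required_fields
        if row.get(field) not in (None, "")
    )
    return percent(filled, total)


def accuracy(rows, actual, key, field):
    """Percent of values that equal the actual (trusted) value."""
    correct = sum(1 for row in rows if row.get(field) == actual.get(row[key]))
    return percent(correct, len(rows))


def consistency(table_a, table_b, key, field):
    """Percent of matching values for keys present in both tables."""
    b_by_key = {row[key]: row.get(field) for row in table_b}
    shared = [row for row in table_a if row[key] in b_by_key]
    matching = sum(1 for row in shared if row.get(field) == b_by_key[row[key]])
    return percent(matching, len(shared))


rows = [
    {"product": "98105", "type": "Direct", "sap_code": "680039"},
    {"product": "124195", "type": "Cable", "sap_code": ""},
    {"product": "100153", "type": None, "sap_code": "680013"},
    {"product": "100152", "type": "Access", "sap_code": "680044"},
]
actual_types = {"98105": "Direct", "124195": "Access",
                "100153": "Cable", "100152": "Access"}
other_table = [
    {"product": "98105", "type": "Direct"},
    {"product": "124195", "type": "Access"},
]

completeness_pct = completeness(rows, ["type", "sap_code"])
accuracy_pct = accuracy(rows, actual_types, "product", "type")
consistency_pct = consistency(rows, other_table, "product", "type")
print(completeness_pct, accuracy_pct, consistency_pct)
```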
…timeliness, uniqueness and standardization to guide our data cleaning process.
Data Timeliness
• Measures the degree to which data values are up-to-date. Also measures the effectiveness of data provisioning
relative to its need.
• Metric: Percent of data available within a specified threshold timeframe
Data Uniqueness
• The state of being the only one of its kind.
• Metric: Percent of records having a unique key
Data Standardization
• Measures the degree to which formats are consistent for data items sharing common characteristics, such as date
fields.
• Metric: Percent of fields with like characteristics utilizing a common format
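These three metrics can be sketched in the same style. The threshold of seven days, the field names and the eight-digit date pattern are assumptions chosen for the example, not values from the presentation.

```python
import re
from collections import Counter
from datetime import date


def percent(part, whole):
    """Percentage helper that tolerates an empty denominator."""
    return 100.0 * part / whole if whole else 0.0


def timeliness(rows, date_field, as_of, max_age_days):
    """Percent of rows whose date lies within the threshold timeframe."""
    on_time = sum(
        1 for row in rows if (as_of - row[date_field]).days <= max_age_days
    )
    return percent(on_time, len(rows))


def uniqueness(rows, key):
    """Percent of records whose key occurs exactly once."""
    counts = Counter(row[key] for row in rows)
    unique = sum(1 for row in rows if counts[row[key]] == 1)
    return percent(unique, len(rows))


def standardization(values, pattern):
    """Percent of values that follow the common format."""
    matching = sum(1 for value in values if re.fullmatch(pattern, value))
    return percent(matching, len(values))


rows = [
    {"product": "98105", "loaded": date(2020, 2, 12)},
    {"product": "124195", "loaded": date(2020, 2, 10)},
    {"product": "98105", "loaded": date(2020, 1, 1)},
    {"product": "100153", "loaded": date(2020, 2, 13)},
]
date_values = ["20041128", "2004-11-28", "20060315", "2000101"]

timeliness_pct = timeliness(rows, "loaded", date(2020, 2, 13), max_age_days=7)
uniqueness_pct = uniqueness(rows, "product")
standardization_pct = standardization(date_values, r"\d{8}")
print(timeliness_pct, uniqueness_pct, standardization_pct)
```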
The Data Quality dashboard acts as an instrument for the data steward to fulfil his/her role.
• Stewards should be considered data subject-matter experts for their respective
business functions and processes.
• Stewards are responsible for guiding the effort, not necessarily executing it themselves.
• Their roles as stewards should be to guide and influence others in implementing the
changes necessary to improve data quality. They should be viewed as the leaders of the
data quality improvement effort, not necessarily the "doers."
• Stewards should define and monitor quality measures to justify the program but also
must have specific goals for data quality improvement.
• Stewards must be accountable.
• Stewardship should be based on manageable subsets of data.
Data Quality improvement is only successful if you can optimize the link between
people, process, data, technology and tools.
People: The data steward (business) and data curator (IT) are
responsible for delivering trusted data to the information users.
Process: We support data handling in the different projects but also an
overall program to streamline all data activities.
Data: Data Glossary and Data Dictionary are still important, but the end
goal must be a data catalog. It informs information users
about available data, metadata and context.
Technology & Tools: Ideally you have a typical metadata tool to support your data
strategy. You need to find a tool which fits in your overall
architecture and approach.
Data Architecture
Our Data Strategy fits in the designed Architecture for Data Warehousing, …
[Architecture diagram]
Data Sources: OLTP; semi-structured and unstructured data; APIs; Rules Engine; Queue / ESB
Loading: Batch; Near Real Time
Data Warehouse
  Staging: Staging / Loading Area
  Integration: Raw Data Vault; Business Data Vault
  Presentation: Raw Data Mart; Information Mart
  Hard Rules and Soft Rules are applied between the layers
Platforms: RDBMS; Hadoop / NoSQL; Other
Master Data Management
Use Cases: BI, analytics, cubes, reports; Services, APIs; Labs, exploration; Analytics, Data Science
…and data quality checks were executed in the integration layer.
[Diagram: the same architecture as on the previous slide: Data Sources, Master Data Management, Data Warehouse (Staging, Integration, Presentation) and Use Cases.]
Data Lake
[Diagram components]
Sources: External Source Systems; Internal Source Systems (SAP, SAP BW/4HANA); (Semi-) Unstructured Data
Gateways
Staging Area (CBG Ingestion Layer)
Raw Data Vault (CBG Logic Layer)
Business Data Vault (CBG Storage Layer)
Information marts (CBG Reporting Layer)
Data Labs
Data Catalog
API Management
A data-driven approach is the end goal in our automated data quality process, …
[Diagram: Database with rules → Rules Engine (generic program) → Program → Result (simple dashboard)]

Rules in the database:
Rulenr  Database  Field     Rule    Combine
1200    Customer  Custmr            NA
1201    Product   Prodnr    98105   AND
1201    Product   Prodtype  Direct  AND

Generic program:
Select &Field&
From &Database&
Where Prodnr = "98105"
And Prodtype = "Direct"

Result:
Product  R1200  R1201  R9999
98105    0      0      0
124195   0      1      0
98105    0      0      0
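The idea of generating checks from a rules table can be sketched as below, using SQLite as a stand-in database. Several things here are assumptions rather than the presented implementation: all rows of one rule number share the same combine operator, a flag of 0 means the row satisfies the rule and 1 means it does not, and the `KNOWN_TABLES` whitelist and the `build_query` helper are hypothetical names.

```python
import sqlite3

# Rule rows as on the slide: (rule_nr, table, field, value, combine).
RULES = [
    (1201, "Product", "Prodnr", "98105", "AND"),
    (1201, "Product", "Prodtype", "Direct", "AND"),
]

# Identifiers cannot be bound as SQL parameters, so they are whitelisted.
KNOWN_TABLES = {"Product": {"Prodnr", "Prodtype"}}


def build_query(rule_nr, rules, key_column="Prodnr"):
    """Generate one flag query (0 = rule satisfied, 1 = not) for a rule number."""
    parts = [rule for rule in rules if rule[0] == rule_nr]
    if not parts:
        raise ValueError(f"no rows for rule {rule_nr}")
    table, operator = parts[0][1], parts[0][4]
    if operator not in ("AND", "OR"):
        raise ValueError(f"unsupported combine operator: {operator}")
    columns = KNOWN_TABLES.get(table, set())
    if key_column not in columns:
        raise ValueError(f"unknown key column: {key_column}")
    for _, rule_table, field, _, _ in parts:
        if rule_table != table or field not in columns:
            raise ValueError(f"unknown identifier in rule {rule_nr}")
    condition = f" {operator} ".join(f"{field} = ?" for _, _, field, _, _ in parts)
    sql = (
        f"SELECT {key_column}, CASE WHEN {condition} THEN 0 ELSE 1 END "
        f"AS R{rule_nr} FROM {table} ORDER BY rowid"
    )
    return sql, [value for _, _, _, value, _ in parts]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (Prodnr TEXT, Prodtype TEXT)")
conn.executemany(
    "INSERT INTO Product VALUES (?, ?)",
    [("98105", "Direct"), ("124195", "Direct")],
)

sql, params = build_query(1201, RULES)
flags = conn.execute(sql, params).fetchall()
print(sql)
print(flags)
```

Values are passed as bound parameters while table and column names are checked against a whitelist, which keeps a rules table that business users can edit from becoming an injection path.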
…and is essential to minimize (or eliminate) scrap and rework.
» Data Cleansing is part of a technical process, and ensures that the data integrated
into the data warehouse undergoes transformations to improve the quality:
• Reduce data overlap and data redundancy
• Complete records
• Correct inaccurate data fields
• Adjust data formatting
• Complete empty data
• Enforce referential integrity
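A few of the transformations listed above can be illustrated with a small sketch. It covers only formatting, empty values, referential integrity and duplicates; the field names, the "Unknown" default and the rule of keeping the first occurrence of a duplicate are assumptions made for the example.

```python
def cleanse(rows, known_products):
    """Return (cleaned, rejected) records after a few basic cleansing steps."""
    seen = set()
    cleaned, rejected = [], []
    for row in rows:
        # Adjust data formatting: trim whitespace, treat None as empty.
        record = {field: (value or "").strip() for field, value in row.items()}
        # Adjust data formatting: dates as YYYYMMDD without separators.
        record["begin_date"] = record["begin_date"].replace("-", "")
        # Complete empty data with an explicit default.
        record["status"] = record["status"] or "Unknown"
        # Enforce referential integrity against the known product list.
        if record["product"] not in known_products:
            rejected.append(record)
            continue
        # Reduce redundancy: keep the first record per customer and product.
        key = (record["customer"], record["product"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned, rejected


raw = [
    {"customer": "Customer 1", "product": " 98105",
     "begin_date": "2005-07-01", "status": "Active"},
    {"customer": "Customer 1", "product": "98105",
     "begin_date": "20050701", "status": "Active"},
    {"customer": "Customer 2", "product": "124195",
     "begin_date": "20060315", "status": None},
    {"customer": "Customer 3", "product": "000000",
     "begin_date": "20060315", "status": "New"},
]

cleaned, rejected = cleanse(raw, known_products={"98105", "124195"})
print(cleaned)
print(rejected)
```

Rejected records are returned rather than silently dropped, so that they can be reported back to the source instead of being reworked downstream.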
Finally, data quality is embedded into data governance and needs a cyclical process.
[Diagram: a cycle with the steps: Embed Data Quality in your daily work; Do it right the first time; Assess and analyse root causes; Improve Data Quality; Communicate and gain trust; Involve & Train. Other labels shown: Rules; Action Plans; Data Validation; Communication; Governance.]
Thank You
Data Warehouse Automation
Free membership
DV-Community.org
Ivan Schotsmans
+32 495 55 1907
ischotsm@dv-community.org
https://www.dv-community.org/
https://www.bi-community.org/
FgtT@2020!
DWA – Day
Thursday 13 Feb 2020
Belgium

Presentation by Ivan Schotsmans (DV Community) at the Data Vault Modelling and Data Governance conference on Oct. 17, 2019: Integrate Information Quality in your Data Warehouse Architecture

  • 2. DWA-Day F e b r u a r y 1 3 . B e l g i u m AboutUs DV-Community a meeting place for DataWarehouseAutomation addicts to get information, share resources and solutions, increase networking and expand DWA expertise. DataWarehouse Automation Special Interest Group » Information Hub for Data Vault » DWA – events » Training » Webinars » Software / Application information 2
  • 3. DWA-Day F e b r u a r y 1 3 . B e l g i u m IvanSchotsmans » Data Evangelist with +30 years experience » (Co-) Founder local chaptersTDWI, DAMA, BI-Community, DV-Community, IAIDQ » Data Warehouse – Business Intelligence – Data Governance » NOW: Master Data Officer 3
  • 4. DWA-Day F e b r u a r y 1 3 . B e l g i u m »Business Case »DataChallenges »Data Strategy »DataQuality »DataArchitecture Agenda
  • 5. DWA-Day F e b r u a r y 1 3 . B e l g i u m Customer Case 5
  • 6. DWA-Day F e b r u a r y 1 3 . B e l g i u m Scope: Don’tboiltheocean 6 » Start with critical applications » Parameters • Criticality • Impacts • Depreciation
  • 7. DWA-Day F e b r u a r y 1 3 . B e l g i u m BusinessRequirements 7 » Data Quality Audit starts from a MASTER application (reference table) • Starting point ReferenceTable • Compare against ReferenceTable Master APPL21APPL20APPL01 … Customer 1 AAA Customer 1 ProductXXX Customer 1 YYY Customer 1 ZZZ Customer 1 NNN
  • 8. DWA-Day F e b r u a r y 1 3 . B e l g i u m DataDrivenBusinessRules Root Product ProductType Key value Application 1 Application 2 Condition Old Product number Product 1 Access Value 1 PTXGI FFTH AND 123812 Product 1 Access Value 1 PTXGI GTFR AND 89103 Product 1 Access Value 1 PTXGI DHFD NA 180153 Product 2 Cable Value 1 PTXGI PFDR OR 115976 Product 2 Cable Value 1 PTXGI WSHN OR 100153 Product 2 Cable Value 1 PTXGI AZFD NA 100152 8
  • 9. DWA-Day F e b r u a r y 1 3 . B e l g i u m DataQualityChecks 9 Prepare Execute Report Master Reference Table Support Mapping Table APPL01 APPL02 APPL03 APPL04 APPL… XLS Reporting Read Mapping Join Error Flags Mapping process Error Checking Flag Setting Outcome in one big XLS File Source for different dashboards One outcome table per application
  • 10. DWA-Day F e b r u a r y 1 3 . B e l g i u m CleanupStatus Total Products 79.730 Sales 7.696 Customers 4.642 Customers 72.034 Products Maintenance Fee 1.908 Product Maintenance Fee 0 Active Products 1.649 Active Products 0 Suspended 257 Suspended 0 New 0 New 0 Out of Service 2 Out of Service 0 Unknown 0 Unknown 0 Products without Maintenance 3.054 Products without Maintenance 72.034 Active Products 2.237 Active Products 29.255 Suspended 323 Suspended 6.843 New 9 New 740 Out of Service 485 Out of Service 35.196 Unknown 0 Unknown 0 10
  • 11. DWA-Day F e b r u a r y 1 3 . B e l g i u m RawDataQualityAnalysis Product Number SAP Code Latest Version Date F_ Clean_ OK Begin_ Date Last_ Usage_Date Total_ Revenue Nbr_ custs Appl_ 01 Appl_ 02 Appl_ --- Last_ Invoice Date 65 20041128 0 19960104 19981020 0 Zero 0 0 0 66 680039 20041128 0 19963112 20011017 0 Zero 0 1 0 67 680013 20041128 0 2000101 20010131 0 Zero 0 0 0 68 680044 20060315 0 19960101 20050514 0 Zero 0 0 0 69 680034 20060315 0 19971020 20050514 1.250 LT10 4 3 6 70 20060315 0 20050701 20070514 0 Zero 0 0 1 20070531 71 70310 20060315 0 20050514 20060909 0 Zero 0 0 0 72 896401 20060315 1 20050701 20060101 0 Zero 0 2 0 20060201 73 20060315 0 20050514 20070112 0 Zero 0 0 0 11
  • 12. DWA-Day F e b r u a r y 1 3 . B e l g i u m Data Challenges 12
  • 13. DWA-Day F e b r u a r y 1 3 . B e l g i u m OurDatastatuswasa“DisparateDataCycle”, … 13 People Create their own Data Can’t Find Don’t Trust Can’t Access Data Data Not Integrated Or Documented People Come Looking for data People Uncertain About the Data People Come With Own Data The Disparate Data Cycle (Michael Brackett)
  • 14. …but we needed to transform to a Comparate Data Cycle. The Comparate Data Cycle (Michael Brackett): New data created when necessary; People find, trust and access data; New data integrated and documented; People come looking for data; Existing data resource readily shared; People with new data check first.
  • 15. A challenging data strategy will ensure that our organization is better placed to meet its challenges in a fast-changing environment.
    » DG Vision: improve efficiency, increase punctuality and optimize decision making by ensuring that the highest quality data is delivered.
    » Focus areas: One central Data Governance Team; One version of the truth; Process Harmonization; Focus; Specialization; Simplification; People; Data = Asset
    » Challenges: Missing key elements (taxonomies, data dictionaries, data quality metrics); Data Duplication, Overlaps; Time to Market; Liberalization; Legal requirements (GDPR); Shadow IT; Complexity
    » Values and focus-area details:
      • Professionalism
      • Teamwork
      • Reference and Master Data
      • Enterprise Data Model
      • Clear responsibilities
      • Data Scientists
      • Data Stewards
      • Data Curators
      • One function, one tool
      • IT Landscape
      • Deduplication
      • The right person in the right place at the right time
      • Timely and relevant training
      • Awareness Raising
      • Data quality
      • Customer Satisfaction
      • Respect
      • Entrepreneurship
  • 16. Data Strategy
  • 17. We defined a data strategy covering people, processes, data and technology.
    » People: Embedding a culture of transparency and diversity, identifying the capabilities we need for the future, and developing better and clearer career paths for our employees.
    » Processes: Simplifying processes and applying customer-centric design and Lean principles where appropriate. Leveraging automation to reduce manual processes and End User Computing.
    » Data: Better understanding of our data to enable value-added analysis and support strategic decision making.
    » Technology & Tools: Making strategic investments to simplify the technology environment and ensure that it enables our desired capabilities.
  • 18. We introduced a team of data specialists with specific roles and responsibilities, …
    • Data Owner: working within the business, accountable for content and quality of an enterprise data asset.
    • Data Steward: working within the business, responsible for the quality of an information asset on a day-to-day basis.
    • Data Analysts: working within the business and relying on IT to provide access to data from different applications and systems.
    • Data Scientists: working within the business and relying on IT to provide access to data from different applications and systems.
    • Data Engineers: working within IT and having a deep understanding of the systems and infrastructure that generate and store the business data.
    • Data Curators: working within IT and curating data for different analytical tasks, from allocating resources for accelerating data analysis and adding semantic meaning to data catalogs or repositories, to blending and organizing data sets.
    Diagram labels: Data Asset; Data Owner; Data Steward; Information Worker; Data Analyst; Data Scientist; Data Engineer; Data Curator; Data Consumers; Data Custodians; Data Owners.
  • 19. …to emphasize the importance of business commitment. Organization chart labels: Data Management Office; DM IT Team; Data Engineers; Data Curator; DM Business Team; Data Scientist; Data Analyst; Business Domain; Data Owner; Data Steward; Information Worker; Business Domain; Data Owner; Data Steward; User; Data Curator.
  • 20. We covered the business demand for scalability and flexibility with the use of data vault. Data Vault Characteristics:
    • Agile
    • Set of Best Practices
    • Historization
    • Logging
    • Unique IDs (hash-keys)
    • Reconciliation
  • 21. Due to its flexibility, data vault not only guarantees an agile approach but also a faster time to market.
    • Proven enterprise data warehouse framework
    • Single version of the facts
    • Business rule neutral
    • Source system neutral
    • Agility (case study granularity change)
    • Data ingestion performance: massive parallel processing
    • Auditability: full historization
    • Adaptability:
      • Business rules can change
      • Master data management maturity can evolve
      • Source system landscape can change
  • 22. Data Quality
  • 23. Our first challenge was improving information quality and data processes… What is the best way to save the fish? Filter the stream to clean the water? Or find and eliminate the sources of pollution?
  • 24. …to reach an acceptable level.

    | Strategy | Defense | Offense |
    |---|---|---|
    | Key Objectives | Ensure data security, privacy, integrity, quality, regulatory compliance and governance | Improve competitive position and profitability |
    | Core Activities | Optimize data extraction, standardization, storage, and access | Optimize data analytics, modelling, visualization, transformation and enrichment |
    | Data Management Orientation | Control | Flexibility |
    | Enabling Architecture | SSOT (Single source of truth) | MVOTs (Multiple versions of the truth) |

    Source: “What’s your data strategy?” by Leandro DalleMule and Thomas H. Davenport, May-June 2017, ©HBR.ORG
  • 25. Data quality has three dimensions: definition, content and presentation.
    » Data Definition Quality
      • The extent to which the data definition accurately describes the data of the real-world entity type or fact-type the data represent and meet the need of all information users (Larry English 1999);
      • Clear, precise and complete definition and business rules;
      • Data definition quality is measured using metadata.
    » Data Content Quality
      • A measure of the quality of the data stored in systems;
      • The correctness of data values. Conformance to the defined and approved business rules and the accuracy of data.
      • Data content quality is measured using validation and verification checks that are developed using the business rules and other criteria specified in the data dictionary.
    » Data Presentation Quality
      • A way of explaining the available data
      • Transforming the data material into a useful information product, and accessible when needed.
  • 26. Data profiling is an important tool organizations can use to improve the data quality.
    » More Complete information
    » More Accurate information
    » More Consistent information
    » More Timely information
    » More Useful information
    » More Standardized information
  • 27. We measure completeness, accuracy, consistency…
    » Data Completeness
      • Degree to which values are present in the attributes that require them.
      • Metric: Percent of data fields having values entered in them
    » Data Accuracy
      • A qualitative assessment of freedom from error
      • Metric: Percent of values that are correct when compared to the actual value
    » Data Consistency
      • Measures the degree to which a set of data satisfies a set of constraints regardless of the number of times it is replicated across files or tables
      • Metric: Percent of matching values across tables and files
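The three metrics on slide 27 can be illustrated with a minimal Python sketch. The function names, the record layout (lists of dictionaries) and the sample values are assumptions made for illustration only; they do not come from the presentation. In particular, using a master reference table as the "actual value" for accuracy is an assumption.

```python
def completeness(records, required_fields):
    """Percent of required data fields that have a value entered."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 0.0
    filled = sum(
        1
        for r in records
        for f in required_fields
        if r.get(f) not in (None, "")
    )
    return 100.0 * filled / total


def accuracy(records, reference, key, field):
    """Percent of values equal to the reference ("actual") value.

    `reference` maps a key to the value assumed to be correct.
    """
    checked = [r for r in records if r[key] in reference]
    if not checked:
        return 0.0
    correct = sum(1 for r in checked if r.get(field) == reference[r[key]])
    return 100.0 * correct / len(checked)


def consistency(table_a, table_b, key, field):
    """Percent of matching values for keys present in both tables."""
    b_by_key = {r[key]: r.get(field) for r in table_b}
    shared = [r for r in table_a if r[key] in b_by_key]
    if not shared:
        return 0.0
    matching = sum(1 for r in shared if r.get(field) == b_by_key[r[key]])
    return 100.0 * matching / len(shared)


# Hypothetical sample data (not taken from the slides).
master_status = {"P1": "Active", "P2": "Active", "P3": "Suspended", "P4": "Active"}
appl01 = [
    {"product": "P1", "sap_code": "", "status": "Active"},
    {"product": "P2", "sap_code": "S2", "status": "Active"},
    {"product": "P3", "sap_code": "S3", "status": "Active"},
    {"product": "P4", "sap_code": "S4", "status": "Active"},
]
appl02 = [
    {"product": "P2", "status": "Active"},
    {"product": "P3", "status": "Suspended"},
]

print(completeness(appl01, ["sap_code", "status"]))         # 87.5
print(accuracy(appl01, master_status, "product", "status"))  # 75.0
print(consistency(appl01, appl02, "product", "status"))      # 50.0
```

Each function returns a percentage, so the results can be tracked over time or compared across applications.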
  • 28. Timeliness, uniqueness and standardization to guide our data cleaning process.
    » Data Timeliness
      • Measures the degree to which data values are up-to-date. Also measures the effectiveness of data provisioning relative to its need.
      • Metric: Percent of data available within a specified threshold timeframe
    » Data Uniqueness
      • The state of being the only one of its kind.
      • Metric: Percent of records having a unique key
    » Data Standardization
      • Measures the degree to which formats are consistent for data items sharing common characteristics, such as date fields.
      • Metric: Percent of fields with like characteristics utilizing a common format
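The three metrics on slide 28 can be sketched in the same way. This is a minimal Python illustration under stated assumptions: the function names, the 7-day threshold, the `loaded` field and the sample values are hypothetical. The standardization check only tests the format (eight digits), not whether the value is a valid calendar date.

```python
import re
from collections import Counter
from datetime import date


def timeliness(records, date_field, as_of, max_age_days):
    """Percent of records whose date lies within the threshold timeframe."""
    if not records:
        return 0.0
    fresh = sum(
        1 for r in records if (as_of - r[date_field]).days <= max_age_days
    )
    return 100.0 * fresh / len(records)


def uniqueness(records, key):
    """Percent of records whose key value occurs exactly once."""
    if not records:
        return 0.0
    counts = Counter(r[key] for r in records)
    unique = sum(1 for r in records if counts[r[key]] == 1)
    return 100.0 * unique / len(records)


def standardization(values, pattern):
    """Percent of values that fully match a common format (regex)."""
    if not values:
        return 0.0
    regex = re.compile(pattern)
    conforming = sum(1 for v in values if regex.fullmatch(v))
    return 100.0 * conforming / len(values)


# Hypothetical sample data (not taken from the slides).
loads = [
    {"loaded": date(2020, 2, 12)},
    {"loaded": date(2020, 2, 10)},
    {"loaded": date(2020, 1, 1)},
    {"loaded": date(2020, 2, 1)},
]
keys = [{"id": "A"}, {"id": "B"}, {"id": "B"}, {"id": "C"}]
dates = ["20041128", "20060315", "2000101", "20050701"]

print(timeliness(loads, "loaded", date(2020, 2, 13), 7))  # 50.0
print(uniqueness(keys, "id"))                             # 50.0
print(standardization(dates, r"\d{8}"))                   # 75.0
```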
  • 29. The Data Quality dashboard acts as an instrument for the data steward to fulfil his/her role.
    • Stewards should be considered data subject-matter experts for their respective business functions and processes.
    • Stewards are responsible for guiding the effort, not necessarily executing it themselves.
    • Their roles as stewards should be to guide and influence others in implementing the changes necessary to improve data quality. They should be viewed as the leaders of the data quality improvement effort, not necessarily the “doers.”
    • Stewards should define and monitor quality measures to justify the program but also must have specific goals for data quality improvement.
    • Stewards must be accountable.
    • Stewardship should be based on manageable subsets of data.
  • 30. Data Quality improvement is only successful if you can optimize the link between people, process, data, technology and tools.
    » People: The data steward (business) and data curator (IT) are responsible for delivering trusted data to the information users.
    » Process: We support data handling in the different projects but also an overall program to streamline all data activities.
    » Data: Data Glossary and Data Dictionary are still important, but the end goal must be a data catalog. It informs information users about available data, metadata and context.
    » Technology & Tools: Ideally you have a typical metadata tool to support your data strategy. You need to find a tool which fits in your overall architecture and approach.
  • 31. Data Architecture
  • 32. Our Data Strategy fits in the designed Architecture for Data Warehousing, … Diagram labels: Master Data Management; Data Warehouse; Use Cases; Staging; Integration; Presentation; Staging / Loading Area; Raw Data Vault; Business Data Vault; Raw Data Mart; Information Mart; Hard Rules; Soft Rules; RDBMS; Hadoop / NoSQL; Other; Batch; Near Real Time; BI, analytics, cubes, reports; Services, APIs; Labs, exploration; Analytics, Data Science; OLTP; Semi-structured and unstructured data; APIs; Rules Engine; Queue / ESB; Data Sources.
  • 33. …and data quality checks were executed in the integration layer. Diagram labels: Master Data Management; Data Sources; Data Warehouse; Use Cases; Staging; Integration; Presentation; Staging / Loading Area; Raw Data Vault; Business Data Vault; Raw Data Mart; Information Mart; Hard Rules; Soft Rules; RDBMS; Hadoop / NoSQL; Other; Batch; Near Real Time; BI, analytics, cubes, reports; Services, APIs; Labs, exploration; Analytics, Data Science; OLTP; Semi-structured and unstructured data; APIs; Rules Engine; Queue / ESB.
  • 34. Architecture diagram labels: Data Lake; Gateway; Staging Area (CBG Ingestion Layer); Raw Data Vault (CBG Logic Layer); Business Data Vault (CBG Storage Layer); Information marts (CBG Reporting Layer); External Source Systems; Internal Source Systems; SAP; SAP BW/4HANA; Data Labs; (Semi-) Unstructured Data; Data Catalog; API Management.
  • 35. A data driven approach is the end goal in our automated data quality process, … Diagram labels: Data Base with rules; Rules Engine; Generic program; Program; Simple dashboard; Result.

    Rules:

    | Rulenr | Database | Field | Rule | Combine |
    |---|---|---|---|---|
    | 1200 | Customer | Custmr | | NA |
    | 1201 | Product | Prodnr | 98105 | AND |
    | 1201 | Product | Prodtype | Direct | AND |

    Generated query: Select &Field& From &Database& Where Prodnr = “98105” And Prodtype = “Direct”

    Result:

    | Product | R1200 | R1201 | R9999 |
    |---|---|---|---|
    | 98105 | 0 | 0 | 0 |
    | 124195 | 0 | 1 | 0 |
    | 98105 | 0 | 0 | 0 |
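The rule-driven idea on slide 35 (rules stored as data, a generic program that evaluates them, one flag column per rule) can be sketched as follows. This is an illustrative Python sketch, not the presenter's implementation. Several points are assumptions: a flag of 1 means the record does not satisfy the rule, a rule row without a value only checks that the field is filled, and all conditions that share a rule number are combined with that rule's AND/OR operator.

```python
from collections import defaultdict

# Rule rows: (rule number, database/table, field, expected value, combine).
# The layout follows the slide; the interpretation is an assumption.
RULES = [
    (1200, "Customer", "Custmr", None, "NA"),
    (1201, "Product", "Prodnr", "98105", "AND"),
    (1201, "Product", "Prodtype", "Direct", "AND"),
]


def group_rules(rules):
    """Group rule rows by rule number so multi-row rules are evaluated together."""
    grouped = defaultdict(list)
    for rule_nr, table, field, expected, combine in rules:
        grouped[rule_nr].append((table, field, expected, combine))
    return dict(grouped)


def condition_holds(record, field, expected):
    """Check one condition; with no expected value, only require a filled field."""
    value = record.get(field)
    if expected is None:
        return value not in (None, "")
    return str(value) == expected


def evaluate(record, conditions):
    """Return 0 when the record satisfies the rule, 1 when it does not."""
    results = [condition_holds(record, f, e) for _, f, e, _ in conditions]
    combine = conditions[0][3]
    ok = any(results) if combine == "OR" else all(results)
    return 0 if ok else 1


def run_rules(records, table, rules=RULES):
    """Produce one dict of rule flags per record for the given table."""
    grouped = group_rules(rules)
    report = []
    for record in records:
        flags = {}
        for rule_nr, conditions in sorted(grouped.items()):
            if conditions[0][0] == table:
                flags[f"R{rule_nr}"] = evaluate(record, conditions)
        report.append(flags)
    return report


# Hypothetical product records for illustration.
products = [
    {"Prodnr": "98105", "Prodtype": "Direct"},
    {"Prodnr": "124195", "Prodtype": "Direct"},
]
for row, flags in zip(products, run_rules(products, "Product")):
    print(row["Prodnr"], flags)
```

Because the rules live in a table rather than in code, adding a check means adding a row; the generic program itself does not change.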
  • 36. …and is essential to minimize (or eliminate) scrap and rework.
    » Data Cleansing is part of a technical process, and ensures that the data integrated into the data warehouse undergoes transformations to improve the quality:
      • Reduce data overlap and data redundancy
      • Complete records
      • Correct inaccurate data fields
      • Adjust data formatting
      • Complete empty data
      • Enforce referential integrity
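A few of the cleansing steps listed on slide 36 can be sketched in Python. This is a minimal sketch under stated assumptions: the field names (`product`, `customer`, `status`), the `Unknown` default and the choice to reject rows with an unknown customer are all hypothetical and not taken from the presentation.

```python
def cleanse(rows, valid_customers, default_status="Unknown"):
    """Return (clean_rows, rejected_rows) after basic cleansing steps."""
    seen = set()
    clean, rejected = [], []
    for row in rows:
        row = dict(row)  # work on a copy, leave the input untouched

        # Adjust data formatting: trim whitespace, upper-case product numbers.
        row["product"] = str(row.get("product", "")).strip().upper()
        row["customer"] = str(row.get("customer", "")).strip()

        # Complete empty data with an explicit default value.
        if not row.get("status"):
            row["status"] = default_status

        # Enforce referential integrity: the customer must exist in the master.
        if row["customer"] not in valid_customers:
            rejected.append(row)
            continue

        # Reduce redundancy: keep the first row per (product, customer).
        key = (row["product"], row["customer"])
        if key in seen:
            continue
        seen.add(key)
        clean.append(row)
    return clean, rejected


# Hypothetical input for illustration.
raw = [
    {"product": " p1 ", "customer": "C1", "status": "Active"},
    {"product": "P1", "customer": "C1", "status": "Active"},
    {"product": "p2", "customer": "C9", "status": ""},
    {"product": "p3", "customer": "C2", "status": None},
]
clean, rejected = cleanse(raw, valid_customers={"C1", "C2"})
print(clean)
print(rejected)
```

Rejected rows are returned rather than silently dropped, so they can feed the error reporting described earlier in the deck.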
  • 37. Finally, data quality is embedded into data governance and needs a cyclical process. Diagram labels: Rules; Action Plans; Embed Data Quality in your daily work; Do it right the first time; Assess and analyse Root Causes; Improve Data Quality; Communicate and gain trust; Involve & Train; Communication; Governance; Data Validation.
  • 38. Thank You. Data Warehouse Automation. Free membership: DV-Community.org. Ivan Schotsmans, +32 495 55 1907, ischotsm@dv-community.org, https://www.dv-community.org/ https://www.bi-community.org/ FgtT@2020! DWA – Day, Thursday 13 Feb 2020, Belgium