Understanding Your Data
Series: Foundational Strategies for Trust in Big Data – Part 2
Housekeeping
Webcast Audio
• Today’s webcast audio is streamed through your computer
speakers.
• If you need technical assistance with the web interface or audio,
please reach out to us using the Q&A box.
Questions Welcome
• Submit your questions at any time during the presentation using
the Q&A box.
• We will answer them during our Q&A session following the
presentation.
Recording and slides
• This webcast is being recorded. You will receive an email following
the webcast with a link to download both the recording and the
slides.
Arianna Valentini
Product Marketing Manager
What You Will Learn Today
• Quick refresh on ingredients of successful Big Data
• Common challenges of Big Data and data profiling
• The top 5 steps needed for effective data profiling
• How another company saw success through data profiling
• What you can do in the next 90 days to take action on DI
Wrap up with:
• Q&A
Ingredients of Successful Big Data
1. Clear Business Case
2. Extract Data
3. Understand Data
4. Trace Lineage
Data Governance
80% of AI/ML projects are stalling
due to poor data quality
Dimensional Research, 2019
Big Data Needs
Data Quality
“Societal trust in business is
arguably at an all-time low
and, in a world increasingly
driven by data and
technology,
reputations and brands are
ever harder to protect.”
EY “Trust in Data and Why it Matters”, 2017.
The importance of data
quality in the enterprise:
• Decision making
• Customer centricity
• Compliance
• Machine learning & AI
64% of IT executives have
trouble finding and cleaning
the right data for strategic
data projects
Sierra Venture, 2020
90% of executives are concerned
about how misused data
can impact corporate
reputation
PWC, 22nd Annual Global CEO Survey, 2019
Understanding Your Data
Data Profiling
The set of analytical techniques that
evaluate actual data content (vs.
metadata) to provide a complete view
of each data element in a data source.
Provides summarized inferences, and
details of value and pattern frequencies
to quickly gain data insights.
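As an illustration of value and pattern frequencies, here is a minimal Python sketch. It is not taken from any profiling product: the function names, the A/9 pattern symbols, and the sample postcode column are all assumptions made for the example.

```python
from collections import Counter

def to_pattern(value: str) -> str:
    """Reduce a value to its shape: letters become 'A', digits become '9',
    everything else (spaces, punctuation) is kept as-is."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)

def profile_column(values):
    """Summarize one column: row count, null count, value and pattern frequencies."""
    non_null = [v for v in values if v not in (None, "")]
    return {
        "count": len(values),
        "null_count": len(values) - len(non_null),
        "value_freq": Counter(non_null),
        "pattern_freq": Counter(to_pattern(str(v)) for v in non_null),
    }

# Hypothetical postcode column; "9021O" has a letter O typed in place of a zero.
postcodes = ["02134", "90210", "9021O", "", "SW1A 1AA", "02134"]
profile = profile_column(postcodes)
for pattern, n in profile["pattern_freq"].most_common():
    print(pattern, n)
```

The rare patterns are usually the interesting ones: a single 9999A among many 99999 values points straight at the mistyped record without anyone having to read the column row by row.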
Business Rules
The data quality or validation rules that
help ensure that data is “fit for use” in
its intended operational and decision-
making contexts.
Covers the accuracy, completeness,
consistency, relevance, timeliness and
validity of data.
Five Key Steps to Effective Data Profiling
These are not new, but good to reiterate in the
context of Big Data:
1. How do you want to analyze the data?
2. What should you review? (there's a lot of stuff)
3. What should you look for? (based on data “type”)
4. When should you build rules? (laser-focus; CDEs)
5. What needs to be communicated?
1. How do you want to analyze the data?
“Never lead with a data set;
lead with a question.”
Anthony Scriffignano, Chief Data Scientist, Dun & Bradstreet
Forbes Insights, May 31, 2017, “The Data Differentiator”
Universal DQ best practices:
Understand the End Goal
• How does the business intend to
use the data (i.e. what’s the use
case)?
• Empower users (“Who”) to gain
new clarity into the core problem
(“Why”)
• What will the data be used for?
• What defines the Fitness for your
Purpose?
Establish Scope
• Ask the “right questions” about the
use case and the data (not just
“what” and “how”)
• What data is relevant to the effort?
• Big Data or other, you need to set
boundaries for the work
Understand Context
• How does the business define the
data?
• What are the important
characteristics and context of the
data?
• What are the Critical Data
Elements?
• What qualities will you need to
address, or leave alone?
• The definition of “high-quality data” will
vary by business problem
“If you don’t know what you want to
get out of the data, how can you
know what data you need – and
what insight you’re looking for?”
Wolf Ruzicka, Chairman of the Board at EastBanc Technologies,
Blog post: June 1, 2017, “Grow A Data Tree Out Of The “Big Data”
Swamp”
To Sample or not to Sample?
Sampling helps with:
• Data Integration
• Source-to-target mapping
• Data Modeling
• Discovering Correlations
When the focus is on the structure of the data
• REMEMBER: your target is a
statistically valid sample!
• ~16k records give you 99% confidence
with a margin of error of 1% for 100B
records
• ~66k records give you 99% confidence
with a margin of error of 0.5% for the same volume
Full Volume needed with:
• Data Quality
• Data Governance
• Regulatory Compliance
• Finding Outliers and Issues
with Content
• “Needles in the haystack”
When the focus is on the quality of or risks within
the data
• Focus on critical data elements and
leverage tools that scale to data volume
Scaling Data Quality best practices:
Consistent processing at scale
Big Data at scale distributes data across many
nodes – not necessarily with other relevant data!
• Processing routines must apply the same approach and logic each
time
• Implications for profiling, joining, sorting, and matching data,
whether for enrichment, verification against trusted sources, or a
consolidated single view
Data Quality functions must be performed in a consistent manner,
no matter where actual processing takes place, how the data is
segmented, and what the data volume is.
Source: HP Analyst Briefing
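One way to picture "consistent no matter how the data is segmented" is to compute profile statistics per partition and merge them, then confirm the merged result equals a single pass over all the data. The Python sketch below is an assumption-laden illustration, not a description of any specific engine: the ColumnStats class, the three-way split, and the sample values are invented for the example.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class ColumnStats:
    """Mergeable profile of one column: counts only, so partial results add up."""
    count: int = 0
    nulls: int = 0
    values: Counter = field(default_factory=Counter)

    def add(self, value):
        self.count += 1
        if value is None or value == "":
            self.nulls += 1
        else:
            self.values[value] += 1

    def merge(self, other):
        # Same logic wherever it runs: merging is plain addition.
        return ColumnStats(
            self.count + other.count,
            self.nulls + other.nulls,
            self.values + other.values,
        )

def profile(values):
    stats = ColumnStats()
    for v in values:
        stats.add(v)
    return stats

data = ["US", "GB", None, "US", "", "FR", "US", "GB"]
whole = profile(data)

# Pretend the same data landed on three nodes in arbitrary chunks.
partitions = [data[:3], data[3:5], data[5:]]
merged = ColumnStats()
for part in partitions:
    merged = merged.merge(profile(part))

print(merged == whole)
```

Counts and frequencies merge cleanly because they are additive. Measures that are not additive, such as distinct counts across partitions or matching records that sit on different nodes, need more care, which is where the implications listed above for joining, sorting, and matching come in.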
2. What do you want to review?
Common Data Quality Measurements
What measures can we take advantage of?
1. Completeness – Are the relevant fields populated?
2. Integrity – Does the data maintain an internal structural
integrity or a relational integrity across sources?
3. Uniqueness – Are keys or records unique?
4. Validity – Does the data have the correct values?
• Code and reference values
• Valid ranges
• Valid value combinations
5. Consistency – Is the data at consistent levels of
aggregation or does it have consistent valid values
over time?
6. Timeliness – Did the data arrive in a time
period that makes it useful or usable?
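Several of the measurements above reduce to simple ratios. The following Python sketch shows one possible way to score completeness, uniqueness, validity, and timeliness; the record layout, the reference set of country codes, the age range, and the load deadline are all hypothetical, and the functions assume a non-empty list of rows.

```python
from datetime import date

# Hypothetical customer records; None marks a missing value.
records = [
    {"id": "C001", "country": "US", "age": 34, "loaded": date(2024, 1, 2)},
    {"id": "C002", "country": "GB", "age": None, "loaded": date(2024, 1, 2)},
    {"id": "C002", "country": "XX", "age": 212, "loaded": date(2024, 1, 9)},
    {"id": "C004", "country": "FR", "age": 51, "loaded": date(2024, 1, 3)},
]
VALID_COUNTRIES = {"US", "GB", "FR"}  # assumed reference values

def completeness(rows, field):
    """Share of rows where the field is populated."""
    return sum(r[field] is not None for r in rows) / len(rows)

def uniqueness(rows, field):
    """Share of distinct values; 1.0 means no duplicates."""
    vals = [r[field] for r in rows]
    return len(set(vals)) / len(vals)

def validity(rows, field, is_valid):
    """Share of populated values that pass the validity check."""
    populated = [r[field] for r in rows if r[field] is not None]
    return sum(is_valid(v) for v in populated) / len(populated)

def timeliness(rows, field, deadline):
    """Share of rows that arrived on or before the deadline."""
    return sum(r[field] <= deadline for r in rows) / len(rows)

report = {
    "age completeness": completeness(records, "age"),
    "id uniqueness": uniqueness(records, "id"),
    "country validity": validity(records, "country", VALID_COUNTRIES.__contains__),
    "age validity": validity(records, "age", lambda a: 0 <= a <= 120),
    "load timeliness": timeliness(records, "loaded", date(2024, 1, 5)),
}
for name, score in report.items():
    print(f"{name}: {score:.0%}")
```

Integrity and consistency are left out because they depend on relationships across sources or over time, which a single in-memory list cannot show.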
New data, new data quality challenges
• 3rd Party and external data with unknown provenance or relevance
• Bias in the data – whether in collection, extraction, or other processing
• Data without standardized structure or formatting
• Continuously streaming data
• Disjointed data (e.g. gaps in receipt)
• Consistency and verification of data sources
• Changes and transformation applied to data (i.e. does it really
represent the original input)
New Data Quality Problems
“34 percent of bankers in our survey report that their organization
has been the target of adversarial AI at least once, and 78 percent
believe automated systems create new risks, such as fake data,
external data manipulation, and inherent bias.”
Accenture Banking Technology Vision 2018
Let Data Profiling guide you
• Contextual visualizations
• Value and pattern distributions
• Attribute summaries and metadata
• Sort and filter to quickly find data
of interest
• Detail drilldowns to any content
3. What should you look for?
Common Data Types
What variances do you need awareness of?
1. Identifiers – data that uniquely identifies something
2. Indicators – data that flags a specific condition
3. Dates – data that identifies a point in time
4. Quantities – data that identifies an amount or value of something
5. Codes – data that segments other data
6. Text – data that describes or names something
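Each data type tends to invite its own kind of check. The pairing below (identifiers with duplicates, indicators with unexpected flags, and so on) is an assumption made for illustration rather than something the slides spell out, and every function name and sample value is hypothetical.

```python
from collections import Counter
from datetime import datetime

def duplicate_identifiers(values):
    """Identifiers: anything that appears more than once."""
    return sorted(v for v, n in Counter(values).items() if n > 1)

def unexpected_indicators(values, allowed=frozenset({"Y", "N"})):
    """Indicators: flag values outside the small expected domain."""
    return sorted(set(values) - allowed)

def unparseable_dates(values, fmt="%Y-%m-%d"):
    """Dates: strings that do not parse as a real date in the expected format."""
    bad = []
    for v in values:
        try:
            datetime.strptime(v, fmt)
        except ValueError:
            bad.append(v)
    return bad

def out_of_range_quantities(values, low, high):
    """Quantities: amounts outside the plausible range."""
    return [v for v in values if not low <= v <= high]

def unknown_codes(values, reference):
    """Codes: values missing from the reference list."""
    return sorted(set(values) - set(reference))

def untidy_text(values):
    """Text: blank strings or strings with leading/trailing whitespace."""
    return [v for v in values if v != v.strip() or not v.strip()]

print(duplicate_identifiers(["A1", "A2", "A1"]))
print(unexpected_indicators(["Y", "N", "y", "1"]))
print(unparseable_dates(["2024-02-29", "2023-02-29", "31/01/2024"]))
print(out_of_range_quantities([5, -1, 1000], low=0, high=999))
print(unknown_codes(["GOLD", "SILVER", "PLAT"], reference={"GOLD", "SILVER"}))
print(untidy_text(["Ann", " Bob", ""]))
```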
4. When do you build rules?
Build Rules for Defined Conditions
Focus on:
• Critical Data Elements (data quality dimensions)
• Policy-based conditions (e.g. regulatory
compliance)
• Correlated data conditions (e.g. If x, then y)
• Filtering and segmenting data (refining
evaluations; investigating root cause)
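A rule for a defined condition can be as small as a named true/false test per record. The Python sketch below shows a policy-style range rule and a correlated "if x, then y" rule; the rule names, the account fields, and the 18 to 120 age policy are hypothetical and only serve the example.

```python
def implies(condition, consequence):
    """Build an 'if x, then y' rule: it passes when x is false or y is true."""
    return lambda row: (not condition(row)) or consequence(row)

# Each rule takes one record and returns True when the record passes.
RULES = {
    "email_present": lambda r: bool(r.get("email")),
    # Assumed policy range, for illustration only.
    "age_in_policy_range": lambda r: r.get("age") is not None and 18 <= r["age"] <= 120,
    "closed_needs_close_date": implies(
        lambda r: r.get("status") == "CLOSED",
        lambda r: r.get("close_date") is not None,
    ),
}

def evaluate(rows, rules):
    """Return {rule name: indexes of the rows that fail it}."""
    return {
        name: [i for i, row in enumerate(rows) if not rule(row)]
        for name, rule in rules.items()
    }

accounts = [
    {"email": "a@example.com", "age": 40, "status": "OPEN", "close_date": None},
    {"email": "", "age": 17, "status": "CLOSED", "close_date": None},
    {"email": "c@example.com", "age": 130, "status": "CLOSED", "close_date": "2024-03-01"},
]
failures = evaluate(accounts, RULES)
for name, rows in failures.items():
    print(name, rows)
```

Keeping rules as small named functions makes them easy to test in isolation and to share, and the open account with no close date passes the correlated rule because the "if" side is false.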
Benefits of Business Rules
• Validate critical requirements within or
across data sources
• Build common rules that can be readily
tested and shared
• Evaluate and remediate issues
• Take action on incorrect data and defaults
• Create flags for subsequent use in marking
or remediating data
• Filter result sets and export for additional
use
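Flagging, filtering, and exporting can be sketched in a few lines of Python. This is only an illustration under assumptions: the dq_flags field name, the two order rules, and the CSV layout are invented here and do not reflect any particular tool.

```python
import csv
import io

def flag_rows(rows, rules):
    """Copy each row and attach 'dq_flags': the names of the rules it fails."""
    flagged = []
    for row in rows:
        failed = [name for name, rule in rules.items() if not rule(row)]
        flagged.append({**row, "dq_flags": failed})
    return flagged

def export_failures(flagged_rows, out):
    """Write only the flagged rows as CSV to a file-like object; return how many."""
    bad = [r for r in flagged_rows if r["dq_flags"]]
    if not bad:
        return 0
    writer = csv.DictWriter(out, fieldnames=list(bad[0].keys()))
    writer.writeheader()
    for r in bad:
        # Join multiple flags into one cell so the file stays flat.
        writer.writerow({**r, "dq_flags": "|".join(r["dq_flags"])})
    return len(bad)

# Hypothetical rules and order records.
rules = {
    "qty_positive": lambda r: r["qty"] > 0,
    "sku_present": lambda r: bool(r["sku"]),
}
orders = [
    {"sku": "A-1", "qty": 2},
    {"sku": "", "qty": 0},
    {"sku": "B-7", "qty": -3},
]

flagged = flag_rows(orders, rules)
buffer = io.StringIO()
exported = export_failures(flagged, buffer)
print(exported)
print(buffer.getvalue())
```

Because the flags travel with the record, a downstream remediation step can act on them without re-running the rules.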
5. What should you communicate?
Communicate!
Culture of Data Literacy
• “Democratization of Data” requires cultural support
Program of Data Governance
• Provide the processes and practices necessary for
success
Center of Excellence/Knowledge Base
• Where do you go to find answers?
• Who can help show you how?
Annotate Results with Findings
• Annotate what you’ve found
British Airways
Leveraging Data as a Critical Asset
About
• One of the world’s leading international
premium airlines
• 33M passengers every year
• 35,000+ employees
• Fleet of 240 aircraft
Goal
• Ensure accurate data to support
customer service, marketing,
retention and loyalty
• Implement enterprise-wide data
governance
Challenge
• Data from multiple
sources/systems, stored in many
different formats
• No enterprise standard for data
quality
• Point solutions led to varying
levels of cleanliness, inefficiencies
British Airways
Results: Trusted data for improved analysis
Solution
• Trillium Data Quality
Benefits Achieved
• Trusted data for faster, better
strategic and operational decision
making
• More effective marketing and
better customer service
Looking at the Next 90 Days…
• Make profiling actionable
• You don’t know what you don’t know until you profile
• Keep the 5 key questions top of mind!
• Join us tomorrow for part 3 of our webinar series!
Questions?
Foundational Strategies for Trust in Big Data Part 2: Understanding Your Data

