4/23/15

Survival Guide: Taming the Data Quality Beast

By Shauna Ayers and Catherine Cruz Agosto
  
About

• Availity is a trusted intermediary for information exchange between health plans and providers
• Availity eases the complexity of moving business and clinical information to health care stakeholders nationwide
• Availity’s real-time, point-to-point connectivity provides speed and accuracy at the intersection of health care and technology
• Availity’s tools include:
  – A multi-payer Web Portal
  – An all-payer Advanced Clearinghouse
  – A powerful Revenue Cycle Management suite
  – A smarter Patient Access solution
  
  
Overview

• Data Quality Definitions and Impact
• The 5 Goals of Data Quality
• The 4 Pillars of Data Quality
• The Flow of Your Data
• The 4 V’s of Your Data Sets
• The Properties of Your Data
• Sharing the Health of Your Data
  
Definitions and Impact

• Data quality is data's fitness and usability for its intended purpose.
• Data quality assurance is the monitoring and analysis of data sets, and of the processes that create or manipulate data, to ensure that the data’s quality meets the company's needs.
• The role of data quality assurance within the company is to identify problems with its data and to manage them: preventing them wherever possible and correcting those that cannot be prevented.
• Functions supporting data quality assurance, and frequently integrated with it, include but are not limited to data governance, data architecture, data stewardship, data quality testing, and data cleansing.
  
  
The 5 Goals of Data Quality

• Prevent
• Detect
• Communicate
• Mitigate
• Correct

These goals guide us and light our path.
  
The 4 Pillars of Data Quality

• Analysis and Profiling
• Strategies and Tactics
• Testing
• Intelligence
  
  
The Flow of Your Data

• Data is not static. It constantly flows between data sets and applications in continuing waves of gathering, delivery, storage, integration/transformation, retrieval, and analysis.
• …So, how do we test a moving target?
  
The 4 V’s of Your Data Sets

The scale of your data is driven by the four V’s:
• Volume
• Variety
• Vitality
• Velocity

The boundaries of each data set are defined by business rules and constraints. The content of each data set is what is measured or evaluated.
  
  
The Properties of Your Data

The quality of your data is driven by various properties:
• Accuracy
• Completeness
• Timeliness
• Consistency
• Validity
• Temporal Reliability
• Interpretability
• Accessibility
• Usage
• Precision
• Uniqueness

Property + Business Value = Impact of a Quality Problem
  
Sharing the Health of Your Data

To find your quarry and tame it, you must be able to see the forest for the trees. Artifacts used to communicate data system health:
• Dashboards
• System monitoring alerts
• Reports
• Bug-tracking tickets
  
  
Analysis and Profiling Pillar

Analyzing the data can yield valuable insight into it, shedding light on patterns that might not have been seen previously. Profiling allows similar data to be grouped.
• Categorization
• Methods
• “Gotchas” and possible challenges
• Gathering metrics
  – On data
  – On test coverage
• Dependencies, relationships, and patterns
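
A first profiling pass can be as simple as counting nulls, distinct values and the most common values per column. The sketch below assumes rows arrive as Python dicts. The function name `profile_columns` and the column names in the usage line are hypothetical and do not come from the deck or from any particular profiling tool.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def profile_columns(rows: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Build a simple per-column profile: row count, nulls, distinct values, top values.

    Hypothetical helper; rows are assumed to be dicts keyed by column name.
    """
    rows = list(rows)
    columns = sorted({col for row in rows for col in row})
    profile: Dict[str, Dict[str, Any]] = {}
    for col in columns:
        # A missing key is treated the same as an explicit None (both count as null).
        values = [row.get(col) for row in rows]
        non_null = [v for v in values if v is not None]
        counts = Counter(non_null)
        profile[col] = {
            "rows": len(values),
            "nulls": len(values) - len(non_null),
            "distinct": len(counts),
            "top": counts.most_common(3),  # up to three most frequent values
        }
    return profile
```

Example use: `profile_columns([{"state": "FL", "plan": "A"}, {"state": "FL", "plan": None}])`. Comparing such profiles across loads is one way to spot the patterns, dependencies and "gotchas" listed above.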
  
Strategies and Tactics Pillar

Most companies use a mix of strategies and tactics, such as:
• Input validation
• Critical value checks (sampling or periodic analysis of standing data)
• In-line validation
• Hash values and checksums
• Tolerance checks and statistical analysis
• Architectural and domain integrity checks

Without a plan, your results can be haphazard.
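
Tolerance checks and statistical analysis can be illustrated with two small functions. The first passes if a metric's percentage change stays within a set band. The second flags a value lying more than a chosen number of standard deviations from its history (a z-score test). This is a minimal sketch: the function names, the ±10% band and the 3-sigma threshold are illustrative assumptions, not values from the deck.

```python
from statistics import mean, pstdev
from typing import Sequence


def pct_change(current: float, baseline: float) -> float:
    """Percentage change of current relative to a non-zero baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline * 100.0


def within_tolerance(current: float, baseline: float, max_pct: float) -> bool:
    """Tolerance check: pass if the % change stays within +/- max_pct."""
    return abs(pct_change(current, baseline)) <= max_pct


def z_score_outlier(history: Sequence[float], current: float, threshold: float = 3.0) -> bool:
    """Statistical check: True if current lies more than `threshold`
    population standard deviations from the historical mean."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Flat history: any deviation at all is unusual.
        return current != mu
    return abs(current - mu) / sigma > threshold
```

For example, a daily load of 1,200 rows against a 1,000-row baseline fails a ±10% tolerance check. The right band for each data set is a business decision.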
  	
  
  
Testing Pillar

Types of tests
• Count checks
• Compare checks
• Business Rule Validation
• Null value checks
• Code Checks

Methods and Strategies
• Exploratory
• Manual
• Automated

Tools
• Buying vs. In-house
• A machine cannot replace a human
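
One way to automate several of these test types is to write each check as a function that returns a pass/fail result, then run them all as a suite. The sketch below rests on several hypothetical assumptions:

• source and target rows are dicts
• the table has an `amount` column
• "amounts must not be negative" is a sample business rule

Nothing here describes the authors' tooling.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


def count_check(source: Sequence[Row], target: Sequence[Row]) -> CheckResult:
    """Count check: the target should hold as many rows as the source."""
    ok = len(source) == len(target)
    return CheckResult("count", ok, f"source={len(source)} target={len(target)}")


def null_amount_check(source: Sequence[Row], target: Sequence[Row]) -> CheckResult:
    """Null value check on a hypothetical 'amount' column in the target."""
    nulls = [r["id"] for r in target if r.get("amount") is None]
    return CheckResult("null amount", not nulls, f"null ids={nulls}")


def non_negative_rule(source: Sequence[Row], target: Sequence[Row]) -> CheckResult:
    """Business rule validation (hypothetical rule): amounts must not be negative."""
    bad = [r["id"] for r in target
           if isinstance(r.get("amount"), (int, float)) and r["amount"] < 0]
    return CheckResult("non-negative amount", not bad, f"bad ids={bad}")


Check = Callable[[Sequence[Row], Sequence[Row]], CheckResult]
CHECKS: List[Check] = [count_check, null_amount_check, non_negative_rule]


def run_checks(checks: List[Check], source: Sequence[Row], target: Sequence[Row]) -> List[CheckResult]:
    """Run every check; an exception is recorded as a failure instead of stopping the run."""
    results = []
    for check in checks:
        try:
            results.append(check(source, target))
        except Exception as exc:
            results.append(CheckResult(check.__name__, False, f"error: {exc}"))
    return results
```

Automated suites like this cover the repeatable cases. As the slide notes, they complement exploratory and manual testing rather than replace a human tester.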
  
Intelligence Pillar

Data Quality intelligence provides visibility into the data environment, supporting:
• Operational Troubleshooting
• Process Improvement
• Risk Analysis
• Data Governance and Regulatory Compliance

Metrics useful for DQ Intelligence
• Current state: unresolved defects or failed tests
• Property Tolerances: e.g., histogram analysis, % change over time
• Defect Trends over time: defect count by data set or type
• Test Coverage: % of possible tests that are implemented
  
  
Property: Accuracy

• Definition: Whether the data values stored for an object are the correct values. To be correct, a data value must be the right value and must be represented in a consistent and unambiguous form.
• Possible DQ checks: Hash values and checksums, business rule validations, source-to-target value comparisons
• Examples:
  – Mismatch between labeling and content
  – American vs. European date formats
  – “John Doe” vs. “JOHN DOE”
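
A source-to-target value comparison with hash values can be sketched as follows. Hash the selected fields of each row in a fixed order, then report the keys whose hashes differ or that are missing from the target. This assumes rows are dicts sharing a key column. `row_hash` and `mismatched_keys` are hypothetical names, and SHA-256 from Python's standard `hashlib` is just one reasonable choice of hash.

```python
import hashlib
from typing import Dict, Iterable, List, Sequence

Row = Dict[str, object]


def row_hash(row: Row, fields: Sequence[str]) -> str:
    """SHA-256 over the selected fields in a fixed order.

    repr() keeps None distinct from the string "None"; the unit-separator
    character keeps field boundaries unambiguous.
    """
    canonical = "\x1f".join(repr(row.get(f)) for f in fields)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def mismatched_keys(source: Iterable[Row], target: Iterable[Row],
                    key: str, fields: Sequence[str]) -> List[object]:
    """Keys whose field values differ between source and target,
    including keys missing from the target entirely."""
    target_hashes = {row[key]: row_hash(row, fields) for row in target}
    mismatches = []
    for row in source:
        if target_hashes.get(row[key]) != row_hash(row, fields):
            mismatches.append(row[key])
    return mismatches
```

A strict hash like this reports “John Doe” vs. “JOHN DOE” as a mismatch. Whether that counts as an accuracy defect is a business-rule decision. If it should not, normalize values before hashing.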
  
Property: Completeness

• Definition: All the data required to meet the requirements/business need is available in the target.
• Possible DQ checks: Source-to-target count checks, compare checks, not-null checks
• Examples:
  – Inconsistent data types between source and target
  – A column whose not-null rule is not enforced is null in the target
  – Missing criteria in a filter cause records to be missed
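
Two of the completeness checks can be sketched in a few lines:

• a compare check that finds source IDs absent from the target, such as rows dropped by a filter with missing criteria
• a not-null check on columns the business requires but the schema may not enforce

This assumes each row carries an `id` field. The column name `npi` in the test data is a hypothetical example.

```python
from typing import Dict, Iterable, List, Sequence, Set

Row = Dict[str, object]


def missing_ids(source_ids: Iterable[object], target_ids: Iterable[object]) -> Set[object]:
    """Compare check: IDs present in the source but absent from the target."""
    return set(source_ids) - set(target_ids)


def null_violations(rows: Iterable[Row], required: Sequence[str]) -> Dict[str, List[object]]:
    """Not-null check for required columns; returns {column: [ids of offending rows]}.

    Assumes every row has an 'id' field for reporting.
    """
    violations: Dict[str, List[object]] = {col: [] for col in required}
    for row in rows:
        for col in required:
            if row.get(col) is None:
                violations[col].append(row.get("id"))
    # Report only the columns that actually have violations.
    return {col: ids for col, ids in violations.items() if ids}
```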
  
  
Property: Timeliness

• Definition: Whether data is visible when the user or consuming application expects it to be.
• Possible DQ checks: Process control tolerance checks, ID comparisons, missing update checks
• Examples:
  – Package delivery
  – Credit card account activity
  – CRM data
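
A missing-update check can be sketched as a freshness test. Compare each feed's last update time with the current time and flag feeds older than an allowed age. This assumes timezone-aware UTC timestamps. The feed names and the 24-hour limit in the test are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def stale_feeds(last_updated: Dict[str, datetime], max_age: timedelta,
                now: Optional[datetime] = None) -> List[str]:
    """Missing-update check: names of feeds whose last update is older than max_age.

    Timestamps are expected to be timezone-aware; `now` defaults to the current UTC time.
    """
    now = now or datetime.now(timezone.utc)
    return sorted(name for name, ts in last_updated.items() if now - ts > max_age)
```

Passing `now` explicitly keeps the check deterministic in tests and reruns.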
  
Property: Consistency

• Definition: The process works every time. No matter which source you get the data from, corresponding values should be the same.
• Possible DQ checks: Business Rule Validation, Source-to-target Compare
• Examples:
  – Table A shows one address for a customer and Table B shows another
  – Account information differs when viewing the profile on the website vs. the mobile app
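
The Table A vs. Table B address example can be turned into a compare check. For keys present in both tables, compare the attribute after light normalization, so that spacing and letter case are not reported as inconsistencies. This is a minimal sketch. It assumes each table is already loaded as a key-to-address mapping, and the normalization rule is an illustrative choice.

```python
from typing import Dict, List, Tuple


def normalize(value: str) -> str:
    """Trim, collapse internal whitespace, and uppercase (illustrative rule)."""
    return " ".join(value.split()).upper()


def inconsistent_values(table_a: Dict[int, str],
                        table_b: Dict[int, str]) -> List[Tuple[int, str, str]]:
    """Return (key, value_a, value_b) for keys present in both tables whose values differ."""
    diffs = []
    for key in sorted(table_a.keys() & table_b.keys()):
        a, b = table_a[key], table_b[key]
        if normalize(a) != normalize(b):
            diffs.append((key, a, b))
    return diffs
```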
  
  
Property: Validity

• Definition: The correctness and reasonableness of data: how well it conforms to the syntax (format, type, range) of its definition.
• Possible DQ checks: Input validation, parametric checks, domain checks
• Examples:
  – Two-digit years on birthdates for Medicare enrollees
  – Negative cycle times
  – Invalid customer codes
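
The three validity examples map onto format, range and domain checks. The sketch below makes several hypothetical assumptions:

• dates are stored as ISO 8601 strings (YYYY-MM-DD)
• the fields are named `birth_date`, `cycle_time_hours` and `customer_code`
• the set of allowed customer codes is supplied by the caller

The regular expression enforces a four-digit year. `datetime.strptime` then rejects impossible calendar dates.

```python
import re
from datetime import datetime
from typing import Dict, List, Set

# Assumption: dates are stored as ISO 8601 strings with a four-digit year.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_iso_date(value: object) -> bool:
    """Format check (YYYY-MM-DD) plus a calendar check via strptime."""
    if not isinstance(value, str) or not ISO_DATE.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def validate_record(rec: Dict[str, object], valid_codes: Set[str]) -> List[str]:
    """Return the validity rules the record breaks (an empty list means valid)."""
    errors = []
    if not is_iso_date(rec.get("birth_date")):
        errors.append("birth_date: expected YYYY-MM-DD with a four-digit year")
    cycle = rec.get("cycle_time_hours")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(cycle, bool) or not isinstance(cycle, (int, float)) or cycle < 0:
        errors.append("cycle_time_hours: expected a non-negative number")
    if rec.get("customer_code") not in valid_codes:
        errors.append("customer_code: not in the allowed domain")
    return errors
```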
  
Property: Temporal Reliability

• Definition: The reliability of time-dependent data
• Possible DQ checks: Source-to-target count checks, compare checks
• Examples:
  – Source-to-view updates change from daily to real-time
  – The process that loads data into the source table is delayed
  
	
  
  
Property: Interpretability

• Definition: How easy it is to extract understandable information from the data
• Possible DQ checks: Histograms, source-to-target ID compares over a date range
• Examples:
  – Units of measurement: a metric mishap caused the loss of a NASA orbiter
  
Property: Accessibility

• Definition: Is the data available?
• Possible DQ checks: Security checks, source-to-target checks
• Examples:
  – A user cannot find a record when searching by one identifier but can find it using a different identifier
  – Order specific
  
  
Property: Usage

• Definition: Does the data support the usage to which it is being applied?
• Possible DQ checks: Duplicate checks, histograms, ID compares over time, domain checks
• Examples:
  – Time zone assumptions: data from the future
  – Page rankings derived from links to the page
  – Cross-grain configuration values (“All” or “Other”)
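
"Data from the future" often appears when a local timestamp is stored as if it were UTC. A minimal guard is to reject timestamps with no time zone, then flag any timestamp later than now plus a small allowed clock skew. The five-minute skew and the function name are hypothetical choices.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


def future_timestamps(stamps: Iterable[datetime], now: Optional[datetime] = None,
                      skew: timedelta = timedelta(minutes=5)) -> List[datetime]:
    """Flag timestamps later than `now` plus an allowed clock skew.

    Naive timestamps raise ValueError: guessing their zone is exactly the
    assumption that produces "data from the future".
    """
    now = now or datetime.now(timezone.utc)
    flagged = []
    for ts in stamps:
        if ts.tzinfo is None:
            raise ValueError(f"naive timestamp {ts!r}: time zone unknown")
        if ts > now + skew:
            flagged.append(ts)
    return flagged
```

For example, consider an event recorded at 20:00 local time in a UTC+9 zone but tagged as UTC. It appears eight hours ahead of a 12:00 UTC "now", so it is flagged. The same wall-clock time tagged with its real offset (11:00 UTC) passes.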
  
Property: Precision

• Definition: Correlation between reality and what is shown in the data.
• Possible DQ checks: Business Rule Validation, source-to-target comparison
• Examples:
  – Incorrect address displayed for a customer
  – Customer A’s data shown on Customer B’s account page
  – Calculations
  
  
Property: Uniqueness

• Definition: What makes a data entity one of its kind.
• Possible DQ checks: Duplicate checks
• Examples:
  – Multiple customer entries in a CRM system
  – Multiple conflicting configuration entries for the same entity
  – Duplicate inventory entries
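
A duplicate check groups rows by a normalized natural key and reports any key that occurs more than once. This sketch makes three hypothetical assumptions:

• rows are dicts with an `id` field
• customer name plus date of birth is treated as the natural key
• the normalization is trimming plus uppercasing

Real matching rules are a business decision.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, object]


def duplicate_groups(rows: Iterable[Row], key_fields: Sequence[str],
                     id_field: str = "id") -> Dict[Tuple[str, ...], List[object]]:
    """Group rows by a normalized natural key; return only keys shared by more than one row."""
    groups: Dict[Tuple[str, ...], List[object]] = defaultdict(list)
    for row in rows:
        # Trim and uppercase each key field so "John Doe" and "JOHN DOE " collide.
        key = tuple(str(row.get(f, "")).strip().upper() for f in key_fields)
        groups[key].append(row[id_field])
    return {k: ids for k, ids in groups.items() if len(ids) > 1}
```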
  
Overall Picture / Conclusion

• Any expedition to ensure data quality in the living, dynamic data ecosystem that exists in every company requires:
  – clear goals to guide efforts,
  – a functional framework providing the tools to work with,
  – an understanding of the living flow of your data,
  – an understanding of its fundamental shape and nature, and
  – clear communication of these elements to all members of the party involved.
  	
  
