White Paper
Data Profiling Best Practices
Overview
This white paper provides an overview of best practices with data
– Examines the best scenarios for
Why Use Data Profiling Technologies?
Deployment of Data Profiling Technologies
Data Quality Management
Data Integration
Data Profiling Process
Prepare for the Project
Analysis Preparation
Review Project Initiation Document
Current Documentation
Team Training
Internal Setup/Decisions
Profiling Overview
[Figure: Profiling Overview. Stages: Project Preparation, Analysis Preparation, Extract & Format, Sampling, Analysis. Labels: Project Initiation Document, Project Preparation, Extract & Format, Analyze Samples, Profiling.]
Activity Workflow
Extract and Format the Data
Create the Extract Program(s)
Load Preparation
Sampling
Load a Sample of the Data
Analysis of the Sample
CSV: Each field is separated by a comma, and text fields are enclosed within quotes. Generally this type of file allows the first row to contain the column names.

CSV File Definition: Some products require or allow you to create definition rules for CSV files. It is helpful to add or change column names or add descriptions to the attributes.

Flat File Definition: Varies based on the data profiling product chosen, ranging from a flattened copybook (or the equivalent for the language used) to predefined formats specific to the tool itself.

ODBC Connection: Open Database Connectivity, a standard database access method developed by Microsoft Corporation. The goal of ODBC is to let any application access any data, regardless of which database management system (DBMS) is handling the data.
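The CSV layout described above (comma-separated fields, quoted text fields, column names in the first row) can be illustrated with a minimal sketch using Python's standard `csv` module. The sample data, the column names, and the `load_sample` helper are hypothetical and are not taken from any profiling product.

```python
import csv
import io

# Hypothetical sample: the first row holds the column names and the
# text fields are enclosed in quotes so they may contain commas.
SAMPLE_CSV = 'id,name,ssn\n1,"Smith, John",123-45-6789\n2,"Doe, Jane",\n'


def load_sample(text):
    """Return (column_names, rows) from comma-separated text with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)  # each row is a dict keyed by column name
    return reader.fieldnames, rows


columns, rows = load_sample(SAMPLE_CSV)
print(columns)          # column names taken from the first row
print(rows[0]["name"])  # quoted field keeps its embedded comma
```

An empty trailing field, as in the second data row, is read as an empty string rather than a null, which matters when blanks and nulls are counted separately during analysis.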
Adjust the Extracts and Formats of the Data
Produce Deliverables
Delete the Samples
Analysis
Analysis Assistant
Code
Blanks/Nulls/Low Values/High Values
Minimums/Maximums
Patterns
Duplicates / Inconsistencies
Invalid Codes
Identify Keys
Key Testing
Join Testing
Low Value: 000-00-0000
NULL
High Value: 999-99-9999
System: System 1
Minimum: 000-00-00001
Maximum
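As a rough illustration of the blank, null, low-value, high-value, and minimum/maximum checks, the sketch below classifies each value in a column and then reports the range of the remaining populated values. The low and high sentinels come from the sample values above. The function names, the sample column, and the choice to compare values as strings are assumptions made for this example.

```python
from collections import Counter

# Sentinel values taken from the sample table; real systems may use others.
LOW_VALUES = {"000-00-0000"}
HIGH_VALUES = {"999-99-9999"}


def classify(value):
    """Label a single value as null, blank, low value, high value, or populated."""
    if value is None:
        return "null"
    if value.strip() == "":
        return "blank"
    if value in LOW_VALUES:
        return "low value"
    if value in HIGH_VALUES:
        return "high value"
    return "populated"


def profile_column(values):
    """Count each category and report min/max over populated values only."""
    counts = Counter(classify(v) for v in values)
    populated = [v for v in values if classify(v) == "populated"]
    return {
        "counts": dict(counts),
        # String comparison; numeric or date columns would need conversion first.
        "minimum": min(populated) if populated else None,
        "maximum": max(populated) if populated else None,
    }


# Hypothetical column extracted from one system.
column = ["123-45-6789", None, "", "000-00-0000", "999-99-9999", "234-56-7890"]
profile = profile_column(column)
print(profile)
```

Excluding the sentinels before taking the minimum and maximum keeps placeholder values from masking the real range of the data.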
System      Values         Pattern
System 1    123-45-6789    9(3)-9(2)-9(4)
System 1    12-3456789     9(2)-9(7)
System 2    123456789      9(9)
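The pattern column can be derived mechanically from the values. The sketch below assumes that `9(n)` stands for a run of n digits, which is how this example reads the table. Using `A(n)` for a run of letters is an added assumption, and `value_pattern` is a hypothetical helper rather than a function of any profiling tool.

```python
from collections import Counter
from itertools import groupby


def _kind(ch):
    """Map a character to its pattern class: digit, letter, or itself."""
    if ch.isdigit():
        return "9"
    if ch.isalpha():
        return "A"
    return ch


def value_pattern(value):
    """Collapse runs of digits to 9(n) and runs of letters to A(n); keep other characters."""
    parts = []
    for kind, run in groupby(value, key=_kind):
        n = len(list(run))
        if kind in ("9", "A"):
            parts.append(f"{kind}({n})")
        else:
            parts.append(kind * n)  # separators such as '-' are kept literally
    return "".join(parts)


# Sample values from the table above.
values = ["123-45-6789", "12-3456789", "123456789"]
pattern_counts = Counter(value_pattern(v) for v in values)
for pattern, count in pattern_counts.items():
    print(pattern, count)
```

Counting values per pattern shows at a glance how many distinct formats a column carries and which format dominates.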
System      Values
System 1    123-45-6789
System 1    123-45-6789
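A duplicate check like the one in the sample above can be sketched by grouping values and keeping those that occur more than once. The `find_duplicates` helper and the third record are hypothetical and are included only to show a value that is not duplicated.

```python
from collections import defaultdict


def find_duplicates(records):
    """Return {value: [systems]} for every value that appears more than once.

    `records` is an iterable of (system, value) pairs.
    """
    seen = defaultdict(list)
    for system, value in records:
        seen[value].append(system)
    return {value: systems for value, systems in seen.items() if len(systems) > 1}


records = [
    ("System 1", "123-45-6789"),
    ("System 1", "123-45-6789"),
    ("System 2", "987-65-4321"),  # hypothetical non-duplicate record
]
duplicates = find_duplicates(records)
print(duplicates)
```

Keeping the list of source systems with each duplicated value shows whether the repetition occurs within one system or across several.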
Outputs
For more information about our products
and services, please log onto our website at
www.g1.com or call us today at 888-413-6763.
4200 Parliament Place, Suite 600
Lanham, MD 20706-1844
1-888-413-6763 • www.g1.com
Group 1, Group 1 Software and the Group 1 logo are registered trademarks of Group 1
Software, Inc. Pitney Bowes and the Pitney Bowes logo are registered trademarks and the
Pitney Bowes Process Bar Design is a trademark of Pitney Bowes Inc. Group 1 Software
is a Pitney Bowes company. All other marks referenced in this material are the property of
their respective owners.
© 2007 Group 1 Software, Inc. All rights reserved.
An Equal Opportunity Employer. Printed in U.S.A.
