Data Clean-Up: Is There a Better Way? Margaret Hogarth ER&L, 2/2/2010 © Camille Pissarro, 1882, “Peasant Woman Digging,” ARTstor 40-06-12/42
To Be Covered: Commonalities Issues Deficits Excel Access MarcEdit Global Update Data Quality Policy Suggestions? © The Metropolitan Museum of Art, “Book of the Knowledge of Ingenious Mechanical Devices by al-Jazari,” ARTstor
Commonalities Data sources Tools Data clean-up Capture and use issues Needed technical skills bonobokids.com
General Approach Keep original data original; copy to a new worksheet Save frequently Use meaningful file/worksheet names Folder of “final” documents © John Vink, 2005, "CAMBODIA," ARTstor PAR287761
Issues Import issues System limitations Dirty data Non-standardized data Standardized but variable data Application-related issues Others? © Miguel Rio Branco, 1985, "BRAZIL," ARTstor PAR52903
Deficits Time Staff Budget Skills Development Confidence Big-picture view Systems © John Murphy, “Men Digging," ARTstor DMCC.1993.4
System Limitations Field character limits Report field limits
Import => Excel, Zeros Problem: Loss of trailing zero; loss of leading zeros. Solution: Import > Delimited > Tab > Text format. Example: 1944826 => 1944-8260; 14826 => 0001-4826
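The same principle (treat identifiers as text, not numbers) applies outside Excel. A minimal sketch in Python, assuming a tab-delimited export; the sample data and the column names `title` and `issn` are hypothetical:

```python
import csv
import io

# Hypothetical tab-delimited export; in practice this would be a file on disk.
sample = "title\tissn\nJournal A\t00014826\nJournal B\t19448260\n"

def read_rows_as_text(handle):
    """Read a tab-delimited file, keeping every value as a string."""
    return list(csv.DictReader(handle, delimiter="\t"))

rows = read_rows_as_text(io.StringIO(sample))
for row in rows:
    # Leading and trailing zeros survive because nothing is converted to a number.
    print(row["title"], row["issn"])
```

Because the csv module never guesses a numeric type, the zeros are preserved without any format step.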
Add Hyphens to ISSNs - 1 Sort by ISSN to make sure leading zeros are intact. In a new column type the formula =MID(A1,1,4)&"-"&MID(A1,5,4) (use straight quotes; curly quotes will not work in a formula). Syntax: =MID(text,start_num,num_chars)
Add Hyphens to ISSNs - 2 Or use Cell Formatting Select range Format Cells (CTRL+1) > Number tab > Custom > 0000-0000 > OK
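For comparison, the MID-based hyphenation can be sketched in a few lines of Python. The function name `hyphenate_issn` is made up for illustration, and the rule of leaving anything that is not exactly eight characters untouched is an assumption, chosen so that damaged values stay visible for review:

```python
def hyphenate_issn(raw):
    """Return an 8-character ISSN as NNNN-NNNN; leave anything else unchanged."""
    value = raw.strip()
    if len(value) == 8 and "-" not in value:
        return value[:4] + "-" + value[4:]
    return value

print(hyphenate_issn("00014826"))
print(hyphenate_issn("1944-8260"))
```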
Import => Excel, Numbers Problem: ID # garbled Solution: Choose Number format Remove decimal places
Import => Excel, Commas Problem: Solution: Comma
Restore Dropped Leading Zeros Change to TEXT format. Sort by ISSN. Use =CONCATENATE("000",A1) to add zeros (straight quotes; adjust the number of zeros to the number dropped)
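When it is unclear whether a zero was dropped from the front or the back, the ISSN check character can help decide. The sketch below is not part of the original workflow; it uses the standard ISSN check calculation (weights 8 down to 2 on the first seven digits, modulo 11), and the helper names are hypothetical:

```python
def issn_check_char(first_seven):
    """Compute the ISSN check character from the first seven digits."""
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    remainder = total % 11
    if remainder == 0:
        return "0"
    check = 11 - remainder
    return "X" if check == 10 else str(check)

def is_valid_issn(eight):
    """True if an unhyphenated 8-character string has a correct check character."""
    if len(eight) != 8 or not eight[:7].isdigit():
        return False
    return issn_check_char(eight[:7]) == eight[7].upper()

def repair_candidates(damaged):
    """List valid ISSNs obtainable by padding zeros on the left and/or right."""
    missing = 8 - len(damaged)
    found = []
    for left in range(missing + 1):
        candidate = "0" * left + damaged + "0" * (missing - left)
        if is_valid_issn(candidate):
            found.append(candidate)
    return found

print(repair_candidates("1944826"))
print(repair_candidates("14826"))
```

A damaged value can in principle have more than one valid repair, which is why the function returns a list for a person to review rather than a single answer.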
Excel: Remove Quotes Select column. Find & Select > Replace > Find what: [space]" > Replace with: [leave blank]
Remove Non-Printable Characters Use TRIM (removes ASCII value 32 = space character, except single spaces between words). Use CLEAN (removes ASCII codes 0-31 only). Use SUBSTITUTE for higher codes that CLEAN and TRIM leave behind (Unicode 127, 129, 141, 143, 144, 157, and the non-breaking space, 160). Book of Hours, c. 1440 “Use of Tournai,” © ARTstor, Rawl.liturg. e.14_roll314.1_frame3
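The TRIM, CLEAN, and SUBSTITUTE steps can be approximated in one pass outside Excel. This is a rough Python sketch, not an exact reimplementation of the Excel functions; the function name `clean_text` is hypothetical, and converting the non-breaking space (code 160) to an ordinary space is an assumption about what is wanted:

```python
import re

# Control characters 0-31, plus the higher codes 127, 129, 141, 143, 144, 157.
_NONPRINTING = re.compile(r"[\x00-\x1f\x7f\x81\x8d\x8f\x90\x9d]")

def clean_text(value):
    """Remove nonprinting characters and normalize spaces."""
    value = value.replace("\xa0", " ")      # non-breaking space (160) -> ordinary space
    value = _NONPRINTING.sub("", value)     # drop control and other nonprinting codes
    return re.sub(" +", " ", value).strip(" ")  # collapse runs of spaces, trim the ends

print(clean_text("  Journal\x00 of\xa0\xa0 Data\x07  "))
```

Note that tabs and line breaks fall in the 0-31 range and are removed outright, so words separated only by a tab will be joined.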
ASCII Characters http://en.wikipedia.org/wiki/ASCII 12/22/2009
Access: Subscript Out of Range - 1 Try These Steps: Check for spaces in column headings. Use TRIM (removes ASCII value 32 = space character). Use CLEAN for ASCII codes 0-31. Delete empty right columns. Remove empty “used” cells: Find end of “used” cells: CTRL+SHIFT+END. Select all empty “used” cells > Edit > Clear > All or Edit > Delete. Save the file.
Access: Subscript Out of Range - 2 Copy and paste cells into a new workbook. Save. Import into Access. Or, save file as CSV, import into Access. = Will see data error. Unknown, circa 1885-1900, “Sorting Mail,” © ARTstor MC212-D-99
Access: Type Conversion Failure Make sure data types in fields match data types in columns. Data like ISBNs are text but can be “read” like numbers. Add top row with correct data/type: XXX for ISBN
Access: Remove Quotes Search for records containing quotation marks: Criteria: LIKE "*" & Chr(34) & "*" Replace([SomeField],Chr(34),"") will replace each quotation mark (") with a zero-length string © Erich Lessing, Bayeux Tapestry, c. 1070-80, ARTstor 31-01-01/23
Access: ISSN Issues Find too-short ISSNs: Len([FieldName])<n [9 is good here] Find ISSNs without hyphens: SELECT table.field, table.field FROM table WHERE (((table.field) Not Like "*-*")); © Eve Arnold, 1979, “Hsishuang Panna Weeding,” ARTstor
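The two ISSN checks translate directly to other SQL engines. The sketch below uses Python's built-in sqlite3 module rather than Access, so the wildcard is `%` instead of `*` and the length function is `LENGTH` instead of `Len`; the table `titles` and its contents are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles (title TEXT, issn TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO titles VALUES (?, ?)",
    [("Journal A", "0001-4826"), ("Journal B", "19448260"), ("Journal C", "1482-6")],
)

# ISSNs shorter than the 9 characters of a full NNNN-NNNN value.
too_short = [r[0] for r in conn.execute(
    "SELECT title FROM titles WHERE LENGTH(issn) < 9 ORDER BY title")]

# ISSNs with no hyphen anywhere.
no_hyphen = [r[0] for r in conn.execute(
    "SELECT title FROM titles WHERE issn NOT LIKE '%-%' ORDER BY title")]

print(too_short)
print(no_hyphen)
```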
Access & ARL Stats - 1 =Sum([YTD Total])  Sum of article downloads in COUNTER Journal 1 report. =[Jan-09]+[Feb-09]+[Mar-09]+[Apr-09]+[May-09]+[Jun-09]  Sum of Jan-Jun 2009 COUNTER J1. =[Jul-09]+[Aug-09]+[Sep-09]+[Oct-09]+[Nov-09]+[Dec-09]  Sum of Jul-Dec 2009 COUNTER J1.
Access & ARL Stats - 2 RowCount: Count(*) Number of titles in a set. =[Cost]/[YTD Total] Annual cost-per-use (cost divided by use). Access Expressions: http://office.microsoft.com/en-us/access/HA011814491033.aspx
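The same statistics can be sketched in Python for a quick cross-check of the Access results. The data below is invented, the field names are hypothetical, and cost per use is taken as cost divided by uses, with zero-use titles reported as `None` rather than raising a division error:

```python
# Hypothetical rows standing in for a COUNTER Journal Report 1 table (Jan-Dec counts).
titles = [
    {"title": "Journal A", "monthly": [10, 12, 8, 9, 11, 10, 7, 6, 9, 12, 14, 12], "cost": 600.0},
    {"title": "Journal B", "monthly": [0] * 12, "cost": 250.0},
]

def half_year_totals(monthly):
    """Return (Jan-Jun total, Jul-Dec total) for twelve monthly counts."""
    return sum(monthly[:6]), sum(monthly[6:])

def cost_per_use(cost, uses):
    """Cost divided by uses; None when there were no uses."""
    return None if uses == 0 else cost / uses

row_count = len(titles)                               # number of titles in the set
ytd_total = sum(sum(t["monthly"]) for t in titles)    # all downloads across titles

for t in titles:
    first, second = half_year_totals(t["monthly"])
    print(t["title"], first, second, cost_per_use(t["cost"], first + second))
```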
Access or Excel? Access: Relational Large amount of data Primary key Many people working Long text strings Excel: Non-relational Mostly numeric Calculations/Statistics Nelson, Emma. 2010. Using Access or Excel to Manage Your Data.  http://office.microsoft.com/en-us/help/HA010429181033.aspx See also: Microsoft. 2010. Examples of Expressions. http://office.microsoft.com/en-us/access/HA011814491033.aspx
XML - Excel Excel can interpret XML Data > Get External Data > From XML Data Import Format without affecting source data Later Excel: Activate Developer tab through Office logo (upper left)
MarcEdit: XML - 1 Convert large XML files to Excel Specify input, output files Choose MARC21XML => MARC
MarcEdit: XML - 2 Choose display fields, input, output files View, format in Excel
MarcEdit: MARC - 1 Convert large files to local practices DELETE  existing  999  field 910  field(s)… Copy 035 to 001 Remove (Sc-P) Prefix from 001 ADD   910 |aDEL SCP ; jc ; 2009/7/8 Field: 910 Field data: \\$aDEL SCP ; jc ; 2009/7/8 998 |an Field: 998 Field data: \\$an
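To make the sequence of edits concrete, here is a minimal Python sketch that applies the same delete, copy, and add steps to one record written as mnemonic-style text lines. It does not use MarcEdit or any MARC library; the line layout (`=TAG`, two spaces, then indicators and `$`-prefixed subfields) and the function name are assumptions for illustration only:

```python
def apply_local_edits(record_lines, note="DEL SCP ; jc ; 2009/7/8"):
    """Apply the delete/copy/add steps to one record given as mnemonic-style lines."""
    # DELETE existing 999 and 910 fields.
    kept = [ln for ln in record_lines if not ln.startswith(("=999", "=910"))]

    # Take the 035 $a value, if present, and strip the (Sc-P) prefix.
    new_001 = None
    for ln in kept:
        if ln.startswith("=035") and "$a" in ln:
            new_001 = ln.split("$a", 1)[1].split("$", 1)[0].replace("(Sc-P)", "", 1)
            break

    # Copy that value into the 001 field.
    out = []
    for ln in kept:
        if ln.startswith("=001") and new_001 is not None:
            out.append("=001  " + new_001)
        else:
            out.append(ln)

    # ADD the local 910 and 998 fields (blank indicators written as backslashes).
    out.append("=910  \\\\$a" + note)
    out.append("=998  \\\\$an")
    return out

record = [
    "=LDR  00000nas a2200000 a 4500",
    "=001  old123",
    "=035  \\\\$a(Sc-P)456789",
    "=245  00$aJournal A",
    "=910  \\\\$aold note",
    "=999  \\\\$ax",
]
for line in apply_local_edits(record):
    print(line)
```

For real batches, MarcEdit's own tools remain the safer route; the sketch only shows the order of operations.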
MarcEdit: MARC - 2
MarcEdit Information By Terry Reese http://oregonstate.edu/~reeset/marcedit/html/  MARCEDIT-L  listserv at  https://listserv.gmu.edu/cgi-bin/wa?SUBED1=marcedit-l&A=1     Regular updates Tutorials, templates, scripts
ILS: “Global Update” For records within ILS system For universal changes “ Check website for coverage.”
A Better Way? Macros microsoft.public.excel (General Excel group) http://groups.google.com/groups/dir?sel=33606583&hl=en OzGrid Forum (Excel tips and VBA macros) http://www.ozgrid.com/forum/ http://www.lib-stats.org.uk/  (statistics listserv) [Courtesy Tansy Matthews]
Data Quality Strategies to Improve Data Quality: Identify problems Treat data as an asset Implement quality systems Principal Activities for Data: Acquire Store Use
Poor Quality Data Indicators Uncorrected errors Redundant data/processes Lack of data for strategizing Frustration with data, data supplier, IT (c) 2006, SCALA, “Shoes,” 6th century BCE, ARTstor
Treat Data as an Asset Inventory data assets Data = dynamic; process = asset Align responsibilities: acquire, store, use data. Establish customer-supplier relationships for data. © The Metropolitan Museum of Art, “Tournament,” late 16th century, ARTstor
Apply Quality Principles Create and keep a customer Detect and correct errors Determine root cause of defects Manage the process Communicate results Audit supplier performance
Library Data Quality Policy - 1 Suppliers/Creators: Understand users, uses, & requirements Ensure requirements are met Manage data creation process Data Processors: Avoid duplication Safeguard data Make data accessible Promote data quality in IT
Library Data Quality Policy - 2 Users: Define requirements, work with suppliers Provide feedback Interpret data correctly Use data legitimately Protect privacy Logistics: Determine master systems Understand system limitations Accessible storage Match inputs with needs Identify key keepers
[email_address] 951-827-2937 Digging for Coal,  The Illustrated Bartsch, vol 85, 1486, ARTstor 8586.1486/154  Other Techniques?
Bibliography Microsoft, 2009. Top Ten Ways to Clean Your Data. http://office.microsoft.com/en-us/excel/HA102218401033.aspx, accessed 12/18/2009. Use error checking to convert numbers that are stored as text to numbers. http://office.microsoft.com/en-us/excel/HP012167611033.aspx, accessed 12/22/2009. Apply a number format to numbers that are stored as text. http://office.microsoft.com/en-us/excel/HP012167611033.aspx Redman, Thomas C. 1995. Improve Data Quality for Competitive Advantage. Sloan Management Review, 36:2, 99-107. Rothschiller, Chad. 2007. Manipulating and Massaging Data in Excel. http://blogs.msdn.com/excel/archive/2007/11/12/manipulating-and-massaging-data-in-excel.aspx, accessed 12/18/2009. Spencer, John. March 6, 2008. Find/replace characters like quotes. http://www.eggheadcafe.com/software/aspnet/31782118/findreplace-characters-l.aspx, accessed 12/22/2009.


Editor's Notes

  • #2: Creator Camille Pissarro Title Peasant Woman Digging Date 1882 Material oil on canvas Measurements 65 x 54 cm Description Inscription: signed and dated lower left: C. Pissarro. 82 Repository Private Collection ARTstor Collection Art, Archaeology and Architecture (Erich Lessing Culture and Fine Arts Archives) ID Number 40-06-12/42 Source Image and original data provided by Erich Lessing Culture and Fine Arts Archives/ART RESOURCE, N.Y. http://www.artres.com/c/htm/Home.aspx http://www.artres.com/c/htm/TreePfLight.aspx?ID=LES Rights Photo Credit: Erich Lessing/ART RESOURCE, N.Y.
  • #3: Creator Author: Abu'l Izz Isma'il al-Jazari; Copyist: Farkh ibn `Abd al-Latif Culture Islamic Title Book of the Knowledge of Ingenious Mechanical Devices by al-Jazari Work Type Illustrated Manuscript, Folio Period Mamluk period (1250-1517) Date A.H. 715/ 1315-16 Location Made in: Syria Material Ink, colors, and gold on paper Measurements H. 11 13/16 in. x W. 7 3/4 in. (30 cm x19.7 cm) Credit Line The Metropolitan Museum of Art, Bequest of Cora Timken Burnett, 1956 (57.51.23) Image Copyright Notice Image © The Metropolitan Museum of Art Repository The Metropolitan Museum of Art http://www.metmuseum.org ARTstor Collection Metropolitan Museum of Art - Images for Academic Publishing ID Number 8718 Source Data From: The Metropolitan Museum of Art Rights This image was provided by The Metropolitan Museum of Art. Contact information: Image Library, The Metropolitan Museum of Art, 1000 Fifth Avenue, New York, NY 10028, (212) 396-5050 (fax), Scholars.License@MetMuseum.org Image © The Metropolitan Museum of Art
  • #5: Creator John Vink Title CAMBODIA. Pourk (Siem Reap). 8/02/2005: Chantiers Ecoles runs skills development programs aimed at un-or less-educated villagers in Siem Reap province. A silkfarm trains villagers, here seen sorting cocoons, in all the different techniques in order to reestablish an age-old tradition in high quality silk weaving. Date 2005 Subject Pourk Cambodia ARTstor Collection Magnum Photos ID Number PAR287761.jpg Source Image and original data provided by Magnum Photos
  • #6: Creator Miguel Rio Branco Title BRAZIL. 1985. Serra Pelada (Hill of Gold) gold mine, proved one of the richest deposits of alluvial gold ever found and it is considered to be one of the largest in the world. It is located 270 miles south of Belem on the Amazon delta. Serra Pelada mine is controlled by the State which distributes the "barrancos", 6 meters square soil each one, to the various owners "garimperos" (gold diggers or fortune hunters) according to their seniority. The "garimperos" are only allowed to dig vertically into the earth in order to avoid encroaching onto the other "barranco". Whenever gold is found is put into sacks which are supervised at the edge of the "barranco". Each worker, "mudhog", is allowed to choose a sack as a premium for his work. Then the sacks are taken to a sifting and sorting area belonging to the owner. There are over 50,000 "garimperos" with their workers who enter to the gold mine every day. The mine is only open the dry season from September through January because of the nature of the landscape machinery. Date 1985 ARTstor Collection Magnum Photos ID Number PAR52903.jpg Source Image and original data provided by Magnum Photos http://www.magnumphotos.com/ Rights ©Miguel Rio Branco / Magnum Photos
  • #7: Creator John Murphy American, 1888-1968, North American; American Title Men Digging Work Type Prints Date unknown Material wood engraving Measurements sheet: 12 3/16 x 14 11/16 in. (31 x 37.3 cm) Description Full View Repository Davis Museum and Cultural Center, Wellesley College Wellesley, Massachusetts, USA Museum purchase 1993.4 http://www.davismuseum.wellesley.edu/ ARTstor Collection Davis Museum and Cultural Center Collection (Wellesley College) Formerly in The AMICO Library ID Number DMCC.1993.4 Source Data From: Davis Museum and Cultural Center, Wellesly College Rights This image was provided by Davis Museum and Cultural Center, Wellesley College. Contact information: Jim Olson, Coordinator of Technology, Davis Museum and Cultural Center, Wellesley College, 106 Central Street, Wellesley, MA 02481, (781) 283-3234 (ph), (781) 283-2064 (fax), jolson@wellesley.edu.
  • #16: Title Book of Hours. Use of Tournai. Folio #: fol. 007r Work Type Manuscript Date c. 1440 Material parchment School Flemish Description Calendar. March: peasant digging with mattock. Miniatures attributed to the Master of Guillebert of Metz. Repository Bodleian Library, University of Oxford http://www.bodley.ox.ac.uk/ Accession Number Shelfmark: MS. Rawl. liturg. e. 14 ARTstor Collection Manuscripts and Early Printed Books (Bodleian Library, Oxford University) ID Number Rawl.liturg.e.14_roll314.1_frame3 Source Image and original data provided by the Bodleian Library, University of Oxford. Rights Copyright Bodleian Library, University of Oxford.
  • #17: ASCII includes definitions for 128 characters: 33 are non-printing control characters (now mostly obsolete) that affect how text and space is processed; 94 are printable characters, and the space is considered an invisible graphic.The most commonly used character encoding on the World Wide Web was US-ASCII until 2008, when it was surpassed by UTF-8. http://en.wikipedia.org/wiki/ASCII 12/22/2009
  • #19: Creator unknown Title Sorting mail. Work Type glass negatives Date n.d. (circa 1885-1900) Material black and white photograph Measurements [no print] Description Inscription: [See individual photos for captions.] Related Item http://nrs.harvard.edu/urn-3:RAD.SCHL:sch00140 Subject United States Postal Service Animals Horse-drawn vehicles Horses Postal service Wagons ARTstor Collection The Schlesinger History of Women in America Collection Source Photograph Number: MC212-D-99 Arthur and Elizabeth Schlesinger Library on the History of Women in America (Radcliffe Institute for Advanced Study, Harvard University) Data From: The Schlesinger History of Women in America Collection Folder Number: D Folder Title: Marian Clark Nichols, 1873-1953: Civil service reform activities: Lantern slides [and glass plate negatives] re: civil service reform, including Thomas Nast cartoons, state and federal statisticas and legislation, and pictures of workers Collection Number: MC212 Collection Name: Emerson, Eugenie Homer, 1854-1940 Collection Title: Papers, 1806-1953 (inclusive) Rights This image has been made available by the Schlesinger Library, Radcliffe Institute, Harvard University solely for noncommercial educational and scholarly purposes. Your use of this image is restricted to those permitted uses specified in the ARTstor Digital Library Terms and Conditions of Use (http://www.artstor.org/info/about/terms_conditions.jsp). To request permission for any other use, please contact the Schlesinger Library at slref@radcliffe.edu. Download Size 1024,1024
  • #21: Title Bayeux Tapestry; scene 35,a: Duke William Orders Ships to Be Built; detail of chopping trees Work Type textile Date c. 1070-80 Material wool embroidery on linen Measurements 53 cm x 69 m Style Period Romanesque Repository Centre Guillaume le Conquérant, Bayeux, France ARTstor Collection Art, Archaeology and Architecture (Erich Lessing Culture and Fine Arts Archives) ID Number 31-01-01/23 Source Image and original data provided by Erich Lessing Culture and Fine Arts Archives/ART RESOURCE, N.Y. http://www.artres.com/c/htm/Home.aspx http://www.artres.com/c/htm/TreePfLight.aspx?ID=LES Rights Photo Credit: Erich Lessing/ART RESOURCE, N.Y. Please note that if this image is under copyright, you may need to contact one or more copyright owners for any use that is not permitted under the ARTstor Terms and Conditions of Use or not otherwise permitted by law. While ARTstor tries to update contact information, it cannot guarantee that such information is always accurate. Determining whether those permissions are necessary, and obtaining such permissions, is your sole responsibility. Download Size 1024,1024
  • #22: Creator Arnold, Eve Title Hsishuang Panna Weeding Date 1979 Location China Subject China Agriculture Farmers Mountains Photography--20th C. A.D documentary books farms ARTstor Collection ARTstor Slide Gallery Source Data from: University of California, San Diego Download Size 1024,1024
  • #35: Culture Etruscan Title Shoes (Sandals) Work Type woodwork Date 6th century BCE Material wood, bronze Description from Bisenzio Olmo Bello Tomb XVIII Repository Museo nazionale di Villa Giulia ARTstor Collection Italian and other European Art (Scala Archives) Source Image and original data provided by SCALA, Florence/ART RESOURCE, N.Y. http://www.artres.com/c/htm/Home.aspx http://www.scalarchives.com Rights (c) 2006, SCALA, Florence / ART RESOURCE, N.Y.
  • #36: Culture German (Nuremberg) Title Tournament Book Date late 16th century Material Pen and colored wash on paper Credit Line The Metropolitan Museum of Art, Rogers Fund, 1922 (22.229) Image Copyright Notice Image © The Metropolitan Museum of Art Repository The Metropolitan Museum of Art http://www.metmuseum.org ARTstor Collection Metropolitan Museum of Art - Images for Academic Publishing ID Number 3389 Source Data From: The Metropolitan Museum of Art Rights This image was provided by The Metropolitan Museum of Art. Contact information: Image Library, The Metropolitan Museum of Art, 1000 Fifth Avenue, New York, NY 10028, (212) 396-5050 (fax), Scholars.License@MetMuseum.org Image © The Metropolitan Museum of Art
  • #40: Creator Anonymous Artists Title Digging for Coal Upon Seeing a Swallow Guarantees Freedom from Fever and Headaches for a Year Series Title BUCH DER TUGEND | Book of Virtue Date 1486 Technique woodcut Description SUPERSTITION ARTstor Collection The Illustrated Bartsch ID Number 8586.1486/154 SCHRAMM, 23.639 Source The Illustrated Bartsch. Vol. 85, German Book Illustration before 1500: Anonymous Artists, 1484-1486 Retrospective conversion of The Illustrated Bartsch (Abaris Books) by ARTstor Inc. and authorized contractors Download Size 1024,1024