Advanced redaction whitepaper
Advanced Redaction Technology: How to Provide Secure Access,
Reduce Costs and Anticipate Future Legislative Requirements

More than 11 million adult Americans were victims of Identity Theft in 2009, a 10%
increase from 2008. The collective cost of these crimes was over $54 billion. Private
information, such as Social Security numbers, within online public records can be
vulnerable to cyber criminals. It is estimated that approximately 10% of Identity Theft
cases originate with personal data collected from government records.


A Growing Legislative Concern

Across the United States, strict legislation is being passed requiring state and county
governments to redact sensitive information, such as Social Security and credit card
numbers, from the official and public record. North Carolina, Pennsylvania, Iowa and
Wisconsin recently passed records modernization laws mandating the redaction of
Social Security numbers. In some states, forward-thinking local and state government
officials have independently determined that it is their responsibility to protect
constituents from identity theft.




Source: information gathered from the National Conference of State Legislatures,
updated May 2010



                                                                                                   
Redaction Defined

Redaction, sometimes referred to as sanitization, is the permanent removal of personal
or sensitive information from hard copy or electronic documents. The traditional
technique of redacting confidential material from a paper document before its public
release involves crossing out portions of text with a wide black pen, followed by
photocopying the result. This manual processing of thousands or millions of document
pages is a time-consuming process that can strain staff resources. As public records
repositories shift from paper documents to electronic images, the challenges facing
state and local governments are also shifting. Complying with state legislation and
Federal “Sunshine” laws, such as the Freedom of Information Act and Openness in
Government Act, requires a records management system, strategy and workflow that
provide data security and accessibility. Deploying automated redaction technology is a
powerful tool for securing personal data and maintaining public record access.


Automated Redaction Technology

As public records are scanned, the electronic images are processed through Optical
Character Recognition (OCR) software that converts them into machine-readable text. This
conversion allows the document to become “searchable” by a rules-based search
engine that locates sensitive data within the OCR results. The search engine is
powered by rules, clues, pattern recognition and algorithms designed to locate user-
defined sensitive data types. After locating a sensitive data type, the software assigns a
value that measures how well the data matches the pattern and clues. For example,
the search engine may find a Social Security number by finding the clue “SSN” followed
by a pattern of numbers such as 123-45-6789. This example falls into the “high
confidence” range of results where the clue and the number pattern found by the search
engine are an exact match. On another document, the search engine may find the clue
“SSN” followed by eight digits instead of the standard nine-digit Social Security
number. This result may be defined as “medium confidence.” These values or
“confidence” classifications are used to streamline verification workflow.
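
The clue-and-pattern matching described above can be illustrated with a minimal Python sketch. The clue word, the regular expressions, the 20-character search window and the function name are illustrative assumptions, not the rules used by any particular redaction product.

```python
import re

# Hypothetical rule set: one clue word and two number patterns.
CLUE = re.compile(r"\bSSN\b", re.IGNORECASE)
EXACT = re.compile(r"\d{3}-\d{2}-\d{4}")   # standard nine-digit layout
LOOSE = re.compile(r"\d[\d-]*\d")          # any run of digits and dashes


def classify_ssn_candidates(ocr_text, window=20):
    """Return (matched_text, confidence) pairs for numbers that follow an "SSN" clue."""
    results = []
    for clue in CLUE.finditer(ocr_text):
        # Only look at the text immediately after the clue.
        tail = ocr_text[clue.end():clue.end() + window]
        loose = LOOSE.search(tail)
        if loose is None:
            continue
        candidate = loose.group()
        digits = sum(ch.isdigit() for ch in candidate)
        if EXACT.fullmatch(candidate):
            # Clue and pattern both match exactly.
            results.append((candidate, "high"))
        elif digits in (8, 9, 10):
            # Near miss, e.g. a digit dropped by OCR or missing dashes.
            results.append((candidate, "medium"))
    return results


print(classify_ssn_candidates("SSN: 123-45-6789"))
print(classify_ssn_candidates("SSN 123-45-678"))
```

The first call yields a “high confidence” result and the second a “medium confidence” result, mirroring the two examples in the text.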


Accuracy

Accuracy is calculated by comparing the number of sensitive data items found by the
software with the total number of sensitive data items within the record. False positives
occur when the software locates non-sensitive data and marks it for redaction. This type
of error is also counted in the overall accuracy rate.

Accuracy is arguably the most important feature of automated redaction. Because no
industry standard exists for calculating accuracy, evaluating and comparing the
accuracy rates among redaction providers can be challenging. To help facilitate the
evaluation process, vendors’ accuracy formulas must be transparent and
straightforward. If each sensitive data type undetected by the software represents a
failure to protect a citizen’s private information, it stands to reason that the software’s
accuracy rate should be downgraded for every occurrence of this type.
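
Because no industry standard exists, the following Python sketch shows only one possible transparent formula, chosen for illustration. It assumes that every missed item and every false positive lowers the rate; the function name, parameter names and the formula itself are assumptions rather than any vendor's published method.

```python
def accuracy_rate(found_correctly, total_sensitive, false_positives=0):
    """One possible accuracy formula (an assumption, not an industry standard).

    found_correctly: sensitive items the software located
    total_sensitive: sensitive items actually present in the records
    false_positives: non-sensitive items wrongly marked for redaction
    """
    if total_sensitive < 0 or false_positives < 0:
        raise ValueError("counts must be non-negative")
    if not 0 <= found_correctly <= total_sensitive:
        raise ValueError("found_correctly must be between 0 and total_sensitive")
    denominator = total_sensitive + false_positives
    if denominator == 0:
        return 1.0  # nothing to find and nothing wrongly marked
    # Misses shrink the numerator; false positives grow the denominator.
    return found_correctly / denominator


print(accuracy_rate(95, 100))     # 5 missed items, no false positives
print(accuracy_rate(95, 100, 5))  # the same misses plus 5 false positives
```

Whatever formula a vendor uses, an evaluator should be able to reproduce it from raw counts in this way.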

Pre-verification accuracy calculates how well the software locates sensitive data
automatically, without human intervention for quality assurance (verification). Achieving
a high pre-verification accuracy rate is critical for two reasons. When redaction software
automatically finds virtually all sensitive data within records, the security of individuals’
personal data is increased. Additionally, high pre-verification accuracy dramatically
reduces labor costs.


Verification

An important part of any redaction workflow is verification, or quality control. The two
most influential factors that affect verification are the quality of the paper records before
scanning and the targeted level of accuracy (higher accuracy requires more
verification). Verification workflow is based on the particular needs of each client and
generally follows one of three options: 1) Fully Automated Redaction, in which the software
finds and redacts sensitive information automatically; 2) Semi-Automated Redaction, in which
an end user verifies each redaction; and 3) Hybrid Redaction, in which user-defined
“high confidence” redactions are processed automatically while lower confidence results
are submitted for verification.
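
The three workflow options can be sketched as a small routing function in Python. The workflow names, confidence labels and return values are hypothetical labels chosen for this illustration.

```python
def route_redaction(confidence, workflow="hybrid"):
    """Decide whether a located item is redacted automatically or sent to a person.

    confidence: "high", "medium" or "low" (hypothetical labels)
    workflow:   "fully_automated", "semi_automated" or "hybrid"
    """
    if workflow == "fully_automated":
        return "auto_redact"          # software redacts everything it finds
    if workflow == "semi_automated":
        return "verify"               # an end user checks every redaction
    if workflow == "hybrid":
        # Only high-confidence results skip human verification.
        return "auto_redact" if confidence == "high" else "verify"
    raise ValueError(f"unknown workflow: {workflow}")


print(route_redaction("high"))
print(route_redaction("medium"))
```

Under the hybrid option, the share of results that fall below “high confidence” determines how much verification labor is needed.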


Impact of Pre-Verification Accuracy on Labor Costs

To demonstrate the relationship between software accuracy and verification labor costs,
here is an example of a government office processing 40,000 image pages of records
per month utilizing the Hybrid Redaction workflow. Software #1 has a pre-verification
accuracy rate of 80% and Software #2 has an accuracy rate of 99%.


Verification Labor Costs:
40,000 Pages of Records/Month Using Hybrid Redaction Workflow

                                          Software #1       Software #2
                                          (80% Accuracy)    (99% Accuracy)
Pages Processed/Day                       1,905             1,905
Pages to Be Verified/Day                  381               19
Verification Labor (Hours/Day)            1.5               0.08
Verification Labor (Hours/Month)          31.5              1.7
Verification Labor (Hours/Year)           378               20
Approx. Annual Verification Labor Cost    $7,500            $400
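
The arithmetic behind the table can be approximated with a short Python sketch. The paper does not state its inputs, so the 21 working days per month, the verification speed of 254 pages per hour and the $20 hourly rate are assumptions inferred from the table's own figures.

```python
def verification_labor(pages_per_month, accuracy,
                       working_days_per_month=21,     # assumed, not stated in the paper
                       pages_verified_per_hour=254,   # assumed, inferred from the table
                       hourly_rate=20.0):             # assumed, inferred from the table
    """Estimate verification workload when only non-accurate pages need review."""
    pages_per_day = pages_per_month / working_days_per_month
    pages_to_verify_per_day = pages_per_day * (1 - accuracy)
    hours_per_day = pages_to_verify_per_day / pages_verified_per_hour
    hours_per_month = hours_per_day * working_days_per_month
    hours_per_year = hours_per_month * 12
    return {
        "pages_per_day": pages_per_day,
        "pages_to_verify_per_day": pages_to_verify_per_day,
        "hours_per_day": hours_per_day,
        "hours_per_month": hours_per_month,
        "hours_per_year": hours_per_year,
        "annual_cost": hours_per_year * hourly_rate,
    }


for label, accuracy in (("Software #1", 0.80), ("Software #2", 0.99)):
    estimate = verification_labor(40_000, accuracy)
    print(label, {key: round(value, 2) for key, value in estimate.items()})
```

With these assumed defaults the sketch lands close to the table's rounded figures, for example about 381 versus 19 pages to verify per day.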


                                                                                                       
Selecting a Redaction Provider

The redaction vendor selection process should consider 1) experience, 2) accuracy and
3) overall technology.

1. Experienced redaction providers have completed installations with many different
types of records management software. The exposure to different systems helps
seasoned customer support teams anticipate problems before they happen. Verification
labor is often the highest cost within a redaction project. Working with a team
experienced in verification workflow maximizes accuracy, minimizes human intervention
and saves money.

2. The quality of paper records and the complexity of the data to be redacted have an
impact on the accuracy that can be achieved for each project. Under most circumstances,
high quality automated redaction can achieve a pre-verification accuracy rate of 95%.

3. Redaction is an evolving technology. Top vendors are constantly adding new
technology to improve accuracy and speed, and to meet the emerging needs of
governments.

Privacy and Information Security Regulations: What Does the Future Hold?

The threat of unauthorized access to sensitive information within public records is
unlikely to diminish in the near future. This continuing threat may pave the way for additional
federal and state data security measures. Government offices that are complying with
existing regulations to redact Social Security numbers may face further legislative
mandates that require the redaction of additional data types. In fact, this is
already happening.

In 2003, the Florida legislature mandated the redaction of Financial Account information
including bank, credit and debit card numbers. Similarly, the Nevada legislature issued
a revised statute in 2006 to mandate the redaction of Drivers’ License numbers,
Identification Card numbers and Financial Account information including bank, credit
and debit card numbers. Government agencies can successfully navigate the redaction
of additional data fields in the future by leveraging today’s technology. As records are
being processed, reports can be created that identify specific documents that contain
the additional data fields (credit card number, drivers’ license number) that may need to
be redacted in the future. This captured information can be used to create a budget for
the additional verification process and to isolate suspected images for automatic/manual
redaction processing. The passage of time presents some problems for this approach.
Documents change and data capture tools and techniques improve rapidly. Using rules
and capture technology from a previous project may decrease accuracy and/or increase
verification labor costs. To maintain accuracy and keep manual labor costs low, a
better solution may be to save the OCR output from the original project to avoid
incurring the cost of rescanning, and write new custom rules for subsequent mandates
as they arise.
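
As a rough illustration of reusing saved OCR output, the Python sketch below runs a newly written rule over stored text and reports which documents contain the new data type. The in-memory store, the document IDs, the rule name and the simplified credit card pattern are all hypothetical.

```python
import re


def flag_documents(ocr_store, rules):
    """Report which stored documents contain each newly mandated data type.

    ocr_store: dict mapping document ID to saved OCR text (no rescanning needed)
    rules:     dict mapping a data type name to a compiled regular expression
    """
    report = {}
    for doc_id, text in ocr_store.items():
        hits = sorted(name for name, pattern in rules.items() if pattern.search(text))
        if hits:
            report[doc_id] = hits
    return report


# Hypothetical rule written for a later mandate: four groups of four digits.
NEW_RULES = {"credit_card": re.compile(r"\b(?:\d{4}[- ]){3}\d{4}\b")}

saved_ocr = {
    "doc-001": "Payment by card 4111-1111-1111-1111 received.",
    "doc-002": "Deed of trust, no account information.",
}
print(flag_documents(saved_ocr, NEW_RULES))
```

A report of this kind could feed both the budget estimate and the list of images to isolate for automatic or manual redaction.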


                                                                                                
Conclusion

At a minimum, redaction software can help government agencies make public
information available in a secure manner. Advanced technology can be harnessed to
save labor costs and eliminate a significant percentage of tedious data entry tasks.
Government agencies can gain significant, ongoing benefits from selecting a software
partner with leading edge technology.

ID Shield Redaction Software
ID Shield is a proven, cost-effective redaction solution that permanently removes private
information within records and documents. ID Shield Redaction Software customers
have redacted over one billion images. Extract Systems offers server-based and
desktop redaction software.

About the Author




Mark Miller is Vice President of Sales for Extract Systems, a leading provider of best-in-
class data capture and redaction software. Extract’s products are built to adapt and
integrate into any type of environment, providing flexibility, scalability and efficiency. The
productivity gains achieved with Extract Systems’ data automation solutions save
organizations money, improve workflow and eliminate paper.

For more information, please contact:
Extract Systems, LLC
6418 Normandy Lane, Suite 200
Madison, WI 53719
Phone: (877) 778-2543 or (608) 216-7950
E-mail: mark_miller@extractsystems.com
www.extractsystems.com
 
Sources: 
Javelin’s 2010 Consumer Identity Fraud Report
National Conference of State Legislatures


                                                                                                  
