Getting Started with
Regular Expressions
in MarcEdit
TERRY REESE
HEAD OF DIGITAL INITIATIVES, THE OHIO STATE UNIVERSITY
Topics
MarcEdit Regular Expression Support Information
Understanding .NET Regular Expressions
◦ Major components of the language
◦ Understanding grouping mechanisms and references
How does MarcEdit implement expressions?
Getting Regular Expression Help
MarcEdit Regular Expression Support
Functions that presently support regular expressions
◦ Delete Field
◦ Edit Field
◦ Copy Field
◦ Swap Field
◦ Build New Field
◦ Extract/Delete Records
◦ Validation Processing
◦ Linked Data tooling
◦ More…
MarcEdit Regular Expression Support
When MarcEdit processes a regular expression, it makes entire fields or subfields available to the expression
◦ e.g., in the Delete Field function, all data from =[field number] to the end of the line is part of the field that can be queried.
By default, MarcEdit's regular expressions operate on one field at a time (i.e., by default an expression cannot find data that spans fields)
MarcEdit's regular expression support is provided by Microsoft .NET's regular expression object (the Regex class)
◦ Its syntax looks Perl-like, but there are some differences.
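The default, one-field-at-a-time behavior can be sketched in TypeScript. This is not MarcEdit's code: it assumes a record in mnemonic format with one field per line, uses a hypothetical `replacePerField` helper and invented sample fields, and relies on JavaScript's regex engine, which agrees with .NET on the simple constructs used here.

```typescript
// Hypothetical helper: applies a regex replacement to each field (line) of a
// mnemonic-format record separately, imitating MarcEdit's default behavior.
// Illustration only, not MarcEdit's implementation.
function replacePerField(record: string, pattern: RegExp, replacement: string): string {
  return record
    .split("\n")
    .map((field) => field.replace(pattern, replacement))
    .join("\n");
}

// Invented sample record; "\\" in the string is one backslash (a blank indicator).
const record = [
  "=100  1\\$aSmith, John.",
  "=245  10$aMaps of Ohio /$cJohn Smith.",
].join("\n");

// A pattern that spans two fields never matches: each call sees one line only.
const unchanged = replacePerField(record, /(=100.*)\n(=245.*)/, "$2\n$1");

// A pattern confined to a single field behaves as expected.
const edited = replacePerField(record, /^(=245.{4}\$a)Maps/, "$1Atlas");

console.log(unchanged === record); // true
console.log(edited.split("\n")[1]); // =245  10$aAtlas of Ohio /$cJohn Smith.
```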
Microsoft’s Regular Expression Language
Concepts:
◦ Character escapes
◦ Anchors
◦ Character classes
◦ Grouping
◦ Quantifiers
◦ Substitutions
MSDN Documentation: https://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx
PDF Quick Reference:
http://download.microsoft.com/download/D/2/4/D240EBF6-A9BA-4E4F-A63F-AEB6DA0B921C/Regular%20expressions%20quick%20reference.pdf
How We Use Regular Expressions in MarcEdit
The most important parts of the regular expression language for MarcEdit work are:
1. Character escapes: \d \r \n \$ \x##
2. Character Classes [] & [^]
3. Grouping Elements ()
4. Anchors: ^$
5. Quantifiers: *?+{#}
6. Substitutions: $#
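Each of the six constructs can be seen at work on a single field. The sketch below uses TypeScript rather than .NET; for these constructs, including `$1` and `$$` in replacement strings, JavaScript's syntax lines up closely with .NET's, but the engines are not identical. The 856 field is invented sample data.

```typescript
// Invented sample 856 field in MarcEdit mnemonic format.
const sample = "=856  40$uhttp://example.org/map";

// 1. Character escapes: \$ matches a literal "$" (the subfield delimiter);
//    an unescaped $ would be an end-of-field anchor instead.
const hasSubfieldU = /\$u/.test(sample);

// 2. Character classes: [0-9] matches one digit.
// 3. Grouping: (...) captures the matched text as group 1.
// 5. Quantifiers: {2} means "exactly two".
const indicatorMatch = sample.match(/^=856  ([0-9]{2})/);
const indicators = indicatorMatch ? indicatorMatch[1] : "";

// 4. Anchors: $ pins the pattern to the end of the field.
const endsWithMap = /map$/.test(sample);

// 2 and 5 again: [^$]+ means "one or more characters that are not $".
const urlMatch = sample.match(/\$u([^$]+)/);
const url = urlMatch ? urlMatch[1] : "";

// 6. Substitutions: $1 and $2 reinsert groups; $$ produces a literal "$".
const withNote = sample.replace(/^(=856.{4})(\$u.*)$/, "$1$2$$zOnline map");

console.log(hasSubfieldU, indicators, endsWithMap, url);
console.log(withNote); // =856  40$uhttp://example.org/map$zOnline map
```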
How Expressions Manifest in MarcEdit
Part of understanding regular expressions in MarcEdit is understanding what data is exposed to the regular expression engine.
Each of MarcEdit’s global edit functions sees a different level of data.
This is important to understand when:
◦ Creating processing strategies
◦ Choosing which global editing function to use
Replace Function
Provides:
◦ Access to all field data
◦ Processing across fields (lines)
◦ Conditional evaluation before replacement (you can search for data in one field and, if it is found, perform an action on another)
◦ The broadest access to record data for evaluation
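The idea of checking one field before editing another can be sketched as follows. This is a hypothetical `replaceIf` helper in TypeScript, not the Replace dialog's actual mechanism. As an example rule it assumes the common cataloging practice that when a 1XX main entry is present, the 245 first indicator should be 1 (title added entry); the records are invented.

```typescript
// Hypothetical helper: performs the `target` replacement only when
// `condition` matches somewhere in the record ("search one field, act on another").
function replaceIf(record: string, condition: RegExp, target: RegExp, replacement: string): string {
  return condition.test(record) ? record.replace(target, replacement) : record;
}

// Condition: any 1XX field is present (m flag: ^ matches at the start of each line).
const has1xx = /^=1\d\d /m;
// Target: a 245 whose first indicator is 0; group 1 keeps the second indicator.
const ind245 = /^=245  0(.)/m;
const setInd1 = "=245  1$1";

// Invented sample records; "\\" in the string is one backslash (a blank indicator).
const withAuthor = "=100  1\\$aSmith, John.\n=245  00$aMaps of Ohio.";
const noAuthor = "=245  00$aMaps of Ohio.";

console.log(replaceIf(withAuthor, has1xx, ind245, setInd1)); // 245 now begins "=245  10"
console.log(replaceIf(noAuthor, has1xx, ind245, setInd1));   // unchanged: no 1XX field
```

The replacement deliberately avoids writing `$11`, which is ambiguous between "group 11" and "group 1 followed by 1" in both engines.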
Add/Delete Function
Provides:
◦ Access to all field data, from the equal sign to the end of the line
◦ No option to evaluate across fields
◦ Regular expressions are available only when deleting data
Edit Field Data Function
Provides:
◦ Access to all data after the indicators (the field tag and indicators are not exposed)
◦ Can be used to break fields up into new fields and to do recursive searching
Edit Subfield Data
Provides:
◦ Access only to the specified subfield or control-field data positions
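The different levels of access can be illustrated by slicing one invented 856 field the way each function is described as seeing it. The helper and variable names are hypothetical, and the exact text MarcEdit hands to each function (for example, whether Edit Subfield Data includes the subfield code) is an assumption.

```typescript
// Invented sample field. The Replace function would see the whole record;
// the other functions see progressively less of this one field.
const field = "=856  40$uhttp://example.org$zMap";

// Add/Delete (Delete Field): the whole line, from "=" to the end.
const deleteView = field;

// Edit Field Data: everything after "=TAG", two spaces, and two indicators (8 chars).
const editFieldView = field.slice(8);

// Edit Subfield Data: only the contents of the chosen subfield(s).
// Assumes no literal "$" occurs inside the data.
function subfieldView(f: string, code: string): string[] {
  return f
    .slice(8)
    .split("$")
    .filter((part) => part.startsWith(code))
    .map((part) => part.slice(code.length));
}

// The same anchored pattern succeeds or fails depending on what is exposed.
const tagPattern = /^=856/;
console.log(tagPattern.test(deleteView));    // true
console.log(tagPattern.test(editFieldView)); // false
console.log(subfieldView(field, "u"));       // the $u contents only
```

This is why an expression that tests the tag or indicators belongs in Delete Field or Replace, not in Edit Field Data.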
Regular Expression Basics
I like to think of regular expressions the same way I think of diagramming a sentence.
http://www.english-grammar-revolution.com/images/puzzler_words_october_2012.jpg
Regular Expression Basics
I look at the data I want to replace and break it into its component parts. For example, suppose I want to add a period to a 500 field when it is missing:
Source Fields:
=500  \\$aPrime meridians: Greenwich and Washington
=500  \\$aPrime meridians: Greenwich and Washington?
Structure:
Expression: (=500.*[^\W])$
Examples
Using the Replace function on example.txt:
◦ Add a period to the 500 if it is missing
◦ Add a $h [cartographic resource] between the $a and the $c
◦ Split the 856 into two fields, breaking on the $u
Example 1
◦ Add a period to the 500 if it is missing
◦ Find What: (=500.*[^\W])$
◦ Replace With: $1.
Explanation:
◦ (=500.*[^\W])$
◦ Matches =500, then all data in the line up to the final character. It then checks that the final character is not a non-word character (i.e., it is a letter, digit, or underscore rather than punctuation). If so, $1. writes the field back with a period appended.
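Example 1 can be tried outside MarcEdit with the same Find What and Replace With strings in TypeScript. JavaScript handles `[^\W]`, the greedy `.*`, the `$` anchor, and `$1` the same way .NET does for this pattern. The function name is hypothetical, and the string literals double each backslash because blank indicators are written as `\` in mnemonic format.

```typescript
// Example 1: add a period to a 500 field whose last character is a word character.
function addMissingPeriod(field: string): string {
  return field.replace(/(=500.*[^\W])$/, "$1.");
}

// The two source fields from the slide ("\\\\" is two backslashes: blank indicators).
const noPunct = "=500  \\\\$aPrime meridians: Greenwich and Washington";
const question = "=500  \\\\$aPrime meridians: Greenwich and Washington?";

console.log(addMissingPeriod(noPunct));  // ...Greenwich and Washington.
console.log(addMissingPeriod(question)); // unchanged: the field ends in "?"
```

A field that already ends in a period is also left alone, since "." is a non-word character.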
Example 2
◦ Add a $h [cartographic resource] between the $a and the $c
Find What: (=245.{4})(\$a.*)(/\$c.*)
◦ (=245.{4})
◦ Matches =245 plus the next four characters (the two spaces and the two indicators)
◦ (\$a.*)
◦ Selects everything from subfield a up to the / that precedes the $c
◦ (/\$c.*)
◦ Selects the / and subfield c (and any data after it)
Replace With: $1$2$$h[cartographic resource] $3
◦ $$ is a literal $, so this inserts $h[cartographic resource] just before the /$c
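Example 2 in TypeScript, with the same Find What and Replace With strings; in both .NET and JavaScript, `$$` in a replacement emits a single literal `$`. The function name and the 245 field are invented for illustration. Note that `/` must be escaped as `\/` inside a JavaScript regex literal.

```typescript
// Example 2: insert $h[cartographic resource] before the " /$c" in a 245.
function addGmd(field: string): string {
  return field.replace(/(=245.{4})(\$a.*)(\/\$c.*)/, "$1$2$$h[cartographic resource] $3");
}

const title = "=245  00$aMaps of Ohio /$cJohn Smith.";
console.log(addGmd(title));
// =245  00$aMaps of Ohio $h[cartographic resource] /$cJohn Smith.

// A 245 without "/$c" does not match and is left alone.
console.log(addGmd("=245  00$aMaps of Ohio."));
```

Because group 2 keeps the space before the `/`, the inserted `$h` lands between that space and the slash, giving ISBD-style spacing without extra work in the replacement.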
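Example 3
Split the 856 into two fields, breaking on the $u.
◦ Find What: (=856.{4})(\$u.*[^$])(\$u.*)
◦ (=856.{4})
◦ Matches =856 plus the two spaces and two indicators
◦ (\$u.*[^$])
◦ Matches from the opening $u up to, but not including, the last $u in the field
◦ (\$u.*)
◦ Matches the remainder of the field (the last $u)
◦ Replace With: $1$2\n=856  41$3
◦ \n starts a new line, so the last $u becomes a new 856 field with indicators 41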
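Example 3 in TypeScript. One difference matters here: in MarcEdit you type `\n` in the Replace With box, while in the TypeScript string below the `"\n"` escape is already a newline character (JavaScript would not translate a literal backslash-n in a replacement). The function name and URLs are invented.

```typescript
// Example 3: split an 856 with two $u subfields into two 856 fields.
function split856(field: string): string {
  return field.replace(/(=856.{4})(\$u.*[^$])(\$u.*)/, "$1$2\n=856  41$3");
}

const twoLinks = "=856  40$uhttp://example.org/a$uhttp://example.org/b";
const splitFields = split856(twoLinks);
console.log(splitFields);
// =856  40$uhttp://example.org/a
// =856  41$uhttp://example.org/b
```

Because `.*` is greedy, a field with three `$u` subfields is split before the last one only (the first two stay together), so running the replacement again is one way to split further.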
Lcase/ucase
MarcEdit’s regular expression engine includes two extension functions for switching the case of characters.
◦ lcase & ucase
◦ Usage: (=500.{4})(\$a.)(.*)
◦ $1$2lcase($3)
◦ Example: Find a 500 in all upper-case characters and convert everything except the first letter of the sentence to lower case.
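`lcase()` and `ucase()` are MarcEdit extensions with no equivalent in JavaScript's replacement syntax, so this TypeScript sketch reproduces the effect of `$1$2lcase($3)` with a replacer callback instead. The function name and the all-caps 500 field are invented.

```typescript
// Same Find What as the slide; the callback lowercases group 3 only.
function lowercaseAfterFirstLetter(field: string): string {
  return field.replace(
    /(=500.{4})(\$a.)(.*)/,
    (_match: string, tag: string, first: string, rest: string) => tag + first + rest.toLowerCase(),
  );
}

const shouting = "=500  \\\\$aPRIME MERIDIANS: GREENWICH AND WASHINGTON.";
console.log(lowercaseAfterFirstLetter(shouting));
// =500  \\$aPrime meridians: greenwich and washington.
```

Proper nouns such as Greenwich lose their capitals too, so results of a bulk case change still need review.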
Multi-Field Replacements
By default, MarcEdit handles one field at a time when evaluating regular expressions.
◦ When you need to evaluate data across multiple fields, add /m to the end of your regular expression in the MarcEditor's Replace function
◦ This is a special option added to MarcEdit's regular expression engine; it is not part of .NET's own syntax
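The slides do not say exactly how /m is implemented. This TypeScript sketch assumes it makes the expression run against the whole record with multiline anchors; JavaScript's `m` flag gives `^` and `$` the same per-line meaning that .NET's `RegexOptions.Multiline` does. The merge rule, function name, and record are invented for illustration.

```typescript
// Merge two consecutive 500 fields into one. The \n is matched explicitly,
// so the pattern spans two fields; ^ and $ still anchor to single lines.
const adjacent500s = /^(=500.{4}\$a.*)\n=500.{4}\$a(.*)$/m;

function mergeAdjacent500s(record: string): string {
  return record.replace(adjacent500s, "$1 $2");
}

// Invented record; "\\\\" is two backslashes (two blank indicators).
const rec = [
  "=245  00$aMaps of Ohio.",
  "=500  \\\\$aScale 1:500,000.",
  "=500  \\\\$aIncludes index.",
].join("\n");

console.log(mergeAdjacent500s(rec));
// =245  00$aMaps of Ohio.
// =500  \\$aScale 1:500,000. Includes index.

// Without the m flag, ^ only matches at the start of the record (the 245), so nothing changes.
const withoutM = rec.replace(/^(=500.{4}\$a.*)\n=500.{4}\$a(.*)$/, "$1 $2");
console.log(withoutM === rec); // true
```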
Delete Field Function
The Delete Field function exposes all the data in the field to the regular expression.
◦ e.g., =856 .*
◦ So the first character the expression evaluates is the =, not the subfield data
◦ This allows explicit evaluation of indicators.
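Because the expression starts at the `=`, indicators can be tested by position. The TypeScript sketch below uses a hypothetical `deleteFields` helper (not MarcEdit's code) and an invented record to drop every 856 whose second indicator is 2, which in MARC 21 marks a related resource rather than the resource itself.

```typescript
// Hypothetical helper: remove every field (line) that the pattern matches.
function deleteFields(record: string, pattern: RegExp): string {
  return record
    .split("\n")
    .filter((field) => !pattern.test(field))
    .join("\n");
}

const recordWithLinks = [
  "=245  00$aMaps of Ohio.",
  "=856  40$uhttp://example.org/map",
  "=856  42$uhttp://example.org/about",
].join("\n");

// "=856", two spaces, any first indicator, then a second indicator of 2.
const cleaned = deleteFields(recordWithLinks, /^=856  .2/);
console.log(cleaned);
// =245  00$aMaps of Ohio.
// =856  40$uhttp://example.org/map
```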
Getting Regular Expression Help
The MarcEdit listserv includes a number of regular expression experts who readily help users who ask
http://metis3.gmu.edu/cgi-bin/wa?A0=MARCEDIT-L
Questions
