Perl Development Sample Courseware Created October 2008 © Garth Gilmour 2008
Overview
- This is a three-day course
  - Core hours and breaks are flexible
- The course has three goals
  - Familiarize you with the Perl language
  - Learn the different styles of Perl development: utility scripts, longer programs and OO applications
  - Explore commonly used Perl modules (libraries)
- Please control the course
  - Ask as many questions as possible
  - Speed up or slow down the pace
  - Request extra examples and exercises
  - Don’t sit in misery!
Introduction to Perl: History and Basic Concepts
Introduction to Perl
- The Perl language was created by Larry Wall to simplify his text manipulation problems
  - It contains a superset of the functionality provided by shell scripting, sed and awk, plus many extra features
- The language is supported by a huge set of libraries
  - These can be downloaded free of charge from the Comprehensive Perl Archive Network (CPAN) website
- There are two expansions of the name:
  - Practical Extraction and Report Language
  - Pathologically Eclectic Rubbish Lister
Versions of Perl and Competitors
- Version 5 is the current edition of Perl
  - It represents how far the language and interpreter could progress whilst maintaining backward compatibility
- Perl 6 is a complete rewrite of both the language and the interpreter
  - Parrot is the virtual machine being built to run it
  - It has been under development for a long time
- Today Perl has serious competition
  - Python and Ruby are closely related languages
  - They both aim to have cleaner syntax and better OO support
Comparing Perl, Python and Ruby © Garth Gilmour 2008 Perl  Python Ruby Documentation ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦  ♦ ♦ ♦  Library Support ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ OO Support ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ Power ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ Approachability ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ Ease of Mastery ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ C / C++ Interop ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ Java / .NET Interop ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ Web Frameworks ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦
Common Applications of Perl
- There are three possible levels of Perl coding:
  - Short utility scripts, written in a procedural manner without high-level organization
  - Medium-size programs, divided into multiple subroutines grouped within modules
  - Large applications, built using the concepts of object-oriented design
- Perl is happiest at the first level
  - Historically that’s where its home is
  - Problems exist at the other levels
Levels of Perl coding and the features each relies on
- Scripts: core language features
- Programs: strict module, subroutines, references
- Applications: best practices, named parameters, modules and classes
Learning Programming in Perl
- You can start coding in Perl very quickly
  - Just place some code within a ‘.pl’ file
  - No need to create a special ‘main’ method
- But Perl isn’t a beginner’s language
  - The interpreter assumes you know what you are doing
  - Many things that you expect not to compile actually do
- E.g. ‘$fred{'barney'} = ['wilma', 'betty']’ creates:
  - A table (aka hash) called ‘fred’ with one row
  - In the single row the key is ‘barney’
  - The value is the address of a new array
  - The array has two boxes holding ‘wilma’ and ‘betty’
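The one-line example above can be checked with a short script. This is a minimal sketch: the ‘use strict’ line and the ‘my’ declaration are additions for safety and are not part of the original one-liner.

```perl
use strict;
use warnings;

my %fred;                                  # an empty hash

# One assignment creates the row and the anonymous array it points to.
$fred{'barney'} = ['wilma', 'betty'];

my $ref = $fred{'barney'};                 # the value is an array reference
print "Type of value: ", ref($ref), "\n";
print "Second box:    $fred{'barney'}[1]\n";
print "Box count:     ", scalar(@{$ref}), "\n";
```

The ‘ref’ function reports what kind of thing a reference points at, which is a handy way to confirm what an assignment like this really built.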
Common Applications of Perl
- Perl’s ‘sweet spot’ is text manipulation
  - Reading text from a file, loading it into data structures, manipulating it and generating formatted output
- This utility is exploited in three ways
  - By system administrators managing large networks
  - By developers and QA staff writing test harnesses
  - By authors and publishers managing documents
- Perl used to be the main language for Web Apps
  - But it has largely been replaced by languages with better networking support and component models (mainly Java and C#)
Declaring Variables in Perl
- Perl prefixes variables with special symbols
  - These are commonly known as sigils
  - In advanced coding they are used in combination
- Sigils make Perl code appear baffling at first
  - Once you learn to read them they are a big help

Sigil  Description
$      Scalar variable: holds a single number, string or reference
@      Array variable: a sequence of zero or more scalars
%      Hash variable: a map of keys and values (both scalars)
\      Reference: used to obtain a memory address that a scalar can hold
&      Function: used to specify that a symbol is a function name
*      Typeglob: used for manipulating symbol tables (advanced)
Variables and Barewords
- Forgetting to add a sigil is a common mistake
  - It can lead to unpredictable results in your code
- An identifier without a sigil is a ‘bareword’
  - The interpreter searches for a subroutine, package name, label or filehandle with that name
  - Otherwise the identifier is assumed to be an unquoted string
- You should never deliberately use barewords
  - The meaning of your code will change if someone introduces a function or filehandle with the same name
- Consider ‘print abc, "def", "ghi"’
  - The bareword ‘abc’ is understood as a filehandle
Variables and Symbol Tables
- Sigils in Perl enable an unusual language feature
  - You can have more than one variable with the same name
  - As long as the variables are of different types
- Perl stores details of variables in symbol tables
  - Each symbol name (e.g. ‘fred’) is associated with a typeglob
  - A typeglob stores the memory associated with the scalar called ‘fred’, the array and hash called ‘fred’ etc.
- Advanced coding techniques utilize typeglobs
  - They are data types in their own right with their own sigil
Symbol Table and Typeglob
$fred = 12;
@fred = (12,13,14);
%fred = (k1 => 12, k2 => 14);
sub fred { return 101; }
- The symbol table links each name (‘fred’, others…) to a typeglob
- The typeglob links each type ($, @, %, &) to its own storage
Perl Comments
- Standard Perl comments start with ‘#’
  - This comments out everything to the end of the line
- A block of text spanning several lines can be written as a here-document or ‘heredoc’
  - Strictly a heredoc is a multi-line string, so it only acts as a comment if the string is never used
  - It is signified by ‘<<’ followed by an identifier
  - There must be no space between the two, unless the identifier is quoted
- The identifier is taken as terminating the heredoc
  - When it appears on a line by itself, without quotes or whitespace
- Note that comments are allowed in regexes
  - If you use the extended regular expression syntax (the ‘/x’ modifier)
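Comments inside a regular expression are easiest to see in a small example. The sketch below assumes the ‘/x’ modifier, which makes Perl ignore whitespace and ‘#’ comments within the pattern; the date format and variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical pattern: match a date such as 2008-10-31.
my $date_pattern = qr/
    (\d{4})   # year, four digits
    -         # literal separator
    (\d{2})   # month, two digits
    -
    (\d{2})   # day, two digits
/x;

my $text = "Created 2008-10-31";
if ($text =~ $date_pattern) {
    print "year=$1 month=$2 day=$3\n";
}
```

One thing to watch: the pattern delimiter (here ‘/’) must not appear inside the comments, because Perl finds the end of the pattern before it looks at the comments.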
Using Heredoc Comments
$myvar = << "THE_END";
More than prince of cats, I can tell you. O, he is the courageous captain of compliments. He fights
as you sing prick-song, keeps time, distance, and proportion; rests me his minim rest, one, two, and the
third in your bosom: the very butcher of a silk button,
a duellist, a duellist; a gentleman of the very first
house, of the first and second cause: ah, the
immortal passado! the punto reverso! the hay!
THE_END
print "--- START DATA ---\n", $myvar, "--- END DATA ---\n";
The Perl Language and Modules
- Perl libraries are organized as ‘modules’
  - One of Perl’s strengths is their number and variety
  - All of them can be downloaded from CPAN
- The distinction between Perl and its libraries is fuzzy
  - Pragmatic modules supplement the core language
  - By interacting with the interpreter through symbol tables
- An important library for large projects is ‘strict’
  - This causes a range of extra checks to be run on your code
  - The checks can be enabled all together or individually
The Strict Module
- Strict is a module that changes how you write code
  - It performs extra checks that change what counts as valid Perl
  - Including it is a best practice for large scripts
- The checks can be enabled selectively
  - E.g. ‘use strict "vars"’ turns on explicit declarations

Example Declaration   Description
use strict 'vars'     User-defined variables must be declared, either with ‘my’ or ‘our’ or by prefixing the variable with a package name
use strict 'refs'     Symbolic references (an obsolete feature) are not allowed
use strict 'subs'     Barewords that are not subroutine names are treated as syntax errors, rather than being interpreted as unquoted strings
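The effect of ‘use strict’ on variable declarations can be seen in a short script. This is a sketch rather than a recommended pattern: the string ‘eval’ is used only so that the failed compilation can be caught and printed, and all variable names are made up for illustration.

```perl
use strict;
use warnings;

my $total = 10;           # declared with 'my', so strict 'vars' is satisfied
our $counter = 1;         # declared with 'our' (a package variable)
$main::limit = 99;        # prefixed with a package name, also allowed
print "total=$total counter=$counter limit=$main::limit\n";

# Compile a snippet that uses an undeclared variable. The string eval
# inherits 'use strict', so compilation fails and the message lands in $@.
my $ok    = eval q{ $undeclared = 5; 1 };
my $error = $@;
print "Undeclared variable rejected: $error" unless $ok;
```

In an ordinary script the same mistake stops the program before it runs at all, which is the point of enabling the checks.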
Basic Programming: The Core Perl Syntax
Introducing Scalar Variables
- A scalar variable is a single box
  - It is prefixed by the dollar sigil
- Perl is a weakly typed language
  - A box may hold a number, a string or a memory address
  - If the scalar holds an address it is known as a ‘reference’
- Type conversions occur automatically
  - So in the expression ‘$var1 = $var2 + $var3’ both values are converted to numbers before being added together
- As with all variables, scalars are created on demand
  - You don’t need to declare them separately (unless ‘use strict’ is in effect)
Scalar Variables and Operators
- In strongly typed languages operators are overloaded
  - So ‘var1 + var2’ would add the variables if they were numbers but concatenate them if they were strings
  - If the types weren’t matched there would be a compiler error
- Weak typing means Perl cannot choose an operation from the types of its operands
  - Instead there must be a separate operator for each operation
- In the case of addition:
  - The ‘+’ operator means add as numbers
  - The ‘.’ operator means concatenate as strings
  - Conversions are made as required
Operators Commonly Used in Perl

Description                        Number Version     String Version
Addition / concatenation           $var1 + $var2      $var1 . $var2
Equality                           $var1 == $var2     $var1 eq $var2
Ordered comparison                 <, >, <=, >=       lt, gt, le, ge
Power of / repetition              $var1 ** 3         $var1 x 3
Bitwise operators                  &, |, ^            (NB work differently for numbers and strings)
Logical                            &&, ||, !          and, or, not (lower precedence)
Conditional                        $var1 = $var2 ? 12 : 14;
Range                              1..4               'D' .. 'Z'
$num1 = 42;
$num2 = "42";

$result = $num1 + $num2;
print "adding numbers gives $result", "\n"x2;

$result = $num1 . $num2;
print "adding strings gives $result", "\n"x2;

$result = $num2 ** 3;
print "42 to the power of 3 is $result", "\n"x2;

$result = $num1 x 3;
print "42 concatenated with itself three times is $result", "\n"x2;

if($num1 == $num2) {
    print '$num1 and $num2 are equal as numbers', "\n"x2;
}
if($num1 eq $num2) {
    print '$num1 and $num2 are equal as strings', "\n"x2;
}
String Values in Detail
- Strings may be placed in single or double quotes
  - They have different meanings and are not interchangeable
- The contents of single quotes are not treated specially
  - The interpreter sees them as a plain sequence of characters
- Double quotes cause variable interpolation
  - Perl searches for sigils in the string and replaces each variable with its value (creating the variable if required)
  - Escape sequences such as ‘\n’ are also only interpreted within double quotes
- Note that you can also use ‘backtick’ quotes
  - These surround a string to be run as an OS command
  - E.g. ‘$var1 = `ls -al`’ runs the UNIX list command and stores its output in the variable ‘$var1’
$var1 = "abc";
$var2 = 123;
$var3 = ["ab","cd","ef"];

print 'Values are $var1, $var2 and $var3 \n';
print "Values are $var1, $var2 and $var3 \n";

$path = `set path`;
print "Value of path environment variable is:\n $path";

Output:
Values are $var1, $var2 and $var3 \nValues are abc, 123 and ARRAY(0x225e28)
Value of path environment variable is:
 PATH=c:\jdk1.5.0_05\bin;C:\Perl\site\bin;C:\Perl\bin;c:\ruby\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;
What is Truth in Perl?
- Truth is a source of confusion in Perl
  - Many Perl functions return boolean values
  - But Perl does not have a boolean type
- Three things in Perl count as false
  - The empty string
  - The number 0 or the string "0"
  - An undefined value
- You can obtain the undefined value by:
  - Using a variable that has not been initialized
  - Passing a variable as an argument to ‘undef’
  - Using the return value from ‘undef’
$var1 = "0";       # The string "0" counts as false
$var2 = "";        # The empty string also counts as false
$var3 = "AB";      # Other string values are true
$var4 = -12;       # Other numerical values are true
$var5;             # Undefined values are false
$var6 = undef();   # Values set to undef are false

printTruth('$var1',$var1);
printTruth('$var2',$var2);
printTruth('$var3',$var3);
printTruth('$var4',$var4);
printTruth('$var5',$var5);
printTruth('$var6',$var6);

undef($var4);      # Release storage space for var4 so it becomes undefined
printTruth('$var4',$var4);

sub printTruth {
    my ($varName,$varValue) = @_;
    if($varValue) {
        print "$varName is true\n";
    } else {
        print "$varName is false\n";
    }
}
Special Scalar Variables
- The Perl interpreter automatically creates variables
  - These variables are represented by symbols rather than names
  - This is a further source of confusion when learning Perl
  - The ‘English’ module gives the variables clearer names

Variable Name  Description
$]             The version of Perl supported by this interpreter
$0             The name of the file containing the current script
$^O            The name of the operating system
$_             The current item (used in input, output and loops)
$/             The line separator used when reading text (default is newline)
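As a brief illustration of the ‘English’ module, the sketch below uses the long names it provides for four of the variables in the table. The ‘-no_match_vars’ import option is an addition here; it asks the module not to create aliases for the regex match variables.

```perl
use strict;
use warnings;
use English qw(-no_match_vars);   # core module shipped with Perl

print "Operating system: $OSNAME\n";         # alias for $^O
print "Script name:      $PROGRAM_NAME\n";   # alias for $0

# $INPUT_RECORD_SEPARATOR is $/ and $ARG is $_.
my $separator_is_newline = ($INPUT_RECORD_SEPARATOR eq "\n");
print "Line separator is newline\n" if $separator_is_newline;

for (qw(ab cd ef)) {
    print "\t$ARG\n";                        # same value as $_
}
```

The long and short names refer to the same variables, so code can mix both styles, although picking one per script reads better.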
Special Scalar Variables
print "This is version $] of Perl\n";
print "Running on the $^O operating system\n";
print "The current script is $0 \n";

@myarray = ("ab","cd","ef","gh");
print "Elements are:\n";
foreach(@myarray) {
    print "\t $_ \n";
}

Output:
This is version 5.008008 of Perl
Running on the MSWin32 operating system
The current script is C:\perl\specialScalars.pl
Elements are:
	 ab
	 cd
	 ef
	 gh
Reading Text Into a Scalar Variable
- Reading text into a scalar is simple
  - The expression ‘$line = <INPUT>’ reads a line of text from INPUT and stores it in the scalar ‘$line’
- The symbol within the angle brackets is a handle
  - Handles are links to resources outside your program
  - The ‘STDIN’ and ‘STDOUT’ handles are created automatically
  - We will see how to open and close handles later
- To write data use the ‘print’ function
  - This takes a file handle as an optional first parameter
  - The default handle is STDOUT
open(INPUT,"MuchAdoAboutNothing.txt");
$count = 1;
foreach(<INPUT>) {
    print("$count:\t $_");
    $count++;
}

Output:
1:	 Much Ado About Nothing
2:	 A comedy by William Shakespear
3:	 
4:	 Act 1, Scene 1
5:	 Before LEONATO'S house.
6:	 Enter LEONATO, HERO, and BEATRICE, with a Messenger
7:	 
8:	 LEONATO
9:	 I learn in this letter that Don Peter of Arragon
10:	 comes this night to Messina.
Conditionals and Iteration in Perl
- Perl supports the standard ‘if’ conditional
  - Note that the ‘elsif’ keyword is used instead of ‘else if’
- The ‘unless’ statement is an ‘if’ in reverse
  - So ‘if(!done()) { … }’ becomes ‘unless(done()) { … }’
  - This is convenient once you get used to it
- It is possible to place the test after the action
  - E.g. ‘$a = 12 if $b < $c’ or ‘$a = 12 unless $b >= $c’
- The C/C++ ‘switch’ keyword is not supported
  - Although there are ways of simulating it if required
- The rarely used ternary conditional operator is available
  - E.g. ‘$var1 = ($var2 == $var3) ? 17 : 19’
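The trailing forms of ‘if’ and ‘unless’ and the ternary operator can be tried out in a few lines. This is a minimal sketch; the variable names and values are illustrative only.

```perl
use strict;
use warnings;

my ($low, $high) = (3, 7);    # illustrative values
my $result = 0;

$result = 12 if $low < $high;          # test placed after the action
$result = 99 unless $low < $high;      # skipped, because the test is true

# Ternary operator: pick one of two values in a single expression.
my $label = ($result == 12) ? 'twelve' : 'other';

# A chain of tests is spelled 'elsif' in Perl.
my $size;
if    ($high < 5)  { $size = 'small';  }
elsif ($high < 10) { $size = 'medium'; }
else               { $size = 'large';  }

print "result=$result label=$label size=$size\n";
```

The trailing forms only work with a single statement; anything longer needs the block form with braces.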
print "Enter a number\n";
$number = <STDIN>;
chomp($number);

if($number < 10) {
    print "$number is less than 10\n";
} elsif($number < 20) {
    print "$number is less than 20\n";
} elsif($number < 30) {
    print "$number is less than 30\n";
} else {
    print "$number is 30 or greater\n";
}

unless($number % 2 == 0) {
    print "$number is odd\n";
}

Output:
Enter a number
17
17 is less than 20
17 is odd
Conditionals and Iteration in Perl
- The standard loops are supported
  - Perl provides ‘while’, ‘do … while’ and ‘for’ loops
  - With the same syntax and semantics as C/C++
- Variations of the ‘while’ loop are available
  - The ‘until’ and ‘do … until’ loops avoid the need to negate the conditional, but are not always intuitive
- The ‘for’ loop is extended in two ways:
  - It can be used with ranges rather than counters, e.g. ‘for(1..4) { … }’ or ‘for(1..$max) { … }’
  - The ‘foreach’ loop iterates over arrays; we will meet it later when introducing data structures
print "Enter a positive number\n";
$max = <STDIN>;
chomp($max);
if($max <= 0) {
    die("Number must be positive!");
}

print "Demo of while loop\n";
print "\tNumbers from 0 to $max are:\n";
$count = 0;
while($count <= $max) {
    print "\t\t$count\n";
    $count++;
}

print "Demo of do..while loop\n";
print "\tNumbers from 0 to $max are:\n";
$count = 0;
do {
    print "\t\t$count\n";
    $count++;
} while($count <= $max);

print "Demo of until loop\n";
print "\tNumbers from 0 to $max are:\n";
$count = 0;
until($count > $max) {
    print "\t\t$count\n";
    $count++;
}

print "Demo of do..until loop\n";
print "\tNumbers from 0 to $max are:\n";
$count = 0;
do {
    print "\t\t$count\n";
    $count++;
} until($count > $max);

print "Demo of for loop v1\n";
print "\tNumbers from 0 to $max are:\n";
for($count=0; $count<= $max; $count++) {
    print "\t\t$count\n";
}

print "Demo of for loop v2\n";
print "\tNumbers from 0 to $max are:\n";
for(0..$max) {
    print "\t\t$_\n";
}
Conditionals and Iteration in Perl
- Loops can optionally have a continue block
  - E.g. ‘while($a < 12) { … } continue { … }’
  - This is used with the loop control operators
- A call to ‘last’ immediately exits the loop
  - Without executing the continue block
- A call to ‘next’ skips the remaining statements in this iteration
  - But the continue block is executed before the loop condition is re-evaluated and (if true) the next iteration begins
- A call to ‘redo’ restarts the current iteration
  - The continue block is not executed
  - The loop condition is not checked
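The interaction between ‘next’, ‘last’ and the continue block is easier to follow with counters. In this sketch (variable names are illustrative) the continue block does the increment, so it is clear which control operators run it.

```perl
use strict;
use warnings;

my $i = 0;
my @seen;          # values that reached the end of the loop body
my $ticks = 0;     # how many times the continue block ran

while ($i < 10) {
    next if $i % 2;        # odd: skip the rest of the body, continue block still runs
    last if $i == 6;       # leave the loop now; the continue block is skipped
    push @seen, $i;
}
continue {
    $i++;                  # runs after each completed or 'next'-ed iteration
    $ticks++;
}

print "seen: @seen\n";
print "continue block ran $ticks times, loop left with i=$i\n";
```

Because ‘last’ bypasses the continue block, the loop exits with ‘$i’ still equal to 6 rather than 7.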
Basic Perl I/O: Using the Console and Files
Basic Perl I/O
- Perl I/O is based around handles
  - Links provided by the OS to data sources and sinks
- Handles for the console are built in
  - ‘STDIN’ and ‘STDOUT’ represent the command prompt
- Input is read via the ‘< >’ operator
  - So ‘$data = <STDIN>;’ reads a line from the console
  - Use the ‘chomp’ function to remove the newline
- Output is written via the ‘print’ function
  - If the first argument is not a handle then STDOUT is used
  - A comma should not be placed after the handle
Testing File Paths
- Perl provides built-in operators for testing file paths
  - E.g. ‘if(-e $file && -T $file) { print "$file exists and is a text file"; }’
  - You should always check a file before opening it

File Test Operator  Description
-e                  File exists
-r                  File is readable
-w                  File is writable
-z                  File has zero size
-s                  Returns file size
-T                  File is a text file
-B                  File is a binary file
-S                  File is a socket
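A few of the file test operators can be exercised against a scratch file. This sketch assumes the core ‘File::Temp’ module, used here only to obtain a temporary file that is removed when the script exits; the variable names are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);   # core module, used only to get a scratch file

my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "some text\n";
close($fh) or die "Can't close $file: $!";

my $exists   = -e $file ? 1 : 0;
my $is_empty = -z $file ? 1 : 0;
my $size     = -s $file;                     # size in bytes
my $missing  = -e "$file.missing" ? 1 : 0;   # a path that was never created

print "exists=$exists empty=$is_empty size=$size missing=$missing\n";
```

Each operator takes a file name (or a handle) and can be combined with ‘&&’ as in the slide’s example.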
Opening and Reading From Files
- The ‘open’ function is used to create a file handle
  - The first argument is the symbol we want to represent the handle
- Files are opened in a particular mode
  - As indicated by the character(s) before the filename
  - The default is to open for reading

Function                        Description
open(HANDLE, "myfile.txt")      Open file for reading
open(HANDLE, "<myfile.txt")     Open file for reading
open(HANDLE, ">myfile.txt")     Open file for writing (truncating if necessary)
open(HANDLE, ">>myfile.txt")    Open file for appending
open(HANDLE, "+<myfile.txt")    Open file for reading and updating
Opening and Reading From Files
- The standard form of ‘open’ could cause problems
  - If the handle name was already in use (e.g. as a subroutine)
  - If we were trying to open a file called ‘>myfile.txt’
- There are two ways around this
  - There is a three-argument form of ‘open’, in which the mode is passed as a separate argument
  - The handle can be stored in a scalar variable, which is known as an indirect filehandle
- Once you have a file opened you can:
  - Read lines from the file using the ‘< >’ operator
  - Write to the file using the ‘print’ function
open(INPUT,"input.txt");
open(OUTPUT,">output.txt");
$count = 0;
while($line = <INPUT>) {
    print OUTPUT ++$count, "\t", $line;
}
print "Processed $count lines\n"

Input (input.txt):
This short interval was sufficient to determine d'Artagnan on the
part he was to take.  It was one of those events which decide the
life of a man; it was a choice between the king and the
cardinal--the choice made, it must be persisted in.  To fight,
that was to disobey the law, that was to risk his head, that was
to make at one blow an enemy of a minister more powerful than the
king himself.  All this young man perceived, and yet, to his
praise we speak it, he did not hesitate a second.  Turning
towards Athos and his friends, "Gentlemen," said he, "allow me to
correct your words, if you please.  You said you were but three,
but it appears to me we are four."

Output (output.txt):
1	This short interval was sufficient to determine d'Artagnan on the
2	part he was to take.  It was one of those events which decide the
3	life of a man; it was a choice between the king and the
4	cardinal--the choice made, it must be persisted in.  To fight,
5	that was to disobey the law, that was to risk his head, that was
6	to make at one blow an enemy of a minister more powerful than the
7	king himself.  All this young man perceived, and yet, to his
8	praise we speak it, he did not hesitate a second.  Turning
9	towards Athos and his friends, "Gentlemen," said he, "allow me to
10	correct your words, if you please.  You said you were but three,
11	but it appears to me we are four."
open(INPUT, '<', "input.txt");
open(OUTPUT, '>', "output.txt");
$count = 0;
while($line = <INPUT>) {
    print OUTPUT ++$count, "\t", $line;
}
print "Processed $count lines\n"

The input file and the numbered output are identical to those of the two-argument version.
open($input, '<', "input.txt");
open($output, '>', "output.txt");
$count = 0;
while($line = <$input>) {
    print $output ++$count, "\t", $line;
}
print "Processed $count lines\n"

The input file and the numbered output are identical to those of the two-argument version.
Opening and Reading From Files
- Lines from a file should be read using a ‘while’ loop
  - A ‘for’ loop causes the interpreter to create a list of all the lines from the file, which is then iterated over
- File handles should be closed via ‘close’
  - Handles stored as scalars are automatically closed when the variable goes out of scope, but you may want to do this earlier
- You should verify calls to ‘open’, ‘print’ and ‘close’
  - All three return a boolean value to indicate success or failure
  - You can throw an error using the ‘die’ function, or ‘croak’ from the Carp module
  - We will cover these in depth later in the course
open($input, '<', "input.txt") or die "Can't open input";
open($output, '>', "output.txt") or die "Can't open output";
$count = 0;
while($line = <$input>) {
    print($output ++$count, "\t", $line) or die "Can't write to file";
}
print "Processed $count lines\n";
close($input) or die "Can't close input";
close($output) or die "Can't close output";
File Handles and Globbing
- You can place a pattern inside the ‘< >’ operator
  - In which case the pattern is ‘globbed’
  - E.g. ‘@dirs = <../*>;’
- To avoid confusion use the ‘glob’ function
  - E.g. ‘@dirs = glob('../*');’
- Globbing is often used to change file properties
  - ‘chmod’ and ‘chown’ change a file’s access rights and ownership
  - E.g. ‘while(glob("*.pl")) { chmod(0777, $_); }’ (the mode starts with the digit zero, making it an octal number)
  - E.g. ‘while(glob("*.pl")) { chown($user_id, $group_id, $_); }’
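Globbing and ‘chmod’ can be combined safely in a throwaway directory. The sketch below assumes the core ‘File::Temp’ module and a temporary path without spaces (the ‘glob’ function splits its pattern on whitespace); the file names and the 0644 mode are chosen purely for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);          # core module, gives a throwaway directory

my $dir = tempdir(CLEANUP => 1);
for my $name (qw(one.pl two.pl notes.txt)) {   # hypothetical file names
    open(my $fh, '>', "$dir/$name") or die "Can't create $name: $!";
    close($fh) or die "Can't close $name: $!";
}

my @scripts = glob("$dir/*.pl");      # only the two '.pl' files match
my $changed = chmod(0644, @scripts);  # leading zero makes this an octal literal

print "Matched ", scalar(@scripts), " scripts, changed $changed\n";
```

‘chmod’ accepts a whole list of files and returns how many it changed, so a loop is only needed when each file requires different handling.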
@exampleDirectories = glob('..\*');
print "Perl example files are: \n";
foreach $dir (@exampleDirectories) {
    @perlFiles = glob($dir . '\*.pl');
    foreach(@perlFiles) {
        # characters preceding a slash preceding
        # letters ending in '.pl'
        m/.*\\(\w+\.pl)/;
        print "\t$1 in $dir \n";
    }
}

Output:
Perl example files are:
	arrayFunctions.pl in ..\arrays
	arraysAndLists.pl in ..\arrays
	days.pl in ..\arrays
	forEach.pl in ..\arrays
	fork.pl in ..\concurrency
	threads.pl in ..\concurrency
	customerDB.pl in ..\databases
	arraysOfArrays.pl in ..\dataStructures
	checkingErrors.pl in ..\files
	indirectHandles.pl in ..\files
	basicHashes.pl in ..\hashes
	extraSyntax.pl in ..\hashes
Arrays in Perl: Creating and Using Lists
Introducing Arrays in Perl
- In many other languages arrays cannot change size
  - Hence they must be supplemented with resizable data structures
  - E.g. the C++ STL or the Collections libraries in Java and C#
- In Perl arrays can grow and shrink as required
  - So there is no need for a separate ‘vector’ or ‘LinkedList’ type
  - If the array is of size 10 and you try to store something in box 100 then the size is automatically changed
- Often arrays are created implicitly
  - E.g. ‘$myarray[9] = "abc"’ would create an array called ‘myarray’ with ten boxes, all of which are undefined apart from the last
$myarray[8] = "string in 9th box";
$myarray[10] = "string in 11th box";
$count = 0;
foreach(@myarray) {
    print $count++, ": $_\n";
}

Output:
0:
1:
2:
3:
4:
5:
6:
7:
8: string in 9th box
9:
10: string in 11th box
Support for Arrays in Perl
- Arrays are declared with the ‘@’ sigil
  - E.g. ‘@myarray = ("abc", 123, "def", 456, "ghi", 789)’
  - Normally we create an array based on a list of values
- The ‘qw’ operator can be used to avoid quoting
  - E.g. ‘@myarray = qw(abc 123 def 456 ghi 789)’
- Lists can also be initialized based on arrays
  - E.g. ‘($val1,$val2,$val3) = @myarray’ copies the values in the first three boxes into the scalar variables
  - ‘($val1) = @myarray’ is an idiom for grabbing the first value
  - It is equivalent to ‘$val1 = $myarray[0]’
Support for Arrays in Perl
- Note that lists are only used for grouping
  - Unlike arrays their existence is only ever temporary
- The ‘@’ sigil is not used when indexing
  - Instead of ‘@myarray[1]’ we write ‘$myarray[1]’
  - This is because the value we are accessing is a scalar
- Arrays can be created based on slices of other arrays
  - E.g. ‘@array2 = @array1[1..3]’ creates a new array called ‘array2’ holding copies of boxes 2, 3 and 4 in ‘array1’
@tstArray = qw(abc def ghi jkl mno pqr);
($first) = @tstArray;
($a,$b,$c) = @tstArray;
print "First element is: $first\n";
print "First three elements are: $a $b $c\n"

Output:
First element is: abc
First three elements are: abc def ghi
Special Features of Arrays
- Perl makes assumptions about your use of arrays
  - Almost anything you might write is meaningful
  - Even if the meaning is not what you intended
- If you put an array in a scalar context the size is used
  - ‘$var = @myarray’ stores the size of ‘myarray’ in ‘var’
  - ‘$var = @myarray - 1’ would store the index of the last box
- You can easily copy values into another array
  - E.g. ‘@myarray3 = (@myarray1,@myarray2)’ would mean ‘myarray3’ contained copies of the values in the other two arrays
  - It does not create a multidimensional array
Special Features of Arrays
- There is an easy way to find the last index
  - For ‘@myarray’ it is stored in the variable ‘$#myarray’
- Arrays can be shrunk if required
  - The special variable ‘$#myarray’ is not immutable
  - So ‘$#myarray -= 2’ removes the last two boxes of ‘myarray’
- There are two ways to empty out an array
  - By setting its last index to -1, e.g. ‘$#myarray = -1’
  - By assigning it an empty list, e.g. ‘@myarray = ()’
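The behaviour of ‘$#array’ can be confirmed with a short script. This is a minimal sketch with illustrative names; the values in the comments follow from the five-element starting array.

```perl
use strict;
use warnings;

my @letters = qw(a b c d e);

my $last_index = $#letters;        # 4
my $size       = @letters;         # 5 (array in scalar context)

$#letters -= 2;                    # drop the last two boxes
my @after_shrink = @letters;       # (a, b, c)

$#letters = -1;                    # empty the array
my $size_after_empty = @letters;   # 0

print "last index $last_index, size $size\n";
print "after shrinking: @after_shrink\n";
print "size after emptying: $size_after_empty\n";
```

Assigning a larger value to ‘$#letters’ works too: the array grows and the new boxes are undefined.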
@workDays = ("Monday","Tuesday","Wednesday","Thursday","Friday");
@weekendDays = qw(Saturday Sunday);
@days = (@workDays,@weekendDays);

$numDays = @days;
$firstDay = $days[0];
$lastDay = $days[$#days];

print "There are $numDays days in a week\n\n";
print "$firstDay is the first day and $lastDay is the last day \n";

print "\nThe other days are:\n";
foreach $day (@days) {
    unless($day eq $firstDay or $day eq $lastDay) {
        print $day, "\n";
    }
}

print "\nThe last four days are:\n";
@lastDays = @days[3..6];
foreach $day (@lastDays) {
    print $day, "\n";
}
Iterating Over Arrays
- The easiest way to loop over an array is ‘foreach’
  - This iterates over a list of values and assigns each to a scalar variable
  - You can specify the scalar or use the built-in ‘$_’
- The ‘foreach’ keyword is just an alias for ‘for’
  - Which one you use is a matter of style

@myarray = qw(abc def ghi jkl);

print "Loop one\n";
foreach $item (@myarray) {
    print "\t", $item, "\n";
}

print "Loop two\n";
foreach (@myarray) {
    print "\t", $_, "\n";
}

print "Loop three\n";
for $item (@myarray) {
    print "\t", $item, "\n";
}

print "Loop four\n";
for (@myarray) {
    print "\t", $_, "\n";
}
Functions for Working With Arrays
- Perl provides powerful functions for manipulating arrays
  - The ‘push’ and ‘pop’ functions add and remove from the end
  - The ‘unshift’ and ‘shift’ functions do the same from the start

Function Name  Description
push           Add a new box to the end of the array
pop            Remove a box from the end of the array
unshift        Add a new box to the start of the array
shift          Remove the first box in the array
join           Join all the values in the array into a string, separated by a delimiter
split          Create an array by splitting a string into a sequence of sub-strings, using a regular expression to specify the delimiter token(s)
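The ‘join’ and ‘split’ functions from the table can be illustrated in a few lines. This is a minimal sketch; the delimiter pattern and the sample strings are made up for the example.

```perl
use strict;
use warnings;

my @words = qw(alpha beta gamma);

# join: array to string, with a delimiter between the values.
my $csv = join(',', @words);

# split: string to array. The delimiter is a regular expression,
# here a comma or semicolon followed by optional whitespace.
my @fields = split(/[,;]\s*/, 'red, green;blue,  white');

print "$csv\n";
print scalar(@fields), " fields: @fields\n";
```

Because the delimiter is a pattern, one call copes with inconsistent separators that would need several passes with a fixed string.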
© Garth Gilmour 2008 @myarray1 = qw(abc def ghi); $val1 = pop(@myarray1); print &quot;\nJust popped $val1, contents now:\n&quot;; foreach(@myarray1) { print &quot;\t$_\n&quot;; } push(@myarray1,&quot;zzz&quot;); print &quot;\nJust pushed zzz, contents now:\n&quot;; foreach(@myarray1) { print &quot;\t$_\n&quot;; } $val1 = shift(@myarray1); print &quot;\nJust shifted $val1, contents now:\n&quot;; foreach(@myarray1) { print &quot;\t$_\n&quot;; } unshift(@myarray1,&quot;AAA&quot;); print &quot;\nJust unshifted AAA, contents now:\n&quot;; foreach(@myarray1) { print &quot;\t$_\n&quot;; } Just popped ghi, contents now: abc def Just pushed zzz, contents now: abc def zzz Just shifted abc, contents now: def zzz Just unshifted AAA, contents now: AAA def zzz
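The table of array functions also lists 'join' and 'split', which the example above does not exercise. The following is a minimal sketch of both; the variable names and sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Join the elements of an array into one comma-separated string.
my @days   = qw(Mon Tue Wed);
my $joined = join(",", @days);
print "$joined\n";

# Split a string back into a list, using a regex as the delimiter.
# Here the delimiter is a comma with optional whitespace around it.
my @parts = split(/\s*,\s*/, "abc , def,ghi");
print scalar(@parts), " parts: @parts\n";
```

Because the delimiter given to 'split' is a regular expression, the stray spaces around the first comma are consumed along with it.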
Hashes in Perl Creating and Using Tables © Garth Gilmour 2008
Introducing Hashes in Perl
Hashes are the second built-in data type
  A hash is a data structure that works like a map or table
  The name comes from the use of a hashing algorithm
The ‘%’ sigil is used when declaring hashes
  As with arrays it is not used when referring to individual values
  E.g. ‘$myhash{“k1”}’ returns the value for the key ‘k1’ in the hash ‘%myhash’ and ‘$myhash{“k1”} = 12’ sets it
Hashes can be declared and expanded implicitly
  ‘$myhash{“k1”} = 12’ creates a hash called ‘myhash’ if required
  Otherwise if the row does not exist it is added to the hash
© Garth Gilmour 2008 $myhash{'k1'} = &quot;abc&quot;; $myhash{'k2'} = 123; $myhash{'k3'} = &quot;def&quot;; $myhash{'k4'} = 456; foreach $key (keys %myhash) { print $key, &quot; indexes &quot;, $myhash{$key}, &quot;\n&quot;; } k2 indexes 123 k1 indexes abc k3 indexes def k4 indexes 456
Initializing a Hash Like arrays hashes can be initialized via lists E.g. ‘%myhash = (“k1”, 123, “k2”, “XYZ”)’ creates a hash with two rows, where the keys are ‘k1’ and ‘k2’ Again ‘qw’ can be used to avoid quoting literals The odd numbers become keys and the even numbers values The ordering of the rows cannot be predicted The ‘fat comma’ notation makes things clearer E.g. ‘%myhash = (k1 => 123, k2 => “XYZ”)’ The ‘=>’ operator is the same as the comma, except that the value on the left hand side is quoted if required Each pair should be placed on its own line for clarity © Garth Gilmour 2008
© Garth Gilmour 2008 %myhash = ( k1 => 123, k2 => &quot;abc&quot;, k3 => 456, k4 => &quot;def&quot;, k5 => 789 ); foreach $key (keys %myhash) { print $key, &quot; indexes &quot;, $myhash{$key}, &quot;\n&quot;; } k5 indexes 789 k2 indexes abc k1 indexes 123 k3 indexes 456 k4 indexes def
Functions for Working With Hashes
As with arrays there are built-in functions for hashes
  ‘keys’ and ‘values’ return the entries in the two columns
The ‘each’ function is slightly complex
  Every time it is called it returns a list holding a key/value pair
  When the end of the hash is reached it returns an empty list

Function Name   Description
each            Returns a list of two values representing a row in the hash
exists          Returns true if a specified entry exists in the hash
keys            Returns a list of all the keys in the hash
values          Returns a list of all the values in the hash
delete          Removes a row from the hash
© Garth Gilmour 2008 %myhash = ( k1 => 123, k2 => &quot;abc&quot;, k3 => 456, k4 => &quot;def&quot;, k5 => 789 ); print &quot;Keys are:\n&quot;; foreach $key (keys %myhash) { print &quot;\t&quot;, $key, &quot;\n&quot;; } print &quot;Values are:\n&quot;; foreach $value (values %myhash) { print &quot;\t&quot;, $value, &quot;\n&quot;; } print &quot;Entries are:\n&quot;; while (($key, $value) = each(%myhash)) { print &quot;\t$key indexes $value \n&quot;; } Keys are: k5 k2 k1 k3 k4 Values are: 789 abc 123 456 def Entries are: k5 indexes 789  k2 indexes abc  k1 indexes 123  k3 indexes 456  k4 indexes def
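The example above covers 'keys', 'values' and 'each'. As a short sketch of the remaining two functions in the table, 'exists' and 'delete', consider the following; the hash name and its contents are invented for the example.

```perl
use strict;
use warnings;

my %stock = (apples => 3, pears => 0);

# 'exists' tests for the key, not for a true value:
# pears holds 0 but its row is still present.
print "pears row exists\n"  if exists $stock{pears};
print "plums row missing\n" unless exists $stock{plums};

# 'delete' removes the row and returns the value it held.
my $removed = delete $stock{apples};
print "Removed apples ($removed), ", scalar(keys %stock), " row left\n";
```

Testing with 'exists' rather than the value itself avoids treating a stored 0 or empty string as a missing row.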
Special Syntax for Hashes
A hash can be assigned to an array
  E.g. ‘@myarray = %myhash’ causes all the keys and values from ‘myhash’ to be inserted into ‘myarray’
  The items are added in the order in which the hash stores them internally
  Rather than the order in which they were added
Slices of hashes can be obtained
  E.g. ‘($v1,$v2) = @myhash { “k1”, “k2” }’ stores the values associated with ‘k1’ and ‘k2’ into the two scalar variables
Slicing can also be used to add values
  E.g. ‘@myhash { “k1”, “k2”, “k3” } = (“abc”, 123, “def”)’ adds three key/value pairs into the hash
© Garth Gilmour 2008 %tstHash = (&quot;k1&quot;,&quot;v1&quot;,&quot;k2&quot;,&quot;v2&quot;); print &quot;Original hash contents:\n&quot;; foreach(keys(%tstHash)) { print(&quot;$_ indexes &quot;,$tstHash{$_},&quot;\n&quot;); } @tstHash {&quot;k3&quot;,&quot;k4&quot;,&quot;k5&quot;,&quot;k6&quot;} = (111,222,333,444);  print &quot;\nHash contents after insertions:\n&quot;; foreach(keys(%tstHash)) { print(&quot;$_ indexes &quot;,$tstHash{$_},&quot;\n&quot;); } ($var1,$var2,$var3) = @tstHash {&quot;k1&quot;,&quot;k2&quot;,&quot;k3&quot;}; print &quot;\nValues in scalars are $var1 $var2 and $var3\n&quot;; @elements = %tstHash; print &quot;\nArray contents:\n&quot;; foreach(@elements) { print &quot;$_ &quot;; } Original hash contents: k2 indexes v2 k1 indexes v1 Hash contents after insertions: k5 indexes 333 k2 indexes v2 k1 indexes v1 k6 indexes 444 k3 indexes 111 k4 indexes 222 Values in scalars are v1 v2 and 111 Array contents: k5 333 k2 v2 k1 v1 k6 444 k3 111 k4 222
Hashes of Anonymous Arrays This is the first advanced data structure we will meet We introduce it now because it is so useful The syntax ‘[12, “AB”]’ creates an anonymous array No name is associated with the array Instead the ‘[ ]’ operator returns its address We can store the addresses in a hash Indexed by an appropriate key This can be used to store all kinds of data E.g. exam results for students © Garth Gilmour 2008
© Garth Gilmour 2008 %results = ( dave => [54, 62, 73, 48], jane => [59, 67, 82, 70], fred => [92, 64, 59, 71] ); results dave jane fred 54 62 73 48 59 67 82 70 92 64 59 71
Hashes of Anonymous Arrays
To work with the array as a whole use ‘@{ }’
  E.g. ‘@{$myhash{“dave”}}’ means get the array whose address is indexed in ‘myhash’ by the key ‘dave’
To work with array elements use the arrow operator
  E.g. ‘$myhash{“dave”}->[1]’ means get the value in ‘myhash’ indexed by ‘dave’ and then go to box 2 in the array it references
‘${$myhash{“dave”}}[1]’ is also valid
  But is very hard to decode
  ‘$$myhash{“dave”}[1]’ would be interpreted as meaning that ‘myhash’ is a scalar variable holding the address of a hash
© Garth Gilmour 2008 %actors = ( &quot;george clooney&quot; => [&quot;Oceans 11&quot;, &quot;The Peacemaker&quot;,  &quot;O Brother Where Art Thou&quot;], &quot;harrison ford&quot;  => [&quot;Star Wars&quot;,&quot;Sabrina&quot;,&quot;Indiana Jones&quot;], &quot;robin williams&quot; => [&quot;Good Morning Vietnam&quot;,&quot;Hook&quot;,&quot;The Birdcage&quot;]     ); print &quot;\nThe list of actors and their movies is: \n&quot;; foreach $actor (keys %actors) { print &quot;\t$actor starred in: \n&quot;; foreach $film(@{$actors{$actor}}) { print &quot;\t\t$film\n&quot;; } } The list of actors and their movies is:  robin williams starred in:  Good Morning Vietnam Hook The Birdcage harrison ford starred in:  Star Wars Sabrina Indiana Jones george clooney starred in:  Oceans 11 The Peacemaker O Brother Where Art Thou
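The loop above dereferences each whole array with '@{ }'. As a complementary sketch, the snippet below reads a single element with the arrow operator and then modifies one of the anonymous arrays in place. It reuses the exam results idea from the earlier slide with a cut-down data set.

```perl
use strict;
use warnings;

my %results = (
    dave => [54, 62, 73, 48],
    jane => [59, 67, 82, 70],
);

# One element: follow the reference with the arrow operator.
my $second = $results{dave}->[1];

# Whole array: dereference with @{ }; in scalar context this is the size.
my $count = @{ $results{jane} };

# Add a new mark to the end of jane's array.
push @{ $results{jane} }, 91;

print "dave's second mark is $second\n";
print "jane had $count marks, now has ", scalar(@{ $results{jane} }), "\n";
```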
Regular Expressions Part 1: Core Concepts © Garth Gilmour 2008
Introducing Regular Expressions
A Regular Expression is a pattern in text
  The commonly used shorthand is ‘regex’
Regexes are used to find matches in search strings
  E.g. the regex for an email might be:
    One or more lowercase or uppercase letters
    Optionally a dot and one or more letters (any case)
    The ‘@’ symbol followed by the company name
    One of a range of supported suffixes (.com or .co.uk or .ie)
Regexes can save a huge amount of time and effort
  Especially when compared to writing your own parser
The Syntax of Regular Expressions Regular expressions are a ‘little language’ Like SQL and XPath they have their own syntax Unfortunately the special characters in the regex language can also occur as part of your search string E.g. the dot is short for any character, so if you mean the actual character dot it must be escaped The syntax of regex’s has developed over time Initially in UNIX commands and then in Perl scripting Perl V5 set the standard for regex support Most languages now support the Perl 5 regex syntax Your regex’s are run by an ‘Expression Engine’ The details of how this works can be quite complex © Garth Gilmour 2008
Key Regex Concept No 1
The search for the next match starts from just after the end of the last successful match
  So if the pattern is ‘three uppercase letters’ and the search string is ‘ABCDEF’ then the matches are ‘ABC’ and ‘DEF’
  Not ‘ABC’ followed by ‘BCD’ followed by ‘CDE’ and so on
[Diagram: the string ‘ABCDEFGHIJKLMNO’ divided into five consecutive matches: ‘ABC’, ‘DEF’, ‘GHI’, ‘JKL’ and ‘MNO’]
Key Regex Concept No 2
Regular expressions are greedy by default
  So given the pattern ‘one or more uppercase letters’ and the search string ‘ABCDEFg’ the match is ‘ABCDEF’
  The engine always selects the largest possible set of characters
  It is possible to use non-greedy matching symbols instead
[Diagram: in the string ‘ABCDEfghIJKLMNopqrSTUVWXyz’ the three matches are the uppercase runs ‘ABCDE’, ‘IJKLMN’ and ‘STUVWX’]
Character Classes
The building block of all regexes is the character class
  This defines a set of symbols to find e.g.
    ‘[aeiou]’ matches any vowel
    ‘[a-z]’ matches lowercase letters
    ‘[A-Z]’ matches uppercase letters
    ‘[a-zA-Z]’ matches a letter in either case
  Note that it is a set and not a sequence
    ‘[abc]’ matches ‘a’ OR ‘b’ OR ‘c’ and NOT ‘abc’
The top hat symbol negates the character class
  So ‘[^aeiou]’ matches any character that isn't a vowel
  Note that outside a character class ‘^’ has another meaning
Shortcuts for Character Classes There are two shortcut notations for character classes  One provided by the Perl version of regular expressions The other by the POSIX standards (this is very rarely used) © Garth Gilmour 2008 Perl Shortcut Description Character Class \d Digit [0-9] \D Non-Digit [^0-9] \s Whitespace Character [ \t\n\r\f] \S Non Whitespace Character [^ \t\n\r\f] \w Word Character [a-zA-Z0-9_] \W Non-Word Character [^a-zA-Z0-9_]
Specifying Multiplicities  By default a character class matches one instance You can specify a different number in braces So ‘[a-z]’ is the same as ‘[a-z]{1}’ Separating numbers by commas specifies a range So ‘[a-z]{2,4}’ means between two and four lowercase letters The question mark signifies optionality So ‘[a-z]{2}[A-Z]?’ specifies two lowercase letters optionally followed by a single uppercase letter There are two meta-characters used for many The plus means one or more and the star zero or more So ‘[a-z]+[A-Z]*’ means one or more lowercase letters followed by zero or more uppercase letters (note that greediness applies) © Garth Gilmour 2008
Specifying Points Within the Input Two characters signify the start and end points The ‘top hat’ specifies the start The dollar specifies the end These are very useful ‘ ^$’ matches blank lines ‘ ^[a-zA-Z]{10}’ captures the first 10 characters ‘ [a-zA-Z]{10}$’ captures the last 10 characters ‘ ^[^0-9]{5}’ captures the first 5 characters if they are not digits What you mean by start and end points can vary You can select whether they match the start and end of the entire string or each line embedded within it © Garth Gilmour 2008
Using Submatches Within a Regex
Matches can contain submatches
  By placing part of the regex in parentheses it can be accessed separately from the main match
  E.g. applying ‘([a-z]+)([A-Z]+)’ to ‘ABCdefGHIjkl’ matches ‘defGHI’ with submatches of ‘def’ and ‘GHI’
Parentheses are used for both grouping and submatches
  If you want to use parentheses for grouping only use ‘(?: … )’
Submatches can be very helpful
  E.g. you want to match email addresses but capture the name and domain suffix separately
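The submatch example from this slide can be tried directly in Perl. The sketch below assumes only the core language; after a successful match the submatches are available in '$1' and '$2' (Perl's regex syntax is covered in Part 2).

```perl
use strict;
use warnings;

my $text = "ABCdefGHIjkl";

# Two capturing groups: a lowercase run, then an uppercase run.
my ($lower, $upper);
if ($text =~ /([a-z]+)([A-Z]+)/) {
    ($lower, $upper) = ($1, $2);
    print "matched $lower$upper with submatches $lower and $upper\n";
}
```

The match starts at 'def' rather than 'ABC' because the first group requires lowercase letters, so the engine skips forward until it finds some.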
© Garth Gilmour 2008 ABCdefGHIjklMNOpqrSTU [A-Z]{2} Matches: AB GH MN ST ABCdefGHIjklMNOpqrSTU [A-Z]{3} Matches: ABC GHI MNO STU ABCdefGHIjklMNOpqrSTU [A-Z]{3}[a-z] Matches: ABCd GHIj MNOp
© Garth Gilmour 2008 ABCdefGHIjklMNOpqrSTU [A-Z]+[a-z]+ Matches: ABCdef GHIjkl MNOpqr ABCdefGHIjklMNOpqrSTU ([A-Z]+)([a-z]+) Matches: ABCdef Group 1: ABC Group 2: def GHIjkl Group 1: GHI Group 2: jkl MNOpqr Group 1: MNO Group 2: pqr ABCdefGHIjklMNOpqrSTU [A-Za-z]+ Matches:   ABCdefGHIjklMNOpqrSTU
Search string: ABCdefGHIjklMNOpqrSTU
[A-Za-z]{5}     Matches: ABCde fGHIj klMNO pqrST
^[A-Za-z]{5}    Matches: ABCde
[A-Za-z]{5}$    Matches: qrSTU
[A-Za-z]{5,8}   Matches: ABCdefGH IjklMNOp qrSTU
Other Meta-Characters
The bar is used as a logical OR
  So ‘(com|ie|net)’ matches one of three suffixes
  Note that placing spaces around the bar changes the pattern
The dot matches any character
  You can choose whether or not this includes newlines
The backslash is used to escape meta-characters
  So ‘(.com|.ie|.net)’ matches any character plus the suffix whereas ‘(\.com|\.ie|\.net)’ matches the suffix with the dot
In Sed ‘\<’ and ‘\>’ match the start and end of a word
  Perl does not support this and instead uses ‘\b’ for both
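To make the difference between an escaped and an unescaped dot concrete, here is a small sketch. The test strings are invented; the last line also shows '\b' marking both ends of a word.

```perl
use strict;
use warnings;

# An unescaped dot matches any character, so "xcom" is accepted.
my $loose  = "examplexcom" =~ /example(.com|.ie|.net)$/    ? 1 : 0;
# Escaping the dot insists on a literal '.'.
my $strict = "examplexcom" =~ /example(\.com|\.ie|\.net)$/ ? 1 : 0;

# \b is a word boundary; it stops 'cat' matching inside 'concat'.
my @words = "cat concat cat's" =~ /\bcat\b/g;

print "loose=$loose strict=$strict whole-word matches=", scalar(@words), "\n";
```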
Regular Expressions Part 2: Perl Syntax © Garth Gilmour 2008
Regular Expressions in Perl
Perl 5 is the established standard for regexes
The ‘=~’ operator is used to apply an expression
  E.g. ‘$match = $data =~ m/[A-Z]+/’
  By default the operator returns a true/false value
  The ‘!~’ operator is the reverse of ‘=~’
If successful the matched text is stored in ‘$&’
  ‘$`’ holds the text before the match
  ‘$'’ holds the text after the match
The ‘g’ modifier causes a list of all the matches to be returned (in list context)
  E.g. ‘@results = $data =~ m/[A-Z]+/g’
  A ‘foreach’ loop can be used to iterate over the results
© Garth Gilmour 2008 $data = &quot;ABCdefGHIjklMNOpqrSTUvwxYZA&quot;; $m1 = $data =~ m/[A-Z]/; if($m1) { print &quot;Match one is $&\n\n&quot;; } $m2 = $data =~ m/[A-Z]{2}/; if($m2) { print &quot;Match two is $&\n\n&quot;; } $m3 = $data =~ m/[A-Z]+/; if($m3) { print &quot;Match three is $&\n\n&quot;; } $m4 = $data =~ m/[^e]+/; if($m4) { print &quot;Match four is $&\n\n&quot;; } $m5 = $data =~ m/[A-Z]{3}[a-z]{2}/; if($m5) { print &quot;Match five is $&\n\n&quot;; } Match one is A Match two is AB Match three is ABC Match four is ABCd Match five is ABCde
© Garth Gilmour 2008 $m6 = $data =~ m/[A-Z]+[a-z]+/; if($m6) { print &quot;Match six is $&\n\n&quot;; } $m7 = $data =~ m/[A-Za-z]{9}/; if($m7) { print &quot;Match seven is $&\n\n&quot;; } $m8 = $data =~ m/.+/; if($m8) { print &quot;Match eight is $&\n\n&quot;; } $m9 = $data =~ m/^.{8}/; if($m9) { print &quot;Match nine is $&\n\n&quot;; } $m10 = $data =~ m/.{8}$/; if($m10) { print &quot;Match ten is $&\n\n&quot;; } Match six is ABCdef Match seven is ABCdefGHI Match eight is ABCdefGHIjklMNOpqrSTUvwxYZA Match nine is ABCdefGH Match ten is TUvwxYZA
© Garth Gilmour 2008 @groupOne = $data =~ m/[A-Z]{3}/g; print &quot;First group of matches are:\n&quot;; foreach(@groupOne) { print &quot;\t$_\n&quot;; } @groupTwo = $data =~ m/[a-z]{3}/g; print &quot;Second group of matches are:\n&quot;; foreach(@groupTwo) { print &quot;\t$_\n&quot;; } @groupThree = $data =~ m/.{4}/g; print &quot;Third group of matches are:\n&quot;; foreach(@groupThree) { print &quot;\t$_\n&quot;; } First group of matches are: ABC GHI MNO STU YZA Second group of matches are: def jkl pqr vwx Third group of matches are: ABCd efGH Ijkl MNOp qrST Uvwx
Regular Expressions in Perl
Several other modifiers can be used with ‘m//’
  E.g. by default ‘.’ does not match new-line characters, so ‘.+’ will not cross embedded new-lines unless you use ‘s’
Submatches are stored in numbered scalar variables
  So ‘$1’ holds the first submatch and so on

Modifier   Description
g          Finds all the matches in the string
i          Makes the pattern case-insensitive
s          Means ‘.’ matches new-line characters
m          Means ‘^’ and ‘$’ match at the start and end of each line within the string
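The effect of each modifier is easiest to see on a string with an embedded new-line. This is an illustrative sketch with made-up sample text, not part of the original examples.

```perl
use strict;
use warnings;

my $text = "first line\nSecond line\n";

my $case_blind = $text =~ /second/i         ? 1 : 0;   # i: ignore case
my $no_s       = $text =~ /first.+Second/   ? 1 : 0;   # '.' stops at the new-line
my $with_s     = $text =~ /first.+Second/s  ? 1 : 0;   # s: '.' crosses the new-line
my @starts     = $text =~ /^(\w+)/mg;                  # m: '^' matches at each line start

print "i=$case_blind no_s=$no_s s=$with_s starts=@starts\n";
```

Modifiers can be combined, as in the last pattern, where 'm' and 'g' together collect the first word of every line.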
Using Regular Expressions in Perl
Best practice is to use extended expressions
  This is enabled via the ‘x’ modifier
The ‘x’ modifier has two effects:
  Whitespace within the regex is disregarded
    The regex can be spread over multiple lines
  Standard Perl comments can be added
    These let you explain your intent
Multi-line regexes should be bracketed with ‘{ }’
  The interpreter will accept almost any character as a delimiter
  If an opening bracket is used then the matching closing bracket is expected
  E.g. ‘m![A-Z]+!’, ‘m?[A-Z]+?’, or ‘m{[A-Z]+}’
Using Regular Expressions in Perl
print "Enter the email address:\n";
chomp($email = <STDIN>);
# easier to read version of ([a-z]+(\.[a-z]+)?)\@megacorp(\.com|\.ie|\.co\.uk)
$m = $email =~ m{^          #start of string
    (                       #start of inner match 1
    [a-z]+                  #one or more lowercase letters
    (\.[a-z]+)?             #optionally a dot and letters (inner match 2)
    )                       #end of inner match 1
    \@megacorp              #company name NB need to escape array
    (\.com|\.ie|\.co\.uk)   #possible domain names
    $                       #end of string
}x;
if($m) {
    print "Recognized email for $1 in domain $3\n";
} else {
    print "Invalid address!\n";
}
Pattern Matching and Substitutions
The ‘s///’ operator is used to make substitutions
  Its syntax is ‘s/ regex / replacement text / modifiers’
  The replacement text can contain ‘$1’, ‘$2’ etc…
By default a substitution is only made for the first match
  By using the ‘g’ modifier all matches are replaced
  The return value is the number of substitutions made
The ‘e’ modifier is especially useful
  The replacement text is executed as a Perl expression
  The standard use of this is to replace a match with the result of running one or more functions against the match
$test = "aabbccddyzeeffgghhiijjkkllmmnnoo";
# Replace all occurrences of ff with FF
$test =~ s/ff/FF/g;
print $test , "\n";
# Replace the first four characters of the string with XX
$test =~ s/^.{4}/XX/g;
print $test , "\n";
# Replace any lower case characters with their upper case equivalents
$test =~ s/([a-z])/uc($1)/eg;
print $test , "\n";

Output:
aabbccddyzeeFFgghhiijjkkllmmnnoo
XXccddyzeeFFgghhiijjkkllmmnnoo
XXCCDDYZEEFFGGHHIIJJKKLLMMNNOO
Regular Expressions Part 3: Advanced Concepts © Garth Gilmour 2008
Regular Expressions - Advanced
So far we have covered the 80% of regular expressions that developers use 99% of the time
  But there is still a lot of extra functionality available
We mention some of the advanced features here
  You are encouraged to play with them after the course
These features may be useful when you are:
  Trying to parse non-ASCII text
  Trying to shorten an overly long expression
  Extracting information from poorly organised data
  Searching for the most efficient regex possible
Non Greedy Matching
We have seen that matching is greedy
  E.g. ‘A.+B’ grabs as many characters as possible between an ‘A’ and a ‘B’
  So against ‘nnnAnnnBnnnBnnn’ the first match is ‘AnnnBnnnB’
  Rather than ‘AnnnB’ which may well be what you wanted
  Greedy matching makes it hard to define ‘end tokens’
It is possible to have a non-greedy match
  ‘.+?’ still means grab one or more characters
  But it takes as few as possible to create a successful match
  ‘.*?’ does the same thing for zero or more characters
‘??’ is the non-greedy version of ‘?’
  Given the choice between grabbing zero or one characters it prefers to grab zero, as long as the match will still succeed
© Garth Gilmour 2008 abcDEfghIJklm [a-zA-Z]+[A-Z]{2} abcDEfghIJ Matches: abcDE fghIJ [a-zA-Z]+?[A-Z]{2} [a-zA-Z]*[A-Z]{2} abcDEfghIJklm Matches: ab cDEfg hIJkl [a-zA-Z]*?[A-Z]{2}
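The greedy and non-greedy behaviour described for ‘A.+B’ can be checked with a couple of lines of Perl. This is a minimal sketch using the same search string as the slide.

```perl
use strict;
use warnings;

my $text = "nnnAnnnBnnnBnnn";

my ($greedy) = $text =~ /(A.+B)/;    # runs on to the last B
my ($lazy)   = $text =~ /(A.+?B)/;   # stops at the first B

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```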
Parentheses Which Do Not Capture
We have seen that parentheses serve two functions:
  As in coding they group separate constructs
  They are used to define matches-within-matches
Sometimes you only want the first function
  You need to separate out part of the expression but don’t want to capture the result within a submatch
  E.g. ‘(\.com|\.ie)’ when matching against URLs
The ‘(?: … )’ syntax provides non-capturing brackets
  E.g. ‘(?:\.com|\.ie)’ lets you specify alternatives without capturing
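As a sketch of the difference, the two patterns below match the same (invented) host name, but only the first stores the suffix as a submatch.

```perl
use strict;
use warnings;

my $url = "www.example.ie";

# Capturing group: the suffix is stored as a second submatch.
my @with    = $url =~ /^www\.([a-z]+)(\.com|\.ie)$/;
# Non-capturing group: same match, but only the name is stored.
my @without = $url =~ /^www\.([a-z]+)(?:\.com|\.ie)$/;

print scalar(@with), " submatches: @with\n";
print scalar(@without), " submatch: @without\n";
```

Leaving out unwanted captures keeps the numbering of '$1', '$2' and so on stable when the pattern is later extended.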
Modifying Just Part of the Pattern
We know modifiers can be placed after ‘m//’ or ‘s///’
  E.g. the ‘i’ modifier makes the match case insensitive
But the modifier applies to the whole pattern
  What if only part of the regex should be case insensitive?
  What if ‘.’ should match newlines only in one place?
The ‘(?imsx: … )’ syntax lets you do this
  E.g. if you say ‘[A-Z]{3}(?i:[A-Z]{2})[A-Z]{3}’ this matches ‘ABCdeFGH’ or ‘ABCDEFGH’ but not ‘abcdefgh’
You can use it to turn off modifiers as well
  E.g. ‘m/… (?-i: …) …/i’ makes part of the pattern case sensitive
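The pattern from this slide can be run against its three sample strings. In this sketch the anchors '^' and '$' are an addition so that the whole string must match.

```perl
use strict;
use warnings;

# Only the middle two letters are matched case-insensitively.
my $re = qr/^[A-Z]{3}(?i:[A-Z]{2})[A-Z]{3}$/;

for my $candidate (qw(ABCdeFGH ABCDEFGH abcdefgh)) {
    printf "%-8s %s\n", $candidate,
        ($candidate =~ $re ? "matches" : "does not match");
}
```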
Matching Without Capturing It is possible to check what is ahead or behind you Without capturing these characters as part of the match The proper name for these is ‘lookaround’ assertions This is useful when what comes after you is both a condition of your match and part of the next match © Garth Gilmour 2008 Assertion Explanation (?= …) Looks ahead to see if a pattern occurs without capturing (?! …) Looks ahead to see if a pattern  does not  occur without capturing (?<= …) Looks behind to see if a pattern occurs without capturing (?<! …) Looks behind to see if a pattern  does not  occur without capturing
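The following sketch shows a lookbehind, a lookahead, and the case mentioned above where the text ahead is both a condition of one match and part of the next. The sample strings are invented for the example.

```perl
use strict;
use warnings;

my $text = "cost \$40, weight 12kg, refund \$15";

# Lookbehind: digits preceded by a dollar sign (the '$' is not in the match).
my @dollars = $text =~ /(?<=\$)(\d+)/g;

# Lookahead: digits followed by 'kg' (the 'kg' is not in the match).
my @kilos = $text =~ /(\d+)(?=kg)/g;

# A lookahead consumes nothing, so matches may overlap: every run of
# three uppercase letters is found, not just 'ABC' and 'DEF'.
my @overlap = "ABCDEF" =~ /(?=([A-Z]{3}))/g;

print "dollars: @dollars\n";
print "kilos:   @kilos\n";
print "overlap: @overlap\n";
```

The last pattern is a way around Key Regex Concept No 1: because the assertion itself matches zero characters, the next search restarts one character further on.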
Building Regex Parts Dynamically
Consider the following problem
  You need a regex to find company email addresses
  But the company name will be specified at runtime
You could build the whole expression at runtime
  Through normal string concatenation
  But this code will be tedious and error prone
Perl lets you embed dynamic content in expressions
  A variable placed in the pattern is interpolated before matching begins
  When you use ‘(??{ … })’ within your regex the code placed at ‘…’ is run by the interpreter and its result forms part of the pattern
  ‘(?{ … })’ also runs code during matching but does not contribute to the pattern
  Variables localized in the block keep their values until the current search for a match either succeeds or is rolled back
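For the email problem described here, plain variable interpolation is usually enough. The sketch below assumes a hypothetical helper called 'email_regex_for'; it uses 'qr//' to build the pattern once and '\Q…\E' so that any meta-characters in the company name are treated literally.

```perl
use strict;
use warnings;

# Hypothetical helper: build an email matcher for a company name
# supplied at runtime.
sub email_regex_for {
    my ($company) = @_;
    return qr/^[a-z]+(?:\.[a-z]+)?\@\Q$company\E(?:\.com|\.ie|\.co\.uk)$/;
}

my $re = email_regex_for("megacorp");
for my $address ('jane.doe@megacorp.ie', 'jane@othercorp.com') {
    print "$address: ", ($address =~ $re ? "ok" : "rejected"), "\n";
}
```

Without '\Q…\E' a company name such as 'mega.corp' would let the dot match any character, which is exactly the escaping pitfall described earlier.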
Working With Unicode So far in our discussion we have assumed ASCII E.g. ‘[A-Z]+’ matches one or more uppercase letters As long as your input text is written in ASCII in a culture that agrees with the common definition of English capital letters However this does not apply with internationalization Where we will be working with unfamiliar character sets and conventions (capitals, accents, reading direction etc…) Perl supports internationalization via Unicode The character set that aims to embrace and supplant the characters sets already in-use across the world Unicode itself is unavoidably complex and contradictory It aims to unify character sets with as little modification as possible © Garth Gilmour 2008
Working With Unicode Unicode defines a character in terms of: A unique number and textual name A representative glyph (which does not preclude others) Annotations - which add extra information informally Properties - which formally group characters together based on a shared criteria (e.g. mathematical symbols) Don’t confuse character sets with character encodings ‘ UTF-8’ favours western characters but Unicode itself does not  There are 88 properties a character may have As of Unicode version 4.1.0 These can be tested for in regex’s  © Garth Gilmour 2008
Sample Unicode Properties

Unicode Property   Description
AHex               True for ASCII characters used in hexadecimal numbers
Alpha              True if a character is alphabetic
ea                 The East Asian width of a character (e.g. full, half or narrow)
IDC                Indicates if a character can be used after the first character of an identifier
Math               True if the character is used in describing mathematical expressions
Lower              Indicates if a character is a lowercase letter
STerm              Indicates if a character is used to terminate a sentence
Term               Indicates if a character is punctuation that terminates a unit
WSpace             Indicates if a character should be treated as whitespace during parsing
Unicode Properties in Expressions
Unicode properties can be used as character classes
  ‘\p{Math}’ matches math symbols and ‘\P{Math}’ is the reverse
  If the name is a single character then the braces can be omitted
Shortcuts for character classes support Unicode
  So in modern interpreters the ‘\d’ shortcut is the same as ‘\p{IsDigit}’ and ‘\D’ is the same as ‘\P{IsDigit}’
Perl has aliases for properties and defines its own
  ‘\p{IsDigit}’ is the more verbose Perl terminology for ‘\p{Nd}’
  ‘\p{IsXDigit}’ is a Perl property name equivalent to ‘[0-9a-fA-F]’
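A brief sketch of property classes in use follows. The sample text is the Greek word for Athens written with '\x{…}' escapes so that the script does not depend on the encoding of the source file; the general category names 'Lu', 'Ll' and 'Nd' are standard Unicode ones.

```perl
use strict;
use warnings;

binmode(STDOUT, ":encoding(UTF-8)");   # allow wide characters on output

# Capital alpha, then theta, eta with tonos, nu, alpha; then a year.
my $text = "\x{391}\x{3B8}\x{3AE}\x{3BD}\x{3B1} 2008";

# The ASCII classes find no letters at all in the Greek word.
my $ascii_hits = () = $text =~ /[A-Za-z]/g;

# \p{Lu} is an uppercase letter and \p{Ll} a lowercase letter in any script.
my ($word) = $text =~ /^(\p{Lu}\p{Ll}+)/;

# \p{Nd} is a decimal digit.
my @digits = $text =~ /\p{Nd}/g;

print "ASCII letters found: $ascii_hits\n";
print "Word: $word (", length($word), " characters)\n";
print "Digits found: ", scalar(@digits), "\n";
```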
Perl Subroutines Creating and Calling Functions © Garth Gilmour 2008
Introducing Subroutines Subroutines do not have a C/C++ style declaration When you create a subroutine you do not declare its return type or the parameters it will take Instead the syntax for a subroutine is ‘sub NAME { … }’ Any subroutine can take any number of parameters These are automatically placed in the special array ‘@_’ Not to be confused with ‘$_’ which holds the current item in a loop The first parameter is ‘$_[0]’ and the last is ‘$_[$#_]’ The return value can be specified in two ways By default it is the result of the last expression evaluated You can explicitly provide a value via the ‘return’ keyword  © Garth Gilmour 2008
© Garth Gilmour 2008 sub func { my($param1, $param2, $param3, $param4, $param5) = @_; print &quot;func called with:\n&quot;; print &quot;\t$param1\n&quot;; print &quot;\t$param2\n&quot;; print &quot;\t$param3\n&quot;; print &quot;\t$param4\n&quot;; print &quot;\t$param5\n&quot;; } func(&quot;abc&quot;,123,&quot;def&quot;,456,&quot;ghi&quot;); func called with: abc 123 def 456 ghi
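The example above unpacks a fixed number of parameters and returns nothing of interest. As a sketch of the other points on the slide, the two hypothetical subroutines below accept any number of parameters through '@_' and show both ways of specifying a return value.

```perl
use strict;
use warnings;

# Implicit return: the value of the last expression evaluated.
sub add_all {
    my $total = 0;
    $total += $_ for @_;     # @_ holds however many arguments were passed
    $total;
}

# Explicit return: leave early with the 'return' keyword.
sub first_negative {
    for my $n (@_) {
        return $n if $n < 0;
    }
    return;                  # nothing found
}

print add_all(1, 2, 3, 4), "\n";
print first_negative(5, -2, -7), "\n";
```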
Variable Declarations and Scoping As you write functions you will make a startling discovery Variables created inside subroutines remain alive and usable for the rest of the life of the script (unlike in C/C++/Java etc…) This is because all variables are global by default They are represented by an entry in the symbol table Perl lets you control the scope and lifetime of variables Via the ‘my’ , ‘our’ and ‘local’ functions Each of these has a subtly different effect Use these in all but the shortest scripts… © Garth Gilmour 2008
Variable Declarations and Scoping The ‘my’ function creates a private variable Its scope and lifetime are the block it is declared in The ‘local’ function is unusual It defines a new value for a variable, which is held for the duration of the current scope Once control leaves the current block the value is reset Any sub-methods which get called see the new value The ‘our’ function simply (re)declares a global variable  It is the safe way of referencing a global variable within a subroutine or declaring a global variable under ‘use strict’ © Garth Gilmour 2008
$var1 = "ABC";
$var2 = "DEF";
$var3 = "GHI";
print "At start $var1, $var2 and $var3\n\n";
func1();
print "At end $var1, $var2 and $var3\n\n";
sub func1 {
    my $var1 = "JKL";
    local $var2 = "MNO";
    our $var3;
    print "In func1 $var1, $var2 and $var3\n\n";
    func2();
}
sub func2 {
    print "In func2 $var1, $var2 and $var3\n\n";
}

Output:
At start ABC, DEF and GHI
In func1 JKL, MNO and GHI
In func2 ABC, MNO and GHI
At end ABC, DEF and GHI
Using Named Parameters Parameters in Perl are loosely organised In large applications you want to be more specific about what value goes with what parameter There is a simple idiom for naming parameters: Pass parameters into the subroutine in pairs E.g. ‘connect(“ip” => “12.24.5.6”, “port” => 80, “timeout” => 30)’ Inside the subroutine load the parameters into a hash The first parameter becomes the key for the second and so on Use parameters by taking values from the hash So rather than saying ‘$_[1]’ we use ‘$params{“ip”}’ This allows parameters to be passed in any order As long as the name/value convention is preserved © Garth Gilmour 2008
© Garth Gilmour 2008 sub func { my %params = @_; print &quot;\tParameter fred has value $params{'fred'}\n&quot;; print &quot;\tParameter wilma has value $params{'wilma'}\n&quot;; print &quot;\tParameter barney has value $params{'barney'}\n&quot;; } print &quot;---------- First Call ----------\n&quot;; @args1 = (&quot;fred&quot;,20,&quot;wilma&quot;,30,&quot;barney&quot;,40); func(@args1); print &quot;---------- Second Call ----------\n&quot;; @args2 = (fred => 50, wilma => 60, barney => 70); func(@args2); print &quot;---------- Third Call ----------\n&quot;; func(fred => 80, wilma => 90, barney => 100); ---------- First Call ---------- Parameter fred has value 20 Parameter wilma has value 30 Parameter barney has value 40 ---------- Second Call ---------- Parameter fred has value 50 Parameter wilma has value 60 Parameter barney has value 70 ---------- Third Call ---------- Parameter fred has value 80 Parameter wilma has value 90 Parameter barney has value 100
Subroutines and Recursion
The ‘&’ sigil is normally left out when calling subroutines
  Without parentheses the symbol is initially treated as a bareword
  Parentheses around the parameters are also optional if the interpreter has processed the declaration
  Whether or not you use them is a matter of style
If you use the sigil and omit the parameter list then the parameter array of the current function is passed
  This creates a compact syntax for recursive functions
Subroutines and Recursion
sub recursion1 {
    until($_[0] == 0) {
        print "$_[0] ";
        $_[0]--;
        &recursion1;
    }
}
sub recursion2 {
    if(@_) {
        print(shift, " ");
        &recursion2;
    }
}
$val1 = 10;
recursion1($val1);
print "\n";
@val2 = qw(abc def ghi jkl);
recursion2(@val2);

Output:
10 9 8 7 6 5 4 3 2 1
abc def ghi jkl
Anonymous Subroutines
Subroutines can be declared without names
  Using the syntax ‘sub { … }’
  What is returned is the address of the subroutine
  This can be captured in a reference (see later)
This enables some ‘meta-programming’ techniques
  You can write functions which build and return functions
  You can write functions which take blocks of code as parameters
These techniques have their own terminology
  An anonymous subroutine that remembers the variables in scope where it was created is a closure; closures are often passed as parameters
  One function building a simplified version of another is known as currying (or partial application)
sub doTimes {
    for(1..$_[0]) {
        &{$_[1]}();
    }
}
sub withMatches {
    for($_[0] =~ m/$_[1]/g) {
        &{$_[2]}($_);
    }
}
$ref = sub { print "Closures are cool!\n"; };
doTimes(5, $ref);
doTimes(3, sub { print "Closures are very cool!\n"; });
withMatches("ab cd ef gh", "[a-z]{2}", sub { print "$_[0]\n"; });

Output:
Closures are cool!
Closures are cool!
Closures are cool!
Closures are cool!
Closures are cool!
Closures are very cool!
Closures are very cool!
Closures are very cool!
ab
cd
ef
gh
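The example above passes blocks of code into functions. The other technique mentioned, functions which build and return functions, might be sketched as follows. Both 'make_counter' and 'make_adder' are hypothetical names; each returned subroutine keeps its own private copy of the variable it captured.

```perl
use strict;
use warnings;

# Returns an anonymous subroutine that remembers its own
# private $count between calls.
sub make_counter {
    my ($start) = @_;
    my $count = $start;
    return sub { return $count++; };
}

# Builds a specialised one-argument function from a general
# two-argument addition.
sub make_adder {
    my ($n) = @_;
    return sub { return $_[0] + $n; };
}

my $counter = make_counter(5);
my $add_ten = make_adder(10);

my $first  = $counter->();
my $second = $counter->();
print "counter gave $first then $second\n";
print "add_ten(32) gives ", $add_ten->(32), "\n";
```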
Error Handling Managing Error Conditions © Garth Gilmour 2008
Error Handling in Perl
Traditional functions report errors via a return code
  Modern functions raise exceptions
There are several functions for raising exceptions
  The ‘die’ function causes a value to be thrown as an exception
    The value is typically an error message string
    But could be a reference to anything you want
  More versatile functions are provided by the ‘Carp’ module
    They report errors from the perspective of the user (the caller) of your code
To test for exceptions use ‘eval’ blocks
  Errors generated from code inside ‘eval { … }’ are trapped
  You can test and obtain them via the ‘$@’ variable
Example one: throwing a string

eval { op1(); };
if($@) {
    print "Code threw error: ", $@;
}
sub op1 { op2(); }
sub op2 { op3(); }
sub op3 { die "BOOM!"; }

Example two: throwing a reference to a hash

eval { op1(); };
if($@) {
    print "Error of severity ", $@->{'severity'};
    print " thrown with message ", $@->{'msg'};
}
sub op1 { op2(); }
sub op2 { op3(); }
sub op3 {
    my %error = (msg => 'BOOM!', severity => 'fatal');
    die \%error;
}
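The ‘Carp’ module mentioned earlier is not shown in the examples above. The sketch below uses its 'croak' function from a hypothetical package called 'Maths'; because the caller lives in a different package, the message names the line that called the function rather than the line inside it.

```perl
use strict;
use warnings;

package Maths;
use Carp qw(croak);

# Hypothetical function that validates its argument.
sub square_root {
    my ($n) = @_;
    croak "square_root needs a non-negative number" if $n < 0;
    return sqrt($n);
}

package main;

my $result = eval { Maths::square_root(-4) };
if ($@) {
    # The message ends with the file and line of the call above.
    print "Caught: $@";
}
```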
References in Perl Using Memory Addresses © Garth Gilmour 2008
References in Perl
- References are an advanced feature of Perl
  - They are similar to pointers in ‘C’ and ‘C++’
- A scalar can hold the address of another variable
  - This could be another scalar, an array, a hash or a function
  - The address of a variable is taken via the ‘\’ operator
- Sigils are combined when working with references
  - So if ‘$ref’ is a reference to an array then the first element could be accessed with ‘$$ref[0]’, ‘${$ref}[0]’ or ‘$ref->[0]’
  - Each syntax is appropriate in different circumstances
$var1 = 124;
$ref1 = \$var1;
print $ref1, "\n";
print "Reference ref1 refers to value $$ref1\n";
print "Reference ref1 refers to value ${$ref1}\n";

Output:
SCALAR(0x226d98)
Reference ref1 refers to value 124
Reference ref1 refers to value 124
@var2 = ("abc","def","ghi","jkl");
$ref2 = \@var2;
print $ref2, "\n";
print "Reference ref2 refers to array with contents: ";
foreach $val (@$ref2) { print "$val "; }
print "\nReference ref2 refers to array with contents: ";
foreach $val (@{$ref2}) { print "$val "; }
print "\nFirst three elements in array pointed to by ref2 are: ";
@slice = @{$ref2}[0..2];
foreach $val (@slice) { print "$val "; }
print "\nFirst item is $$ref2[0]";
print "\nFirst item is ${$ref2}[0]";
print "\nFirst item is $ref2->[0]";

Output:
ARRAY(0x226da4)
Reference ref2 refers to array with contents: abc def ghi jkl
Reference ref2 refers to array with contents: abc def ghi jkl
First three elements in array pointed to by ref2 are: abc def ghi
First item is abc
First item is abc
First item is abc
%var3 = ("k1","xxx","k2","yyy","k3","zzz");
$ref3 = \%var3;
print "Reference ref3 refers to hash with contents:\n";
foreach $key (keys %$ref3) {
    print "\t\t$key indexes $$ref3{$key}\n";
}
print "Reference ref3 refers to hash with contents:\n";
foreach $key (keys %{$ref3}) {
    print "\t\t$key indexes $ref3->{$key}\n";
}
print "Key k1 indexes $$ref3{'k1'} \n";
print "Key k1 indexes ${$ref3}{'k1'} \n";
print "Key k1 indexes $ref3->{'k1'} \n";

Output (the order of the keys is not guaranteed):
Reference ref3 refers to hash with contents:
        k2 indexes yyy
        k1 indexes xxx
        k3 indexes zzz
Reference ref3 refers to hash with contents:
        k2 indexes yyy
        k1 indexes xxx
        k3 indexes zzz
Key k1 indexes xxx
Key k1 indexes xxx
Key k1 indexes xxx
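One place where the pointer-like behaviour of references matters is when passing arrays to a subroutine: passing references keeps the arrays separate instead of flattening them into one argument list. The following is a minimal sketch; the function longer_of and the sample data are hypothetical.

```perl
use strict;
use warnings;

# longer_of (hypothetical name) takes two array references and
# returns whichever reference points at the longer array.
sub longer_of {
    my ($first, $second) = @_;
    return @$first >= @$second ? $first : $second;
}

my @short  = (1, 2);
my @long   = ("a", "b", "c");
my $winner = longer_of(\@short, \@long);

print ref($winner), "\n";        # ARRAY
print scalar(@$winner), "\n";    # 3

# The three element-access syntaxes give the same results
print "$winner->[0] ${$winner}[1] $$winner[2]\n";   # a b c

# A reference points at the original data, so a change made through
# it is visible in the original variable.
push @$winner, "d";
print scalar(@long), "\n";       # 4
```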
References and Anonymous Data
- All complete programming languages need to have the ability to allocate memory on demand
  - As opposed to pre-declaring it through standard variables
- Consider processing records from a file
  - You need to allocate memory on the fly for each record you find
- Memory is allocated via anonymous data structures
  - There is no special keyword but rather a different syntax for creating anonymous arrays, hashes and subroutines
  - It isn't (directly) possible to create anonymous scalars
$ref1 = ["ab","cd","ef","gh"];
$ref2 = { k1 => 123, k2 => 456, k3 => 789 };
$ref3 = sub { return $_[0] + $_[1]; };
print $ref1, "\n";
print $ref2, "\n";
print $ref3, "\n";
print $ref1->[0], "\n";
print $ref2->{'k1'}, "\n";
print $ref3->(12,5), "\n";

Output:
ARRAY(0x225f88)
HASH(0x226d80)
CODE(0x18303f0)
ab
123
17
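The record-processing case raised in this section can be sketched as below. The input lines and field names are hypothetical, and are held in an array so that the sketch runs without a data file; a real script would read them from a file handle.

```perl
use strict;
use warnings;

# Hypothetical input; in a real script these lines would come from a file
my @lines = ("dave:administrator", "jane:power-user", "fred:end-user");

my @records;
foreach my $line (@lines) {
    my ($id, $role) = split /:/, $line;
    # Each pass through the loop allocates a new anonymous hash
    push @records, { id => $id, role => $role };
}

print scalar(@records), " records\n";               # 3 records
print "$records[1]{id} is a $records[1]{role}\n";   # jane is a power-user
```

No record count has to be known in advance: the array grows by one anonymous hash per line.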
References and Data Structures
- Complex data structures are built using arrays and hashes in combination
  - The syntax ‘$ref = ["abc", "def", "ghi"]’ creates an anonymous array and stores its address in ‘$ref’
  - The syntax ‘$ref = { "k1" => "v1", "k2" => "v2" }’ creates an anonymous hash and stores its address in ‘$ref’
- So an exam marking script could be made up of:
  - An array of references to anonymous hashes
  - Where each hash holds a candidate’s details
  - Including a reference to an anonymous array of answers
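The exam marking structure described above might look like the following sketch. The candidates, their answers and the answer key are all hypothetical values invented for illustration.

```perl
use strict;
use warnings;

# An array of anonymous hashes; each hash holds a candidate's details,
# including a reference to an anonymous array of answers.
# All of the data here is hypothetical.
my @candidates = (
    { name => "dave", answers => ["a", "c", "b", "d"] },
    { name => "jane", answers => ["a", "b", "b", "d"] },
);
my @correct = ("a", "b", "b", "c");

foreach my $candidate (@candidates) {
    my $answers = $candidate->{answers};
    # grep in scalar context counts the positions where the answers match
    my $score = grep { $answers->[$_] eq $correct[$_] } 0 .. $#correct;
    $candidate->{score} = $score;
    print "$candidate->{name} scored $score out of ", scalar(@correct), "\n";
}
```

Storing the score back into each hash shows that the anonymous hashes can be extended after they are created.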
# Each entry pairs a decimal value with its roman numeral, largest first
@numerals = ( [100,"C"], [90,"XC"], [50,"L"], [40,"XL"], [10,"X"],
              [9,"IX"], [5,"V"], [4,"IV"], [1,"I"] );
print "Enter the number to convert to a roman numeral...\n";
$number = <STDIN>;
chomp($number);
foreach (@numerals) {
    my $decimal = $_->[0];
    my $string  = $_->[1];
    my $times   = int($number / $decimal);
    if ($times > 0) {
        for (1..$times) { print $string; }
        $number = $number % $decimal;
    }
}
print "\n";
$ref1 = [ ["abc","def","ghi"],
          ["jkl","mno","pqr"],
          ["stu","vwx","yza"] ];
print "Contents of 2d array are:\n";
foreach (@{$ref1}) {
    print "\t";
    foreach (@{$_}) {
        print "$_ ";
    }
    print "\n";
}

Output:
Contents of 2d array are:
        abc def ghi
        jkl mno pqr
        stu vwx yza
$ref = { k1 => { k4 => "ab", k5 => "cd" },
         k2 => { k6 => "ef", k7 => "gh" },
         k3 => { k8 => "ij", k9 => "kl" } };
print $ref->{'k1'}->{'k4'}, "\n";
print $ref->{'k1'}->{'k5'}, "\n";
print $ref->{'k2'}->{'k6'}, "\n";
print $ref->{'k2'}->{'k7'}, "\n";
print $ref->{'k3'}->{'k8'}, "\n";
print $ref->{'k3'}->{'k9'}, "\n";

Output:
ab
cd
ef
gh
ij
kl
Modules and Packages: Creating Reusable Code
Modules and Code Reuse in Perl
- Modules are Perl libraries
  - You can create your own or download them from CPAN
  - They are normally found in the ‘lib’ folder of your distribution
- There are several types of module:
  - Traditional and object-oriented modules are for code reuse
    - They let you avoid re-inventing the wheel in and across projects
  - Pragmatic modules extend the language
    - When loaded they alter symbol tables and interact with the interpreter, thereby adding to Perl’s functionality
Creating Perl Modules
- Modules are placed in a separate file
  - By convention this is given a ‘.pm’ extension
  - Pragmatic modules have lowercase names
  - Other module names should contain capitals
- The module begins with a package declaration
  - This creates a new namespace for symbols
  - Within the interpreter this is represented by a new symbol table
- There are two ways of loading a module
  - The ‘use’ declaration loads a module at compile time
  - The ‘require’ declaration loads a module at runtime
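The difference between the two loading mechanisms can be sketched with modules from the standard distribution. List::Util and POSIX are used here only because they ship with Perl; the variable names are hypothetical.

```perl
use strict;
use warnings;

# 'use' is processed while the script is being compiled, so the
# imported name is available before any statement runs.
use List::Util qw(max);

my $largest = max(3, 9, 4);
print "Largest: $largest\n";      # Largest: 9

# 'require' is an ordinary statement executed at runtime, so it can sit
# inside a condition. Nothing is imported, so the fully qualified
# function name is used.
my $need_rounding = 1;
my $rounded;
if ($need_rounding) {
    require POSIX;
    $rounded = POSIX::floor(7.8);
    print "Rounded down: $rounded\n";   # Rounded down: 7
}
```

Loading with ‘require’ inside a condition means the cost of loading the module is only paid when that branch actually runs.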
Creating Perl Modules
- Perl does not strictly enforce barriers between modules
  - Symbols from a module can always be used
    - By prefixing the symbol name with the package name
  - You should only use the symbols a module wants you to see
- There is a standard mechanism for exporting symbols
  - The module needs to require the ‘Exporter’ module
    - And place the name ‘Exporter’ into an array called ‘@ISA’
    - This allows the module to place entries in other symbol tables
  - Symbols placed in an array called ‘@EXPORT’ are automatically added to the symbol table of the script containing the ‘use’ declaration
  - Symbols placed in an array called ‘@EXPORT_OK’ will be added to the importing symbol table if they are listed after the module name in the ‘use’ declaration
# Maths.pm
package Maths;
require Exporter;
our @ISA = ("Exporter");
our @EXPORT = qw(add multiply subtract);
sub add { return $_[0] + $_[1]; }
sub divide { return $_[0] / $_[1]; }
sub multiply { return $_[0] * $_[1]; }
sub subtract { return $_[0] - $_[1]; }
1;

# Script using the module
use Maths;
print "Calculations using our maths module:";
print "\n\t40 + 30 is: ", add(40,30);
print "\n\t60 - 20 is: ", subtract(60,20);
print "\n\t50 * 10 is: ", multiply(50,10);
# divide is not exported so it only works
# if we qualify the namespace
print "\n\t40 / 10 is: ", Maths::divide(40,10);
# Maths.pm
package Maths;
require Exporter;
our @ISA = ("Exporter");
our @EXPORT_OK = qw(add multiply subtract);
sub add { return $_[0] + $_[1]; }
sub divide { return $_[0] / $_[1]; }
sub multiply { return $_[0] * $_[1]; }
sub subtract { return $_[0] - $_[1]; }
1;

# Script using the module; the symbols must be requested by name
use Maths qw(add multiply subtract);
print "Calculations using our maths module:";
print "\n\t40 + 30 is: ", add(40,30);
print "\n\t60 - 20 is: ", subtract(60,20);
print "\n\t50 * 10 is: ", multiply(50,10);
# divide is not exported so it only works
# if we qualify the namespace
print "\n\t40 / 10 is: ", Maths::divide(40,10);
Extra Syntax for Modules
- Modules can prevent symbols from being exported
  - If you place a symbol name in ‘@EXPORT_FAIL’ then Exporter will call the module’s ‘export_fail’ method, which can throw an error
- Simple versioning is supported
  - A module can declare a ‘$VERSION’ variable
  - Which can then be mentioned in the ‘use’ declaration
  - E.g. ‘use Fred 2.7;’ means version 2.7 or later is required
- Code placed within the module will be executed
  - For the module to load the last statement run must be true
  - Most modules end with ‘1;’ to ensure this is the case
  - A ‘BEGIN { … }’ block is run as soon as it has been compiled, while the module is being loaded
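The version check can be sketched in a single file by calling the built-in ‘VERSION’ method, which performs the comparison that a ‘use’ declaration with a version number triggers. The package Fred is inlined here as an assumption, so that no separate ‘Fred.pm’ file is needed.

```perl
use strict;
use warnings;

# Hypothetical module, written inline so the sketch is one file.
# Normally this would live in Fred.pm and end with '1;'.
{
    package Fred;
    our $VERSION = '2.7';
    sub hello { return "Hello from Fred $VERSION"; }
}

# Asking for an older version than the one declared is accepted
my $ok_old = eval { Fred->VERSION(2.5); 1; };
print "Asking for 2.5: ", ($ok_old ? "accepted" : "rejected"), "\n";

# Asking for a newer version than the module declares raises an error
my $ok_new = eval { Fred->VERSION(3.0); 1; };
print "Asking for 3.0: ", ($ok_new ? "accepted" : "rejected"), "\n";

print Fred::hello(), "\n";
```

The first request is accepted and the second is rejected, which shows that the number is a minimum requirement and not an exact match.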
Objects in Perl: Support for Object Oriented Programming Concepts
Object Oriented Perl
- OO support in Perl is minimal at best
  - It is not a good language for learning Object Oriented coding
  - Perl provides only rudimentary support for classes and objects
  - Leaving you to do most of the hard work yourself
- The ‘bless’ function is the key to OO in Perl
  - Once you properly understand what it does the rest of OO in Perl becomes relatively straightforward
- It helps to approach Perl OO indirectly
  - We will consider class declarations in Python and Ruby first
  - This is a good warm-up for understanding the Perl syntax
Core Principles of OO Languages
- All popular OO languages use the same concepts:
  - Class declarations are the templates for objects
  - A class declaration is made up of members
    - Members holding data are known as fields
    - Members holding code are known as methods
  - Special methods are provided with their own syntax
    - The most important of these is the constructor method, which is called automatically when an object is created
  - Every object has a built in reference to itself
    - Similar to the ‘127.0.0.1’ IP address in networking
- It helps to think of members as slots on the object
  - Only slots on the outside can be used by clients
[Diagram: client code using two Account objects, account1 and account2, each exposing withdraw { … } and display { … } slots]
# Class declaration
class Account:
    # Constructor
    def __init__(self, id, balance):
        self.id = id
        self.balance = balance
    def withdraw(self, amount):
        self.balance -= amount
    def display(self):
        print("Account with id", self.id, "and balance", self.balance)

# Creating objects
account1 = Account("AB12", 30000)
account2 = Account("CD34", 45000)
print("----- Before Withdrawal -----")
account1.display()
account2.display()
account1.withdraw(250)
account2.withdraw(500)
print("----- After Withdrawal -----")
account1.display()
account2.display()

Output:
----- Before Withdrawal -----
Account with id AB12 and balance 30000
Account with id CD34 and balance 45000
----- After Withdrawal -----
Account with id AB12 and balance 29750
Account with id CD34 and balance 44500
# Class declaration
class Account
  # Constructor
  def initialize(id, balance)
    @id = id
    @balance = balance
  end
  def withdraw(amount)
    @balance -= amount
  end
  def display()
    puts "Account with id #{@id} and balance #{@balance}"
  end
end

# Creating objects
account1 = Account.new("AB12",30000)
account2 = Account.new("CD34",45000)
puts "----- Before Withdrawal -----"
account1.display()
account2.display()
account1.withdraw(250)
account2.withdraw(500)
puts "----- After Withdrawal -----"
account1.display()
account2.display()

Output:
----- Before Withdrawal -----
Account with id AB12 and balance 30000
Account with id CD34 and balance 45000
----- After Withdrawal -----
Account with id AB12 and balance 29750
Account with id CD34 and balance 44500
Perl Syntax for Object Orientation
- Perl does not have separate class declarations
  - Instead a class is just a special type of module
- Fields are saved in anonymous hashes
  - Or in whatever anonymous data structure you want
- An arbitrary method is used as a constructor
  - It creates the hash and blesses it into the package
- Blessing associates the hash with the package
  - Methods of the package can be called via the hash reference
  - In the method declaration the hash is the first parameter
# Packages double as classes
package Account;

# Any method can be a constructor
sub new {
    my $packageName = shift;
    # Anonymous hashes hold fields
    my $data = {};
    $data->{'id'} = shift;
    $data->{'balance'} = shift;
    bless $data, $packageName;
}
sub withdraw {
    my $data = shift;
    $data->{'balance'} -= $_[0];
}
sub display {
    my $data = shift;
    print "Account with id $data->{'id'} and balance $data->{'balance'}\n";
}
$account1 = Account->new("AB12",30000);
$account2 = Account->new("CD34",45000);
print "----- Before Withdrawal -----\n";
$account1->display();
$account2->display();
$account1->withdraw(250);
$account2->withdraw(500);
print "----- After Withdrawal -----\n";
$account1->display();
$account2->display();

Output:
----- Before Withdrawal -----
Account with id AB12 and balance 30000
Account with id CD34 and balance 45000
----- After Withdrawal -----
Account with id AB12 and balance 29750
Account with id CD34 and balance 44500

Creating objects: Account->new("AB",1000) equals Account::new('Account', "AB", 1000)
Because of blessing: $account->withdraw(250) equals Account::withdraw($account, 250)
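What ‘bless’ changes can be seen by inspecting a hash reference with the built-in ‘ref’ function before and after blessing. This is a minimal sketch: the accessor method balance is a hypothetical addition and is not part of the Account class shown on the slides.

```perl
use strict;
use warnings;

{
    package Account;
    sub new {
        my ($class, $id, $balance) = @_;
        my $data = { id => $id, balance => $balance };
        return bless $data, $class;
    }
    # Hypothetical accessor, added for this sketch only
    sub balance { my $self = shift; return $self->{balance}; }
}

my $plain = { id => "AB12", balance => 30000 };
print ref($plain), "\n";                  # HASH

my $account = Account->new("AB12", 30000);
print ref($account), "\n";                # Account

# Both calls run the same subroutine; the arrow form finds it
# through the package the hash was blessed into.
print $account->balance(), "\n";          # 30000
print Account::balance($account), "\n";   # 30000

# The object is still a hash underneath, so fields are not private
print $account->{id}, "\n";               # AB12
```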
Inheritance and Overriding in Perl
- These are the other two key principles of OO
  - Inheritance enables one class to be built on top of another
  - Overriding enables derived class methods to replace those inherited from the base class
- It helps to visualize objects as layered
  - For each class in the hierarchy there is a layer in the object
  - When a method is overridden the slot in a base layer is rewired into an implementation in the derived layer
- Again it helps to review the syntax of other languages
  - Once again the Perl syntax is minimal
[Diagram: client code using account1, an Account with withdraw { … } and display { … } slots, and account2, a SavingsAccount layer with its own withdraw { … } on top of an Account layer with withdraw { … } and display { … }]
class Account:
    def __init__(self, id, balance):
        self.id = id
        self.balance = balance
    def withdraw(self, amount):
        self.balance -= amount
    def display(self):
        print("Account with id", self.id, "and balance", self.balance)

class SavingsAccount(Account):
    def __init__(self, id, balance, fee):
        Account.__init__(self, id, balance)
        self.fee = fee
    # Overrides Account.withdraw to charge a fee on each withdrawal
    def withdraw(self, amount):
        self.balance -= (amount + self.fee)

account1 = Account("AB12", 30000)
account2 = SavingsAccount("CD34", 45000, 15)
print("----- Before Withdrawal -----")
account1.display()
account2.display()
account1.withdraw(250)
account2.withdraw(500)
print("----- After Withdrawal -----")
account1.display()
account2.display()

Output:
----- Before Withdrawal -----
Account with id AB12 and balance 30000
Account with id CD34 and balance 45000
----- After Withdrawal -----
Account with id AB12 and balance 29750
Account with id CD34 and balance 44485
class Account
  def initialize(id, balance)
    @id = id
    @balance = balance
  end
  def withdraw(amount)
    @balance -= amount
  end
  def display()
    puts "Account with id #{@id} and balance #{@balance}"
  end
end

class SavingsAccount < Account
  def initialize(id, balance, fee)
    super(id,balance)
    @fee = fee
  end
  # Overrides Account#withdraw to charge a fee on each withdrawal
  def withdraw(amount)
    @balance -= (amount + @fee)
  end
end

account1 = Account.new("AB12",30000)
account2 = SavingsAccount.new("CD34",45000,15)
puts "----- Before Withdrawal -----"
account1.display()
account2.display()
account1.withdraw(250)
account2.withdraw(500)
puts "----- After Withdrawal -----"
account1.display()
account2.display()

Output:
----- Before Withdrawal -----
Account with id AB12 and balance 30000
Account with id CD34 and balance 45000
----- After Withdrawal -----
Account with id AB12 and balance 29750
Account with id CD34 and balance 44485
Inheritance and Overriding in Perl
- Inheritance is supported via ‘@ISA’
  - One module requires another and places its name in ‘@ISA’
  - This means that if a method is not found in the derived module the interpreter will search the base one
  - Thereby creating the layered effect associated with inheritance
- The pragmatic module ‘base’ simplifies this
  - E.g. ‘use base 'Employee';’ in ‘Manager.pm’
- Overriding works in the same way
  - The search order means the derived version is found first
  - The ‘SUPER::’ prefix lets you access the base version
  - This is particularly useful when creating derived constructors
use Account;
use SavingsAccount;

$account1 = Account->new("AB12",30000);
$account2 = SavingsAccount->new("CD34",45000,15);
print "----- Before Withdrawal -----\n";
$account1->display();
$account2->display();
$account1->withdraw(250);
$account2->withdraw(500);
print "----- After Withdrawal -----\n";
$account1->display();
$account2->display();

Output:
----- Before Withdrawal -----
Account with id AB12 and balance 30000
Account with id CD34 and balance 45000
----- After Withdrawal -----
Account with id AB12 and balance 29750
Account with id CD34 and balance 44485
# Account.pm
# A class does not need Exporter: methods are found through the
# package an object is blessed into, not through imported names.
package Account;
sub new {
    my $packageName = shift;
    my $data = {};
    $data->{'id'} = shift;
    $data->{'balance'} = shift;
    bless $data, $packageName;
}
sub withdraw {
    my $data = shift;
    $data->{'balance'} -= $_[0];
}
sub display {
    my $data = shift;
    print "Account with id $data->{'id'} and balance $data->{'balance'}\n";
}
1;

# SavingsAccount.pm
package SavingsAccount;
use base 'Account';
sub new {
    # After the shift, @_ holds (id, balance, fee)
    my $data = shift->SUPER::new(@_);
    $data->{'fee'} = $_[2];
    return $data;
}
sub withdraw {
    my $data = shift;
    $data->{'balance'} -= ($_[0] + $data->{'fee'});
}
1;
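The ‘@ISA’ mechanism can also be sketched in a single file, which makes the lookup order easy to experiment with. The Employee and Manager classes below borrow their names from the ‘use base’ example mentioned earlier; their fields and methods are hypothetical.

```perl
use strict;
use warnings;

{
    package Employee;
    sub new {
        my ($class, $name, $salary) = @_;
        return bless { name => $name, salary => $salary }, $class;
    }
    sub pay      { my $self = shift; return $self->{salary}; }
    sub describe { my $self = shift; return "$self->{name} earns " . $self->pay(); }
}

{
    package Manager;
    our @ISA = ('Employee');    # what 'use base' sets up for you

    sub new {
        my ($class, $name, $salary, $bonus) = @_;
        # Reuse the base constructor, then add the extra field
        my $self = $class->SUPER::new($name, $salary);
        $self->{bonus} = $bonus;
        return $self;
    }
    # Overrides Employee::pay but reuses it through SUPER
    sub pay { my $self = shift; return $self->SUPER::pay() + $self->{bonus}; }
}

my $manager = Manager->new("jane", 50000, 5000);
print $manager->describe(), "\n";     # jane earns 55000
print $manager->isa('Employee') ? "is an Employee\n" : "not an Employee\n";
```

The describe method is only defined in Employee, yet it picks up the overridden pay method because the search always starts from the package the object was blessed into.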
Parsing XML: An Example of an OO Module
An Example of Parsing XML Files
- Text files are increasingly formatted as XML
  - This adds an extra layer of structure that makes it easier to preserve the semantics of the data
- There are many APIs for parsing and creating XML
  - Perl has modules that support all of these standards
  - Note XPath plays a similar role for XML to the one regular expressions play for plain text
    - In XPath V2 regular expressions can be used within an XPath
- The most basic is the SAX style of parsing
  - This is a low-level event driven API
  - The ‘XML::Parser’ module provides an event driven API of this kind
An Example of Parsing XML Files
- To parse XML create an instance of ‘XML::Parser’
  - As normal the constructor method is called ‘new’
- The constructor method takes named parameters
  - The ‘Handlers’ parameter should be an anonymous hash
  - The hash defines callback methods
  - E.g. the ‘Start’ and ‘End’ keys index methods to be called whenever the parser encounters opening or closing tags
- Parsing is triggered via a call to ‘parse’ or ‘parsefile’
  - The parser reads in the XML from the string or file
  - As different parts of the file are met callbacks are triggered
  - All your implementation is placed in the callback methods
<?xml version="1.0" encoding="UTF-8"?>
<myapp>
  <resources>
    <threads>7</threads>
    <server-ip>120.153.72.208</server-ip>
    <cache size="10"/>
  </resources>
  <accounts>
    <user role="administrator">
      <id>dave</id>
      <password>abab12</password>
    </user>
    <user role="power-user">
      <id>jane</id>
      <password>cdcd34</password>
    </user>
    <user role="end-user">
      <id>fred</id>
      <password>efef56</password>
    </user>
    <user role="end-user">
      <id>sharon</id>
      <password>ghghij</password>
    </user>
  </accounts>
</myapp>

Output:
We found the following users
        administrator with ID dave and password abab12
        power-user with ID jane and password cdcd34
        end-user with ID fred and password efef56
        end-user with ID sharon and password ghghij
use XML::Parser;

# Variables to temporarily store user details
our $userID;
our $userRole;
our $userPassword;
# Completed user records, one array reference per user
our @users;
# Flags to let us know when we are in a particular element
our $isInID = 0;
our $isInPassword = 0;

our $parser = XML::Parser->new(Handlers => {
    Start => \&startElement,
    End   => \&endElement,
    Char  => \&characters
});
$parser->parsefile("config.xml");

print "We found the following users\n";
foreach (@users) {
    my $user = $_;
    print "\t$$user[1] with ID $$user[0] and password $$user[2]\n";
}
# Every handler receives the parser object in $_[0]. Start and End receive
# the element name in $_[1]; Start then receives attribute name/value pairs.
sub startElement {
    if ($_[1] eq "user") {
        $userRole = $_[3];       # value of the first (and only) attribute, 'role'
    } elsif ($_[1] eq "id") {
        $isInID = 1;
        $userID = "";
    } elsif ($_[1] eq "password") {
        $isInPassword = 1;
        $userPassword = "";
    }
}
sub endElement {
    if ($_[1] eq "user") {
        my $newUser = [$userID, $userRole, $userPassword];
        push(@users, $newUser);
    } elsif ($_[1] eq "id") {
        $isInID = 0;
    } elsif ($_[1] eq "password") {
        $isInPassword = 0;
    }
}
sub characters {
    # The parser may deliver one run of text in several calls, so append
    if ($isInID)       { $userID .= $_[1]; }
    if ($isInPassword) { $userPassword .= $_[1]; }
}
Database Access: Another Example of OO Perl
Database Access in Perl
- ‘DBI’ is the standard module for database access
  - It enables you to access any relational database for which a driver exists
- Like most modern APIs the ‘DBI’ module is a shell
  - As with JDBC and ADO.NET the purpose of the ‘DBI’ module is to expose a common interface that conceals the type of the DB
  - The actual functionality is provided by driver modules which implement the standard functionality in a vendor specific way
- The ODBC driver lets you talk to any database that has an ODBC data source
  - But is probably not as efficient as a native driver
The Architecture of the DBI Library
[Diagram: the DBI API sitting on top of the driver modules DBD-mysql, DBD-Oracle, DBD-Sybase and DBD-ODBC]
Using the DBI Module
- A ‘database handle’ is a link to the underlying driver
  - It is a reference to an object that represents a connection
- A database handle is created by a call to ‘connect’
  - The parameter is a string that identifies the database
  - This is written as ‘dbi:DRIVER_NAME:DRIVER_INFO’
- Forming the right connection string is half the battle
  - Make sure you are using the right documentation for the driver
- The ‘disconnect’ method terminates the connection
  - It is good practice to do this explicitly even in short scripts
Using the DBI Module
- Database handles are factories for statement handles
  - These are obtained by calling the ‘prepare’ method
  - The SQL string is specified as the parameter
  - The statement is triggered via the ‘execute’ method
- If the query is a SELECT then a result set is obtained
  - The results are stored inside the statement
  - This is done in a vendor specific way
- The results can be iterated one row at a time
  - There are a variety of methods for retrieving the values in a row
  - The ‘dump_results’ method is a quick way of printing the data
use DBI;

# $insertStatement, $deleteStatement and $selectStatement are assumed to
# hold SQL strings defined elsewhere, using '?' placeholders for the
# values passed to execute.
my $connectionString = "dbi:ODBC:SomeDB";
my $dbh = DBI->connect($connectionString) or die "can't connect to DB!";

my $statement = $dbh->prepare($insertStatement);
$statement->execute($val1, $val2, $val3, $val4);
listAllRows($dbh);

$statement = $dbh->prepare($deleteStatement);
$statement->execute("100");
listAllRows($dbh);

$dbh->disconnect();

sub listAllRows {
    my ($dbh) = $_[0];
    my $statement = $dbh->prepare($selectStatement);
    $statement->execute;
    print "Table contents are\n";
    while (my ($column1, $column2, $column3) = $statement->fetchrow_array()) {
        print "$column1\t$column2\t$column3\n";
    }
}
Course Project: An Exam Marking System
A Course Project - Marking Exams
- We will be writing a script to process exam results
  - Stage 1: A hash of arrays (marks per candidate)
  - Stage 2: A hash of hashes (marks per candidate per subject)
  - Stage 3: A hash of hashes of arrays
    - Holding individual marks per subject per candidate

[Stage 1 table with columns Name and Mark. Names: Dave, Jane, Fred. Marks, in the order they appear on the slide: 80 70 64 81 56 90 93 76 87 64 59 55 62 68 68 71 55 79]
[Stage 2 tables: each name links to a table with columns Subject and Mark]
dave: History 80, Maths 70, French 65
jane: English 66, Maths 70, Politics 82
fred: Physics 61, History 73, Spanish 52
[Stage 3 tables: each name links to a table of subjects, and each subject links to a list of individual marks]
dave: History, Maths, French
jane: English, Maths, Politics
fred: Physics, History, Spanish
Individual marks, in the order they appear on the slide: 10 9 10 7 5 6 9 9 8 7 6 7 4 5 8 6 0 9 5 8 7 7 6 8 7 8 4 6 5 9 6 6 7 9 8 8 10 10 8 7 6 8 9 6 8 5 0 9 10 8 7 4 8 9
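As a starting point for the project, the stage 2 structure (a hash of hashes) might be sketched as below, using the subjects and marks from the stage 2 tables. The variable names and the choice to print an average per candidate are assumptions made for illustration, not requirements of the exercise.

```perl
use strict;
use warnings;

# Stage 2 structure: a hash of hashes (marks per candidate per subject)
my %results = (
    dave => { History => 80, Maths   => 70, French   => 65 },
    jane => { English => 66, Maths   => 70, Politics => 82 },
    fred => { Physics => 61, History => 73, Spanish  => 52 },
);

my %average;
foreach my $name (sort keys %results) {
    my $marks = $results{$name};        # reference to the inner hash
    my $total = 0;
    $total += $_ for values %$marks;
    $average{$name} = $total / scalar(keys %$marks);
    printf "%-5s average %.1f\n", $name, $average{$name};
}
```

Stage 3 would replace each mark with a reference to an anonymous array, so that ‘$results{dave}{History}’ becomes a list of individual marks.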

Perl Development (Sample Courseware)

  • 1. Perl Development Sample Courseware Created October 2008 © Garth Gilmour 2008
  • 2. Overview This is a three day course Core hours and breaks are flexible… The course has three goals Familiarize you with the Perl language Learn the different styles of Perl development Utility scripts, longer programs and OO applications Explore commonly used Perl modules (libraries) Please control the course Ask as many questions as possible Speed up or slow down the pace Request extra examples and exercises Don’t sit in misery!! © Garth Gilmour 2008
  • 3. Introduction to Perl History and Basic Concepts © Garth Gilmour 2008
  • 4. Introduction to Perl: The Perl language was created by Larry Wall to simplify his text manipulation problems. It contains a superset of the functionality provided by shell scripting, sed and awk, plus many extra features. The language is supported by a huge set of libraries, which can be downloaded free of charge from the Comprehensive Perl Archive Network (CPAN) website. There are two expansions of the name: Practical Extraction and Report Language, and Pathologically Eclectic Rubbish Lister.
  • 5. Versions of Perl and Competitors Version 5 is the current edition of Perl It represents how far the language and interpreter could progress whilst maintaining backward compatibility Perl 6 is a complete rewrite Of both the language and the interpreter The Perl 6 interpreter is known as Parrot It has been under development for a long time Today Perl has serious competition Python and Ruby are closely related languages They both aim to have cleaner syntax and better OO © Garth Gilmour 2008
  • 6. Comparing Perl, Python and Ruby © Garth Gilmour 2008 Perl Python Ruby Documentation ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ Library Support ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ OO Support ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ Power ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ Approachability ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ Ease of Mastery ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ C / C++ Interop ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ Java / .NET Interop ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ Web Frameworks ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦ ♦
  • 7. Common Applications of Perl There are three possible levels of Perl coding: Short utility scripts Written in a procedural manner Without high level organization Medium size programs Divided into multiple subroutines grouped within modules Large applications Built using the concepts of object-oriented design Perl is happiest at the first level Historically that’s where its home is Problems exist at the other levels © Garth Gilmour 2008
  • 8. Scripts: core language features. Programs: strict module, subroutines, references. Applications: best practices, named parameters, modules and classes.
  • 9. Learning Programming in Perl: You can start coding in Perl very quickly. Just place some code within a ‘.pl’ file; there is no need to create a special ‘main’ method. But Perl isn't a beginner's language. The interpreter assumes you know what you are doing, and many things that you expect not to compile actually do. E.g. ‘$fred{'barney'} = ['wilma', 'betty']’ creates a table (aka hash) called ‘fred’ with one row. In the single row the key is ‘barney’ and the value is the address of a new array. The array has two boxes holding ‘wilma’ and ‘betty’.
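The one-line hash example above can be unpacked as follows. This is a minimal sketch rather than part of the original courseware: the names `$names`, `$first` and `$how_many` are illustrative, and `use strict` is added, so `%fred` is declared with `my` instead of being created on demand.

```perl
use strict;
use warnings;

# The value stored under 'barney' is a reference to an anonymous
# two-element array, not the array itself.
my %fred;
$fred{'barney'} = ['wilma', 'betty'];

my $names    = $fred{'barney'};      # the array reference
my $first    = $fred{'barney'}[0];   # first box of the referenced array
my $how_many = scalar @{$names};     # number of boxes in the array

print "Keys: ", join(', ', keys %fred), "\n";
print "First name: $first, count: $how_many\n";
```

Printing `$names` directly would show something like `ARRAY(0x...)`, which is the address the slide refers to.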
  • 10. Common Applications of Perl: Perl's ‘sweet spot’ is text manipulation: reading text from a file, loading it into data structures, manipulating it and generating formatted output. This utility is exploited in three ways: by system administrators managing large networks, by developers and QA staff writing test harnesses, and by authors and publishers managing documents. Perl used to be the main language for Web Apps, but has been replaced by languages with better networking support and component models (mainly Java and C#).
  • 11. Declaring Variables in Perl: Perl prefixes variables with special symbols, commonly known as sigils. In advanced coding they are used in combination. Sigils make Perl code appear baffling at first, but once you learn to read them they are a big help.
      Sigil   Description
      $       Scalar variable: holds a single number, string or reference
      @       Array variable: a sequence of zero or more scalars
      %       Hash variable: a map of keys to values (both scalars)
      \       Reference: used to specify that a scalar holds a memory address
      &       Function: used to specify that a symbol is a function name
      *       Typeglob: used for manipulating symbol tables (advanced)
  • 12. Variables and Barewords Forgetting to add a sigil is a common mistake It can lead to unpredictable results in your code An identifier without a sigil is a ‘bareword’ The interpreter searches for a subroutine, package name, label or file-handle with that name Otherwise the identifier is assumed to be an unquoted string You should never deliberately use barewords The meaning of your code will change if someone introduces a function or filename with the same value Consider ‘print abc, “def”, “ghi”’ The bareword ‘abc’ is understood as a filehandle © Garth Gilmour 2008
  • 13. Variables and Symbol Tables Sigils in Perl enable an unusual language feature You can have more than one variable with the same name As long as the variables are of different types Perl stores details of variables in symbol tables Each symbol name (e.g. ‘fred’) is associated with a typeglob A typeglob stores the memory associated with the scalar called ‘fred’, the array and hash called ‘fred’ etc… Advanced coding techniques utilize typeglobs They are data types in their own right with their own sigil © Garth Gilmour 2008
  • 14. Symbol table diagram: the name ‘fred’ (among others) links to a typeglob, and the typeglob holds one link per type ($, @, %, &). Declarations sharing the name:
      $fred = 12;
      @fred = (12,13,14);
      %fred = (k1 => 12, k2 => 14);
      sub fred { return 101; }
  • 15. Perl Comments: Standard Perl comments start with ‘#’, which comments out everything to the end of the line. Multi-line text can be held in a here-document or ‘heredoc’; strictly this is a multi-line string rather than a comment. A heredoc is signified by ‘<<’ followed by an identifier. There must be no space between the two unless the identifier is quoted. The identifier terminates the heredoc when it appears on a line by itself, without quotes or whitespace. Note that comments are allowed in regexes if you use the extended regular expression syntax.
  • 16. Using Heredoc Comments
      $myvar = << "THE_END";
      More than prince of cats, I can tell you. O, he is
      the courageous captain of compliments. He fights as
      you sing prick-song, keeps time, distance, and
      proportion; rests me his minim rest, one, two, and
      the third in your bosom: the very butcher of a silk
      button, a duellist, a duellist; a gentleman of the
      very first house, of the first and second cause:
      ah, the immortal passado! the punto reverso! the hay!
      THE_END
      print "--- START DATA ---\n", $myvar, "--- END DATA ---\n";
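Slide 15 also mentions that comments are allowed inside regular expressions when the extended syntax is used. A minimal sketch of that, assuming an illustrative date string and variable names that are not from the courseware:

```perl
use strict;
use warnings;

my $date = "2008-10-31";

# The x modifier lets the pattern contain whitespace and '#' comments.
my ($year, $month, $day) = $date =~ m/
    (\d{4})   # four-digit year
    -         # literal separator
    (\d{2})   # two-digit month
    -
    (\d{2})   # two-digit day
/x;

print "Year $year, month $month, day $day\n";
```

Without the `x` modifier the whitespace and `#` text would be treated as part of the pattern and the match would fail.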
  • 17. The Perl Language and Modules Perl libraries are organized as ‘modules’ One of Perl’s strengths is their number and variety All of them can be downloaded from CPAN The distinction between Perl and its libraries is fuzzy Pragmatic Modules supplement the core language By interacting with the interpreter through symbol tables An important library for large projects is ‘strict’ This causes a range of extra checks to be run on your code They can all be run or enabled on an individual basis © Garth Gilmour 2008
  • 18. The Strict Module: Strict is a module that changes how you write code. It performs extra checks that change what counts as valid Perl. Including it is a best practice for large scripts. The checks can be enabled selectively, e.g. ‘use strict "vars"’ turns on explicit declarations.
      Example Declaration   Description
      use strict 'vars'     User-defined variables must be declared, either with ‘my’ or ‘our’ or by prefixing the variable with a package name
      use strict 'refs'     Symbolic references (an obsolete feature) are not allowed
      use strict 'subs'     Barewords that are not subroutine names are treated as errors, rather than being interpreted as unquoted strings
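To illustrate the ‘vars’ check, here is a small sketch (not from the original slides; the variable names and the deliberate misspelling `$totl` are invented for the example). A string `eval` is used only so the compile-time error can be caught and printed instead of stopping the script.

```perl
use strict;      # enables 'vars', 'refs' and 'subs' together
use warnings;

my  $total = 0;          # lexical variable, declared with 'my'
our $label = "Total";    # package variable, declared with 'our'

for my $n (1 .. 4) {
    $total += $n;
}
print "$label: $total\n";

# Under strict 'vars' a misspelt, undeclared name is a compile-time error.
# The string eval compiles the snippet at run time so the error lands in $@.
my $ok = eval q{ $totl = 5; 1 };
print "Compile failed: $@" unless $ok;
```

Without `use strict` the misspelt assignment would silently create a new global variable.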
  • 19. Basic Programming The Core Perl Syntax © Garth Gilmour 2008
  • 20. Introducing Scalar Variables: A scalar variable is a single box, prefixed by the dollar sigil. Perl is a weakly typed language: a box may hold a number, a string or a memory address. If the scalar holds an address it is known as a ‘reference’. Type conversions occur automatically, so in the expression ‘$var1 = $var2 + $var3’ both values are converted to numbers before being added together. As with all variables, scalars are created on demand; you don't need to declare them separately.
  • 21. Scalar Variables and Operators In strongly typed languages operators are overloaded So ‘var1 + var2’ would add the variables if they were numbers but concatenate them if they were strings If the types weren't matched there would be a compiler error Weak typing means Perl cannot support overloading Instead there must be an operator for each operation In the case of addition: The ‘+’ operator means add as numbers The ‘.’ operator means concatenate Conversions are made as required © Garth Gilmour 2008
  • 22. Operators Commonly Used in Perl
      Description          Number Version     String Version
      Addition             $var1 + $var2      $var1 . $var2 (concatenation)
      Equality             $var1 == $var2     $var1 eq $var2
      Ordered Comparison   >, <, <=, >=       lt, gt, le, ge
      Power Of             $var1 ** 3         $var1 x 3 (repetition)
      Bitwise Comparison   &, |, ^ (NB work differently for numbers and strings)
      Logical              &&, ||, !          and, or, not (lower precedence)
      Conditional          $var1 = $var2 ? 12 : 14;
      Range                1..4               'D' .. 'Z'
  • 23.
      $num1 = 42;
      $num2 = "42";
      $result = $num1 + $num2;
      print "adding numbers gives $result", "\n"x2;
      $result = $num1 . $num2;
      print "adding strings gives $result", "\n"x2;
      $result = $num2 ** 3;
      print "42 to the power of 3 is $result", "\n"x2;
      $result = $num1 x 3;
      print "42 concatenated with itself three times is $result", "\n"x2;
      if($num1 == $num2) {
          print '$num1 and $num2 are equal as numbers',"\n"x2;
      }
      if($num1 eq $num2) {
          print '$num1 and $num2 are equal as strings',"\n"x2;
      }
  • 24. String Values in Detail Strings may be placed in single or double quotes They have different meanings and are not interchangeable Single quotes are not treated specially The interpreter sees them as a plain sequence of characters Double quotes cause variable interpolation Perl searches for sigils in the string and replaces them with the value of the variable (creating it if required) Note that you can also use ‘backtick’ quotes These surround a string to be run as an OS command E.g. ‘ $var1 = `ls -al` ’ runs the UNIX list command and stores the results in the variable ‘$var1’ © Garth Gilmour 2008
  • 25.
      $var1 = "abc";
      $var2 = 123;
      $var3 = ["ab","cd","ef"];
      print 'Values are $var1, $var2 and $var3 \n';
      print "Values are $var1, $var2 and $var3 \n";
      $path = `set path`;
      print "Value of path environment variable is:\n $path";
      Output:
      Values are $var1, $var2 and $var3 \nValues are abc, 123 and ARRAY(0x225e28)
      Value of path environment variable is:
       PATH=c:\jdk1.5.0_05\bin;C:\Perl\site\bin;C:\Perl\bin;c:\ruby\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;
  • 26. What is Truth in Perl? Truth is a source of confusion in Perl. Many Perl functions return boolean values, but Perl does not have a boolean type. Three things in Perl count as false: the empty string, the string or number “0”, and an undefined variable. You can obtain the undefined value by using a variable that has not been initialized, by passing a variable as an argument to ‘undef’, or by using the return value from ‘undef’.
  • 27.
      $var1 = "0";       # The string "0" counts as false
      $var2 = "";        # The empty string also counts as false
      $var3 = "AB";      # Other string values are true
      $var4 = -12;       # Other numerical values are true
      $var5;             # Undefined values are false
      $var6 = undef();   # Values set to undef are false
      printTruth('$var1',$var1);
      printTruth('$var2',$var2);
      printTruth('$var3',$var3);
      printTruth('$var4',$var4);
      printTruth('$var5',$var5);
      printTruth('$var6',$var6);
      undef($var4);      # Release storage space for var4 so it becomes undefined
      printTruth('$var4',$var4);
      sub printTruth {
          my ($varName,$varValue) = @_;
          if($varValue) {
              print "$varName is true\n";
          } else {
              print "$varName is false\n";
          }
      }
  • 28. Special Scalar Variables: The Perl interpreter automatically creates variables. These variables are represented by symbols rather than names, which is a further source of confusion when learning Perl. The ‘English’ module renames the variables more clearly.
      Variable Name   Description
      $]              The version of Perl supported by this interpreter
      $0              The name of the file containing the current script
      $^O             The name of the operating system
      $_              The current item (used in input, output and loops)
      $/              The line separator used when reading text (default is newline)
  • 29. Special Scalar Variables
      print "This is version $] of Perl\n";
      print "Running on the $^O operating system\n";
      print "The current script is $0 \n";
      @myarray = ("ab","cd","ef","gh");
      print "Elements are:\n";
      foreach(@myarray) {
          print "\t $_ \n";
      }
      Output:
      This is version 5.008008 of Perl
      Running on the MSWin32 operating system
      The current script is C:\perl\specialScalars.pl
      Elements are:
           ab
           cd
           ef
           gh
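Two items from slide 28 are not exercised by the example above: the ‘English’ module and the ‘$/’ separator. The following is a hedged sketch of both. The sample text and variable names are invented, and the input is read from an in-memory string (opened via a scalar reference) purely so the example needs no external file.

```perl
use strict;
use warnings;
use English qw(-no_match_vars);   # readable names for the punctuation variables

print "OS name: $OSNAME\n";         # same value as $^O
print "Script:  $PROGRAM_NAME\n";   # same value as $0

# Read ';'-separated records instead of newline-separated lines.
my $text = "alpha;beta;gamma";
open(my $in, '<', \$text) or die "Can't open in-memory handle: $!";

my @records;
{
    local $/ = ';';                 # known as $INPUT_RECORD_SEPARATOR under English
    while (my $record = <$in>) {
        chomp($record);             # chomp removes the current separator
        push @records, $record;
    }
}
close($in);

print "Records: @records\n";
```

Using `local` restores the default newline separator as soon as the enclosing block ends.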
  • 30. Reading Text Into a Scalar Variable: Reading text into a scalar is simple. The expression ‘$line = <INPUT>’ reads a line of text from INPUT and stores it in the scalar ‘line’. The symbol within the angle brackets is a handle. Handles are links to resources outside your program. The ‘STDIN’ and ‘STDOUT’ handles are created automatically. We will see how to open and close handles later… To write data use the ‘print’ function. This takes a file handle as an optional first parameter; the default handle is STDOUT.
  • 31.
      open(INPUT,"MuchAdoAboutNothing.txt");
      $count = 1;
      foreach(<INPUT>) {
          print("$count:\t $_");
          $count++;
      }
      Output:
      1:   Much Ado About Nothing
      2:   A comedy by William Shakespear
      3:
      4:   Act 1, Scene 1
      5:   Before LEONATO'S house.
      6:   Enter LEONATO, HERO, and BEATRICE, with a Messenger
      7:
      8:   LEONATO
      9:   I learn in this letter that Don Peter of Arragon
      10:  comes this night to Messina.
  • 32. Conditionals and Iteration in Perl: Perl supports the standard ‘if’ conditional. Note that the ‘elsif’ keyword is used instead of ‘else if’. The ‘unless’ is an ‘if’ in reverse, so ‘if(!done()) { … }’ becomes ‘unless(done()) { … }’. This is convenient once you get used to it. It is possible to place the test after the action, e.g. ‘$a = 12 if $b < $c’ or ‘$a = 12 unless $b >= $c’. The C/C++ ‘switch’ keyword is not supported, although there are ways of simulating it if required. The rarely used ternary conditional operator is available, e.g. ‘$var1 = ($var2 == $var3) ? 17 : 19’.
  • 33.
      print "Enter a number\n";
      $number = <STDIN>;
      chomp($number);
      if($number < 10) {
          print "$number is less than 10\n";
      } elsif($number < 20) {
          print "$number is less than 20\n";
      } elsif($number < 30) {
          print "$number is less than 30\n";
      } else {
          print "$number is 30 or greater\n";
      }
      unless($number % 2 == 0) {
          print "$number is odd\n";
      }
      Output:
      Enter a number
      17
      17 is less than 20
      17 is odd
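The postfix tests and the ternary operator described on slide 32 do not appear in the example above, so here is a short sketch of them. The variable names and values are illustrative assumptions, not taken from the courseware.

```perl
use strict;
use warnings;

my ($low, $high) = (3, 8);
my $limit = 0;

$limit = 12 if $low < $high;       # test placed after the action
$limit = 99 unless $low < $high;   # runs only when the test is false

# Ternary operator: pick one of two values in a single expression.
my $size = ($high - $low > 4) ? "wide" : "narrow";

print "limit is $limit, range is $size\n";
```

The postfix forms only take a single statement; anything longer reads better as a regular `if` or `unless` block.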
  • 34. Conditionals and Iteration in Perl The standard loops are supported Perl provides ‘while’, ‘do … while’ and ‘for’ loops With the same syntax and semantics as C/C++ Variations of the ‘while’ loops are available The ‘until’ and ‘do … until’ loops avoid the need to negate the conditional, but are not always intuitive The ‘for’ loop is extended in two ways: It can be used with ranges rather than counters E.g. ‘for(1..4) { … }’ or ‘for(1..$max) { … }’ The ‘foreach’ loop iterates over arrays We will meet it later when introducing data structures © Garth Gilmour 2008
  • 35.
      print "Enter a positive number\n";
      $max = <STDIN>;
      chomp($max);
      if($max <= 0) {
          die("Number must be positive!");
      }
      print "Demo of while loop\n";
      print "\tNumbers from 0 to $max are:\n";
      $count = 0;
      while($count <= $max) {
          print "\t\t$count\n";
          $count++;
      }
      print "Demo of do..while loop\n";
      print "\tNumbers from 0 to $max are:\n";
      $count = 0;
      do {
          print "\t\t$count\n";
          $count++;
      } while($count <= $max);
      print "Demo of until loop\n";
      print "\tNumbers from 0 to $max are:\n";
      $count = 0;
      until($count > $max) {
          print "\t\t$count\n";
          $count++;
      }
      print "Demo of do..until loop\n";
      print "\tNumbers from 0 to $max are:\n";
      $count = 0;
      do {
          print "\t\t$count\n";
          $count++;
      } until($count > $max);
      print "Demo of for loop v1\n";
      print "\tNumbers from 0 to $max are:\n";
      for($count=0; $count<= $max; $count++) {
          print "\t\t$count\n";
      }
      print "Demo of for loop v2\n";
      print "\tNumbers from 0 to $max are:\n";
      for(0..$max) {
          print "\t\t$_\n";
      }
  • 36. Conditionals and Iteration in Perl Loops can optionally have a continue block E.g. ‘while($a < 12) { … } continue { … }’ This is used with the loop control operators A call to ‘last’ immediately exits the loop Without executing the continue block A call to ‘next’ skips the remaining statements in this iteration But the continue block is executed before the loop condition is re-evaluated and (if true) the next iteration begins A call to ‘redo’ restarts the current iteration The continue block is not executed The loop condition is not checked © Garth Gilmour 2008
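A small sketch of how ‘next’, ‘last’ and the continue block interact, following the rules listed above. The counters and limits are made up for illustration, and ‘redo’ is left out to keep the example short.

```perl
use strict;
use warnings;

my @seen;
my $steps = 0;
my $n     = 0;

while ($n < 10) {
    next if $n % 2;     # odd: skip the rest, but the continue block still runs
    last if $n > 6;     # leave the loop; the continue block is skipped
    push @seen, $n;
}
continue {
    $n++;               # runs after every completed or 'next'-ed iteration
    $steps++;
}

print "Even numbers seen: @seen\n";
print "continue block ran $steps times, n finished at $n\n";
```

Because ‘last’ bypasses the continue block, `$n` keeps the value it had when the loop was abandoned.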
  • 37. Basic Perl I/O Using the Console and Files © Garth Gilmour 2008
  • 38. Basic Perl I/O: Perl I/O is based around handles, which are links provided by the OS to data sources and sinks. Handles for the console are built-in: ‘STDIN’ and ‘STDOUT’ represent the command prompt. Input is read via the ‘< >’ operator, so ‘$data = <STDIN>;’ reads a line from the console. Use the ‘chomp’ function to remove the newline. Output is written via the ‘print’ function. If the first argument is not a handle then STDOUT is used. A comma should not be placed after the handle.
  • 39. Testing File Paths: Perl provides built-in operators for testing file paths, e.g. ‘if(-e $file && -T $file) { print "$file exists and is a text file"; }’. You should always check a file before opening it.
      File Test Operator   Description
      -e                   File exists
      -r                   File is readable
      -w                   File is writable
      -z                   File has zero size
      -s                   Returns file size
      -T                   File is a text file
      -B                   File is a binary file
      -S                   File is a socket
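The operators in the table can be combined as in the sketch below. It is an illustration only: the scratch file is created with the core `File::Temp` module so the example does not depend on any existing path, and the variable names are assumptions.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small scratch file so there is something to inspect.
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "some plain text\n";
close($fh) or die "Can't close $file: $!";

if (-e $file && -r $file) {
    my $size = -s $file;                        # -s returns the size in bytes
    my $kind = -T $file ? "text" : "binary";    # -T guesses from the content
    print "$file exists, is readable, holds $size bytes and looks like $kind\n";
}

print "$file.missing does not exist\n" unless -e "$file.missing";
```

Note that ‘-T’ and ‘-B’ are heuristics based on the file's first block of content, so they give a best guess rather than a guarantee.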
  • 40. Opening and Reading From Files: The ‘open’ function is used to create a file handle. The first argument is the symbol we want to represent the handle. Files are opened in a particular mode, as indicated by the character(s) before the filename. The default is to open for reading.
      Function                                                     Description
      open(HANDLE, "myfile.txt") or open(HANDLE, "<myfile.txt")    Open file for reading
      open(HANDLE, ">myfile.txt")                                  Open file for writing (truncating if necessary)
      open(HANDLE, ">>myfile.txt")                                 Open file for appending
      open(HANDLE, "+<myfile.txt")                                 Open file for reading and updating
  • 41. Opening and Reading From Files: The standard form of ‘open’ could cause problems if the handle name was already in use (e.g. as a subroutine) or if we were trying to open a file called ‘>myfile.txt’. There are two ways around this. There is a three argument form of ‘open’, in which the mode(s) are passed as a separate argument. The handle can also be stored in a scalar variable; this is known as an indirect filehandle. Once you have a file opened you can read lines from the file using the ‘< >’ operator and write to the file using the ‘print’ function.
  • 42.
      open(INPUT,"input.txt");
      open(OUTPUT,">output.txt");
      $count = 0;
      while($line = <INPUT>) {
          print OUTPUT ++$count, "\t", $line;
      }
      print "Processed $count lines\n";
      Input (input.txt):
      This short interval was sufficient to determine d'Artagnan on the
      part he was to take. It was one of those events which decide the
      life of a man; it was a choice between the king and the
      cardinal--the choice made, it must be persisted in. To fight,
      that was to disobey the law, that was to risk his head, that was
      to make at one blow an enemy of a minister more powerful than the
      king himself. All this young man perceived, and yet, to his
      praise we speak it, he did not hesitate a second. Turning
      towards Athos and his friends, "Gentlemen," said he, "allow me to
      correct your words, if you please. You said you were but three,
      but it appears to me we are four."
      Output (output.txt):
      1    This short interval was sufficient to determine d'Artagnan on the
      2    part he was to take. It was one of those events which decide the
      3    life of a man; it was a choice between the king and the
      4    cardinal--the choice made, it must be persisted in. To fight,
      5    that was to disobey the law, that was to risk his head, that was
      6    to make at one blow an enemy of a minister more powerful than the
      7    king himself. All this young man perceived, and yet, to his
      8    praise we speak it, he did not hesitate a second. Turning
      9    towards Athos and his friends, "Gentlemen," said he, "allow me to
      10   correct your words, if you please. You said you were but three,
      11   but it appears to me we are four."
  • 43.
      open(INPUT, '<', "input.txt");
      open(OUTPUT, '>', "output.txt");
      $count = 0;
      while($line = <INPUT>) {
          print OUTPUT ++$count, "\t", $line;
      }
      print "Processed $count lines\n";
      (The input text and numbered output are identical to those on slide 42.)
  • 44.
      open($input, '<', "input.txt");
      open($output, '>', "output.txt");
      $count = 0;
      while($line = <$input>) {
          print $output ++$count, "\t", $line;
      }
      print "Processed $count lines\n";
      (The input text and numbered output are identical to those on slide 42.)
  • 45. Opening and Reading From Files: Lines from a file should be read using a ‘while’ loop. A ‘for’ loop causes the interpreter to create a list of all the lines from the file, which is then iterated over. File handles should be closed via ‘close’. Handles stored as scalars are automatically closed when the variable goes out of scope, but you may want to do this earlier. You should verify calls to ‘open’, ‘print’ and ‘close’: all three return a true or false value to indicate success or failure. You can throw an error using the ‘die’ function or the ‘croak’ function from the Carp module. We will cover these in depth later in the course.
  • 46.
      open($input, '<', "input.txt") or die "Can't open input";
      open($output, '>', "output.txt") or die "Can't open output";
      $count = 0;
      while($line = <$input>) {
          print($output ++$count, "\t", $line) or die "Can't write to file";
      }
      print "Processed $count lines\n";
      close($input) or die "Can't close input";
      close($output) or die "Can't close output";
  • 47. File Handles and Globbing: You can place a pattern inside the ‘< >’ operator, in which case the pattern is ‘globbed’, e.g. ‘@dirs = <../*>;’. To avoid confusion use the ‘glob’ function, e.g. ‘@dirs = glob('../*');’. Globbing is often used to change file properties: ‘chmod’ and ‘chown’ change a file's access rights and ownership. E.g. ‘while(glob("*.pl")) { chmod(0777, $_); }’, where the mode is an octal number written with a leading zero (not the letter O). E.g. ‘while(glob("*.pl")) { chown($user_id, $group_id, $_); }’.
  • 48.
      @exampleDirectories = glob('..\*');
      print "Perl example files are: \n";
      foreach $dir (@exampleDirectories) {
          @perlFiles = glob($dir . '\*.pl');
          foreach(@perlFiles) {
              # characters preceding a slash preceding
              # letters ending in '.pl'
              m/.*\\(\w+\.pl)/;
              print "\t$1 in $dir \n";
          }
      }
      Output:
      Perl example files are:
          arrayFunctions.pl in ..\arrays
          arraysAndLists.pl in ..\arrays
          days.pl in ..\arrays
          forEach.pl in ..\arrays
          fork.pl in ..\concurrency
          threads.pl in ..\concurrency
          customerDB.pl in ..\databases
          arraysOfArrays.pl in ..\dataStructures
          checkingErrors.pl in ..\files
          indirectHandles.pl in ..\files
          basicHashes.pl in ..\hashes
          extraSyntax.pl in ..\hashes
  • 49. Arrays in Perl Creating and Using Lists © Garth Gilmour 2008
  • 50. Introducing Arrays in Perl: In many other languages built-in arrays cannot change size, hence they must be supplemented with data structures, e.g. the C++ STL or the Collections libraries in Java and C#. In Perl arrays can grow and shrink as required, so there is no need for a separate ‘vector’ or ‘LinkedList’ type. If the array is of size 10 and you try to store something in box 100 then the size is automatically changed. Often arrays are created implicitly, e.g. ‘$myarray[9] = "abc"’ would create an array called ‘myarray’ with ten boxes, all of which were undefined apart from the last.
  • 51.
      $myarray[8] = "string in 9th box";
      $myarray[10] = "string in 11th box";
      $count = 0;
      foreach(@myarray) {
          print $count++,": $_\n";
      }
      Output:
      0:
      1:
      2:
      3:
      4:
      5:
      6:
      7:
      8: string in 9th box
      9:
      10: string in 11th box
  • 52. Support for Arrays in Perl Arrays are declared with the ‘@’ sigil E.g. ‘@myarray = (“abc”, 123, “def”, 456, “ghi”, 789)’ Normally we create an array based on a list of values The ‘qw’ operator can be used to avoid quoting E.g. ‘@myarray = qw(abc 123 def 456 ghi 789)’ Lists can also be initialized based on arrays E.g. ‘($val1,$val2,$val3) = @myarray’ copies the values in the first three boxes into the scalar variables ‘ ($val1) = @myarray’ is an idiom for grabbing the first value It is equivalent to ‘$val1 = $myarray[0]’ © Garth Gilmour 2008
  • 53. Support for Arrays in Perl Note that lists are only used for grouping Unlike arrays their existence is only ever temporary The ‘@’ sigil is not used when indexing Instead of ‘@myarray[1]’ we write ‘$myarray[1]’ This is because the value we are accessing is a scalar Arrays can be created based on slices of other arrays E.g. ‘@array2 = @array1[1..3]’ creates a new array called ‘array2’ holding copies of boxes 2,3 and 4 in ‘array1’ © Garth Gilmour 2008
  • 54.
      @tstArray = qw(abc def ghi jkl mno pqr);
      ($first) = @tstArray;
      ($a,$b,$c) = @tstArray;
      print "First element is: $first\n";
      print "First three elements are: $a $b $c\n";
      Output:
      First element is: abc
      First three elements are: abc def ghi
  • 55. Special Features of Arrays Perl makes assumptions about your use of arrays Almost anything you might write is meaningful Even if the meaning is not what you intended If you put an array in a scalar context the size is used ‘ $var = @myarray’ stores the size of ‘myarray’ in ‘var’ ‘ $var = @myarray - 1’ would store the index of the last box You can easily copy values into another array E.g. ‘@myarray3 = (@myarray1,@myarray2)’ would mean ‘myarray3’ contained copies of the values in the other two arrays It does not create a multidimensional array © Garth Gilmour 2008
  • 56. Special Features of Arrays: There is an easy way to find the last index. For ‘@myarray’ it is stored in the variable ‘$#myarray’. Arrays can be shrunk if required: the special variable ‘$#myarray’ is not immutable, so ‘$#myarray -= 2’ removes the last two boxes of ‘myarray’. There are two ways to empty out an array: by setting its last index to -1, e.g. ‘$#myarray = -1’, or by assigning it an empty list, e.g. ‘@myarray = ()’.
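The size and last-index features can be tried out with a short sketch like the one below. The array contents and the names `@letters` and `@shrunk` are illustrative only.

```perl
use strict;
use warnings;

my @letters = qw(a b c d e f);

my $size = @letters;       # scalar context gives the number of boxes
my $last = $#letters;      # index of the last box, i.e. size minus one

$#letters -= 2;            # drop the last two boxes
my @shrunk = @letters;     # keep a copy to show the shrunken contents
print "After shrinking: @shrunk\n";

$#letters = -1;            # one way to empty the array
print "Boxes left: ", scalar(@letters), "\n";
```

Assigning a larger value to `$#letters` works too; the new boxes are simply undefined.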
  • 57.
      @workDays = ("Monday","Tuesday","Wednesday","Thursday","Friday");
      @weekendDays = qw(Saturday Sunday);
      @days = (@workDays,@weekendDays);
      $numDays = @days;
      $firstDay = $days[0];
      $lastDay = $days[$#days];
      print "There are $numDays days in a week\n\n";
      print "$firstDay is the first day and $lastDay is the last day \n";
      print "\nThe other days are:\n";
      foreach $day (@days) {
          unless($day eq $firstDay or $day eq $lastDay) {
              print $day, "\n";
          }
      }
      print "\nThe last four days are:\n";
      @lastDays = @days[3..6];
      foreach $day (@lastDays) {
          print $day, "\n";
      }
  • 58. Iterating Over Arrays The easiest way to loop over an array is ‘foreach’ This iterates over a list of values and assigns each to a scalar variable You can specify the scalar or use the built-in ‘$_’ The ‘foreach’ keyword is just an alias for ‘for’ Which one you use is a matter of style
    @myarray = qw(abc def ghi jkl);
    print "Loop one\n";
    foreach $item (@myarray) {
        print "\t",$item,"\n" ;
    }
    print "Loop two\n";
    foreach (@myarray) {
        print "\t",$_,"\n" ;
    }
    print "Loop three\n";
    for $item (@myarray) {
        print "\t",$item,"\n" ;
    }
    print "Loop four\n";
    for (@myarray) {
        print "\t",$_,"\n" ;
    }
  • 59. Functions for Working With Arrays Perl provides powerful functions for manipulating arrays The ‘push’ and ‘pop’ functions add and remove from the end The ‘unshift’ and ‘shift’ functions do the same from the start
    Function Name   Description
    push            Add a new box to the end of the array
    pop             Remove a box from the end of the array
    unshift         Add a new box to the start of the array
    shift           Remove the first box in the array
    join            Join all the values in the array into a string, separated by a delimiter
    split           Create an array by splitting a string into a sequence of sub-strings, using a regular expression to specify the delimiter token(s)
  • 60.
    @myarray1 = qw(abc def ghi);
    $val1 = pop(@myarray1);
    print "\nJust popped $val1, contents now:\n";
    foreach(@myarray1) { print "\t$_\n"; }
    push(@myarray1,"zzz");
    print "\nJust pushed zzz, contents now:\n";
    foreach(@myarray1) { print "\t$_\n"; }
    $val1 = shift(@myarray1);
    print "\nJust shifted $val1, contents now:\n";
    foreach(@myarray1) { print "\t$_\n"; }
    unshift(@myarray1,"AAA");
    print "\nJust unshifted AAA, contents now:\n";
    foreach(@myarray1) { print "\t$_\n"; }

    Output:
    Just popped ghi, contents now:
        abc
        def
    Just pushed zzz, contents now:
        abc
        def
        zzz
    Just shifted abc, contents now:
        def
        zzz
    Just unshifted AAA, contents now:
        AAA
        def
        zzz
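The example above covers ‘push’, ‘pop’, ‘shift’ and ‘unshift’ but not ‘join’ and ‘split’. A short sketch of those two follows; the sample strings and variable names are invented for illustration.

```perl
use strict;
use warnings;

my @words  = qw(abc def ghi);
my $joined = join("-", @words);                     # "abc-def-ghi"

# The delimiter is a regex: a comma with optional whitespace on either side
my @fields = split(/\s*,\s*/, "one, two ,three");

# An empty pattern splits a string into its individual characters
my @chars  = split(//, "xyz");

print "$joined\n";
print "\t$_\n" foreach @fields;
```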
  • 61. Hashes in Perl Creating and Using Tables © Garth Gilmour 2008
  • 62. Introducing Hashes in Perl Hashes are the second built in data type A hash is a data structure that works like a map or table The name comes from the use of a hashing algorithm The ‘%’ sigil is used when declaring hashes As with arrays this is not used when referring to values E.g. ‘$myhash{“k1”}’ returns the value for the key ‘k1’ in the hash ‘%myhash’ and ‘$myhash{“k1”} = 12’ sets it Hashes can be declared and expanded explicitly ‘ $myhash{“k1”} = 12’ creates a hash called ‘myhash’ if required Otherwise if the row does not exist it is added to the hash © Garth Gilmour 2008
  • 63.
    $myhash{'k1'} = "abc";
    $myhash{'k2'} = 123;
    $myhash{'k3'} = "def";
    $myhash{'k4'} = 456;
    foreach $key (keys %myhash) {
        print $key, " indexes ", $myhash{$key}, "\n";
    }

    Output:
    k2 indexes 123
    k1 indexes abc
    k3 indexes def
    k4 indexes 456
  • 64. Initializing a Hash Like arrays hashes can be initialized via lists E.g. ‘%myhash = ("k1", 123, "k2", "XYZ")’ creates a hash with two rows, where the keys are ‘k1’ and ‘k2’ Again ‘qw’ can be used to avoid quoting literals The items in odd-numbered positions become keys and the items in even-numbered positions become values The ordering of the rows cannot be predicted The ‘fat comma’ notation makes things clearer E.g. ‘%myhash = (k1 => 123, k2 => "XYZ")’ The ‘=>’ operator is the same as the comma, except that a bareword on its left hand side is quoted automatically Each pair should be placed on its own line for clarity
  • 65.
    %myhash = (
        k1 => 123,
        k2 => "abc",
        k3 => 456,
        k4 => "def",
        k5 => 789
    );
    foreach $key (keys %myhash) {
        print $key, " indexes ", $myhash{$key}, "\n";
    }

    Output:
    k5 indexes 789
    k2 indexes abc
    k1 indexes 123
    k3 indexes 456
    k4 indexes def
  • 66. Functions for Working With Hashes As with arrays there are built-in functions for hashes ‘keys’ and ‘values’ return the entries in different columns The ‘each’ function is slightly complex Every time it is called it returns a list holding a key/value pair When the end of the hash is reached it returns an empty list
    Function Name   Description
    each            Returns a list of two values representing a row in the hash
    exists          Returns true if a specified entry exists in the hash
    keys            Returns a list of all the keys in the hash
    values          Returns a list of all the values in the hash
    delete          Removes a row from the hash
  • 67.
    %myhash = (
        k1 => 123,
        k2 => "abc",
        k3 => 456,
        k4 => "def",
        k5 => 789
    );
    print "Keys are:\n";
    foreach $key (keys %myhash) {
        print "\t", $key, "\n";
    }
    print "Values are:\n";
    foreach $value (values %myhash) {
        print "\t", $value, "\n";
    }
    print "Entries are:\n";
    while (($key, $value) = each(%myhash)) {
        print "\t$key indexes $value \n";
    }

    Output:
    Keys are:
        k5
        k2
        k1
        k3
        k4
    Values are:
        789
        abc
        123
        456
        def
    Entries are:
        k5 indexes 789
        k2 indexes abc
        k1 indexes 123
        k3 indexes 456
        k4 indexes def
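The two remaining functions from the table, ‘exists’ and ‘delete’, can be sketched as follows. The hash `%stock` and its contents are hypothetical. The point of the example is that ‘exists’ tests for the presence of a key, not for the truth of its value.

```perl
use strict;
use warnings;

my %stock = (apples => 3, pears => 0);

# 'exists' is true for 'pears' even though its value (0) is false
my $hasPears = exists $stock{pears} ? 1 : 0;
my $hasPlums = exists $stock{plums} ? 1 : 0;

# 'delete' removes the row and returns the value that was stored there
my $removed   = delete $stock{apples};
my @remaining = sort keys %stock;

print "Removed $removed, keys left: @remaining\n";
```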
  • 68. Special Syntax for Hashes A hash can be assigned to an array E.g. ‘@myarray = %myhash’ causes all the keys and values from ‘myhash’ to be inserted into ‘myarray’ The items are added in the hash's internal order Rather than the order in which they were inserted into the hash Slices of hashes can be obtained E.g. ‘($v1,$v2) = @myhash { "k1", "k2" }’ stores the values associated with ‘k1’ and ‘k2’ into the two scalar variables Slicing can also be used to add values E.g. ‘@myhash { "k1", "k2", "k3" } = ("abc", 123, "def")’ adds three key/value pairs into the hash
  • 69.
    %tstHash = ("k1","v1","k2","v2");
    print "Original hash contents:\n";
    foreach(keys(%tstHash)) {
        print("$_ indexes ",$tstHash{$_},"\n");
    }
    @tstHash {"k3","k4","k5","k6"} = (111,222,333,444);
    print "\nHash contents after insertions:\n";
    foreach(keys(%tstHash)) {
        print("$_ indexes ",$tstHash{$_},"\n");
    }
    ($var1,$var2,$var3) = @tstHash {"k1","k2","k3"};
    print "\nValues in scalars are $var1 $var2 and $var3\n";
    @elements = %tstHash;
    print "\nArray contents:\n";
    foreach(@elements) {
        print "$_ ";
    }

    Output:
    Original hash contents:
    k2 indexes v2
    k1 indexes v1

    Hash contents after insertions:
    k5 indexes 333
    k2 indexes v2
    k1 indexes v1
    k6 indexes 444
    k3 indexes 111
    k4 indexes 222

    Values in scalars are v1 v2 and 111

    Array contents:
    k5 333 k2 v2 k1 v1 k6 444 k3 111 k4 222
  • 70. Hashes of Anonymous Arrays This is the first advanced data structure we will meet We introduce it now because it is so useful The syntax ‘[12, “AB”]’ creates an anonymous array No name is associated with the array Instead the ‘[ ]’ operator returns its address We can store the addresses in a hash Indexed by an appropriate key This can be used to store all kinds of data E.g. exam results for students © Garth Gilmour 2008
  • 71.
    %results = (
        dave => [54, 62, 73, 48],
        jane => [59, 67, 82, 70],
        fred => [92, 64, 59, 71]
    );
    [Diagram: the hash ‘results’ maps the keys ‘dave’, ‘jane’ and ‘fred’ to anonymous arrays holding (54, 62, 73, 48), (59, 67, 82, 70) and (92, 64, 59, 71)]
  • 72. Hashes of Anonymous Arrays To work with the array as a whole use ‘@{ }’ E.g. ‘@{$myhash{"dave"}}’ means get the array whose address is indexed in ‘myhash’ by the key ‘dave’ To work with array elements use the arrow operator E.g. ‘$myhash{"dave"}->[1]’ means get the value in ‘myhash’ indexed by ‘dave’ and then go to box 2 in the array it references ‘${$myhash{"dave"}}[1]’ is also valid But is very hard to decode ‘$$myhash{"dave"}[1]’ would be interpreted as meaning that ‘myhash’ is a scalar variable holding the address of a hash, whose ‘dave’ entry in turn holds the address of the array
  • 73.
    %actors = (
        "george clooney" => ["Oceans 11", "The Peacemaker", "O Brother Where Art Thou"],
        "harrison ford" => ["Star Wars","Sabrina","Indiana Jones"],
        "robin williams" => ["Good Morning Vietnam","Hook","The Birdcage"]
    );
    print "\nThe list of actors and their movies is: \n";
    foreach $actor (keys %actors) {
        print "\t$actor starred in: \n";
        foreach $film(@{$actors{$actor}}) {
            print "\t\t$film\n";
        }
    }

    Output:
    The list of actors and their movies is:
        robin williams starred in:
            Good Morning Vietnam
            Hook
            The Birdcage
        harrison ford starred in:
            Star Wars
            Sabrina
            Indiana Jones
        george clooney starred in:
            Oceans 11
            The Peacemaker
            O Brother Where Art Thou
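The example above works with each anonymous array as a whole. Access to individual elements via the arrow operator might look like the sketch below, which reuses the exam results data from the earlier slide; the extra mark of 91 is invented for illustration.

```perl
use strict;
use warnings;

my %results = (
    dave => [54, 62, 73, 48],
    jane => [59, 67, 82, 70],
);

my $second = $results{dave}->[1];    # box 2 of dave's array
my $same   = $results{dave}[1];      # the arrow is optional between subscripts

push @{ $results{jane} }, 91;        # treat the referenced array as a whole
my $count = @{ $results{jane} };     # scalar context gives its size
my $last  = $results{jane}[-1];      # a negative index counts from the end

print "dave's second mark is $second, jane now has $count marks\n";
```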
  • 74. Regular Expressions Part 1: Core Concepts © Garth Gilmour 2008
  • 75. Introducing Regular Expressions A Regular Expression is a pattern in text The commonly used shorthand is ‘regex’ Regexes are used to find matches in search strings E.g. the regex for an email might be: One or more lowercase or uppercase letters Optionally a dot and one or more letters (any case) The ‘@’ symbol followed by the company name One of a range of supported suffixes (.com or .co.uk or .ie) Regexes can save a huge amount of time and effort Especially when compared to writing your own parser
  • 76. The Syntax of Regular Expressions Regular expressions are a ‘little language’ Like SQL and XPath they have their own syntax Unfortunately the special characters in the regex language can also occur as part of your search string E.g. the dot is short for any character, so if you mean the actual character dot it must be escaped The syntax of regex’s has developed over time Initially in UNIX commands and then in Perl scripting Perl V5 set the standard for regex support Most languages now support the Perl 5 regex syntax Your regex’s are run by an ‘Expression Engine’ The details of how this works can be quite complex © Garth Gilmour 2008
  • 77. Key Regex Concept No 1 The search for the next match starts from just after the end of the last successful match So if the pattern is ‘three uppercase letters’ and the search string is ‘ABCDEF’ then the matches are ‘ABC’ and ‘DEF’ Not ‘ABC’ followed by ‘BCD’ followed by ‘CDE’ and so on
    [Diagram: the search string ‘ABCDEFGHIJKLMNO’ divided into five consecutive matches: ‘ABC’, ‘DEF’, ‘GHI’, ‘JKL’ and ‘MNO’]
  • 78. Key Regex Concept No 2 Regular expressions are greedy by default So given the pattern ‘one or more uppercase letters’ and the search string ‘ABCDEFg’ the match is ‘ABCDEF’ The engine always selects the largest possible set of characters It is possible to use non-greedy matching symbols instead
    [Diagram: in the search string ‘ABCDEfghIJKLMNopqrSTUVWXyz’ the three matches are ‘ABCDE’, ‘IJKLMN’ and ‘STUVWX’]
  • 79. Character Classes The building block of all regexes is the character class This defines a set of symbols to find e.g. ‘[aeiou]’ matches any vowel ‘[a-z]’ matches lowercase letters ‘[A-Z]’ matches uppercase letters ‘[a-zA-Z]’ matches a letter in either case Note that it is a set and not a sequence ‘[abc]’ matches ‘a’ OR ‘b’ OR ‘c’ and NOT ‘abc’ The top hat symbol negates the character class So ‘[^aeiou]’ matches any character that isn't a vowel Note that outside a character class ‘^’ has another meaning
  • 80. Shortcuts for Character Classes There are two shortcut notations for character classes One provided by the Perl version of regular expressions The other by the POSIX standards (this is very rarely used)
    Perl Shortcut   Description                 Character Class
    \d              Digit                       [0-9]
    \D              Non-Digit                   [^0-9]
    \s              Whitespace Character        [ \t\n\r\f]
    \S              Non Whitespace Character    [^ \t\n\r\f]
    \w              Word Character              [a-zA-Z0-9_]
    \W              Non-Word Character          [^a-zA-Z0-9_]
  • 81. Specifying Multiplicities By default a character class matches one instance You can specify a different number in braces So ‘[a-z]’ is the same as ‘[a-z]{1}’ Separating numbers by commas specifies a range So ‘[a-z]{2,4}’ means between two and four lowercase letters The question mark signifies optionality So ‘[a-z]{2}[A-Z]?’ specifies two lowercase letters optionally followed by a single uppercase letter There are two meta-characters used for many The plus means one or more and the star zero or more So ‘[a-z]+[A-Z]*’ means one or more lowercase letters followed by zero or more uppercase letters (note that greediness applies) © Garth Gilmour 2008
  • 82. Specifying Points Within the Input Two characters signify the start and end points The ‘top hat’ specifies the start The dollar specifies the end These are very useful ‘^$’ matches blank lines ‘^[a-zA-Z]{10}’ matches the first 10 characters if they are all letters ‘[a-zA-Z]{10}$’ matches the last 10 characters if they are all letters ‘^[^0-9]{5}’ matches the first 5 characters if none of them are digits What you mean by start and end points can vary You can select whether they match the start and end of the entire string or each line embedded within it
  • 83. Using Submatches Within a Regex Matches can contain submatches By placing part of the regex in parentheses it can be accessed separately from the main match E.g. applying ‘([a-z]+)([A-Z]+)’ to ‘ABCdefGHIjkl’ matches ‘defGHI’ with submatches of ‘def’ and ‘GHI’ Parentheses are used for both grouping and submatches If you want to use parentheses for grouping only use ‘(?: … )’ Submatches can be very helpful E.g. you want to match email addresses but capture the name and domain suffix separately
  • 84.
    Search string: ABCdefGHIjklMNOpqrSTU
    [A-Z]{2}         Matches: AB GH MN ST
    [A-Z]{3}         Matches: ABC GHI MNO STU
    [A-Z]{3}[a-z]    Matches: ABCd GHIj MNOp
  • 85.
    Search string: ABCdefGHIjklMNOpqrSTU
    [A-Z]+[a-z]+         Matches: ABCdef GHIjkl MNOpqr
    ([A-Z]+)([a-z]+)     Matches: ABCdef (Group 1: ABC, Group 2: def)
                                  GHIjkl (Group 1: GHI, Group 2: jkl)
                                  MNOpqr (Group 1: MNO, Group 2: pqr)
    [A-Za-z]+            Matches: ABCdefGHIjklMNOpqrSTU
  • 86.
    Search string: ABCdefGHIjklMNOpqrSTU
    [A-Za-z]{5}      Matches: ABCde fGHIj klMNO pqrST
    ^[A-Za-z]{5}     Matches: ABCde
    [A-Za-z]{5}$     Matches: qrSTU
    [A-Za-z]{5,8}    Matches: ABCdefGH IjklMNOp qrSTU
  • 87. Other Meta-Characters The bar is used as a logical OR So ‘(com|ie|net)’ matches one of three suffixes Note that placing spaces around the bar changes the pattern The dot matches any character You can choose whether or not this includes newlines The backslash is used to escape meta-characters So ‘(.com|.ie|.net)’ matches any character plus the suffix whereas ‘(\.com|\.ie|\.net)’ matches the suffix with the dot In Sed ‘\<’ and ‘\>’ match the start and end of a word Perl does not support this and instead uses ‘\b’ for both
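The difference between an escaped and an unescaped dot, and the use of ‘\b’, can be sketched in Perl as below. The matching syntax (‘=~’ and ‘m//g’) is the standard Perl form covered in the next part of the course; the sample strings are invented for illustration.

```perl
use strict;
use warnings;

my $text = "visit example.com or examplexcom today";

# An unescaped dot matches any character, so both spellings are found
my @loose  = $text =~ m/example.com/g;

# An escaped dot matches only a literal dot
my @strict = $text =~ m/example\.com/g;

# \b marks a word boundary at either end of a word
my @words  = "cat concat cat." =~ m/\bcat\b/g;

print scalar(@loose), " loose, ", scalar(@strict), " strict, ",
      scalar(@words), " whole-word matches\n";
```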
  • 88. Regular Expressions Part 2: Perl Syntax © Garth Gilmour 2008
  • 89. Regular Expressions in Perl Perl 5 is the established standard for regexes The ‘=~’ operator is used to apply an expression E.g. ‘$match = $data =~ m/[A-Z]+/’ By default the operator returns a true/false value The ‘!~’ operator is the reverse of ‘=~’ If successful the matched text is stored in ‘$&’ ‘$`’ holds the text before the match ‘$'’ holds the text after the match In list context the ‘g’ modifier causes a list of all the matches to be returned E.g. ‘@results = $data =~ m/[A-Z]+/g’ A ‘foreach’ loop can be used to iterate over the results
  • 90.
    $data = "ABCdefGHIjklMNOpqrSTUvwxYZA";
    $m1 = $data =~ m/[A-Z]/;
    if($m1) { print "Match one is $&\n\n"; }
    $m2 = $data =~ m/[A-Z]{2}/;
    if($m2) { print "Match two is $&\n\n"; }
    $m3 = $data =~ m/[A-Z]+/;
    if($m3) { print "Match three is $&\n\n"; }
    $m4 = $data =~ m/[^e]+/;
    if($m4) { print "Match four is $&\n\n"; }
    $m5 = $data =~ m/[A-Z]{3}[a-z]{2}/;
    if($m5) { print "Match five is $&\n\n"; }

    Output:
    Match one is A
    Match two is AB
    Match three is ABC
    Match four is ABCd
    Match five is ABCde
  • 91.
    $m6 = $data =~ m/[A-Z]+[a-z]+/;
    if($m6) { print "Match six is $&\n\n"; }
    $m7 = $data =~ m/[A-Za-z]{9}/;
    if($m7) { print "Match seven is $&\n\n"; }
    $m8 = $data =~ m/.+/;
    if($m8) { print "Match eight is $&\n\n"; }
    $m9 = $data =~ m/^.{8}/;
    if($m9) { print "Match nine is $&\n\n"; }
    $m10 = $data =~ m/.{8}$/;
    if($m10) { print "Match ten is $&\n\n"; }

    Output:
    Match six is ABCdef
    Match seven is ABCdefGHI
    Match eight is ABCdefGHIjklMNOpqrSTUvwxYZA
    Match nine is ABCdefGH
    Match ten is TUvwxYZA
  • 92.
    @groupOne = $data =~ m/[A-Z]{3}/g;
    print "First group of matches are:\n";
    foreach(@groupOne) { print "\t$_\n"; }
    @groupTwo = $data =~ m/[a-z]{3}/g;
    print "Second group of matches are:\n";
    foreach(@groupTwo) { print "\t$_\n"; }
    @groupThree = $data =~ m/.{4}/g;
    print "Third group of matches are:\n";
    foreach(@groupThree) { print "\t$_\n"; }

    Output:
    First group of matches are:
        ABC
        GHI
        MNO
        STU
        YZA
    Second group of matches are:
        def
        jkl
        pqr
        vwx
    Third group of matches are:
        ABCd
        efGH
        Ijkl
        MNOp
        qrST
        Uvwx
  • 93. Regular Expressions in Perl Several other modifiers can be used with ‘m//’ E.g. by default ‘.’ does not match new-line characters, so ‘.+’ will not cross embedded new-lines unless you use ‘s’ Submatches are stored in numbered scalar variables So ‘$1’ holds the first submatch and so on
    Modifier   Description
    g          Finds all the matches in the string
    i          Makes the pattern case-insensitive
    s          Means ‘.’ matches new-line characters
    m          Means ‘^’ and ‘$’ match at the start and end of each line within the string
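A brief sketch of the numbered submatch variables and the ‘i’, ‘s’ and ‘m’ modifiers follows. The input strings and variable names are made up for illustration.

```perl
use strict;
use warnings;

# 'i' makes 'name' match 'Name'; $1 and $2 hold the two submatches
my $line = "Name: Alice Smith";
my ($first, $last);
if ($line =~ m/name:\s+(\w+)\s+(\w+)/i) {
    ($first, $last) = ($1, $2);
}

my $block = "one\ntwo\nthree";

# With 'm' the '^' anchor matches at the start of every embedded line
my @starts = $block =~ m/^(\w)/mg;

# Without 's' the dot stops at the first new-line; with 's' it crosses it
my ($withoutS) = $block =~ m/(o.+)/;
my ($withS)    = $block =~ m/(o.+)/s;

print "First name $first, last name $last\n";
print "Line starts: @starts\n";
```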
  • 94. Using Regular Expressions in Perl Best practice is to use extended expressions This is enabled via the ‘x’ modifier The ‘x’ modifier has two effects: Whitespace within the regex is disregarded The regex can be spread over multiple lines Standard Perl comments can be added These let you explain your intent Multi line regexes should be bracketed with ‘{ }’ The interpreter will accept almost any punctuation character as a delimiter If an opening bracket is used then the matching closing bracket is expected E.g. ‘m![A-Z]+!’, ‘m?[A-Z]+?’, or ‘m{[A-Z]+}’ (note that the ‘?’ delimiter also gives special match-once behaviour)
  • 95. Using Regular Expressions in Perl
    print "Enter the email address:\n";
    chomp($email = <STDIN>);
    #easier to read version of ([a-z]+(\.[a-z]+)?)\@megacorp(\.com|\.ie|\.co\.uk)
    $m = $email =~ m{^          #start of string
        (                       #start of inner match 1
        [a-z]+                  #one or more lowercase letters
        (\.[a-z]+)?             #optionally a dot and letters (inner match 2)
        )                       #end of inner match 1
        \@megacorp              #company name NB need to escape array
        (\.com|\.ie|\.co\.uk)   #possible domain names
        $                       #end of string
    }x;
    if($m) {
        print "Recognized email for $1 in domain $3\n";
    } else {
        print "Invalid address!\n";
    }
  • 96. Pattern Matching and Substitutions The ‘s///’ operator is used to make substitutions Its syntax is ‘s/ regex / replacement text / modifiers ’ The replacement text can contain ‘$1’, ‘$2’ etc… By default a substitution is only made for the first match By using the ‘g’ modifier all matches are replaced The return value is the number of substitutions made The ‘e’ modifier is especially useful The replacement text is executed as a Perl expression The standard use of this is to replace a match with the result of running one or more functions against the match
  • 97.
    $test = "aabbccddyzeeffgghhiijjkkllmmnnoo";
    # Replace all occurrences of ff with FF
    $test =~ s/ff/FF/g;
    print $test , "\n";
    # Replace any occurrence of four characters at the start of the string with XX
    $test =~ s/^.{4}/XX/g;
    print $test , "\n";
    # Replace any lower case characters with their upper case equivalents
    $test =~ s/([a-z])/uc($1)/eg;
    print $test , "\n";

    Output:
    aabbccddyzeeFFgghhiijjkkllmmnnoo
    XXccddyzeeFFgghhiijjkkllmmnnoo
    XXCCDDYZEEFFGGHHIIJJKKLLMMNNOO
  • 98. Regular Expressions Part 3: Advanced Concepts © Garth Gilmour 2008
  • 99. Regular Expressions - Advanced So far we have covered the 80% of regular expressions that developers use 99% of the time But there is still a lot of extra functionality available We mention some of the advanced features here You are encouraged to play with them after the course These features may be useful when you are: Trying to parse non-ASCII based text Trying to shorten an overly long expression Extracting information from poorly organised data Searching for the most efficient regex possible
  • 100. Non Greedy Matching We have seen that matching is greedy E.g. ‘A.+B’ grabs as many characters as possible between an ‘A’ and a ‘B’ So against ‘nnnAnnnBnnnBnnn’ the first match is ‘AnnnBnnnB’ Rather than ‘AnnnB’ which may well be what you wanted Greedy matching makes it hard to define ‘end tokens’ It is possible to have a non-greedy match ‘.+?’ still means grab one or more characters But it takes as few as possible to create a successful match ‘.*?’ does the same thing for zero or more characters ‘??’ is the non-greedy version of ‘?’ Given the choice between grabbing zero or one characters it prefers to grab zero, as long as the match will still succeed
  • 101.
    Search string: abcDEfghIJklm
    [a-zA-Z]+[A-Z]{2}     (greedy)       Matches: abcDEfghIJ
    [a-zA-Z]+?[A-Z]{2}    (non-greedy)   Matches: abcDE fghIJ
    [a-zA-Z]*[A-Z]{2}     (greedy)       Matches: abcDEfghIJ
    [a-zA-Z]*?[A-Z]{2}    (non-greedy)   Matches: abcDE fghIJ
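The greedy and non-greedy behaviour described above can be checked directly in Perl. This sketch uses the search strings from the two slides; the variable names are invented.

```perl
use strict;
use warnings;

my $data = "nnnAnnnBnnnBnnn";

my ($greedy) = $data =~ m/(A.+B)/;     # runs on to the last 'B'
my ($lazy)   = $data =~ m/(A.+?B)/;    # stops at the first 'B'

# Non-greedy matching finds two short matches instead of one long one
my @greedyAll = "abcDEfghIJklm" =~ m/[a-zA-Z]+[A-Z]{2}/g;
my @lazyAll   = "abcDEfghIJklm" =~ m/[a-zA-Z]+?[A-Z]{2}/g;

print "greedy: $greedy, non-greedy: $lazy\n";
print "greedy matches: @greedyAll, non-greedy matches: @lazyAll\n";
```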
  • 102. Parentheses Which Do Not Capture We have seen that parentheses serve two functions: As in coding they group separate constructs They are used to define matches-within-matches Sometimes you only want the first function You need to separate out part of the expression but don’t want to capture the result within a submatch E.g. ‘(\.com|\.ie)’ when matching against URLs The ‘(?: … )’ syntax provides non-capturing brackets E.g. ‘(?:\.com|\.ie)’ lets you specify alternatives without capturing
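As a rough illustration, the sketch below applies a capturing and a non-capturing version of the same pattern to a hypothetical host name and compares the submatches returned in list context.

```perl
use strict;
use warnings;

my $url = "www.megacorp.ie";

# Both sets of parentheses capture, so two submatches come back
my @capturing    = $url =~ m/(\w+)(\.com|\.ie)$/;

# The suffix group only groups the alternatives, so one submatch comes back
my @nonCapturing = $url =~ m/(\w+)(?:\.com|\.ie)$/;

print scalar(@capturing), " submatches versus ", scalar(@nonCapturing), "\n";
```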
  • 103. Modifying Just Part of the Pattern We know modifiers can be placed after ‘m//’ or ‘s///’ E.g. the ‘i’ modifier makes the match case insensitive But the modifier applies to the whole pattern What if only part of the regex should be case insensitive? What if ‘.’ should match newlines only in one place? The ‘(?imsx: … )’ syntax lets you do this E.g. if you say ‘[A-Z]{3}(?i:[A-Z]{2})[A-Z]{3}’ this matches ‘ABCdeFGH’ or ‘ABCDEFGH’ but not ‘abcdefgh’ You can use it to turn off modifiers as well E.g. ‘m/… (?-i: …) …/i’ makes part of the pattern case sensitive
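Both directions can be tried out with a small sketch. The first pattern is the one from the slide; the second pattern and all the test strings after the first three are made up for illustration.

```perl
use strict;
use warnings;

# Only the middle two letters are case insensitive
my $mixed = qr/[A-Z]{3}(?i:[A-Z]{2})[A-Z]{3}/;
my @mixedResults = map { /$mixed/ ? 1 : 0 } qw(ABCdeFGH ABCDEFGH abcdefgh);

# The whole pattern is case insensitive except for 'DEF'
my $partlyStrict = qr/abc(?-i:DEF)/i;
my @strictResults = map { /$partlyStrict/ ? 1 : 0 } qw(ABCDEF abcDEF abcdef);

print "mixed: @mixedResults, partly strict: @strictResults\n";
```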
  • 104. Matching Without Capturing It is possible to check what is ahead or behind you Without capturing these characters as part of the match The proper name for these is ‘lookaround’ assertions This is useful when what comes after you is both a condition of your match and part of the next match
    Assertion   Explanation
    (?= …)      Looks ahead to see if a pattern occurs without capturing
    (?! …)      Looks ahead to see if a pattern does not occur without capturing
    (?<= …)     Looks behind to see if a pattern occurs without capturing
    (?<! …)     Looks behind to see if a pattern does not occur without capturing
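One example of each of the four assertions is sketched below. The input strings are invented, and in each case the text tested by the assertion is not included in the match itself.

```perl
use strict;
use warnings;

# Lookahead: numbers that are followed by 'USD'
my @usd = "100USD 200EUR 300USD" =~ m/\d+(?=USD)/g;

# Negative lookahead: 'foo' that is not followed by 'bar'
my @notBar = "foobar foobaz" =~ m/foo(?!bar)/g;

# Lookbehind: numbers that are preceded by '#'
my @tagged = "#12 and 34 and #56" =~ m/(?<=#)\d+/g;

# Negative lookbehind: whole numbers that are not preceded by '#'
my @untagged = "#12 and 34 and #56" =~ m/(?<!#)\b\d+/g;

print "usd: @usd, tagged: @tagged, untagged: @untagged\n";
```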
  • 105. Building Regex Parts Dynamically Consider the following problem You need a regex to find company email addresses But the company name will be specified at runtime You could build the whole expression at runtime Through normal string concatenation But this code will be tedious and error prone Perl lets you embed dynamic content in expressions When you use ‘(??{ … })’ within your regex the code placed at ‘…’ is run by the interpreter and its result forms part of the pattern (the related ‘(?{ … })’ syntax runs code without adding to the pattern) Variables in the block are in scope until the current search for a match either succeeds or is rolled back
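For the company email problem a simpler technique than embedded code blocks is often enough: Perl interpolates variables into a pattern, and ‘\Q … \E’ escapes any meta-characters in the interpolated text. The sketch below uses that technique rather than the embedded code syntax; the subroutine name `company_email_regex` and the sample addresses are hypothetical.

```perl
use strict;
use warnings;

# Builds a compiled regex for the given company name.
# \Q...\E quotes the name so that characters such as '.' are taken literally.
sub company_email_regex {
    my ($company) = @_;
    return qr/^[a-z]+(?:\.[a-z]+)?\@\Q$company\E(?:\.com|\.ie|\.co\.uk)$/;
}

my $megacorp = company_email_regex("megacorp");
my $okOne    = 'dave.jones@megacorp.ie' =~ $megacorp ? 1 : 0;
my $okTwo    = 'dave@othercorp.com'     =~ $megacorp ? 1 : 0;

# The dot in the company name is not treated as 'any character'
my $dotted   = company_email_regex("mega.corp");
my $okThree  = 'jane@megaXcorp.com' =~ $dotted ? 1 : 0;

print "Results: $okOne $okTwo $okThree\n";
```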
  • 106. Working With Unicode So far in our discussion we have assumed ASCII E.g. ‘[A-Z]+’ matches one or more uppercase letters As long as your input text is written in ASCII in a culture that agrees with the common definition of English capital letters However this does not apply with internationalization Where we will be working with unfamiliar character sets and conventions (capitals, accents, reading direction etc…) Perl supports internationalization via Unicode The character set that aims to embrace and supplant the character sets already in use across the world Unicode itself is unavoidably complex and contradictory It aims to unify character sets with as little modification as possible
  • 107. Working With Unicode Unicode defines a character in terms of: A unique number and textual name A representative glyph (which does not preclude others) Annotations - which add extra information informally Properties - which formally group characters together based on shared criteria (e.g. mathematical symbols) Don’t confuse character sets with character encodings ‘UTF-8’ favours western characters but Unicode itself does not There are 88 properties a character may have As of Unicode version 4.1.0 These can be tested for in regexes
  • 108. Sample Unicode Properties
    Unicode Property   Description
    AHex               True for ASCII characters used in hexadecimal numbers
    Alpha              True if a character is alphabetic
    ea                 The East Asian width of a character (full, half or narrow)
    IDC                Indicates if a character can be used after the first character of an identifier
    Math               True if the character is used in describing mathematical expressions
    Lower              Indicates if a character is a lowercase letter
    STerm              Indicates if a character is used to terminate a sentence
    Term               Indicates if a character is punctuation that terminates a unit
    WSpace             Indicates if a character should be treated as whitespace during parsing
  • 109. Unicode Properties in Expressions Unicode properties can be used as character classes ‘\p{Math}’ matches math symbols and ‘\P{Math}’ is the reverse If the name is a single character then the braces can be omitted Shortcuts for character classes support Unicode So in modern interpreters the ‘\d’ shortcut is the same as ‘\p{IsDigit}’ and ‘\D’ is the same as ‘\P{IsDigit}’ Perl has aliases for properties and defines its own ‘\p{IsDigit}’ is the more verbose Perl terminology for ‘\p{Nd}’ ‘\p{IsXDigit}’ is a Perl property name equivalent to ‘[0-9a-fA-F]’
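A small sketch of properties used as character classes is given below. The sample text is made up and is written with ‘\x{…}’ escapes so that the example does not depend on the encoding of the source file. ‘\p{Lu}’ (uppercase letter) is a standard property that the slides do not mention.

```perl
use strict;
use warnings;

# U+00DC is a capital U with diaeresis; U+0391 to U+0393 are Greek capitals
my $text  = "\x{DC}ber Test \x{391}\x{392}\x{393}";
my @upper = $text =~ m/\p{Lu}/g;          # uppercase letters in any script

my @math    = "2 + 2 = 4" =~ m/\p{Math}/g;   # the '+' and '=' symbols
my @letters = "a1b2" =~ m/\pL/g;             # single character name, no braces

# U+0663 is ARABIC-INDIC DIGIT THREE, which '\d' and '\p{Nd}' both accept
my $digit    = "\x{663}";
my $shortcut = $digit =~ m/^\d$/      ? 1 : 0;
my $property = $digit =~ m/^\p{Nd}$/  ? 1 : 0;

print scalar(@upper), " uppercase letters, ", scalar(@math), " math symbols\n";
```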
  • 110. Perl Subroutines Creating and Calling Functions © Garth Gilmour 2008
  • 111. Introducing Subroutines Subroutines do not have a C/C++ style declaration When you create a subroutine you do not declare its return type or the parameters it will take Instead the syntax for a subroutine is ‘sub NAME { … }’ Any subroutine can take any number of parameters These are automatically placed in the special array ‘@_’ Not to be confused with ‘$_’ which holds the current item in a loop The first parameter is ‘$_[0]’ and the last is ‘$_[$#_]’ The return value can be specified in two ways By default it is the result of the last expression evaluated You can explicitly provide a value via the ‘return’ keyword © Garth Gilmour 2008
  • 112.
    sub func {
        my($param1, $param2, $param3, $param4, $param5) = @_;
        print "func called with:\n";
        print "\t$param1\n";
        print "\t$param2\n";
        print "\t$param3\n";
        print "\t$param4\n";
        print "\t$param5\n";
    }
    func("abc",123,"def",456,"ghi");

    Output:
    func called with:
        abc
        123
        def
        456
        ghi
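The example above shows parameter passing but not return values. The two ways of returning a value, and the use of ‘$_[0]’ and ‘$_[$#_]’, might be sketched as follows; the subroutine names are invented.

```perl
use strict;
use warnings;

# Explicit return value via the 'return' keyword
sub total {
    my $sum = 0;
    $sum += $_ foreach @_;
    return $sum;
}

# Implicit return value: the result of the last expression evaluated
sub largest {
    my $max = shift;
    foreach my $n (@_) {
        $max = $n if $n > $max;
    }
    $max;
}

# The first and last parameters, taken directly from @_
sub ends {
    return ($_[0], $_[$#_]);
}

my $sum  = total(1, 2, 3);
my $max  = largest(4, 9, 2);
my @ends = ends(qw(abc def ghi));

print "sum=$sum max=$max ends=@ends\n";
```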
  • 113. Variable Declarations and Scoping As you write functions you will make a startling discovery Variables created inside subroutines remain alive and usable for the rest of the life of the script (unlike in C/C++/Java etc…) This is because all variables are global by default They are represented by an entry in the symbol table Perl lets you control the scope and lifetime of variables Via the ‘my’ , ‘our’ and ‘local’ functions Each of these has a subtly different effect Use these in all but the shortest scripts… © Garth Gilmour 2008
  • 114. Variable Declarations and Scoping The ‘my’ function creates a private variable Its scope and lifetime are the block it is declared in The ‘local’ function is unusual It defines a new value for a variable, which is held for the duration of the current scope Once control leaves the current block the value is reset Any sub-methods which get called see the new value The ‘our’ function simply (re)declares a global variable It is the safe way of referencing a global variable within a subroutine or declaring a global variable under ‘use strict’ © Garth Gilmour 2008
  • 115.
    $var1 = "ABC";
    $var2 = "DEF";
    $var3 = "GHI";
    print "At start $var1, $var2 and $var3\n\n";
    func1();
    print "At end $var1, $var2 and $var3\n\n";
    sub func1 {
        my $var1 = "JKL";
        local $var2 = "MNO";
        our $var3;
        print "In func1 $var1, $var2 and $var3\n\n";
        func2();
    }
    sub func2 {
        print "In func2 $var1, $var2 and $var3\n\n";
    }

    Output:
    At start ABC, DEF and GHI
    In func1 JKL, MNO and GHI
    In func2 ABC, MNO and GHI
    At end ABC, DEF and GHI
  • 116. Using Named Parameters Parameters in Perl are loosely organised In large applications you want to be more specific about what value goes with what parameter There is a simple idiom for naming parameters: Pass parameters into the subroutine in pairs E.g. ‘connect(“ip” => “12.24.5.6”, “port” => 80, “timeout” => 30)’ Inside the subroutine load the parameters into a hash The first parameter becomes the key for the second and so on Use parameters by taking values from the hash So rather than saying ‘$_[1]’ we use ‘$params{“ip”}’ This allows parameters to be passed in any order As long as the name/value convention is preserved © Garth Gilmour 2008
  • 117.
    sub func {
        my %params = @_;
        print "\tParameter fred has value $params{'fred'}\n";
        print "\tParameter wilma has value $params{'wilma'}\n";
        print "\tParameter barney has value $params{'barney'}\n";
    }
    print "---------- First Call ----------\n";
    @args1 = ("fred",20,"wilma",30,"barney",40);
    func(@args1);
    print "---------- Second Call ----------\n";
    @args2 = (fred => 50, wilma => 60, barney => 70);
    func(@args2);
    print "---------- Third Call ----------\n";
    func(fred => 80, wilma => 90, barney => 100);

    Output:
    ---------- First Call ----------
        Parameter fred has value 20
        Parameter wilma has value 30
        Parameter barney has value 40
    ---------- Second Call ----------
        Parameter fred has value 50
        Parameter wilma has value 60
        Parameter barney has value 70
    ---------- Third Call ----------
        Parameter fred has value 80
        Parameter wilma has value 90
        Parameter barney has value 100
  • 118. Subroutines and Recursion The ‘&’ sigil is normally left out when calling subroutines Without parentheses the symbol is initially treated as a bareword Parentheses around the parameters are also optional if the interpreter has processed the declaration Whether or not you use them is a matter of style If you use the sigil and omit the parameter list then the parameter array of the current function is passed This creates a compact syntax for recursive functions
  • 119. Subroutines and Recursion
    sub recursion1 {
        until($_[0] == 0) {
            print "$_[0] ";
            $_[0]--;
            &recursion1;
        }
    }
    sub recursion2 {
        if(@_) {
            print(shift, " ");
            &recursion2;
        }
    }
    $val1 = 10;
    recursion1($val1);
    print "\n";
    @val2 = qw(abc def ghi jkl);
    recursion2(@val2);

    Output:
    10 9 8 7 6 5 4 3 2 1
    abc def ghi jkl
  • 120. Anonymous Subroutines Subroutines can be declared without names Using the syntax ‘sub { … }’ What is returned is the address of the subroutine This can be captured in a reference (see later) This enables some ‘meta-programming’ techniques You can write functions which build and return functions You can write functions which take blocks of code as parameters These techniques have their own terminology A block of code that captures variables from its surrounding scope, and is often passed as a parameter, is a closure One function building a simplified version of another is currying
  • 121.
        sub doTimes {
            for(1..$_[0]) {
                &{$_[1]}();
            }
        }
        sub withMatches {
            for($_[0] =~ m/$_[1]/g) {
                &{$_[2]}($_);
            }
        }
        $ref = sub { print "Closures are cool!\n"; };
        doTimes(5, $ref);
        doTimes(3, sub { print "Closures are very cool!\n"; });
        withMatches("ab cd ef gh", "[a-z]{2}", sub { print "$_[0]\n"; });
      Output:
        Closures are cool!
        Closures are cool!
        Closures are cool!
        Closures are cool!
        Closures are cool!
        Closures are very cool!
        Closures are very cool!
        Closures are very cool!
        ab
        cd
        ef
        gh
  • 122. Error Handling: Managing Error Conditions
  • 123. Error Handling in Perl: Traditional functions report errors via a return code, whereas modern functions raise exceptions. There are several functions for raising exceptions. The ‘die’ function causes a value to be thrown as an exception; the value is typically an error message string, but could be a reference to anything you want. More versatile functions are provided by the ‘Carp’ module, which report errors from the caller's perspective. To test for exceptions use ‘eval’ blocks: errors generated from code inside ‘eval { … }’ are trapped, and you can test for and obtain them via the ‘$@’ variable.
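The ‘Carp’ functions mentioned on this slide can be sketched as follows. The package `MyMath` and its `safe_divide` subroutine are hypothetical; only `croak` and `eval` are standard Perl.

```perl
use strict;
use warnings;

package MyMath;
use Carp qw(croak);

# Raise an exception for a zero divisor. Unlike die, croak reports
# the file and line of the calling code, not of this subroutine.
sub safe_divide {
    my ($num, $den) = @_;
    croak "Division by zero" if $den == 0;
    return $num / $den;
}

package main;

# The eval block traps the exception and leaves the message in $@.
my $result = eval { MyMath::safe_divide(10, 0) };
my $error  = $@;
print "Caught: $error" if $error;

# A successful call returns its value through eval as normal.
my $ok = eval { MyMath::safe_divide(10, 4) };
print "Result: $ok\n";
```

Copying ‘$@’ into a variable straight after the ‘eval’ is a common precaution, because later code may reset it.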
  • 124.
        # First script: throwing a string
        eval { op1(); };
        if($@) {
            print "Code threw error: ", $@;
        }
        sub op1 { op2(); }
        sub op2 { op3(); }
        sub op3 { die "BOOM!"; }

        # Second script: throwing a reference to a hash
        eval { op1(); };
        if($@) {
            print "Error of severity ", $@->{'severity'};
            print " thrown with message ", $@->{'msg'};
        }
        sub op1 { op2(); }
        sub op2 { op3(); }
        sub op3 {
            my %error = (msg => 'BOOM!', severity => 'fatal');
            die \%error;
        }
  • 125. References in Perl: Using Memory Addresses
  • 126. References in Perl: References are an advanced feature of Perl, similar to pointers in ‘C’ and ‘C++’. A scalar can hold the address of another variable, which could be another scalar, an array, a hash or a function. The address of a variable is taken via the ‘\’ operator. Sigils are combined when working with references, so if ‘$ref’ is a reference to an array then the first element could be accessed with ‘$$ref[0]’, ‘${$ref}[0]’ or ‘$ref->[0]’. Each syntax is appropriate in different circumstances.
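The slide lists functions among the things a reference can point to. A minimal sketch of that case, using a hypothetical subroutine `square` and the same three call styles as for arrays:

```perl
use strict;
use warnings;

# Hypothetical subroutine used only for illustration.
sub square { return $_[0] * $_[0]; }

# Take a reference to a named subroutine with the '\' operator.
my $ref4 = \&square;

my $r1 = &$ref4(4);      # sigil form
my $r2 = &{$ref4}(5);    # sigil form with braces
my $r3 = $ref4->(6);     # arrow form

print ref($ref4), "\n";  # prints CODE
print "$r1 $r2 $r3\n";   # prints 16 25 36
```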
  • 127.
        $var1 = 124;
        $ref1 = \$var1;
        print $ref1, "\n";
        print "Reference ref1 refers to value $$ref1\n";
        print "Reference ref1 refers to value ${$ref1}\n";
      Output:
        SCALAR(0x226d98)
        Reference ref1 refers to value 124
        Reference ref1 refers to value 124
  • 128.
        @var2 = ("abc","def","ghi","jkl");
        $ref2 = \@var2;
        print $ref2, "\n";
        print "Reference ref2 refers to array with contents: ";
        foreach $val (@$ref2) { print "$val "; }
        print "\nReference ref2 refers to array with contents: ";
        foreach $val (@{$ref2}) { print "$val "; }
        print "\nFirst three elements in array pointed to by ref2 are: ";
        @slice = @{$ref2}[0..2];
        foreach $val (@slice) { print "$val "; }
        print "\nFirst item is $$ref2[0]";
        print "\nFirst item is ${$ref2}[0]";
        print "\nFirst item is $ref2->[0]";
  • 129. Output:
        ARRAY(0x226da4)
        Reference ref2 refers to array with contents: abc def ghi jkl
        Reference ref2 refers to array with contents: abc def ghi jkl
        First three elements in array pointed to by ref2 are: abc def ghi
        First item is abc
        First item is abc
        First item is abc
  • 130.
        %var3 = ("k1","xxx","k2","yyy","k3","zzz");
        $ref3 = \%var3;
        print "Reference ref3 refers to hash with contents:\n";
        foreach $key (keys %$ref3) {
            print "\t\t$key indexes $$ref3{$key}\n";
        }
        print "Reference ref3 refers to hash with contents:\n";
        foreach $key (keys %{$ref3}) {
            print "\t\t$key indexes $ref3->{$key}\n";
        }
        print "Key k1 indexes $$ref3{'k1'} \n";
        print "Key k1 indexes ${$ref3}{'k1'} \n";
        print "Key k1 indexes $ref3->{'k1'} \n";
  • 131. Output:
        Reference ref3 refers to hash with contents:
                k2 indexes yyy
                k1 indexes xxx
                k3 indexes zzz
        Reference ref3 refers to hash with contents:
                k2 indexes yyy
                k1 indexes xxx
                k3 indexes zzz
        Key k1 indexes xxx
        Key k1 indexes xxx
        Key k1 indexes xxx
  • 132. References and Anonymous Data: All complete programming languages need the ability to allocate memory on demand, as opposed to pre-declaring it through standard variables. Consider processing records from a file: you need to allocate memory on the fly for each record you find. Memory is allocated via anonymous data structures. There is no special keyword but rather a different syntax for creating anonymous arrays, hashes and subroutines. It isn't (directly) possible to create anonymous scalars.
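The record-processing case described on this slide might look like the sketch below. The "name,age" record format and the sample data are invented for illustration, and an in-memory string stands in for a real file.

```perl
use strict;
use warnings;

# Stand-in for a file: hypothetical "name,age" records, one per line.
my $text = "fred,34\nwilma,31\nbarney,36\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";

my @records;
while (my $line = <$fh>) {
    chomp $line;
    my ($name, $age) = split /,/, $line;
    # { ... } allocates a fresh anonymous hash for every record found
    push @records, { name => $name, age => $age };
}
close $fh;

foreach my $record (@records) {
    print "$record->{name} is $record->{age}\n";
}
```

Nothing about the number of records is declared in advance; the array simply grows by one hash reference per line read.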
  • 133.
        $ref1 = ["ab","cd","ef","gh"];
        $ref2 = { k1 => 123, k2 => 456, k3 => 789 };
        $ref3 = sub { return $_[0] + $_[1]; };
        print $ref1, "\n";
        print $ref2, "\n";
        print $ref3, "\n";
        print $ref1->[0], "\n";
        print $ref2->{'k1'}, "\n";
        print $ref3->(12,5), "\n";
      Output:
        ARRAY(0x225f88)
        HASH(0x226d80)
        CODE(0x18303f0)
        ab
        123
        17
  • 134. References and Data Structures: Complex data structures are built using arrays and hashes in combination. The syntax ‘$ref = ["abc", "def", "ghi"]’ creates an anonymous array and stores its address in ‘$ref’. The syntax ‘$ref = { "k1" => "v1", "k2" => "v2" }’ creates an anonymous hash and stores its address in ‘$ref’. So an exam marking script could be made up of an array of references to anonymous hashes, where each hash holds a candidate's details, including a reference to an anonymous array of answers.
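The exam marking structure outlined on this slide can be sketched as an array of hash references, each holding an array reference of answers. All candidate names, ids, answers and the answer key below are invented for illustration.

```perl
use strict;
use warnings;

# An array of anonymous hashes; each hash holds one candidate's details,
# including a reference to an anonymous array of answers.
my @candidates = (
    { name => "dave", id => 101, answers => ["a", "c", "b", "d"] },
    { name => "jane", id => 102, answers => ["a", "b", "b", "d"] },
);

# Hypothetical answer key.
my @correct = ("a", "b", "b", "d");

foreach my $candidate (@candidates) {
    my $answers = $candidate->{answers};
    my $score   = 0;
    for my $i (0 .. $#correct) {
        $score++ if $answers->[$i] eq $correct[$i];
    }
    # Store the result back in the candidate's own hash.
    $candidate->{score} = $score;
    print "$candidate->{name} scored $score out of ", scalar(@correct), "\n";
}
```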
  • 135.
        # Table of [value, numeral] pairs, largest first.
        # It stops at C, so only numbers from 1 to 399 convert correctly.
        @numerals = (
            [100,"C"], [90,"XC"], [50,"L"], [40,"XL"], [10,"X"],
            [9,"IX"], [5,"V"], [4,"IV"], [1,"I"]
        );
        print "Enter the number to convert to a roman numeral...\n";
        $number = <STDIN>;
        chomp($number);
        foreach (@numerals) {
            my $decimal = $_->[0];
            my $string = $_->[1];
            my $times = int($number / $decimal);
            if($times > 0) {
                for(1..$times) { print $string; }
                $number = $number % $decimal;
            }
        }
  • 136.
        $ref1 = [
            ["abc","def","ghi"],
            ["jkl","mno","pqr"],
            ["stu","vwx","yza"]
        ];
        print "Contents of 2d array are:\n";
        foreach(@{$ref1}) {
            print "\t";
            foreach(@{$_}) { print "$_ "; }
            print "\n";
        }
      Output:
        Contents of 2d array are:
            abc def ghi
            jkl mno pqr
            stu vwx yza
  • 137.
        $ref = {
            k1 => { k4 => "ab", k5 => "cd" },
            k2 => { k6 => "ef", k7 => "gh" },
            k3 => { k8 => "ij", k9 => "kl" }
        };
        print $ref->{'k1'}->{'k4'}, "\n";
        print $ref->{'k1'}->{'k5'}, "\n";
        print $ref->{'k2'}->{'k6'}, "\n";
        print $ref->{'k2'}->{'k7'}, "\n";
        print $ref->{'k3'}->{'k8'}, "\n";
        print $ref->{'k3'}->{'k9'}, "\n";
      Output:
        ab
        cd
        ef
        gh
        ij
        kl
  • 138. Modules and Packages: Creating Reusable Code
  • 139. Modules and Code Reuse in Perl: Modules are Perl libraries. You can create your own or download them from CPAN. They are normally found in the ‘lib’ folder of your distribution. There are several types of module. Traditional and object-oriented modules are for code reuse; they let you avoid re-inventing the wheel in and across projects. Pragmatic modules extend the language; when loaded they alter symbol tables and interact with the interpreter, thereby adding to Perl’s functionality.
  • 140. Creating Perl Modules: Modules are placed in a separate file. By convention this is given a ‘.pm’ extension. Pragmatic modules have lowercase names; other module names should contain capitals. The module begins with a package declaration. This creates a new namespace for symbols, which within the interpreter is represented by a new symbol table. There are two ways of loading a module: the ‘use’ declaration loads a module at compile time, and the ‘require’ declaration loads a module at runtime.
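The difference between the two loading mechanisms can be sketched with two modules from the standard distribution, `List::Util` and `POSIX`. The choice of modules and the variable names are illustrative assumptions.

```perl
use strict;
use warnings;

# 'use' is processed at compile time, so 'sum' is already imported
# before any statement in this file runs.
use List::Util qw(sum);

# 'require' is an ordinary statement executed at run time, so it can be
# wrapped in eval to make a module optional.
my $loaded = eval { require POSIX; 1 };

my $total   = sum(1, 2, 3, 4);
# Nothing was imported by 'require', so the name is fully qualified.
my $rounded = $loaded ? POSIX::floor(7.8) : int(7.8);

print "Total: $total\n";
print "Rounded: $rounded\n";
```

Because ‘require’ does not call the module's import routine, symbols loaded this way are reached through the package name.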
  • 141. Creating Perl Modules: Perl does not strictly enforce barriers between modules. Symbols from a module can always be used by prefixing the symbol name with the package name, but you should only use the symbols a module wants you to see. There is a standard mechanism for exporting symbols. The module needs to require the ‘Exporter’ module and place the name ‘Exporter’ into an array called ‘@ISA’. This allows the module to place entries in other symbol tables. Symbols placed in an array called ‘@EXPORT’ are automatically added to the table of the script with the ‘use’ declaration. Symbols placed in an array called ‘@EXPORT_OK’ will be added to the importing symbol table if they are listed after ‘use’.
  • 142.
        # Maths.pm
        package Maths;
        require Exporter;
        our @ISA = ("Exporter");
        our @EXPORT = qw(add multiply subtract);
        sub add { return $_[0] + $_[1]; }
        sub divide { return $_[0] / $_[1]; }
        sub multiply { return $_[0] * $_[1]; }
        sub subtract { return $_[0] - $_[1]; }
        1;

        # Script using the module
        use Maths;
        print "Calculations using our maths module:";
        print "\n\t40 + 30 is: ", add(40,30);
        print "\n\t60 - 20 is: ", subtract(60,20);
        print "\n\t50 * 10 is: ", multiply(50,10);
        # divide is not exported so it only works
        # if we qualify the namespace
        print "\n\t40 / 10 is: ", Maths::divide(40,10);
  • 143.
        # Maths.pm
        package Maths;
        require Exporter;
        our @ISA = ("Exporter");
        our @EXPORT_OK = qw(add multiply subtract);
        sub add { return $_[0] + $_[1]; }
        sub divide { return $_[0] / $_[1]; }
        sub multiply { return $_[0] * $_[1]; }
        sub subtract { return $_[0] - $_[1]; }
        1;

        # Script using the module: the symbols wanted are listed after 'use'
        use Maths qw(add multiply subtract);
        print "Calculations using our maths module:";
        print "\n\t40 + 30 is: ", add(40,30);
        print "\n\t60 - 20 is: ", subtract(60,20);
        print "\n\t50 * 10 is: ", multiply(50,10);
        # divide is not exported so it only works
        # if we qualify the namespace
        print "\n\t40 / 10 is: ", Maths::divide(40,10);
  • 144. Extra Syntax for Modules: Modules can prevent symbols from being exported. If you place a symbol name in ‘@EXPORT_FAIL’ then Perl will call an ‘export_fail’ subroutine, which can throw an error. Simple versioning is supported: a module can declare a ‘$VERSION’ variable, which can then be mentioned in the ‘use’ declaration. E.g. ‘use Fred 2.7;’ means version 2.7 or later is required. Code placed within the module will be executed. For the module to load, the last statement run must return a true value; most modules end with ‘1;’ to ensure this is the case. A ‘BEGIN { … }’ block is run as soon as it has been compiled, while the module is being loaded.
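The version check can be sketched in a single file by calling the built-in ‘VERSION’ method, which is the check that ‘use Fred 2.7;’ performs after loading the module. The package `Fred` here is declared inline purely for illustration; normally it would live in ‘Fred.pm’.

```perl
use strict;
use warnings;

# Hypothetical module declared inline for the sake of the example.
package Fred;
our $VERSION = '2.7';
sub hello { return "Hello from Fred $VERSION"; }

package main;

# VERSION dies if the module's $VERSION is lower than the one requested.
my $ok_old = eval { Fred->VERSION(2.5); 1 };   # 2.7 >= 2.5, passes
my $ok_new = eval { Fred->VERSION(3.0); 1 };   # 2.7 <  3.0, dies
my $error  = $@;

print "Requesting 2.5: ", ($ok_old ? "accepted" : "rejected"), "\n";
print "Requesting 3.0: ", ($ok_new ? "accepted" : "rejected"), "\n";
print "Error was: $error";
```

A request for an older version than the one installed is accepted, which shows that the number in the ‘use’ declaration is a minimum rather than an exact match.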
  • 145. Objects in Perl: Support for Object Oriented Programming Concepts
  • 146. Object Oriented Perl: OO support in Perl is minimal at best. It is not a good language for learning Object Oriented coding. Perl provides only rudimentary support for classes and objects, leaving you to do most of the hard work yourself. The ‘bless’ keyword is the key to OO in Perl. Once you properly understand what it does the rest of OO in Perl becomes relatively straightforward. It helps to approach Perl OO indirectly. We will consider class declarations in Python and Ruby first; this is a good warm-up for understanding the Perl syntax.
  • 147. Core Principles of OO Languages: All popular OO languages use the same concepts. Class declarations are the templates for objects. A class declaration is made up of members: members holding data are known as fields and members holding code are known as methods. Special methods are provided with their own syntax. The most important of these is the constructor, a method called automatically when an object is created. Every object has a built-in reference to itself, similar to the ‘127.0.0.1’ IP address in networking. It helps to think of members as slots on the object. Only slots on the outside can be used by clients.
  • 148. Diagram: client code calling the ‘withdraw’ and ‘display’ methods of two Account objects, ‘account1’ and ‘account2’.
  • 149.
        # Class declaration
        class Account:
            # Constructor
            def __init__(self, id, balance):
                self.id = id
                self.balance = balance
            def withdraw(self, amount):
                self.balance -= amount
            def display(self):
                print("Account with id", self.id, "and balance", self.balance)

        # Creating objects
        account1 = Account("AB12", 30000)
        account2 = Account("CD34", 45000)
        print("----- Before Withdrawal -----")
        account1.display()
        account2.display()
        account1.withdraw(250)
        account2.withdraw(500)
        print("----- After Withdrawal -----")
        account1.display()
        account2.display()
      Output:
        ----- Before Withdrawal -----
        Account with id AB12 and balance 30000
        Account with id CD34 and balance 45000
        ----- After Withdrawal -----
        Account with id AB12 and balance 29750
        Account with id CD34 and balance 44500
  • 150.
        # Class declaration
        class Account
          # Constructor
          def initialize(id, balance)
            @id = id
            @balance = balance
          end
          def withdraw(amount)
            @balance -= amount
          end
          def display()
            puts "Account with id #{@id} and balance #{@balance}"
          end
        end

        # Creating objects
        account1 = Account.new("AB12", 30000)
        account2 = Account.new("CD34", 45000)
        puts "----- Before Withdrawal -----"
        account1.display()
        account2.display()
        account1.withdraw(250)
        account2.withdraw(500)
        puts "----- After Withdrawal -----"
        account1.display()
        account2.display()
      Output:
        ----- Before Withdrawal -----
        Account with id AB12 and balance 30000
        Account with id CD34 and balance 45000
        ----- After Withdrawal -----
        Account with id AB12 and balance 29750
        Account with id CD34 and balance 44500
  • 151. Perl Syntax for Object Orientation: Perl does not have separate class declarations. Instead a class is just a special type of module. Fields are saved in anonymous hashes, or in whatever anonymous data structure you want. An arbitrary method is used as a constructor. It creates the hash and blesses it into the package. Blessing associates the hash with the package, so methods of the package can be called via the hash reference. In the method declaration the hash is the first parameter.
  • 152.
        # Packages double as classes
        package Account;

        # Any method can be a constructor
        sub new {
            my $packageName = shift;
            # Anonymous hashes hold fields
            my $data = {};
            $data->{'id'} = shift;
            $data->{'balance'} = shift;
            bless $data, $packageName;
        }
        sub withdraw {
            my $data = shift;
            $data->{'balance'} -= $_[0];
        }
        sub display {
            my $data = shift;
            print "Account with id $data->{'id'} and balance $data->{'balance'}\n";
        }
  • 153.
        $account1 = Account->new("AB12",30000);
        $account2 = Account->new("CD34",45000);
        print "----- Before Withdrawal -----\n";
        $account1->display();
        $account2->display();
        $account1->withdraw(250);
        $account2->withdraw(500);
        print "----- After Withdrawal -----\n";
        $account1->display();
        $account2->display();
      Output:
        ----- Before Withdrawal -----
        Account with id AB12 and balance 30000
        Account with id CD34 and balance 45000
        ----- After Withdrawal -----
        Account with id AB12 and balance 29750
        Account with id CD34 and balance 44500
      Creating objects: Account->new("AB",1000) is equivalent to Account::new('Account', "AB", 1000).
      Because of blessing: $account->withdraw(250) is equivalent to Account::withdraw($account, 250).
  • 154. Inheritance and Overriding in Perl: These are the other two key principles of OO. Inheritance enables one class to be built on top of another. Overriding enables derived class methods to replace those inherited from the base class. It helps to visualize objects as layered: for each class in the hierarchy there is a layer in the object. When a method is overridden the slot in a base layer is rewired into an implementation in the derived layer. Again it helps to review the syntax of other languages. Once again the Perl syntax is minimal.
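  • 155. Diagram: client code using ‘account1’, an Account object with ‘withdraw’ and ‘display’ slots, and ‘account2’, a SavingsAccount object made of a SavingsAccount layer on top of an Account layer, with its own ‘withdraw’ alongside the Account ‘withdraw’ and ‘display’.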
  • 156.
        class Account:
            def __init__(self, id, balance):
                self.id = id
                self.balance = balance
            def withdraw(self, amount):
                self.balance -= amount
            def display(self):
                print("Account with id", self.id, "and balance", self.balance)

        class SavingsAccount(Account):
            def __init__(self, id, balance, fee):
                Account.__init__(self, id, balance)
                self.fee = fee
            def withdraw(self, amount):
                self.balance -= (amount + self.fee)

        account1 = Account("AB12", 30000)
        account2 = SavingsAccount("CD34", 45000, 15)
        print("----- Before Withdrawal -----")
        account1.display()
        account2.display()
        account1.withdraw(250)
        account2.withdraw(500)
        print("----- After Withdrawal -----")
        account1.display()
        account2.display()
      Output:
        ----- Before Withdrawal -----
        Account with id AB12 and balance 30000
        Account with id CD34 and balance 45000
        ----- After Withdrawal -----
        Account with id AB12 and balance 29750
        Account with id CD34 and balance 44485
  • 157.
        class Account
          def initialize(id, balance)
            @id = id
            @balance = balance
          end
          def withdraw(amount)
            @balance -= amount
          end
          def display()
            puts "Account with id #{@id} and balance #{@balance}"
          end
        end

        class SavingsAccount < Account
          def initialize(id, balance, fee)
            super(id, balance)
            @fee = fee
          end
          def withdraw(amount)
            @balance -= (amount + @fee)
          end
        end

        account1 = Account.new("AB12", 30000)
        account2 = SavingsAccount.new("CD34", 45000, 15)
        puts "----- Before Withdrawal -----"
        account1.display()
        account2.display()
        account1.withdraw(250)
        account2.withdraw(500)
        puts "----- After Withdrawal -----"
        account1.display()
        account2.display()
      Output:
        ----- Before Withdrawal -----
        Account with id AB12 and balance 30000
        Account with id CD34 and balance 45000
        ----- After Withdrawal -----
        Account with id AB12 and balance 29750
        Account with id CD34 and balance 44485
  • 158. Inheritance and Overriding in Perl: Inheritance is supported via ‘@ISA’. One module requires another and places its name in ‘@ISA’. This means that if a method is not found in the derived module the interpreter will search the base one, thereby creating the layered effect associated with inheritance. The pragmatic module ‘base’ simplifies this, e.g. ‘use base "Employee";’ in ‘Manager.pm’. Overriding works in the same way: the search order means the derived version is found first. The ‘SUPER’ symbol lets you access the base version. This is particularly useful when creating derived constructors.
  • 159.
        use Account;
        use SavingsAccount;
        $account1 = Account->new("AB12",30000);
        $account2 = SavingsAccount->new("CD34",45000,15);
        print "----- Before Withdrawal -----\n";
        $account1->display();
        $account2->display();
        $account1->withdraw(250);
        $account2->withdraw(500);
        print "----- After Withdrawal -----\n";
        $account1->display();
        $account2->display();
      Output:
        ----- Before Withdrawal -----
        Account with id AB12 and balance 30000
        Account with id CD34 and balance 45000
        ----- After Withdrawal -----
        Account with id AB12 and balance 29750
        Account with id CD34 and balance 44485
  • 160.
        # Account.pm
        # Methods are called through the class or object,
        # so nothing needs to be exported
        package Account;
        sub new {
            my $packageName = shift;
            my $data = {};
            $data->{'id'} = shift;
            $data->{'balance'} = shift;
            bless $data, $packageName;
        }
        sub withdraw {
            my $data = shift;
            $data->{'balance'} -= $_[0];
        }
        sub display {
            my $data = shift;
            print "Account with id $data->{'id'} and balance $data->{'balance'}\n";
        }
        1;

        # SavingsAccount.pm
        package SavingsAccount;
        use base 'Account';
        sub new {
            # After the shift, @_ holds (id, balance, fee)
            my $data = shift->SUPER::new(@_);
            $data->{'fee'} = $_[2];
            return $data;
        }
        sub withdraw {
            my $data = shift;
            $data->{'balance'} -= ($_[0] + $data->{'fee'});
        }
        1;
  • 161. Parsing XML: An Example of an OO Module
  • 162. An Example of Parsing XML Files: Text files are increasingly formatted as XML. This adds an extra layer of structure that makes it easier to preserve the semantics of the data. There are many APIs for parsing and creating XML, and Perl has modules that support all of these standards. Note that XPath is the XML version of regular expressions; in XPath V2 regular expressions can be used within an XPath. The most basic is the SAX standard, a low-level event-driven API. The ‘XML::Parser’ module provides this event-driven style of parsing.
  • 163. An Example of Parsing XML Files: To parse XML create an instance of ‘XML::Parser’. As normal the constructor method is called ‘new’. The constructor method takes named parameters. The ‘Handlers’ parameter should be an anonymous hash that defines callback methods. E.g. the ‘Start’ and ‘End’ keys index methods to be called whenever the parser encounters opening or closing tags. Parsing is triggered via a call to ‘parse’ or ‘parsefile’, which read the XML from a string or a file respectively. As different parts of the document are met callbacks are triggered. All your implementation is placed in the callback methods.
  • 164.
        <?xml version="1.0" encoding="UTF-8"?>
        <myapp>
            <resources>
                <threads>7</threads>
                <server-ip>120.153.72.208</server-ip>
                <cache size="10"/>
            </resources>
            <accounts>
                <user role="administrator">
                    <id>dave</id>
                    <password>abab12</password>
                </user>
                <user role="power-user">
                    <id>jane</id>
                    <password>cdcd34</password>
                </user>
                <user role="end-user">
                    <id>fred</id>
                    <password>efef56</password>
                </user>
                <user role="end-user">
                    <id>sharon</id>
                    <password>ghghij</password>
                </user>
            </accounts>
        </myapp>
      Output:
        We found the following users
            administrator with ID dave and password abab12
            power-user with ID jane and password cdcd34
            end-user with ID fred and password efef56
            end-user with ID sharon and password ghghij
  • 165.
        use XML::Parser;

        # Variables to temporarily store user details
        our $userID;
        our $userRole;
        our $userPassword;
        # Users found so far, each stored as [id, role, password]
        our @users;
        # Flags to let us know when we are in a particular element
        our $isInID = 0;
        our $isInPassword = 0;

        our $parser = XML::Parser->new(Handlers => {
            Start => \&startElement,
            End   => \&endElement,
            Char  => \&characters
        });
        $parser->parsefile("config.xml");

        print "We found the following users\n";
        foreach(@users) {
            my $user = $_;
            print "\t$$user[1] with ID $$user[0] and password $$user[2]\n";
        }
  • 166.
        # Handlers receive the parser as $_[0] and the element name as $_[1]
        sub startElement {
            if($_[1] eq "user") {
                # $_[2] is the attribute name ('role') and $_[3] its value
                $userRole = $_[3];
            } elsif($_[1] eq "id") {
                $isInID = 1;
            } elsif($_[1] eq "password") {
                $isInPassword = 1;
            }
        }
        sub endElement {
            if($_[1] eq "user") {
                my $newUser = [$userID, $userRole, $userPassword];
                push(@users, $newUser);
            } elsif($_[1] eq "id") {
                $isInID = 0;
            } elsif($_[1] eq "password") {
                $isInPassword = 0;
            }
        }
        # For character data $_[1] is the text found
        sub characters {
            if($isInID) { $userID = $_[1]; }
            if($isInPassword) { $userPassword = $_[1]; }
        }
  • 167. Database Access: Another Example of OO Perl
  • 168. Database Access in Perl: ‘DBI’ is the standard module for database access. It enables you to access any relational database for which a driver is available. Like most modern APIs the ‘DBI’ module is a shell. As with JDBC and ADO.NET the purpose of the ‘DBI’ module is to expose a common interface that conceals the type of the DB. The actual functionality is provided by driver modules, which implement the standard functionality in a vendor-specific way. The ODBC driver lets you talk to any database with an ODBC data source, but is probably not as efficient as a native driver.
  • 169. The Architecture of the DBI Library. Diagram: the DBI API sits on top of the driver modules DBD-mysql, DBD-Oracle, DBD-Sybase and DBD-ODBC.
  • 170. Using the DBI Module: A ‘database handle’ is a link to the underlying driver. It is a reference to an object that represents a connection. A database handle is created by a call to ‘connect’. The first parameter is a string that identifies the database, written as ‘dbi:DRIVER_NAME:DRIVER_INFO’. Forming the right connection string is half the battle, so make sure you are using the right documentation for the driver. The ‘disconnect’ method terminates the connection. It is good practice to do this explicitly even in short scripts.
  • 171. Using the DBI Module: Database handles are factories for statement handles. These are obtained by calling the ‘prepare’ method, with the SQL string specified as the parameter. The statement is triggered via the ‘execute’ method. If the query is a SELECT then a result set is obtained. The results are stored inside the statement, which is done in a vendor-specific way. The results can be iterated one row at a time. There are a variety of methods for retrieving the values in a row. The ‘dump_results’ method is a quick way of printing the data.
  • 172.
        use DBI;

        # $insertStatement, $deleteStatement, $selectStatement and
        # $val1 to $val4 are assumed to be defined elsewhere in the script
        my $connectionString = "dbi:ODBC:SomeDB";
        my $dbh = DBI->connect($connectionString) or die "can't connect to DB!";

        my $statement = $dbh->prepare($insertStatement);
        $statement->execute($val1, $val2, $val3, $val4);
        listAllRows($dbh);

        $statement = $dbh->prepare($deleteStatement);
        $statement->execute("100");
        listAllRows($dbh);

        $dbh->disconnect();

        sub listAllRows {
            my($dbh) = $_[0];
            my $statement = $dbh->prepare($selectStatement);
            $statement->execute;
            print "Table contents are\n";
            # fetchrow_array returns the next row as a list, or an empty list when done
            while(my($column1,$column2,$column3) = $statement->fetchrow_array()) {
                print "$column1\t$column2\t$column3\n";
            }
        }
  • 173. Course Project: An Exam Marking System
  • 174. A Course Project - Marking Exams: We will be writing a script to process exam results. Stage 1: a hash of arrays (marks per candidate). Stage 2: a hash of hashes (marks per candidate per subject). Stage 3: a hash of hashes of arrays, holding individual marks per subject per candidate. Diagram: a Name/Mark table with the names Dave, Jane and Fred and the marks 80 70 64 81 56 90 93 76 87 64 59 55 62 68 68 71 55 79.
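Stage 1 of the project might start from a sketch like the one below. It assumes that the eighteen marks in the diagram are six marks for each of Dave, Jane and Fred in the order shown; that grouping is a guess, and the averaging step is only one illustrative use of the structure.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Stage 1: a hash of arrays. The grouping of the marks into rows
# of six per candidate is an assumption about the diagram.
my %marks = (
    Dave => [80, 70, 64, 81, 56, 90],
    Jane => [93, 76, 87, 64, 59, 55],
    Fred => [62, 68, 68, 71, 55, 79],
);

my %average;
foreach my $name (sort keys %marks) {
    my @list = @{ $marks{$name} };
    # An array in numeric context gives its element count.
    $average{$name} = sum(@list) / @list;
    printf "%-5s %d marks, average %.1f\n", $name, scalar(@list), $average{$name};
}
```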
  • 175. Diagram: a Name/Link table with the entries dave, jane and fred, each linking to its own Subject/Mark table:
        History 80, Maths 70, French 65
        English 66, Maths 70, Politics 82
        Physics 61, History 73, Spanish 52
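Stage 2 could be sketched as a hash of hashes built from the figures in this diagram. The pairing of dave, jane and fred with the three subject tables in the order shown is an assumption, and finding each candidate's best subject is just one illustrative way of walking the structure.

```perl
use strict;
use warnings;

# Stage 2: a hash of hashes (marks per candidate per subject).
# Which table belongs to which name is assumed from their order.
my %results = (
    dave => { History => 80, Maths   => 70, French   => 65 },
    jane => { English => 66, Maths   => 70, Politics => 82 },
    fred => { Physics => 61, History => 73, Spanish  => 52 },
);

my %best;
foreach my $name (sort keys %results) {
    my $subjects = $results{$name};
    # Sort the subject names by descending mark and keep the first.
    my ($top) = sort { $subjects->{$b} <=> $subjects->{$a} } keys %$subjects;
    $best{$name} = $top;
    print "$name: best subject $top ($subjects->{$top})\n";
}
```

Moving to stage 3 would mean replacing each mark with a reference to an anonymous array of individual marks.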
  • 176. Diagram: a Name/Link table with the entries dave, jane and fred, each linking to a Subject/Mark table (History, Maths, French; English, Maths, Politics; Physics, History, Spanish), where each subject in turn links to a list of six individual marks. The marks shown are: 10 9 10 7 5 6 9 9 8 7 6 7 4 5 8 6 0 9 5 8 7 7 6 8 7 8 4 6 5 9 6 6 7 9 8 8 10 10 8 7 6 8 9 6 8 5 0 9 10 8 7 4 8 9.