Linked Lists In Perl
What?
Why bother?
Steven Lembark  
Workhorse Computing
lembark@wrkhors.com
©2009,2013 Steven Lembark
What About Arrays?
● Perl arrays are fast and flexible.
● Autovivification makes them generally easy to use.
● Built-in Perl functions for managing them.
● But there are a few limitations:
● Memory manglement.
● Iterating multiple lists.
● Maintaining state within the lists.
● Difficult to manage links within the list.
Memory Issues
● Pushing a single value onto an array can
double its size.
● Copying array contents is particularly expensive
for long-lived processes which have problems
with heap fragmentation.
● Heavily forked processes also run into problems
with copy-on-write due to whole-array copies.
● There is no way to reduce the array structure
size once it grows -- also painful for long-lived
processes.
Comparing Multiple Lists
● for( @array ) will iterate one list at a time.
● Indexes work but have their own problems:
● Single index requires identical offsets in all of the
arrays.
● Varying offsets require indexes for each array,
with separate bookkeeping for each.
● Passing all of the arrays and objects becomes a lot
of work, even if they are wrapped in objects –
which have their own performance overhead.
● Indexes used for external references have to be
updated with every shift, push, pop...
Linked Lists: The Other Guys
● Perly code for linked lists is simple, and it
simplifies memory management and list iteration.
● Trade indexed access for skip-chains and
external references.
● List operators like push, pop, and splice are
simple.
● You also get more granular locking in threaded
applications.
Examples
● Insertion sorts are stable, but splicing an item
into the middle of an array is expensive;
adding a new node to a linked list is cheap.
● Threading can require locking an entire array
to update it; linked lists can safely be locked
by node and frequently don't even require
locking.
● Comparing lists of nodes can be simpler than
dealing with multiple arrays – especially if the
offsets change.
Implementing Perly Linked Lists
● Welcome back to arrayrefs.
● Arrays can hold any type of data.
● Use a “next” ref and arbitrary data.
● The list itself requires a static “head” and a
node variable to walk down the list.
● Doubly-linked lists are also manageable with
the addition of weak links.
● For the sake of time I'll only discuss singly-
linked lists here.
List Structure
● Singly-linked lists are usually drawn as some data
followed by a pointer to the next link.
● In Perl it helps to draw the pointer first, because
that is where it is stored.
Adding a Node
● New nodes do not need to be contiguous in memory.
● Also doesn't require locking the entire list.
Dropping A Node
● Dropping a node releases its memory – at least back
to Perl.
● Only real effect is on the prior node's next ref.
● This is the only piece that needs to be locked for
threading.
Perl Code
● A link followed by data looks like:
$node = [ $next, @data ]
● Walking the list copies the node:
( $node, my @data ) = @$node;
● Adding a new node recycles the “next” ref:
$node->[0] = [ $node->[0], @data ]
● Removing recycles the next's next:
($node->[0], my @data) = @{$node->[0]};
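Taken together, the four idioms above can be exercised in a short sketch. The helper name list_values and the sample data are assumptions for illustration only; the list uses a head node and an empty tail node, as on the slides that follow.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the first data value of each node,
# starting after $head and stopping at the first empty node.
sub list_values
{
    my $node = $_[0]->[0];
    my @found;

    while( $node && @$node )
    {
        ( $node, my @data ) = @$node;   # advance and extract in one step
        push @found, $data[0];
    }

    @found
}

# head node -> 'a' -> 'c' -> empty tail
my $head = [ [ [ [], 'c' ], 'a' ] ];

# add 'b' after the first data node by recycling its "next" ref
my $first = $head->[0];
$first->[0] = [ $first->[0], 'b' ];

print join( ' ', list_values( $head ) ), "\n";

# remove the node after $first; its data comes back to the caller
( $first->[0], my @dropped ) = @{ $first->[0] };

print "dropped: @dropped\n";
print join( ' ', list_values( $head ) ), "\n";
```

Nothing here needs an index: the node variable is the only state the walk carries.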
A Reverse-Order List
● Just update the head node's next reference.
● Fast because it moves the minimum of data.
my $list = [ [] ];
my $node = $list; # the head node: new nodes go after it
for my $val ( @_ )
{
$node->[0] = [ $node->[0], $val ]
}
# list is populated w/ empty tail.
In-order List
● Just move the node, looks like a push.
● Could be a one-liner, I've shown it here as two
operations.
my $list = [ [] ];
my $node = $list->[0];
for my $val ( @_ )
{
@$node = ( [], $val );
$node = $node->[0];
}
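A minimal sketch of both build orders side by side, assuming reverse-order nodes are inserted directly after the head node. The sub names reverse_list, inorder_list, and values_of are hypothetical wrappers, not part of the talk.

```perl
use strict;
use warnings;

sub reverse_list
{
    my $list = [ [] ];                        # head node, empty tail
    $list->[0] = [ $list->[0], $_ ] for @_;   # insert after the head
    $list
}

sub inorder_list
{
    my $list = [ [] ];
    my $node = $list->[0];          # start at the empty tail

    for my $val ( @_ )
    {
        @$node = ( [], $val );      # populate the tail, add a new one
        $node  = $node->[0];
    }

    $list
}

# Flatten for display; stops at the empty tail node.
sub values_of
{
    my $node = $_[0]->[0];
    my @found;

    while( @$node )
    {
        push @found, $node->[1];
        $node = $node->[0];
    }

    @found
}

print join( ' ', values_of( inorder_list( 1 .. 4 ) ) ), "\n";
print join( ' ', values_of( reverse_list( 1 .. 4 ) ) ), "\n";
```

Both builders leave the same shape behind (head, data nodes, empty tail), so one walker serves either.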
Viewing The List
● Structure is recursive from Perl's point of view.
● Uses the one-line version (golf anyone)?
DB<1> $list = [ [], 'head node' ];
DB<2> $node = $list->[0];
DB<3> for( 1 .. 5 ) { ($node) = @$node = ( [], "node-$_" ) }
DB<14> x $list
0  ARRAY(0x8390608)                          $list
   0  ARRAY(0x83ee698)                       $list->[0]
      0  ARRAY(0x8411f88)                    $list->[0][0]
         0  ARRAY(0x83907c8)                 $list->[0][0][0]
            0  ARRAY(0x83f9a10)              $list->[0][0][0][0]
               0  ARRAY(0x83f9a20)           $list->[0][0][0][0][0]
                  0  ARRAY(0x83f9a50)        $list->[0][0][0][0][0][0]
                       empty array           empty tail node
                  1  'node-5'                $list->[0][0][0][0][0][1]
               1  'node-4'                   $list->[0][0][0][0][1]
            1  'node-3'                      $list->[0][0][0][1]
         1  'node-2'                         $list->[0][0][1]
      1  'node-1'                            $list->[0][1]
   1  'head node'                            $list->[1]
Destroying A Linked List
● Prior to 5.12, Perl's memory de-allocator was
recursive.
● Without a DESTROY the lists blew up after 100
nodes when perl blew its stack.
● The fix was an iterative destructor, which is no
longer required:
DESTROY
{
my $list = shift;
$list = $list->[0] while $list;
}
Simple Linked List Class
● Bless an arrayref with the head (placeholder)
node and any data for tracking the list.
sub new
{
my $proto = shift;
bless [ [], @_ ], blessed $proto || $proto
}
# iterative < 5.12, else no-op.
DESTROY {}
Building the list: unshift
● One reason for the head node: it provides a
place to insert the data nodes after.
● The new first node has the old first node's
“next” ref and the new data.
sub unshift
{
my $list = shift;
$list->[0] = [ $list->[0], @_ ];
$list
}
Taking one off: shift
● This starts directly from the head node also:
just replace the head node's next with the first
node's next.
sub shift
{
my $list = shift;
( $list->[0], my @data )
= @{ $list->[0] };
wantarray ? @data : \@data
}
Push Is A Little Harder
● One approach is an unshift before the tail.
● Another is populating the tail node:
sub push
{
my $list = shift;
my $node = $list->[0];
$node = $node->[0] while $node->[0];
# populate the empty tail node
@{ $node } = ( [], @_ );
$list
}
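The three methods so far fit into one small class. This is a sketch under assumptions: the package name My::List is hypothetical, arguments are unpacked from @_ rather than with the builtin shift to avoid ambiguity with the method of the same name, and shift() returns nothing once only the empty tail is left.

```perl
use strict;
use warnings;

package My::List;   # hypothetical name, not a CPAN module

use Scalar::Util qw( blessed );

sub new
{
    my ( $proto, @data ) = @_;
    bless [ [], @data ], blessed( $proto ) || $proto
}

sub unshift
{
    my ( $list, @data ) = @_;
    $list->[0] = [ $list->[0], @data ];
    $list
}

sub shift
{
    my ( $list ) = @_;
    my $first = $list->[0];
    @$first or return;                  # only the empty tail is left
    ( $list->[0], my @data ) = @$first;
    wantarray ? @data : \@data
}

sub push
{
    my ( $list, @data ) = @_;
    my $node = $list->[0];
    $node = $node->[0] while $node->[0];    # walk to the empty tail
    @$node = ( [], @data );                 # populate it, add a new tail
    $list
}

package main;

my $list = My::List->new;
$list->push( 'b' )->push( 'c' )->unshift( 'a' );

my @order;
while( my ( $value ) = $list->shift )
{
    push @order, $value;
}
print "@order\n";
```

Returning $list from the mutators is what allows the chained calls.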
The Bane Of Single Links: pop
● You need the node-before-the-tail to pop the
tail.
● By the time you've found the tail it is too late
to pop it off.
● Storing the node-before-tail takes extra
bookkeeping.
● The trick is to use two nodes, one trailing the
other: when the small roast is burned the big
one is just right.
sub node_pop
{
my $list = shift;
my $prior = $list->head;
my $node = $prior->[0];
while( $node->[0] )
{
$prior = $node;
$node = $node->[0];
}
( $prior->[0], my @data ) = @$node;
wantarray ? @data : \@data
}
● Lexical $prior is more efficient than examining
$node->[0][0] at multiple points in the loop.
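The trailing-node trick can be tried stand-alone. This sketch is a variant, not the code on the slide: it assumes the list ends in an empty tail node, so the loop tests whether the next node is empty rather than whether the next ref is set. The name pop_last is hypothetical.

```perl
use strict;
use warnings;

# $head is [ $first_node, ... ]; the list ends with an empty tail.
sub pop_last
{
    my $prior = $_[0];              # trails one node behind $node
    my $node  = $prior->[0];

    @$node or return;               # nothing but the empty tail

    while( @{ $node->[0] } )        # stop on the last node with data
    {
        $prior = $node;
        $node  = $node->[0];
    }

    # $node's next is the empty tail: hand it to the prior node.
    ( $prior->[0], my @data ) = @$node;

    wantarray ? @data : \@data
}

my $head = [ [ [ [ [], 'c' ], 'b' ], 'a' ] ];   # a -> b -> c -> tail

print join( ' ', pop_last( $head ) ), "\n";
print join( ' ', pop_last( $head ) ), "\n";
```

Each call still walks the whole list, which is why keeping a tail ref in the head node can pay off for pop-heavy code.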
Mixing OO & Procedural Code
● Most of what follows could be done entirely
with method calls.
● Catch: They are slower than directly accessing
the nodes.
● One advantage to singly-linked lists is that the
structure is simple.
● Mixing procedural and OO code without getting
tangled is easy enough.
● This is why I use it for genetics code: the code is
fast and simple.
Walking A List
● By itself the node has enough state to track
location – no separate index required.
● Putting the link first allows advancing and
extraction of data in one access of the node:
my $node = $list->[0];
while( $node )
{
( $node, my %info ) = @$node;
# process %info...
}
Comparing Multiple Lists
● Same code, just more assignments:
my $n0 = $list0->[0];
my $n1 = $list1->[0];
while( $n0 && $n1 )
{
( $n0, my @data0 ) = @$n0;
( $n1, my @data1 ) = @$n1;
# deal with @data0, @data1
...
}
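As a concrete case of the lock-step walk, this sketch counts the positions at which two lists hold the same value. The helpers make_list and count_matches are hypothetical names, and each node is assumed to carry a single value.

```perl
use strict;
use warnings;

# Build head -> values... -> empty tail.
sub make_list
{
    my $head = [ [] ];
    my $node = $head->[0];

    for my $val ( @_ )
    {
        @$node = ( [], $val );
        $node  = $node->[0];
    }

    $head
}

# Count the positions where both lists hold the same value.
sub count_matches
{
    my ( $list0, $list1 ) = @_;
    my $n0 = $list0->[0];
    my $n1 = $list1->[0];
    my $matches = 0;

    while( @$n0 && @$n1 )           # stop at the first empty tail
    {
        ( $n0, my $v0 ) = @$n0;
        ( $n1, my $v1 ) = @$n1;
        ++$matches if $v0 eq $v1;
    }

    $matches
}

my $dna0 = make_list( qw( a c g t a ) );
my $dna1 = make_list( qw( a c c t ) );

print count_matches( $dna0, $dna1 ), "\n";
```

The lists may differ in length; the walk simply ends when either one reaches its tail.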
Syncopated Lists
● Adjusting the offsets requires minimum
bookkeeping, doesn't affect the parent list.
while( @$n0 && @$n1 )
{
$_ = $_->[0] for $n0, $n1;
aligned $n0, $n1
or ( $n0, $n1 ) = realign $n0, $n1
or last;
$score += compare $n0, $n1;
}
Using The Head Node
● $head->[0] is the first node, there are a few
useful things to add into @head[1...].
● Tracking the length or keeping a ref to the tail
simplifies push, pop; requires extra bookkeeping.
● The head node can also store user-supplied
data describing the list.
● I use this for tracking length and species names in
results of DNA sequences.
Close, but no cigar...
● The class shown only works at the head.
● Be nice to insert things in the middle without
resorting to $node variables.
● Or call methods on the internal nodes.
● A really useful class would use inside-out data
to track the head, for example.
● Can't assign $list = $list->[0], however.
● Loses the inside-out data.
● We need a structure that walks the list without
modifying its own refaddr.
The fix: ref-to-arrayref
● Scalar refs are the ultimate container struct.
● They can reference anything, in this case an array.
● $list stays in one place, $$list walks up the list.
● “head” or “next” modify $$list to reposition
the location.
● Saves blessing every node on the list.
● Simplifies having a separate class for nodes.
● Also helps when resorting to procedural code for
speed.
Basics Don't Change Much
sub new
{
my $proto = shift;
my $head = [ [], @_ ];
my $list = \$head;
$headz{ refaddr $list } = $head;
bless $list, blessed $proto || $proto
}
DESTROY # add iteration for < v5.12.
{
my $list = shift;
delete $headz{ refaddr $list };
}
● List updates assign to $$list.
● DESTROY cleans up inside-out data.
Walking The List
sub head
{
my $list = shift;
$$list = $headz{ refaddr $list };
$list
}
sub next
{
my $list = shift;
( $$list, my @data ) = @$$list
or return;
wantarray ? @data : \@data
}
$list->head;
while( my @data = $list->next ) { ... }
Reverse-order revisited:
● Unshift isn't much different.
● Note that $list is not updated.
sub unshift
{
my $list = shift;
my $head = $list->head;
$head->[0] = [ $head->[0], @_ ];
$list
}
my $list = List::Class->new( ... );
$list->unshift( $_ ) for @data;
Useful shortcut: 'add_after'
● Unlike arrays, adding into the middle of a list is
efficient and common.
● “push” adds to the tail, need something else.
● add_after() puts a node after the current one.
● unshift() is really “$list->head->add_after”.
● Use with “next_node” that ignores the data.
● In-order addition (head, middle, or end):
$list->add_after( $_ )->next for @data;
● Helps if next() avoids walking off the list.
sub next
{
my $list = shift;
my $next = $$list->[0] or return;
@$next or return;
$$list = $next;
$list
}
sub add_after
{
my $list = shift;
my $node = $$list;
$node->[0] = [ $node->[0], @_ ];
$list
}
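The head, next, and add_after methods can be combined into a runnable sketch of the ref-to-arrayref design. Assumptions: the package name My::Linked is hypothetical, %headz is a file-scoped lexical, and the data() accessor is added here only so the example can show something.

```perl
use strict;
use warnings;

package My::Linked;     # hypothetical name for illustration

use Scalar::Util qw( blessed refaddr );

my %headz;              # inside-out storage: refaddr => head node

sub new
{
    my ( $proto, @data ) = @_;
    my $head = [ [], @data ];
    my $list = \$head;              # scalar ref: $$list is the position
    $headz{ refaddr( $list ) } = $head;
    bless $list, blessed( $proto ) || $proto
}

sub DESTROY { delete $headz{ refaddr( $_[0] ) } }

sub head
{
    my ( $list ) = @_;
    $$list = $headz{ refaddr( $list ) };
    $list
}

sub next
{
    my ( $list ) = @_;
    my $next = $$list->[0] or return;
    @$next or return;               # do not walk onto the empty tail
    $$list = $next;
    $list
}

sub add_after
{
    my ( $list, @data ) = @_;
    my $node = $$list;
    $node->[0] = [ $node->[0], @data ];
    $list
}

sub data                            # assumed accessor for the demo
{
    my ( $list ) = @_;
    my ( undef, @data ) = @$$list;
    wantarray ? @data : \@data
}

package main;

my $list = My::Linked->new;
$list->add_after( $_ )->next for qw( x y z );   # in-order addition

my @seen;
$list->head;
push @seen, $list->data while $list->next;
print "@seen\n";
```

The object's refaddr never changes while $$list moves, so the inside-out entry stays valid for the life of the list.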
Off-by-one gotchas
● The head node does not have any user data.
● Common mistake: $list->head->data.
● This gets you the list's data, not users'.
● Fix is to pre-increment the list:
$list->head->next or last;
while( @data = $list->next ) {...}
Overloading a list: bool, offset.
● while( $list ), if( $list ) would be nice.
● Basically, this leaves the list “true” if it has
data; false if it is at the tail node.
use overload
q{bool} =>
sub
{
my $list = shift;
$$list
},
Offsets would be nice also
q{++} => sub
{
my $list = shift;
my $node = $$list;
@$node and $$list= $node->[0];
$list
},
q{+} => sub
{
my ( $list, $offset ) = $_[2] ? ...
my $node = $$list;
for ( 1 .. $offset )
{
@$node and $node = $node->[0]
or last;
}
$node
},
Updating the list becomes trivial
● An offset from the list is a node.
● That leaves += simply assigning $list + $off.
q{+=} =>
sub
{
my ( $list, $offset ) = …
$$list = $list + $offset;
$list
};
Backdoor: node operations
● Be nice to extract a node without having to
creep around inside of the object.
● Handing back the node ref saves derived
classes from having to unwrap the object.
● Also save having to update the list object's
location to peek into the next or head node.
sub curr_node { ${ $_[0] } }
sub next_node { ${ $_[0] }->[0] }
sub root_node { $headz{ refaddr $_[0] } }
sub head_node { $headz{ refaddr $_[0] }->[0] }
Skip chains: bookmarks for lists
● Separate list of 'interesting' nodes.
● Alphabetical sort would have skip-chain of first
letters or prefixes.
● In-list might have ref to next “interesting” node.
● Placeholders simplify bookkeeping.
● For alphabetic, pre-load 'A' .. 'Z' into the list.
● Saves updating the skip chain for inserts prior to
the currently referenced node.
Applying Perly Linked Lists
● I use them in the W-curve code for comparing
DNA sequences.
● The comparison has to deal with different sizes,
local gaps between the curves.
● Comparison requires fast lookahead for the next
'interesting' node on the list.
● Nodes and skip chains do the job nicely.
● List structure allows efficient node updates
without disturbing the object.
W-curve is derived from LL::S
● Nodes have three spatial values and a skip-chain
initialized after the list is initialized.
sub initialize
{
my ( $wc, $dna ) = @_;
my $pt = [ 0, 0, 0 ];
$wc->head->truncate;
while( my $a = substr $dna, 0, 1, '' )
{
$pt = $wc->next_point( $a, $pt );
$wc->add_after( @$pt, '' )->next;
}
$wc
}
Skip-chain looks for “peaks”
● The alignment algorithm looks for nearby
points ignoring their Z value.
● Comparing the sparse list of radii > 0.50 speeds
up the alignment.
● Skip-chains for each node point to the next node
with a large-enough radius.
● Building the skip chain uses an “inch worm”.
● The head walks up to the next useful node.
● The tail fills the ones in between with a node reference.
Skip chain: “interesting” nodes.
sub add_skip
{
my $list = shift;
my $node = $list->head_node;
my $skip = $node->[0];
for( 1 .. $list->size )
{
$skip->[1] > $cutoff or next;
while( $node != $skip )
{
$node->[4] = $skip; # replace ''
$node = $node->[0]; # next node
}
}
continue
{
$skip = $skip->[0];
}
}
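The inch worm can be tried on plain nodes without the W-curve class. This sketch assumes a simplified node layout of [ $next, $value, $skip ], a cutoff passed as an argument, and an empty tail node; the names add_skip (in this stand-alone form) and skip_values are illustrative only.

```perl
use strict;
use warnings;

# Point each node's skip slot at the next later node whose value
# exceeds $cutoff. $node is the worm's tail, $skip its head.
sub add_skip
{
    my ( $head, $cutoff ) = @_;
    my $node = $head->[0];

    for( my $skip = $node ; @$skip ; $skip = $skip->[0] )
    {
        $skip->[1] > $cutoff or next;

        while( $node != $skip )     # fill in everything behind $skip
        {
            $node->[2] = $skip;
            $node      = $node->[0];
        }
    }

    $head
}

# Values reachable from the first node by following skip refs only.
sub skip_values
{
    my @found;

    for( my $node = $_[0]->[0]->[2] ; ref $node ; $node = $node->[2] )
    {
        push @found, $node->[1];
    }

    @found
}

# 0.1 -> 0.7 -> 0.2 -> 0.3 -> 0.9 -> empty tail, skip slots start as ''
my $head = [ [] ];
$head->[0] = [ $head->[0], $_, '' ] for reverse 0.1, 0.7, 0.2, 0.3, 0.9;

add_skip( $head, 0.5 );

print join( ' ', skip_values( $head ) ), "\n";
```

Nodes after the last peak keep the '' placeholder, which is what ends the walk along the chain.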
Use nodes to compare lists.
● DNA sequences can become offset due to gaps
on either sequence.
● This prevents using a single index to compare
lists stored as arrays.
● A linked list can be re-aligned passing only the
nodes.
● Comparison can be re-started with only the nodes.
● Makes for a useful mix of OO and procedural
code.
● Re-aligning the lists simply requires assigning the
local values.
● Updating the node vars does not affect the list
locations if compare fails.
sub compare_lists
{
...
while( @$node0 && @$node1 )
{
( $node0, $node1 )
= realign_nodes $node0, $node1
or last;
( $dist, $node0, $node1 )
= compare_aligned $node0, $node1;
$score += $dist;
}
if( defined $score )
{
$list0->node( $node0 );
$list1->node( $node1 );
}
# caller gets back unused portion.
( $score, $list0, $list1 )
}
sub compare_aligned
{
my ( $node0, $node1 ) = @_;
my $sum = 0;
my $dist = 0;
while( @$node0 && @$node1 )
{
$dist = distance( $node0, $node1 )
// last;
$sum += $dist;
$_ = $_->[0] for $node0, $node1;
}
( $sum, $node0, $node1 )
}
● Compare aligned hands back the unused portion.
● Caller gets back the nodes to re-align if there is
a gap.
Other Uses for Linked Lists
● Convenient trees.
● Each level of tree is a list.
● The data is a list of children.
● Balancing trees only updates a couple of next refs.
● Arrays work but get expensive.
● Need to copy entire sub-lists for modification.
Two-dimensional lists
● If the data at each node is a list you get two-
dimensional lists.
● Four-way linked lists are the guts of
spreadsheets.
● Inserting a column or row does not require re-
allocating the existing list.
● Deleting a row or column returns data to the heap.
● A multiply-linked array allows immediate jumps
to neighboring cells.
● Three-dimensional lists add “sheets”.
Work Queues
● Pushing pairs of nodes onto an array makes for
simple queued analysis of the lists.
● Adding to the list doesn't invalidate the queue.
● Circular lists' last node points back to the head.
● Used for queues where one thread inserts new
links, others remove them for processing.
● Minimal locking reduces overhead.
● New tasks get inserted after the last one.
● Worker tasks just keep walking the list from
where they last slept.
Construct a circular linked list:
DB<1> $list = [];
DB<2> @$list = ( $list, 'head node' );
DB<3> $node = $list;
DB<4> for( 1 .. 5 ) { $node->[0] = [ $node->[0], "node $_" ] }
DB<5> x $list
0  ARRAY(0xe87d00)
   0  ARRAY(0xe8a3a8)
      0  ARRAY(0xc79758)
         0  ARRAY(0xe888e0)
            0  ARRAY(0xea31b0)
               0  ARRAY(0xea31c8)
                  0  ARRAY(0xe87d00)
                     -> REUSED_ADDRESS
                  1  'node 1'
               1  'node 2'
            1  'node 3'
         1  'node 4'
      1  'node 5'
   1  'head node'
● No end, use $node != $list as sentinel value.
● weaken( $list->[0] ) if list goes out of scope.
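A short sketch of walking the circular list once around, using the head node as the sentinel. The sub name circle_values is hypothetical; weaken comes from Scalar::Util.

```perl
use strict;
use warnings;
use Scalar::Util qw( weaken );

# Collect each node's first data value, stopping back at the head.
sub circle_values
{
    my ( $list ) = @_;
    my @found;

    for( my $node = $list->[0] ; $node != $list ; $node = $node->[0] )
    {
        push @found, $node->[1];
    }

    @found
}

my $list = [];
@$list = ( $list, 'head node' );            # head points at itself

# insert after the head, as in the debugger session
$list->[0] = [ $list->[0], "node $_" ] for 1 .. 5;

print join( ', ', circle_values( $list ) ), "\n";

# Break the cycle before $list goes out of scope so that
# reference counting can reclaim the nodes.
weaken( $list->[0] );
```

A worker that sleeps mid-list can keep its own $node and resume from there; only the comparison against the head is needed to detect a full lap.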
Summary
● Linked lists can be quite lazy for a variety of
uses in Perl.
● Singly-linked lists are simple to implement,
efficient for “walking” the list.
● Tradeoff for random access:
● Memory allocation for large, varying lists.
● Simpler comparison of multiple lists.
● Skip-chains.
● Reduced locking in threads.

More Related Content

PDF
Problem Characteristics in Artificial Intelligence
PPT
Adaptive Resonance Theory
PPTX
Heuristic or informed search
PDF
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
PPTX
constraint satisfaction problems.pptx
PDF
PPTX
Structure of agents
PDF
Chapter 7: Matrix Multiplication
Problem Characteristics in Artificial Intelligence
Adaptive Resonance Theory
Heuristic or informed search
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
constraint satisfaction problems.pptx
Structure of agents
Chapter 7: Matrix Multiplication

What's hot (20)

PPTX
Shift reduce parser
PPTX
Dynamic Path Planning
PDF
"The Blockchain Effect on the Future of Game Design" by Sherry Jones (July 27...
PDF
Approximation Algorithms
PPTX
N-queens.pptx
PPT
01 knapsack using backtracking
PPTX
Java Introduction
PPTX
Particle Swarm Optimization
PPTX
KMP String Matching Algorithm
PDF
Elementary Parallel Algorithms
PPTX
Rod Cutting Problem
PPT
Single source stortest path bellman ford and dijkstra
PPTX
Lecture 11 diagonalization & complex eigenvalues - 5-3 & 5-5
PPT
Gravitational search algorithm in optimization techniques
PPTX
Demonstrate interpolation search
PPTX
8 queens problem.pptx
PPTX
Adversarial search
PPT
AI Lecture 5 (game playing)
PPTX
Lecture 14 Heuristic Search-A star algorithm
PPTX
Lecture 19 sma star algorithm
Shift reduce parser
Dynamic Path Planning
"The Blockchain Effect on the Future of Game Design" by Sherry Jones (July 27...
Approximation Algorithms
N-queens.pptx
01 knapsack using backtracking
Java Introduction
Particle Swarm Optimization
KMP String Matching Algorithm
Elementary Parallel Algorithms
Rod Cutting Problem
Single source stortest path bellman ford and dijkstra
Lecture 11 diagonalization & complex eigenvalues - 5-3 & 5-5
Gravitational search algorithm in optimization techniques
Demonstrate interpolation search
8 queens problem.pptx
Adversarial search
AI Lecture 5 (game playing)
Lecture 14 Heuristic Search-A star algorithm
Lecture 19 sma star algorithm
Ad

Similar to Linked Lists With Perl: Why bother? (7)

PDF
Intro to Table-Grouping™ technology
PDF
From java to rails
DOCX
Heap property
DOCX
Heap property
PPSX
Stacks fundamentals
PDF
Our Friends the Utils: A highway traveled by wheels we didn't re-invent.
PPTX
Implemention of Linked list concept in Data Structures
Intro to Table-Grouping™ technology
From java to rails
Heap property
Heap property
Stacks fundamentals
Our Friends the Utils: A highway traveled by wheels we didn't re-invent.
Implemention of Linked list concept in Data Structures
Ad

More from Workhorse Computing (20)

PDF
Object::Trampoline: Follow the bouncing object.
PDF
Wheels we didn't re-invent: Perl's Utility Modules
PDF
mro-every.pdf
PDF
Paranormal statistics: Counting What Doesn't Add Up
PDF
The $path to knowledge: What little it take to unit-test Perl.
PDF
Unit Testing Lots of Perl
PDF
Generating & Querying Calendar Tables in Posgresql
PDF
Hypers and Gathers and Takes! Oh my!
PDF
BSDM with BASH: Command Interpolation
PDF
Findbin libs
PDF
Memory Manglement in Raku
PDF
BASH Variables Part 1: Basic Interpolation
PDF
Effective Benchmarks
PDF
Metadata-driven Testing
PDF
The W-curve and its application.
PDF
Keeping objects healthy with Object::Exercise.
PDF
Perl6 Regexen: Reduce the line noise in your code.
PDF
Smoking docker
PDF
Getting Testy With Perl6
PDF
Neatly Hashing a Tree: FP tree-fold in Perl5 & Perl6
Object::Trampoline: Follow the bouncing object.
Wheels we didn't re-invent: Perl's Utility Modules
mro-every.pdf
Paranormal statistics: Counting What Doesn't Add Up
The $path to knowledge: What little it take to unit-test Perl.
Unit Testing Lots of Perl
Generating & Querying Calendar Tables in Posgresql
Hypers and Gathers and Takes! Oh my!
BSDM with BASH: Command Interpolation
Findbin libs
Memory Manglement in Raku
BASH Variables Part 1: Basic Interpolation
Effective Benchmarks
Metadata-driven Testing
The W-curve and its application.
Keeping objects healthy with Object::Exercise.
Perl6 Regexen: Reduce the line noise in your code.
Smoking docker
Getting Testy With Perl6
Neatly Hashing a Tree: FP tree-fold in Perl5 & Perl6

Recently uploaded (20)

PDF
Architecture types and enterprise applications.pdf
PDF
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
PDF
Enhancing emotion recognition model for a student engagement use case through...
PDF
Video forgery: An extensive analysis of inter-and intra-frame manipulation al...
PDF
A comparative study of natural language inference in Swahili using monolingua...
PDF
DP Operators-handbook-extract for the Mautical Institute
PPTX
The various Industrial Revolutions .pptx
PPTX
Group 1 Presentation -Planning and Decision Making .pptx
PPTX
Programs and apps: productivity, graphics, security and other tools
PDF
Univ-Connecticut-ChatGPT-Presentaion.pdf
PDF
Getting Started with Data Integration: FME Form 101
PPTX
cloud_computing_Infrastucture_as_cloud_p
PDF
Assigned Numbers - 2025 - Bluetooth® Document
PDF
A novel scalable deep ensemble learning framework for big data classification...
PPTX
Modernising the Digital Integration Hub
PPTX
1. Introduction to Computer Programming.pptx
PDF
1 - Historical Antecedents, Social Consideration.pdf
PPTX
O2C Customer Invoices to Receipt V15A.pptx
PDF
Hybrid model detection and classification of lung cancer
PDF
DASA ADMISSION 2024_FirstRound_FirstRank_LastRank.pdf
Architecture types and enterprise applications.pdf
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
Enhancing emotion recognition model for a student engagement use case through...
Video forgery: An extensive analysis of inter-and intra-frame manipulation al...
A comparative study of natural language inference in Swahili using monolingua...
DP Operators-handbook-extract for the Mautical Institute
The various Industrial Revolutions .pptx
Group 1 Presentation -Planning and Decision Making .pptx
Programs and apps: productivity, graphics, security and other tools
Univ-Connecticut-ChatGPT-Presentaion.pdf
Getting Started with Data Integration: FME Form 101
cloud_computing_Infrastucture_as_cloud_p
Assigned Numbers - 2025 - Bluetooth® Document
A novel scalable deep ensemble learning framework for big data classification...
Modernising the Digital Integration Hub
1. Introduction to Computer Programming.pptx
1 - Historical Antecedents, Social Consideration.pdf
O2C Customer Invoices to Receipt V15A.pptx
Hybrid model detection and classification of lung cancer
DASA ADMISSION 2024_FirstRound_FirstRank_LastRank.pdf

Linked Lists With Perl: Why bother?

  • 1. Linked Lists In Perl What? Why bother? Steven Lembark   Workhorse Computing lembark@wrkhors.com ©2009,2013 Steven Lembark
  • 2. What About Arrays? ● Perl arrays are fast and flexible. ● Autovivication makes them generally easy to use. ● Built-in Perl functions for managing them. ● But there are a few limitations: ● Memory manglement. ● Iterating multiple lists. ● Maintaining state within the lists. ● Difficult to manage links within the list.
  • 3. Memory Issues ● Pushing a single value onto an array can double its size. ● Copying array contents is particularly expensive for long-lived processes which have problems with heap fragmentation. ● Heavily forked processes also run into problems with copy-on-write due to whole-array copies. ● There is no way to reduce the array structure size once it grows -- also painful for long-lived processes.
  • 4. Comparing Multiple Lists ● for( @array ) will iterate one list at a time. ● Indexes work but have their own problems: ● Single index requires identical offsets in all of the arrays. ● Varying offsets require indexes for each array, with separate bookeeping for each. ● Passing all of the arrays and objects becomes a lot of work, even if they are wrapped on objects – which have their own performance overhead. ● Indexes used for external references have to be updated with every shift, push, pop...
  • 5. Linked Lists: The Other Guys ● Perly code for linked lists is simple, simplifies memory management, and list iteration. ● Trade indexed access for skip-chains and external references. ● List operators like push, pop, and splice are simple. ● You also get more granular locking in threaded applications.
  • 6. Examples ● Insertion sorts are stable, but splicing an item into the middle of an array is expensive; adding a new node to a linked list is cheap. ● Threading can require locking an entire array to update it; linked lists can safely be locked by node and frequently don't even require locking. ● Comparing lists of nodes can be simpler than dealing with multiple arrays – especially if the offsets change.
  • 7. Implementing Perly Linked Lists ● Welcome back to arrayrefs. ● Arrays can hold any type of data. ● Use a “next” ref and arbitrary data. ● The list itself requires a static “head” and a node variable to walk down the list. ● Doubly-linked lists are also manageable with the addition of weak links. ● For the sake of time I'll only discuss singly- linked lists here.
  • 8. List Structure ● Singly-linked lists are usually drawn as some data followed by a pointer to the next link. ● In Perl it helps to draw the pointer first, because that is where it is stored.
  • 9. Adding a Node ● New nodes do not need to be contiguous in memory. ● Also doesn't require locking the entire list.
  • 10. Dropping A Node ● Dropping a node releases its memory – at last back to Perl. ● Only real effect is on the prior node's next ref. ● This is the only piece that needs to be locked for threading.
  • 11. Perl Code ● A link followed by data looks like: $node = [ $next, @data ] ● Walking the list copies the node: ( $node, my @data ) = @$node; ● Adding a new node recycles the “next” ref: $node->[0] = [ $node->[0], @data ] ● Removing recycles the next's next: ($node->[0], my @data) = @{$node->[0]};
  • 12. A Reverse-Order List ● Just update the head node's next reference. ● Fast because it moves the minimum of data. my $list = [ [] ]; my $node = $list->[0]; for my $val ( @_ ) { $node->[0] = [ $node->[0], $val ] } # list is populated w/ empty tail.
  • 13. In-order List ● Just move the node, looks like a push. ● Could be a one-liner, I've shown it here as two operations. my $list = [ [] ]; my $node = $list->[0]; for my $val ( @_ ) { @$node = ( [], $val ); $node = $node->[0]; }
  • 14. Viewing The List ● Structure is recursive from Perl's point of view. ● Uses the one-line version (golf anyone)? DB<1> $list = [ [], 'head node' ]; DB<2> $node = $list->[0]; DB<3> for( 1 .. 5 ) { ($node) = @$node = ( [], “node-$_” ) } DB<14> x $list 0 ARRAY(0x8390608) $list 0 ARRAY(0x83ee698) $list->[0] 0 ARRAY(0x8411f88) $list->[0][0] 0 ARRAY(0x83907c8) $list->[0][0][0] 0 ARRAY(0x83f9a10) $list->[0][0][0][0] 0 ARRAY(0x83f9a20) $list->[0][0][0][0][0] 0 ARRAY(0x83f9a50) $list->[0][0][0][0][0][0] empty array empty tail node 1 'node-5' $list->[0][0][0][0][0][1] 1 'node-4' $list->[0][0][0][0][1] 1 'node-3' $list->[0][0][0][1] 1 'node-2' $list->[0][0][1] 1 'node-1' $list->[0][1] 1 'head node' $list->[1]
  • 15. Destroying A Linked List ● Prior to 5.12, Perl's memory de-allocator is recursive. ● Without a DESTROY the lists blow up after 100 nodes when perl blows is stack. ● The fix was an iterative destructor: ● This is no longer required. DESTROY { my $list = shift; $list = $list->[0] while $list; }
  • 16. Simple Linked List Class ● Bless an arrayref with the head (placeholder) node and any data for tracking the list. sub new { my $proto = shift; bless [ [], @_ ], blessed $proto || $proto } # iterative < 5.12, else no-op. DESTROY {}
  • 17. Building the list: unshift ● One reason for the head node: it provides a place to insert the data nodes after. ● The new first node has the old first node's “next” ref and the new data. sub unshift { my $list = shift; $list->[0] = [ $list->[0], @_ ]; $list }
  • 18. Taking one off: shift ● This starts directly from the head node also: just replace the head node's next with the first node's next. sub shift { my $list = shift; ( $list->[0], my @data ) = @{ $list >[0] };‑ wantarray ? @data : @data }
  • 19. Push Is A Little Harder ● One approach is an unshift before the tail. ● Another is populating the tail node: sub push { my $list = shift; my $node = $list->[0]; $node = $node->[0] while $node->[0]; # populate the empty tail node @{ $node } = [ [], @_ ]; $list }
  • 20. The Bane Of Single Links: pop ● You need the node-before-the-tail to pop the tail. ● By the time you've found the tail it is too late to pop it off. ● Storing the node-before-tail takes extra bookeeping. ● The trick is to use two nodes, one trailing the other: when the small roast is burned the big one is just right.
  • 21. sub node_pop { my $list = shift; my $prior = $list->head; my $node = $prior->[0]; while( $node->[0] ) { $prior = $node; $node = $node->[0]; } ( $prior->[0], my @data ) = @$node; wantarray ? @data : @data } ● Lexical $prior is more efficient than examining $node->[0][0] at multiple points in the loop.
  • 22. Mixing OO & Procedural Code ● Most of what follows could be done entirely with method calls. ● Catch: They are slower than directly accessing the nodes. ● One advantage to singly-linked lists is that the structure is simple. ● Mixing procedural and OO code without getting tangled is easy enough. ● This is why I use it for genetics code: the code is fast and simple.
  • 23. Walking A List ● By itself the node has enough state to track location – no separate index required. ● Putting the link first allows advancing and extraction of data in one access of the node: my $node = $list->[0]; while( $node ) { ( $node, my %info ) = @$node; # process %info... }
  • 24. Comparing Multiple Lists ● Same code, just more assignments: my $n0 = $list0->[0]; my $n1 = $list1->[0]; while( $n0 && $n1 ) { ( $n0, my @data0 ) = @$n0; ( $n1, my @data1 ) = @$n1; # deal with @data1, @data2 ... }
  • 25. Syncopated Lists ● Adjusting the offsets requires minimum bookkeeping, doesn't affect the parent list. while( @$n0, @$n1 ) { $_ = $_->[0] for $n0, $n1; aligned $n0, $n1 or ( $n0, $n1 ) = realign $n0, $n1 or last; $score += compare $n0, $n1; }
  • 26. Using The Head Node ● $head->[0] is the first node, there are a few useful things to add into @head[1...]. ● Tracking the length or keeping a ref to the tail tail simplifys push, pop; requires extra bookkeeping. ● The head node can also store user-supplied data describing the list. ● I use this for tracking length and species names in results of DNA sequences.
  • 27. Close, but no cigar... ● The class shown only works at the head. ● Be nice to insert things in the middle without resorting to $node variables. ● Or call methods on the internal nodes. ● A really useful class would use inside-out data to track the head, for example. ● Can't assign $list = $list->[0], however. ● Looses the inside-out data. ● We need a structure that walks the list without modifying its own refaddr.
  • 28. The fix: ref-to-arrayref ● Scalar refs are the ultimate container struct. ● They can reference anything, in this case an array. ● $list stays in one place, $$list walks up the list. ● “head” or “next” modify $$list to reposition the location. ● Saves blessing every node on the list. ● Simplifies having a separate class for nodes. ● Also helps when resorting to procedural code for speed.
  • 29. Basics Don't Change Much sub new { my $proto = shift; my $head = [ [], @_ ]; my $list = $head; $headz{ refaddr $list } = $head; bless $list, blessed $proto || $proto } DESTROY # add iteration for < v5.12. { my $list = shift; delete $headz{ refaddr $list }; } ● List updates assign to $$list. ● DESTROY cleans up inside-out data.
  • 30. Walking The List sub head { my $list = shift $$list = $headz{ refaddr $list }; $list } sub next { my $list = shift; ( $$list, my @data ) = @$$list or return wantarray ? @data : @data } $list->head; while( my @data = $list->next ) { ... }
  • 31. Reverse-order revisited: ● Unshift isn't much different. ● Note that $list is not updated. sub unshift { my $list = shift; my $head = ${ $list->head }; $head->[0] = [ $head->[0], @_ ]; $list } my $list = List::Class->new( ... ); $list->unshift( $_ ) for @data;
  • 32. Useful shortcut: 'add_after' ● Unlike arrays, adding into the middle of a list is efficient and common. ● “push” adds to the tail, need something else. ● add_after() puts a node after the current one. ● unshift() is really “$list->head->add_after”. ● Use with “next_node” that ignores the data. ● In-order addition (head, middle, or end): $list->add_after( $_ )->next for @data; ● Helps if next() avoids walking off the list.
  • 33. sub next { my $list = shift; my $next = $$list->[0] or return; @$next or return; $$list = $next; $list } sub add_after { my $list = shift; my $node = $$list; $node->[0] = [ $node->[0], @_ ]; $list }
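The pieces from the last few slides can be put together roughly as follows. This is a sketch under assumptions, not the author's module: the package name My::List and the data() accessor are invented for the example, and new() uses a reference to the head scalar so that $$list works as described.

```perl
use strict;
use warnings;

package My::List;   # hypothetical name for this sketch

use Scalar::Util qw( blessed refaddr );

my %headz = ();     # inside-out storage: object address => head node

sub new
{
    my $proto = shift;
    my $head  = [ [], @_ ];     # head node: [ empty tail, list data ]
    my $list  = \$head;         # the object is a ref to the node ref

    $headz{ refaddr( $list ) } = $head;

    bless $list, blessed( $proto ) || $proto
}

sub DESTROY { delete $headz{ refaddr( $_[0] ) } }

# Reposition the list at its head node.
sub head
{
    my $list = shift;
    $$list = $headz{ refaddr( $list ) };
    $list
}

# Step to the next node unless that would walk onto the empty tail.
sub next
{
    my $list = shift;
    my $next = $$list->[0] or return;
    @$next or return;
    $$list = $next;
    $list
}

# Splice a new node in after the current one.
sub add_after
{
    my $list = shift;
    my $node = $$list;
    $node->[0] = [ $node->[0], @_ ];
    $list
}

# Assumed accessor: the current node's data, without the next ref.
sub data { my $node = ${ $_[0] }; @{ $node }[ 1 .. $#$node ] }

package main;

my $list = My::List->new;
$list->add_after( $_ )->next for qw( a b c );   # in-order addition

my @found = ();
$list->head;
while( $list->next )
{
    push @found, $list->data;
}
print "@found\n";
```

Since next() refuses to walk onto the empty tail, the same add_after/next pair works at the head, in the middle, or at the end of the list.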
  • 34. Off-by-one gotchas ● The head node does not have any user data. ● Common mistake: $list->head->data. ● This gets you the list's data, not users'. ● Fix is to pre-increment the list: $list->head->next or last; while( @data = $list->next ) {...}
  • 35. Overloading a list: bool, offset. ● while( $list ), if( $list ) would be nice. ● Basically, this leaves the list “true” if it has data; false if it is at the tail node. use overload q{bool} => sub { my $list = shift; scalar @$$list },
  • 36. Offsets would be nice also q{++} => sub { my $list = shift; my $node = $$list; @$node and $$list = $node->[0]; $list }, q{+} => sub { my ( $list, $offset ) = $_[2] ? ... my $node = $$list; for ( 1 .. $offset ) { @$node and $node = $node->[0] or last; } $node },
  • 37. Updating the list becomes trivial ● An offset from the list is a node. ● That leaves += simply assigning $list + $off. q{+=} => sub { my ( $list, $offset ) = … $$list = $list + $offset; $list };
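A reduced sketch of the overloading idea, covering only bool and prefix ++ (the + and += handlers are left out). The package name My::Cursor, its constructor, and data() are assumptions for this example. Only one variable holds the object here, so no copy constructor ("=" overload) is needed for the mutator.

```perl
use strict;
use warnings;

package My::Cursor;     # hypothetical class name for this sketch

use overload
    # True while the current node holds anything; the empty tail is false.
    q{bool} => sub { my $list = shift; scalar @$$list },

    # Prefix ++ repositions the cursor one node up the list.
    q{++}   => sub
    {
        my $list = shift;
        my $node = $$list;
        @$node and $$list = $node->[0];
        $list
    };

sub new
{
    my ( $class, @data ) = @_;

    # Build tail-first so the nodes come out in the order given.
    my $node = [];
    $node = [ $node, $_ ] for reverse @data;

    my $head = [ $node ];
    bless \$head, $class
}

# First data item on the current node.
sub data { ${ $_[0] }->[1] }

package main;

my $list  = My::Cursor->new( qw( a b c ) );
my @found = ();

++$list;                # step off the head node, which has no user data
while( $list )
{
    push @found, $list->data;
    ++$list;
}

print "@found\n";
```

The loop ends when ++ lands on the empty tail node, which the bool handler reports as false.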
  • 38. Backdoor: node operations ● Be nice to extract a node without having to creep around inside of the object. ● Handing back the node ref saves derived classes from having to unwrap the object. ● Also saves having to update the list object's location to peek into the next or head node. sub curr_node { ${ $_[0] } } sub next_node { ${ $_[0] }->[0] } sub root_node { $headz{ refaddr $_[0] } } sub head_node { $headz{ refaddr $_[0] }->[0] }
  • 39. Skip chains: bookmarks for lists ● Separate list of 'interesting' nodes. ● Alphabetical sort would have skip-chain of first letters or prefixes. ● In-list might have ref to next “interesting” node. ● Placeholders simplify bookkeeping. ● For alphabetic, pre-load 'A' .. 'Z' into the list. ● Saves updating the skip chain for inserts prior to the currently referenced node.
  • 40. Applying Perly Linked Lists ● I use them in the W-curve code for comparing DNA sequences. ● The comparison has to deal with different sizes, local gaps between the curves. ● Comparison requires fast lookahead for the next 'interesting' node on the list. ● Nodes and skip chains do the job nicely. ● List structure allows efficient node updates without disturbing the object.
  • 41. W-curve is derived from LL::S ● Nodes have three spatial values and a skip-chain initialized after the list is initialized. sub initialize { my ( $wc, $dna ) = @_; my $pt = [ 0, 0, 0 ]; $wc->head->truncate; while( my $a = substr $dna, 0, 1, '' ) { $pt = $wc->next_point( $a, $pt ); $wc->add_after( @$pt, '' )->next; } $wc }
  • 42. Skip-chain looks for “peaks” ● The alignment algorithm looks for nearby points ignoring their Z value. ● Comparing the sparse list of radii > 0.50 speeds up the alignment. ● Skip-chains for each node point to the next node with a large-enough radius. ● Building the skip chain uses an “inch worm”. ● The head walks up to the next useful node. ● The tail fills the ones in between with a node reference.
  • 43. Skip Chain: “interesting” nodes. sub add_skip { my $list = shift; my $node = $list->head_node; my $skip = $node->[0]; for( 1 .. $list->size ) { $skip->[1] > $cutoff or next; while( $node != $skip ) { $node->[4] = $skip; # replace “” $node = $node->[0]; # next node } } continue { $skip = $skip->[0]; } }
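The inch-worm pass can be tried on plain data. This sketch assumes a simplified node layout of [ next, value, skip ] rather than the W-curve's own nodes, takes the cutoff from a file-level variable, and starts the lead pointer on the first node; the sample values are made up.

```perl
use strict;
use warnings;

# Nodes here are [ $next, $value, $skip ]: a simplified, hypothetical layout.
my $cutoff = 0.50;

my $node = [];
$node = [ $node, $_, '' ] for reverse 0.1, 0.7, 0.2, 0.3, 0.9, 0.4;
my $head = [ $node ];

# Inch worm: $skip walks ahead to the next node above the cutoff,
# then $node catches up, pointing every node it passes at $skip.
sub add_skip
{
    my $head = shift;
    my $node = $head->[0];

    for( my $skip = $node ; @$skip ; $skip = $skip->[0] )
    {
        $skip->[1] > $cutoff or next;

        while( $node != $skip )
        {
            $node->[2] = $skip;     # replace the '' placeholder
            $node      = $node->[0];
        }
    }

    $head
}

add_skip( $head );

# Follow the skip chain from the first node: only the peaks show up.
my @peaks = ();
for( my $n = $head->[0]->[2] ; $n ; $n = $n->[2] )
{
    push @peaks, $n->[1];
}
print "peaks: @peaks\n";
```

Nodes after the last peak keep the '' placeholder, which doubles as the end-of-chain marker when following the skips.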
  • 44. Use nodes to compare lists. ● DNA sequences can become offset due to gaps on either sequence. ● This prevents using a single index to compare lists stored as arrays. ● A linked list can be re-aligned passing only the nodes. ● Comparison can be re-started with only the nodes. ● Makes for a useful mix of OO and procedural code.
  • 45. ● Re-aligning the lists simply requires assigning the local values. ● Updating the node vars does not affect the list locations if compare fails. sub compare_lists { ... while( @$node0 && @$node1 ) { ( $node0, $node1 ) = realign_nodes $node0, $node1 or last; ( $dist, $node0, $node1 ) = compare_aligned $node0, $node1; $score += $dist; } if( defined $score ) { $list0->node( $node0 ); $list1->node( $node1 ); } # caller gets back unused portion. ( $score, $list0, $list1 ) }
  • 46. sub compare_aligned { my ( $node0, $node1 ) = @_; my $sum = 0; my $dist = 0; while( @$node0 && @$node1 ) { $dist = distance( $node0, $node1 ) // last; $sum += $dist; $_ = $_->[0] for $node0, $node1; } ( $sum, $node0, $node1 ) } ● Compare aligned hands back the unused portion. ● Caller gets back the nodes to re-align if there is a gap.
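To see the "unused portion" come back, compare_aligned can be run against a stand-in metric. The distance() below is hypothetical (absolute difference, undef once the values are more than 1 apart), as are make_list() and the sample numbers; the real W-curve metric is not shown in the slides.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real metric: absolute difference,
# or undef once the values drift too far apart to count as aligned.
sub distance
{
    my ( $node0, $node1 ) = @_;
    my $d = abs( $node0->[1] - $node1->[1] );
    $d <= 1 ? $d : undef
}

sub compare_aligned
{
    my ( $node0, $node1 ) = @_;
    my $sum = 0;

    while( @$node0 && @$node1 )
    {
        my $dist = distance( $node0, $node1 ) // last;
        $sum += $dist;
        $_ = $_->[0] for $node0, $node1;
    }

    # Hand back the score plus the nodes where comparison stopped.
    ( $sum, $node0, $node1 )
}

# Build a headless chain of [ $next, $value ] nodes ending in [].
sub make_list
{
    my $node = [];
    $node = [ $node, $_ ] for reverse @_;
    $node
}

my $list0 = make_list( 1, 2, 3, 9, 5 );
my $list1 = make_list( 1, 3, 3, 4, 5 );

my ( $sum, $rest0, $rest1 ) = compare_aligned( $list0, $list1 );
print "sum: $sum, stopped at $rest0->[1] vs $rest1->[1]\n";
```

The caller's own $list0 and $list1 are untouched; only the local node variables moved, so a re-alignment step can restart from $rest0 and $rest1.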
  • 47. Other Uses for Linked Lists ● Convenient trees. ● Each level of tree is a list. ● The data is a list of children. ● Balancing trees only updates a couple of next refs. ● Arrays work but get expensive. ● Need to copy entire sub-lists for modification.
  • 48. Two-dimensional lists ● If the data at each node is a list you get two-dimensional lists. ● Four-way linked lists are the guts of spreadsheets. ● Inserting a column or row does not require re-allocating the existing list. ● Deleting a row or column returns data to the heap. ● A multiply-linked array allows immediate jumps to neighboring cells. ● Three-dimensional lists add “sheets”.
  • 49. Work Queues ● Pushing pairs of nodes onto an array makes for simple queued analysis of the lists. ● Adding to the list doesn't invalidate the queue. ● Circular lists' last node points back to the head. ● Used for queues where one thread inserts new links, others remove them for processing. ● Minimal locking reduces overhead. ● New tasks get inserted after the last one. ● Worker tasks just keep walking the list from where they last slept.
  • 50. Construct a circular linked list: DB<1> $list = []; DB<2> @$list = ( $list, 'head node' ); DB<3> $node = $list; DB<4> for( 1 .. 5 ) { $node->[0] = [ $node->[0], "node $_" ] } DB<5> x $list 0 ARRAY(0xe87d00) 0 ARRAY(0xe8a3a8) 0 ARRAY(0xc79758) 0 ARRAY(0xe888e0) 0 ARRAY(0xea31b0) 0 ARRAY(0xea31c8) 0 ARRAY(0xe87d00) -> REUSED_ADDRESS 1 'node 1' 1 'node 2' 1 'node 3' 1 'node 4' 1 'node 5' 1 'head node' ● No end, use $node != $list as sentinel value. ● weaken( $list->[0] ) if list goes out of scope.
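The debugger session above translates directly into a script. This sketch builds the same circular list and walks it once around, using the head node itself as the sentinel; the @seen array is added only to show the order.

```perl
use strict;
use warnings;

my $list = [];
@$list = ( $list, 'head node' );    # the head's next ref points at itself

# Inserting after the head each time leaves the newest node first.
$list->[0] = [ $list->[0], "node $_" ] for 1 .. 5;

# There is no empty tail node: stop when the walk comes back to the head.
my @seen = ();
for( my $node = $list->[0] ; $node != $list ; $node = $node->[0] )
{
    push @seen, $node->[1];
}

print join( ', ', @seen ), "\n";
```

Comparing references with != compares their addresses, which is all the sentinel test needs.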
  • 51. Summary ● Linked lists can be quite lazy for a variety of uses in Perl. ● Singly-linked lists are simple to implement, efficient for “walking” the list. ● Tradeoff for random access: ● Memory allocation for large, varying lists. ● Simpler comparison of multiple lists. ● Skip-chains. ● Reduced locking in threads.