Data Step Hash Object vs SQL Join

Hash vs Join
A case study evaluating the use of the data step hash object to replace a SQL join
Geoff Ness
Sep 2014

The Hash Object
• Effectively a lookup table of key/value pairs which resides in memory
• Similar to associative arrays or dictionaries in other programming languages
• Fast lookup (close to O(1) on average), and the input does not need to be sorted
• Can offer a faster alternative to a traditional data step merge or SQL join, at a price:
– The syntax is unfamiliar to a lot of SAS programmers
– There’s more code to write
– Requires more memory than a join (sometimes much more)

Using Hash to replace a SQL Join
[Diagram: a fact table joined to four dimension tables, Dimension 1 to Dimension 4]
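
For reference, the kind of join being replaced might look like the sketch below. The original query is not shown in the slides, so everything here is an assumption: the table names (work.fact, work.dim1 to work.dim4), the column names, the sample rows and the choice of a left join are all hypothetical.

```sas
/* Hypothetical sample data: all table and column names are illustrative */
data work.dim1; input dim1_key dim1_desc $; datalines;
1 North
2 South
;
run;
data work.dim2; input dim2_key dim2_desc $; datalines;
10 Retail
20 Online
;
run;
data work.dim3; input dim3_key dim3_desc $; datalines;
100 Gold
200 Silver
;
run;
data work.dim4; input dim4_key dim4_desc $; datalines;
7 Weekday
8 Weekend
;
run;
data work.fact; input dim1_key dim2_key dim3_key dim4_key amount; datalines;
1 10 100 7 100
2 20 200 8 250
3 10 100 7 75
;
run;

/* One join per dimension; a left join keeps fact rows that have no match */
proc sql;
    create table work.joined_sql as
    select f.*,
           d1.dim1_desc,
           d2.dim2_desc,
           d3.dim3_desc,
           d4.dim4_desc
    from work.fact as f
    left join work.dim1 as d1 on f.dim1_key = d1.dim1_key
    left join work.dim2 as d2 on f.dim2_key = d2.dim2_key
    left join work.dim3 as d3 on f.dim3_key = d3.dim3_key
    left join work.dim4 as d4 on f.dim4_key = d4.dim4_key;
quit;
```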

Alternative using the Hash Object
• Replacing the join typically requires three steps to be coded:
1 - Create the dimension variables by ‘faking’ a set statement (one that is compiled but never executed)
2 - Declare a hash object for each dimension
3 - Join rows from the fact to rows in the dimensions by calling the hash .find() method
• The .find() method returns 0 when the key values in the current row match an entry in the hash object (the keys named in .definekey()); the variables named in .definedata() are then populated from that entry
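
The three steps can be sketched as a single data step. The code from the original slides is not available, so this is a minimal illustration under stated assumptions: the datasets work.fact, work.dim1 and work.dim2, their columns and the sample rows are hypothetical, and only two dimensions are shown (further dimensions would follow the same pattern).

```sas
/* Hypothetical sample data: all table and column names are illustrative */
data work.dim1; input dim1_key dim1_desc $; datalines;
1 North
2 South
;
run;
data work.dim2; input dim2_key dim2_desc $; datalines;
10 Retail
20 Online
;
run;
data work.fact; input dim1_key dim2_key amount; datalines;
1 10 100
2 20 250
3 10 75
;
run;

data work.joined_hash;
    /* Step 1: this set never executes, but at compile time it adds */
    /* the dimension variables to the program data vector           */
    if 0 then set work.dim1 work.dim2;

    /* Step 2: load each dimension into a hash object, once only */
    if _n_ = 1 then do;
        declare hash h1(dataset: 'work.dim1');
        h1.defineKey('dim1_key');
        h1.defineData('dim1_desc');
        h1.defineDone();

        declare hash h2(dataset: 'work.dim2');
        h2.defineKey('dim2_key');
        h2.defineData('dim2_desc');
        h2.defineDone();
    end;

    set work.fact;

    /* Step 3: find() returns 0 on a match and fills the data variables. */
    /* On a non-match, clear the variable so that the value retained     */
    /* from the previous row is not carried forward.                     */
    if h1.find() ne 0 then call missing(dim1_desc);
    if h2.find() ne 0 then call missing(dim2_desc);
run;
```

Because unmatched fact rows are kept with missing dimension values, this sketch behaves like a left join. To mimic an inner join instead, the step could delete any row where a find() call returns a non-zero value.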

Performance Comparison
• Four scenarios were compared (results tables not reproduced here):
– Joining 2 dimensions, small fact (100K rows)
– Joining 2 dimensions, large fact (~10M rows)
– Joining 9 dimensions, small fact (100K rows)
– Joining 9 dimensions, large fact (~10M rows)

Stuff we haven’t considered
• Outer joins (these are possible with the hash object)
• Cases where PROC SQL itself uses a hash join ‘under the covers’
• Performance against RDBMS tables (as opposed to SAS datasets)
• Hash iterators
• Other things that can be done with the hash object (sorting, summarisation, de-duplication)

Summary
• Implementing a join using the hash object can provide a considerable saving in time, usually at the expense of memory
• The code is a little more involved, but breaks down into a reasonably simple process to implement
• Things to consider:
– The number and size of the tables involved
– The memory required to hold all the hash objects at once
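
One way to get a rough idea of the memory a hash object will need is to load it and multiply its num_items attribute by its item_size attribute. The sketch below assumes a hypothetical dimension table work.dim1. The figure is only an approximation, since item_size does not include the object's fixed overhead.

```sas
/* Hypothetical sample data: table and column names are illustrative */
data work.dim1; input dim1_key dim1_desc $; datalines;
1 North
2 South
;
run;

data _null_;
    /* Compile-time only: defines the variables the hash object needs */
    if 0 then set work.dim1;

    declare hash h1(dataset: 'work.dim1');
    h1.defineKey('dim1_key');
    h1.defineData('dim1_desc');
    h1.defineDone();

    items          = h1.num_items;   /* number of entries loaded      */
    bytes_per_item = h1.item_size;   /* size of one entry, in bytes   */
    approx_bytes   = items * bytes_per_item;

    put 'Approximate hash size: ' approx_bytes comma15.
        ' bytes for ' items comma12. ' items';

    /* No set statement ever executes, so stop explicitly */
    stop;
run;
```

Repeating this for each dimension and summing the results gives an estimate to compare against the memory available to the SAS session.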

