Part Numbering and ID codes: general considerations and check digits

Part Numbers, Codes, IDs &c.
Let’s talk about how we refer to our parts
– and how it’s such an appropriate subject for
Smalltalk

john@ItsNotExactlyRocket.Science 2/50
I’m John H Woods, chronically online since 5BW (before Web).
So, first profile picture predates digital cameras: stuck my head
on a flatbed scanner. Fortunately technology has moved on!
1 First entered IT as a job after learning
Smalltalk during Biosciences PhD 1998
2 Have consulted widely over many
domains and much tech. since then.
3 (But, I have still never met language
I didn’t think was much, much worse
than Smalltalk.)
I’m johnhwoods on Discord, Threads, The Register, gmail, hotmail, wikipedia and elsewhere
1 but if to contact me specifically about this, click here: john@itsnotexactlyrocket.science
HELLO!

What makes a good part identification code?

1 Unique in our organization

2 Unambiguous for us, our partners, clients and suppliers

3 Simple, consistent logical structure

So far, so uncontroversial …

a Decimal only; I will die on this hill
b Variable width (different hill, but same)
c No leading zeros! And perhaps more restrictions …
d A single, transparent, punctuation mark to structure the digits

a Decimal only; I will die on this hill
b Variable width (different hill, but same)
c No leading zeros! And perhaps more restrictions …
d A single, transparent, punctuation mark to structure the digits
4 ABOVE ALL: Use a check digit! See you on Hill 3!

Common and good advice rejected
1 Grouping codes by compatibility categories
a embeds assembly details: a future-proofing nightmare
2 Using constant width fields
a contradicts meaningful structure (more later …)
b impacts scalability and future-proofing: exhaust a short
number range – or end up with inconveniently long ones
c commonly used part codes as lengthy as rare ones
3 Using meaningful alphanumeric codes
a embeds too much info, increases length and complexity
b ambiguity and locale issues with letters and acronyms
c impacts future-proofing as parts are replaced or retired

Hill 1: Only Use Decimal
1 Better internationalization (‘i18n’)
a understood even in CJK and RTL languages
b avoids confusion regarding meaning of letters and letter sequences
2 Reduction of errors
a Less ambiguity when written (O-D-0-o, 8-B, I-l-1, S-5, G-6-b, 2-Z-7, 9-q)
b Less ambiguity in spoken English (“oh” O-o-0, 8-A-H, S-F, M-N &c.)
c Easy character set (or … is it Hex, Base 36, 64, or other? CaSe SeNsItIvE?)
3 Better compatibility with other data systems
a Nearly everything accepts integers (31 bits gives over 2 billion numbers)
b Less comparison ambiguity (all part numbers are different unique integers)
4 Only numeric keys required: speed of entry / simplicity of devices
5 Improved choice of check-digit algorithms (more later …)

Surely you can’t be serious?
1 I am serious and don’t call me Shirley.
2 But decimal numbers are less compact than alphanumeric?
a Not much: 20% longer than Hex, 50% longer than base 36
b Just 10 digits is already literally billions of part numbers
3 But they’re harder to remember / don’t have meaning!
a That’s why we have computer systems… and a description column!
b People using them frequently will remember shorter codes, people
who don’t won’t need to, and will perform simple lookups as required
c Grouping can give numbers meaning. A woodworker, realizing all
planed timber is nnn-11-, will know what a nnn-11-2400-47-100 is;
a mechanic will be able to guess what a nnn-19-225-45 might be; and
a librarian is unlikely to have to look up 903.

Hill 2: Variable width
1 Allows for meaningful structure and therefore changes
a Version 2 of a nnn-101-234 can be nnn-101-234-2
b This is especially powerful with optional punctuation (next slide…)
2 Allows for shorter codes for commonly used parts
a With a check digit (see later…) there are ten two digit, a hundred
three digit, and a thousand four digit codes: 1110 short numbers
b Avoids short codes like 31 having meaningless leading zeros
(e.g. 00000031) or other prefixes (e.g. ‘30000031’)
3 The advantages of fixed width are now limited
a Sort order is unimportant – and antithetical to structure
b Error checking by check digit far superior to digit counting
c Output formatting is less burdensome on modern systems

thebenefitsofseparationwithpunctuation
1 Grouping with some form of punctuation
a improves readability with more than 5 or 6 digits (e.g. ‘the long card number’)
b can improve familiarization, allowing memorization or even meaning
c can allow the use of numbers that are already recognized, like tyre sizes
2 However
a spaces can wrap and other data systems can do weird things with them
b almost every other symbol has a meaning already, including commas, periods
and both slashes, and the ‘vertical bar’ | can look too much like 1, l or I.
c _ isn’t on a numeric keypad, generally needs <SHIFT>, and has a long name
3 The hyphen (-) is a good compromise (usually on the top corner of
a numeric keypad) and has a nice short name (‘dash’) in English.
NB Punctuation must be transparent so ‘101-234-2’ is 1012342
Integers transfer unambiguously between systems

Zero: The trouble with l'œuf
1 Leading zeros
a often get mangled by other systems
b destroy the valuable 1:1 relationship to integers
c are not required when we’ve already abandoned constant width
d are invisible to many check digit algorithms (usually by design)
2 And, all zeros
a can cause some Damm issues (this algorithm discussed later …)
b can appear dotted/slashed, 0, sometimes confused with 8
3 But, banning all zeros reduces
a the number of items representable with a given number of digits
b ability to use numbers with real meanings, such as 2400 for 2.4m

Hill 3: Check your digits – with Check Digits!
1 When humans enter or communicate data, they make errors:
a Letter / digit errors: 0 ⬌ O or I ⬌ 1 (N/A in decimal only codes)
b Single digit errors: 8 ⬌ 9 (and more keys are adjacent on a numpad)
c Simple transposition error: 98 ⬌ 89
d Twin error: 88 ⬌ 99
e Jump transposition error: 317 ⬌ 713
f Jump twin error: 313 ⬌ 414
g Phonetic: 60 ⬌ 16 (“sixty” ⬌ “sixteen”)
2 Simpler algorithms: easier and faster, but detect fewer errors
3 Let’s see how to get the maximum error detection when using
a single digit checksum …

Modulo 10 Sums
1 We could just sum all the digits modulo 10:
12345 = 1+2+3+4+5 = 15; divide by 10 is 1 remainder 5
Our new number would be 123455 (or 12345-5 etc.)
We can detect all single digit errors:
12445-5 = 1+2+4+4+5 = 16, modulo 10 is 6
but check digit is 5 and therefore so we know the number is invalid
This approach similarly detects insertions and deletions
– but not (most) transpositions, as the sum of the digits is unchanged
2 Smalltalk
('12345'
inject: 0 "start value; sum is accumulated"
into: [ :sum :char | sum + char digitValue ]) 10

Weighted Sum, Modulo 11
1 To detect transpositions, we ‘weight’ the digits:
The ISBN-10 of the Blue Book is 0-201-11371-6
ISBN-10 uses weights of the digit position:
0×1 + 2×2 + 0×3 + 1×4 + 1×5 + 1×6 + 3×7 + 7×8 +1×9 + 6×10 = 165
165 modulo 11 is 0 and that means that the number is valid.
The rightmost digit is a check digit whose value is adjusted to
ensure the weighted sum of the digits is an exact multiple of 11
2 UK NHS Numbers use the same technique.
3 Note that the last digit might need a value of 10! So:
a In the NHS, the next available number is used
b In ISBN-10, the last digit is an X, e.g. 0-14-118260-X
(Who loves a Roman numerals joke? I, for one!)

Weighted Sum, Modulo 11 in Smalltalk (1/2)
"WARNING: There is no error checking in any of these snippets.
Initial checks must be done by Regex or manual format checking."
| weight |
weight:= 0. "left to right, weights are 1,2,3, ..."
('6088489006' inject: 0
into: [ :checksum :char |
(weight:= weight+1) * char digitValue + checksum ]) 11
"or, with an additional method, the weight is the index (In
Smalltalk, arrays, strings &c. start at 1 – as nature intended!)"
('608848900' withIndexInject: 0
into: [ :checksum :char :idx |
idx * char digitValue + checksum ]) 11
"The former returns 0 indicating validity. The latter, with a 9
character input, returns 6, which is the value of the check digit
needed to make the number valid, as in the former case.

"NB: the checksum can be 10, e.g. some editions of Strunk & White..."
('020530902' withIndexInject: 0
"Returns 10, which is why the original ISBN was 0-205-30902-X.
The UK NHS addresses this problem by simply not using any numbers
which would require a check “digit” character outside the range 0-9"
"But, for ISBN-10, we need to handle the occasional terminal X"
| isbn |
isbn:= '020530902X'.
(isbn allButLast withIndexInject: 0
= (isbn last = $X ifTrue: [ 10 ] ifFalse: [ isbn last digitValue ])
"Generating ISBN-10s is left as an exercise for the reader"
Weighted Sum, Modulo 11 in Smalltalk (2/2)

ISBN-13 is EAN-13
1 ISBN numbers migrated from 10 to 13 digits in 2007,
for compatibility with EAN (European Article Number),
now internationally known as GTIN (Global Trade Item
Number).
a The symbology includes only digits 0-9
therefore a modulo 10 algorithm is used.
b Counting from right to left, odd positions
are weighted 1, even positions weighted 3.
2 ISBN-10s are converted by prefixing 978 to the payload
and appending the new EAN-13 check digit.
3+2×3+0+9×3+0+3×3+5+0×3+2+0×3+8+7×3+9 = 90 ✓

EAN/GTIN in Smalltalk (1/2)
"reversed makes no difference in the GTIN-13 case as it has an odd number of
digits, but allows GTIN-8 and GTIN-18 with no need to check input parity"
"Note the index flip when moving from validation to generation"
| lastWeight |
lastWeight:= 3. "right to left, weights are 1,3,1,3 ..."
('1234512745657' reversed inject: 0 into: [ :sum :each |
each digitValue * (lastWeight:= 4-lastWeight) + sum ]) 10
"Returns 0, which indicates the number is correct"
"To generate a check digit, run the algorithm on the `payload`
of the first 12 digits and subtract the result from 10"
| lastWeight |
lastWeight:= 1. "right to left, weights are 3,1,3,1 ..."
(10 - ('123451274565' reversed inject: 0 into: [ :sum :each |
each digitValue * (lastWeight:= 4-lastWeight) + sum ]) 10) 10
"Returns 7 as expected from the above. If this returns 10,
the check digit is 0, hence the additional 10 at the end"
"NB: without the check digit, our first digit is even, not odd."

EAN/GTIN in Smalltalk (2/2)
"EAN transposition check shows its most important weakness"
| lastWeight |
lastWeight:= 3.
'1235412745657' reversed inject: 0 into: [ :sum :each |
each digitValue * (lastWeight:= 4-lastWeight) + sum 10 ]
"Returns 2, which proves the number is incorrect"
1 "But, if the transposed digits differ by 5, the error goes undetected:
x + 3(x+5) = 4x+15 and 3x + (x+5) = 4x+5, which are congruent modulo 10"
| lastWeight |
lastWeight:= 3.
'1234517245657' reversed inject: 0 into: [ :sum :each |
each digitValue * (lastWeight:= 4-lastWeight) + sum 10 ]
"Returns 0 indicating validity. However, such transpositions are less
frequent: digits different by 5 are not adjacent on either the number row or
standard numeric keypads (whether calculator style or telephone style)"

Luhn
1 Patented in 1960 but now in the public domain, Luhn is
used for: credit card numbers; IMEIs (mobile handsets);
European Patent Application numbers; ICCIDs (SIM cards);
several government IDs from round the world; &c.
2 Another “sum digits with alternating weights modulo 10”
method; the weights, again starting from the right,
alternate 1 and 2, rather than EAN/GTIN’s 1 and 3.
3 However, where doubling a digit gives a two digit result,
we use the sum of the digits of the answer.
4 So, when in even digit positions, 0,1,2,3,4 yield 0,2,4,6,8
– but 5,6,7,8,9 yield 1,3,5,7,9 – not 10,12,14,16,18.

Luhn Example
1 To validate: 8905543677
2 Going right to left every other digit we get
7+14+6+6+4+10+5+0+9+16
3 Replacing all two digit results with their digit sum yields
7+5+6+6+4+1+5+0+9+7 = 50 mod 10 = 0.
4 As with GTIN, if the sum is a multiple of 10, the number is
valid; and, as with GTIN, this is effectively the same operation as
dropping the check digit, calculating a check digit from the
payload, and comparing it to the existing check digit.
5 As with GTIN, to generate a check digit, we must flip the weights
as the now missing check digit was digit 1, and we start by
doubling the first digit as we go right to left.

Luhn Smalltalk (1/5) – digits and characters
"How to sum digits of a two digit number?"
14 printString fold: [:a :b | a digitValue + b digitValue]
"Or just add the tens and the units"
(0 to: 9) collect: [:n | | d |
d:= n*2.
d // 10 + d 10 ]
"#(0 2 4 6 8 1 3 5 7 9)"
"Or, once we have that result, just use it as a lookup table"
(0 to: 9) collect: [:n | #(0 2 4 6 8 1 3 5 7 9) at: 1+n"
"Also, by using digitValue the argument can be any Unicode characters that
represent the decimal digits, e.g. Sinhala"
'෦෧෨෩෪෫෬෭෮෯' asArray collect: [ :char | char digitValue ]
"#(0 1 2 3 4 5 6 7 8 9)"
NB: We did all the input sanitization before, right? Right?"

Luhn Smalltalk (2/5) – using a lookup table
"This uses alternating offsets of 11 and 1 with a lookup table. The validation
code, highlighted, is only four lines. This Snippet will return any invalid
numbers in the array. All the inputs in the example below are valid."
| luhn |
luhn:= #[ 0 1 2 3 4 5 6 7 8 9
0 2 4 6 8 1 3 5 7 9 ].
#(18 #[3 7 2] '٥٦٤٨٧٢' "formats: integer, byte array, unicode string"
8905543677 "my example from earlier slide"
17893729974 "the example from Wikipedia"
1234567812345670 "the example from Rosetta Code"
'3566002020360505' "an example Credit Card number, spaces removed"
339269927028426 "an example IMEI number"
) reject: [ :input | | index |
index:=11.
(input asString reversed inject:0 into:[:sum :digit || codePoint |
codePoint:= digit asInteger & 15.
sum + (luhn at:(index:= 12-index)+codePoint) ])10=0]

Luhn Smalltalk (3/5) – tables beyond 0 to 9
"Unlike the other single digit check digit algorithms here, Luhn can be
generalized to larger (or smaller) character sets with an even number of
symbols. Lookup table generation translates to Smalltalk as follows:"
| charset base luhn |
charset:= '0123456789ABCDEFGHJK'. "works with any symbols, in any order"
base:= charset size.
(base2) isZero
ifFalse: [ self error:'Luhn character set must have even size' ].
luhn:= OrderedCollection new.
luhn addAll: (0 to:base-1).
0 to: base-1 do: [:digit || double |
double:= 2 * digit.
luhn add: double//base + doublebase ].
luhn asByteArray
"#[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
0 2 4 6 8 10 12 14 16 18 1 3 5 7 9 11 13 15 17 19 ]"
e.g.: 14 ×2 = 28
which is 18 in base 20
so digit sum is 9
Note: a nice
easy pattern
is emerging

Luhn Smalltalk (4/5) – validation modulo N
"Note table construction and validation code. Snippet takes an array and
returns an array of those elements which are invalid. Lookup table could be
cached but typically construction is 2-5µs, with each check 1-2µs "
| base charset luhn |
base:= (charset := 'abcdef') size.
luhn:= ((0 to:base-1),(0 to:base-2 by:2),(1 to:base by:2)) asByteArray.
#( 'abcdefe' 'abcb' 'defc' )
reject:[:input || index |
index:= 1+base.
(input asString reversed
inject: 0
into: [:sum :digit || codePoint |
codePoint:= (charset indexOf: digit)-1.
sum + (luhn at: (index:= 2+base-index) + codePoint) ])base=0]
"NB: In design, ensure the charset size is even, with no repetition"
"NB: In execution, ensure that each argument, after removing allowed
punctuation, comprises only characters from the character set"

Luhn Smalltalk (5/5) – generation modulo N
"Because (m-(x mod m) mod m) simplifies to (-x) mod m, the generation
method (again note that the index is flipped) is as follows ..."
| base charset luhn |
base:= (charset := ' ✓ ') size.
😎☀ ⚙💩⚡
luhn:= ((0 to:base-1),(0 to:base-2 by:2),(1 to:base by:2)) asByteArray.
#( ' ✓ ' ' ✓' ' ' )
😎☀ ⚙💩⚡ 😎☀ ⚙💩⚡
collect:[:input || index |
index:= 1.
input asString, (charset at: 1 + ((input asString reversed
inject: 0
into: [:sum :digit || codePoint |
codePoint:= (charset indexOf: digit)-1.
sum + (luhn at: (index:= 2+base-index) + codePoint) ])
negated base)) asString ]
"#(' ✓ ' ' ✓ ' ' ✓')"
😎☀ ⚙💩⚡💩 😎☀ ☀ ⚙💩⚡
"Check digits Pile of Poo Sun With Rays and Check Mark are appended"
very little code
is required to
generate
Luhn mod N
check digits

Luhn Weaknesses
1 Twin errors:
22 2 + 4
🠖 6
🠖 🠔 5 + 1 5 + 10
🠔 🠔 55
33 3 + 6
🠖 9
🠖 🠔 6 + 3 6 + 12
🠔 🠔 66
44 4 + 8
🠖 12
🠖 🠔 7 + 5 7 + 14
🠔 🠔 77
Unfortunately all three of these errors comprise adjacent keys
on a numeric keypad; less likely to be typed on a number row
2 Transposition error:
90 9 + 0
🠖 9
🠖 🠔 0 + 9 0 + 18
🠔 🠔 09
Not only adjacent on number row, but at the extreme right,
both conventionally operated by digitus minimus dexter 1
1
right pinky finger

Luhn Conclusions
1 Works with any symbol set
a there must be an even number of symbols
– no problem for octal, decimal, hexadecimal and the Latin alphabet
b but symbol codes do not need to be contiguous or in any particular order
2 Allows codes as short as 2 symbols, and up to arbitrary length.
3 Easy to implement, especially for the standard case of digits 0..9
4 Fast: typically lookup table construction 2-5µs; then validation < 1µs
5 Widely used, well-understood and still extremely useful
6 Has some unfortunate weaknesses
a 60% of twin errors undetected, and easy to key on a numpad
b cannot detect transpositions of the end digits in its character set
– and this is particularly relevant with the 0..9 case

Half Time!
1 Smalltalk is great! Any dissent to be handled later – in the car park 😛
2 Some codes have double digit checksums (checksa?)
a e.g. International banking codes: IBAN uses a modulo 97 algorithm
b possibly overkill for part numbering, we’ll stick to single digits
3 Despite the appeal of Luhn Mod N, decimal codes remain optimal
a but weaknesses of Luhn 0..9 make it subobtimal for part numbering
b and countering the weakness by manually preventing the creation of
numbers with 09 or 90, or by using the digits out of order, e.g.
1234567890, would result in non-standard implementations and
possibly maintainability issues
4 Advantages over later algorithms turn out to be underwhelming:
a It’s not that much faster
b It’s not that much simpler

Verhoeff
1 In 1969 Jacobus Verhoeff presented the first decimal check digit algorithm
which detects all adjacent transposition errors and all single digit errors,
despite this having been thought (even mistakenly ‘proved’) impossible.
2 From some eye-watering maths, he created three tables: p, d, and inv.
[Wikipedia]

Navigating the Verhoeff Tables: Validation
1 Starting with, e.g., ‘3572’ and c = 0, we go L R
🠔 from position 0 (the check digit)
2 The permutation table, p, gives us p(0,2) (0 is position modulo 8, 2 is the digit) = 2
3 The 2 goes into the multiplicative table, d, with the current c, giving c = d(0,2) = 2
4 Moving left: p(1,7) = 0; c=d(2,0) = 2
5 Moving left: p(2,5) = 9; c=d(2,9) = 6
6 Moving left: p(3,3) = 6; c=d(6,6) = 0
7 We have ended on c = 0
8 We look up 0 in the inverse table, inv, and get 0
9 The number 3572 is therefore valid ✓
What is the point of step 8? The only 0 in inv maps to 0.
Surely we could just ignore the inverse table for validation?
We can, but we won’t – and the generation method tells us why…

Navigating the Verhoeff Tables: Generation
1 Starting with, e.g., ‘3570’ and c = 0, we go L R
🠔 from position 0 (the check digit)
2 The permutation table, p, gives us p(0,0) (0 is position modulo 8, 0 is the digit) = 0
3 The 0 goes into the multiplicative table, d, with the current c, giving c = d(0,0) = 0
4 Moving left: p(1,7) = 0; c=d(0,0) = 0
5 Moving left: p(2,5) = 9; c=d(0,9) = 9
6 Moving left: p(3,3) = 6; c=d(9,6) = 3
7 We have ended on c = 3
8 We look up 3 in the inverse table, inv, and get 2
9 The number 357 therefore needs the check digit 2 to become 3572 as previous slide
So, to generate a check digit we suffix a zero1
to our input and
use the same method – and the result is our new check digit.
1
This is equivalent to multiplying the equivalent integer by 10.

Verhoeff in Smalltalk (1/3) - lookups
Recalling Arrays start at 1, we have the following lookup methods:
privateInvTableAt: j
^ #[ 0 4 3 2 1 5 6 7 8 9 ] at: j+1
privatePtableRowAt: r column: c
^ ( #( #[ 0 1 2 3 4 5 6 7 8 9 ]
#[ 1 5 7 6 2 8 3 0 9 4 ]
#[ 5 8 0 3 7 9 6 1 4 2 ]
#[ 8 9 1 6 0 4 3 5 2 7 ]
#[ 9 4 5 3 1 2 6 8 7 0 ]
#[ 4 2 8 6 5 7 3 9 0 1 ]
#[ 2 7 9 3 8 0 6 4 1 5 ]
#[ 7 0 4 6 9 1 3 2 5 8 ] )
at: r+1)
at: c+1
privateDtableRowAt: r column: c
"as above but with d(j,k) table”
allowedCharacters
^Array new
,'0123456789' "Western Arabic”
,'٠١٢٣٤٥٦٧٨٩' "Eastern Arabic”
,'०१२३४५६७८९' "Devanagari"
,'০১২৩৪৫৬৭৮৯' "Bengali"
,'௦௧௨௩௪௫௬௭௮௯' "Tamil"
,'๐๑๒๓๔๕๖๗๘๙' "Thai"
,'၀၁၂၃၄၅၆၇၈၉' "Myanmar"
,'෦෧෨෩෪෫෬෭෮෯'"Sinhala"
allowedSeparators
^Array new
with: $-
with: Character space

Verhoeff in Smalltalk (2/4) - API
Private method,
privateVerhoeff: digits
"NB: private – no error checking
argument is any SequenceableCollection
of integers in (0 to:9); returns a
single Integer in that range."
| pos c |
pos := c := 0.
digits reverseDo: [:n | | p |
p := self pTableAtRow: pos col: n.
c := self dTableAtRow: c col: p.
pos := pos + 1 8 ].
^ self invTableAt: c
Private
VerhoeffString: aString
"Takes unpunctuated string of Unicode
with digitValue (0 to:9), returns a
single Integer in that range."
aString isString
ifFalse: [ self error: '...' ].
aString size > 1
ifFalse: [ self error: '...’ ].
aString reject: [ :char |
self allowedCharacters
includes:char]) isEmpty
ifFalse: ['...’ ].
aString first digitValue > 0
ifFalse: [self error: '...' ].
^ self privateVerhoeff:(aString asArray
collect: [ :char | char digitValue ])
Public method,

Verhoeff in Smalltalk (3/3) – in one block
"Verhoeff algorithm; argument is [0-9][0-9]+, answer is integer in 0..9"
| c pos |
pos:= c:= 0. "checksum and position (right to left) start at zero"
3572 asString reversed do: [ :digit |
c := (#( #( 0 1 2 3 4 5 6 7 8 9 ) #( 1 2 3 4 0 6 7 8 9 5 )
#( 2 3 4 0 1 7 8 9 5 6 ) #( 3 4 0 1 2 8 9 5 6 7 )
#( 4 0 1 2 3 9 5 6 7 8 ) #( 5 9 8 7 6 0 4 3 2 1 )
#( 6 5 9 8 7 1 0 4 3 2 ) #( 7 6 5 9 8 2 1 0 4 3 )
#( 8 7 6 5 9 3 2 1 0 4 ) #( 9 8 7 6 5 4 3 2 1 0 ) ) at:1+c)
at:1+((#( #( 0 1 2 3 4 5 6 7 8 9 ) #( 1 5 7 6 2 8 3 0 9 4 )
#( 5 8 0 3 7 9 6 1 4 2 ) #( 8 9 1 6 0 4 3 5 2 7 )
#( 9 4 5 3 1 2 6 8 7 0 ) #( 4 2 8 6 5 7 3 9 0 1 )
#( 2 7 9 3 8 0 6 4 1 5 ) #( 7 0 4 6 9 1 3 2 5 8 ) )
at:1+pos) at: digit digitValue + 1).
pos:= pos + 1 8 ].
#( 0 4 3 2 1 5 6 7 8 9 ) at:1+c
"NB: No error checking – SANITIZE INPUTS! Integers must be >9;
strings (or arrays &c.) MUST comprise only 0..9 (or $0..$9 &c.)"

Verhoeff Weaknesses?
1 Verhoeff only works with decimal codes, but:
a but we already rejected non-decimal codes.
2 The algorithm is complex, but:
a the implementation is just table lookups: 2 per digit and 1 more
b the operations involved, unicode lookup, table lookup &c. are simple
c it’s pretty fast, ~2µs, and comparable to Luhn
3 Unlike most checksum algorithms, it is sensitive to leading zeros, but:
a that’s actually useful for those who do use leading zeros
b hopefully I’ve persuaded you that isn’t going to be us!
4 There may be other weaknesses, so:
a find someone with a better understanding of dihedral groups
b draw statistical inferences from experimentation (later… )

Damm
1 In 2004 Michael Damm published this algorithm in his PhD on
Totally Antisymmetric Quasigroups … Yes, me neither!
2 Again the underlying maths is complex but translates to table lookups,
even easier ones than Verhoeff: just one table to iterate and no inverse.
3 Start with 0, end with 0:
[Wikipedia]

Navigating the Damm Table
1 Starting with, e.g., ‘64353’ and r = 0, we go L R
🠖 (for a change!)
2 column 6 row 0 8
🠖
3 column 4 row 8 6
🠖
4 column 3 row 6 9
🠖
5 column 5 row 9 3
🠖
6 column 3 row 3 🠖 0
7 We have ended on r = 0
8 There’s no inverse or final lookup table, once we hit 0, it’s valid
9 The input ‘64353’ is therefore valid ✓
How do we manage without an inverse table? Because the 10x10 Latin Square we are
using for each iteration has a zero diagonal, when row r = column c, then r,c 0.
🠖
Consequently, the non-zero value returned on failed validation is also the check digit
that should be appended on generation. So, as with Verhoeff, just one method for
both validation and generation.

Damm Smalltalk (1/2)
"Damm algorithm; argument is [0-9][0-9]+, answer is integer in 0..9"
57245724 asString inject: 0 into: [ :row :col |
#[ 0 3 1 7 5 9 8 6 4 2
7 0 9 2 1 5 4 8 6 3
4 2 0 6 8 7 1 3 5 9
1 7 5 0 9 8 3 4 2 6
6 1 2 3 0 4 5 9 7 8
3 6 7 4 2 0 9 5 8 1
5 8 6 9 7 2 0 1 3 4
8 9 4 5 3 6 2 0 1 7
9 4 3 8 6 1 7 2 0 5
2 5 8 1 4 3 6 7 9 0 ] at: 10 * row + (col digitValue) + 1 ]
"NB: No error checking – SANITIZE INPUTS! Integers must be >9;
strings (or arrays &c.) MUST comprise only 0..9 (or $0..$9 &c.)"

Damm Smalltalk (2/2)
"Could use the same digitValue approach to store the table as a String"
'57245724' asString inject: 0 into: [ :row :col |
('03175986427092154863420687135917509834266123045978',
'36742095815869720134894536201794386172052581436790'
at: (10 * row + 1 + (col digitValue))) digitValue]
"Naturally we could store the table as a constant, but this hardly needs optimising."
| reader |
reader := testInput readStream.
[ reader next asString inject: 0 into: [ :row :col |
('03175986427092154863420687135917509834266123045978',
'36742095815869720134894536201794386172052581436790'
at: (10 * row + 1 + (col digitValue))) digitValue] ] bench2.
"'O.956 picoFortnights (1.156µs)'"
"It’s clear the Damm algorithm is very simple to code, and could
easily be done client-side (e.g. in Javascript) where required."

Damm Weaknesses?
1 Damm, like Verhoef, only works with decimal codes, but:
a but we already rejected non-decimal codes
2 Like Verhoeff, the algorithm is complex, but:
a the implementation is just table lookups: 1 per digit.
b the operations involved, unicode lookup, table lookup &c. are simple
c it’s pretty fast, ~2µs, and comparable to Luhn and Verhoeff
3 Like most checksum algorithms (but unlike Verhoeff) the algorithm
is insensitive to leading zeros, but:
a who wants to use leading zeros anyway?
4 There may be other weaknesses, so:
a how are you with totally antisymmetric quasigroups?
b draw statistical inferences from experimentation (later… )

Zero Redux: Un œuf is un œuf? (1/3)
1 Unlike the Verhoeff algorithm, the position of digits
(their index) is not used in the calculation
2 To calculate a check digit with Verhoeff, you process the
number with a 0 appended (i.e. the number × 10) and
the result is the check digit of the original number
3 But in Damm, the check digit is simply obtained by
applying the algorithm to the number itself
4 So in both Damm and Verhoeff a valid number returns 0
a But in Damm that is also the next check digit
b so Damm is insensitive to trailing zeros

Zero Redux: More than un œuf? (2/3)
1 For example, the check digit of ‘1’ is 3, giving 13, the
smallest “Damm number.” Running Damm on 13 yields
0, which, as with Verhoeff, indicates a correct checksum
2 But therefore 130 also has a correct Damm checksum,
as do 1300 and 13000 &c.
3 One possible defence is to simply ban trailing zero
a Using a dash to highlight the check digit, Part 14 would become
Part 14-9 but Part 21 would become part 21-0.
b This is not ideal if Part 2 has already become Part 2-1; so we
would just avoid having a payload (the “unchecked” raw
number) of 21.

Zero Redux: You had un œuf yet? (3/3)
1 … but …
2 Because the “interim digit” (r) in the Damm algorithm
starts with a value of 0, any Damm number followed by
another will also be a Damm number
3 So 13, 1313, 13013, 130013, 1300013 all checksum
4 So, Damm is “zero blind” anywhere before or after a
Damm number, including between them:
5 So in, e.g., ...13…515…37…6876… each ‘…’ could be
any number of zeros (including zero of them) without
the checksum failing.

Defense (1/2)
1 Go back to fixed width. But:
a I already chose this as a hill to die on
b It makes commonly used part numbers long
c It requires otherwise meaningful numbers to be padded
d It makes meaningful number changes hard to manage:
e.g. part nnn-14- can be replaced with part nnn-14-2-
2 Never mind leading or trailing, ban the zero! But:
a reduces the pool of part numbers
b may also impair choosing meaningful part numbers
– The strategy would ban numbers such as 2400 for no reason.
3 Live with it: no check-digit algorithm detects all errors
Defence against the Damm Zero

tl:dr;
1 Already have a check digit? Use that.
2 Already using EAN/GTIN article numbers? Use those.
3 Need alphanumerics, other symbols, or character sets
larger (or smaller) than decimal digits? Use Luhn.
4 Do you need maximum error resistance? Use Verhoeff.
5 Do you want effective but really simple? Use Damm
In a further presentation I will provide some statistical
analysis, some potential refinements, and my final
recommendations. “I see you shiver with anticip …”

Part Numbering and ID codes: general considerations and check digits

More Related Content

Similar to Part Numbering and ID codes: general considerations and check digits (20)

Recently uploaded (20)

Part Numbering and ID codes: general considerations and check digits