SlideShare a Scribd company logo
Part Numbers, Codes, IDs &c.
Let’s talk about how we refer to our parts
– and how it’s such an appropriate subject for
Smalltalk
john@ItsNotExactlyRocket.Science 2/50
I’m John H Woods, chronically online since 5BW (before Web).
So, first profile picture predates digital cameras: stuck my head
on a flatbed scanner. Fortunately technology has moved on!
1 First entered IT as a job after learning
Smalltalk during Biosciences PhD 1998
2 Have consulted widely over many
domains and much tech. since then.
3 (But, I have still never met language
I didn’t think was much, much worse
than Smalltalk.)
I’m johnhwoods on Discord, Threads, The Register, gmail, hotmail, wikipedia and elsewhere
1 but if to contact me specifically about this, click here: john@itsnotexactlyrocket.science
HELLO!
john@ItsNotExactlyRocket.Science 3/50
What makes a good part identification code?
john@ItsNotExactlyRocket.Science 4/50
What makes a good part identification code?
1 Unique in our organization
john@ItsNotExactlyRocket.Science 5/50
What makes a good part identification code?
1 Unique in our organization
2 Unambiguous for us, our partners, clients and suppliers
john@ItsNotExactlyRocket.Science 6/50
What makes a good part identification code?
1 Unique in our organization
2 Unambiguous for us, our partners, clients and suppliers
3 Simple, consistent logical structure
john@ItsNotExactlyRocket.Science 7/50
What makes a good part identification code?
1 Unique in our organization
2 Unambiguous for us, our partners, clients and suppliers
3 Simple, consistent logical structure
So far, so uncontroversial …
john@ItsNotExactlyRocket.Science 8/50
What makes a good part identification code?
1 Unique in our organization
2 Unambiguous for us, our partners, clients and suppliers
3 Simple, consistent logical structure
a Decimal only; I will die on this hill
b Variable width (different hill, but same)
c No leading zeros! And perhaps more restrictions …
d A single, transparent, punctuation mark to structure the digits
john@ItsNotExactlyRocket.Science 9/50
What makes a good part identification code?
1 Unique in our organization
2 Unambiguous for us, our partners, clients and suppliers
3 Simple, consistent logical structure
a Decimal only; I will die on this hill
b Variable width (different hill, but same)
c No leading zeros! And perhaps more restrictions …
d A single, transparent, punctuation mark to structure the digits
4 ABOVE ALL: Use a check digit! See you on Hill 3!
john@ItsNotExactlyRocket.Science 10/50
Common and good advice rejected
1 Grouping codes by compatibility categories
a embeds assembly details: a future-proofing nightmare
2 Using constant width fields
a contradicts meaningful structure (more later …)
b impacts scalability and future-proofing: exhaust a short
number range – or end up with inconveniently long ones
c commonly used part codes as lengthy as rare ones
3 Using meaningful alphanumeric codes
a embeds too much info, increases length and complexity
b ambiguity and locale issues with letters and acronyms
c impacts future-proofing as parts are replaced or retired
john@ItsNotExactlyRocket.Science 11/50
Hill 1: Only Use Decimal
1 Better internationalization (‘i18n’)
a understood even in CJK and RTL languages
b avoids confusion regarding meaning of letters and letter sequences
2 Reduction of errors
a Less ambiguity when written (O-D-0-o, 8-B, I-l-1, S-5, G-6-b, 2-Z-7, 9-q)
b Less ambiguity in spoken English (“oh” O-o-0, 8-A-H, S-F, M-N &c.)
c Easy character set (or … is it Hex, Base 36, 64, or other? CaSe SeNsItIvE?)
3 Better compatibility with other data systems
a Nearly everything accepts integers (31 bits gives over 2 billion numbers)
b Less comparison ambiguity (all part numbers are different unique integers)
4 Only numeric keys required: speed of entry / simplicity of devices
5 Improved choice of check-digit algorithms (more later …)
john@ItsNotExactlyRocket.Science 12/50
Surely you can’t be serious?
1 I am serious and don’t call me Shirley.
2 But decimal numbers are less compact than alphanumeric?
a Not much: 20% longer than Hex, 50% longer than base 36
b Just 10 digits is already literally billions of part numbers
3 But they’re harder to remember / don’t have meaning!
a That’s why we have computer systems… and a description column!
b People using them frequently will remember shorter codes, people
who don’t won’t need to, and will perform simple lookups as required
c Grouping can give numbers meaning. A woodworker, realizing all
planed timber is nnn-11-, will know what a nnn-11-2400-47-100 is;
a mechanic will be able to guess what a nnn-19-225-45 might be; and
a librarian is unlikely to have to look up 903.
john@ItsNotExactlyRocket.Science 13/50
Hill 2: Variable width
1 Allows for meaningful structure and therefore changes
a Version 2 of a nnn-101-234 can be nnn-101-234-2
b This is especially powerful with optional punctuation (next slide…)
2 Allows for shorter codes for commonly used parts
a With a check digit (see later…) there are ten two digit, a hundred
three digit, and a thousand four digit codes: 1110 short numbers
b Avoids short codes like 31 having meaningless leading zeros
(e.g. 00000031) or other prefixes (e.g. ‘30000031’)
3 The advantages of fixed width are now limited
a Sort order is unimportant – and antithetical to structure
b Error checking by check digit far superior to digit counting
c Output formatting is less burdensome on modern systems
john@ItsNotExactlyRocket.Science 14/50
thebenefitsofseparationwithpunctuation
1 Grouping with some form of punctuation
a improves readability with more than 5 or 6 digits (e.g. ‘the long card number’)
b can improve familiarization, allowing memorization or even meaning
c can allow the use of numbers that are already recognized, like tyre sizes
2 However
a spaces can wrap and other data systems can do weird things with them
b almost every other symbol has a meaning already, including commas, periods
and both slashes, and the ‘vertical bar’ | can look too much like 1, l or I.
c _ isn’t on a numeric keypad, generally needs <SHIFT>, and has a long name
3 The hyphen (-) is a good compromise (usually on the top corner of
a numeric keypad) and has a nice short name (‘dash’) in English.
NB Punctuation must be transparent so ‘101-234-2’ is 1012342
Integers transfer unambiguously between systems
john@ItsNotExactlyRocket.Science 15/50
Zero: The trouble with l'œuf
1 Leading zeros
a often get mangled by other systems
b destroy the valuable 1:1 relationship to integers
c are not required when we’ve already abandoned constant width
d are invisible to many check digit algorithms (usually by design)
2 And, all zeros
a can cause some Damm issues (this algorithm discussed later …)
b can appear dotted/slashed, 0, sometimes confused with 8
3 But, banning all zeros reduces
a the number of items representable with a given number of digits
b ability to use numbers with real meanings, such as 2400 for 2.4m
john@ItsNotExactlyRocket.Science 16/50
Hill 3: Check your digits – with Check Digits!
1 When humans enter or communicate data, they make errors:
a Letter / digit errors: 0 ⬌ O or I ⬌ 1 (N/A in decimal only codes)
b Single digit errors: 8 ⬌ 9 (and more keys are adjacent on a numpad)
c Simple transposition error: 98 ⬌ 89
d Twin error: 88 ⬌ 99
e Jump transposition error: 317 ⬌ 713
f Jump twin error: 313 ⬌ 414
g Phonetic: 60 ⬌ 16 (“sixty” ⬌ “sixteen”)
2 Simpler algorithms: easier and faster, but detect fewer errors
3 Let’s see how to get the maximum error detection when using
a single digit checksum …
john@ItsNotExactlyRocket.Science 17/50
Modulo 10 Sums
1 We could just sum all the digits modulo 10:
12345 = 1+2+3+4+5 = 15; divide by 10 is 1 remainder 5
Our new number would be 123455 (or 12345-5 etc.)
We can detect all single digit errors:
12445-5 = 1+2+4+4+5 = 16, modulo 10 is 6
but check digit is 5 and therefore so we know the number is invalid
This approach similarly detects insertions and deletions
– but not (most) transpositions, as the sum of the digits is unchanged
2 Smalltalk
('12345'
inject: 0 "start value; sum is accumulated"
into: [ :sum :char | sum + char digitValue ])  10
john@ItsNotExactlyRocket.Science 18/50
Weighted Sum, Modulo 11
1 To detect transpositions, we ‘weight’ the digits:
The ISBN-10 of the Blue Book is 0-201-11371-6
ISBN-10 uses weights of the digit position:
0×1 + 2×2 + 0×3 + 1×4 + 1×5 + 1×6 + 3×7 + 7×8 +1×9 + 6×10 = 165
165 modulo 11 is 0 and that means that the number is valid.
The rightmost digit is a check digit whose value is adjusted to
ensure the weighted sum of the digits is an exact multiple of 11
2 UK NHS Numbers use the same technique.
3 Note that the last digit might need a value of 10! So:
a In the NHS, the next available number is used
b In ISBN-10, the last digit is an X, e.g. 0-14-118260-X
(Who loves a Roman numerals joke? I, for one!)
john@ItsNotExactlyRocket.Science 19/50
Weighted Sum, Modulo 11 in Smalltalk (1/2)
"WARNING: There is no error checking in any of these snippets.
Initial checks must be done by Regex or manual format checking."
| weight |
weight:= 0. "left to right, weights are 1,2,3, ..."
('6088489006' inject: 0
into: [ :checksum :char |
(weight:= weight+1) * char digitValue + checksum ])  11
"or, with an additional method, the weight is the index (In
Smalltalk, arrays, strings &c. start at 1 – as nature intended!)"
('608848900' withIndexInject: 0
into: [ :checksum :char :idx |
idx * char digitValue + checksum ])  11
"The former returns 0 indicating validity. The latter, with a 9
character input, returns 6, which is the value of the check digit
needed to make the number valid, as in the former case.
john@ItsNotExactlyRocket.Science 20/50
"NB: the checksum can be 10, e.g. some editions of Strunk & White..."
('020530902' withIndexInject: 0
into: [ :checksum :char :idx |
idx * char digitValue + checksum ])  11
"Returns 10, which is why the original ISBN was 0-205-30902-X.
The UK NHS addresses this problem by simply not using any numbers
which would require a check “digit” character outside the range 0-9"
"But, for ISBN-10, we need to handle the occasional terminal X"
| isbn |
isbn:= '020530902X'.
(isbn allButLast withIndexInject: 0
into: [ :checksum :char :idx |
idx * char digitValue + checksum ])  11
= (isbn last = $X ifTrue: [ 10 ] ifFalse: [ isbn last digitValue ])
"Generating ISBN-10s is left as an exercise for the reader"
Weighted Sum, Modulo 11 in Smalltalk (2/2)
john@ItsNotExactlyRocket.Science 21/50
ISBN-13 is EAN-13
1 ISBN numbers migrated from 10 to 13 digits in 2007,
for compatibility with EAN (European Article Number),
now internationally known as GTIN (Global Trade Item
Number).
a The symbology includes only digits 0-9
therefore a modulo 10 algorithm is used.
b Counting from right to left, odd positions
are weighted 1, even positions weighted 3.
2 ISBN-10s are converted by prefixing 978 to the payload
and appending the new EAN-13 check digit.
3+2×3+0+9×3+0+3×3+5+0×3+2+0×3+8+7×3+9 = 90 ✓
john@ItsNotExactlyRocket.Science 22/50
EAN/GTIN in Smalltalk (1/2)
"reversed makes no difference in the GTIN-13 case as it has an odd number of
digits, but allows GTIN-8 and GTIN-18 with no need to check input parity"
"Note the index flip when moving from validation to generation"
| lastWeight |
lastWeight:= 3. "right to left, weights are 1,3,1,3 ..."
('1234512745657' reversed inject: 0 into: [ :sum :each |
each digitValue * (lastWeight:= 4-lastWeight) + sum ])  10
"Returns 0, which indicates the number is correct"
"To generate a check digit, run the algorithm on the `payload`
of the first 12 digits and subtract the result from 10"
| lastWeight |
lastWeight:= 1. "right to left, weights are 3,1,3,1 ..."
(10 - ('123451274565' reversed inject: 0 into: [ :sum :each |
each digitValue * (lastWeight:= 4-lastWeight) + sum ])  10)  10
"Returns 7 as expected from the above. If this returns 10,
the check digit is 0, hence the additional  10 at the end"
"NB: without the check digit, our first digit is even, not odd."
john@ItsNotExactlyRocket.Science 23/50
EAN/GTIN in Smalltalk (2/2)
"EAN transposition check shows its most important weakness"
| lastWeight |
lastWeight:= 3.
'1235412745657' reversed inject: 0 into: [ :sum :each |
each digitValue * (lastWeight:= 4-lastWeight) + sum  10 ]
"Returns 2, which proves the number is incorrect"
1 "But, if the transposed digits differ by 5, the error goes undetected:
x + 3(x+5) = 4x+15 and 3x + (x+5) = 4x+5, which are congruent modulo 10"
| lastWeight |
lastWeight:= 3.
'1234517245657' reversed inject: 0 into: [ :sum :each |
each digitValue * (lastWeight:= 4-lastWeight) + sum  10 ]
"Returns 0 indicating validity. However, such transpositions are less
frequent: digits different by 5 are not adjacent on either the number row or
standard numeric keypads (whether calculator style or telephone style)"
john@ItsNotExactlyRocket.Science 24/50
Luhn
1 Patented in 1960 but now in the public domain, Luhn is
used for: credit card numbers; IMEIs (mobile handsets);
European Patent Application numbers; ICCIDs (SIM cards);
several government IDs from round the world; &c.
2 Another “sum digits with alternating weights modulo 10”
method; the weights, again starting from the right,
alternate 1 and 2, rather than EAN/GTIN’s 1 and 3.
3 However, where doubling a digit gives a two digit result,
we use the sum of the digits of the answer.
4 So, when in even digit positions, 0,1,2,3,4 yield 0,2,4,6,8
– but 5,6,7,8,9 yield 1,3,5,7,9 – not 10,12,14,16,18.
john@ItsNotExactlyRocket.Science 25/50
Luhn Example
1 To validate: 8905543677
2 Going right to left every other digit we get
7+14+6+6+4+10+5+0+9+16
3 Replacing all two digit results with their digit sum yields
7+5+6+6+4+1+5+0+9+7 = 50 mod 10 = 0.
4 As with GTIN, if the sum is a multiple of 10, the number is
valid; and, as with GTIN, this is effectively the same operation as
dropping the check digit, calculating a check digit from the
payload, and comparing it to the existing check digit.
5 As with GTIN, to generate a check digit, we must flip the weights
as the now missing check digit was digit 1, and we start by
doubling the first digit as we go right to left.
john@ItsNotExactlyRocket.Science 26/50
Luhn Smalltalk (1/5) – digits and characters
"How to sum digits of a two digit number?"
14 printString fold: [:a :b | a digitValue + b digitValue]
"Or just add the tens and the units"
(0 to: 9) collect: [:n | | d |
d:= n*2.
d // 10 + d 10 ]
"#(0 2 4 6 8 1 3 5 7 9)"
"Or, once we have that result, just use it as a lookup table"
(0 to: 9) collect: [:n | #(0 2 4 6 8 1 3 5 7 9) at: 1+n"
"Also, by using digitValue the argument can be any Unicode characters that
represent the decimal digits, e.g. Sinhala"
'෦෧෨෩෪෫෬෭෮෯' asArray collect: [ :char | char digitValue ]
"#(0 1 2 3 4 5 6 7 8 9)"
NB: We did all the input sanitization before, right? Right?"
john@ItsNotExactlyRocket.Science 27/50
Luhn Smalltalk (2/5) – using a lookup table
"This uses alternating offsets of 11 and 1 with a lookup table. The validation
code, highlighted, is only four lines. This Snippet will return any invalid
numbers in the array. All the inputs in the example below are valid."
| luhn |
luhn:= #[ 0 1 2 3 4 5 6 7 8 9
0 2 4 6 8 1 3 5 7 9 ].
#(18 #[3 7 2] '٥٦٤٨٧٢' "formats: integer, byte array, unicode string"
8905543677 "my example from earlier slide"
17893729974 "the example from Wikipedia"
1234567812345670 "the example from Rosetta Code"
'3566002020360505' "an example Credit Card number, spaces removed"
339269927028426 "an example IMEI number"
) reject: [ :input | | index |
index:=11.
(input asString reversed inject:0 into:[:sum :digit || codePoint |
codePoint:= digit asInteger & 15.
sum + (luhn at:(index:= 12-index)+codePoint) ])10=0]
john@ItsNotExactlyRocket.Science 28/50
Luhn Smalltalk (3/5) – tables beyond 0 to 9
"Unlike the other single digit check digit algorithms here, Luhn can be
generalized to larger (or smaller) character sets with an even number of
symbols. Lookup table generation translates to Smalltalk as follows:"
| charset base luhn |
charset:= '0123456789ABCDEFGHJK'. "works with any symbols, in any order"
base:= charset size.
(base2) isZero
ifFalse: [ self error:'Luhn character set must have even size' ].
luhn:= OrderedCollection new.
luhn addAll: (0 to:base-1).
0 to: base-1 do: [:digit || double |
double:= 2 * digit.
luhn add: double//base + doublebase ].
luhn asByteArray
"#[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
0 2 4 6 8 10 12 14 16 18 1 3 5 7 9 11 13 15 17 19 ]"
e.g.: 14 ×2 = 28
which is 18 in base 20
so digit sum is 9
Note: a nice
easy pattern
is emerging
john@ItsNotExactlyRocket.Science 29/50
Luhn Smalltalk (4/5) – validation modulo N
"Note table construction and validation code. Snippet takes an array and
returns an array of those elements which are invalid. Lookup table could be
cached but typically construction is 2-5µs, with each check 1-2µs "
| base charset luhn |
base:= (charset := 'abcdef') size.
luhn:= ((0 to:base-1),(0 to:base-2 by:2),(1 to:base by:2)) asByteArray.
#( 'abcdefe' 'abcb' 'defc' )
reject:[:input || index |
index:= 1+base.
(input asString reversed
inject: 0
into: [:sum :digit || codePoint |
codePoint:= (charset indexOf: digit)-1.
sum + (luhn at: (index:= 2+base-index) + codePoint) ])base=0]
"NB: In design, ensure the charset size is even, with no repetition"
"NB: In execution, ensure that each argument, after removing allowed
punctuation, comprises only characters from the character set"
john@ItsNotExactlyRocket.Science 30/50
Luhn Smalltalk (5/5) – generation modulo N
"Because (m-(x mod m) mod m) simplifies to (-x) mod m, the generation
method (again note that the index is flipped) is as follows ..."
| base charset luhn |
base:= (charset := ' ✓ ') size.
😎☀ ⚙💩⚡
luhn:= ((0 to:base-1),(0 to:base-2 by:2),(1 to:base by:2)) asByteArray.
#( ' ✓ ' ' ✓' ' ' )
😎☀ ⚙💩⚡ 😎☀ ⚙💩⚡
collect:[:input || index |
index:= 1.
input asString, (charset at: 1 + ((input asString reversed
inject: 0
into: [:sum :digit || codePoint |
codePoint:= (charset indexOf: digit)-1.
sum + (luhn at: (index:= 2+base-index) + codePoint) ])
negated  base)) asString ]
"#(' ✓ ' ' ✓ ' ' ✓')"
😎☀ ⚙💩⚡💩 😎☀ ☀ ⚙💩⚡
"Check digits Pile of Poo Sun With Rays and Check Mark are appended"
very little code
is required to
generate
Luhn mod N
check digits
john@ItsNotExactlyRocket.Science 31/50
Luhn Weaknesses
1 Twin errors:
22 2 + 4
🠖 6
🠖 🠔 5 + 1 5 + 10
🠔 🠔 55
33 3 + 6
🠖 9
🠖 🠔 6 + 3 6 + 12
🠔 🠔 66
44 4 + 8
🠖 12
🠖 🠔 7 + 5 7 + 14
🠔 🠔 77
Unfortunately all three of these errors comprise adjacent keys
on a numeric keypad; less likely to be typed on a number row
2 Transposition error:
90 9 + 0
🠖 9
🠖 🠔 0 + 9 0 + 18
🠔 🠔 09
Not only adjacent on number row, but at the extreme right,
both conventionally operated by digitus minimus dexter 1
1
right pinky finger
john@ItsNotExactlyRocket.Science 32/50
Luhn Conclusions
1 Works with any symbol set
a there must be an even number of symbols
– no problem for octal, decimal, hexadecimal and the Latin alphabet
b but symbol codes do not need to be contiguous or in any particular order
2 Allows codes as short as 2 symbols, and up to arbitrary length.
3 Easy to implement, especially for the standard case of digits 0..9
4 Fast: typically lookup table construction 2-5µs; then validation < 1µs
5 Widely used, well-understood and still extremely useful
6 Has some unfortunate weaknesses
a 60% of twin errors undetected, and easy to key on a numpad
b cannot detect transpositions of the end digits in its character set
– and this is particularly relevant with the 0..9 case
john@ItsNotExactlyRocket.Science 33/50
Half Time!
1 Smalltalk is great! Any dissent to be handled later – in the car park 😛
2 Some codes have double digit checksums (checksa?)
a e.g. International banking codes: IBAN uses a modulo 97 algorithm
b possibly overkill for part numbering, we’ll stick to single digits
3 Despite the appeal of Luhn Mod N, decimal codes remain optimal
a but weaknesses of Luhn 0..9 make it subobtimal for part numbering
b and countering the weakness by manually preventing the creation of
numbers with 09 or 90, or by using the digits out of order, e.g.
1234567890, would result in non-standard implementations and
possibly maintainability issues
4 Advantages over later algorithms turn out to be underwhelming:
a It’s not that much faster
b It’s not that much simpler
john@ItsNotExactlyRocket.Science 34/50
Verhoeff
1 In 1969 Jacobus Verhoeff presented the first decimal check digit algorithm
which detects all adjacent transposition errors and all single digit errors,
despite this having been thought (even mistakenly ‘proved’) impossible.
2 From some eye-watering maths, he created three tables: p, d, and inv.
[Wikipedia]
john@ItsNotExactlyRocket.Science 35/50
Navigating the Verhoeff Tables: Validation
1 Starting with, e.g., ‘3572’ and c = 0, we go L R
🠔 from position 0 (the check digit)
2 The permutation table, p, gives us p(0,2) (0 is position modulo 8, 2 is the digit) = 2
3 The 2 goes into the multiplicative table, d, with the current c, giving c = d(0,2) = 2
4 Moving left: p(1,7) = 0; c=d(2,0) = 2
5 Moving left: p(2,5) = 9; c=d(2,9) = 6
6 Moving left: p(3,3) = 6; c=d(6,6) = 0
7 We have ended on c = 0
8 We look up 0 in the inverse table, inv, and get 0
9 The number 3572 is therefore valid ✓
What is the point of step 8? The only 0 in inv maps to 0.
Surely we could just ignore the inverse table for validation?
We can, but we won’t – and the generation method tells us why…
john@ItsNotExactlyRocket.Science 36/50
Navigating the Verhoeff Tables: Generation
1 Starting with, e.g., ‘3570’ and c = 0, we go L R
🠔 from position 0 (the check digit)
2 The permutation table, p, gives us p(0,0) (0 is position modulo 8, 0 is the digit) = 0
3 The 0 goes into the multiplicative table, d, with the current c, giving c = d(0,0) = 0
4 Moving left: p(1,7) = 0; c=d(0,0) = 0
5 Moving left: p(2,5) = 9; c=d(0,9) = 9
6 Moving left: p(3,3) = 6; c=d(9,6) = 3
7 We have ended on c = 3
8 We look up 3 in the inverse table, inv, and get 2
9 The number 357 therefore needs the check digit 2 to become 3572 as previous slide
So, to generate a check digit we suffix a zero1
to our input and
use the same method – and the result is our new check digit.
1
This is equivalent to multiplying the equivalent integer by 10.
john@ItsNotExactlyRocket.Science 37/50
Verhoeff in Smalltalk (1/3) - lookups
Recalling Arrays start at 1, we have the following lookup methods:
privateInvTableAt: j
^ #[ 0 4 3 2 1 5 6 7 8 9 ] at: j+1
privatePtableRowAt: r column: c
^ ( #( #[ 0 1 2 3 4 5 6 7 8 9 ]
#[ 1 5 7 6 2 8 3 0 9 4 ]
#[ 5 8 0 3 7 9 6 1 4 2 ]
#[ 8 9 1 6 0 4 3 5 2 7 ]
#[ 9 4 5 3 1 2 6 8 7 0 ]
#[ 4 2 8 6 5 7 3 9 0 1 ]
#[ 2 7 9 3 8 0 6 4 1 5 ]
#[ 7 0 4 6 9 1 3 2 5 8 ] )
at: r+1)
at: c+1
privateDtableRowAt: r column: c
"as above but with d(j,k) table”
allowedCharacters
^Array new
,'0123456789' "Western Arabic”
,'٠١٢٣٤٥٦٧٨٩' "Eastern Arabic”
,'०१२३४५६७८९' "Devanagari"
,'০১২৩৪৫৬৭৮৯' "Bengali"
,'௦௧௨௩௪௫௬௭௮௯' "Tamil"
,'๐๑๒๓๔๕๖๗๘๙' "Thai"
,'၀၁၂၃၄၅၆၇၈၉' "Myanmar"
,'෦෧෨෩෪෫෬෭෮෯'"Sinhala"
allowedSeparators
^Array new
with: $-
with: Character space
john@ItsNotExactlyRocket.Science 38/50
Verhoeff in Smalltalk (2/4) - API
Private method,
privateVerhoeff: digits
"NB: private – no error checking
argument is any SequenceableCollection
of integers in (0 to:9); returns a
single Integer in that range."
| pos c |
pos := c := 0.
digits reverseDo: [:n | | p |
p := self pTableAtRow: pos col: n.
c := self dTableAtRow: c col: p.
pos := pos + 1  8 ].
^ self invTableAt: c
Private
VerhoeffString: aString
"Takes unpunctuated string of Unicode
with digitValue (0 to:9), returns a
single Integer in that range."
aString isString
ifFalse: [ self error: '...' ].
aString size > 1
ifFalse: [ self error: '...’ ].
aString reject: [ :char |
self allowedCharacters
includes:char]) isEmpty
ifFalse: ['...’ ].
aString first digitValue > 0
ifFalse: [self error: '...' ].
^ self privateVerhoeff:(aString asArray
collect: [ :char | char digitValue ])
Public method,
john@ItsNotExactlyRocket.Science 39/50
Verhoeff in Smalltalk (3/3) – in one block
"Verhoeff algorithm; argument is [0-9][0-9]+, answer is integer in 0..9"
| c pos |
pos:= c:= 0. "checksum and position (right to left) start at zero"
3572 asString reversed do: [ :digit |
c := (#( #( 0 1 2 3 4 5 6 7 8 9 ) #( 1 2 3 4 0 6 7 8 9 5 )
#( 2 3 4 0 1 7 8 9 5 6 ) #( 3 4 0 1 2 8 9 5 6 7 )
#( 4 0 1 2 3 9 5 6 7 8 ) #( 5 9 8 7 6 0 4 3 2 1 )
#( 6 5 9 8 7 1 0 4 3 2 ) #( 7 6 5 9 8 2 1 0 4 3 )
#( 8 7 6 5 9 3 2 1 0 4 ) #( 9 8 7 6 5 4 3 2 1 0 ) ) at:1+c)
at:1+((#( #( 0 1 2 3 4 5 6 7 8 9 ) #( 1 5 7 6 2 8 3 0 9 4 )
#( 5 8 0 3 7 9 6 1 4 2 ) #( 8 9 1 6 0 4 3 5 2 7 )
#( 9 4 5 3 1 2 6 8 7 0 ) #( 4 2 8 6 5 7 3 9 0 1 )
#( 2 7 9 3 8 0 6 4 1 5 ) #( 7 0 4 6 9 1 3 2 5 8 ) )
at:1+pos) at: digit digitValue + 1).
pos:= pos + 1  8 ].
#( 0 4 3 2 1 5 6 7 8 9 ) at:1+c
"NB: No error checking – SANITIZE INPUTS! Integers must be >9;
strings (or arrays &c.) MUST comprise only 0..9 (or $0..$9 &c.)"
john@ItsNotExactlyRocket.Science 40/50
Verhoeff Weaknesses?
1 Verhoeff only works with decimal codes, but:
a but we already rejected non-decimal codes.
2 The algorithm is complex, but:
a the implementation is just table lookups: 2 per digit and 1 more
b the operations involved, unicode lookup, table lookup &c. are simple
c it’s pretty fast, ~2µs, and comparable to Luhn
3 Unlike most checksum algorithms, it is sensitive to leading zeros, but:
a that’s actually useful for those who do use leading zeros
b hopefully I’ve persuaded you that isn’t going to be us!
4 There may be other weaknesses, so:
a find someone with a better understanding of dihedral groups
b draw statistical inferences from experimentation (later… )
john@ItsNotExactlyRocket.Science 41/50
Damm
1 In 2004 Michael Damm published this algorithm in his PhD on
Totally Antisymmetric Quasigroups … Yes, me neither!
2 Again the underlying maths is complex but translates to table lookups,
even easier ones than Verhoeff: just one table to iterate and no inverse.
3 Start with 0, end with 0:
[Wikipedia]
john@ItsNotExactlyRocket.Science 42/50
Navigating the Damm Table
1 Starting with, e.g., ‘64353’ and r = 0, we go L R
🠖 (for a change!)
2 column 6 row 0 8
🠖
3 column 4 row 8 6
🠖
4 column 3 row 6 9
🠖
5 column 5 row 9 3
🠖
6 column 3 row 3 🠖 0
7 We have ended on r = 0
8 There’s no inverse or final lookup table, once we hit 0, it’s valid
9 The input ‘64353’ is therefore valid ✓
How do we manage without an inverse table? Because the 10x10 Latin Square we are
using for each iteration has a zero diagonal, when row r = column c, then r,c 0.
🠖
Consequently, the non-zero value returned on failed validation is also the check digit
that should be appended on generation. So, as with Verhoeff, just one method for
both validation and generation.
john@ItsNotExactlyRocket.Science 43/50
Damm Smalltalk (1/2)
"Damm algorithm; argument is [0-9][0-9]+, answer is integer in 0..9"
57245724 asString inject: 0 into: [ :row :col |
#[ 0 3 1 7 5 9 8 6 4 2
7 0 9 2 1 5 4 8 6 3
4 2 0 6 8 7 1 3 5 9
1 7 5 0 9 8 3 4 2 6
6 1 2 3 0 4 5 9 7 8
3 6 7 4 2 0 9 5 8 1
5 8 6 9 7 2 0 1 3 4
8 9 4 5 3 6 2 0 1 7
9 4 3 8 6 1 7 2 0 5
2 5 8 1 4 3 6 7 9 0 ] at: 10 * row + (col digitValue) + 1 ]
"NB: No error checking – SANITIZE INPUTS! Integers must be >9;
strings (or arrays &c.) MUST comprise only 0..9 (or $0..$9 &c.)"
john@ItsNotExactlyRocket.Science 44/50
Damm Smalltalk (2/2)
"Could use the same digitValue approach to store the table as a String"
'57245724' asString inject: 0 into: [ :row :col |
('03175986427092154863420687135917509834266123045978',
'36742095815869720134894536201794386172052581436790'
at: (10 * row + 1 + (col digitValue))) digitValue]
"Naturally we could store the table as a constant, but this hardly needs optimising."
| reader |
reader := testInput readStream.
[ reader next asString inject: 0 into: [ :row :col |
('03175986427092154863420687135917509834266123045978',
'36742095815869720134894536201794386172052581436790'
at: (10 * row + 1 + (col digitValue))) digitValue] ] bench2.
"'O.956 picoFortnights (1.156µs)'"
"It’s clear the Damm algorithm is very simple to code, and could
easily be done client-side (e.g. in Javascript) where required."
john@ItsNotExactlyRocket.Science 45/50
Damm Weaknesses?
1 Damm, like Verhoef, only works with decimal codes, but:
a but we already rejected non-decimal codes
2 Like Verhoeff, the algorithm is complex, but:
a the implementation is just table lookups: 1 per digit.
b the operations involved, unicode lookup, table lookup &c. are simple
c it’s pretty fast, ~2µs, and comparable to Luhn and Verhoeff
3 Like most checksum algorithms (but unlike Verhoeff) the algorithm
is insensitive to leading zeros, but:
a who wants to use leading zeros anyway?
4 There may be other weaknesses, so:
a how are you with totally antisymmetric quasigroups?
b draw statistical inferences from experimentation (later… )
john@ItsNotExactlyRocket.Science 46/50
Zero Redux: Un œuf is un œuf? (1/3)
1 Unlike the Verhoeff algorithm, the position of digits
(their index) is not used in the calculation
2 To calculate a check digit with Verhoeff, you process the
number with a 0 appended (i.e. the number × 10) and
the result is the check digit of the original number
3 But in Damm, the check digit is simply obtained by
applying the algorithm to the number itself
4 So in both Damm and Verhoeff a valid number returns 0
a But in Damm that is also the next check digit
b so Damm is insensitive to trailing zeros
john@ItsNotExactlyRocket.Science 47/50
Zero Redux: More than un œuf? (2/3)
1 For example, the check digit of ‘1’ is 3, giving 13, the
smallest “Damm number.” Running Damm on 13 yields
0, which, as with Verhoeff, indicates a correct checksum
2 But therefore 130 also has a correct Damm checksum,
as do 1300 and 13000 &c.
3 One possible defence is to simply ban trailing zero
a Using a dash to highlight the check digit, Part 14 would become
Part 14-9 but Part 21 would become part 21-0.
b This is not ideal if Part 2 has already become Part 2-1; so we
would just avoid having a payload (the “unchecked” raw
number) of 21.
john@ItsNotExactlyRocket.Science 48/50
Zero Redux: You had un œuf yet? (3/3)
1 … but …
2 Because the “interim digit” (r) in the Damm algorithm
starts with a value of 0, any Damm number followed by
another will also be a Damm number
3 So 13, 1313, 13013, 130013, 1300013 all checksum
4 So, Damm is “zero blind” anywhere before or after a
Damm number, including between them:
5 So in, e.g., ...13…515…37…6876… each ‘…’ could be
any number of zeros (including zero of them) without
the checksum failing.
john@ItsNotExactlyRocket.Science 49/50
Defense (1/2)
1 Go back to fixed width. But:
a I already chose this as a hill to die on
b It makes commonly used part numbers long
c It requires otherwise meaningful numbers to be padded
d It makes meaningful number changes hard to manage:
e.g. part nnn-14- can be replaced with part nnn-14-2-
2 Never mind leading or trailing, ban the zero! But:
a reduces the pool of part numbers
b may also impair choosing meaningful part numbers
– The strategy would ban numbers such as 2400 for no reason.
3 Live with it: no check-digit algorithm detects all errors
Defence against the Damm Zero
john@ItsNotExactlyRocket.Science 50/50
tl:dr;
1 Already have a check digit? Use that.
2 Already using EAN/GTIN article numbers? Use those.
3 Need alphanumerics, other symbols, or character sets
larger (or smaller) than decimal digits? Use Luhn.
4 Do you need maximum error resistance? Use Verhoeff.
5 Do you want effective but really simple? Use Damm
In a further presentation I will provide some statistical
analysis, some potential refinements, and my final
recommendations. “I see you shiver with anticip …”

More Related Content

PDF
Unit 1 Class Notes Printed digital Electronics.pdf
PDF
Binary_Codes.pdfhfkhvfkdhvbhfvfdhfgffhfdhyf
PPS
Nossi Ch 1
PPTX
3.codes( binary code ,excess 3, gray code )
PPTX
DIGITAL DESIGN
PPTX
Digital principles basic
PPT
Lecture 01
PDF
digital-180612132737.pdf
Unit 1 Class Notes Printed digital Electronics.pdf
Binary_Codes.pdfhfkhvfkdhvbhfvfdhfgffhfdhyf
Nossi Ch 1
3.codes( binary code ,excess 3, gray code )
DIGITAL DESIGN
Digital principles basic
Lecture 01
digital-180612132737.pdf

Similar to Part Numbering and ID codes: general considerations and check digits (20)

PPT
Fundamentals of Electrical and electronic engineering Binary code,grey code,o...
PPT
cfe2-d-35-binary-code-powerpoint_ver_1.ppt
PDF
Number system
PPTX
Numbers Numbers Everywhere and example of how algorithms exist right under ou...
PDF
Number Systems CPSC125
PPT
Number Systems
PDF
PDF
Lecture 5 binary_codes
PPTX
Binary and EC codes
PPTX
Lecture 7 Data Representation (1).pptx for computer organization and architec...
PPTX
Unit 1 PDF.pptx
PPTX
COMPUTER AWARENESS.pptx
PDF
form-3-computer studies summarized NOTES.pdf
PPTX
1173_237_747_Module_2_Part1__Basics_of_C_programming__1_.pptx
PPTX
Coding Schemes related dld lecture it is dld
PPTX
Presentation of ICT.ppt.pptx
PPTX
Data representation
PPTX
ComEd101 - Chapter 2 Lesson 2.pptx
PDF
Buacm 3
PPT
Binary Conversion
Fundamentals of Electrical and electronic engineering Binary code,grey code,o...
cfe2-d-35-binary-code-powerpoint_ver_1.ppt
Number system
Numbers Numbers Everywhere and example of how algorithms exist right under ou...
Number Systems CPSC125
Number Systems
Lecture 5 binary_codes
Binary and EC codes
Lecture 7 Data Representation (1).pptx for computer organization and architec...
Unit 1 PDF.pptx
COMPUTER AWARENESS.pptx
form-3-computer studies summarized NOTES.pdf
1173_237_747_Module_2_Part1__Basics_of_C_programming__1_.pptx
Coding Schemes related dld lecture it is dld
Presentation of ICT.ppt.pptx
Data representation
ComEd101 - Chapter 2 Lesson 2.pptx
Buacm 3
Binary Conversion
Ad

Recently uploaded (20)

PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PDF
Machine learning based COVID-19 study performance prediction
PPTX
Big Data Technologies - Introduction.pptx
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PDF
Spectral efficient network and resource selection model in 5G networks
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PDF
NewMind AI Weekly Chronicles - August'25-Week II
PDF
Building Integrated photovoltaic BIPV_UPV.pdf
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PPTX
Programs and apps: productivity, graphics, security and other tools
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PPT
Teaching material agriculture food technology
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PPTX
A Presentation on Artificial Intelligence
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PPTX
Machine Learning_overview_presentation.pptx
PPTX
Spectroscopy.pptx food analysis technology
Dropbox Q2 2025 Financial Results & Investor Presentation
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Machine learning based COVID-19 study performance prediction
Big Data Technologies - Introduction.pptx
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
Spectral efficient network and resource selection model in 5G networks
Digital-Transformation-Roadmap-for-Companies.pptx
NewMind AI Weekly Chronicles - August'25-Week II
Building Integrated photovoltaic BIPV_UPV.pdf
Per capita expenditure prediction using model stacking based on satellite ima...
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Programs and apps: productivity, graphics, security and other tools
Mobile App Security Testing_ A Comprehensive Guide.pdf
Teaching material agriculture food technology
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
Reach Out and Touch Someone: Haptics and Empathic Computing
A Presentation on Artificial Intelligence
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
Machine Learning_overview_presentation.pptx
Spectroscopy.pptx food analysis technology
Ad

Part Numbering and ID codes: general considerations and check digits

  • 1. Part Numbers, Codes, IDs &c. Let’s talk about how we refer to our parts – and how it’s such an appropriate subject for Smalltalk
  • 2. john@ItsNotExactlyRocket.Science 2/50 I’m John H Woods, chronically online since 5BW (before Web). So, first profile picture predates digital cameras: stuck my head on a flatbed scanner. Fortunately technology has moved on! 1 First entered IT as a job after learning Smalltalk during Biosciences PhD 1998 2 Have consulted widely over many domains and much tech. since then. 3 (But, I have still never met language I didn’t think was much, much worse than Smalltalk.) I’m johnhwoods on Discord, Threads, The Register, gmail, hotmail, wikipedia and elsewhere 1 but if to contact me specifically about this, click here: john@itsnotexactlyrocket.science HELLO!
  • 3. john@ItsNotExactlyRocket.Science 3/50 What makes a good part identification code?
  • 4. john@ItsNotExactlyRocket.Science 4/50 What makes a good part identification code? 1 Unique in our organization
  • 5. john@ItsNotExactlyRocket.Science 5/50 What makes a good part identification code? 1 Unique in our organization 2 Unambiguous for us, our partners, clients and suppliers
  • 6. john@ItsNotExactlyRocket.Science 6/50 What makes a good part identification code? 1 Unique in our organization 2 Unambiguous for us, our partners, clients and suppliers 3 Simple, consistent logical structure
  • 7. john@ItsNotExactlyRocket.Science 7/50 What makes a good part identification code? 1 Unique in our organization 2 Unambiguous for us, our partners, clients and suppliers 3 Simple, consistent logical structure So far, so uncontroversial …
  • 8. john@ItsNotExactlyRocket.Science 8/50 What makes a good part identification code? 1 Unique in our organization 2 Unambiguous for us, our partners, clients and suppliers 3 Simple, consistent logical structure a Decimal only; I will die on this hill b Variable width (different hill, but same) c No leading zeros! And perhaps more restrictions … d A single, transparent, punctuation mark to structure the digits
  • 9. john@ItsNotExactlyRocket.Science 9/50 What makes a good part identification code? 1 Unique in our organization 2 Unambiguous for us, our partners, clients and suppliers 3 Simple, consistent logical structure a Decimal only; I will die on this hill b Variable width (different hill, but same) c No leading zeros! And perhaps more restrictions … d A single, transparent, punctuation mark to structure the digits 4 ABOVE ALL: Use a check digit! See you on Hill 3!
  • 10. john@ItsNotExactlyRocket.Science 10/50 Common and good advice rejected 1 Grouping codes by compatibility categories a embeds assembly details: a future-proofing nightmare 2 Using constant width fields a contradicts meaningful structure (more later …) b impacts scalability and future-proofing: exhaust a short number range – or end up with inconveniently long ones c commonly used part codes as lengthy as rare ones 3 Using meaningful alphanumeric codes a embeds too much info, increases length and complexity b ambiguity and locale issues with letters and acronyms c impacts future-proofing as parts are replaced or retired
  • 11. john@ItsNotExactlyRocket.Science 11/50 Hill 1: Only Use Decimal 1 Better internationalization (‘i18n’) a understood even in CJK and RTL languages b avoids confusion regarding meaning of letters and letter sequences 2 Reduction of errors a Less ambiguity when written (O-D-0-o, 8-B, I-l-1, S-5, G-6-b, 2-Z-7, 9-q) b Less ambiguity in spoken English (“oh” O-o-0, 8-A-H, S-F, M-N &c.) c Easy character set (or … is it Hex, Base 36, 64, or other? CaSe SeNsItIvE?) 3 Better compatibility with other data systems a Nearly everything accepts integers (31 bits gives over 2 billion numbers) b Less comparison ambiguity (all part numbers are different unique integers) 4 Only numeric keys required: speed of entry / simplicity of devices 5 Improved choice of check-digit algorithms (more later …)
  • 12. john@ItsNotExactlyRocket.Science 12/50 Surely you can’t be serious? 1 I am serious and don’t call me Shirley. 2 But decimal numbers are less compact than alphanumeric? a Not much: 20% longer than Hex, 50% longer than base 36 b Just 10 digits is already literally billions of part numbers 3 But they’re harder to remember / don’t have meaning! a That’s why we have computer systems… and a description column! b People using them frequently will remember shorter codes, people who don’t won’t need to, and will perform simple lookups as required c Grouping can give numbers meaning. A woodworker, realizing all planed timber is nnn-11-, will know what a nnn-11-2400-47-100 is; a mechanic will be able to guess what a nnn-19-225-45 might be; and a librarian is unlikely to have to look up 903.
  • 13. john@ItsNotExactlyRocket.Science 13/50 Hill 2: Variable width 1 Allows for meaningful structure and therefore changes a Version 2 of a nnn-101-234 can be nnn-101-234-2 b This is especially powerful with optional punctuation (next slide…) 2 Allows for shorter codes for commonly used parts a With a check digit (see later…) there are ten two digit, a hundred three digit, and a thousand four digit codes: 1110 short numbers b Avoids short codes like 31 having meaningless leading zeros (e.g. 00000031) or other prefixes (e.g. ‘30000031’) 3 The advantages of fixed width are now limited a Sort order is unimportant – and antithetical to structure b Error checking by check digit far superior to digit counting c Output formatting is less burdensome on modern systems
  • 14. john@ItsNotExactlyRocket.Science 14/50 thebenefitsofseparationwithpunctuation 1 Grouping with some form of punctuation a improves readability with more than 5 or 6 digits (e.g. ‘the long card number’) b can improve familiarization, allowing memorization or even meaning c can allow the use of numbers that are already recognized, like tyre sizes 2 However a spaces can wrap and other data systems can do weird things with them b almost every other symbol has a meaning already, including commas, periods and both slashes, and the ‘vertical bar’ | can look too much like 1, l or I. c _ isn’t on a numeric keypad, generally needs <SHIFT>, and has a long name 3 The hyphen (-) is a good compromise (usually on the top corner of a numeric keypad) and has a nice short name (‘dash’) in English. NB Punctuation must be transparent so ‘101-234-2’ is 1012342 Integers transfer unambiguously between systems
  • 15. john@ItsNotExactlyRocket.Science 15/50 Zero: The trouble with l'œuf 1 Leading zeros a often get mangled by other systems b destroy the valuable 1:1 relationship to integers c are not required when we’ve already abandoned constant width d are invisible to many check digit algorithms (usually by design) 2 And, all zeros a can cause some Damm issues (this algorithm discussed later …) b can appear dotted/slashed, 0, sometimes confused with 8 3 But, banning all zeros reduces a the number of items representable with a given number of digits b ability to use numbers with real meanings, such as 2400 for 2.4m
  • 16. john@ItsNotExactlyRocket.Science 16/50 Hill 3: Check your digits – with Check Digits! 1 When humans enter or communicate data, they make errors: a Letter / digit errors: 0 ⬌ O or I ⬌ 1 (N/A in decimal only codes) b Single digit errors: 8 ⬌ 9 (and more keys are adjacent on a numpad) c Simple transposition error: 98 ⬌ 89 d Twin error: 88 ⬌ 99 e Jump transposition error: 317 ⬌ 713 f Jump twin error: 313 ⬌ 414 g Phonetic: 60 ⬌ 16 (“sixty” ⬌ “sixteen”) 2 Simpler algorithms: easier and faster, but detect fewer errors 3 Let’s see how to get the maximum error detection when using a single digit checksum …
  • 17. john@ItsNotExactlyRocket.Science 17/50 Modulo 10 Sums 1 We could just sum all the digits modulo 10: 12345 = 1+2+3+4+5 = 15; divide by 10 is 1 remainder 5 Our new number would be 123455 (or 12345-5 etc.) We can detect all single digit errors: 12445-5 = 1+2+4+4+5 = 16, modulo 10 is 6 but check digit is 5 and therefore so we know the number is invalid This approach similarly detects insertions and deletions – but not (most) transpositions, as the sum of the digits is unchanged 2 Smalltalk ('12345' inject: 0 "start value; sum is accumulated" into: [ :sum :char | sum + char digitValue ]) 10
  • 18. john@ItsNotExactlyRocket.Science 18/50 Weighted Sum, Modulo 11 1 To detect transpositions, we ‘weight’ the digits: The ISBN-10 of the Blue Book is 0-201-11371-6 ISBN-10 uses weights of the digit position: 0×1 + 2×2 + 0×3 + 1×4 + 1×5 + 1×6 + 3×7 + 7×8 +1×9 + 6×10 = 165 165 modulo 11 is 0 and that means that the number is valid. The rightmost digit is a check digit whose value is adjusted to ensure the weighted sum of the digits is an exact multiple of 11 2 UK NHS Numbers use the same technique. 3 Note that the last digit might need a value of 10! So: a In the NHS, the next available number is used b In ISBN-10, the last digit is an X, e.g. 0-14-118260-X (Who loves a Roman numerals joke? I, for one!)
  • 19. john@ItsNotExactlyRocket.Science 19/50 Weighted Sum, Modulo 11 in Smalltalk (1/2) "WARNING: There is no error checking in any of these snippets. Initial checks must be done by Regex or manual format checking." | weight | weight:= 0. "left to right, weights are 1,2,3, ..." ('6088489006' inject: 0 into: [ :checksum :char | (weight:= weight+1) * char digitValue + checksum ]) 11 "or, with an additional method, the weight is the index (In Smalltalk, arrays, strings &c. start at 1 – as nature intended!)" ('608848900' withIndexInject: 0 into: [ :checksum :char :idx | idx * char digitValue + checksum ]) 11 "The former returns 0 indicating validity. The latter, with a 9 character input, returns 6, which is the value of the check digit needed to make the number valid, as in the former case.
  • 20. john@ItsNotExactlyRocket.Science 20/50 "NB: the checksum can be 10, e.g. some editions of Strunk & White..." ('020530902' withIndexInject: 0 into: [ :checksum :char :idx | idx * char digitValue + checksum ]) 11 "Returns 10, which is why the original ISBN was 0-205-30902-X. The UK NHS addresses this problem by simply not using any numbers which would require a check “digit” character outside the range 0-9" "But, for ISBN-10, we need to handle the occasional terminal X" | isbn | isbn:= '020530902X'. (isbn allButLast withIndexInject: 0 into: [ :checksum :char :idx | idx * char digitValue + checksum ]) 11 = (isbn last = $X ifTrue: [ 10 ] ifFalse: [ isbn last digitValue ]) "Generating ISBN-10s is left as an exercise for the reader" Weighted Sum, Modulo 11 in Smalltalk (2/2)
  • 21. john@ItsNotExactlyRocket.Science 21/50 ISBN-13 is EAN-13 1 ISBN numbers migrated from 10 to 13 digits in 2007, for compatibility with EAN (European Article Number), now internationally known as GTIN (Global Trade Item Number). a The symbology includes only digits 0-9 therefore a modulo 10 algorithm is used. b Counting from right to left, odd positions are weighted 1, even positions weighted 3. 2 ISBN-10s are converted by prefixing 978 to the payload and appending the new EAN-13 check digit. 3+2×3+0+9×3+0+3×3+5+0×3+2+0×3+8+7×3+9 = 90 ✓
  • 22. john@ItsNotExactlyRocket.Science 22/50 EAN/GTIN in Smalltalk (1/2) "reversed makes no difference in the GTIN-13 case as it has an odd number of digits, but allows GTIN-8 and GTIN-18 with no need to check input parity" "Note the index flip when moving from validation to generation" | lastWeight | lastWeight:= 3. "right to left, weights are 1,3,1,3 ..." ('1234512745657' reversed inject: 0 into: [ :sum :each | each digitValue * (lastWeight:= 4-lastWeight) + sum ]) 10 "Returns 0, which indicates the number is correct" "To generate a check digit, run the algorithm on the `payload` of the first 12 digits and subtract the result from 10" | lastWeight | lastWeight:= 1. "right to left, weights are 3,1,3,1 ..." (10 - ('123451274565' reversed inject: 0 into: [ :sum :each | each digitValue * (lastWeight:= 4-lastWeight) + sum ]) 10) 10 "Returns 7 as expected from the above. If this returns 10, the check digit is 0, hence the additional 10 at the end" "NB: without the check digit, our first digit is even, not odd."
  • 23. john@ItsNotExactlyRocket.Science 23/50 EAN/GTIN in Smalltalk (2/2) "EAN transposition check shows its most important weakness" | lastWeight | lastWeight:= 3. '1235412745657' reversed inject: 0 into: [ :sum :each | each digitValue * (lastWeight:= 4-lastWeight) + sum 10 ] "Returns 2, which proves the number is incorrect" 1 "But, if the transposed digits differ by 5, the error goes undetected: x + 3(x+5) = 4x+15 and 3x + (x+5) = 4x+5, which are congruent modulo 10" | lastWeight | lastWeight:= 3. '1234517245657' reversed inject: 0 into: [ :sum :each | each digitValue * (lastWeight:= 4-lastWeight) + sum 10 ] "Returns 0 indicating validity. However, such transpositions are less frequent: digits different by 5 are not adjacent on either the number row or standard numeric keypads (whether calculator style or telephone style)"
  • 24. john@ItsNotExactlyRocket.Science 24/50 Luhn 1 Patented in 1960 but now in the public domain, Luhn is used for: credit card numbers; IMEIs (mobile handsets); European Patent Application numbers; ICCIDs (SIM cards); several government IDs from round the world; &c. 2 Another “sum digits with alternating weights modulo 10” method; the weights, again starting from the right, alternate 1 and 2, rather than EAN/GTIN’s 1 and 3. 3 However, where doubling a digit gives a two digit result, we use the sum of the digits of the answer. 4 So, when in even digit positions, 0,1,2,3,4 yield 0,2,4,6,8 – but 5,6,7,8,9 yield 1,3,5,7,9 – not 10,12,14,16,18.
  • 25. john@ItsNotExactlyRocket.Science 25/50 Luhn Example 1 To validate: 8905543677 2 Going right to left every other digit we get 7+14+6+6+4+10+5+0+9+16 3 Replacing all two digit results with their digit sum yields 7+5+6+6+4+1+5+0+9+7 = 50 mod 10 = 0. 4 As with GTIN, if the sum is a multiple of 10, the number is valid; and, as with GTIN, this is effectively the same operation as dropping the check digit, calculating a check digit from the payload, and comparing it to the existing check digit. 5 As with GTIN, to generate a check digit, we must flip the weights as the now missing check digit was digit 1, and we start by doubling the first digit as we go right to left.
  • 26. john@ItsNotExactlyRocket.Science 26/50 Luhn Smalltalk (1/5) – digits and characters "How to sum digits of a two digit number?" 14 printString fold: [:a :b | a digitValue + b digitValue] "Or just add the tens and the units" (0 to: 9) collect: [:n | | d | d:= n*2. d // 10 + d 10 ] "#(0 2 4 6 8 1 3 5 7 9)" "Or, once we have that result, just use it as a lookup table" (0 to: 9) collect: [:n | #(0 2 4 6 8 1 3 5 7 9) at: 1+n" "Also, by using digitValue the argument can be any Unicode characters that represent the decimal digits, e.g. Sinhala" '෦෧෨෩෪෫෬෭෮෯' asArray collect: [ :char | char digitValue ] "#(0 1 2 3 4 5 6 7 8 9)" NB: We did all the input sanitization before, right? Right?"
  • 27. john@ItsNotExactlyRocket.Science 27/50 Luhn Smalltalk (2/5) – using a lookup table "This uses alternating offsets of 11 and 1 with a lookup table. The validation code, highlighted, is only four lines. This Snippet will return any invalid numbers in the array. All the inputs in the example below are valid." | luhn | luhn:= #[ 0 1 2 3 4 5 6 7 8 9 0 2 4 6 8 1 3 5 7 9 ]. #(18 #[3 7 2] '٥٦٤٨٧٢' "formats: integer, byte array, unicode string" 8905543677 "my example from earlier slide" 17893729974 "the example from Wikipedia" 1234567812345670 "the example from Rosetta Code" '3566002020360505' "an example Credit Card number, spaces removed" 339269927028426 "an example IMEI number" ) reject: [ :input | | index | index:=11. (input asString reversed inject:0 into:[:sum :digit || codePoint | codePoint:= digit asInteger & 15. sum + (luhn at:(index:= 12-index)+codePoint) ])10=0]
  • 28. john@ItsNotExactlyRocket.Science 28/50 Luhn Smalltalk (3/5) – tables beyond 0 to 9 "Unlike the other single digit check digit algorithms here, Luhn can be generalized to larger (or smaller) character sets with an even number of symbols. Lookup table generation translates to Smalltalk as follows:" | charset base luhn | charset:= '0123456789ABCDEFGHJK'. "works with any symbols, in any order" base:= charset size. (base2) isZero ifFalse: [ self error:'Luhn character set must have even size' ]. luhn:= OrderedCollection new. luhn addAll: (0 to:base-1). 0 to: base-1 do: [:digit || double | double:= 2 * digit. luhn add: double//base + doublebase ]. luhn asByteArray "#[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 0 2 4 6 8 10 12 14 16 18 1 3 5 7 9 11 13 15 17 19 ]" e.g.: 14 ×2 = 28 which is 18 in base 20 so digit sum is 9 Note: a nice easy pattern is emerging
  • 29. john@ItsNotExactlyRocket.Science 29/50 Luhn Smalltalk (4/5) – validation modulo N "Note table construction and validation code. Snippet takes an array and returns an array of those elements which are invalid. Lookup table could be cached but typically construction is 2-5µs, with each check 1-2µs " | base charset luhn | base:= (charset := 'abcdef') size. luhn:= ((0 to:base-1),(0 to:base-2 by:2),(1 to:base by:2)) asByteArray. #( 'abcdefe' 'abcb' 'defc' ) reject:[:input || index | index:= 1+base. (input asString reversed inject: 0 into: [:sum :digit || codePoint | codePoint:= (charset indexOf: digit)-1. sum + (luhn at: (index:= 2+base-index) + codePoint) ])base=0] "NB: In design, ensure the charset size is even, with no repetition" "NB: In execution, ensure that each argument, after removing allowed punctuation, comprises only characters from the character set"
  • 30. john@ItsNotExactlyRocket.Science 30/50 Luhn Smalltalk (5/5) – generation modulo N "Because (m-(x mod m) mod m) simplifies to (-x) mod m, the generation method (again note that the index is flipped) is as follows ..." | base charset luhn | base:= (charset := ' ✓ ') size. 😎☀ ⚙💩⚡ luhn:= ((0 to:base-1),(0 to:base-2 by:2),(1 to:base by:2)) asByteArray. #( ' ✓ ' ' ✓' ' ' ) 😎☀ ⚙💩⚡ 😎☀ ⚙💩⚡ collect:[:input || index | index:= 1. input asString, (charset at: 1 + ((input asString reversed inject: 0 into: [:sum :digit || codePoint | codePoint:= (charset indexOf: digit)-1. sum + (luhn at: (index:= 2+base-index) + codePoint) ]) negated base)) asString ] "#(' ✓ ' ' ✓ ' ' ✓')" 😎☀ ⚙💩⚡💩 😎☀ ☀ ⚙💩⚡ "Check digits Pile of Poo Sun With Rays and Check Mark are appended" very little code is required to generate Luhn mod N check digits
  • 31. john@ItsNotExactlyRocket.Science 31/50 Luhn Weaknesses 1 Twin errors: 22 2 + 4 🠖 6 🠖 🠔 5 + 1 5 + 10 🠔 🠔 55 33 3 + 6 🠖 9 🠖 🠔 6 + 3 6 + 12 🠔 🠔 66 44 4 + 8 🠖 12 🠖 🠔 7 + 5 7 + 14 🠔 🠔 77 Unfortunately all three of these errors comprise adjacent keys on a numeric keypad; less likely to be typed on a number row 2 Transposition error: 90 9 + 0 🠖 9 🠖 🠔 0 + 9 0 + 18 🠔 🠔 09 Not only adjacent on number row, but at the extreme right, both conventionally operated by digitus minimus dexter 1 1 right pinky finger
  • 32. john@ItsNotExactlyRocket.Science 32/50 Luhn Conclusions 1 Works with any symbol set a there must be an even number of symbols – no problem for octal, decimal, hexadecimal and the Latin alphabet b but symbol codes do not need to be contiguous or in any particular order 2 Allows codes as short as 2 symbols, and up to arbitrary length. 3 Easy to implement, especially for the standard case of digits 0..9 4 Fast: typically lookup table construction 2-5µs; then validation < 1µs 5 Widely used, well-understood and still extremely useful 6 Has some unfortunate weaknesses a 60% of twin errors undetected, and easy to key on a numpad b cannot detect transpositions of the end digits in its character set – and this is particularly relevant with the 0..9 case
  • 33. john@ItsNotExactlyRocket.Science 33/50 Half Time! 1 Smalltalk is great! Any dissent to be handled later – in the car park 😛 2 Some codes have double digit checksums (checksa?) a e.g. International banking codes: IBAN uses a modulo 97 algorithm b possibly overkill for part numbering, we’ll stick to single digits 3 Despite the appeal of Luhn Mod N, decimal codes remain optimal a but weaknesses of Luhn 0..9 make it subobtimal for part numbering b and countering the weakness by manually preventing the creation of numbers with 09 or 90, or by using the digits out of order, e.g. 1234567890, would result in non-standard implementations and possibly maintainability issues 4 Advantages over later algorithms turn out to be underwhelming: a It’s not that much faster b It’s not that much simpler
  • 34. john@ItsNotExactlyRocket.Science 34/50 Verhoeff 1 In 1969 Jacobus Verhoeff presented the first decimal check digit algorithm which detects all adjacent transposition errors and all single digit errors, despite this having been thought (even mistakenly ‘proved’) impossible. 2 From some eye-watering maths, he created three tables: p, d, and inv. [Wikipedia]
  • 35. john@ItsNotExactlyRocket.Science 35/50 Navigating the Verhoeff Tables: Validation 1 Starting with, e.g., ‘3572’ and c = 0, we go L R 🠔 from position 0 (the check digit) 2 The permutation table, p, gives us p(0,2) (0 is position modulo 8, 2 is the digit) = 2 3 The 2 goes into the multiplicative table, d, with the current c, giving c = d(0,2) = 2 4 Moving left: p(1,7) = 0; c=d(2,0) = 2 5 Moving left: p(2,5) = 9; c=d(2,9) = 6 6 Moving left: p(3,3) = 6; c=d(6,6) = 0 7 We have ended on c = 0 8 We look up 0 in the inverse table, inv, and get 0 9 The number 3572 is therefore valid ✓ What is the point of step 8? The only 0 in inv maps to 0. Surely we could just ignore the inverse table for validation? We can, but we won’t – and the generation method tells us why…
  • 36. john@ItsNotExactlyRocket.Science 36/50 Navigating the Verhoeff Tables: Generation 1 Starting with, e.g., ‘3570’ and c = 0, we go L R 🠔 from position 0 (the check digit) 2 The permutation table, p, gives us p(0,0) (0 is position modulo 8, 0 is the digit) = 0 3 The 0 goes into the multiplicative table, d, with the current c, giving c = d(0,0) = 0 4 Moving left: p(1,7) = 0; c=d(0,0) = 0 5 Moving left: p(2,5) = 9; c=d(0,9) = 9 6 Moving left: p(3,3) = 6; c=d(9,6) = 3 7 We have ended on c = 3 8 We look up 3 in the inverse table, inv, and get 2 9 The number 357 therefore needs the check digit 2 to become 3572 as previous slide So, to generate a check digit we suffix a zero1 to our input and use the same method – and the result is our new check digit. 1 This is equivalent to multiplying the equivalent integer by 10.
  • 37. john@ItsNotExactlyRocket.Science 37/50 Verhoeff in Smalltalk (1/3) - lookups Recalling Arrays start at 1, we have the following lookup methods: privateInvTableAt: j ^ #[ 0 4 3 2 1 5 6 7 8 9 ] at: j+1 privatePtableRowAt: r column: c ^ ( #( #[ 0 1 2 3 4 5 6 7 8 9 ] #[ 1 5 7 6 2 8 3 0 9 4 ] #[ 5 8 0 3 7 9 6 1 4 2 ] #[ 8 9 1 6 0 4 3 5 2 7 ] #[ 9 4 5 3 1 2 6 8 7 0 ] #[ 4 2 8 6 5 7 3 9 0 1 ] #[ 2 7 9 3 8 0 6 4 1 5 ] #[ 7 0 4 6 9 1 3 2 5 8 ] ) at: r+1) at: c+1 privateDtableRowAt: r column: c "as above but with d(j,k) table” allowedCharacters ^Array new ,'0123456789' "Western Arabic” ,'٠١٢٣٤٥٦٧٨٩' "Eastern Arabic” ,'०१२३४५६७८९' "Devanagari" ,'০১২৩৪৫৬৭৮৯' "Bengali" ,'௦௧௨௩௪௫௬௭௮௯' "Tamil" ,'๐๑๒๓๔๕๖๗๘๙' "Thai" ,'၀၁၂၃၄၅၆၇၈၉' "Myanmar" ,'෦෧෨෩෪෫෬෭෮෯'"Sinhala" allowedSeparators ^Array new with: $- with: Character space
  • 38. john@ItsNotExactlyRocket.Science 38/50 Verhoeff in Smalltalk (2/4) - API Private method, privateVerhoeff: digits "NB: private – no error checking argument is any SequenceableCollection of integers in (0 to:9); returns a single Integer in that range." | pos c | pos := c := 0. digits reverseDo: [:n | | p | p := self pTableAtRow: pos col: n. c := self dTableAtRow: c col: p. pos := pos + 1 8 ]. ^ self invTableAt: c Private VerhoeffString: aString "Takes unpunctuated string of Unicode with digitValue (0 to:9), returns a single Integer in that range." aString isString ifFalse: [ self error: '...' ]. aString size > 1 ifFalse: [ self error: '...’ ]. aString reject: [ :char | self allowedCharacters includes:char]) isEmpty ifFalse: ['...’ ]. aString first digitValue > 0 ifFalse: [self error: '...' ]. ^ self privateVerhoeff:(aString asArray collect: [ :char | char digitValue ]) Public method,
  • 39. john@ItsNotExactlyRocket.Science 39/50 Verhoeff in Smalltalk (3/3) – in one block "Verhoeff algorithm; argument is [0-9][0-9]+, answer is integer in 0..9" | c pos | pos:= c:= 0. "checksum and position (right to left) start at zero" 3572 asString reversed do: [ :digit | c := (#( #( 0 1 2 3 4 5 6 7 8 9 ) #( 1 2 3 4 0 6 7 8 9 5 ) #( 2 3 4 0 1 7 8 9 5 6 ) #( 3 4 0 1 2 8 9 5 6 7 ) #( 4 0 1 2 3 9 5 6 7 8 ) #( 5 9 8 7 6 0 4 3 2 1 ) #( 6 5 9 8 7 1 0 4 3 2 ) #( 7 6 5 9 8 2 1 0 4 3 ) #( 8 7 6 5 9 3 2 1 0 4 ) #( 9 8 7 6 5 4 3 2 1 0 ) ) at:1+c) at:1+((#( #( 0 1 2 3 4 5 6 7 8 9 ) #( 1 5 7 6 2 8 3 0 9 4 ) #( 5 8 0 3 7 9 6 1 4 2 ) #( 8 9 1 6 0 4 3 5 2 7 ) #( 9 4 5 3 1 2 6 8 7 0 ) #( 4 2 8 6 5 7 3 9 0 1 ) #( 2 7 9 3 8 0 6 4 1 5 ) #( 7 0 4 6 9 1 3 2 5 8 ) ) at:1+pos) at: digit digitValue + 1). pos:= pos + 1 8 ]. #( 0 4 3 2 1 5 6 7 8 9 ) at:1+c "NB: No error checking – SANITIZE INPUTS! Integers must be >9; strings (or arrays &c.) MUST comprise only 0..9 (or $0..$9 &c.)"
  • 40. john@ItsNotExactlyRocket.Science 40/50 Verhoeff Weaknesses? 1 Verhoeff only works with decimal codes, but: a but we already rejected non-decimal codes. 2 The algorithm is complex, but: a the implementation is just table lookups: 2 per digit and 1 more b the operations involved, unicode lookup, table lookup &c. are simple c it’s pretty fast, ~2µs, and comparable to Luhn 3 Unlike most checksum algorithms, it is sensitive to leading zeros, but: a that’s actually useful for those who do use leading zeros b hopefully I’ve persuaded you that isn’t going to be us! 4 There may be other weaknesses, so: a find someone with a better understanding of dihedral groups b draw statistical inferences from experimentation (later… )
  • 41. john@ItsNotExactlyRocket.Science 41/50 Damm 1 In 2004 Michael Damm published this algorithm in his PhD on Totally Antisymmetric Quasigroups … Yes, me neither! 2 Again the underlying maths is complex but translates to table lookups, even easier ones than Verhoeff: just one table to iterate and no inverse. 3 Start with 0, end with 0: [Wikipedia]
  • 42. john@ItsNotExactlyRocket.Science 42/50 Navigating the Damm Table 1 Starting with, e.g., ‘64353’ and r = 0, we go L R 🠖 (for a change!) 2 column 6 row 0 8 🠖 3 column 4 row 8 6 🠖 4 column 3 row 6 9 🠖 5 column 5 row 9 3 🠖 6 column 3 row 3 🠖 0 7 We have ended on r = 0 8 There’s no inverse or final lookup table, once we hit 0, it’s valid 9 The input ‘64353’ is therefore valid ✓ How do we manage without an inverse table? Because the 10x10 Latin Square we are using for each iteration has a zero diagonal, when row r = column c, then r,c 0. 🠖 Consequently, the non-zero value returned on failed validation is also the check digit that should be appended on generation. So, as with Verhoeff, just one method for both validation and generation.
  • 43. john@ItsNotExactlyRocket.Science 43/50 Damm Smalltalk (1/2) "Damm algorithm; argument is [0-9][0-9]+, answer is integer in 0..9" 57245724 asString inject: 0 into: [ :row :col | #[ 0 3 1 7 5 9 8 6 4 2 7 0 9 2 1 5 4 8 6 3 4 2 0 6 8 7 1 3 5 9 1 7 5 0 9 8 3 4 2 6 6 1 2 3 0 4 5 9 7 8 3 6 7 4 2 0 9 5 8 1 5 8 6 9 7 2 0 1 3 4 8 9 4 5 3 6 2 0 1 7 9 4 3 8 6 1 7 2 0 5 2 5 8 1 4 3 6 7 9 0 ] at: 10 * row + (col digitValue) + 1 ] "NB: No error checking – SANITIZE INPUTS! Integers must be >9; strings (or arrays &c.) MUST comprise only 0..9 (or $0..$9 &c.)"
  • 44. john@ItsNotExactlyRocket.Science 44/50 Damm Smalltalk (2/2) "Could use the same digitValue approach to store the table as a String" '57245724' asString inject: 0 into: [ :row :col | ('03175986427092154863420687135917509834266123045978', '36742095815869720134894536201794386172052581436790' at: (10 * row + 1 + (col digitValue))) digitValue] "Naturally we could store the table as a constant, but this hardly needs optimising." | reader | reader := testInput readStream. [ reader next asString inject: 0 into: [ :row :col | ('03175986427092154863420687135917509834266123045978', '36742095815869720134894536201794386172052581436790' at: (10 * row + 1 + (col digitValue))) digitValue] ] bench2. "'O.956 picoFortnights (1.156µs)'" "It’s clear the Damm algorithm is very simple to code, and could easily be done client-side (e.g. in Javascript) where required."
  • 45. john@ItsNotExactlyRocket.Science 45/50 Damm Weaknesses? 1 Damm, like Verhoef, only works with decimal codes, but: a but we already rejected non-decimal codes 2 Like Verhoeff, the algorithm is complex, but: a the implementation is just table lookups: 1 per digit. b the operations involved, unicode lookup, table lookup &c. are simple c it’s pretty fast, ~2µs, and comparable to Luhn and Verhoeff 3 Like most checksum algorithms (but unlike Verhoeff) the algorithm is insensitive to leading zeros, but: a who wants to use leading zeros anyway? 4 There may be other weaknesses, so: a how are you with totally antisymmetric quasigroups? b draw statistical inferences from experimentation (later… )
  • 46. john@ItsNotExactlyRocket.Science 46/50 Zero Redux: Un œuf is un œuf? (1/3) 1 Unlike the Verhoeff algorithm, the position of digits (their index) is not used in the calculation 2 To calculate a check digit with Verhoeff, you process the number with a 0 appended (i.e. the number × 10) and the result is the check digit of the original number 3 But in Damm, the check digit is simply obtained by applying the algorithm to the number itself 4 So in both Damm and Verhoeff a valid number returns 0 a But in Damm that is also the next check digit b so Damm is insensitive to trailing zeros
  • 47. john@ItsNotExactlyRocket.Science 47/50 Zero Redux: More than un œuf? (2/3) 1 For example, the check digit of ‘1’ is 3, giving 13, the smallest “Damm number.” Running Damm on 13 yields 0, which, as with Verhoeff, indicates a correct checksum 2 But therefore 130 also has a correct Damm checksum, as do 1300 and 13000 &c. 3 One possible defence is to simply ban trailing zero a Using a dash to highlight the check digit, Part 14 would become Part 14-9 but Part 21 would become part 21-0. b This is not ideal if Part 2 has already become Part 2-1; so we would just avoid having a payload (the “unchecked” raw number) of 21.
  • 48. john@ItsNotExactlyRocket.Science 48/50 Zero Redux: You had un œuf yet? (3/3) 1 … but … 2 Because the “interim digit” (r) in the Damm algorithm starts with a value of 0, any Damm number followed by another will also be a Damm number 3 So 13, 1313, 13013, 130013, 1300013 all checksum 4 So, Damm is “zero blind” anywhere before or after a Damm number, including between them: 5 So in, e.g., ...13…515…37…6876… each ‘…’ could be any number of zeros (including zero of them) without the checksum failing.
  • 49. john@ItsNotExactlyRocket.Science 49/50 Defense (1/2) 1 Go back to fixed width. But: a I already chose this as a hill to die on b It makes commonly used part numbers long c It requires otherwise meaningful numbers to be padded d It makes meaningful number changes hard to manage: e.g. part nnn-14- can be replaced with part nnn-14-2- 2 Never mind leading or trailing, ban the zero! But: a reduces the pool of part numbers b may also impair choosing meaningful part numbers – The strategy would ban numbers such as 2400 for no reason. 3 Live with it: no check-digit algorithm detects all errors Defence against the Damm Zero
  • 50. john@ItsNotExactlyRocket.Science 50/50 tl:dr; 1 Already have a check digit? Use that. 2 Already using EAN/GTIN article numbers? Use those. 3 Need alphanumerics, other symbols, or character sets larger (or smaller) than decimal digits? Use Luhn. 4 Do you need maximum error resistance? Use Verhoeff. 5 Do you want effective but really simple? Use Damm In a further presentation I will provide some statistical analysis, some potential refinements, and my final recommendations. “I see you shiver with anticip …”