C Program Generates Unique 16-Character IDs

Enitan A.

Software Developer. MERN, PYTHON

An ID Generator in C

I wrote a C program that generates a 16-character ID from:

  • 10 digits (0–9)
  • 26 uppercase letters (A–Z)
  • 26 lowercase letters (a–z)

That gives us 62 possible characters for each slot. Since the ID length is 16 characters, the total number of possible IDs is 62 raised to the power of 16, which is approximately 4.8 × 10^28 (about 48 octillion) combinations. In other words, if every character is picked uniformly and independently at random, the probability of generating the same ID twice is practically 0 for all real-world purposes. Here’s the program:


Looks so cool! Congrats!

Tim Brown

Staff Software Engineer

4w

A couple of issues you might consider: rand() produces a pseudo-random number in the range [0, RAND_MAX] (RAND_MAX is implementation-defined; 2,147,483,647 is a common value), and the whole sequence is fixed by the unsigned int seed passed to srand(), which means that there are at most as many distinct strings as there are seed values (about 4 billion for a 32-bit unsigned int), far fewer than 62^16. time(NULL) has a precision of one second, which means that any two processes started in the same second will generate the same ID (and if you provide this as a library, the same sequence of IDs). Even if the time had nanosecond precision, there would still be a one-in-a-billion chance that two processes started in the same second collide (with the chance increasing scarily as you add more processes).
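The same-second collision described above can be reproduced with a minimal sketch. The helper name `make_id_with_seed` and the `ALPHABET` table are assumptions made for illustration, not the original program; the seed is passed in explicitly to stand in for the value `time(NULL)` would return.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* The 62 characters named in the post: digits, uppercase, lowercase. */
static const char ALPHABET[] =
    "0123456789"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz";

/* Hypothetical helper: seed rand() with `seed`, then write `len`
 * characters plus a terminating '\0' into `out`.
 * `out` must have room for len + 1 bytes. */
static void make_id_with_seed(unsigned int seed, char *out, size_t len)
{
    assert(out != NULL);
    const size_t n = strlen(ALPHABET); /* 62 */

    srand(seed); /* stands in for srand(time(NULL)) */
    for (size_t i = 0; i < len; i++) {
        out[i] = ALPHABET[(size_t)rand() % n];
    }
    out[len] = '\0';
}
```

Calling `make_id_with_seed(t, a, 16)` and `make_id_with_seed(t, b, 16)` with the same `t`, which is effectively what two processes started in the same second do, fills `a` and `b` with identical strings.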

Ray Myers

Tech Lead | Mender | Untangler

4w

Neat. Doing this in a production-grade way is surprisingly subtle and you might be interested to look at the history of the UUID standard. https://en.wikipedia.org/wiki/Universally_unique_identifier

Great! I think it would be more efficient if you used a function to get a random number on each iteration. The function would pick an ASCII value from one of these ranges: digits 48-57, lowercase chars 97-122, and uppercase chars 65-90. You iterate over the generated_ID array, populate each slot with the value returned on that iteration, and at the end add the '\0' byte (0). Then you output the value using puts(generated_ID). Using a function would beat using arrays to define values as C uses malloc under the hood to allocate memory for arrays (and then copy each value to the memory location in the array), and that overhead can be avoided. One may argue against this approach for scalability reasons, and if that's the issue, then using the array approach like you did would be good; however, it'll be as easy as adding an extra case to the switch statement in your random_value generator. Well done though...many run away from C for some unknown reasons 🤣
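One way the function-based idea in this comment might look, as a minimal sketch. The names `index_to_char`, `random_value`, and `fill_id` are hypothetical, and the sketch uses range arithmetic where the comment mentions a switch statement. Note that a `static const` lookup table in C lives in static storage rather than memory obtained from malloc, so any speed difference between the two approaches is an assumption worth measuring.

```c
#include <assert.h>
#include <stdlib.h>

#define ID_LEN 16

/* Hypothetical helper: map an index in [0, 61] to an ASCII code. */
static char index_to_char(int idx)
{
    assert(idx >= 0 && idx < 62);
    if (idx < 10) {
        return (char)(48 + idx);        /* '0'..'9' are 48..57  */
    }
    if (idx < 36) {
        return (char)(65 + (idx - 10)); /* 'A'..'Z' are 65..90  */
    }
    return (char)(97 + (idx - 36));     /* 'a'..'z' are 97..122 */
}

/* One random alphanumeric character per call. */
static char random_value(void)
{
    return index_to_char(rand() % 62);
}

/* Fill the array and add the terminating '\0' at the end. */
static void fill_id(char generated_ID[ID_LEN + 1])
{
    for (int i = 0; i < ID_LEN; i++) {
        generated_ID[i] = random_value();
    }
    generated_ID[ID_LEN] = '\0';
}
```

After `fill_id(generated_ID)`, the result can be printed with `puts(generated_ID)` as the comment suggests.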

Femi Fapohunda

Chief Technology Officer at SAFAM Digital Hub Limited

1mo

The time.h does the trick

Sergiy Yevtushenko

Distributed Systems Backend Engineer | AI, Rust, Java

4w

Nitpick: ID_len would be better defined as 16. Then you'll have to adjust it in only one place: the generated_ID declaration. In the two other places you'll be able to use it without adjustments. This is irrelevant for performance (the compiler will substitute a constant anyway), but it will make the code simpler to reason about: adding 1 in the array declaration is easy to understand for anyone who knows C, while the adjustments in the loop and when setting \0 (as it is now) require looking at the constant's declaration to realize that it is 1 byte longer than the declared result length. Just my 2 cents...
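The suggestion might look roughly like this. It is a sketch only: the original listing is not shown here, so the `charset` table and the function name `generate_id` are assumptions, and the constant is spelled `ID_LEN` in macro style.

```c
#include <assert.h>
#include <stdlib.h>

#define ID_LEN 16 /* visible length of the ID, terminator not included */

/* The + 1 for the terminator appears only in the array size. */
static void generate_id(char generated_ID[ID_LEN + 1])
{
    static const char charset[] =
        "0123456789"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz";

    assert(generated_ID != NULL);
    for (int i = 0; i < ID_LEN; i++) {        /* no "- 1" needed here */
        generated_ID[i] = charset[(size_t)rand() % (sizeof charset - 1)];
    }
    generated_ID[ID_LEN] = '\0';              /* and none needed here */
}
```

A caller would declare `char generated_ID[ID_LEN + 1];` and pass it in, so changing the ID length means editing the `#define` alone.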

Buks van der Lingen

Consulting/freelance Software Design and Development

4w

If you'd like, message me and I'll give you some further food for thought. 👍


A problem with this code is that it seeds the random generator with the current time in seconds. That means that if you run it twice in one second, you get the same ID. Also, anyone who knows approximately when the program was run can regenerate the IDs it is likely to have produced. On Linux, prefer /dev/random for better quality random numbers. Also, read or watch: https://www.usenix.org/conference/usenixsecurity12/technical-sessions/presentation/heninger about the problem of generating good quality random keys in devices with low amounts of entropy.
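Reading from the kernel's random source could be sketched as follows. The helper name `read_random_bytes` is hypothetical, and the sketch swaps in /dev/urandom, the non-blocking device on Linux, for the /dev/random mentioned above; treat that choice as an assumption to revisit for your platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: fill `buf` with `n` bytes from /dev/urandom.
 * Returns 0 on success and -1 if the device cannot be opened or read. */
static int read_random_bytes(unsigned char *buf, size_t n)
{
    assert(buf != NULL);

    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL) {
        return -1;
    }
    size_t got = fread(buf, 1, n, f);
    fclose(f);
    return got == n ? 0 : -1;
}
```

The bytes returned do not depend on the start time of the process, so two runs in the same second no longer share a seed. They still need to be mapped onto the 62-character alphabet before they can be used as an ID.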

Hillel Wayne

Formal Methods | Software Engineering | Software History

4w

Couple of mathematical gotchas to watch out for:
1. Birthday Paradox (https://en.wikipedia.org/wiki/Birthday_problem#Generalizations): for N possibilities, you have a 50% chance of generating a collision after only about 1.2·sqrt(N) generations. So you'll likely start having collisions after a couple hundred trillion generations, not tens of octillions. Still in the realm of implausible but worth knowing about.
2. rand() % n is only evenly distributed if n divides RAND_MAX + 1, the number of values rand() can return.
Neither of these affects your use case, but they show that the concerns we have get more convoluted as we use our tools to do more stuff.
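The second gotcha is usually handled with rejection sampling: discard the few top values of rand() that would make some remainders more likely than others. A minimal sketch, where `uniform_rand` is a hypothetical helper name:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: return a value in [0, n) with every outcome
 * equally likely, assuming rand() itself is uniform on [0, RAND_MAX]. */
static int uniform_rand(int n)
{
    assert(n > 0 && n <= RAND_MAX);

    /* rand() can return RAND_MAX + 1 distinct values. */
    const unsigned long range = (unsigned long)RAND_MAX + 1UL;
    /* Largest multiple of n that fits in that range. */
    const unsigned long limit = range - (range % (unsigned long)n);

    unsigned long r;
    do {
        r = (unsigned long)rand();
    } while (r >= limit); /* reject the uneven tail and draw again */

    return (int)(r % (unsigned long)n);
}
```

With n = 62, `uniform_rand(62)` can replace `rand() % 62` when picking each character. It removes the modulo bias only; it does nothing about the quality of rand() or how it is seeded.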

Snippy Valson

Staff software engineer at KeyValue Software Systems

1mo

You could also check where rand() gets its seed from.
