Enginursday: Playing with CRCs in Python

Take a look at Cyclic Redundancy Checks beyond just copy/pasting an implementation. The underlying mathematical principles are very interesting, and you can try it yourself in Python!

Favorited Favorite 1

Recently, the engineering team has needed to implement Cyclic Redundancy Checks (CRCs) for several different projects. The algorithm is easy enough to copy from the internet and forget, but my curiosity just couldn't quit there! CRCs have a very fascinating mathematical underpinning that relates to information theory, computer hardware and more. Trying to get a better understanding of CRCs eventually led me to discover the fantastic legendary ASCII text file called 'crc_v3.txt`. Dr. Ross Williams, the author, once hosted the file at http://www.ross.net, however it can't be found there any more 😔.

Don't fret - everyone knows nothing ever really disappears on the internet, and this is no exception. A work of art and a beacon for curious minds 'crc_v3.txt' will live on forever. You can go pay it a visit here.

Turns out that the original text was just down for maintenance. You can find it here: http://www.ross.net/crc/download/crc_v3.txt

Alone 'crc_v3.txt' is a great read, but there are still a few points that might be hard to follow. To satiate my curiosity I created a follow-along Python script to demonstrate the math. You can check it out on GitHub CRC_Exploration or by trying it out live in this post, thanks to the REPL.it widget below. Just click the green 'run' arrow and peruse the output, then try changing the code yourself!


Comments 9 comments

  • Woah, I am a real fan of coding, the Internet and Websites.

  • I'd like to mention archive.org (most of you are probably already familiar with their great work).

    The CRC web site you mentioned is still available at:

    https://web.archive.org/web/20061205055731/http://www.ross.net/crc/

    Thanks for the article!

  • I was super confused about the multiplication in section 5 of the Python demo. The comments seem correct; but, the printed text didn't match. Then I realized line 195 should read: print(' 000000')

    Now it makes sense. :D

    • Nice catch, thanks ;)

      Fixed with this commit: addfd5b

      Using scalar multiplication added in this commit: d2d3754

      Though I wonder if changes will get picked up in this embedded widget?

      Yup - I was so preoccupied with how cool it felt to overload the left shift operator that I forgot to multiply by the scalar bit values.

  • CRCs are sometimes good for things other than just "error detection", the most common use for them (often used for checking disk files).

    Back in the 90s, I was working on a project where we'd look at (potentially) thousands (or more) "vectors" -- it's not unrasonable to think of the "vectors" as "strings of letters". We had a priori knowledge that all of the vectors (or "strings") in a run would have the same length, though that could be as large as 1024. We needed to sort out duplicates and only find the unique vectors. Comparing such long strings "brute force" would take WAY too long, so I came up with the idea of calculating a "hash code" wherein we would know that if the hash codes were different, the vectors HAD to be different. The problem was coming up with one where, say (for vectors of length 3) "AAB", "ABA", and "BAA" would have different hash codes. A co-worker suggested I try calculating the CRC of each vector, and use that as the hash code. It worked like a champ! We still had to compare all the vectors with the same CRC, but that was a much shorter list than the whole list of vectors we'd seen.

    • Admittedly I've been a little uncomfortable with the idea of hashing because (afaik) it is a many-to-one operation and nobody seemed concerned about it.. I'm very relieved to hear that you indeed had to check vectors that yielded the same CRC. Cool application too, thanks for sharing!

      • Yes, most (all?) hashing schemes are "many-to-one", but the number of bits in the output reduce the size of "many". In the case of the vectors I was working with, even a fairly large "many", typically on the order of a handfull, was better than the physically imposed limit of 1,024.

        Note that hashing schemes tend to be "one-way", that given the "hashed value" it's not easy to recover the "original". This is the reason why many (especially Unix and "Unix-like") systems store the hashed representation of passwords -- it's gotten (much) more complex over the nearly 45 years I've been dealing with it, but it still, at it's heart, is a "hashing" scheme.

  • Coincidentally, I went down the rabbit hole with CRCs yesterday and found some good resources.

    1. You can get the CRC paper directly from Dr. Williams' site at http://ross.net/crc/download/crc_v3.txt
    2. PyCRC ( https://pycrc.org/ ) is amazing. It's a Python tool that generates C code for any "RockSoft" CRC parameters.
    3. https://crccalc.com/ is great for quickly checking a CRC across a huge number of parameters.

Related Posts

Recent Posts

Tags


All Tags