I can confirm that the code works. A few things:

1. Once you have built the code, it suffices to comment out the lines above the imports.
2. Is there a reason you use the build option USE_SSE41=1 instead of something faster, like AVX2? It appears that all Colab instances support at least AVX2.
3. I added some more comments to the code below the compilation:

Thanks, Dylan,

I tried a direct copy/paste and lost some formatting, so I had to go back to my original. I'm being pulled away at the moment, but I plan to address everything else later.

1. I considered a block delete easier than commenting out lines.
2. I have experienced segmentation faults with AVX2 in the past.
3. Thanks! I'll work on those later.
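
On point 2, whether a given runtime advertises AVX2 can be checked before choosing a build option. Below is a minimal sketch, assuming a Linux x86 runtime where `/proc/cpuinfo` is readable. The function names are hypothetical and not part of the code discussed in this thread. It only reports what the CPU advertises, so it says nothing about whether an AVX2 build would avoid the segmentation faults mentioned above.

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text):
    """Collect feature flags from /proc/cpuinfo-style text (x86 'flags' lines)."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def simd_support(flags):
    """Report whether the SSE4.1 and AVX2 flags are present."""
    # Linux spells these flags "sse4_1" and "avx2" in /proc/cpuinfo.
    return {"sse4_1": "sse4_1" in flags, "avx2": "avx2" in flags}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # assumption: Linux runtime
    if cpuinfo.exists():
        print(simd_support(parse_cpu_flags(cpuinfo.read_text())))
    else:
        print("No /proc/cpuinfo found; this check does not apply here.")
```

Running this in a notebook cell prints a small dictionary showing which of the two instruction sets the current machine reports.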
