Originally Posted by jasonp
How difficult was the porting effort needed to run on ARM?
Set the cache size in the source, optionally remove the loop unrolling, set the optimization flags for the machine in the Makefile (-Ofast -mcpu=native is usually all you need), and compile. In the end I just used OpenMPI and GCC 10. I also tried the Arm and Cray compilers, but GCC 10 was just as fast.
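
For anyone who wants to try the same thing, the Makefile part probably looks roughly like the fragment below. This is only a sketch: the variable names CC and CFLAGS are assumed from the usual Makefile layout and may not match the actual source, and mpicc is simply the standard OpenMPI compiler wrapper, here assumed to sit on top of GCC 10.

```makefile
# Hypothetical fragment: variable names are assumptions, not taken from the real Makefile.
# mpicc is the OpenMPI compiler wrapper (assumed here to invoke GCC 10).
CC = mpicc
# Flags quoted in the post: aggressive optimization, tuned for the CPU of the build machine.
CFLAGS = -Ofast -mcpu=native
```

Note that -mcpu=native tunes the binary for the machine doing the compiling, so it makes the most sense to build on the same kind of node the job will run on.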

This exercise shattered my presumption that ARM CPUs were efficient but slow.

