Just writing to confirm that your steps worked, and that the binary's performance is atrocious. With 4 threads, an i7-4930K gives about 125-130 million p/sec; with 4 threads, the Model 3B gives about 5 million p/sec. Still, I now have something to mess with, and I can try some other compilation settings with it. Thanks again!
