mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Mlucas (https://www.mersenneforum.org/forumdisplay.php?f=118)
-   -   Mlucas v18 available (https://www.mersenneforum.org/showthread.php?t=24100)

ewmayer 2019-10-16 03:12

Thanks - so you're getting quite a decent speedup from using both logical cores, though I haven't a clue if the absolute timings are reasonable for the hardware in question - 2 ms/iter @192K is quite slow by (say) Haswell-and-beyond desktop-PC standards.

I suggest you proceed to the full production-run-oriented self-tests, and please post a zipped copy of the resulting self-test logfile here:
[i]
./Mlucas -s m -iters 100 -cpu 0:1 >& selftest.log[/i]

Dylan14 2019-10-16 04:55

3 Attachment(s)

See attached file. Note: this is on a new session of Colab, so the processor is not the same as before. I have also attached the cpu info and cfg files.

ewmayer 2019-10-16 19:14

Thanks for the build & test data - I see this particular new instance supports avx-512, so you'll want to prepare a second build that invokes those inline-asm macros in the code:
[i]
gcc -c -O3 -DUSE_AVX512 -march=skylake-avx512 -DUSE_THREADS ../src/*.c >& build.log
[/i]
...and use a different name for the resulting executable; you could call the 2 binaries mlucas_avx2 and mlucas_avx512, say. Running "grep avx512 /proc/cpuinfo" on whatever system you get during a particular session will tell you which binary to use. Rerun the self-tests on this new system to see what kind of speedup you get from using avx-512.
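For anyone who would rather script the binary selection than run the grep by hand, here is a minimal C sketch of the same check. The function names are hypothetical, the binary names are just the ones suggested above, and the substring match mirrors "grep avx512" rather than parsing individual flags, so treat it as an illustration only:

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if any line of the stream contains `flag` as a substring
   (the same test "grep avx512" performs), else 0. */
int stream_has_flag(FILE *fp, const char *flag)
{
    char line[8192];  /* assumed long enough for a typical "flags" line */
    while (fgets(line, sizeof line, fp)) {
        if (strstr(line, flag))
            return 1;
    }
    return 0;
}

/* Hypothetical helper: choose between the two binary names suggested
   above. Falls back to the avx2 build if the file cannot be read. */
const char *pick_mlucas_binary(const char *cpuinfo_path)
{
    int has_avx512 = 0;
    FILE *fp = fopen(cpuinfo_path, "r");
    if (fp) {
        has_avx512 = stream_has_flag(fp, "avx512");
        fclose(fp);
    }
    return has_avx512 ? "mlucas_avx512" : "mlucas_avx2";
}
```

Calling pick_mlucas_binary("/proc/cpuinfo") at the start of each Colab session would then return the name of the build to launch.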

(Wait - while working through your selftest.log data further down in this note, I came across these infoprints @7168K:
[i]
radix28_ditN_cy_dif1: No AVX-512 support; Skipping this leading radix.
[/i]
So you did prepare and use an avx-512 build as per above compile flags for this set of runs? If so, that obviates the avx2-vs-avx512 parts of the commentary below.)

As to your avx2-build timings, I realized after posting my "seems slow" comment yesterday that I was thinking in terms of multicore running on hardware like my Haswell. For a single physical core running at 2 GHz, ~50 msec/iter at the current GIMPS wavefront (5120K) is not at all bad. For comparison, here is the mlucas.cfg file for all 4 physical cores (no hyperthreading on this CPU) of my 3.3GHz Haswell. On a single core the runtimes would be perhaps ~3.5x as large, so (say) at 5120K we'd expect ~47 msec/iter, only ~10% faster than your 1-core/2-thread timings, and this is at 3.3GHz vs your 2GHz:
[code]
18.0
2048 msec/iter = 5.25 ROE[avg,max] = [0.222878714, 0.312500000] radices = 64 16 32 32 0 0 0 0 0 0
2304 msec/iter = 5.85 ROE[avg,max] = [0.259770659, 0.375000000] radices = 144 16 16 32 0 0 0 0 0 0
2560 msec/iter = 6.28 ROE[avg,max] = [0.252363335, 0.312500000] radices = 160 16 16 32 0 0 0 0 0 0
2816 msec/iter = 7.44 ROE[avg,max] = [0.239182557, 0.312500000] radices = 176 16 16 32 0 0 0 0 0 0
3072 msec/iter = 8.35 ROE[avg,max] = [0.251998996, 0.312500000] radices = 192 16 16 32 0 0 0 0 0 0
3328 msec/iter = 9.02 ROE[avg,max] = [0.243424657, 0.312500000] radices = 208 16 16 32 0 0 0 0 0 0
3584 msec/iter = 9.25 ROE[avg,max] = [0.248507344, 0.312500000] radices = 224 16 16 32 0 0 0 0 0 0
3840 msec/iter = 10.17 ROE[avg,max] = [0.256763639, 0.343750000] radices = 240 16 16 32 0 0 0 0 0 0
4096 msec/iter = 10.63 ROE[avg,max] = [0.279075387, 0.343750000] radices = 256 16 16 32 0 0 0 0 0 0
4608 msec/iter = 12.21 ROE[avg,max] = [0.269211099, 0.343750000] radices = 288 16 16 32 0 0 0 0 0 0
5120 msec/iter = 13.48 ROE[avg,max] = [0.300527545, 0.375000000] radices = 320 16 16 32 0 0 0 0 0 0
5632 msec/iter = 15.42 ROE[avg,max] = [0.230105748, 0.281250000] radices = 176 16 32 32 0 0 0 0 0 0
6144 msec/iter = 17.51 ROE[avg,max] = [0.246608585, 0.312500000] radices = 192 16 32 32 0 0 0 0 0 0
6656 msec/iter = 18.60 ROE[avg,max] = [0.231292347, 0.312500000] radices = 208 16 32 32 0 0 0 0 0 0
[/code]
Furthermore, using an avx-512 build on this type of instance should give a nice added speedup, perhaps as much as 1.6x. And if/when a Prime95/mprime build for these systems comes online, that should be faster still.
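The single-core estimate above is just the 4-core timing times the rough ~3.5x scaling factor. A small C sketch of that arithmetic, reading the timing straight from a cfg line in the format shown above (the function names are hypothetical, and the 3.5x factor is the ballpark figure from this post, not a measured value):

```c
#include <stdio.h>

/* Parse "<FFT length in K> msec/iter = <time>" from one mlucas.cfg line.
   Returns 1 on success, 0 for lines that do not match (e.g. the
   version line at the top of the file). */
int parse_cfg_line(const char *line, int *fft_k, double *msec)
{
    return sscanf(line, "%d msec/iter = %lf", fft_k, msec) == 2;
}

/* Estimate the 1-core time from a multi-core time, given an assumed
   multi-core speedup factor (~3.5 for 4 cores in the post above). */
double est_single_core_msec(double multi_core_msec, double scaling)
{
    return multi_core_msec * scaling;
}
```

Feeding it the 5120K line gives 13.48 * 3.5 = 47.18 msec/iter, which is where the ~47 msec/iter figure comes from.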

Looking more closely at your selftest.log and mlucas.cfg files, I see "Excessive level of roundoff error detected" messages for individual FFT radix sets at 2816K, 3328K, 5120K and 7168K, but in none of those cases did the skipped radix set(s) happen to be the fastest one(s) at the FFT length in question.
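For readers unfamiliar with the ROE figures in the cfg file: assuming they measure how far the floating-point convolution outputs stray from the nearest integer, the check can be sketched in C as below. This is an illustration only, not the Mlucas code; the function names are hypothetical, and the threshold is left as a parameter because the exact cutoff Mlucas uses is not stated in this thread:

```c
#include <stddef.h>

/* Largest distance of any element from its nearest integer.
   Uses a plain cast for rounding, so it assumes |x[i]| fits in a
   long long. */
double max_roundoff(const double *x, size_t n)
{
    double max_err = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* Round half away from zero, then measure the residual. */
        long long nearest = (long long)(x[i] < 0 ? x[i] - 0.5 : x[i] + 0.5);
        double err = x[i] - (double)nearest;
        if (err < 0)
            err = -err;
        if (err > max_err)
            max_err = err;
    }
    return max_err;
}

/* Flag a radix set whose maximum error reaches the caller-supplied
   threshold (a hypothetical parameter, see the note above). */
int roundoff_is_excessive(double max_err, double threshold)
{
    return max_err >= threshold;
}
```

An error approaching 0.5 means the nearest integer can no longer be identified reliably, which is why radix sets with large maximum ROE get skipped.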

kracker 2019-11-28 02:09

Trying to compile under MSYS2/windows, getting 'SIGHUP' undeclared errors.
[code]
../src/fermat_mod_square.c:1869:18: error: 'SIGHUP' undeclared (first use in this function)
../src/mers_mod_square.c:2382:18: error: 'SIGHUP' undeclared (first use in this function)
../src/Mlucas.c:182:21: error: 'SIGHUP' undeclared (first use in this function)

[/code]

ewmayer 2019-11-28 02:53

I no longer have access to a Windows machine of any kind - perhaps SIGHUP has no proper analog in Windows? Anyhow, a quick workaround is to simply comment out any clauses giving such errors and recompile. E.g. in Mlucas.c:
[code]
void sig_handler(int signo)
{
	if (signo == SIGINT) {
		fprintf(stderr,"received SIGINT signal.\n");	sprintf(cbuf,"received SIGINT signal.\n");
	} else if(signo == SIGTERM) {
		fprintf(stderr,"received SIGTERM signal.\n");	sprintf(cbuf,"received SIGTERM signal.\n");
//	} else if(signo == SIGHUP) {
//		fprintf(stderr,"received SIGHUP signal.\n");	sprintf(cbuf,"received SIGHUP signal.\n");
	}
	// Toggle a global to allow desired code sections to detect signal-received and take appropriate action:
	MLUCAS_KEEP_RUNNING = 0;
}
[/code]
...and similarly in the other 2 files which define signal handlers and are giving errors.
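An alternative to commenting the clauses out by hand is to guard them with the preprocessor, so the same source builds on both Linux and MinGW-style Windows toolchains. SIGHUP is a macro from <signal.h> wherever it exists, so "#ifdef SIGHUP" works as the test. The sketch below is a standalone illustration, not a patch against the Mlucas sources: keep_running is a hypothetical stand-in for MLUCAS_KEEP_RUNNING, and the handler only sets the flag rather than printing:

```c
#include <signal.h>
#include <stdio.h>

/* Hypothetical stand-in for the MLUCAS_KEEP_RUNNING global. */
static volatile sig_atomic_t keep_running = 1;

/* Map a signal number to a printable name; the SIGHUP branch is
   compiled only on platforms whose <signal.h> defines it. */
const char *signal_name(int signo)
{
    if (signo == SIGINT)  return "SIGINT";
    if (signo == SIGTERM) return "SIGTERM";
#ifdef SIGHUP
    if (signo == SIGHUP)  return "SIGHUP";
#endif
    return "unknown";
}

/* Handler: just clear the flag so the main loop can exit cleanly. */
void sig_handler(int signo)
{
    (void)signo;
    keep_running = 0;
}

/* Register the handler for every signal the platform provides.
   Returns 0 on success, -1 if any registration fails. */
int install_handlers(void)
{
    if (signal(SIGINT, sig_handler) == SIG_ERR)  return -1;
    if (signal(SIGTERM, sig_handler) == SIG_ERR) return -1;
#ifdef SIGHUP
    if (signal(SIGHUP, sig_handler) == SIG_ERR)  return -1;
#endif
    return 0;
}
```

The same "#ifdef SIGHUP ... #endif" pattern could be wrapped around the offending clauses in the three files listed in the error output.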

