mersenneforum.org Mlucas v18 available

 2019-10-16, 03:12 #45 ewmayer ∂²ω=0     Sep 2002 República de California 2²·5·11·53 Posts

Thanks - so you're getting quite a decent speedup from using both logical cores, though I haven't a clue if the absolute timings are reasonable for the hardware in question - 2 ms/iter @192K is quite slow by (say) Haswell-and-beyond desktop-PC standards. I suggest you proceed to the full production-run-oriented self-tests, and please post a zipped copy of the resulting self-test logfile here:
Code:
./Mlucas -s m -iters 100 -cpu 0:1 >& selftest.log

2019-10-16, 04:55   #46
Dylan14

"Dylan"
Mar 2017

2×293 Posts

Quote:
 Originally Posted by ewmayer Thanks - so you're getting quite a decent speedup from using both logical cores, though I haven't a clue if the absolute timings are reasonable for the hardware in question - 2 ms/iter @192K is quite slow by (say) Haswell-and-beyond desktop-PC standards. I suggest you proceed to the full production-run-oriented self-tests, and please post a zipped copy of the resulting self-test logfile here: ./Mlucas -s m -iters 100 -cpu 0:1 >& selftest.log

See attached file. Note: this is on a new session of Colab, so the processor is not the same as before. I have also attached the cpu info and cfg files.
Attached Files: cpuinfo.txt (2.3 KB), selftest.log (64.4 KB), mlucas.cfg.txt (2.3 KB)

 2019-10-16, 19:14 #47 ewmayer ∂²ω=0     Sep 2002 República de California 2²×5×11×53 Posts

Thanks for the build & test data - I see this particular new instance supports avx-512, so you'll want to prepare a second build that invokes those inline-asm macros in the code:
Code:
gcc -c -O3 -DUSE_AVX512 -march=skylake-avx512 -DUSE_THREADS ../src/*.c >& build.log
...and use a different name for the resulting executable; you could call the 2 binaries mlucas_avx2 and mlucas_avx512, say. "grep avx512 /proc/cpuinfo" on whatever system you get during a particular session will tell you which binary to use. Rerun the self-tests on this new system to see what kind of speedup you get from using avx-512.

(Wait - while working through your selftest.log data further down in this note, I came across these infoprints @7168K:
Code:
radix28_ditN_cy_dif1: No AVX-512 support; Skipping this leading radix.
So you did prepare and use an avx-512 build as per above compile flags for this set of runs? If so, that obviates the avx2-vs-avx512 parts of the commentary below.)

As to your avx2-build timings, I realized after posting my "seems slow" comment yesterday that I was thinking in terms of multicore running on hardware like my Haswell. For a single physical core running at 2 GHz, ~50 msec/iter at the current GIMPS wavefront (5120K) is not at all bad - for comparison, here is the mlucas.cfg file for all 4 physical cores (no hyperthreading on this CPU) of my 3.3GHz Haswell. On a single core the runtimes would be perhaps ~3.5x as large, so (say) at 5120K we'd expect ~47 msec/iter, only ~10% faster than your 1-core/2-thread timings, and this is at 3.3GHz vs your 2GHz:
Code:
18.0
      2048  msec/iter =    5.25  ROE[avg,max] = [0.222878714, 0.312500000]  radices =  64 16 32 32  0  0  0  0  0  0
      2304  msec/iter =    5.85  ROE[avg,max] = [0.259770659, 0.375000000]  radices = 144 16 16 32  0  0  0  0  0  0
      2560  msec/iter =    6.28  ROE[avg,max] = [0.252363335, 0.312500000]  radices = 160 16 16 32  0  0  0  0  0  0
      2816  msec/iter =    7.44  ROE[avg,max] = [0.239182557, 0.312500000]  radices = 176 16 16 32  0  0  0  0  0  0
      3072  msec/iter =    8.35  ROE[avg,max] = [0.251998996, 0.312500000]  radices = 192 16 16 32  0  0  0  0  0  0
      3328  msec/iter =    9.02  ROE[avg,max] = [0.243424657, 0.312500000]  radices = 208 16 16 32  0  0  0  0  0  0
      3584  msec/iter =    9.25  ROE[avg,max] = [0.248507344, 0.312500000]  radices = 224 16 16 32  0  0  0  0  0  0
      3840  msec/iter =   10.17  ROE[avg,max] = [0.256763639, 0.343750000]  radices = 240 16 16 32  0  0  0  0  0  0
      4096  msec/iter =   10.63  ROE[avg,max] = [0.279075387, 0.343750000]  radices = 256 16 16 32  0  0  0  0  0  0
      4608  msec/iter =   12.21  ROE[avg,max] = [0.269211099, 0.343750000]  radices = 288 16 16 32  0  0  0  0  0  0
      5120  msec/iter =   13.48  ROE[avg,max] = [0.300527545, 0.375000000]  radices = 320 16 16 32  0  0  0  0  0  0
      5632  msec/iter =   15.42  ROE[avg,max] = [0.230105748, 0.281250000]  radices = 176 16 32 32  0  0  0  0  0  0
      6144  msec/iter =   17.51  ROE[avg,max] = [0.246608585, 0.312500000]  radices = 192 16 32 32  0  0  0  0  0  0
      6656  msec/iter =   18.60  ROE[avg,max] = [0.231292347, 0.312500000]  radices = 208 16 32 32  0  0  0  0  0  0
Further, using an avx-512 build on this type of instance should give a nice added speedup, perhaps as much as 1.6x. And if/when a Prime95/mprime build for these systems comes online, that should be faster still.

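The "grep avx512 /proc/cpuinfo" check in this post can also be done programmatically. The following is only a minimal sketch: the helper names pick_mlucas_binary and pick_mlucas_binary_from_file are hypothetical and not part of Mlucas, and the binary names are the ones suggested in the post.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of Mlucas): given the text of a cpuinfo
 * "flags" line, return the binary name suggested in the post. Any flag
 * containing "avx512" selects the avx-512 build, mirroring the grep. */
const char *pick_mlucas_binary(const char *cpu_flags)
{
    return strstr(cpu_flags, "avx512") ? "mlucas_avx512" : "mlucas_avx2";
}

/* Scan a cpuinfo-style file line by line. Falls back to the avx2 name
 * if the file cannot be opened or no avx512 flag is found. */
const char *pick_mlucas_binary_from_file(const char *path)
{
    char line[8192];
    const char *choice = "mlucas_avx2";
    FILE *fp = fopen(path, "r");

    if (!fp)
        return choice;
    while (fgets(line, sizeof line, fp)) {
        if (strstr(line, "avx512")) {
            choice = "mlucas_avx512";
            break;
        }
    }
    fclose(fp);
    return choice;
}
```

Calling pick_mlucas_binary_from_file("/proc/cpuinfo") at the start of a session would then name the build to run on whatever instance was assigned.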
Looking more closely at your selftest.log and mlucas.cfg files, I see "Excessive level of roundoff error detected" messages for individual FFT radix sets at 2816K, 3328K, 5120K and 7168K, but in none of those cases did the skipped radix set(s) happen to be the fastest one(s) at the FFT length in question.
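The single-core estimate earlier in this post (13.48 msec/iter on 4 cores at 5120K, times ~3.5) can be reproduced from an mlucas.cfg timing line. This is a sketch only: parse_cfg_line and scale_to_one_core are hypothetical helpers, the line layout is assumed from the listing quoted above, and the 3.5x factor is the post's rough guess rather than a measured value.

```c
#include <stdio.h>

/* Hypothetical helper: parse one timing line in the layout shown in the
 * post, e.g. "5120  msec/iter =  13.48  ROE[avg,max] = [...]  radices = ...".
 * Returns 1 and fills in the FFT length (in K) and msec/iter on success,
 * 0 otherwise (e.g. for the leading "18.0" version line). */
int parse_cfg_line(const char *line, int *fft_k, double *msec_per_iter)
{
    return sscanf(line, "%d msec/iter = %lf", fft_k, msec_per_iter) == 2;
}

/* Scale a multi-core timing to a rough single-core estimate using an
 * assumed slowdown factor (the post uses ~3.5x for 4 cores). */
double scale_to_one_core(double msec_multi_core, double assumed_factor)
{
    return msec_multi_core * assumed_factor;
}
```

For the 5120K line this gives 13.48 × 3.5 ≈ 47 msec/iter, which is the figure compared against the ~50 msec/iter Colab timing.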
 2019-11-28, 02:09 #48 kracker     "Mr. Meeseeks" Jan 2012 California, USA 100001111001₂ Posts

Trying to compile under MSYS2/windows, getting 'SIGHUP' undeclared errors.
Code:
../src/fermat_mod_square.c:1869:18: error: 'SIGHUP' undeclared (first use in this function)
../src/mers_mod_square.c:2382:18: error: 'SIGHUP' undeclared (first use in this function)
../src/Mlucas.c:182:21: error: 'SIGHUP' undeclared (first use in this function)

2019-11-28, 02:53   #49
ewmayer
∂²ω=0

Sep 2002
República de California

26614₈ Posts

Quote:
 Originally Posted by kracker Trying to compile under MSYS2/windows, getting 'SIGHUP' undeclared errors. Code: ../src/fermat_mod_square.c:1869:18: error: 'SIGHUP' undeclared (first use in this function) ../src/mers_mod_square.c:2382:18: error: 'SIGHUP' undeclared (first use in this function) ../src/Mlucas.c:182:21: error: 'SIGHUP' undeclared (first use in this function)
I no longer have access to a Windows machine of any kind - perhaps SIGHUP has no proper analog in Windows? Anyhow, quick workaround is to simply comment out any clauses giving such errors and recompile. E.g. in Mlucas.c:
Code:
void sig_handler(int signo)
{
    if (signo == SIGINT) {
        fprintf(stderr,"received SIGINT signal.\n");	sprintf(cbuf,"received SIGINT signal.\n");
    } else if(signo == SIGTERM) {
        fprintf(stderr,"received SIGTERM signal.\n");	sprintf(cbuf,"received SIGTERM signal.\n");
//  } else if(signo == SIGHUP) {
//      fprintf(stderr,"received SIGHUP signal.\n");	sprintf(cbuf,"received SIGHUP signal.\n");
    }
    // Toggle a global to allow desired code sections to detect signal-received and take appropriate action:
    MLUCAS_KEEP_RUNNING = 0;
}
...and similarly in the other 2 files that define signal handlers and give these errors.
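An alternative to commenting the clauses out by hand is to guard them with the preprocessor, so the same source builds both where SIGHUP is defined and where it is not. The sketch below is not the Mlucas code: keep_running stands in for MLUCAS_KEEP_RUNNING, and signal_name, sketch_sig_handler, install_handlers and still_running are hypothetical names used only for illustration.

```c
#include <signal.h>

/* Stand-in for the MLUCAS_KEEP_RUNNING global in the handler above. */
static volatile sig_atomic_t keep_running = 1;

/* Name the received signal; the SIGHUP case is compiled only on
 * platforms whose <signal.h> defines that macro. */
const char *signal_name(int signo)
{
    if (signo == SIGINT)
        return "SIGINT";
    if (signo == SIGTERM)
        return "SIGTERM";
#ifdef SIGHUP
    if (signo == SIGHUP)
        return "SIGHUP";
#endif
    return "unknown";
}

/* Handler: only clears the flag that the main loop is expected to poll. */
void sketch_sig_handler(int signo)
{
    (void)signo;
    keep_running = 0;
}

/* Register the handler; returns 0 on success, -1 if any call fails. */
int install_handlers(void)
{
    if (signal(SIGINT, sketch_sig_handler) == SIG_ERR)
        return -1;
    if (signal(SIGTERM, sketch_sig_handler) == SIG_ERR)
        return -1;
#ifdef SIGHUP
    if (signal(SIGHUP, sketch_sig_handler) == SIG_ERR)
        return -1;
#endif
    return 0;
}

/* Accessor so callers can test the flag without touching the global. */
int still_running(void)
{
    return keep_running != 0;
}
```

With this shape, a build whose headers lack SIGHUP simply skips those clauses, and no per-platform source edits are needed.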


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.