Mlucas v18 available (https://www.mersenneforum.org/showthread.php?t=24100)

M344587487 2019-04-01 13:44

Got some errors in the build log compiling on a Ryzen 1700, log attached.
[code]gcc -c -O3 -DUSE_AVX2 -mavx2 -DUSE_THREADS ../src/*.c >& build.log[/code]
Errors appear with and without -mavx2, and without -DUSE_AVX2. Tried the txz and tbz2 archives to rule out download corruption. Haven't investigated beyond that but I can if necessary. The compiler is gcc (Ubuntu 8.3.0-3ubuntu1) 8.3.0, the default gcc on a daily build of Ubuntu 19.04. There's a chance the problem is due to it being a daily build, but this is the only issue I've come across so far.

ewmayer 2019-04-01 19:40

Thanks for the log - here is my summary:

o I see a bunch of -Wformat-overflow warnings; those prints need to be replaced with buffer-overflow-proof ones;

o The "cast from pointer to integer of different size" warnings are benign: the statements in question are just checking the alignment of various pointers using the bottom few bits. Still, I suppose changing the (uint32) casts to casts to a pointer-sized int can't hurt.

Ah, I think I see why you "encountered errors": the version of gcc you are using (btw, what version is it?) clearly warns aggressively about potentially buffer-overflow-unsafe string I/O, which is good. But those previously unseen warnings are, among other things, flagging Mlucas-internal error-print statements. So when you do the case-insensitive 'grep -i error build.log' as per the README page, the print statements containing the string "ERROR" now show up by way of the aforementioned -Wformat-overflow warnings:
[i]
125: sprintf(cbuf , "*** ERROR: Non-numeric character encountered in -shift argument %s.\n", stFlag);
139: sprintf(cbuf , "*** ERROR: -shift argument %s overflows uint64 field.\n", stFlag);
154: sprintf(cbuf, "Error writing residue to restart file %s.\n",RESTARTFILE);
169: sprintf(cbuf,"ERROR: bit_depth_done of %u > max. allowed of %u. The ini file entry was %s\n", bit_depth_done, MAX_FACT_BITS, in_line);
183: sprintf(cbuf, "ERROR: Illegal 'fftlen = ' argument - suggested FFT length for this p = %u. The ini file entry was %s\n", kblocks, in_line);
225: sprintf(cbuf, "ERROR: read_ppm1_savefiles Failed on savefile %s!\n",RESTARTFILE);
239: sprintf(cbuf, "ERROR: convert_res_bytewise_FP Failed on savefile %s!\n",RESTARTFILE);
288: sprintf(cbuf,"ERROR: unable to rename %s restart file ==> %s ... skipping every-million-iteration restart file archiving\n",RANGEFILE, STATFILE);
302: sprintf(cbuf, "ERROR: unable to open restart file %s for write of checkpoint data.\n",RESTARTFILE);
446: sprintf(cbuf , "*** ERROR: -f argument %s overflows integer field.\n", stFlag);
474: sprintf(cbuf , "*** ERROR: -m argument %s overflows integer field.\n", stFlag);
488: sprintf(cbuf , "*** ERROR: Non-numeric character encountered in -nthread argument %s.\n", stFlag);
502: sprintf(cbuf , "*** ERROR: -nthread argument %s overflows integer field.\n", stFlag);
516: sprintf(cbuf , "*** ERROR: Non-numeric character encountered in -prp argument %s.\n", stFlag);
530: sprintf(cbuf , "*** ERROR: -prp argument %s overflows integer field.\n", stFlag);
544: sprintf(cbuf , "*** ERROR: Non-numeric character encountered in -shift argument %s.\n", stFlag);
558: sprintf(cbuf , "*** ERROR: -shift argument %s overflows uint64 field.\n", stFlag);
572: sprintf(cbuf , "*** ERROR: Non-numeric character encountered in -radset argument %s.\n", stFlag);
586: sprintf(cbuf , "*** ERROR: -radset argument %s overflows integer field.\n", stFlag);
600: sprintf(cbuf , "*** ERROR: Non-numeric character encountered in -fftlen argument %s.\n", stFlag);
614: sprintf(cbuf , "*** ERROR: -fftlen argument %s overflows integer field.\n", stFlag);
628: sprintf(cbuf , "*** ERROR: Non-numeric character encountered in -iters argument %s.\n", stFlag);
642: sprintf(cbuf , "*** ERROR: -iters argument %s overflows integer field.\n", stFlag);
[/i]
The quick workaround is to simply drop '-i' from the grep; when I do that on your build.log it comes up empty. (In fact I'm not sure why I ever used '-i' in that context to begin with; maybe some long-ago-used compiler used e.g. 'Error' in its messaging.) Are you able to link?

M344587487 2019-04-01 20:19

That's amusing, I blindly followed the instruction to only link if grep comes up empty and didn't pay close enough attention to the log. It works fine, sorry for lighting the bat signal unnecessarily. April fool? ;)

ewmayer 2019-04-01 20:28

No worries, it was still useful in reminding me that I should fix up all those possible-buffer-overflow and pointer-to-shorter-int-cast warnings, and that I need to get rid of the '-i' in my grep-your-build.log instructions on the README page.

ewmayer 2019-04-12 20:45

[QUOTE=Lorenzo;512110]Just FYI
[CODE]root@lorenzoArm:~/mersenne/arm8# ./Mlucas_v18_c2simd -fftlen 18432 -iters 100 -cpu 0:7
[snip]
100 iterations of M337615261 with FFT length 18874368 = 18432 K, final residue shift count = 321038982
Res64: 69FF742497F16902. AvgMaxErr = 0.003191964. MaxErr = 0.375000000. Program: E18.0
Res mod 2^36 = 19729049858
Res mod 2^35 - 1 = 20161851329
Res mod 2^36 - 1 = 1044285462
Clocks = 00:00:21.067
[snip]
100 iterations of M337615261 with FFT length 18874368 = 18432 K, final residue shift count = 171176556
Res64: 2258A7342961B652. AvgMaxErr = 0.002428013. MaxErr = 0.281250000. Program: E18.0
Res mod 2^36 = 17874138706
Res mod 2^35 - 1 = 28069471175
Res mod 2^36 - 1 = 53816329185
Clocks = 00:00:21.009
NTHREADS = 8
[B][COLOR="Red"]ERROR: at line 1540 of file ../src/Mlucas.c
Assertion failed: Return value of shift_word(): unpadded-array-index out of range![/COLOR][/B][/CODE][/QUOTE]

This bug has been fixed in the patch I uploaded to the ftp server last week. It only affects runs at FFT lengths >= 16M (16384K), and since it doesn't permit the program to create an mlucas.cfg file entry for the FFT lengths in question, no actual user runs should be affected, since you can't do a production run at an FFT length without a cfg-file entry for said length.

kriesel 2019-10-15 12:27

V18 is the current latest release, yes? How about making this thread sticky?

Dylan14 2019-10-15 20:56

Colab test of v18, spot check
 
I built Mlucas v18 successfully in Colab using the reverse tunnel code that chalsall made. It built with no problems. However, as a spot check, one should run

[CODE]./Mlucas -fftlen 192 -iters 100 -radset 0[/CODE]
When I did that, I got an excessive roundoff warning:


[CODE]root@colab_test:/content/mlucas/mlucas_v18# ./Mlucas -fftlen 192 -iters 100 -radset 0 > test1.txt

Mlucas 18.0

http://www.mersenneforum.org/mayer/README.html

INFO: using 64-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
INFO: testing FFT radix tables...

Mlucas selftest running.....

/****************************************************************************/

INFO: Unable to find/open mlucas.cfg file in r+ mode ... creating from scratch.
No CPU set or threadcount specified ... running single-threaded.
INFO: Maximum recommended exponent for this runlength = 3888516; p[ = 3888517]/pmax_rec = 1.0000002572.
specified FFT length 192 K is less than recommended 208 K for this p.
M3888517: using FFT length 192K = 196608 8-byte floats, initial residue shift count = 1942965
this gives an average 19.778020222981770 bits per digit
Using complex FFT radices 192 16 32
radix16_dif_dit_pass pfetch_dist = 4096
radix16_wrapper_square: pfetch_dist = 4096
Using 1 threads in carry step
M3888517 Roundoff warning on iteration 46, maxerr = 0.437500000000
100 iterations of M3888517 with FFT length 196608 = 192 K, final residue shift count = 3620533
Res64: 579D593FCE0707B2. AvgMaxErr = 0.003006696. MaxErr = 0.437500000. Program: E18.0
Res mod 2^36 = 67881076658
Res mod 2^35 - 1 = 21674900403
Res mod 2^36 - 1 = 42893438228
Clocks = 00:00:00.466
***** Excessive level of roundoff error detected - this radix set will not be used. *****

Done ...

[/CODE]
But with -radset 1 it works fine:


[CODE]root@colab_test:/content/mlucas/mlucas_v18# ./Mlucas -fftlen 192 -iters 100 -radset 1 > test1.txt

Mlucas 18.0

http://www.mersenneforum.org/mayer/README.html

INFO: using 64-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation.
INFO: testing FFT radix tables...

Mlucas selftest running.....

/****************************************************************************/

INFO: Unable to find/open mlucas.cfg file in r+ mode ... creating from scratch.
No CPU set or threadcount specified ... running single-threaded.
INFO: Maximum recommended exponent for this runlength = 3888516; p[ = 3888517]/pmax_rec = 1.0000002572.
specified FFT length 192 K is less than recommended 208 K for this p.
M3888517: using FFT length 192K = 196608 8-byte floats, initial residue shift count = 1942965
this gives an average 19.778020222981770 bits per digit
Using complex FFT radices 192 32 16
radix16_dif_dit_pass pfetch_dist = 4096
radix16_wrapper_square: pfetch_dist = 4096
Using 1 threads in carry step
100 iterations of M3888517 with FFT length 196608 = 192 K, final residue shift count = 3620533
Res64: 579D593FCE0707B2. AvgMaxErr = 0.002918527. MaxErr = 0.375000000. Program: E18.0
Res mod 2^36 = 67881076658
Res mod 2^35 - 1 = 21674900403
Res mod 2^36 - 1 = 42893438228
Clocks = 00:00:00.393

Done ...

[/CODE]
I am presently running a test on a known prime, and I will report back when it's finished.
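
As a quick sanity check on the two logs, the "average bits per digit" figure appears to be simply the exponent p divided by the FFT length N in 8-byte floats, and the p/pmax_rec ratio follows from the recommended maximum printed on the line before it. Both can be reproduced by hand from the numbers in the output (the symbols p, N and p_max_rec are just labels for this worked example):

$$
\frac{p}{N} = \frac{3888517}{192 \cdot 1024} = \frac{3888517}{196608} \approx 19.778020,
\qquad
\frac{p}{p_{\mathrm{max\_rec}}} = \frac{3888517}{3888516} = 1 + \frac{1}{3888516} \approx 1.0000002572
$$

So this exponent sits just one above the recommended maximum for 192K, which is consistent with the "less than recommended 208 K" notice and with one radix set landing right at the roundoff limit.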

ewmayer 2019-10-15 21:14

@Dylan14 -- Thanks for the build attempt and info. The ROE is benign ... Mlucas uses the self-tests for both performance and accuracy testing. When it hits ROE >= 0.4375 during one of the self-tests, it simply omits that particular FFT-radix set from consideration for writing to the mlucas.cfg file for said FFT length. The only problem is if that set of FFT radices also happens to give the best performance at the FFT length in question. What you really want to do is run the code in self-test (benchmarking) mode. To do this at a specific single FFT length of interest, do like so, using your example at 192K:

./Mlucas -iters 100 -fftlen 192

Then have a look at the resulting 1-line entry in the mlucas.cfg file.

What kind of hardware are you testing on? What sort of processor, how many cores?

Dylan14 2019-10-15 21:26

This is the /proc/cpuinfo file of the machine in question:

[CODE] CPU -- root@colab_test# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 79
model name : Intel(R) Xeon(R) CPU @ 2.20GHz
stepping : 0
microcode : 0x1
cpu MHz : 2200.000
cache size : 56320 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat md_clear arch_capabilities
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs
bogomips : 4400.00
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 79
model name : Intel(R) Xeon(R) CPU @ 2.20GHz
stepping : 0
microcode : 0x1
cpu MHz : 2200.000
cache size : 56320 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat md_clear arch_capabilities
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs
bogomips : 4400.00
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
[/CODE]

ewmayer 2019-10-15 21:33

Thanks - so it's a single-physical-core with hyperthreading and avx2/fma3, hence 2 entries in /proc/cpuinfo. Thus your compile should have used
[i]gcc -c -O3 -DUSE_AVX2 -mavx2 -DUSE_THREADS ../src/*.c >& build.log
grep error build.log[/i]
[Assuming above grep comes up empty] [i]gcc -o Mlucas *.o -lm -lpthread -lrt [/i]

Unlike mprime, Mlucas does often get some added speedup from using the virtual cores enabled by HT - a quick spot-check of this would be the following 2 single-FFT-length self-tests:

./Mlucas -fftlen 192 -iters 100
./Mlucas -fftlen 192 -iters 100 -cpu 0:1

Then post the 2 resulting mlucas.cfg lines here.

Dylan14 2019-10-16 00:38

Here are the resulting mlucas.cfg lines from the Colab build:


[CODE]192 msec/iter = 3.20 ROE[avg,max] = [0.002939732, 0.375000000] radices = 48 8 16 16 0 0 0 0 0 0 100-iteration Res mod 2^64, 2^35-1, 2^36-1 = 579D593FCE0707B2, 21674900403, 42893438228
192 msec/iter = 2.39 ROE[avg,max] = [0.002939732, 0.375000000] radices = 48 8 16 16 0 0 0 0 0 0 100-iteration Res mod 2^64, 2^35-1, 2^36-1 = 579D593FCE0707B2, 21674900403, 42893438228[/CODE]
The first entry is the single-threaded run, the second is the 2-threaded run.

And the program does show that M3021377 is prime, as I expected (this was another test to make sure it works).
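
For reference, the gain from the second (hyperthreaded) logical core can be read straight off the two msec/iter figures above:

$$
\text{speedup} = \frac{3.20}{2.39} \approx 1.34,
\qquad
1 - \frac{2.39}{3.20} \approx 0.25
$$

That is, the 2-threaded run takes roughly 25% less time per iteration at 192K on this single-physical-core instance. This is a single 100-iteration measurement, so the exact figure should be treated as approximate.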


All times are UTC.