mersenneforum.org  

Old 2017-02-11, 20:34   #34
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

1110101100110₂ Posts

Looking at it now. I've got an assert in some debug code after several thousand iterations. Preliminary evidence suggests the bug occurs when one thread completes ALL its work before one of the other threads even starts its work. Obviously, this is more likely using faster FMA3 hardware. Also, more likely running more threads and smaller FFT sizes.
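
One way to probe the ordering described above is to force it deliberately. The following is a minimal sketch in Python, not gwnum code: `run_workers`, its chunking scheme, and the `serialize` flag are all hypothetical names chosen for illustration. With `serialize=True`, each worker is joined before the next one starts, so one thread always completes all of its work before another begins. The combined result should still match the single-threaded answer.

```python
import threading


def run_workers(data, num_threads, serialize=False):
    """Sum the squares of `data`, split into one chunk per thread.

    With serialize=True, each worker is joined before the next is started,
    forcing the ordering where one thread finishes all of its work before
    another thread even begins.
    Returns (total, order in which workers finished).
    """
    chunks = [data[i::num_threads] for i in range(num_threads)]
    partial = [0] * num_threads   # one result slot per worker
    finished = []                 # completion order, guarded by `lock`
    lock = threading.Lock()

    def worker(idx):
        partial[idx] = sum(x * x for x in chunks[idx])
        with lock:
            finished.append(idx)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    if serialize:
        # Hypothetical stress ordering: run the workers strictly one by one.
        for t in threads:
            t.start()
            t.join()
    else:
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return sum(partial), finished
```

Comparing the serialized run against a plain single-threaded sum is the same kind of check as comparing residues between `-t1` and `-t4` runs: any difference points at the work-splitting logic rather than at the arithmetic.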
Old 2017-02-11, 20:53   #35
AG5BPilot
 
Dec 2011
New York, U.S.A.

97 Posts

Quote:
Originally Posted by Prime95 View Post
... Also, more likely running more threads and smaller FFT sizes.
Thanks for looking into it so quickly, George.

For what it's worth, I was unable to recreate the error on small numbers.

Another thought... Although a natural inclination when seeing a problem with multi-threaded code is to think "timing error", there's some behavior here that somewhat contradicts that notion. This bug is incredibly consistent. On numbers where it doesn't occur, it seems to never occur under any circumstances. On numbers where it does happen, it seems to always occur when running FMA3. There's no middle ground. For example, while this test case and similar R5 numbers cause this error, equivalent-sized (same FFT size) S5 numbers do not. Nor do Proth numbers, as far as we know.

Last fiddled with by AG5BPilot on 2017-02-11 at 20:59
Old 2017-02-11, 22:21   #36
IBethune
 
Nov 2010

52 Posts

One extra bit of information (probably unrelated?), but could be fixed in a future LLR release:

Accidentally, I used the command line option "-t 4" instead of the correct "-t4" to run on 4 threads, and found this results in a segfault (on both Mac and Linux). Oddly the backtrace points to the FFT code. NB this seems to be using the single-threaded code (at least no multithreaded message is printed in the FFT info string).

Code:
[ibethune@cirrus-login0 tmp]$ ./sllr64 -d -q64598*5^2318694-1 -t 4
Base prime factor(s) taken : 5
Starting N+1 prime test of 64598*5^2318694-1
Using FMA3 FFT length 512K, Pass1=256, Pass2=2K, a = 3
Segmentation fault (core dumped)
[ibethune@cirrus-login0 tmp]$ gdb sllr64 core.46831 
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-80.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /lustre/home/z04/ibethune/tmp/sllr64...done.
[New LWP 46831]
Core was generated by `./sllr64 -d -q64598*5^2318694-1 -t 4'.
Program terminated with signal 11, Segmentation fault.
#0  0x000000000076b8d3 in ??08FF ()
(gdb) bt
#0  0x000000000076b8d3 in ??08FF ()
#1  0x0000000004038dc0 in ?? ()
#2  0x000000000319c000 in ?? ()
#3  0x0000000000000001 in ?? ()
#4  0x0000000000000001 in ?? ()
#5  0x0000000000000001 in ?? ()
#6  0x0000000003c22c40 in ?? ()
#7  0x0000000002e49ae0 in ?? ()
#8  0x0000000000451b32 in gwfftfftmul ()
#9  0x0000000000458248 in gwsquare_carefully ()
#10 0x000000000042d43d in plusminustest ()
#11 0x000000000043cea9 in process_num ()
#12 0x000000000043fbc1 in primeContinue ()
#13 0x0000000000443379 in linuxContinue ()
#14 0x0000000000400a36 in main ()
Cheers

- Iain
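
For illustration only, here is a sketch of a stricter option check that would reject `-t 4` up front instead of letting the run continue. It assumes the thread count must be attached to the flag, as in `-t4`; `parse_thread_option` is a hypothetical helper, not LLR's actual argument parser, and it ignores every other option.

```python
def parse_thread_option(argv):
    """Return the thread count from an attached '-tN' option (default 1).

    Hypothetical helper, not LLR's parser. Raises ValueError for a bare
    '-t' (as happens with '-t 4') or for a non-numeric or zero count, so a
    malformed option fails before any computation starts.
    """
    threads = 1
    for arg in argv:
        if not arg.startswith("-t"):
            continue  # other options are outside the scope of this sketch
        value = arg[2:]
        if not value.isdecimal() or int(value) < 1:
            raise ValueError(f"expected -tN with N >= 1, got {arg!r}")
        threads = int(value)
    return threads
```

With this check, `["-d", "-q64598*5^2318694-1", "-t4"]` yields 4, while `["-d", "-q64598*5^2318694-1", "-t", "4"]` raises an error on the bare `-t`.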
Old 2017-02-12, 05:27   #37
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

2·53·71 Posts

Quote:
Originally Posted by AG5BPilot View Post
Thanks for looking into it so quickly, George.
This is proving very difficult. The assert is triggered by an inconsistent state. All my attempts at determining how it got into the inconsistent state have been unsuccessful.

I'm also not certain the assert I'm looking into is related to the bad residues.
Old 2017-02-13, 02:39   #38
ATH
Einyen
 
Dec 2003
Denmark

3^5·13 Posts

I tested on an 8-core Haswell-E 5960X. The strange thing is that it worked with -t8, using all 8 cores. It also works with the 576K FMA3 FFT and with the 512K AVX FFT.

I tried to find a smaller number that showed this error, but could not find one. I tested k*5^10000, k*5^100000, k*5^200000, k*5^300000, k*5^500000, k*5^1000000. Could this be an issue with ONLY 512K FMA FFT?

512K FMA3 FFT
1 core: worked
2 cores: failed
4 cores: failed
8 cores: worked !!!

4 cores + Errorcheck=1: failed (Max roundoff: 0.1718750000 to 0.2500000000)
4 cores + 576K FMA3 FFT: worked
4 cores + 512K AVX FFT: worked (using -oCpuSupportsFMA3=0)

Here are the outputs from the runs:
lresults3_8_18.txt

Last fiddled with by ATH on 2017-02-13 at 02:42
Old 2017-02-13, 06:59   #39
ATH
Einyen
 
Dec 2003
Denmark

6127₈ Posts

More evidence against 512K FMA3 FFT:

I chose another random number using the same FFT and the residue is different for 1, 2 and 4 core runs.

Code:
cllr64.exe -d -q"33333*7^1917000-1"
Base prime factor(s) taken : 7
Starting N+1 prime test of 33333*7^1917000-1
Using FMA3 FFT length 512K, Pass1=256, Pass2=2K, a = 3
33333*7^1917000-1 is not prime.  RES64: 67B17D3BA87D2187.  OLD64: A3CB4F4E952B5E9E  Time : 12814.739 sec.

cllr64.exe -d -t2 -q"33333*7^1917000-1"
Base prime factor(s) taken : 7
Starting N+1 prime test of 33333*7^1917000-1
Using FMA3 FFT length 512K, Pass1=256, Pass2=2K, 2 threads, a = 3
33333*7^1917000-1 is not prime.  RES64: 421E07AE2563A8EC.  OLD64: 3310EEA60BDEF4CD  Time : 6590.002 sec.

cllr64.exe -d -t4 -q"33333*7^1917000-1"
Base prime factor(s) taken : 7
Starting N+1 prime test of 33333*7^1917000-1
Using FMA3 FFT length 512K, Pass1=256, Pass2=2K, 4 threads, a = 3
33333*7^1917000-1 is not prime.  RES64: 4E4F693184E7F608.  OLD64: 57A513302A6BDC21  Time : 3370.069 sec.
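
Comparisons like the one above can be automated. A minimal sketch, assuming result lines have exactly the `... is not prime.  RES64: ...` shape quoted in this thread (the function names are hypothetical): group the RES64 values by candidate and report any candidate with more than one distinct residue.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   33333*7^1917000-1 is not prime.  RES64: 67B17D3BA87D2187.  OLD64: ...
RESULT_RE = re.compile(
    r"^(?P<n>\S+) is not prime\.\s+RES64: (?P<res>[0-9A-F]{16})\.",
    re.MULTILINE,
)


def residues_by_candidate(log_text):
    """Map each tested candidate to the set of RES64 values reported for it."""
    found = defaultdict(set)
    for match in RESULT_RE.finditer(log_text):
        found[match.group("n")].add(match.group("res"))
    return dict(found)


def inconsistent(log_text):
    """Return only the candidates that produced more than one residue."""
    return {n: res for n, res in residues_by_candidate(log_text).items()
            if len(res) > 1}
```

Feeding the three runs above through `inconsistent` would flag `33333*7^1917000-1` with three different residues. Since the test is deterministic, any candidate that shows up here has at least one bad run.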
Old 2017-02-14, 01:08   #40
ATH
Einyen
 
Dec 2003
Denmark

3^5·13 Posts

I found another failure at 384K FMA3 FFT.

I also tested 1 number at each of these FFTs without any failures: 480K, 448K, 400K, 256K, 128K.



Code:
cllr64.exe -d -q"66666*5^1560000-1"
Base prime factor(s) taken : 5
Starting N+1 prime test of 66666*5^1560000-1
Using FMA3 FFT length 384K, Pass1=256, Pass2=1536, a = 3
66666*5^1560000-1 is not prime.  RES64: 7417FF24F2FBCEB9.  OLD64: 5C47FD6ED8F36C28  Time : 7048.131 sec.

cllr64.exe -d -t4 -q"66666*5^1560000-1"
Base prime factor(s) taken : 5
Starting N+1 prime test of 66666*5^1560000-1
Using FMA3 FFT length 384K, Pass1=256, Pass2=1536, 4 threads, a = 3
66666*5^1560000-1 is not prime.  RES64: 110BC8AEEB81CEBB.  OLD64: 348E7C54B5566DC5  Time : 2422.209 sec.
Old 2017-02-14, 10:21   #41
henryzz
Just call me Henry
 
"David"
Sep 2007
Cambridge (GMT/BST)

2^3×3×5×7^2 Posts

Pass1 was 256 again in the failing 384K case, just as in the failing 512K runs.
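
To check this across more runs, the FFT info string can be parsed into its fields. A rough sketch, assuming the line format matches the `Using ... FFT length ...` lines quoted in this thread and that a `K` suffix means ×1024 (`parse_fft_line` and `to_int` are hypothetical helpers). For both failing sizes quoted here, Pass1 × Pass2 equals the FFT length (256 × 2K = 512K, 256 × 1536 = 384K).

```python
import re

# Matches lines such as:
#   Using FMA3 FFT length 512K, Pass1=256, Pass2=2K, 4 threads, a = 3
FFT_RE = re.compile(
    r"Using (?P<kind>\w+) FFT length (?P<length>\d+K?), "
    r"Pass1=(?P<pass1>\d+K?), Pass2=(?P<pass2>\d+K?)"
    r"(?:, (?P<threads>\d+) threads)?"
)


def to_int(text):
    """Convert '2K' to 2048 and '1536' to 1536 (assumes K means 1024)."""
    return int(text[:-1]) * 1024 if text.endswith("K") else int(text)


def parse_fft_line(line):
    """Return the FFT parameters from an info line, or None if it does not match."""
    match = FFT_RE.search(line)
    if match is None:
        return None
    return {
        "kind": match.group("kind"),
        "length": to_int(match.group("length")),
        "pass1": to_int(match.group("pass1")),
        "pass2": to_int(match.group("pass2")),
        # No "N threads" field means a single-threaded run.
        "threads": int(match.group("threads") or 1),
    }
```

Collecting these records alongside pass/fail results would make it easy to see whether every failure shares Pass1=256 or whether that is a coincidence of the few sizes tested so far.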
Old 2017-02-19, 18:08   #42
AG5BPilot
 
Dec 2011
New York, U.S.A.

97 Posts

If a fix for the multi-threading is either not imminent or not possible, does it make sense to create a release of LLR with only the PRP speed enhancement but without the multi-threading feature? I realize we could simply not use multi-threading, but I'm concerned that, until we know definitively what the root cause is, there may be an underlying problem that also affects single-threaded operations. Also, we'd like to start using the faster PRP code without worrying about someone inevitably deciding it would be nice to try out the multi-threading feature on their own.

Part of what I'm looking for is assurance that the multi-threading bug isn't a symptom of a larger problem in LLR.
Old 2017-02-19, 21:16   #43
Batalov
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

2^2·23·103 Posts

The new version should be out in a day or two (it has to be linked to 28.12 gwnum library; no change in LLR code). 28.12 gwnum library is included with the Prime95 v.29.1 source.
Old 2017-02-20, 12:05   #44
lalera
 
Jul 2003

263₁₆ Posts

Quote:
Originally Posted by ATH View Post
I found another failure at 384K FMA3 FFT.

I also tested 1 number at each of these FFTs without any failures: 480K, 448K, 400K, 256K, 128K.



Code:
cllr64.exe -d -q"66666*5^1560000-1"
Base prime factor(s) taken : 5
Starting N+1 prime test of 66666*5^1560000-1
Using FMA3 FFT length 384K, Pass1=256, Pass2=1536, a = 3
66666*5^1560000-1 is not prime.  RES64: 7417FF24F2FBCEB9.  OLD64: 5C47FD6ED8F36C28  Time : 7048.131 sec.

cllr64.exe -d -t4 -q"66666*5^1560000-1"
Base prime factor(s) taken : 5
Starting N+1 prime test of 66666*5^1560000-1
Using FMA3 FFT length 384K, Pass1=256, Pass2=1536, 4 threads, a = 3
66666*5^1560000-1 is not prime.  RES64: 110BC8AEEB81CEBB.  OLD64: 348E7C54B5566DC5  Time : 2422.209 sec.

Hi,

With LLR v3.8.19 and 8 threads, the residue matches the single-threaded result quoted above:
./llr64 -d -t8 -q"66666*5^1560000-1"
Base prime factor(s) taken : 5
Starting N+1 prime test of 66666*5^1560000-1
Using FMA3 FFT length 384K, Pass1=256, Pass2=1536, 8 threads, a = 3
66666*5^1560000-1 is not prime. RES64: 7417FF24F2FBCEB9. OLD64: 5C47FD6ED8F36C28 Time : 1411.185 sec.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.