#34 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
3⁵×31 Posts |
Looking at it now. I've got an assert firing in some debug code after several thousand iterations. Preliminary evidence suggests the bug occurs when one thread completes ALL its work before one of the other threads even starts its work. Obviously, this is more likely on faster FMA3 hardware. It is also more likely with more threads and smaller FFT sizes.
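To make that scenario concrete, here is a minimal Python sketch. It is not gwnum's scheduler; `BlockPool`, `run_worker` and `simulate_early_finisher` are hypothetical names made up for illustration. Workers claim blocks from a shared counter, and the first worker is deliberately run to completion before the second one starts. The second worker then sees no work at all, which is the kind of schedule that could trip bookkeeping written on the assumption that every thread handles at least one block.

```python
import threading


class BlockPool:
    """Hands out block indices 0..n_blocks-1 to whichever worker asks next."""

    def __init__(self, n_blocks):
        self._next = 0
        self._n = n_blocks
        self._lock = threading.Lock()

    def take(self):
        # Return the next unclaimed block index, or None once all are claimed.
        with self._lock:
            if self._next >= self._n:
                return None
            index = self._next
            self._next += 1
            return index


def run_worker(pool, counts, worker_id):
    # Claim blocks until the pool is empty, counting how many this worker got.
    while pool.take() is not None:
        counts[worker_id] += 1


def simulate_early_finisher(n_blocks=8):
    # Illustrative only: force worker 0 to finish ALL the work before
    # worker 1 is even started, then report blocks handled per worker.
    pool = BlockPool(n_blocks)
    counts = [0, 0]
    first = threading.Thread(target=run_worker, args=(pool, counts, 0))
    first.start()
    first.join()
    second = threading.Thread(target=run_worker, args=(pool, counts, 1))
    second.start()
    second.join()
    return counts


print(simulate_early_finisher())
```

On real hardware the same ordering would only happen occasionally, which fits the observation that faster cores, more threads and smaller FFTs make it more likely.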
#35 | |
|
Dec 2011
New York, U.S.A.
97 Posts |
Quote:
For what it's worth, I was unable to recreate the error on small numbers.
Another thought... Although a natural inclination when seeing a problem with multi-threaded code is to think "timing error", there's some behavior here that somewhat contradicts that notion. This bug is incredibly consistent. On numbers where it doesn't occur, it seems to never occur under any circumstances. On numbers where it does happen, it seems to always occur when running FMA3. There's no middle ground. For example, while this test case and similar R5 numbers cause this error, equivalent-sized (same FFT size) S5 numbers do not. Nor do Proth numbers, as far as we know.
Last fiddled with by AG5BPilot on 2017-02-11 at 20:59 |
#36 |
|
Nov 2010
52 Posts |
One extra bit of information (probably unrelated?), but could be fixed in a future LLR release:
Accidentally, I used the command line option "-t 4" instead of the correct "-t4" to run on 4 threads, and found this results in a segfault (on both Mac and Linux). Oddly the backtrace points to the FFT code. NB this seems to be using the single-threaded code (at least no multithreaded message is printed in the FFT info string). Code:
    [ibethune@cirrus-login0 tmp]$ ./sllr64 -d -q64598*5^2318694-1 -t 4
    Base prime factor(s) taken : 5
    Starting N+1 prime test of 64598*5^2318694-1
    Using FMA3 FFT length 512K, Pass1=256, Pass2=2K, a = 3
    Segmentation fault (core dumped)
    [ibethune@cirrus-login0 tmp]$ gdb sllr64 core.46831
    GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-80.el7
    Copyright (C) 2013 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law. Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-redhat-linux-gnu".
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>...
    Reading symbols from /lustre/home/z04/ibethune/tmp/sllr64...done.
    [New LWP 46831]
    Core was generated by `./sllr64 -d -q64598*5^2318694-1 -t 4'.
    Program terminated with signal 11, Segmentation fault.
    #0 0x000000000076b8d3 in ??08FF ()
    (gdb) bt
    #0 0x000000000076b8d3 in ??08FF ()
    #1 0x0000000004038dc0 in ?? ()
    #2 0x000000000319c000 in ?? ()
    #3 0x0000000000000001 in ?? ()
    #4 0x0000000000000001 in ?? ()
    #5 0x0000000000000001 in ?? ()
    #6 0x0000000003c22c40 in ?? ()
    #7 0x0000000002e49ae0 in ?? ()
    #8 0x0000000000451b32 in gwfftfftmul ()
    #9 0x0000000000458248 in gwsquare_carefully ()
    #10 0x000000000042d43d in plusminustest ()
    #11 0x000000000043cea9 in process_num ()
    #12 0x000000000043fbc1 in primeContinue ()
    #13 0x0000000000443379 in linuxContinue ()
    #14 0x0000000000400a36 in main ()
- Iain |
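For illustration, here is a minimal Python sketch of an option parser that accepts both `-t4` and `-t 4` and rejects a missing or non-numeric value instead of crashing. How LLR actually parses its command line is not shown in this thread, so `parse_thread_count` is a hypothetical helper and not LLR's code.

```python
def parse_thread_count(argv):
    """Return the thread count from argv; accepts '-t4' and '-t 4', default 1.

    Hypothetical helper for illustration only, not LLR's real parser.
    """
    for i, arg in enumerate(argv):
        if arg == "-t":
            # Detached form: the value must be the following argument.
            if i + 1 >= len(argv) or not argv[i + 1].isdigit():
                raise ValueError("-t requires a positive integer")
            value = argv[i + 1]
        elif arg.startswith("-t") and arg[2:].isdigit():
            # Attached form, e.g. '-t4'.
            value = arg[2:]
        else:
            continue
        threads = int(value)
        if threads < 1:
            raise ValueError("thread count must be at least 1")
        return threads
    return 1  # no -t option given: single-threaded


print(parse_thread_count(["-d", "-q64598*5^2318694-1", "-t", "4"]))
print(parse_thread_count(["-d", "-t4", "-q64598*5^2318694-1"]))
```

Failing with a clear message on a bare `-t` is a design choice of this sketch; the segfault reported above suggests the detached form reaches the FFT code in a state it was not set up for.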
#37 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
7533₁₀ Posts |
This is proving very difficult. The assert is triggered by an inconsistent state. All my attempts at determining how it got into the inconsistent state have been unsuccessful.
I'm also not certain the assert I'm looking into is related to the bad residues. |
#38 |
|
Einyen
Dec 2003
Denmark
2×1,579 Posts |
I tested on an 8-core Haswell-E 5960X. The strange thing is that it worked with -t8, using all 8 cores. Besides that, it works with the 576K FMA3 FFT and with the 512K AVX FFT.
I tried to find a smaller number that showed this error, but could not find one. I tested k*5^10000, k*5^100000, k*5^200000, k*5^300000, k*5^500000, k*5^1000000. Could this be an issue with ONLY the 512K FMA3 FFT?
512K FMA3 FFT
1 core: worked
2 cores: failed
4 cores: failed
8 cores: worked !!!
4 cores + Errorcheck=1: failed (Max roundoff: 0.1718750000 to 0.2500000000)
4 cores + 576K FMA3 FFT: worked
4 cores + 512K AVX FFT: worked (using -oCpuSupportsFMA3=0)
Here are the outputs from the runs: lresults3_8_18.txt
Last fiddled with by ATH on 2017-02-13 at 02:42 |
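A sweep like the one above could be scripted. The Python sketch below only builds the command lines, using the flags that appear in this thread (`-d`, `-tN`, `-q`, and optionally `-oCpuSupportsFMA3=0`). The helper names, the executable path and the test number are assumptions: the number is borrowed from the segfault report earlier in the thread purely as a placeholder.

```python
def build_llr_command(exe, expr, threads=1, extra=()):
    """Build an argv list for one LLR run (hypothetical helper).

    '-tN' is written in the attached form and omitted for one thread.
    """
    cmd = [exe, "-d"]
    if threads > 1:
        cmd.append(f"-t{threads}")
    cmd.extend(extra)
    cmd.append(f"-q{expr}")
    return cmd


def build_sweep(exe, expr, thread_counts=(1, 2, 4, 8)):
    # One command per thread count, otherwise identical.
    return [build_llr_command(exe, expr, t) for t in thread_counts]


# Placeholder executable and number; substitute the real test case.
for cmd in build_sweep("./sllr64", "64598*5^2318694-1"):
    print(" ".join(cmd))

# The AVX comparison run disables FMA3 with the option quoted above.
print(" ".join(build_llr_command("./sllr64", "64598*5^2318694-1", 4,
                                 extra=("-oCpuSupportsFMA3=0",))))
```

Each list can be passed to `subprocess.run` as-is, which avoids shell quoting of the `*` and `^` characters.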
#39 |
|
Einyen
Dec 2003
Denmark
2·1,579 Posts |
More evidence against 512K FMA3 FFT:
I chose another random number using the same FFT and the residue is different for 1, 2 and 4 core runs. Code:
    cllr64.exe -d -q"33333*7^1917000-1"
    Base prime factor(s) taken : 7
    Starting N+1 prime test of 33333*7^1917000-1
    Using FMA3 FFT length 512K, Pass1=256, Pass2=2K, a = 3
    33333*7^1917000-1 is not prime. RES64: 67B17D3BA87D2187. OLD64: A3CB4F4E952B5E9E Time : 12814.739 sec.

    cllr64.exe -d -t2 -q"33333*7^1917000-1"
    Base prime factor(s) taken : 7
    Starting N+1 prime test of 33333*7^1917000-1
    Using FMA3 FFT length 512K, Pass1=256, Pass2=2K, 2 threads, a = 3
    33333*7^1917000-1 is not prime. RES64: 421E07AE2563A8EC. OLD64: 3310EEA60BDEF4CD Time : 6590.002 sec.

    cllr64.exe -d -t4 -q"33333*7^1917000-1"
    Base prime factor(s) taken : 7
    Starting N+1 prime test of 33333*7^1917000-1
    Using FMA3 FFT length 512K, Pass1=256, Pass2=2K, 4 threads, a = 3
    33333*7^1917000-1 is not prime. RES64: 4E4F693184E7F608. OLD64: 57A513302A6BDC21 Time : 3370.069 sec.
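The comparison above can be automated. This Python sketch pulls the `RES64` value out of each result line and reports whether the runs agree; the function names are hypothetical, and the sample lines are the three results quoted in this post.

```python
import re

# RES64 is printed as 16 upper-case hex digits in the result lines above.
RES64_RE = re.compile(r"RES64:\s*([0-9A-F]{16})")


def extract_res64(output):
    """Return the RES64 value found in the LLR output text, or None."""
    match = RES64_RE.search(output)
    return match.group(1) if match else None


def compare_residues(outputs_by_threads):
    """Map thread count -> RES64 and say whether every run agrees."""
    residues = {t: extract_res64(text) for t, text in outputs_by_threads.items()}
    values = set(residues.values())
    agree = len(values) == 1 and None not in values
    return residues, agree


runs = {
    1: "33333*7^1917000-1 is not prime. RES64: 67B17D3BA87D2187. "
       "OLD64: A3CB4F4E952B5E9E Time : 12814.739 sec.",
    2: "33333*7^1917000-1 is not prime. RES64: 421E07AE2563A8EC. "
       "OLD64: 3310EEA60BDEF4CD Time : 6590.002 sec.",
    4: "33333*7^1917000-1 is not prime. RES64: 4E4F693184E7F608. "
       "OLD64: 57A513302A6BDC21 Time : 3370.069 sec.",
}
residues, agree = compare_residues(runs)
print(residues)
print("residues agree:", agree)
```

A correct build should give the same residue regardless of thread count, so any disagreement flags at least one bad run without telling you which one is right.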
#40 |
|
Einyen
Dec 2003
Denmark
2·1,579 Posts |
I found another failure at 384K FMA3 FFT.
I also tested 1 number at each of these FFTs without any failures: 480K, 448K, 400K, 256K, 128K. Code:
    cllr64.exe -d -q"66666*5^1560000-1"
    Base prime factor(s) taken : 5
    Starting N+1 prime test of 66666*5^1560000-1
    Using FMA3 FFT length 384K, Pass1=256, Pass2=1536, a = 3
    66666*5^1560000-1 is not prime. RES64: 7417FF24F2FBCEB9. OLD64: 5C47FD6ED8F36C28 Time : 7048.131 sec.

    cllr64.exe -d -t4 -q"66666*5^1560000-1"
    Base prime factor(s) taken : 5
    Starting N+1 prime test of 66666*5^1560000-1
    Using FMA3 FFT length 384K, Pass1=256, Pass2=1536, 4 threads, a = 3
    66666*5^1560000-1 is not prime. RES64: 110BC8AEEB81CEBB. OLD64: 348E7C54B5566DC5 Time : 2422.209 sec.
#41 |
|
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2×3³×109 Posts |
Pass 1 was 256 again
#42 |
|
Dec 2011
New York, U.S.A.
1100001₂ Posts |
If a fix for the multi-threading is either not imminent or not possible, does it make sense to create a release of LLR with only the PRP speed enhancement but without the multi-threading feature? I realize we could simply not use multi-threading, but I'm concerned that, until we know definitively what the root cause is, there may be an underlying problem that also affects single-threaded operations. Also, we'd like to start using the faster PRP code without worrying about someone inevitably deciding it would be nice to try out the multi-threading feature on their own.
Part of what I'm looking for is assurance that the multi-threading bug isn't a symptom of a larger problem in LLR. |
#43 |
|
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
9494₁₀ Posts |
The new version should be out in a day or two (it has to be linked to 28.12 gwnum library; no change in LLR code). 28.12 gwnum library is included with the Prime95 v.29.1 source.
#44 | |
|
Jul 2003
13×47 Posts |
Quote:
with llr v3.8.19
    ./llr64 -d -t8 -q"66666*5^1560000-1"
    Base prime factor(s) taken : 5
    Starting N+1 prime test of 66666*5^1560000-1
    Using FMA3 FFT length 384K, Pass1=256, Pass2=1536, 8 threads, a = 3
    66666*5^1560000-1 is not prime. RES64: 7417FF24F2FBCEB9. OLD64: 5C47FD6ED8F36C28 Time : 1411.185 sec.