mersenneforum.org  

Old 2018-10-24, 18:54   #34
pepi37
 
Dec 2011
After milion nines:)

1451₁₀ Posts

On some occasions I have seen a roundoff error on a Haswell i5 chip.
When I add this to worktodo:

PRP=FFT2=xxxK,x,x,xxxxx,x

the error disappears, but I have never seen a time difference.

Let's say Prime95 picks a 384K FFT length, and I specify 400K for that candidate.
Will 400K increase the PRP time for that candidate?
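
As a rough guide only (not a Prime95 measurement): if the cost of one iteration is assumed to scale like N*log2(N) in the FFT length N, going from 384K to 400K costs a few percent more per iteration. The function name and the cost model are illustrative assumptions; real timings depend on how the length factors and on cache behaviour, so the program's own benchmark is the reliable answer.

```python
import math

def relative_fft_cost(small_k: int, large_k: int) -> float:
    """Estimated cost ratio of two FFT lengths (given in K words).

    Assumes the textbook N*log2(N) model, which is only an approximation
    of how a real FFT implementation behaves.
    """
    n_small = small_k * 1024
    n_large = large_k * 1024
    return (n_large * math.log2(n_large)) / (n_small * math.log2(n_small))

ratio = relative_fft_cost(384, 400)
print(f"400K vs 384K: about {100 * (ratio - 1):.1f}% more work per iteration")
```

A difference this small can easily be hidden by normal run-to-run variation in ms/iter.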
Old 2018-10-24, 18:58   #35
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

9,767 Posts

Quote:
Originally Posted by Prime95 View Post
I will investigate.
Quote:
Originally Posted by Asimov
The most exciting phrase to hear in science, the one that heralds new discoveries, is not “Eureka!” (I found it!) but “That’s funny …”
I tried posting just the above quotes, but the forum rejected it for lack of original content. Hopefully this paragraph will prove I'm sentient.
Old 2018-10-26, 11:38   #36
ET_
Banned
 
"Luigi"
Aug 2002
Team Italia

2·3·11·73 Posts

Got my segfault while running a PRP.
The issue happened when a benchmark started in the middle of the PRP test.

Nothing happened during the previous LL-D test.

Now I will download the new build.


Code:
[Work thread Oct 26 11:22] Iteration: 14110000 / 81950377 [17.21%], ms/iter:  9.027, ETA: 7d 02:06
[Main thread Oct 26 11:23] Benchmarking multiple workers to tune FFT selection.
[Work thread Oct 26 11:23] Stopping PRP test of M81950377 at iteration 14118224 [17.22%]
[Work thread Oct 26 11:23] Worker stopped while running needed benchmarks.
[Main thread Oct 26 11:23] Timing 4320K FFT, 2 cores, 1 worker.  Average times:  8.80 ms.  Total throughput: 113.64 iter/sec.
[Main thread Oct 26 11:23] Timing 4320K FFT, 2 cores, 1 worker.  Average times:  8.93 ms.  Total throughput: 112.01 iter/sec.
[Main thread Oct 26 11:24] Timing 4320K FFT, 2 cores, 1 worker.  Average times:  8.96 ms.  Total throughput: 111.58 iter/sec.
[Main thread Oct 26 11:24] Timing 4320K FFT, 2 cores, 1 worker.  Average times:  8.84 ms.  Total throughput: 113.13 iter/sec.
[Main thread Oct 26 11:24] Timing 4320K FFT, 2 cores, 1 worker.  Average times:  8.82 ms.  Total throughput: 113.32 iter/sec.
[Main thread Oct 26 11:24] Timing 4320K FFT, 2 cores, 1 worker.  Average times:  9.62 ms.  Total throughput: 104.00 iter/sec.
[Main thread Oct 26 11:24] Timing 4320K FFT, 2 cores, 1 worker.  Average times:  9.39 ms.  Total throughput: 106.51 iter/sec.
[Main thread Oct 26 11:25] Timing 4320K FFT, 2 cores, 1 worker.  Average times:  9.50 ms.  Total throughput: 105.24 iter/sec.
*** Error in `./mprime': double free or corruption (!prev): 0x00007f7c100471c0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81499)[0x7f7c18f3a499]
./mprime[0x45edbc]
./mprime[0x440a3a]
./mprime[0x441aa7]
./mprime[0x44986e]
./mprime[0x47cbca]
/lib64/libpthread.so.0(+0x7de5)[0x7f7c1990fde5]
/lib64/libc.so.6(clone+0x6d)[0x7f7c18fb7bad]
======= Memory map: ========
00400000-026a0000 r-xp 00000000 103:02 18997                             /home/ec2-user/mprime/29.5/mprime
0289f000-028a1000 r-xp 0229f000 103:02 18997                             /home/ec2-user/mprime/29.5/mprime
028a1000-028dc000 rwxp 022a1000 103:02 18997                             /home/ec2-user/mprime/29.5/mprime
028dc000-02903000 rwxp 00000000 00:00 0 
036da000-036fb000 rwxp 00000000 00:00 0                                  [heap]
7f7bfc000000-7f7bfc4fe000 rwxp 00000000 00:00 0 
7f7bfc4fe000-7f7c00000000 ---p 00000000 00:00 0 
7f7c00000000-7f7c019e9000 rwxp 00000000 00:00 0 
7f7c019e9000-7f7c04000000 ---p 00000000 00:00 0 
7f7c075e6000-7f7c075fc000 r-xp 00000000 103:02 2338                      /lib64/libresolv-2.17.so
7f7c075fc000-7f7c077fb000 ---p 00016000 103:02 2338                      /lib64/libresolv-2.17.so
7f7c077fb000-7f7c077fc000 r-xp 00015000 103:02 2338                      /lib64/libresolv-2.17.so
7f7c077fc000-7f7c077fd000 rwxp 00016000 103:02 2338                      /lib64/libresolv-2.17.so
7f7c077fd000-7f7c077ff000 rwxp 00000000 00:00 0 
7f7c077ff000-7f7c07800000 ---p 00000000 00:00 0 
7f7c07800000-7f7c08000000 rwxp 00000000 00:00 0 
7f7c08000000-7f7c0b9f5000 rwxp 00000000 00:00 0 
7f7c0b9f5000-7f7c0c000000 ---p 00000000 00:00 0 
7f7c0c000000-7f7c0d9e6000 rwxp 00000000 00:00 0 
7f7c0d9e6000-7f7c10000000 ---p 00000000 00:00 0 
7f7c10000000-7f7c11f7c000 rwxp 00000000 00:00 0 
7f7c11f7c000-7f7c14000000 ---p 00000000 00:00 0 
7f7c140e6000-7f7c140eb000 r-xp 00000000 103:02 2326                      /lib64/libnss_dns-2.17.so
7f7c140eb000-7f7c142eb000 ---p 00005000 103:02 2326                      /lib64/libnss_dns-2.17.so
7f7c142eb000-7f7c142ec000 r-xp 00005000 103:02 2326                      /lib64/libnss_dns-2.17.so
7f7c142ec000-7f7c142ed000 rwxp 00006000 103:02 2326                      /lib64/libnss_dns-2.17.so
7f7c142ed000-7f7c142f9000 r-xp 00000000 103:02 2328                      /lib64/libnss_files-2.17.so
7f7c142f9000-7f7c144f8000 ---p 0000c000 103:02 2328                      /lib64/libnss_files-2.17.so
7f7c144f8000-7f7c144f9000 r-xp 0000b000 103:02 2328                      /lib64/libnss_files-2.17.so
7f7c144f9000-7f7c144fa000 rwxp 0000c000 103:02 2328                      /lib64/libnss_files-2.17.so
7f7c144fa000-7f7c14500000 rwxp 00000000 00:00 0 
7f7c14500000-7f7c14501000 ---p 00000000 00:00 0 
7f7c14501000-7f7c14d01000 rwxp 00000000 00:00 0 
7f7c174a0000-7f7c174b6000 r-xp 00000000 103:02 2250                      /lib64/libgcc_s-7-20170915.so.1
7f7c174b6000-7f7c176b5000 ---p 00016000 103:02 2250                      /lib64/libgcc_s-7-20170915.so.1
7f7c176b5000-7f7c176b6000 rwxp 00015000 103:02 2250                      /lib64/libgcc_s-7-20170915.so.1
7f7c176b6000-7f7c176b7000 ---p 00000000 00:00 0 
7f7c176b7000-7f7c17eb7000 rwxp 00000000 00:00 0 
7f7c17eb7000-7f7c17eb8000 ---p 00000000 00:00 0 
7f7c17eb8000-7f7c186b8000 rwxp 00000000 00:00 0 
7f7c186b8000-7f7c186b9000 ---p 00000000 00:00 0 
7f7c186b9000-7f7c18eb9000 rwxp 00000000 00:00 0 
7f7c18eb9000-7f7c1907c000 r-xp 00000000 103:02 2310                      /lib64/libc-2.17.so
7f7c1907c000-7f7c1927b000 ---p 001c3000 103:02 2310                      /lib64/libc-2.17.so
7f7c1927b000-7f7c1927f000 r-xp 001c2000 103:02 2310                      /lib64/libc-2.17.so
7f7c1927f000-7f7c19281000 rwxp 001c6000 103:02 2310                      /lib64/libc-2.17.so
7f7c19281000-7f7c19286000 rwxp 00000000 00:00 0 
7f7c19286000-7f7c192fb000 r-xp 00000000 103:02 3420                      /usr/lib64/libgmp.so.10.2.0
7f7c192fb000-7f7c194fa000 ---p 00075000 103:02 3420                      /usr/lib64/libgmp.so.10.2.0
7f7c194fa000-7f7c194fc000 rwxp 00074000 103:02 3420                      /usr/lib64/libgmp.so.10.2.0
7f7c194fc000-7f7c194fe000 r-xp 00000000 103:02 2316                      /lib64/libdl-2.17.so
7f7c194fe000-7f7c196fe000 ---p 00002000 103:02 2316                      /lib64/libdl-2.17.so
7f7c196fe000-7f7c196ff000 r-xp 00002000 103:02 2316                      /lib64/libdl-2.17.so
7f7c196ff000-7f7c19700000 rwxp 00003000 103:02 2316                      /lib64/libdl-2.17.so
7f7c19700000-7f7c19707000 r-xp 00000000 103:02 2340                      /lib64/librt-2.17.so
7f7c19707000-7f7c19906000 ---p 00007000 103:02 2340                      /lib64/librt-2.17.so
7f7c19906000-7f7c19907000 r-xp 00006000 103:02 2340                      /lib64/librt-2.17.so
7f7c19907000-7f7c19908000 rwxp 00007000 103:02 2340                      /lib64/librt-2.17.so
7f7c19908000-7f7c1991f000 r-xp 00000000 103:02 2336                      /lib64/libpthread-2.17.so
7f7c1991f000-7f7c19b1e000 ---p 00017000 103:02 2336                      /lib64/libpthread-2.17.so
7f7c19b1e000-7f7c19b1f000 r-xp 00016000 103:02 2336                      /lib64/libpthread-2.17.so
7f7c19b1f000-7f7c19b20000 rwxp 00017000 103:02 2336                      /lib64/libpthread-2.17.so
7f7c19b20000-7f7c19b24000 rwxp 00000000 00:00 0 
7f7c19b24000-7f7c19c25000 r-xp 00000000 103:02 2318                      /lib64/libm-2.17.so
7f7c19c25000-7f7c19e24000 ---p 00101000 103:02 2318                      /lib64/libm-2.17.so
7f7c19e24000-7f7c19e25000 r-xp 00100000 103:02 2318                      /lib64/libm-2.17.so
7f7c19e25000-7f7c19e26000 rwxp 00101000 103:02 2318                      /lib64/libm-2.17.so
7f7c19e26000-7f7c19e48000 r-xp 00000000 103:02 2303                      /lib64/ld-2.17.so
7f7c1a03c000-7f7c1a041000 rwxp 00000000 00:00 0 
7f7c1a044000-7f7c1a047000 rwxp 00000000 00:00 0 
7f7c1a047000-7f7c1a048000 r-xp 00021000 103:02 2303                      /lib64/ld-2.17.so
7f7c1a048000-7f7c1a049000 rwxp 00022000 103:02 2303                      /lib64/ld-2.17.so
7f7c1a049000-7f7c1a04a000 rwxp 00000000 00:00 0 
7ffec9b5f000-7ffec9b80000 rwxp 00000000 00:00 0                          [stack]
7ffec9bcd000-7ffec9bd0000 r--p 00000000 00:00 0                          [vvar]
7ffec9bd0000-7ffec9bd2000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
Annullato
Old 2018-10-27, 01:31   #37
ATH
Einyen
 
Dec 2003
Denmark

2·1,579 Posts

The EC2 instance I posted logs from earlier with all the roundoff errors finished the DC and it matched despite the errors.

It now got a new exponent 30K higher than the last one, and it STILL chose 4M FFT, and already got 7 new roundoff errors: 295b3-2.txt

Setting it back to 4200K manually again.

Added this to prime.txt to try and prevent this issue from now on:
SoftCrossover=1.0
SoftCrossoverAdjust=-0.008


Edit: Another instance finished its DC successfully, got a new exponent: 77.97M and chose 4M FFT and quickly got 5 roundoff errors. It is now also set to 4200K manually.

Last fiddled with by ATH on 2018-10-27 at 01:41
Old 2018-10-27, 09:13   #38
R. Gerbicz
 
"Robert Gerbicz"
Oct 2005
Hungary

1486₁₀ Posts

Quote:
Originally Posted by ATH View Post
The EC2 instance I posted logs from earlier with all the roundoff errors finished the DC and it matched despite the errors.
From your posted file:
Code:
[Work thread Oct 26 14:42:43] Iteration: 77900000 / 77947687 [99.938821%], roundoff: 0.243, ms/iter: 13.860, ETA: 00:11:00
[Work thread Oct 26 14:42:43] Possible hardware errors have occurred during the test! 24 ROUNDOFF > 0.4.
[Work thread Oct 26 14:42:43] Confidence in final result is excellent.
[Work thread Oct 26 14:53:46] Gerbicz error check passed at iteration 77946729.
[Work thread Oct 26 14:54:00] Gerbicz error check passed at iteration 77947629.
[Work thread Oct 26 14:54:04] Gerbicz error check passed at iteration 77947678.
[Work thread Oct 26 14:54:15] M77947687 is not prime.  RES64: 4BCF9784E9A93DEE. Wh8: 34742F74,19637789,00001800
We see it differently: yes, that is a valid RES64 with incredibly high probability because of my error checks, and for this you wouldn't even need to do or see roundoff checks during the run. If my check fails, you need to fall back to a previous iteration and you lose 1M iterations of work (in your run), but the confidence is still high. How many times did the check fail in that test? Seeing the roundoff errors can still be very useful when we decide the FFT table limits (for new code or a new processor?).

Of course there is a trade-off here: with a larger FFT the iteration time is (in general) longer, but there are fewer fallbacks.
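
For readers who have not seen the check, here is a minimal sketch of the idea, not Prime95's code. Writing u_i = base^(2^i) mod n and keeping a running product d of u_0, u_L, u_2L, ..., the new product must equal base * (old product)^(2^L) mod n. The function name, the tiny block size and the simulated bit flip are all assumptions made for illustration.

```python
def squarings_with_gerbicz_check(n, total_iters, block, base=3, corrupt_at=None):
    """Square `base` total_iters times mod n, verifying every block*block steps.

    Returns (final_residue, rollbacks). `corrupt_at` flips one bit at that
    iteration, once, to imitate a hardware or roundoff error.
    """
    assert total_iters % (block * block) == 0
    x = base % n              # u_0
    d = x                     # running product u_0 * u_L * u_2L * ...
    saved = (0, x, d)         # last state known to be good
    i = 0
    rollbacks = 0
    while i < total_iters:
        x = x * x % n
        i += 1
        if i == corrupt_at:
            x ^= 1            # simulated error
            corrupt_at = None # inject it only once
        if i % block == 0:
            d_prev, d = d, d * x % n
            if i % (block * block) == 0:
                # Identity: d_new == base * d_prev^(2^block) (mod n)
                if d == base * pow(d_prev, 1 << block, n) % n:
                    saved = (i, x, d)
                else:
                    i, x, d = saved   # roll back and redo the work
                    rollbacks += 1
    return x, rollbacks

n = 2**89 - 1
clean, rb_clean = squarings_with_gerbicz_check(n, 64, 4)
fixed, rb_fixed = squarings_with_gerbicz_check(n, 64, 4, corrupt_at=21)
print(rb_clean, rb_fixed, clean == fixed)
```

In this sketch the verification uses Python's three-argument pow; a real implementation would do the 2^L power as L ordinary squarings, and that extra work is the source of the small fixed overhead of the check.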

Last fiddled with by R. Gerbicz on 2018-10-27 at 09:14 Reason: typo
Old 2018-10-27, 13:50   #39
ATH
Einyen
 
Dec 2003
Denmark

6126₈ Posts

Yes there were 24 errors before I manually switched to 4200K FFT, it is very nice that it still works fine.

But it is still a bug in this version, it should either choose a higher FFT or disable the roundoff error messages.

Full log since I switched to AVX-512 on that instance: 295b3.txt

Code:
[Work thread Oct 21 09:15:59] Iteration: 45256282/77947687, Possible error: round off (0.4344111008) > 0.42188
[Work thread Oct 21 11:26:42] Iteration: 45857274/77947687, Possible error: round off (0.4226175989) > 0.42188
[Work thread Oct 21 14:56:41] Iteration: 46817196/77947687, Possible error: round off (0.4234568715) > 0.42188
[Work thread Oct 21 16:00:42] Iteration: 47110891/77947687, Possible error: round off (0.4270896037) > 0.42188
[Work thread Oct 21 19:56:46] Iteration: 48197311/77947687, Possible error: round off (0.428642894) > 0.42188
[Work thread Oct 21 20:17:11] Iteration: 48291440/77947687, Possible error: round off (0.429588787) > 0.42188
[Work thread Oct 21 22:09:44] Iteration: 48809957/77947687, Possible error: round off (0.430043636) > 0.42188
[Work thread Oct 21 23:16:45] Iteration: 49117544/77947687, Possible error: round off (0.4256841092) > 0.42188
[Work thread Oct 22 04:15:09] Iteration: 50490754/77947687, Possible error: round off (0.4303004887) > 0.42188
[Work thread Oct 22 09:16:28] Iteration: 51847612/77947687, Possible error: round off (0.4338173701) > 0.42188
[Work thread Oct 22 09:55:51] Iteration: 52027255/77947687, Possible error: round off (0.4343059627) > 0.42188
[Work thread Oct 22 09:58:39] Iteration: 52040154/77947687, Possible error: round off (0.454604666) > 0.42188
[Work thread Oct 22 12:18:37] Iteration: 52683329/77947687, Possible error: round off (0.4277743109) > 0.42188
[Work thread Oct 22 13:44:48] Iteration: 53078067/77947687, Possible error: round off (0.4273455066) > 0.42188
[Work thread Oct 22 16:58:38] Iteration: 53968523/77947687, Possible error: round off (0.4307389637) > 0.42188
[Work thread Oct 22 19:02:21] Iteration: 54535700/77947687, Possible error: round off (0.4228008512) > 0.42188
[Work thread Oct 22 22:48:33] Iteration: 55573616/77947687, Possible error: round off (0.4268757585) > 0.42188
[Work thread Oct 23 01:15:29] Iteration: 56247746/77947687, Possible error: round off (0.4692008444) > 0.42188
[Work thread Oct 23 03:08:10] Iteration: 56738296/77947687, Possible error: round off (0.4224648114) > 0.42188
[Work thread Oct 23 04:05:36] Iteration: 57000989/77947687, Possible error: round off (0.4358626009) > 0.42188
[Work thread Oct 23 04:27:30] Iteration: 57101702/77947687, Possible error: round off (0.425993915) > 0.42188
[Work thread Oct 23 09:57:31] Iteration: 58617472/77947687, Possible error: round off (0.4493439792) > 0.42188
[Work thread Oct 23 11:18:36] Iteration: 58990146/77947687, Possible error: round off (0.4567632981) > 0.42188
[Work thread Oct 23 15:07:51] Iteration: 60041159/77947687, Possible error: round off (0.4326492791) > 0.42188
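
Log lines like the ones above can be tallied with a few lines of Python. This is only a sketch: the regular expression is written against the line format shown in this post, the function name is made up, and the sample holds three lines copied from the log.

```python
import re

# Matches lines such as:
# "... Iteration: 52040154/77947687, Possible error: round off (0.454604666) > 0.42188"
ROUND_OFF = re.compile(
    r"Iteration: (\d+)/(\d+), Possible error: round off \(([\d.]+)\) > ([\d.]+)"
)

def summarize_roundoff(log_text):
    """Return (count, worst_value, iteration_of_worst) for round-off warnings."""
    hits = [(int(m.group(1)), float(m.group(3)))
            for m in ROUND_OFF.finditer(log_text)]
    if not hits:
        return 0, None, None
    worst_iter, worst = max(hits, key=lambda h: h[1])
    return len(hits), worst, worst_iter

sample = """\
[Work thread Oct 22 09:58:39] Iteration: 52040154/77947687, Possible error: round off (0.454604666) > 0.42188
[Work thread Oct 23 01:15:29] Iteration: 56247746/77947687, Possible error: round off (0.4692008444) > 0.42188
[Work thread Oct 23 04:27:30] Iteration: 57101702/77947687, Possible error: round off (0.425993915) > 0.42188
"""
print(summarize_roundoff(sample))
```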

Last fiddled with by ATH on 2018-10-27 at 13:50
Old 2018-10-27, 14:39   #40
R. Gerbicz
 
"Robert Gerbicz"
Oct 2005
Hungary

2·743 Posts

Quote:
Originally Posted by ATH View Post
Yes there were 24 errors before I manually switched to 4200K FFT, it is very nice that it still works fine.

But it is still a bug in this version, it should either choose a higher FFT or disable the roundoff error messages.
No, I asked for the number of lines where the check failed. I think you would see a line like this:
Code:
ERROR: Comparing Gerbicz checksum values failed.  Rolling back to iteration
...
with the iteration number. In that partial file I don't see such a line.
The number of roundoff errors alone doesn't matter; I think that with larger p we could see even more errors. What matters is how many times you need to roll back, because that adds to the overhead (0.2%) of my check. And of course these numbers are related (the expected number of roundoff errors and the number of rollbacks for a given p and FFT), so they are not independent.
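
A back-of-the-envelope version of that overhead, as a sketch only: it takes the 0.2% fixed cost quoted here and the 1M iterations per rollback mentioned earlier for this run, and assumes every rollback loses the full 1M. The function name and parameters are illustrative.

```python
def gerbicz_overhead(total_iters, rollbacks, iters_lost_per_rollback,
                     check_overhead=0.002):
    """Fraction of extra work: fixed check cost plus redone iterations."""
    redone = rollbacks * iters_lost_per_rollback / total_iters
    return check_overhead + redone

p = 77_947_687  # iterations in the test discussed above
for k in (0, 1, 5):
    extra = gerbicz_overhead(p, k, 1_000_000)
    print(f"{k} rollback(s): {100 * extra:.2f}% extra work")
```

Under these assumptions a single rollback costs several times more than the fixed overhead of the check, which is why the rollback count is the number to watch.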

Last fiddled with by R. Gerbicz on 2018-10-27 at 14:42 Reason: more info, typo
Old 2018-10-27, 16:25   #41
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

5·11·137 Posts

Quote:
Originally Posted by ATH View Post
But it is still a bug in this version, it should either choose a higher FFT or disable the roundoff error messages.

[Work thread Oct 21 09:15:59] Iteration: 45256282/77947687, Possible error: round off (0.4344111008) > 0.42188
Yes, this is unexpected. I used the same FFT crossovers as for AVX FFTs, which for a 4M FFT is 77990000. I do not see why AVX-512 FFTs have worse round-off behavior than AVX FFTs. More to investigate....

@Gerbicz: It is important for me to get the FFT crossovers right as the gwnum FFT routines are used for LL, LLR, PFGW etc. where Gerbicz error checking is not used. I could (should?) change prime95 to not even look for roundoff errors during a Gerbicz PRP test, esp. since calculating the roundoff error is not free.
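
One way to see how close ATH's exponent sits to that crossover is to look at the average number of bits stored per FFT word, i.e. exponent divided by FFT length. This is a sketch for illustration, assuming 4M means 4096K words; the function name is made up and nothing here reflects how the crossover tables are actually built.

```python
def bits_per_word(exponent, fft_len_k):
    """Average bits of the exponent carried by each FFT word."""
    return exponent / (fft_len_k * 1024)

cases = [
    ("4M crossover (77990000)", 77_990_000, 4096),
    ("M77947687 at 4M", 77_947_687, 4096),
    ("M77947687 at 4200K", 77_947_687, 4200),
]
for label, p, k in cases:
    print(f"{label}: {bits_per_word(p, k):.3f} bits/word")
```

The exponent is only about 0.05% below the crossover, so at 4M it runs with almost the maximum load per word, while 4200K gives it roughly half a bit per word of extra headroom.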

Last fiddled with by Prime95 on 2018-10-27 at 16:32
Old 2018-10-27, 16:59   #42
R. Gerbicz
 
"Robert Gerbicz"
Oct 2005
Hungary

2716₈ Posts

Quote:
Originally Posted by Prime95 View Post
@Gerbicz: It is important for me to get the FFT crossovers right as the gwnum FFT routines are used for LL, LLR, PFGW etc. where Gerbicz error checking is not used. I could (should?) change prime95 to not even look for roundoff errors during a Gerbicz PRP test, esp. since calculating the roundoff error is not free.
Yes, we don't need those roundoff error calculations, at least for PRP; Preda's GpuOwl has already removed them for PRP. As written, you would still need them only when you have new code or a new processor, to get the new FFT crossovers (basically they don't change a lot).
Old 2018-10-27, 17:03   #43
pepi37
 
Dec 2011
After milion nines:)

1,451 Posts

@Prime95: if you remove the roundoff error checks from the code, will that affect all PRP testing or only PRP testing in base 2 (since the Gerbicz error check is only for base 2)?
I use Prime95 for CRUS searching as well as for my personal prime search, in base 2 but also in other bases.
Old 2018-10-27, 17:22   #44
R. Gerbicz
 
"Robert Gerbicz"
Oct 2005
Hungary

1486₁₀ Posts

Quote:
Originally Posted by pepi37 View Post
@Prime95: if you remove the roundoff error checks from the code, will that affect all PRP testing or only PRP testing in base 2 (since the Gerbicz error check is only for base 2)?
I use Prime95 for CRUS searching as well as for my personal prime search, in base 2 but also in other bases.
Of course, I was talking about PRP with my error checking. For base != 2 you don't have this, so there you should keep the roundoff error checks.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.