mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing > GpuOwl

2017-05-23, 05:38   #144
LaurV
Romulan Interpreter
Jun 2011
Thailand
3²·29·37 Posts

Quote:
Originally Posted by preda
David, so is your impression that there is a software bug involved?
No, we cannot conclude that. Our impression is that the tool works correctly, and very nicely. The list of results in fact proved that, in a way. The occasional mismatches mean nothing. The original residues may be wrong, or the hardware may have very rare failures. That is why we need the random shifting implemented.

Last fiddled with by LaurV on 2017-05-23 at 05:40

2017-05-23, 16:34   #145
airsquirrels
"David"
Jul 2015
Ohio
11×47 Posts

Quote:
Originally Posted by LaurV
No, we cannot conclude that. Our impression is that the tool works correctly, and very nicely. The list of results in fact proved that, in a way. The occasional mismatches mean nothing. The original residues may be wrong, or the hardware may have very rare failures. That is why we need the random shifting implemented.
My impression so far is that there are no repeatable bugs in the software; however, it has revealed that some of my hardware (one card in particular) was close to its failure limits, and other software was not stressing it enough to reveal that.

2017-05-25, 19:00   #146
Madpoo
Serpentine Vermin Jar
Jul 2014
3,313 Posts

Quote:
Originally Posted by LaurV
The occasional mismatches mean nothing. The original residues may be wrong, or the hardware may have very rare failures. That is why we need the random shifting implemented.
I've been trying to keep up with triple-checking the mismatches returned by AirSquirrels lately, although some may be further back in my queue. Right now my queue of work is only about a week out, so any other assignments I have now should be done fairly soon.

2017-05-26, 12:20   #147
preda
"Mihai Preda"
Apr 2015
3×457 Posts
Offset AKA shifting

Quote:
Originally Posted by LaurV
The original residues may be wrong, or the hardware may have very rare failures. That is why we need the random shifting implemented.
I finally gave in to Laur's persistence :), and added the offset (and bumped the version to v0.3).

The randomly selected offset is printed when a new exponent is started, e.g.
Code:
LL FFT 4096K (1024*2048*2) of 60000757 (14.31 bits/word) offset 45732555 iteration 1120000
The offset is persisted in checkpoint files (the last line is human-readable, e.g. "tail -n1 cXXX.ll") and is fixed for the exponent (can't be changed while the exponent is ongoing).

There may be bugs, as usual, with this new feature.

The perf impact is about 0.5%.

The offset can be "forced" to a given value with command line flag -offset <value> (will take effect on a new exponent).
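
For readers new to the idea, here is a minimal Python sketch of what an offset (shift) does in a Lucas-Lehmer test. It is not GpuOwl's code: it works on plain integers instead of FFT words, and the names `rotl` and `ll_residue` are made up for illustration. The residue is carried multiplied by 2^s mod 2^p - 1, which is just a rotation of its p bits; each squaring doubles s (mod p), and the -2 has to be subtracted at the shifted position.

```python
def rotl(x, s, p):
    """Rotate the p-bit value x left by s bits, i.e. x * 2**s mod (2**p - 1)."""
    s %= p
    m = (1 << p) - 1
    return ((x << s) | (x >> (p - s))) & m


def ll_residue(p, offset=0):
    """Lucas-Lehmer residue of 2**p - 1, computed on a shifted residue.

    Illustrative only: `offset` plays the role of the initial shift.
    """
    m = (1 << p) - 1
    s = offset % p
    x = rotl(4, s, p)                      # starting value 4, stored shifted
    for _ in range(p - 2):
        s = (2 * s) % p                    # squaring doubles the shift
        x = (x * x - rotl(2, s, p)) % m    # subtract 2 at the shifted position
    return rotl(x, p - s, p)               # rotate back to offset 0


# 2**13 - 1 is prime, so the residue is 0 whatever the offset.
print(ll_residue(13, 0), ll_residue(13, 6))
# 2**11 - 1 is composite; the residue is non-zero but offset-independent.
print(ll_residue(11, 0) == ll_residue(11, 5))
```

The final residue does not depend on the offset, while the intermediate data fed to the multiplication are different for every starting offset, which is what makes a double-check with a different offset meaningful.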

2017-05-26, 12:25   #148
preda
"Mihai Preda"
Apr 2015
10101011011₂ Posts

Quote:
Originally Posted by kracker
Using GCC 6.3.0/msys2/M76100027
I'll tinker around with it some more later though..
@kracker, it appears the problem you were seeing was due to compilation on a 32-bit system, or for some reason your C++ compiler was using a 4-byte "long" (a bit unusual, but possible). Hopefully fixed.

2017-05-26, 20:46   #149
ewmayer
2ω=0
Sep 2002
República de California
19×613 Posts

Quote:
Originally Posted by preda
I finally gave in to Laur's persistence :), and added the offset (and bumped the version to v0.3).

The randomly selected offset is printed when a new exponent is started, e.g.
Code:
LL FFT 4096K (1024*2048*2) of 60000757 (14.31 bits/word) offset 45732555 iteration 1120000
The offset is persisted in checkpoint files (the last line is human-readable, e.g. "tail -n1 cXXX.ll") and is fixed for the exponent (can't be changed while the exponent is ongoing).

There may be bugs, as usual, with this new feature.
1. The easy way to remove the 'fixed for the exponent' restriction is to circular-shift the integer residue back to 0-offset before writing to the checkpoint file. This is easy to do if one uses a packed-bits savefile format for the residue, since one needs to do the cshift in a sense anyway in order to properly compute the bottom 64 bits of the interim residue. BTW, if your above mods don't do this and instead print the 'bottom' 64 bits of the interim residue based on the double-precision-data-with-shift applied words w[0],w[1],..., I strongly urge you to change this and instead compute the Res64 starting at the proper 'zero bit', wherever it may occur in the current shifted residue. Otherwise it is impossible to cross-compare interim Res64 values for 2 separate runs using different shift values.

2. The most obvious kind of bug here is the sort which bit George way back when in v17 of prime95 - IIRC he neglected to cast the shift value to 64-bit before doing some operation on it (maybe read-initial-shift-value-for-the-run-from-savefile and compute the resulting shift for the current iteration?) which needed an intermediate value to be computed at double the 32-bit width. If you simply write the current shift value to the checkpoint file that shouldn't be an issue, since you only deal with modular doublings on each iteration.
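
To make point 1 concrete, here is a small Python sketch of computing Res64 from a shifted residue: rotate right by the current shift first, then take the low 64 bits. It uses a plain integer in place of the floating-point words, and `res64` and its arguments are hypothetical names, not code from any of the programs discussed here.

```python
MASK64 = (1 << 64) - 1


def res64(x_shifted, shift, p):
    """Low 64 bits of the residue after undoing a left rotation by `shift`.

    x_shifted is assumed to hold residue * 2**shift mod (2**p - 1).
    """
    shift %= p
    m = (1 << p) - 1
    # Rotate right by `shift` so that the true bit 0 is at position 0 again.
    x = ((x_shifted >> shift) | (x_shifted << (p - shift))) & m
    return x & MASK64


p = 89
x = 0x123456789ABCDEF0123                    # stand-in for an interim residue
shift = 37
x_shifted = (x << shift) % ((1 << p) - 1)    # the same residue, stored shifted

print(hex(res64(x_shifted, shift, p)))       # agrees with x & MASK64
print(hex(x_shifted & MASK64))               # naive "bottom 64 bits" of shifted data
```

The second value changes with the shift, so only the first can be compared between two runs that use different shift values.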

2017-05-26, 21:46   #150
preda
"Mihai Preda"
Apr 2015
3·457 Posts

Quote:
Originally Posted by ewmayer
1. The easy way to remove the 'fixed for the exponent' restriction is to circular-shift the integer residue back to 0-offset before writing to the checkpoint file. This is easy to do if one uses a packed-bits savefile format for the residue, since one needs to do the cshift in a sense anyway in order to properly compute the bottom 64 bits of the interim residue. BTW, if your above mods don't do this and instead print the 'bottom' 64 bits of the interim residue based on the double-precision-data-with-shift applied words w[0],w[1],..., I strongly urge you to change this and instead compute the Res64 starting at the proper 'zero bit', wherever it may occur in the current shifted residue. Otherwise it is impossible to cross-compare interim Res64 values for 2 separate runs using different shift values.
Right now I don't save packed-bits. I save "raw" words (they're even transposed), which is simple and fast. But I don't see a big need to change the offset "in-the-middle" of an exponent.

I do attempt to compute the Res64 correctly. In fact this is checked with -selftest, that the random offset does not affect residues.

Quote:
2. The most obvious kind of bug here is the sort which bit George way back when in v17 of prime95 - IIRC he neglected to cast the shift value to 64-bit before doing some operation on it (maybe read-initial-shift-value-for-the-run-from-savefile and compute the resulting shift for the current iteration?) which needed an intermediate value to be computed at double the 32-bit width. If you simply write the current shift value to the checkpoint file that shouldn't be an issue, since you only deal with modular doublings on each iteration.
Yep, hopefully I don't have that particular bug :)

OTOH what I save in the checkpoint file is only the "initial" offset, not the running offset. The initial offset is needed for writing to the results file, thus has to be saved anyway. On checkpoint load, a modular exponentiation is done to find the "offset at current iteration".
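
A minimal Python sketch of that checkpoint-load step, assuming the offset doubles (mod p) once per iteration, so that after k iterations it is initial·2^k mod p. `offset_at` and `offset_at_slow` are hypothetical names, not the actual functions, and the numbers are borrowed from the log line earlier in the thread purely as sample input.

```python
def offset_at(initial_offset, iteration, p):
    """Offset after `iteration` doublings: initial_offset * 2**iteration mod p.

    pow(2, k, p) is Python's built-in modular exponentiation. In C/C++ the
    final product of two values below p needs a 64-bit intermediate.
    """
    return (initial_offset * pow(2, iteration, p)) % p


def offset_at_slow(initial_offset, iteration, p):
    """Reference version: double the offset one iteration at a time."""
    s = initial_offset % p
    for _ in range(iteration):
        s = (2 * s) % p
    return s


p = 60000757
print(offset_at(45732555, 1120000, p))
```

Storing only the initial offset keeps the checkpoint format simple, at the cost of one modular exponentiation per load, which is negligible next to a single LL iteration.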

2017-05-27, 01:36   #151
kracker
"Mr. Meeseeks"
Jan 2012
California, USA
2³·271 Posts

Quote:
Originally Posted by preda
@kracker, it appears the problem you were seeing was due to compilation on a 32-bit system, or for some reason your C++ compiler was using a 4-byte "long" (a bit unusual, but possible). Hopefully fixed.
It's working now, thanks!

I'm playing around with v0.3 atm. I'm getting 5.15 ms/iter, compared to 5 ms/iter from v0.2 without the offset.

2017-05-27, 04:16   #152
Madpoo
Serpentine Vermin Jar
Jul 2014
CF1₁₆ Posts

Quote:
Originally Posted by ewmayer
2. The most obvious kind of bug here is the sort which bit George way back when in v17 of prime95
I don't know the details, but there was another funny bug that showed up when I was doing (mostly) unnecessary triple-checks of previously verified work.

If the shift was smaller than the exponent (do I have that right?) it would cause a problem. It was rare, especially once the exponent sizes got larger, but we did find a few cases where it was an issue.

Specifically I noticed it when doing triple checks of every exponent below 3M or whatever, and the shift count in some cases was smaller than 3e6 so I was getting residues that didn't match.

It might not apply to your algorithm... I forget what the exact problem was (if I ever even knew the details)

2017-05-27, 05:28   #153
ewmayer
2ω=0
Sep 2002
República de California
11647₁₀ Posts

Quote:
Originally Posted by Madpoo
I don't know the details, but there was another funny bug that showed up when I was doing (mostly) unnecessary triple-checks of previously verified work.

If the shift was smaller than the exponent (do I have that right?) it would cause a problem. It was rare, especially once the exponent sizes got larger, but we did find a few cases where it was an issue.
That baffles me - since the number being tested has precisely p bits, the only power-of-2 shifts which make sense are ones < p, since all our shift arithmetic (specifically the shiftcount doubling on each iteration) is done (mod p). I could see not-properly-modded shifts *greater* than p being a problem, but in the normal course of the LL test the shift should always be in [0,p-1].

2017-05-27, 05:42   #154
LaurV
Romulan Interpreter
Jun 2011
Thailand
25B9₁₆ Posts

1. No change needed! We are good. Thanks a billion. There is no reason to change the shift in the middle of the work (on the contrary, I see that as detrimental to the success of testing: some guy may run a single test to 99%, then split it in two, change the offset for one, and report both as LL+DC, for credit reasons or whatever).

Only some more tests needed before we say if the shift works as expected.

2. The shift should never be larger than the exponent (as Ernst says, it is just a rotation of a value with a single bit set). In fact, we do not even need the whole range of p bits, just a few different starting points to give the FFT different data to play with for LL and for DC. If it is more convenient for you to limit the initial shift to 16 or even 8 bits, then do so.

All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.