
mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

chalsall 2019-09-11 18:00

[QUOTE=TheJudger;525680]unless you're really sure about the increased sieve size limit I suggest to stay with 1024... "Doesn't crash" and "passes the builtin selftest" doesn't prove that 2047 is OK. 2048 crashes hard because of an (integer) overflow...[/QUOTE]

Could factors be missed? I'm deploying hansl's resulting executable on many Colab and Kaggle instances, and am finding the expected number of factors.

But before I put this into "production", should I revert to an unmodified build?

nomead 2019-09-11 18:01

[QUOTE=TheJudger;525680]Hi,
"Doesn't crash" and "passes the builtin selftest" doesn't prove that 2047 is OK. 2048 crashes hard because of an (integer) overflow...

Oliver[/QUOTE]
Fair enough; the difference between 1024 and 2047 isn't that big anymore anyway, less than 1%. That said, I've been running it at 2047 since January, mostly on the >1G exponents on mersenne.ca. While I haven't collected stats for all that time, the last 2.5 months are: 189717 factors found for 12729313 exponents, 14903 ppm / 1.49%. Of course that doesn't *prove* that it doesn't miss any factors anywhere... but now I'm retesting some ranges that have been independently factored (2-55 bits, with something other than mfaktc), so I can check whether it has missed anything there thus far.

TheJudger 2019-09-11 18:16

Actually, I didn't spend much time thinking about this. I'm not sure whether TF to 2[SUP]55[/SUP] hits the wraparound or not.
I don't have any evidence that 2047 doesn't work, I'm just not a fan of "changed a number and it seems to work" changes.

Oliver

nomead 2019-09-11 19:03

[QUOTE=chalsall;525683]Could factors be missed? I'm deploying hansl's resulting executable on many Colab and Kaggle instances, and am finding the expected number of factors.

But before I put this into "production", should I revert to an unmodified build?[/QUOTE]

The build makes no changes to the code itself; it's just a modification to the bounds check in the mfaktc.ini file processing. Ordinarily the limit is 128, and you can get the same effect as an unmodified build by setting GPUSieveSize=128 (or even less) in mfaktc.ini, as you like.

TheJudger 2019-09-11 19:36

[QUOTE=nomead;525687]The build makes no changes to the code itself, it's just a modification to the bounds check in mfaktc.ini file processing.[/QUOTE]

That perfectly explains why it crashes at 2048... :sad:

Oliver

Prime95 2019-09-11 20:15

The GPU sieve code was written ages ago. I've long forgotten its assumptions and limitations. My biggest fear is that the code requires the sieve size to be a power of two. Someone really needs to scrutinize the code before using 2047.

kriesel 2019-09-11 21:50

[QUOTE=Prime95;525689]The GPU sieve code was written ages ago. I've long forgotten its assumptions and limitations. My biggest fear is that the code requires the sieve size to be a power of two. Someone really needs to scrutinize the code before using 2047.[/QUOTE]
Here's the entirety of the mfaktc.ini section on GPUSieveSize as the program is distributed. It seems like if a power of two were a requirement, it would have been disclosed in a comment there, the same as for the requirement that another parameter be a multiple of 8. What happens if one uses 5, 6, 7, 15, 31, 63, or 127 in an unaltered executable, other than performance variations? It seems like 7 vs. 8 would have the highest odds of showing mischief in a test.
[CODE]
# GPUSieveSize defines how big of a GPU sieve we use (in M bits).
#
# Minimum: GPUSieveSize=4
# Maximum: GPUSieveSize=128
#
# Default: GPUSieveSize=64

GPUSieveSize=64
[/CODE]
Skimming the gpusieve.cu code, nothing jumps out at me as requiring a power of 2 there, although that means essentially nothing; I don't know CUDA programming. There are some things that seem to me to indicate the sieve size should be a multiple of a considerable power of two, but it's in units of M bits (2[SUP]20[/SUP] bits).

hansl 2019-09-11 23:38

[QUOTE=nomead;525687]The build makes no changes to the code itself, it's just a modification to the bounds check in mfaktc.ini file processing. Ordinarily the limit is 128, and you can get the same effect by setting GPUSieveSize=128 (or even less) in mfaktc.ini, as you like.[/QUOTE]
Right, the only "code" change was increasing the maximum allowable limit for that in src/params.h.
This enables the max GPUSieveSize to go beyond 128, but it's still subject to what is set in mfaktc.ini.

So, if it's a concern, you should be fine just lowering the setting in mfaktc.ini to 1024 or whatever (assuming you are even using the bundled mfaktc.ini and not your own).
Building it again shouldn't be necessary.

nomead 2019-09-11 23:54

I've mentioned in this thread what I did back in January, and no big objections were raised then. Well, this just gives me more motivation to finally build a bigger test set of exponents and bit depths of already-found factors, as extracted from the mersenne.ca database.

But here's another data point. I've started taking the >1G range to 64 bits, as in, finding all the factors up to that point, not just the first ones that can be found. This was recently exhaustively done by hansl to 55 bits, but there are still factors waiting to be found between 55 and 64. So, for the bits and pieces I've managed to do between about 2800 and 3000 million, in the short time I've been running this job thus far, comparing against already known factors:
1240663 exponents
786890 factors (in database) - none were missed in this search.
42547 new factors
There were already some factors between 55 and 64 bits in length in the database, of course. Most notably, for quite a long way above 2900 million, someone had already factored everything up to 64 bits.

The question then becomes, is any amount of testing ever enough?

kriesel 2019-09-12 00:35

[QUOTE=nomead;525706]I've mentioned in this thread what I did back in January, and no big objections were raised back then. Well, this only gives me a better motivation to finally build a bigger test set of exponents and bit depths of already found factors, as extracted from the mersenne.ca database....
The question then becomes, is any amount of testing ever enough?[/QUOTE]
Production use is somewhat informative, but it is not the equal of well-designed testing. A test would look something like this: run a set of exponents to a bit level with a non-power-of-two GPUSieveSize, then run the no-factor-found survivors again at the same bit level with a power-of-two GPUSieveSize, and see how many more factors are found. If any, there's probably an issue. If none, maybe the sample size was too small and a small issue has gone undetected.

Prime95 2019-09-12 03:21

[QUOTE=nomead;525706]The question then becomes, is any amount of testing ever enough?[/QUOTE]

The first step should be to look at the code and convince oneself that 2047 ought to work. No one has done that. So.....

I took a look at the code. I see no reason why a setting of 2047 would not work. In fact, changing gpu_sieve_size to an unsigned int might allow for values up to 4095. Changing to unsigned long long could allow much higher values. There would also be some typecasts required to avoid compiler warnings.

The real limit is imposed by CUDA on this code line:

[CODE] SegSieve<<<(sieve_size + block_size - 1) / block_size, threadsPerBlock>>>((uint8 *)mystuff->d_bitarray, (uint8 *)mystuff->d_sieve_info, primes_per_thread);
[/CODE]

What is CUDA's limit on the first parameter ((sieve_size + block_size - 1) / block_size)?

