#23
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
29·173 Posts
#24
"Mihai Preda"
Apr 2015
549₁₆ Posts
#25
"Mihai Preda"
Apr 2015
1353₁₀ Posts
Please either finish the exponent on 6.x and start a fresh one on 7.x, or carefully rename the 6.x savefile (.owl) by hand to the new numbered .prp name format, <exp>-<iteration>.prp, where the iteration is written with 9 digits as above. Feel free to make a backup beforehand.
BTW, 7.x is not ready for general use yet; there are still plenty of rough corners I'm working on now.
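To make the target name concrete, here is a minimal sketch of the naming rule only. It assumes that "on 9 digits" means zero-padding on the left; the helper name is hypothetical, and the iteration count has to be known by the user, since the sketch does not read it from the .owl file.

```python
def prp_savefile_name(exponent: int, iteration: int) -> str:
    """Build a name of the form <exp>-<iteration>.prp.

    Assumption: the 9-digit iteration field is left-padded with zeros.
    """
    if not 0 <= iteration < 10**9:
        raise ValueError("iteration must fit in 9 decimal digits")
    return f"{exponent}-{iteration:09d}.prp"


if __name__ == "__main__":
    # Hypothetical exponent and iteration, for illustration only.
    print(prp_savefile_name(102086261, 5000000))
```

Back up the original file before renaming anything.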
#26
"Mihai Preda"
Apr 2015
3·11·41 Posts
If you feel adventurous, please help with the new P-1 testing.
The bug: the most important bug we want to trigger is this: the candidate *has a factor* that should be detected according to the B1/B2 bounds, but it is not detected.

How to trigger this bug:

1. Please choose an exponent (of various sizes) with a known factor. You should know the minimal B1/B2 required to find this factor (these can be found from the factorization of factor-1).
2. Repeatedly run PRP on this exponent while changing:
- FFT size
- -maxAlloc (e.g. use 3 values: the max allowed on the GPU, a very small one, e.g. 1GB or 800M, and something in between like 3GB or 7GB)
- anything else you feel like changing (-carry long, etc.)
3. Run with different bounds:
- First set a B1 large enough that it should find the factor by itself (in first stage). Run first-stage. Feel free to repeatedly interrupt it (Ctrl-C) and reload, etc.
- Next set a B1 that is not large enough, and check detection in second stage. Again, interrupt/reload at will.
4. Use your imagination to torment the P-1 in other ways. But in general always run with some bounds that should detect the factor, and if it's ever *not detected*, report the bug.

If you do find a bug, try to reproduce it yourself. This helps identify the conditions that trigger it.

In general, before anybody does actual P-1 work, please run at least a few such tests. Otherwise P-1 may be broken (blind, does not find factors) and we're just wasting cycles imagining we're doing P-1.

thanks

PS: this testing is useful for both P-1 stages, not only first-stage, as second-stage changed too.

PPS: a good approach is to start with some simple test-case, verify that it works correctly, then complicate it a bit, and repeat. Don't start with something fancy when maybe something trivially simple would break it just as well.

Last fiddled with by preda on 2020-09-29 at 02:10
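For step 1, the minimal bounds can be worked out from the factorization of factor-1. The sketch below is one way to do it, not GpuOwl code. It assumes the textbook P-1 convention: a factor q of 2^p-1 has q-1 = 2*k*p, the 2 and the exponent p are included for free, stage 1 covers every prime power up to B1, and stage 2 covers a single extra prime between B1 and B2. The function names are hypothetical, and the trial-division factoring is only meant for small k.

```python
def factorize(n: int) -> dict:
    """Trial-division factorization: returns {prime: exponent}."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def min_pm1_bounds(p: int, q: int) -> tuple:
    """Smallest (B1, B2) under which textbook P-1 should find q | 2^p - 1."""
    if pow(2, p, q) != 1:
        raise ValueError("q does not divide 2^p - 1")
    k, rem = divmod(q - 1, 2 * p)
    if rem:
        raise ValueError("q - 1 is not a multiple of 2p")
    factors = factorize(k)
    if not factors:  # k == 1: nothing beyond the free 2*p is needed
        return 1, 1
    largest = max(factors)
    if factors[largest] == 1:
        # The largest prime can be left to stage 2; stage 1 needs the rest.
        b1 = max((r ** e for r, e in factors.items() if r != largest), default=1)
        return b1, max(b1, largest)
    # The largest prime is repeated, so stage 1 has to cover everything.
    b1 = max(r ** e for r, e in factors.items())
    return b1, b1


if __name__ == "__main__":
    # M67 = 193707721 * 761838257287; here k = 1445580 = 2^2 * 3^3 * 5 * 2677
    print(min_pm1_bounds(67, 193707721))
```

Running first-stage with a B1 at or above the returned B2 covers the "B1 large enough by itself" case; the returned B1 together with B2 covers the second-stage case.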
#28
"Mihai Preda"
Apr 2015
3·11·41 Posts
Quote:
https://github.com/preda/gpuowl/blob...st-pm1/pm1.txt
For testing it's a good idea to use exponents with lower B1, B2 values as they'll complete faster.
#29
Apr 2010
Over the rainbow
43×59 Posts
so, smooth factor found on mersenne.ca?
#30
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
11631₈ Posts
#31
"Mihai Preda"
Apr 2015
3·11·41 Posts
Because I need to specify all the time "first stage" or "second stage" of P-1, from now on I'm going to use this notation to identify the stages: "P1" denoting P-1 FS, and "P2" meaning P-1 SS.
I simplified the P2 implementation; now there are only two cases for memory use in P2, let's call them "low memory" and "high memory".
a) "low memory" uses D=210, where only 24 "big" buffers are allocated for P2 (plus a few auxiliary buffers).
b) "high memory" uses D=2310, where 240 "big" buffers are allocated for P2 (plus the same number of auxiliary buffers as before).

The cost of P2 is dominated by the number "n" of primes between B1 and B2, which require about 0.85 * n muls (this value is pretty much the same between the low/high memory variants), plus an "overhead" for walking in steps of size D from B1 to B2, where each step requires 2 muls. It turns out that this overhead is about 2% (of the whole P2) for the "high memory" case, and about 20% for the "low memory" case, and this is why the "high memory" P2 is more efficient (by about 20%) than the low-memory P2.

Long story short, it is good for P2 to be able to run in the "240 buffers" mode (D=2310). At the wavefront one "big" buffer is 44MB, so the "high memory" case needs 240 * 44MB, about 10.6GB, which means about -maxAlloc 11G.

Notes:
"D" above is a parameter of P2: it indicates the "step" of walking from B1 to B2. The values above are small primorials:
210 = 2*3*5*7
2310 = 2*3*5*7*11
"big" buffer: the buffers used have fixed length N given by the FFT size (e.g. N=5.5M for FFT=5.5M), but contain either 32-bit integers ("small" buffers) or 64-bit FP ("big" buffers).

Last fiddled with by preda on 2020-09-29 at 13:04
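The cost model above is easy to check numerically. The sketch below is a back-of-the-envelope estimate, not GpuOwl code: it counts the primes between B1 and B2 with a plain sieve, charges 0.85 muls per prime and 2 muls per step of size D, and reports the share of the overhead. The B1/B2 values in the example are hypothetical, picked only for illustration. The `buffers` helper rests on one more assumption: that the 24 and 240 "big" buffers correspond to the residues below D/2 that are coprime to D, which happens to match both counts.

```python
from math import gcd, isqrt


def count_primes(lo: int, hi: int) -> int:
    """Count primes p with lo < p <= hi (sieve of Eratosthenes)."""
    sieve = bytearray([1]) * (hi + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, isqrt(hi) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, hi + 1, i)))
    return sum(sieve[lo + 1 :])


def p2_cost(b1: int, b2: int, d: int):
    """Return (prime muls, step-overhead muls, overhead share of the total)."""
    prime_muls = 0.85 * count_primes(b1, b2)  # about 0.85 muls per prime
    overhead = 2 * (b2 - b1) / d              # 2 muls per step of size D
    return prime_muls, overhead, overhead / (prime_muls + overhead)


def buffers(d: int) -> int:
    """Residues below D/2 coprime to D (assumed to be the buffer count)."""
    return sum(1 for r in range(1, d // 2) if gcd(r, d) == 1)


if __name__ == "__main__":
    for d in (210, 2310):
        _, _, share = p2_cost(500_000, 10_000_000, d)
        mem_mb = buffers(d) * 44  # 44MB per "big" buffer at the wavefront
        print(f"D={d}: {buffers(d)} buffers, ~{mem_mb}MB, overhead {share:.1%}")
```

With bounds in this range the overhead share comes out in the low single digits for D=2310 and well above 10% for D=210, in line with the rough 2% and 20% figures quoted above.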
#32
"Mihai Preda"
Apr 2015
549₁₆ Posts
#33
Jul 2003
wear a mask
2×5×157 Posts
Here's a list I gleaned from the mersenne.ca "Factors missed by P-1" list. Maybe try these for testing?
Code:
Pminus1=1,2,10002859,-1,55000,838750,64
Pminus1=1,2,21150827,-1,6133,596857,66
Pminus1=1,2,31919773,-1,1901,84737,66
Pminus1=1,2,48701273,-1,570000,570000,69
Pminus1=1,2,50077721,-1,280000,280000,69
Pminus1=1,2,61684171,-1,13381,2443933,74
Pminus1=1,2,72713617,-1,3847,1047701,81
Pminus1=1,2,89281183,-1,65173,1303669,80
Pminus1=1,2,95675581,-1,4493,1143563,75
Pminus1=1,2,102086261,-1,1733,7253,74
Pminus1=1,2,102227777,-1,10601,14159,74
Pminus1=1,2,102001051,-1,37123,3078469,72
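For scripting a batch of test runs from such a list, a small parser may help. This sketch assumes the usual Prime95-style worktodo layout, Pminus1=k,b,n,c,B1,B2,how_far_factored, describing the number k*b^n+c; the reading of the last field as trial-factoring depth in bits is an assumption, and the class and function names are hypothetical.

```python
from typing import NamedTuple


class Pminus1Entry(NamedTuple):
    k: int
    b: int
    n: int        # the exponent
    c: int
    b1: int
    b2: int
    tf_bits: int  # assumed: trial-factoring depth in bits


def parse_pminus1(line: str) -> Pminus1Entry:
    """Parse one 'Pminus1=k,b,n,c,B1,B2,bits' line into named fields."""
    key, _, value = line.strip().partition("=")
    if key != "Pminus1":
        raise ValueError(f"not a Pminus1 line: {line!r}")
    fields = [int(x) for x in value.split(",")]
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    return Pminus1Entry(*fields)


if __name__ == "__main__":
    entry = parse_pminus1("Pminus1=1,2,10002859,-1,55000,838750,64")
    print(entry.n, entry.b1, entry.b2)
```

Sorting the parsed entries by B2 gives the quickest tests first.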