#1343 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,419 Posts |
There's a copy of the help output text and the license along with the stripped executable in the 7z file. The main feature addition is P-1 on NVIDIA. Note there's no P-1 save file capability yet, so any P-1 restart is from the beginning.
|
|
|
|
|
|
#1344 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
12453₈ Posts |
msys2 / MINGW64 make did not like that one at all, producing quite a shower of messages about it.
Last fiddled with by kriesel on 2019-09-07 at 00:31 |
|
|
|
|
|
#1345 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5419₁₀ Posts |
On a Win7 Pro x64 system with 12 GB of RAM, a 36 GB page file, and a 4 GB GPU (GTX 1050 Ti), gpuowl 6.7-4 paged heavily for an hour before abandoning its attempt to start stage 2 of a 50M P-1 run:
Code:
2019-09-06 23:33:12 50001781 610000 96.08%; 12509 us/sq; ETA 0d 00:05; 73d8a20f091dd815
2019-09-06 23:35:17 50001781 620000 97.66%; 12520 us/sq; ETA 0d 00:03; c2adf07b3ea6c52c
2019-09-06 23:37:22 50001781 630000 99.23%; 12502 us/sq; ETA 0d 00:01; 7fc58198458ecd8a
2019-09-07 00:54:35 Exception gpu_error: OUT_OF_HOST_MEMORY clCreateBuffer at clwrap.cpp:273 makeBuf _
2019-09-07 00:54:36 Bye
|
|
|
|
|
#1346 |
|
"Mihai Preda"
Apr 2015
3×457 Posts |
|
|
|
|
|
|
#1347 | |
|
"Mihai Preda"
Apr 2015
3·457 Posts |
Quote:
|
|
|
|
|
|
|
#1348 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,419 Posts |
Correct, -maxAlloc was not used on that run, which reached a peak working set of over 11 GB and hit as many as 8000 page faults/sec. A retry with -maxAlloc 3072, started shortly after the earlier post, has made it to stage 2 and so far has a peak working set of 182 MB.
|
|
|
|
|
|
#1349 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
12453₈ Posts |
|
|
|
|
|
|
#1350 |
|
Oct 2009
Ukraine
32 Posts |
Sorry if this is the wrong forum thread, but can someone explain this case: on a GTX 1050 Ti I earned nearly 315 GHz-days in 24 hours on a TF task. I earned almost the same number of GHz-days for one LL or PRP task, which takes 15-16 days of computing on the same 1050 Ti. So that is 315 GHz-days in 1 day versus 16 days. Why such a big difference?
|
|
|
|
|
|
#1351 |
|
"Sam Laur"
Dec 2018
Turku, Finland
317 Posts |
There are some other factors (ahem...) at play too, but the simplest way to put it is that LL/PRP and TF work use different compute capabilities of the GPU core. TF is mostly 32-bit integer (INT32) arithmetic, while the FFT multiplication in LL/PRP uses mostly 64-bit double-precision floating point (FP64). On Nvidia consumer cards, the FP64 capability has been deliberately limited to protect their datacenter range of cards, so much so that in the time of one FP64 operation the card can do 32 FP32 operations. On the Pascal cards (GTX 10xx) INT32 is somewhat slower than FP32, which is why the ratio in work done per day isn't quite 1:32. I don't know the exact figure though. But in the Turing generation of cards (GTX 16xx and RTX 20xx) INT32 is now as fast as FP32, so the difference between TF and LL work in GHz-d/d is several times bigger.
For example, my Ryzen 5 3600 can do more LL work per day in mprime than my RTX 2080 (non-Super) can in CUDALucas, which is why I only do trial factoring on the card. This is also the reason why the Radeon VII is such a beast at PRP work: there, the FP64 to FP32 ratio was only limited to 1:4. On AMD consumer cards in general I think the ratio has mostly been 1:16; please correct me, because I'm sure I'm wrong.
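To put numbers on the gap, here is a minimal Python sketch that uses only the figures quoted in the question (315 GHz-days of credit, earned in 1 day of TF versus 15-16 days of LL/PRP on the same GTX 1050 Ti). The function and variable names are made up for illustration, and GHz-days credit is only a rough proxy for hardware throughput, so treat the result as a ballpark rather than a measurement.

```python
# Rough throughput comparison from the figures quoted in the question.
# All names here are illustrative; nothing below comes from gpuowl or mfaktc.

def ghz_days_per_day(credit_ghz_days, elapsed_days):
    """Credit earned per day of wall-clock time (GHz-d/d)."""
    return credit_ghz_days / elapsed_days

credit = 315.0  # GHz-days, roughly the same for both tasks per the question

tf_rate = ghz_days_per_day(credit, 1.0)        # TF finished in about 1 day
ll_rate_fast = ghz_days_per_day(credit, 15.0)  # LL/PRP, optimistic end
ll_rate_slow = ghz_days_per_day(credit, 16.0)  # LL/PRP, pessimistic end

ratio_low = tf_rate / ll_rate_fast
ratio_high = tf_rate / ll_rate_slow

print(f"TF:     {tf_rate:.1f} GHz-d/d")
print(f"LL/PRP: {ll_rate_slow:.1f} to {ll_rate_fast:.1f} GHz-d/d")
print(f"TF is about {ratio_low:.0f}x to {ratio_high:.0f}x faster in credit terms")
```

The resulting ratio of roughly 15 to 16 sits below the nominal 32:1 FP32 to FP64 ratio, which fits the point above that the observed gap on Pascal is not quite 1:32.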
|
|
|
|
|
#1352 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1010100101011₂ Posts |
Quote:
|
|
|
|
|
|
|
#1353 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,419 Posts |
What is a block?
What is a round?
What determines how many rounds are required for stage 2?
What Brent-Suyama extension exponent is used? Does it vary?
What determines the maximum exponent that gpuowl can complete in P-1 stage 1 or stage 2, other than run time or available FFT lengths? (No doubt available GPU RAM is a constraint, but without more info, that does not enable computing or predicting the maximum exponent per GPU model from GPU specifications. "Just try it" is an unsatisfying answer when run times may be weeks or longer, depending on GPU model and exponent.)
|
|
|