mersenneforum.org  

2019-09-06, 22:41   #1343
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,419 Posts
Gpuowl v6.7-4-278407a Windows build

There's a copy of the help output text and the license along with the stripped executable in the 7z file. The main feature addition is P-1 on NVIDIA. Note there's no P-1 save file capability yet, so any P-1 restart is from the beginning.
Attached Files
File Type: 7z gpuowl-v6.7-4-g278407a.7z (409.8 KB, 94 views)

2019-09-07, 00:30   #1344
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

12453₈ Posts

Quote:
Originally Posted by preda
most recent commit now, tagged v6.8: v6.8-0-gab732a0
msys2 / MINGW64 make did not like that one at all, producing quite a shower of messages about it.
Attached Files
File Type: 7z make-error-messages.7z (1.7 KB, 63 views)

Last fiddled with by kriesel on 2019-09-07 at 00:31

2019-09-07, 11:51   #1345
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5419₁₀ Posts

On a Win7 Pro x64 system with 12GB of RAM, a 36GB page file, and a 4GB GPU (GTX 1050 Ti), gpuowl 6.7-4 produced mad paging for an hour before terminating its attempt to start stage 2 on a 50M P-1 run:
Code:
2019-09-06 23:33:12 50001781      610000 96.08%; 12509 us/sq; ETA 0d 00:05; 73d8a20f091dd815
2019-09-06 23:35:17 50001781      620000 97.66%; 12520 us/sq; ETA 0d 00:03; c2adf07b3ea6c52c
2019-09-06 23:37:22 50001781      630000 99.23%; 12502 us/sq; ETA 0d 00:01; 7fc58198458ecd8a
2019-09-07 00:54:35 Exception gpu_error: OUT_OF_HOST_MEMORY clCreateBuffer at clwrap.cpp:273 makeBuf
_
2019-09-07 00:54:36 Bye
Command line: gpuowl-win -user kriesel -cpu condorette -use ORIG_X2 -device 0
Worktodo entry: B1=440000,B2=8360000;PFactor=0,1,2,50001781,-1,73,2
The same system and GPU have run CUDAPm1 to completion through both stages on exponents up to 384M.
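
As a sanity check on the log above, the timestamps can be compared against the reported us/sq figures. The following is a minimal Python sketch, not part of gpuowl: the field layout is inferred from these three progress lines only, and the helper names are made up for illustration. The B1 / ln(2) comparison is a standard approximation for the bit length of the stage 1 exponent, not a confirmed description of how gpuowl counts its squarings.

```python
from datetime import datetime
import math

# Progress lines copied verbatim from the log in this post.
LOG_LINES = [
    "2019-09-06 23:33:12 50001781      610000 96.08%; 12509 us/sq; ETA 0d 00:05; 73d8a20f091dd815",
    "2019-09-06 23:35:17 50001781      620000 97.66%; 12520 us/sq; ETA 0d 00:03; c2adf07b3ea6c52c",
    "2019-09-06 23:37:22 50001781      630000 99.23%; 12502 us/sq; ETA 0d 00:01; 7fc58198458ecd8a",
]


def parse_progress(line):
    """Split one progress line into fields (layout inferred from the samples)."""
    f = line.split()
    return {
        "time": datetime.strptime(f"{f[0]} {f[1]}", "%Y-%m-%d %H:%M:%S"),
        "exponent": int(f[2]),
        "iteration": int(f[3]),
        "percent": float(f[4].rstrip("%;")),
        "us_per_sq": int(f[5]),
    }


def measured_us_per_sq(earlier, later):
    """Microseconds per squaring derived from two timestamps."""
    seconds = (later["time"] - earlier["time"]).total_seconds()
    return seconds * 1e6 / (later["iteration"] - earlier["iteration"])


def implied_total_iterations(rec):
    """Total stage 1 squarings implied by the iteration count and percentage."""
    return rec["iteration"] / (rec["percent"] / 100.0)


records = [parse_progress(line) for line in LOG_LINES]
rates = [measured_us_per_sq(a, b) for a, b in zip(records, records[1:])]
totals = [implied_total_iterations(r) for r in records]

# Gap between the last progress line and the OUT_OF_HOST_MEMORY exception.
error_time = datetime(2019, 9, 7, 0, 54, 35)
stall_seconds = (error_time - records[-1]["time"]).total_seconds()

print("us/sq from timestamps:", rates)
print("implied total squarings:", [round(t) for t in totals])
print("B1 / ln(2) for B1=440000:", round(440000 / math.log(2)))
print("minutes between last progress line and exception:", stall_seconds / 60)
```

Both intervals come out at 125 seconds per 10,000 squarings, i.e. 12500 us/sq, in line with the 12502 to 12520 us/sq that gpuowl reported. The gap before the exception is about 77 minutes.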

2019-09-07, 13:38   #1346
preda
"Mihai Preda"
Apr 2015

3×457 Posts

Quote:
Originally Posted by kriesel
msys2 / MINGW64 make did not like that one at all, producing quite a shower of messages about it.
Please retry (I attempted a fix)

2019-09-07, 13:41   #1347
preda
"Mihai Preda"
Apr 2015

3·457 Posts

Quote:
Originally Posted by kriesel
On a Win7 Pro x64 system with 12GB of RAM, a 36GB page file, and a 4GB GPU (GTX 1050 Ti), gpuowl 6.7-4 produced mad paging for an hour before terminating its attempt to start stage 2 on a 50M P-1 run:
Code:
2019-09-06 23:33:12 50001781      610000 96.08%; 12509 us/sq; ETA 0d 00:05; 73d8a20f091dd815
2019-09-06 23:35:17 50001781      620000 97.66%; 12520 us/sq; ETA 0d 00:03; c2adf07b3ea6c52c
2019-09-06 23:37:22 50001781      630000 99.23%; 12502 us/sq; ETA 0d 00:01; 7fc58198458ecd8a
2019-09-07 00:54:35 Exception gpu_error: OUT_OF_HOST_MEMORY clCreateBuffer at clwrap.cpp:273 makeBuf
_
2019-09-07 00:54:36 Bye
Command line: gpuowl-win -user kriesel -cpu condorette -use ORIG_X2 -device 0
Worktodo entry: B1=440000,B2=8360000;PFactor=0,1,2,50001781,-1,73,2
The same system and GPU have run CUDAPm1 to completion through both stages on exponents up to 384M.

Did you not specify -maxAlloc?

2019-09-07, 14:03   #1348
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,419 Posts

Quote:
Originally Posted by preda
Did you not specify -maxAlloc?
Correct, -maxAlloc was not used on that run, which reached a peak working set of over 11GB and hit as many as 8000 page faults/sec. A retry with -maxAlloc 3072, started shortly after the earlier post, has made it to stage 2 and so far has a peak working set of 182MB.
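
For anyone repeating this, the retry amounts to the earlier command line plus the -maxAlloc option. Here is a small Python sketch that assembles it. Every flag and value is taken from the posts in this thread; combining them into one command is an assumption about how the retry was launched, and the helper name is hypothetical. The unit of the 3072 value is not stated here (presumably megabytes, given the 4GB card).

```python
import shlex

# Arguments from the original run, as posted earlier in the thread.
base_args = [
    "gpuowl-win", "-user", "kriesel", "-cpu", "condorette",
    "-use", "ORIG_X2", "-device", "0",
]


def with_max_alloc(args, limit):
    """Return a copy of args with '-maxAlloc <limit>' appended or updated."""
    args = list(args)
    if "-maxAlloc" in args:
        # Replace the existing value rather than adding a second flag.
        args[args.index("-maxAlloc") + 1] = str(limit)
    else:
        args += ["-maxAlloc", str(limit)]
    return args


retry_args = with_max_alloc(base_args, 3072)
command = " ".join(shlex.quote(a) for a in retry_args)
print(command)
```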

2019-09-07, 14:21   #1349
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

12453₈ Posts

Quote:
Originally Posted by preda
Please retry (I attempted a fix)
Thanks, much better; clean compile on gpuowl v6.8-2-g0f3059b in Win7Pro x64 with msys2/mingw64 (indicated as 20180531 in Control Panel/Programs/Programs and Features; not sure if that indicates any subsequent updates).

2019-09-08, 05:43   #1350
AlsXZ
Oct 2009
Ukraine

32 Posts

Sorry if I chose the wrong forum thread, but can you explain this case: on a 1050 Ti I got nearly 315 GHz-days in 24 hours on a TF task. I got almost the same number of GHz-days for one LL or PRP task, which takes 15-16 days of computing on the same 1050 Ti GPU. So 315 GHz-days takes either 1 day or 16 days. Why such a big difference?

2019-09-08, 06:45   #1351
nomead
"Sam Laur"
Dec 2018
Turku, Finland

317 Posts

There are some other factors (ahem...) at play too, but the simplest way to put it is that LL/PRP and TF work use different compute capabilities of the GPU core. TF is mostly 32-bit integer (INT32) arithmetic, while the FFT multiplication in LL/PRP uses mostly 64-bit double-precision floating point (FP64) arithmetic. On Nvidia consumer cards, the FP64 capability has been deliberately limited to protect their datacenter range of cards, so much so that for every FP64 operation, the card can do 32 FP32 operations in the same time. On the Pascal cards (GTX10xx) INT32 is somewhat slower than FP32, which is why the ratio in work done per day isn't quite 1:32. I don't know the exact figure, though. In the Turing generation of cards (GTX16xx and RTX20xx) INT32 is as fast as FP32, so the difference between TF and LL work in GHz-d/d is several times bigger.

For example, my Ryzen 5 3600 can do more LL work per day in mprime than my RTX 2080 (non-Super) can in CUDALucas. So that's why I only do trial factoring on the card.

This is also the reason why the Radeon VII is such a beast at PRP work, because there, the FP64 to FP32 ratio was only limited to 1:4. On AMD consumer cards in general I think the ratio has mostly been 1:16, please correct me because I'm sure I'm wrong.
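
To put numbers on the question above: the same credit earned in 1 day versus 15-16 days is a 15x to 16x gap in GHz-days per day. A quick Python check, using only the figures quoted in this thread (315 GHz-days, 1 day for TF, 15 to 16 days for LL/PRP, and the 1:32 FP64 to FP32 ratio). The function name is just for illustration.

```python
def ghz_days_per_day(credit_ghz_days, elapsed_days):
    """Throughput as credit earned per day of wall-clock time."""
    return credit_ghz_days / elapsed_days


CREDIT = 315.0          # GHz-days, for both the TF day and the LL/PRP test
FP32_PER_FP64 = 32      # hardware ratio quoted for GTX10xx cards

tf_rate = ghz_days_per_day(CREDIT, 1)
ll_rate_fast = ghz_days_per_day(CREDIT, 15)   # LL/PRP finishing in 15 days
ll_rate_slow = ghz_days_per_day(CREDIT, 16)   # LL/PRP finishing in 16 days

print(f"TF:     {tf_rate:.1f} GHz-d/day")
print(f"LL/PRP: {ll_rate_slow:.1f} to {ll_rate_fast:.1f} GHz-d/day")
print(f"TF is {tf_rate / ll_rate_fast:.0f}x to {tf_rate / ll_rate_slow:.0f}x faster "
      f"in credit terms (hardware FP32:FP64 ratio is {FP32_PER_FP64}:1)")
```

The observed 15x to 16x gap is below 32x, which is consistent with the remark that INT32 on Pascal is somewhat slower than FP32.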

2019-09-08, 16:39   #1352
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1010100101011₂ Posts

Quote:
Originally Posted by nomead
On AMD consumer cards in general I think the ratio has mostly been 1:16, please correct me because I'm sure I'm wrong.
This is why Preda was discouraged, back around gpuowl v3.8, from implementing an NVIDIA-compatible gpuowl; the GTX10xx rightly seemed slow to him at 1:32. There is wide variation across the NVIDIA line. Some older NVIDIA GPUs have much better DP performance ratios. See https://www.mersenneforum.org/showpo...12&postcount=3

2019-09-08, 17:41   #1353
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,419 Posts
gpuowl P-1 stage 2 terminology questions

What is a block?
What is a round?
What determines how many rounds are required for stage 2?
What Brent-Suyama extension exponent is used? Does it vary?

What determines the maximum exponent that gpuowl can complete in P-1 stage 1 or stage 2, other than run time or available fft lengths? (No doubt available gpu ram is a constraint, but without more info, that does not enable computing or predicting max exponent per gpu model based on gpu specifications. "Just try it" is an unsatisfying answer when run times may be weeks or longer, depending on gpu model and exponent)

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.