mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing > GpuOwl
2019-05-05, 20:58   #1123
preda ("Mihai Preda")

Quote:
Originally Posted by SELROC
Experimenting with -block sizes for a 332M exponent:


1. the GEC time with block 400 is ~2.11 sec.
2. the GEC time with block 1000 is ~4.25 sec.


The GEC time varies with block size.
Yes, because a check involves doing "block-size" additional iterations. E.g. with block=400, 400 additional iterations are done every 400^2 = 160K iterations, while with block=1000, 1000 additional iterations are done every 1000^2 = 1M iterations.
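A rough sketch of the arithmetic above (not GpuOwl source code), assuming, as described, that a check costs about block-size extra iterations and runs every block-size^2 iterations:

```python
def gec_cost(block_size):
    """Return (check_interval, overhead_fraction) for a given GEC block size."""
    interval = block_size ** 2          # iterations between checks
    overhead = block_size / interval    # = 1/block_size extra work per iteration
    return interval, overhead

for b in (400, 1000, 2000):
    interval, overhead = gec_cost(b)
    print(f"block={b}: check every {interval} iterations, overhead ~ {overhead:.4%}")
```

So a larger block makes each check longer (and less frequent), while the amortized overhead, roughly 1/block-size, actually shrinks.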
2019-05-06, 06:15   #1124
SELROC

Quote:
Originally Posted by preda
Yes because a check involves doing "block-size" additional iterations. E.g. with block=400, 400 additional iterations are done every 400^2==160K iterations, while with block=1000, 1000 additional iterations are done every 1000^2=1M iterations.

So if I use a block of 2000, the GEC check should last ~8-9 seconds. There is a trade-off to apply here.
Do I want long, infrequent checks or short, frequent checks?


When using a block of 1000 and a log of 20000, sometimes gpuowl fails to display the OK... (check ...) output, probably because the check falls between log points.
2019-05-06, 10:18   #1125
preda ("Mihai Preda")

Quote:
Originally Posted by SELROC
So if I use a block of 2000, the GEC check should last ~8-9 seconds. There is a trade-off to apply here.
Do I want long, infrequent checks or short, frequent checks?


When using a block of 1000 and a log of 20000, sometimes gpuowl fails to display the OK... (check ...) output, probably because the check falls between log points.
The check is done every block-size^2 iterations (block size squared). So if block=1000, the check is done every 1M iterations. With a log interval of 20000, every 1M check point is also a log point, so you will hit it every 1M.

But what happens, for example, with a block of 400 and a log of 100K? The check will be done every 160K iterations, and should be displayed correctly even when it doesn't hit a 'log' multiple of 100K (at least that's the plan).
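The block=400 / log=100K example can be checked with a quick sketch (hypothetical values from the post, not GpuOwl code): check points are multiples of 160K, and only occasionally land on a 100K log multiple.

```python
# With block=400 the check fires every 400^2 = 160K iterations,
# while the log fires every 100K; most checks fall between log points.
block, log_step = 400, 100_000
check_step = block ** 2  # 160_000

for k in range(1, 6):
    it = k * check_step
    status = "coincides with" if it % log_step == 0 else "falls between"
    print(f"check at iteration {it:,} {status} the 100K log points")
```

Only every fifth check (at 800K, the lcm of 160K and 100K) lines up with a log point, which is why the check output needs to be printed independently of the log.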
2019-05-06, 10:37   #1126
SELROC

Quote:
Originally Posted by preda
The check is done at block-size^2 (squared). So if block=1000, the check is done every 1M. With a log of 20000, you will hit every 1M.

But what happens, for example, with a block of 400 and a log of 100K? the check will be done every 160K, and will be displayed correctly even if it doesn't hit a 'log' multiple of 100K. (at least that's the plan)

OK, so if you confirm that the check is displayed, I may have missed it.
Experimenting further...
2019-05-06, 13:01   #1127
SELROC

Quote:
Originally Posted by preda
The check is done at block-size^2 (squared). So if block=1000, the check is done every 1M. With a log of 20000, you will hit every 1M.

But what happens, for example, with a block of 400 and a log of 100K? the check will be done every 160K, and will be displayed correctly even if it doesn't hit a 'log' multiple of 100K. (at least that's the plan)

PS: The Mersenne number https://www.mersenne.org/report_expo...2252533&full=1 is composite.
The computation took ~15 days 10 hours on a Radeon VII.
2019-05-06, 14:53   #1128
Prime95

Quote:
Originally Posted by SELROC
PS: The Mersenne number https://www.mersenne.org/report_expo...2252533&full=1 is composite.
The computation took ~15 days 10 hours on a Radeon VII.
Impressive speed.

IMO, far too little P-1 factoring was done. Were these bounds chosen to be optimal for all-Radeon testing? Might this indicate that prime95 should be used for P-1 prior to GPU PRP testing?
2019-05-06, 14:58   #1129
SELROC

Quote:
Originally Posted by Prime95
Impressive speed.

IMO, far too little P-1 factoring was done. Were these bounds chosen to be optimal for all-Radeon testing? Might this indicate that prime95 should be used for P-1 prior to GPU PRP testing?

I am currently using ROCm 2.3, which has a performance regression. I bet that with ROCm 2.4 (if they fix the issue) the ETA for a 332M exponent will be around 13-14 days.
2019-05-06, 15:34   #1130
SELROC

Quote:
Originally Posted by Prime95
Impressive speed.

IMO, far too little P-1 factoring was done. Were these bounds chosen to be optimal for all-Radeon testing? Might this indicate that prime95 should be used for P-1 prior to GPU PRP testing?

GpuOwl supports P-1 now, so should I run P-1 before the PRP test?
2019-05-06, 15:51   #1131
R. Gerbicz ("Robert Gerbicz")

Quote:
Originally Posted by SELROC
So if I use a block of 2000, the GEC check should last ~8-9 seconds. There is a trade-off to apply here.
Do I want long, infrequent checks or short, frequent checks?
Trade-off is a good word, because a larger block size L means more work at rollbacks: you need to redo up to L^2 iterations.
It isn't easy to choose the best L for a card/CPU that is not very faulty, but for example,
if you have 0.2 rollbacks per test (so basically one per 5 tests) at p ~ 1e8, then your optimal L value is 1000.
Interestingly, the exact formula is:

L = (2*p/#rollback)^(1/3),
where #rollback is the average number of rollbacks per test of exponent p (so this could even be higher than 1, for a faulty card).

PS: and don't choose L > sqrt(p), because then you'd never perform a check, though this also depends on your implementation.
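The formula above can be sketched directly (a hypothetical helper, using the example numbers from the post):

```python
def optimal_block(p, rollbacks_per_test):
    """Optimal GEC block size per the formula above: L = (2p/r)^(1/3),
    capped at sqrt(p) per the note so at least one check fits in the test."""
    L = (2 * p / rollbacks_per_test) ** (1 / 3)
    return min(L, p ** 0.5)

# Example from the post: p ~ 1e8, one rollback per 5 tests (r = 0.2)
print(round(optimal_block(1e8, 0.2)))  # -> 1000
```

With r = 0.2 and p = 1e8 this gives (1e9)^(1/3) = 1000, matching the post; a faultier card (larger r) pushes the optimum toward smaller, more frequent checks.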

2019-05-06, 18:38   #1132
Prime95

Quote:
Originally Posted by SELROC
GpuOwl supports P-1 now, so should I run P-1 before the PRP test?
Yes, P-1 is highly recommended.

It's a question of "relative" performance. That is, if on your machine GpuOwl is 5x faster than prime95 at PRP but only 2x faster at P-1, then GpuOwl should be doing PRP 100% of the time. Use prime95 on your CPU to do all your P-1 work prior to having the GPU do the PRP.

Your P-1 bounds are "suspicious" in that when prime95 is used to double-check your result years down the road, I think prime95 will want to redo the P-1 to higher bounds.

IIUC, Preda has a somewhat different view on optimal P-1 bounds in that he believes Gerbicz error checking in the initial PRP test means double-checking is not necessary.

For reference, prime95 would use these bounds (assuming 1.8GB of memory).
Saving 1 LL/PRP test: B1=1,115,000 B2=14,495,000
Saving 2 LL/PRP tests: B1=2,395,000 B2=35,326,000
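The scheduling argument above is a comparative-advantage rule; a minimal sketch, using the hypothetical speedup numbers from the post (not measurements):

```python
def gpu_task(prp_speedup, pm1_speedup):
    """Assign the GPU to the task where its advantage over the CPU is largest;
    the CPU then handles the other task."""
    return "PRP" if prp_speedup > pm1_speedup else "P-1"

# Prime95's example: GPU 5x faster at PRP, only 2x faster at P-1
print(gpu_task(5.0, 2.0))  # -> PRP: GPU does PRP full-time, CPU does all P-1
```

The point is that even though the GPU is faster at both tasks, total throughput is highest when each device does the work where its relative edge is biggest.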
2019-05-06, 19:05   #1133
SELROC

Quote:
Originally Posted by Prime95
Yes, P-1 is highly recommended.

It's a question of "relative" performance. That is, if on your machine GpuOwl is 5x faster than prime95 at PRP but only 2x faster at P-1, then GpuOwl should be doing PRP 100% of the time. Use prime95 on your CPU to do all your P-1 work prior to having the GPU do the PRP.

Your P-1 bounds are "suspicious" in that when prime95 is used to double-check your result years down the road, I think prime95 will want to redo the P-1 to higher bounds.

IIUC, Preda has a somewhat different view on optimal P-1 bounds in that he believes Gerbicz error checking in the initial PRP test means double-checking is not necessary.

For reference, prime95 would use these bounds (assuming 1.8GB of memory).
Saving 1 LL/PRP test: B1=1,115,000 B2=14,495,000
Saving 2 LL/PRP tests: B1=2,395,000 B2=35,326,000

The ETA for a 332M exponent is 2 months on a Radeon RX 580 and 15 days on a Radeon VII.

I am using primenet.py to get assignments and return results; gpuowl runs in parallel, and the worktodo.txt file is checked every 2 hours. The assignment type is 153.

To do P-1, do I have to get a different assignment type?