mersenneforum.org  

2020-04-08, 13:21   #2047
kracker
"Mr. Meeseeks"
Jan 2012
California, USA

3²·241 Posts

Windows binaries (untested!)
Attached Files
File Type: zip gpuowl-win-af403e2.zip (625.4 KB, 50 views)
2020-04-08, 14:38   #2048
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,009 Posts

Quote:
Originally Posted by preda
I just committed a first iteration of LL. Here's a brief summary of changes

Excellent! Will have a look shortly.

You separately added offset and Jacobi check back at v0.6.
How much of that is reusable?

https://www.mersenneforum.org/showpo...83&postcount=7
2020-04-08, 14:59   #2049
preda
"Mihai Preda"
Apr 2015

3·11·41 Posts

Quote:
Originally Posted by kriesel
Excellent! Will have a look shortly.

You separately added offset and Jacobi check back at v0.6.
How much of that is reusable?

https://www.mersenneforum.org/showpo...83&postcount=7

The Jacobi implementation should be the same. For the "offset", it's too much trouble to fit it in without adversely affecting PRP, which is still the main focus; also, "offset" brings too little benefit to be worth bothering with, IMO.

I consider a matching gpuowl DC with offset==0 for a mprime LL with offset != 0 a very strong verification. What additional benefit is there for the trouble of adding offset to gpuowl? I don't see the point -- maybe you could explain the motivation for adding offset in this context.
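For context on the Jacobi check mentioned above: in an LL run (s_0 = 4, s_{k+1} = s_k^2 - 2 mod M_p), each s_k - 2 for k >= 1 equals 3 times a perfect square, so the Jacobi symbol (s_k - 2 | M_p) should always be (3 | M_p) = -1 for odd p (because M_p ≡ 3 mod 4 and M_p ≡ 1 mod 3), or 0 only if that square shares a factor with M_p. The Python sketch below is a minimal illustration under those assumptions; the function names are hypothetical, it is not gpuowl's implementation, and it checks every iteration only for simplicity.

```python
def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a | n) for odd n > 0, by the standard binary algorithm."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be odd and positive")
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:              # pull out factors of 2: (2 | n) = -1 iff n = 3 or 5 mod 8
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                    # quadratic reciprocity step
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0     # 0 means gcd(a, n) > 1


def ll_with_jacobi(p: int):
    """LL test of M_p (odd p >= 3), recording (s_k - 2 | M_p) after every iteration."""
    m = (1 << p) - 1
    s = 4
    symbols = []
    for _ in range(p - 2):
        s = (s * s - 2) % m
        symbols.append(jacobi(s - 2, m))   # expected to be -1 for every k >= 1
    return s == 0, symbols
```

A corrupted residue still gives -1 about half the time, so this kind of check lets roughly half of random errors through; it is a cheap sanity test, not a proof of correctness.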
2020-04-08, 23:50   #2050
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,009 Posts

Quote:
Originally Posted by preda
The Jacobi implementation should be the same. For the "offset", it's too much trouble to fit it in without adversely affecting PRP, which is still the main focus; also, "offset" brings too little benefit to be worth bothering with, IMO.

I consider a matching gpuowl DC with offset==0 for a mprime LL with offset != 0 a very strong verification. What additional benefit is there for the trouble of adding offset to gpuowl? I don't see the point -- maybe you could explain the motivation for adding offset in this context.

There are two issues: gpuowl running zero offset twice on the same exponent, and gpuowl's zero offset matching some other software's zero offset.

As gpus become a greater fraction of the total primality testing throughput, the work of manually ensuring that gpuowl is not double-checking gpuowl results with the same (zero) offset becomes more onerous. As you know, a Radeon VII with your code and George's modifications is a very fast way of running primality tests, so these gpus and their NVIDIA or cloud-computing near equivalents will have an outsized effect on the increasing number of gpu-produced primality tests, greater than their unit count would indicate.

Gpu assignments are manual assignments. The PrimeNet server does not know what software will be used for a manual assignment, and there is no way to tell it manually. There is no way to specify which software will be used, or to request, for double-check assignments, first tests from any specific software or with any specific initial-run offset. I find that I generally forget to check before putting the work on my gpus. How many others do too?

Maybe mprime/prime95, Mlucas, CUDALucas, etc. have or could have zero-offset avoidance in their runs? But that does not address the chance of a previous software version (Mlucas before V18, for example) having produced a zero-offset result, or of a gpuowl-gpuowl zero-offset coincidence on the same exponent. The chance of a pseudorandom offset generator producing a zero offset is quite low. I agree that different software running differing offsets gives good verification. In fact, it's superior to the same software with different offsets, since remaining software bugs are less likely to align among very different programs.

However, if this is working properly, including for any gpuowl results from its LL infancy, it may not be much of a problem while gpuowl work assignments flow through manual reservations and certain other conditions are met. https://www.mersenneforum.org/showpo...8&postcount=36

Quote:
I just changed the manual reservations for double-checks. The page should no longer hand out exponents previously tested by GLucas, Mlucas, or CUDALucas.

That is, only prime95 with its shift count capability is allowed to do the double (or triple) checking.

This feature needs some testing.

Zero-offset tandem runs could still be performed manually by users, as Laurv or others have done for large exponents. Creation of a PrimeNet API connection for gpu applications would change the offset calculus. If gpus become a large majority of the primality testing throughput, it will become difficult to avoid gpuowl-first-test double checks as manual assignments. Maybe it also starts to affect strategic double and triple checking https://www.mersenneforum.org/showth...ewpost&t=24148

I don't know what the overall project ratio of manual (gpu) to PrimeNet-API (cpu) throughput is, but in my own cpus page data it's about 7.6 to 1. I expect that ratio to increase over time. The gpu throughput is a mix of TF, P-1, and primality testing, with several gpus dedicated to double-checking.

A desire to not slow PRP by offset provisions motivated by LL, and a desire not to duplicate code and increase complexity further by separating offset behavior between PRP and LL, are both understandable. As is conserving your available time for other things, such as P-1 error detection and handling.

Last fiddled with by kriesel on 2020-04-09 at 00:40
2020-04-09, 01:28   #2051
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,009 Posts

CUDALucas v2.05 was the beginning of nonzero shift there; late 2013 or so. There's no telling how long changeover from earlier versions took.
https://www.mersenneforum.org/showpo...postcount=1962
We're still double-checking LL tests from 2010 and 2011.
https://www.mersenne.org/report_expo...0281067&full=1
https://www.mersenne.org/report_expo...0485051&full=1
https://www.mersenne.org/report_expo...exp_hi=&full=1


Mlucas V18 and its introduction of nonzero shift would have been sometime in 2018.

Last fiddled with by kriesel on 2020-04-09 at 02:06
2020-04-09, 03:44   #2052
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5009₁₀ Posts

I didn't find anything indicating nonzero offset was ever implemented in clLucas.
2020-04-09, 08:52   #2053
preda
"Mihai Preda"
Apr 2015

3·11·41 Posts

Quote:
Originally Posted by kriesel
There are two issues: gpuowl running zero offset twice on the same exponent, and gpuowl's zero offset matching some other software's zero offset.

It seems to me that the zero-offset-DC problem can be addressed through external means. For example, manual DC assignments could be handed out only for exponents that had initial-LL with non-zero offset; and the need to DC the cases with zero-offset-initial-LL can be adequately covered by mprime through non-manual assignments.
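A minimal sketch of the assignment filter described here, for illustration only: the `FirstTest` record, its fields, and `manual_dc_eligible` are hypothetical and are not PrimeNet's real data model or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirstTest:
    """Hypothetical record of a completed LL first test."""
    exponent: int
    program: str
    shift: int  # residue offset used by the run; 0 means no offset


def manual_dc_eligible(tests):
    """Exponents whose first LL used a nonzero shift (hypothetical policy).

    A zero-offset double-check of these is still compared against a
    differently shifted computation. Exponents whose first test had
    shift 0 are left for software that picks a nonzero shift itself.
    """
    return sorted({t.exponent for t in tests if t.shift != 0})
```

Under this rule a zero-offset program never receives a zero-offset first test to double-check, whatever program it ran on.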
2020-04-09, 11:11   #2054
ATH
Einyen
Dec 2003
Denmark

2×3×7×73 Posts

If people do 2 zero-offset LL tests with gpuowl, we just have to triple check them with Prime95/mprime/CUDALucas. It should not occur that often.
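This rule fits in a small predicate. The Python sketch is hypothetical (not PrimeNet logic): a matching pair of LL residues counts as verified only when the two runs used different shifts; on a mismatch, or a match between runs with the same shift, another test is requested, ideally from software that uses a nonzero shift.

```python
def needs_triple_check(res_a: int, shift_a: int, res_b: int, shift_b: int) -> bool:
    """Hypothetical rule: do two LL results for one exponent still need a third test?"""
    if res_a != res_b:
        return True  # mismatch: at least one of the two runs is wrong
    # Matching residues from runs with the same shift (e.g. two zero-offset
    # gpuowl runs) are not treated as independent here.
    return shift_a == shift_b
```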

Maybe you should limit LL tests to exponents below 90M, since LL is only for double checks. The few LL tests above 90M do not need to be double-checked for a long time.

Last fiddled with by ATH on 2020-04-09 at 11:13
2020-04-09, 12:57   #2055
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

11621₈ Posts

Quote:
Originally Posted by ATH
Maybe you should limit LL tests to exponents below 90M, since LL is only for double checks. The few LL tests above 90M do not need to be double-checked for a long time.

As a subproject I am running spot double checks ahead of the first-test wavefront, and it would be useful to be able to use the very efficient gpuowl on a fast, new, reliable Radeon VII for that. In general, while the extent of the changeover from LL to PRP first tests is heartening, there is much further to go in that regard. There are LL first-test results reported this morning up to 108M on https://www.mersenne.org/report_recent_cleared/
The two highest-exponent primality tests were LL tests of 107985967 and 107981609 by brode-runner.
Ten of the 25 highest exponents' primality tests were LL. Note that some older gpu hardware is not capable of running gpuowl, so if it is used for primality testing, it will run CUDALucas, forcing LL rather than PRP. Old hardware output should be checked sooner, since it is less likely to be reliable and CUDALucas has no Jacobi check. A subjective summary of the recent cleared results I saw follows.

Anonymous and many other users submitted a mix of LL and PRP. In some cases, including WR and kriesel, the LL results were DCs.

all LL:
AUM - Kuwait
brode-runner
curtisc
Ryan Propper (all DC)
TAMUC-ComputerScience

all PRP:
Ben Delo
dcheuk
George Woltman
Gordon Spence
marssystems
Mihai Preda (shocking! ;)
mrh.org
Oliver Kruse
oodaira
S00030
Simon Josefsson
Sebastien Broucke
trebor

Other things being equal or nearly so, I'm in favor of orthogonality, and against artificial limitations built into the software.
If GIMPS as a project wants to limit future LL activity to below some exponent value, the place to do that is at the PrimeNet server.
When the next Mersenne prime is found, we'll want to use gpuowl to confirm it. There are many exponents above the 100M-digit threshold that have been LL tested and not yet double-checked.
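As a quick check on the 100M-digit figure: 2^p - 1 has as many decimal digits as 2^p (a power of two is never a power of ten), namely floor(p·log10 2) + 1, so reaching 10^8 digits needs p >= 332,192,807. A small Python sketch with hypothetical helper names:

```python
import math

LOG10_2 = math.log10(2)


def mersenne_digits(p: int) -> int:
    """Number of decimal digits of 2**p - 1 for p >= 1."""
    # 2**p is never a power of 10, so subtracting 1 cannot drop a digit.
    return math.floor(p * LOG10_2) + 1


def min_exponent_for_digits(d: int) -> int:
    """Smallest p >= 1 (prime or not) such that 2**p - 1 has at least d digits."""
    p = max(1, math.ceil((d - 1) / LOG10_2))
    # Nudge in either direction in case floating-point rounding lands off by one.
    while mersenne_digits(p) < d:
        p += 1
    while p > 1 and mersenne_digits(p - 1) >= d:
        p -= 1
    return p
```

`min_exponent_for_digits(10**8)` gives 332192807; since only prime exponents can yield Mersenne primes, the first candidate is the first prime at or above that value.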

(end)

Last fiddled with by kriesel on 2020-04-09 at 13:27
2020-04-09, 17:21   #2056
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

16347₈ Posts

Quote:
Originally Posted by preda
It seems to me that the zero-offset-DC problem can be addressed through external means. For example, manual DC assignments could be handed out only for exponents that had initial-LL with non-zero offset.

The server currently does this.
2020-04-09, 18:55   #2057
ewmayer
2ω=0
Sep 2002
República de California

2³·1,453 Posts

Mihai/George, could you explain why residue shift apparently incurs such a heavy performance penalty in gpuOwl?

For LL with shift, I can maybe understand it: during the carry step of each iteration, one needs to precompute the bit offset of the -2 for the current shift value and then inject it into the corresponding residue word. That is not a lot of cycles, but perhaps in a massively parallel GPU context, slowing whichever of the smaller work units gets the shifted -2 causes the others to stall. Just speculating here.
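To make the LL shift bookkeeping concrete, here is a minimal Python sketch, assuming the residue is kept as s·2^e mod M_p (the usual meaning of a shift count; the details in gpuowl, Prime95 and Mlucas may differ). Multiplying by a power of two mod 2^p - 1 is a cyclic rotation of the p-bit word, each squaring doubles e mod p, and the -2 must be subtracted as 2^(e+1), i.e. at bit position (e + 1) mod p. It uses Python big integers rather than an FFT, so it shows the arithmetic, not the GPU cost; the function names are hypothetical.

```python
def rotl(x: int, n: int, p: int) -> int:
    """Multiply x (0 <= x < 2**p) by 2**n mod 2**p - 1, as a cyclic left rotation of p bits."""
    n %= p
    mask = (1 << p) - 1
    return ((x << n) | (x >> (p - n))) & mask


def ll_final_residue(p: int, shift: int = 0) -> int:
    """Run p-2 LL iterations (odd p >= 3) on a residue stored as s * 2**e mod M_p.

    Returns the unshifted final residue, which is 0 iff M_p is prime.
    """
    m = (1 << p) - 1
    e = shift % p
    r = rotl(4, e, p)                      # shifted starting value 4 * 2**e
    for _ in range(p - 2):
        e = (2 * e) % p                    # squaring doubles the shift
        r = (r * r - rotl(2, e, p)) % m    # the -2 lands at bit position (e + 1) mod p
    return rotl(r, p - e, p)               # undo the shift: 2**(p - e) == 2**(-e) mod M_p
```

Every starting shift yields the same unshifted final residue, which is why results computed with different offsets can be compared directly.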

But in a PRP context there is no per-iteration -2 subtrahend: we simply apply some initial shift value to the starting residue, then repeatedly square-mod happily away. The only shift-related expense is the per-iteration update of the shift value, shift = 2*shift (mod p), where the * and the mod can both be replaced with low-latency operations: a shift (or add), then shift2 = shift - p, followed by a cmov to select the proper one of shift and shift2.
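For the PRP side, a similar sketch under the same s·2^e assumption: the shift is applied once to the starting value, each iteration is a plain squaring, and the per-iteration shift update is the shift/subtract/select sequence just described. Python has no cmov, so the select is an ordinary conditional expression. The names are hypothetical, and the sketch does not claim to reproduce gpuowl's exact PRP residue type.

```python
def double_shift(shift: int, p: int) -> int:
    """shift = 2*shift mod p without multiply or divide, assuming 0 <= shift < p."""
    doubled = shift << 1        # 2*shift via a bit shift (an add works too)
    reduced = doubled - p       # the "shift2" candidate
    # Branch-free GPU/CPU code would use a cmov/select here.
    return reduced if reduced >= 0 else doubled


def prp_squarings(p: int, n: int, shift: int = 0, base: int = 3) -> int:
    """Square `base` n times mod M_p on a shifted residue; return the unshifted result."""
    m = (1 << p) - 1
    e = shift % p
    r = (base * pow(2, e, m)) % m          # the shift touches only the starting residue
    for _ in range(n):
        r = (r * r) % m                    # plain squaring, no per-iteration subtrahend
        e = double_shift(e, p)             # the only shift-related per-iteration work
    return (r * pow(2, p - e, m)) % m      # undo the shift: 2**(p - e) == 2**(-e) mod M_p
```

Because the shift never enters the squaring itself, e after n iterations is simply shift·2^n mod p. As a sanity check, for prime M_p, p squarings of 3 give 9, since 3^(2^p) = 3^(M_p + 1).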

Last fiddled with by ewmayer on 2020-04-09 at 18:56


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.