Shouldn't the distinction between PRP and LL checks be abolished?
Multiplying categories is not really useful. Also, the default for the newest versions should be PRP => no more LL for first-time checks. (Of course, double checks should use the same method as the first-time checks for any exponent.) Jacob |
[QUOTE=S485122;500247]Shouldn't the distinction between PRP and LL checks be abolished?
Multiplying categories is not really useful. Also, the default for the newest versions should be PRP => no more LL for first-time checks. (Of course, double checks should use the same method as the first-time checks for any exponent.) Jacob[/QUOTE] Thanks for your comments. There are reasons not to discard or conceal information. Default assignment for mprime/prime95 managed by the PrimeNet API is one thing; GPU computing is quite another. There is no application supporting PRP on NVIDIA gpus currently. There's also no PrimeNet integration in gpu applications currently. GPUs are run via manual assignment and manual reporting, or via a separate client management tool (which are apparently application-specific); see [URL]http://www.mersenneforum.org/showpost.php?p=488292&postcount=3[/URL]. There is currently no released separate client management tool supporting PRP assignment or reporting (or P-1). Manual assignments are likely to remain up to the requester to select the type, subject to what assignments are available. |
These may have been answered somewhere before, but are PRP tests preferred to LL?
I read that PRP may require only one test and no double-checking for certainty? Has that been established yet? Also, what is the command to put in the worktodo file to start a PRP test? I expect PRP to only be available for Prime95, not CUDALucas? |
PRP is better (if no LL test has already been done on the exponent) because of the improved error checks. We are still doing PRP double checks, though, to be completely safe by comparing residues.
Just add to prime.txt: WorkPreference=150 for PRP tests, or WorkPreference=151 for PRP double checks; then the server will hand out exponents. And yes, it will not work on CUDALucas, only Prime95 and mprime. |
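To make the answer to the worktodo question concrete: the WorkPreference line goes in prime.txt as described above, and a PRP assignment can also be placed in worktodo.txt by hand. A sketch follows; the comment lines are annotations (not part of either file's syntax), the exponent is illustrative, and the optional extra fields of the PRP= line are omitted (check the Prime95 documentation for those):

```
# prime.txt -- ask PrimeNet for PRP work (150 = first-time test, 151 = double check):
WorkPreference=150

# worktodo.txt -- a manual PRP assignment for k*b^n+c = 1*2^86243-1:
PRP=1,2,86243,-1
```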
[QUOTE=ATH;504017]and yes it will not work on CUDALucas, only Prime95 and mprime.[/QUOTE]
That would be CUDAPrp? Gpuowl (from Preda) is also using PRP, and correct me if I'm wrong, but it is faster than CUDALucas. Btw, there should be no problem in switching to (error-checked) PRP; the only difficulty could be if you have "only" a squaremod, and not the general mulmod that is needed in our case. The easy workaround (with some overhead) is: x*y = ((x+y)^2 - (x-y)^2)/4. |
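A quick sketch (Python; the helper name is mine) showing that the identity above really does recover a general mulmod from squarings alone. Squaring is done mod 4n so the division by 4 stays exact; a real FFT implementation would instead divide by 4 with a modular inverse, or a shift in the Mersenne case.

```python
def mulmod_via_squares(x, y, n):
    """Compute x*y mod n using only squarings, via the identity
    x*y = ((x+y)^2 - (x-y)^2) / 4 from the post above."""
    s = pow(x + y, 2, 4 * n)   # (x+y)^2 mod 4n
    t = pow(x - y, 2, 4 * n)   # (x-y)^2 mod 4n
    # s - t == 4*x*y (mod 4n), and 4*(x*y mod n) is its canonical
    # representative in [0, 4n), so the division by 4 is exact.
    return (s - t) % (4 * n) // 4
```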
[QUOTE=R. Gerbicz;504030]That would be CUDAPrp ?
Gpuowl (from Preda) is also using Prp, and correct me if not, but that is faster than CUDALucas. Btw there should be no problem in switching to (error checked) Prp, the only difficulty could be that if you have "only" squaremod, and not general mulmod what is needed in our case. The easy workaround (with some overhead) is: x*y=((x+y)^2-(x-y)^2)/4.[/QUOTE] Re CUDA mulmod, perhaps that could be borrowed / adapted from the CUDAPm1 code.

GpuOwL is for PRP (+ optionally P-1) with GC via OpenCL on AMD gpus; CUDALucas is for LL (no Jacobi check) via CUDA on NVIDIA gpus. clLucas, for LL (no Jacobi check) via OpenCL on AMD gpus, has noticeably lower performance than gpuOwL on the same hardware. There is currently no PRP, GC, or Jacobi check implemented for Mersenne hunting on NVIDIA gpus to my knowledge, after searching for relevant software and tracking developments for nearly two years.

CUDALucas implements fft lengths at all 7-smooth multiples of 2[SUP]10[/SUP] up to 2[SUP]27[/SUP] (128M, the NVIDIA fft library limit). clLucas implements fft lengths at all 7-smooth multiples of 2[SUP]10[/SUP] up to at least 2[SUP]26[/SUP]. clLucas seems to derive less benefit from the non-power-of-two lengths than CUDALucas does. As I recall, Preda also saw less benefit during extension of the set of fft lengths gpuowl supports. GpuOwL implements a subset of the 5-smooth multiples of 2[SUP]10[/SUP], from 128K up to 144M.

Comparing CUDALucas on a GTX1060 or 1070 to gpuOwL on an RX480 (gpus with comparable benchmark figures at James Heinrich's site) on primality test iteration times:
[FONT=Fixedsys][SIZE=3][CODE]gpu          exponent  fft length  msec/it(m)  TFGhzD/day  LL GhD/d(L)  m*L
GTX1060 3gb  74101927  4096K       7.25        428         28.7         208.
RX480        74207281  4096K       3.18        535         41.5         [B]132[/B].
GTX1070      74170739  4096K       5.18        714         47.8         248.[/CODE][/SIZE][/FONT]It does appear that gpuowl is more efficient than CUDALucas on comparable hardware.
It is not simply a case of the benchmarks being off due to clLucas or other factors, since the TF figures from mfakt* are independent comparisons. For equally efficient code and perfect benchmarks, the product m*L would be about constant for the same fft length. m*L is an indication of how much throughput is consumed per iteration accomplished, so lower is better.

(A reference pdf for some available Mersenne hunting software is attached at [URL]https://www.mersenneforum.org/showpost.php?p=488291&postcount=2[/URL] and periodically updated in place. It's part of the reference threads I've created at [URL]https://www.mersenneforum.org/forumdisplay.php?f=154[/URL].) |
Something is wrong with that table. The product of ms/iter and GHzD/d [U]must[/U] be constant for the same exponent (range) and the same FFT length: one is the reciprocal of the other, times a constant. As you need more time per iteration, your output is fewer iterations per day, and therefore fewer GHzDays per day. For a given exponent (/range), with a chosen FFT size, each iteration is a fixed number of GHzDays. So, in the table, the line for the 1070 is a bit inflated, and the line for the 480 is highly deflated.
|
[QUOTE=kriesel;504035]Comparing CUDALucas on a GTX1060 or 1070 to gpuOwL on an RX480, gpus with comparable benchmark figures at James Heinrichs' site, on primality test iteration times,
[FONT=Fixedsys][SIZE=3][CODE]gpu          exponent  fft length  msec/it(m)  TFGhzD/day  LL GhD/d(L)  m*L
GTX1060 3gb  74101927  4096K       7.25        428         28.7         208.
RX480        74207281  4096K       3.18        535         41.5         [B]132[/B].
GTX1070      74170739  4096K       5.18        714         47.8         248.[/CODE][/SIZE][/FONT]It does appear that gpuowl is more efficient than CUDALucas on comparable hardware. It is not simply a case of the benchmarks being off due to clLucas or other factors, since the TF figures from mfakt* are independent comparisons. For equally efficient code, and perfect benchmarks, the product m*L would be about constant for the same fft length. m*L is an indication of how much throughput is consumed per iteration accomplished, so lower is better.[/QUOTE] Poor methodology results in invalid results (GIGO).

1. The RX480 has a 16:1 SP:DP ratio; the GTX 10 series has 32:1. So using mfakt* performance as a yardstick for LL performance is fatally flawed.

2. The figure of 41.5 GHz d/d for the RX480 is derived using clLucas (CUDALucas doesn't run on non-NVIDIA hardware). So you're comparing the efficiency of clLucas to gpuowl (but you don't realise it).

3. m*L should be a constant for a given FFT (because the latter is derived from the former). If it is not, you can't draw any conclusions from it. Instead you should send the benchmarks to James to update the numbers in his table. Here you're using his tables as some kind of authority on how much GHz d/d a card should produce -- they are not. They are calculated from benchmarks submitted by others. If he'd used gpuowl for his RX480 numbers, you wouldn't be able to tell the difference, because then the GHz d/d in his table would be higher, exactly compensating for the lower iteration time. That's why you're basically comparing clLucas vs gpuOwl, not CUDALucas vs gpuowl. |
Beat you to it this time :razz:
Just to state it right, he does not involve the TF result in any way; it is just there in the table as an additional measurement ("independent", as he says). But the LL stuff is wrong. However, he may be onto something there: the method of computing the GHzDays/Day of one of those programs may be buggy. I will try to recalculate if I have some time during the lunch break later (at work, and quite busy! grrrr). |
Ha! replying to myself again...
I just realized what Ken did: he got the 3.xx ms from the owl's output, but the ghd from James' site. That is indeed not correct, as axn already said. I was assuming that the numbers were outputs of the program; I didn't know gpuOwl does not output the ghd, because I have not been using it since it switched to PRP testing (sorry, I can't keep up with everything, I am getting old...) |
[QUOTE=LaurV;504074]I didn't know gpuOwl does not output the ghd, because I have not been using it since it switched to PRP testing (sorry, I can't keep up with everything, I am getting old...)[/QUOTE]
That's fine. But switching between LL and PRP is rather trivial. And now that we have the Gerbicz error check, doing LL on a *GPU* at 80+M exponents is asking for trouble with no upside. |
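For readers who haven't met it, here is a minimal Python sketch of the idea behind the Gerbicz error check referred to throughout this thread (function name and block size are mine; this is an illustration, not any program's actual implementation). The PRP residues are x_i = 3^(2^i) mod N; a checksum d_t accumulates the product of every B-th residue, and squaring that checksum B times shifts each factor forward by one block, giving a cheap cross-check. For clarity this sketch verifies at every checksum update; real implementations verify far less often to keep the overhead tiny.

```python
def prp_gerbicz(p, B=100):
    """Base-3 PRP test of M(p) = 2^p - 1 with a Gerbicz-style checksum.

    Invariant checked: d_t^(2^B) * 3 == d_{t+1} (mod N), since
    (prod of x_{kB})^(2^B) = prod of x_{(k+1)B} = d_{t+1} / x_0."""
    N = (1 << p) - 1
    x = 3          # x_0
    d = 3          # checksum d_0 = x_0
    for i in range(1, p + 1):
        x = x * x % N                  # x_i = x_{i-1}^2 mod N
        if i % B == 0:
            new_d = d * x % N          # d_{t+1} = d_t * x_{(t+1)B}
            # Recompute d_t^(2^B)*3 independently; a mismatch means some
            # squaring in the block (or the checksum) went wrong.
            if pow(d, 1 << B, N) * 3 % N != new_d:
                raise ArithmeticError("Gerbicz check failed")
            d = new_d
    # 3^(2^p) == 9 (mod N)  <=>  M(p) is a base-3 probable prime.
    return x == 9
```

Note that an LL test has no comparable identity to exploit, which is why PRP with this check is so much safer than unchecked LL on flaky hardware.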