
mersenneforum.org (https://www.mersenneforum.org/index.php)
-   PrimeNet (https://www.mersenneforum.org/forumdisplay.php?f=11)
-   -   Update PRP (on the web interface) (https://www.mersenneforum.org/showthread.php?t=22796)

LaurV 2018-12-27 14:18

[QUOTE=preda;504104]That's fine. But switching between LL and PRP is rather trivial. And, now that we have the Gerbicz Error Check, doing LL on a *GPU* at 80+M exponents is asking for trouble with no plus side.[/QUOTE]That was no complaint. Keep up the good work! We hold you in high esteem for it, and we can tell our friends "hey, this guy is Romanian" :razz:

It is just that we own a lot of CUDA-aware hardware, and only a single 7970 or so "owl"-aware card. For what we own, cudaLucas is our friend.

We could enumerate a few of the LL advantages over PRP (speed, for one: especially when you run two cards in parallel, you exclude the possibility of a mismatch, you need no error check besides matching residues, iterations are a bit faster, and you save a lot of time by not taking the test to the end), but we don't want to argue too much here. People should do what they want with their hardware.
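For context, the LL test being compared here is just the recurrence s → s^2 - 2 (mod M_p), started at s = 4; M_p is prime exactly when the recurrence hits 0 after p - 2 steps. A minimal Python sketch (illustrative only: cudaLucas and friends do the squaring with multi-million-digit FFTs, while Python big integers stand in here):

```python
# Minimal Lucas-Lehmer primality test for a Mersenne number M_p = 2^p - 1.
# Illustrative sketch: real clients (cudaLucas, prime95) do the squaring
# with large FFTs; Python big integers stand in for the FFT arithmetic.

def lucas_lehmer(p: int) -> bool:
    """Return True iff M_p = 2^p - 1 is prime (p must be an odd prime)."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m  # one LL iteration: square, subtract 2, reduce
    return s == 0

# Known Mersenne prime exponents: 3, 5, 7, 13; 11 gives 2047 = 23 * 89.
print([p for p in (3, 5, 7, 11, 13) if lucas_lehmer(p)])  # → [3, 5, 7, 13]
```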

And we wanted to say that we are getting "older", not "order". We have no idea where the heck our fingers were at that time. :redface:

preda 2018-12-27 14:24

[QUOTE=LaurV;504117]That was no complaint. Keep up the good work! We hold you in high esteem for it, and we can tell our friends "hey, this guy is Romanian" :razz:

It is just that we own a lot of CUDA-aware hardware, and only a single 7970 or so "owl"-aware card. For what we own, cudaLucas is our friend.

We could enumerate a few of the LL advantages over PRP, but we don't want to argue here. People should do what they want with their hardware.

And we wanted to say that we are getting "older", not "order". We have no idea where the heck our fingers were at that time. :redface:[/QUOTE]

OK, thanks! I completely agree with people using the hardware as they see fit, no argument there. And I also hope somebody will pick up the task of implementing PRP on CUDA (with the check).

One advantage of LL is that it's considered "verification" for a new MP, while PRP isn't. As I see the rate of discovery of new primes accelerating rapidly, this may become significant :)
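For the curious, the Gerbicz check being discussed verifies the PRP squaring chain using products of block-boundary residues: with block length L, successive checksums satisfy d_new = d_old^(2^L) * 3 (mod N). A toy Python sketch; the per-block verification shown here doubles the work, whereas real clients such as gpuOwL amortize it to a fraction of a percent, and the function name and block length are mine, for illustration:

```python
# Toy PRP-3 test of N = 2^p - 1 with a Gerbicz-style error check.
# Assumption in this sketch: the check runs after every full block, which
# doubles the work; real implementations amortize the verification cost.

def prp3_with_gerbicz(p: int, L: int = 1000) -> bool:
    """Return True iff 2^p - 1 is a base-3 probable prime."""
    n = (1 << p) - 1
    x = 3   # running residue 3^(2^i) mod n
    d = 3   # checksum: product of the residues at block boundaries
    done = 0
    while done + L <= p:            # full blocks only; tail handled below
        d_prev = d
        for _ in range(L):
            x = x * x % n           # one PRP squaring per iteration
        d = d * x % n               # fold block-end residue into checksum
        # Gerbicz check: d must equal d_prev^(2^L) * 3 (mod n).
        check = d_prev
        for _ in range(L):
            check = check * check % n
        if check * 3 % n != d:
            raise RuntimeError("mismatch: error detected, roll back")
        done += L
    for _ in range(p - done):       # short tail, unchecked in this sketch
        x = x * x % n
    return x == 9                   # 3^(2^p) == 9 (mod 2^p - 1) if prime

print(prp3_with_gerbicz(13, L=5), prp3_with_gerbicz(11, L=5))  # → True False
```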

LaurV 2018-12-27 14:48

[QUOTE=R. Gerbicz;504030]That would be CUDAPrp ?
Gpuowl (from Preda) also uses PRP and, correct me if I'm wrong, it is faster than CUDALucas.[/QUOTE]
Correction: you are wrong. :razz: You are making the same mistake Ken did. GpuOwl is an OpenCL implementation; it is faster than [U]clLucas[/U] on identical AMD cards because the OpenCL FFT library used in clLucas is poor, and Mihai implemented his own FFT for the Owl. But on fast nVidia CUDA cards of comparable flops, cudaLucas is faster, by far.

My two top cards [COLOR=Gray](which are NOT the latest Volta architecture, are NOT Tesla cards with a 1:2 DP:SP ratio, and which currently run cudaLucas doing LL for M666666667 at ~25-29 ms/iter, depending on ambient temperature and overclocking)[/COLOR] do 2.2 to 2.4 ms/iter where Ken's table has 3.18 for the RX card.
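As a sanity check on such figures, an LL test of M_p takes p - 2 squarings, so ms/iter converts directly into wall-clock time (exponent and per-iteration times from the post above):

```python
# Back-of-the-envelope runtime from a quoted per-iteration speed:
# an LL test of M_p takes p - 2 squarings.

def ll_days(p: int, ms_per_iter: float) -> float:
    """Total LL test time in days at the given ms per iteration."""
    return (p - 2) * ms_per_iter / 1000 / 86400  # ms → s → days

print(round(ll_days(666_666_667, 25), 1))  # → 192.9 (days at 25 ms/iter)
print(round(ll_days(666_666_667, 29), 1))  # → 223.8 (days at 29 ms/iter)
```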

Now, the thing is that cudaLucas also uses the cuFFT library from nVidia, and we wonder what could happen if someone like you, Mihai, or Ernst put his nose into it and tried to implement it "properly".
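For anyone wondering what is being implemented "properly" here: essentially all the runtime of these programs is one FFT-based squaring per iteration: split the number into digits, square the digit vector by convolution, propagate carries. A toy Python version, with numpy's rfft standing in for cuFFT or gpuOwl's hand-written kernels (base-10 digits for readability; real clients use large balanced-digit bases and a weighted transform to get the mod 2^p - 1 reduction for free):

```python
# Toy big-number squaring via FFT convolution, the operation that
# dominates LL/PRP runtime. numpy's real FFT stands in for cuFFT.
import numpy as np

def square_via_fft(digits, base=10):
    """Square a little-endian digit vector using an FFT convolution."""
    n = 2 * len(digits)               # room for the full linear convolution
    f = np.fft.rfft(digits, n)        # zero-padded forward transform
    prod = np.rint(np.fft.irfft(f * f, n)).astype(np.int64)
    carry, out = 0, []
    for c in prod:                    # resolve carries back into base digits
        carry, d = divmod(int(c) + carry, base)
        out.append(d)
    while carry:
        carry, d = divmod(carry, base)
        out.append(d)
    while len(out) > 1 and out[-1] == 0:
        out.pop()                     # strip leading zeros (stored at the end)
    return out

# 123^2 = 15129, digits little-endian:
print(square_via_fft([3, 2, 1]))  # → [9, 2, 1, 5, 1]
```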

kriesel 2018-12-27 16:05

I'm glad you had some fun with my GTX/RX comparison post, which attempted to estimate the relative efficiency of different code. After posting it and going off to run errands (while some of you were having so much fun poking holes in it), I realized another flaw in the comparison: both GTX cards' ms/iter numbers were taken while the cards were somewhat thermally throttled, whereas I've seen no indication of thermal throttling on the RX480. I estimate the thermal-throttling penalty against CUDALucas on these two GTX gpus in my systems at somewhere between 15 and 30%. (Swapping the existing fans for windier, noisier models on the HP Z600s housing them is on the todo list.)

It would be good to have an efficient CUDA implementation of PRP (or any, really), so the comparisons could be a bit more apples to apples. (Hint hint, nudge nudge, coding wizards.)

I used the data I could readily lay my hands on, recognizing there are limitations. I suppose I could add a column for the manufacturers' DP TFlops claims or some independent source. Constructive suggestions?

LL on CUDA is pretty reliable, even without the Jacobi check. PRP with GC is _extremely_ reliable. If PRP/GC were available on CUDA, it would make parallel crosscheck LL runs, even on the largest exponents, a 50% waste of gpu time. PRP/GC in prime95 on a flaky AMD pc was seemingly bulletproof.
[CODE]My LL runs, verified, all sources: 339 (152 gpu, 187 cpu)
            bad,      all sources:   6 (3 prime95/cpu, 3 gpu)
% bad cpu: 3/187 = 1.60%
% bad gpu: 3/152 = 1.97%
(small-sample uncertainty is significant in both percentages)[/CODE]The 3 bad prime95 residues came from 3 different systems, 2 of which have since failed. The third is doing another LL double-check, and will be reassigned or decommissioned if that result is bad.

2 of the 3 bad gpu residues came from one gpu, which has since been decommissioned.
The other gpu that produced a bad LL residue was switched to trial factoring. There have been no bad LL residues from my gpus in the past 16 months.

