mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   Calling airsquirrels (https://www.mersenneforum.org/showthread.php?t=20511)

Prime95 2015-09-25 16:44

Calling airsquirrels
 
You have a CUDALucas problem. It is reporting primes.

TWIMC, can CUDALucas be made more robust in its detection of 0x0000000000000002 and 0x0000000000000000 residues during the LL test and abort with a reasonable error message?
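
A minimal sketch of the kind of check being asked for, written in Python rather than in CUDALucas's own code. The names `ll_test`, `corrupt_at`, and `DegenerateResidueError` are hypothetical, and the check compares the full residue rather than only the low 64 bits that get printed. It relies on 2 being a fixed point of the LL step (2*2 - 2 = 2) and treats a residue of 0 as suspicious anywhere except the final iteration.

```python
class DegenerateResidueError(RuntimeError):
    """Raised when the residue hits 0 or 2 before the final iteration."""


def ll_test(p, corrupt_at=None):
    """Lucas-Lehmer test of 2**p - 1 for an odd prime exponent p.

    corrupt_at is a hypothetical fault-injection hook: after that
    iteration the residue is forced to zero, imitating a hardware error.
    """
    if p < 3:
        raise ValueError("this sketch expects an odd prime exponent p >= 3")
    m = (1 << p) - 1
    s = 4
    last = p - 2  # number of LL iterations
    for i in range(1, last + 1):
        s = (s * s - 2) % m
        if i == corrupt_at:
            s = 0  # simulated fault, not part of the real algorithm
        # 2 maps to itself, so a run that reaches it can never recover.
        # 0 is only a meaningful result after the last iteration.
        if i < last and s in (0, 2):
            raise DegenerateResidueError(
                f"residue {s:#018x} at iteration {i} of {last}; aborting"
            )
    return s == 0
```

For example, `ll_test(13)` runs to completion, while `ll_test(13, corrupt_at=5)` aborts with an error instead of carrying the bad residue to the end. A fault on the very last iteration would still slip through, which is one reason a double check remains necessary.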

ewmayer 2015-09-25 21:21

Does CUDALucas do any mandatory self-tests of the user's install prior to allowing 'production runs'?

airsquirrels 2015-09-26 01:57

I suspected as much, especially since that was the first completed CUDALucas run. I was waiting for a double check of the number on another CUDA card and a PC to complete, in the rare event that I did happen to hit a prime. I'm not sure why CUDALucas reported that number as prime; my other double-check trials have matched.

Madpoo 2015-09-26 02:45

[QUOTE=airsquirrels;411300]I suspected as much, especially since that was the first completed CUDALucas run. I was waiting for a double check of the number on another CUDA card and a PC to complete, in the rare event that I did happen to hit a prime. I'm not sure why CUDALucas reported that number as prime; my other double-check trials have matched.[/QUOTE]

We ran a DC on that exponent as well (it finished up this morning and was NOT prime).

I wonder at what point your run on CUDA went bad? If you're interested in running it again on another CUDA, set it to save residues every 5M iterations... I did the same on my double-check run, but I only remembered to do it at the 30M iteration mark. :)
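
As a rough illustration of how interim residues narrow down where a run went bad, the sketch below compares two logs keyed by iteration count. The function name `bracket_fault` and the dictionary layout are assumptions for illustration, not anything CUDALucas or Prime95 produces in this form.

```python
def bracket_fault(run_a, run_b):
    """Compare interim residues from two runs of the same exponent.

    Each argument maps an iteration count to the residue string logged
    at that point. Returns (last_matching_iteration, first_mismatch).
    Either element is None if there is no such iteration.
    """
    last_good = None
    # Only iterations logged by both runs can be compared.
    for iteration in sorted(run_a.keys() & run_b.keys()):
        if run_a[iteration] != run_b[iteration]:
            return last_good, iteration
        last_good = iteration
    return last_good, None
```

If the two logs agree at 10M iterations and disagree at 15M, the error happened somewhere in that 5M-iteration window, so a finer save interval gives a tighter bracket.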

Presumably, and I think George was hinting at this, at some point during its run it threw an error and the residue became all zero or 0x02, at which point it stays that way until the end. The software should detect that and roll back to the last save file if that ever happens, I guess?
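
The detect-and-roll-back idea could look roughly like the following Python sketch. It is not how CUDALucas works internally: the function name `ll_with_rollback`, the in-memory checkpoint, and the `transient_faults` fault-injection parameter are all hypothetical, and a real program would restore from a save file on disk.

```python
def ll_with_rollback(p, checkpoint_every=4, transient_faults=(), max_retries=3):
    """Lucas-Lehmer test of 2**p - 1 (odd prime p) with checkpoint rollback.

    transient_faults lists iterations at which the residue is zeroed once,
    imitating a one-off hardware error. Returns (is_prime, rollbacks).
    """
    if p < 3:
        raise ValueError("this sketch expects an odd prime exponent p >= 3")
    m = (1 << p) - 1
    last = p - 2
    pending = set(transient_faults)
    checkpoint = (0, 4)  # (iterations completed, residue)
    i, s = checkpoint
    rollbacks = 0
    while i < last:
        s = (s * s - 2) % m
        i += 1
        if i in pending:
            pending.discard(i)  # the simulated fault fires only once
            s = 0
        if i < last and s in (0, 2):
            # Degenerate residue: discard progress since the last checkpoint.
            rollbacks += 1
            if rollbacks > max_retries:
                raise RuntimeError("repeated degenerate residues; giving up")
            i, s = checkpoint
            continue
        if i % checkpoint_every == 0:
            checkpoint = (i, s)  # saved only after passing the check
    return s == 0, rollbacks
```

With a single injected fault, `ll_with_rollback(13, transient_faults={6})` rolls back once to the checkpoint at iteration 4 and still reaches the correct result. A persistent fault exhausts `max_retries` and aborts rather than reporting a prime.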

airsquirrels 2015-09-26 03:44

I don't have saved checkpoints, but the terminal's scrollback buffer shows the last few pages of iterations with residues as all 0x00000000000000. There may have been a thermal issue with the card during that run that led to a result of zero. I'm surprised the CUDALucas code doesn't detect that error.

Sorry for the false alarm/cycles/excitement! I thought the interval spacing looked nice vs #48 but I had more suspicion than hope.

Madpoo 2015-09-27 04:41

[QUOTE=airsquirrels;411303]I don't have saved checkpoints, but the terminal's scrollback buffer shows the last few pages of iterations with residues as all 0x00000000000000. There may have been a thermal issue with the card during that run that led to a result of zero. I'm surprised the CUDALucas code doesn't detect that error.

Sorry for the false alarm/cycles/excitement! I thought the interval spacing looked nice vs #48 but I had more suspicion than hope.[/QUOTE]

Oh, no problem. We (at least me) had fun with this as a dry rehearsal, of sorts, for the real thing.

kladner 2015-09-27 08:12

[QUOTE=ewmayer;411281]Does CUDALucas do any mandatory self-tests of the user's install prior to allowing 'production runs'?[/QUOTE]

No. While it is foolish to run real work without passing, at the very least, [B]"-r 1"[/B] (the "long form" self-test), there is nothing in the program to compel wise behavior by users. There are several tests one really should run to determine the limits of the hardware it is running on. "Memtest" is also really important to get your GPU to play nice with CUDALucas.

airsquirrels 2015-09-27 14:36

[QUOTE=Madpoo;411366]Oh, no problem. We (at least me) had fun with this as a dry rehearsal, of sorts, for the real thing.[/QUOTE]

What I really want to know is what system specs and worker configuration did you use when you wanted to know as fast as possible if ol' 519 was prime?

Your DC test beat my Titan CUDA DC to completion.

LaurV 2015-09-27 14:50

A Titan is about 6 times faster than a good core. Technically, anything with more than 6 good cores will do it. This guy has a few hundred cores put together :razz:
Edit: now that it is settled, can you tell us the exponent? :razz:

airsquirrels 2015-09-27 15:18

[url]http://www.mersenne.org/report_exponent/?exp_lo=73850519&full=1[/url]

I got angry at it and tried to factor it.

Madpoo 2015-09-27 17:56

[QUOTE=airsquirrels;411392]What I really want to know is what system specs and worker configuration did you use when you wanted to know as fast as possible if ol' 519 was prime?

Your DC test beat my Titan CUDA DC to completion.[/QUOTE]

It took me 33 hours. The system was a dual 14-core Xeon, and the optimal config was to use 22 cores. Admittedly, once I got past maybe 18 cores, each additional one only shrank the total time by a few minutes here and there.

I'm still super impressed that I can add so many cores from the other CPU. On a v2 Xeon E5, I can only add 1 or maybe 2 cores from the other CPU before performance starts to actually degrade. It's either the higher QPI speed or the effect of DDR4 RAM at work there, but whatever the case, I'm happy. :smile:

I saw you checked in your new result which matched my residue, so I just checked mine in as well.
[URL="http://www.mersenne.org/M73850519"]http://www.mersenne.org/M73850519[/URL]

James is doing a check with his CUDALucas as well to see if he encounters anything along the way. So far his residues have been matching mine.


All times are UTC.