mersenneforum.org llrCUDA

2018-01-03, 23:09   #331
masser

Jul 2003
Behind BB

70E₁₆ Posts

Quote:
 Originally Posted by Jean Penné
 Hi All,
 I have now released, for the first time, a GPU version of the LLR program on my personal page: jpenne.free.fr
 This program is an extension of llrcuda.0.931 written by Shoichiro Yamada, which was released on 07/11/2015 on www.mersenneforum.org, and, indeed, also of llrp version 3.8.1 (the last released portable LLR version). Thanks to the nice work of Shoichiro, it was not difficult for me to extend his code to rational-base DWT, and also to generic modular reduction; that is what is done here!
 This code is written entirely in C and C++, with no assembler code. Large numbers (at least 1 megadigit) benefit most from the GPU parallelism, but this program may also be used on smaller positive results for verification...
 Please let me know if you have any problem running the binary on Linux and/or building it on your system.
 I wish you a happy new year, and many successes in prime hunting!
 Best Regards, Jean
Wow - exciting news! I just looked at the source briefly; does this support k*b^n+/-1?

2018-01-04, 01:20   #332
diep

Sep 2006
The Netherlands

30E₁₆ Posts

Quote:
 Originally Posted by Jean Penné Hi All, I released now for the first time a GPU version of the LLR program on my personal page : jpenne.free.fr [...]
Hi Jean,

If I may ask: why do you use the cuFFT library?

That is closed-source code, and its sources have, as far as I know, not been publicly available for a couple of years now. They stopped distributing them.

It's difficult to 'improve' something you do not have the source code for.

It's like trying to improve a Mars lander without legal access to the current one...

2018-01-04, 02:31   #333
diep

Sep 2006
The Netherlands

2×17×23 Posts

Seemingly power of 2 only - yet that's fine with me. That no source code is publicly available, AFAIK, to any of us is bad though. (reaction to Masser)

Last fiddled with by diep on 2018-01-04 at 02:31
2018-01-04, 02:51   #334
diep

Sep 2006
The Netherlands

2·17·23 Posts

Quote:
 Originally Posted by Prime95
 Jean, you should consider adding PRP tests with Gerbicz error checking. See http://www.mersenneforum.org/showthread.php?t=22471 for a lengthy discussion. While PRP tests don't prove primes, the error checking catches all computation errors at minimal cost. This is very important on error-prone GPUs. Once a PRP has been found, one can prove the prime with the current LLR algorithm.
 Gerbicz error-checking is so powerful that I am considering changing GIMPS to PRP testing instead of LL testing sometime in the future.
Slowing down your project by a factor of 2?

Watercooling here for the TitanZ - yet it's 4 AM; tomorrow I will try benchmarking it. The TitanZ has 2 GPUs on a single card. Together they deliver 2.7 Tflops in theory - or let's call that 1.3T double-precision instructions a second.

We will compare it with results from Paul on the 69 testing on his i7 (if I get it going, and if the TitanZ holds up, as I haven't benchmarked it since I started watercooling it). Yet it's 4 AM here now and I need to avoid looking like the stereotypical nerd...

All those GPUs under full load with just air cooling keep temperatures (at least the better ones try) under, say, 70C, or some even under 80C.

Whereas for ASML's machines, which more or less dominate this planet - I would guess 90% or so of the machines at Intel/TSMC and so on are ASML's - the newer generations in this century are said to work best, for what they produce, at around room temperature. Running a lot cooler or a lot hotter simply eats 10% more power to start with, and definitely is not healthy.

Most CPUs running LLR are kept much closer to room temperature than those GPUs. As for the RAM: GDDR5 has built-in CRC. Not as good as ECC, yet definitely very effective. Not sure about the newer HBM2 RAM; I do not know much about it. GDDR5 on my card, though!

Need to keep that under 85C for sure. Yet the same problems apply.

Ouch, it's early - what I meant to say is: they keep those GPUs just a handful of degrees away from where they know for sure bitflips can occur, which is 85C for the RAM, not to mention the copper connections everywhere.

That's impossible, at least for me, to see as a serious production environment.

Watercooling it is here!

Last fiddled with by diep on 2018-01-04 at 02:59

2018-01-04, 04:49   #335
axn

Jun 2003

2·3²·293 Posts

Quote:
 Originally Posted by diep Slowing down your project factor 2?
Speeding up the project by about 3%. There will be a 1% slowdown, but it will be more than compensated by a 4% reduction in errors.
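For readers new to it, the Gerbicz check George describes exploits the fact that in a PRP test every residue is the previous one squared: if d accumulates the product of every L-th residue, then d must satisfy d_new = d_old^(2^L) · u_0 (mod N), and any mis-squared intermediate breaks that identity. A toy sketch under assumed simplifications (tiny Proth-form moduli, a check at every block rather than the much rarer checks real implementations use; `gerbicz_prp` is a hypothetical name, not code from llrCUDA or gwnum):

```python
def gerbicz_prp(k, n, L=4, a=3):
    """Fermat PRP test of N = k*2^n + 1 with a Gerbicz-style check.

    u holds a^(k*2^i) mod N, advanced by squaring; d accumulates the
    product of the checkpoint residues u_0, u_L, u_2L, ...  Since each
    checkpoint is the previous one squared L times, the products obey
    d_new == d_old^(2^L) * u_0 (mod N); a squaring error breaks this.
    """
    N = k * (1 << n) + 1
    u0 = pow(a, k, N)        # starting residue u_0 = a^k mod N
    u, d = u0, u0
    for i in range(1, n + 1):
        u = u * u % N                    # one squaring step
        if i % L == 0:                   # checkpoint reached
            d_new = d * u % N
            # verify the block (real code checks far less often,
            # and rolls back to a verified save point on failure)
            assert d_new == pow(d, 1 << L, N) * u0 % N, "computation error"
            d = d_new
    return u == 1    # u is a^(N-1) mod N; 1 means Fermat PRP to base a
```

For example, `gerbicz_prp(13, 8)` reports 13·2^8 + 1 = 3329 (a prime) as a probable prime, while `gerbicz_prp(15, 8)` rejects 3841 = 23·167.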

2018-01-04, 07:43   #336
Jean Penné

May 2004
FRANCE

1001000110₂ Posts

Quote:
 Originally Posted by pinhodecarlos Hi Jean, Can you post some benchmarks please? TIA.
That is a run on the largest known non-Mersenne prime:

jpenne@421360c21a63:~/llrcuda381/llrcuda381linux64$ ./llrCUDA -a1 -oVerbose=1 -d -q"10223*2^31172165+1"
Starting Proth prime test of 10223*2^31172165+1
10223*2^31172165+1, bit: 60000 / 31172178 [0.19%]. Time per bit: 11.648 ms.

To be compared to:

jpenne@crazycomp:~$ llr64 -a2 -t4 -oVerbose=1 -d -q"10223*2^31172165+1"
Starting Proth prime test of 10223*2^31172165+1
Using all-complex FMA3 FFT length 2560K, Pass1=640, Pass2=4K, 4 threads, a = 3
10223*2^31172165+1, bit: 60000 / 31172178 [0.19%]. Time per bit: 7.542 ms.

So, not large enough to really compete...

Regards,
Jean
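For context, the Proth prime test shown in both runs is an application of Proth's theorem: N = k·2^n + 1 (k odd, k < 2^n) is prime exactly when a^((N-1)/2) ≡ -1 (mod N) for a base a that is a quadratic non-residue mod N. A minimal integer sketch (the function name and toy parameters are illustrative; the real programs perform this arithmetic with huge FFT-based multiplications):

```python
def proth_test(k, n, a=5):
    # Proth's theorem: for N = k*2^n + 1 with k odd and k < 2^n,
    # N is prime iff a^((N-1)/2) == -1 (mod N), provided the base a
    # is a quadratic non-residue mod N (a = 5 works for the toy cases below).
    assert k % 2 == 1 and k < (1 << n), "not a valid Proth form"
    N = k * (1 << n) + 1
    return pow(a, (N - 1) // 2, N) == N - 1
```

For example, `proth_test(3, 2)` confirms 13 is prime, while `proth_test(3, 4)` rejects 49 = 7².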

2018-01-04, 10:44   #337
henryzz
Just call me Henry

"David"
Sep 2007
Liverpool (GMT/BST)

3×5×397 Posts

Quote:
 Originally Posted by Jean Penné That is one on the largest known non-Mersenne prime : [...] So, not large enough to really compete... Regards, Jean
How close were llrcuda and llr to the FFT boundary? (llrcuda didn't output which FFT length.) What GPU and CPU were these benchmarks run on?

Previous versions of llrcuda were better for small k values. Is this still true? Does increasing k slow down the CPU version and the GPU version by the same amount?

2018-01-04, 13:57   #338
Jean Penné

May 2004
FRANCE

2×3×97 Posts

Quote:
 Originally Posted by henryzz
 How close were llrcuda and llr to the FFT boundary (llrcuda didn't output which FFT length)? What GPU and CPU were these benchmarks on?
 Previous versions of llrcuda were better for small k values. Is this still true? Does increasing the k slow down the CPU version and GPU version the same amount?
Sorry, these lines were in the lresu* file, but not on the screen:

[Thu Jan 4 07:23:16 2018]
Starting Proth prime test of 10223*2^31172165+1
Using complex irrational base DWT, FFT length = 7340032, a = 3

The GPU is an EVGA GeForce GTX 1080 FTW HYBRID GAMING 8 GB
The CUDA version is 8.0.44

The CPU is an Intel Core i7-5930K (3.5 GHz, 6 cores)

The CPU system is Ubuntu Linux x86_64

Regards,
Jean

2018-01-04, 14:17   #339
Jean Penné

May 2004
FRANCE

2·3·97 Posts

Quote:
 Originally Posted by masser Wow - exciting news! I just looked at the source briefly; does this support k*b^n+/-1?
No. For now, candidates with base != 2 are processed using generic modular reduction, but the drawback only matters for small bases; for example, most generalized Fermat candidates are also processed by the gwnum code using generic modular reduction.

IBDWT processing of base != 2 numbers is still work for me to do...

Regards,
Jean
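To illustrate the distinction Jean draws: generic modular reduction works for any modulus but offers no structural shortcut, whereas special forms let the reduction be folded into cheap bit operations, which is the saving an IBDWT bakes into the FFT weights. A toy integer sketch for the classic Mersenne case N = 2^p - 1 (illustrative only; the actual gwnum/llrCUDA code achieves this implicitly in floating point):

```python
def mersenne_reduce(x, p):
    # Reduce x mod N = 2^p - 1 by folding: since 2^p ≡ 1 (mod N),
    # the bits above position p can simply be added back onto the low
    # bits, avoiding the general division a generic reduction (x % N) needs.
    N = (1 << p) - 1
    while x > N:
        x = (x >> p) + (x & N)   # split at bit p and fold the high part down
    return 0 if x == N else x
```

For bases other than 2 (or numbers with no comparable special form), no such folding exists, hence the fallback to generic modular reduction.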

2018-01-04, 14:24   #340
pinhodecarlos

"Carlos Pinho"
Oct 2011
Milton Keynes, UK

2⁴·313 Posts

Jean, whilst running llrCUDA do you see CPU consumption? What's the CPU percentage load during the candidate test?
2018-01-04, 14:36   #341
Jean Penné

May 2004
FRANCE

2·3·97 Posts

Quote:
 Originally Posted by Prime95 Jean, You should consider adding PRP tests with Gerbicz error checking. [...]
Thanks, George, for your advice; I shall study that thread as soon as possible.

Also, I am very happy to learn of the discovery of the 50th known Mersenne prime - another nice success for GIMPS! Congrats to the discoverer and everyone involved in this work. A nice beginning for the year 2018!

Best Regards,
Jean

