mersenneforum.org  

Old 2018-01-03, 23:09   #331
masser
Jul 2003
Behind BB

70E₁₆ Posts

Quote:
Originally Posted by Jean Penné View Post
Hi All,

I have now released, for the first time, a GPU version of the LLR program on my personal page: jpenne.free.fr
This program is an extension of llrcuda.0.931, written by Shoichiro Yamada and released on 07/11/2015 on www.mersenneforum.org, and also of llrp version 3.8.1 (the last released portable LLR version).

Thanks to Shoichiro's nice work, it was not difficult for me to
extend his code to rational base DWT, and also to generic modular
reduction; that is what is done here!

This code is written entirely in C and C++, with no assembler code.
Large numbers (at least 1 megadigit) benefit more from the GPU
parallelism, but this program may also be used on smaller positive
results for verification...

Please let me know if you have any problem running the binary on Linux and/or building it on your system.

I wish you a happy new year, and many successes in prime hunting!
Best Regards,
Jean
Wow - exciting news! I just looked at the source briefly; does this support k*b^n+/-1?
Old 2018-01-04, 01:20   #332
diep
Sep 2006
The Netherlands

30E₁₆ Posts

Quote:
Originally Posted by Jean Penné View Post
Hi All,

I have now released, for the first time, a GPU version of the LLR program on my personal page: jpenne.free.fr
This program is an extension of llrcuda.0.931, written by Shoichiro Yamada and released on 07/11/2015 on www.mersenneforum.org, and also of llrp version 3.8.1 (the last released portable LLR version).

Thanks to Shoichiro's nice work, it was not difficult for me to
extend his code to rational base DWT, and also to generic modular
reduction; that is what is done here!

This code is written entirely in C and C++, with no assembler code.
Large numbers (at least 1 megadigit) benefit more from the GPU
parallelism, but this program may also be used on smaller positive
results for verification...

Please let me know if you have any problem running the binary on Linux and/or building it on your system.

I wish you a happy new year, and many successes in prime hunting!
Best Regards,
Jean
Hi Jean,

If I may ask: why do you use the cuFFT library?

That's closed-source code, and as far as I know its source has not been publicly available for a couple of years now. They stopped distributing it.

It's difficult to 'improve' something you do not have the source code for.

Try improving a Mars lander without having legal access to the current ones...
Old 2018-01-04, 02:31   #333
diep
Sep 2006
The Netherlands

2×17×23 Posts

(In reply to masser)
Seemingly only powers of 2, yet that's fine with me.
Having no source code publicly available, AFAIK, is bad for all of us though.

Last fiddled with by diep on 2018-01-04 at 02:31
Old 2018-01-04, 02:51   #334
diep
Sep 2006
The Netherlands

2·17·23 Posts

Quote:
Originally Posted by Prime95 View Post
Jean,

You should consider adding PRP tests with Gerbicz error checking. See http://www.mersenneforum.org/showthread.php?t=22471 for a lengthy discussion.

While PRP tests don't prove primes, the error checking catches all computation errors at minimal cost. This is very important on error-prone GPUs. Once a PRP has been found, one can prove the prime with the current LLR algorithm.

Gerbicz error-checking is so powerful that I am considering changing GIMPS to PRP testing instead of LL testing sometime in the future.
Slowing down your project by a factor of 2?

I have watercooling here for the TitanZ, but it's 4 AM. Tomorrow I will try benchmarking it on that card. The TitanZ has 2 GPUs on a single card. Together they deliver, in theory, 2.7 Tflops, or let's call that 1.3T double-precision instructions a second.

We will compare it with Paul's results on the 69 testing on his i7 (if I get it going, and if the TitanZ holds up, as I haven't benchmarked it yet since I started watercooling it). But it's 4 AM here now and I need to avoid looking like the stereotypical nerd...

All those GPUs under full load with just air cooling keep temperatures (at least the better ones try to) under, say, 70C, or some even under 80C.

Meanwhile, for what ASML's machines produce (and those machines more or less dominate this planet; I would guess 90% or so of the machines at Intel, TSMC and so on are ASML's), the ideal temperature is said to be around room temperature, at least for the newer generations of this century. So running a lot cooler or a lot hotter simply eats 10% more power to start with, and definitely is not healthy.

Most CPUs running LLR are kept much closer to room temperature than those GPUs. As for the RAM: GDDR5 has built-in CRC. Not as good as ECC, yet definitely very effective. I'm not sure about the newer HBM2 RAM; I do not know much about it. GDDR5 on my card though!

That needs to be kept under 85C for sure. Yet the same problems apply.

Ouch, it's early. What I meant to say is: they keep those GPUs just a handful of degrees away from where they know for sure bitflips can occur, which is 85C for the RAM, not to mention the copper connections everywhere.

That's impossible, at least for me, to see as a serious production environment.

Watercooling it is here!

Last fiddled with by diep on 2018-01-04 at 02:59
Old 2018-01-04, 04:49   #335
axn
Jun 2003

2·3²·293 Posts

Quote:
Originally Posted by diep View Post
Slowing down your project by a factor of 2?
Speeding up the project by about 3%. There will be a 1% slowdown, but it will be compensated by a 4% reduction in errors.
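
The arithmetic behind that estimate can be sketched as follows. This is only one possible reading of the numbers: it assumes that "4% reduction in error" means about 4% of unchecked tests produce bad results and are wasted, and that checked tests waste none. The function name `net_speedup` and both parameters are hypothetical, introduced just for this illustration.

```python
def net_speedup(slowdown, wasted_fraction):
    """Ratio of useful throughput: error-checked tests vs. unchecked tests.

    slowdown        -- extra cost per checked test (0.01 means 1% slower)
    wasted_fraction -- assumed share of unchecked tests lost to errors
    """
    # Checked tests run slower, but (by assumption) none of them is wasted.
    checked = 1.0 / (1.0 + slowdown)
    # Unchecked tests run at full speed, but some results are bad.
    unchecked = 1.0 - wasted_fraction
    return checked / unchecked


gain = net_speedup(slowdown=0.01, wasted_fraction=0.04)
print(f"net gain: {(gain - 1) * 100:.1f}%")  # prints "net gain: 3.1%"
```

Under these assumptions the gain is 1 / (1.01 × 0.96) ≈ 1.031, which matches the "about 3%" figure.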
Old 2018-01-04, 07:43   #336
Jean Penné
May 2004
FRANCE

1001000110₂ Posts

Quote:
Originally Posted by pinhodecarlos View Post
Hi Jean,

Can you post some benchmarks please?

TIA.
Here is one on the largest known non-Mersenne prime:
jpenne@421360c21a63:~/llrcuda381/llrcuda381linux64$ ./llrCUDA -a1 -oVerbose=1 -d -q"10223*2^31172165+1"
Starting Proth prime test of 10223*2^31172165+1
10223*2^31172165+1, bit: 60000 / 31172178 [0.19%]. Time per bit: 11.648 ms.

Compare this with:

jpenne@crazycomp:~$ llr64 -a2 -t4 -oVerbose=1 -d -q"10223*2^31172165+1"
Starting Proth prime test of 10223*2^31172165+1
Using all-complex FMA3 FFT length 2560K, Pass1=640, Pass2=4K, 4 threads, a = 3
10223*2^31172165+1, bit: 60000 / 31172178 [0.19%]. Time per bit: 7.542 ms.

So, not large enough to really compete...

Regards,
Jean
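
To put the two "Time per bit" figures in perspective, here is a small sketch that extrapolates them to the full test. It assumes the per-bit time stays constant for the whole run, which the logs above do not show (they cover only the first 0.19%). The helper name `total_days` is hypothetical.

```python
def total_days(bits, ms_per_bit):
    """Projected wall-clock time in days, assuming a constant time per bit."""
    seconds = bits * ms_per_bit / 1000.0
    return seconds / 86400.0


BITS = 31172178          # total bit count reported in both logs
GPU_MS_PER_BIT = 11.648  # llrCUDA figure from the first log
CPU_MS_PER_BIT = 7.542   # llr64 (4 threads) figure from the second log

gpu_days = total_days(BITS, GPU_MS_PER_BIT)
cpu_days = total_days(BITS, CPU_MS_PER_BIT)

print(f"llrCUDA: {gpu_days:.2f} days")
print(f"llr64:   {cpu_days:.2f} days")
print(f"ratio:   {gpu_days / cpu_days:.2f}x")
```

On these numbers the GPU run would take a little over 4 days and the 4-thread CPU run a little under 3, so the CPU is roughly 1.5 times faster for this candidate.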
Old 2018-01-04, 10:44   #337
henryzz
Just call me Henry
 
"David"
Sep 2007
Liverpool (GMT/BST)

3×5×397 Posts

Quote:
Originally Posted by Jean Penné View Post
Here is one on the largest known non-Mersenne prime:
jpenne@421360c21a63:~/llrcuda381/llrcuda381linux64$ ./llrCUDA -a1 -oVerbose=1 -d -q"10223*2^31172165+1"
Starting Proth prime test of 10223*2^31172165+1
10223*2^31172165+1, bit: 60000 / 31172178 [0.19%]. Time per bit: 11.648 ms.

Compare this with:

jpenne@crazycomp:~$ llr64 -a2 -t4 -oVerbose=1 -d -q"10223*2^31172165+1"
Starting Proth prime test of 10223*2^31172165+1
Using all-complex FMA3 FFT length 2560K, Pass1=640, Pass2=4K, 4 threads, a = 3
10223*2^31172165+1, bit: 60000 / 31172178 [0.19%]. Time per bit: 7.542 ms.

So, not large enough to really compete...

Regards,
Jean
How close were llrcuda and llr to the FFT boundary (llrcuda didn't output which FFT length it used)? What GPU and CPU were these benchmarks run on?

Previous versions of llrcuda were better for small ks. Is this still true? Does increasing the k slow down the CPU version and the GPU version by the same amount?
Old 2018-01-04, 13:57   #338
Jean Penné
May 2004
FRANCE

2×3×97 Posts

Quote:
Originally Posted by henryzz View Post
How close were llrcuda and llr to the FFT boundary (llrcuda didn't output which FFT length it used)? What GPU and CPU were these benchmarks run on?

Previous versions of llrcuda were better for small ks. Is this still true? Does increasing the k slow down the CPU version and the GPU version by the same amount?
Sorry, these lines were in the lresu* file, but not on the screen:
[Thu Jan 4 07:23:16 2018]
Starting Proth prime test of 10223*2^31172165+1
Using complex irrational base DWT, FFT length = 7340032, a = 3

The GPU is an EVGA GeForce GTX 1080 FTW HYBRID GAMING 8 GB
The CUDA version is 8.0.44

The CPU is an Intel Core i7-5930K (3.5 GHz, 6 cores)

The CPU system runs Ubuntu Linux x86_64

As for your last question, I don't know at present...
Regards,
Jean
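
The log above reports a Proth prime test with a = 3. As a rough illustration of what such a test decides, here is a minimal sketch based on Proth's theorem: for N = k*2^n + 1 with k odd and k < 2^n, N is prime exactly when a^((N-1)/2) ≡ -1 (mod N) for a base a whose Jacobi symbol (a/N) is -1. This is not the llrCUDA code, which handles multi-million-digit numbers with FFT multiplication; the function names `jacobi` and `proth_is_prime` are hypothetical, and the sketch is only practical for small inputs.

```python
from math import isqrt


def jacobi(a, n):
    """Jacobi symbol (a/n) for odd positive n."""
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a  # quadratic reciprocity
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def proth_is_prime(k, n):
    """Decide primality of N = k*2^n + 1 (k odd, k < 2^n) via Proth's theorem."""
    if k % 2 == 0 or k >= (1 << n):
        raise ValueError("need k odd and k < 2^n")
    N = k * (1 << n) + 1
    if isqrt(N) ** 2 == N:
        return False  # a perfect square has no base with Jacobi symbol -1
    a = 2
    while True:
        j = jacobi(a, N)
        if j == 0:
            return False  # a shares a factor with N, so N is composite
        if j == -1:
            break
        a += 1
    # One modular exponentiation settles the question.
    return pow(a, (N - 1) // 2, N) == N - 1


print(proth_is_prime(3, 12))  # N = 12289
print(proth_is_prime(5, 5))   # N = 161 = 7 * 23
```

The first call reports a prime and the second a composite. Searching for the base upward from 2 is a choice made for this sketch; the thread does not say how llrCUDA picks its value of a.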
Old 2018-01-04, 14:17   #339
Jean Penné
May 2004
FRANCE

2·3·97 Posts

Quote:
Originally Posted by masser View Post
Wow - exciting news! I just looked at the source briefly; does this support k*b^n+/-1?
No; for now, candidates with base != 2 are processed using generic modular reduction. The drawback matters only for small bases; for example, most generalized Fermat candidates are also processed by the gwnum code using generic modular reduction.

IBDWT processing of base != 2 numbers is still on my to-do list...

Regards,
Jean
Old 2018-01-04, 14:24   #340
pinhodecarlos
"Carlos Pinho"
Oct 2011
Milton Keynes, UK

2⁴·313 Posts

Jean, whilst running llrcuda do you see any CPU usage? What's the CPU load percentage during the candidate test?
Old 2018-01-04, 14:36   #341
Jean Penné
May 2004
FRANCE

2·3·97 Posts

Quote:
Originally Posted by Prime95 View Post
Jean,

You should consider adding PRP tests with Gerbicz error checking. See http://www.mersenneforum.org/showthread.php?t=22471 for a lengthy discussion.

While PRP tests don't prove primes, the error checking catches all computation errors at minimal cost. This is very important on error-prone GPUs. Once a PRP has been found, one can prove the prime with the current LLR algorithm.

Gerbicz error-checking is so powerful that I am considering changing GIMPS to PRP testing instead of LL testing sometime in the future.
Thanks, George, for your advice; I shall study this thread as soon as possible.

Also, I am very happy to learn of the discovery of the 50th known Mersenne prime, another nice success for GIMPS! Congrats to the discoverer and everyone involved in this work. A nice beginning for the year 2018!

Best Regards,
Jean
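
For readers who have not followed the linked thread, the idea behind the error check suggested in the quoted post can be sketched in a few lines. A PRP or Proth test is essentially a long chain of modular squarings u_i = u_0^(2^i). Every L squarings a running product d of the values u_0, u_L, u_2L, ... is updated, and the identity d_new = d_old^(2^L) * u_0 (mod N) is verified. This is a toy illustration under my own naming (`checked_squarings`, `corrupt_at` and `blocks_per_check` are hypothetical), not code from LLR or llrCUDA.

```python
def checked_squarings(u0, n, N, L, blocks_per_check=1, corrupt_at=None):
    """Return u0^(2^n) mod N, verifying the squaring chain as it goes.

    L                -- block length: d is updated every L squarings
    blocks_per_check -- verify only every this many blocks
    corrupt_at       -- if set, inject a fault after that squaring (demo only)
    """
    u = u0 % N
    d = u        # running product of u_0, u_L, u_2L, ...
    d_prev = d
    for i in range(1, n + 1):
        u = u * u % N
        if i == corrupt_at:
            u = (u + 1) % N  # simulate a hardware error
        if i % L == 0:
            d_prev, d = d, d * u % N
            if (i // L) % blocks_per_check == 0:
                # The verification costs L extra squarings.
                if d != pow(d_prev, 1 << L, N) * u0 % N:
                    raise ArithmeticError(f"error detected at squaring {i}")
    return u


N = 3 * 2**12 + 1   # 12289, a small Proth prime used only as a toy modulus
u0 = pow(3, 3, N)   # a^k with a = 3, k = 3

print(checked_squarings(u0, 12, N, L=4))  # clean run, no exception
try:
    checked_squarings(u0, 12, N, L=4, corrupt_at=8)
except ArithmeticError as exc:
    print(exc)
```

Verifying after every block, as the default does here, roughly doubles the work. The check stays cheap only when verification is done rarely (a larger `blocks_per_check`) while the one-multiplication update of d runs every block. A real implementation would also keep a saved state to roll back to when a check fails.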
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
