mersenneforum.org  

2011-01-13, 14:13   #56
Ralf Recker

Oct 2010

191 Posts

Quote:
Originally Posted by Ken_g6
V0.16, about 3% faster than 0.15.

5*2^1282755+1 is prime! Time : 911.813 sec.
And about 0.7 ms/bit!

Changes:

- Everted if statement with carry clause.
- Merged cuda_normalize2_kernel into the two cuda_normalize1_kernels.
- Fixed flag error check - or, not and.
- Used flag error check only on the error checking cases. This is how Jean Penne did it.
- "flag" is now returned as a special err value, as it's an exceptional case. This also saves bandwidth.

ralf@quadriga ~/tmp/llrcuda.0.16 $ time ./llrCUDA -q5*2^1282755+1 -d
Starting Proth prime test of 5*2^1282755+1, FFTLEN = 131072 ; a = 3
5*2^1282755+1 is prime! Time : 848.820 sec.

real 14m8.854s
user 5m31.889s
sys 6m15.963s

---

Linux 64 bit / GTX 460 / CUDA SDK 3.2 / 260.19.29 drivers

Last fiddled with by Ralf Recker on 2011-01-13 at 14:14

2011-01-13, 16:07   #57
msft

Jul 2009
Tokyo

2×5×61 Posts

Quote:
Originally Posted by Jean Penné
I hope it can resolve your overflow problem...
Jean

Thank you for everything.
karatsquareg is slow with the conversions.
I modified addsignal & giant_to_double.
$ time ./llrCUDA -d -q7*2^3015762+1
Starting Proth prime test of 7*2^3015762+1, FFTLEN = 262144 ; a = 3
7*2^3015762+1 is prime! Time : 3994.725 sec.

real 66m35.339s
user 30m51.700s
sys 27m54.610s
Attached Files
File Type: gz llrcuda.0.17.tar.gz (94.4 KB, 239 views)

2011-01-13, 16:35   #58
Ralf Recker

Oct 2010

191 Posts
Timings...

$ time ./llrCUDA -d -q7*2^3015762+1 vs. $ time ./llr -q7*2^3015762+1 -d

GPU: ca. 1.167 ms/bit - llrCUDA v0.17
CPU: ca. 2.321 ms/bit - llr 3.8.4

Result:


ralf@quadriga ~/tmp/llrcuda.0.17 $ ./llrCUDA -q7*2^3015762+1 -d
Starting Proth prime test of 7*2^3015762+1, FFTLEN = 262144 ; a = 3
7*2^3015762+1 is prime! Time : 3529.997 sec.


Waiting for the CPU task to finish (near 50%)...

Last fiddled with by Ralf Recker on 2011-01-13 at 17:34

2011-01-13, 18:44   #59
Ralf Recker

Oct 2010

191 Posts
Update - GPU/CPU timings

Starting Proth prime test of 7*2^3015762+1
Using all-complex Pentium4 type-3 FFT length 192K, Pass1=256, Pass2=768, a = 3
7*2^3015762+1 is prime! Time : 6947.736 sec.
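
For reference, the ms/bit figures quoted in this thread can be recomputed from the final run times. A minimal Python sketch, assuming "bits" simply means the exponent n of k*2^n+1 (the few extra bits contributed by k are ignored) and using only the timings reported in posts #58 and #59; the function and variable names are illustrative:

```python
def ms_per_bit(seconds, exponent):
    """Milliseconds of run time per bit, approximating the bit length by the exponent."""
    return 1000.0 * seconds / exponent


EXPONENT = 3015762       # n in 7*2^3015762+1
GPU_SECONDS = 3529.997   # llrCUDA v0.17 run time reported in post #58
CPU_SECONDS = 6947.736   # llr run time reported in post #59

gpu = ms_per_bit(GPU_SECONDS, EXPONENT)
cpu = ms_per_bit(CPU_SECONDS, EXPONENT)
speedup = CPU_SECONDS / GPU_SECONDS

print(f"GPU: {gpu:.3f} ms/bit")
print(f"CPU: {cpu:.3f} ms/bit")
print(f"GPU/CPU speedup: {speedup:.2f}x")
```

With these inputs the GPU comes out at roughly 1.17 ms/bit and the CPU at roughly 2.30 ms/bit, so the GPU run is a little under twice as fast.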

2011-01-13, 19:53   #60
msft

Jul 2009
Tokyo

2×5×61 Posts

$ time ./llrCUDA -d -q3*2^5082306+1
Starting Proth prime test of 3*2^5082306+1, FFTLEN = 524288 ; a = 5
3*2^5082306+1 is prime! Time : 13317.977 sec.

real 221m58.636s
user 122m8.680s
sys 115m11.880s
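
The "a = 3" and "a = 5" in these logs are the base used for the Proth test. As a reminder of what is being checked, here is a minimal Python sketch of Proth's theorem on small numbers: for N = k*2^n+1 with odd k < 2^n, N is prime if a^((N-1)/2) ≡ -1 (mod N) for some base a. This is only an illustration with a made-up function name; it is not llrCUDA's implementation, which performs the squarings with FFT multiplication on the GPU.

```python
def proth_test(k, n, a):
    """Return True if base a proves N = k*2^n + 1 prime via Proth's theorem.

    A False result only means this base gave no proof; for a prime N,
    a different base may still succeed.
    """
    if k % 2 == 0 or k >= 2 ** n:
        raise ValueError("Proth's theorem needs odd k with k < 2^n")
    N = k * 2 ** n + 1
    # Built-in three-argument pow does modular exponentiation.
    return pow(a, (N - 1) // 2, N) == N - 1


print(proth_test(3, 2, 5))   # N = 13
print(proth_test(7, 4, 3))   # N = 113
print(proth_test(13, 4, 3))  # N = 209 = 11 * 19, composite
```

The first two calls succeed and the third does not, since no base can satisfy the condition for a composite N.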

2011-01-17, 04:59   #61
msft

Jul 2009
Tokyo

1142₈ Posts

$ time ./llrCUDA -d -q7*2^3015762+1
Starting Proth prime test of 7*2^3015762+1, FFTLEN = 262144 ; a = 3
7*2^3015762+1 is prime! Time : 3760.825 sec.

real 62m42.125s
user 20m43.400s
sys 24m50.010s

Tuning finished.
Please report any bugs (or primes).
Attached Files
File Type: gz llrcuda.0.20.tar.gz (94.8 KB, 225 views)

2011-01-17, 06:08   #62
msft

Jul 2009
Tokyo

2×5×61 Posts

Fixed a bug.
Thank you, Ken_g6.
Attached Files
File Type: gz llrcuda.0.21.tar.gz (94.8 KB, 250 views)

2011-01-23, 15:32   #63
jasonp
Tribal Bullet

Oct 2004

DED₁₆ Posts

If anyone is using the giants library, then I recommend changing the lower-level transforms to use the arbitrary-precision arithmetic from an older version of msieve. Start here, and use

common/ap.c
common/mp.c
common/fastmult.c
include/ap.h
include/fastmult.h
include/mp.h
include/util.h

This has a much cleaner implementation of the FFT arithmetic and uses 32-bit integers throughout.

msft, the code you were wondering about does the carry propagation after the FFT multiply completes.
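
To illustrate that last point: an FFT multiply produces convolution sums that are generally larger than the digit base, so a separate pass has to push the excess into the higher digits. Below is a minimal Python sketch of that step. It is not the msieve or giants code; a schoolbook convolution stands in for the FFT, and the function names and the 16-bit base are assumptions chosen for the example.

```python
def to_digits(n, base):
    """Split a non-negative integer into little-endian digits."""
    digits = []
    while n:
        n, d = divmod(n, base)
        digits.append(d)
    return digits or [0]


def from_digits(digits, base):
    """Rebuild the integer from little-endian digits."""
    value = 0
    for d in reversed(digits):
        value = value * base + d
    return value


def convolve(a, b):
    """Stand-in for the FFT multiply: acyclic convolution of two digit lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def propagate_carries(coeffs, base):
    """Normalize raw convolution sums so every digit lies in [0, base)."""
    digits = []
    carry = 0
    for c in coeffs:
        carry, digit = divmod(c + carry, base)
        digits.append(digit)
    while carry:  # the result may need extra high digits
        carry, digit = divmod(carry, base)
        digits.append(digit)
    return digits


BASE = 1 << 16
x, y = 123456789123456789, 987654321987654321
raw = convolve(to_digits(x, BASE), to_digits(y, BASE))
product_digits = propagate_carries(raw, BASE)
print(from_digits(product_digits, BASE) == x * y)
```

The raw convolution output already represents the right product; carry propagation only rewrites it so that each digit fits the base again.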

2011-01-24, 11:43   #64
msft

Jul 2009
Tokyo

262₁₆ Posts

Hi jasonp,
Good information.
Thank you.

Last fiddled with by msft on 2011-01-24 at 11:44

2011-01-29, 07:38   #65
Jean Penné

May 2004
FRANCE

2⁴·3·13 Posts
llrpi completed on Version 3.8.0

Hi,

I just released a new llrpi version that can test all the numbers that LLR 3.8.4 can test; if you are interested, see:

http://www.mersenneforum.org/showthr...238#post250238

for details.

Regards,
Jean

2011-01-29, 08:24   #66
msft

Jul 2009
Tokyo

2×5×61 Posts

Hi,
rdft() is familiar to me.
Thank you,

All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
