mersenneforum.org  

#56 | frmky | 2009-11-10, 21:28

Here's the profiler output for k in case it's useful. I can send the CSV if you wish.

It appears that in version u, the transpose speeds up normalize_kernel, but by a bit less than the transpose itself takes, so it's a small net loss. I presume that on the GTX260 normalize_kernel was significantly slower before, so the transpose saves much more time than it costs.

The C1060s are only running at a clock rate of 1.3 GHz, so if your GTX260 is running at 1.44 GHz (deviceQuery will tell you), then the difference between our measured times for version u is explained simply by the difference in clock rates of the cards.
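As a rough check of the clock-rate argument, the sketch below scales a per-iteration time by the ratio of the two clocks. It assumes the kernels are purely clock-bound (no memory-bandwidth or multiprocessor-count effects) and that the GTX260 really runs at 1.44 GHz, which the post only poses as a condition. The 0.0247 sec/iter input is the GTX260 figure reported later in this thread for version v without -DTESRA, borrowed purely for illustration. The function name is hypothetical.

```python
def scale_time_by_clock(time_s, clock_from_ghz, clock_to_ghz):
    """Predict the time on another card, assuming runtime is
    inversely proportional to clock rate (a simplifying assumption)."""
    if clock_from_ghz <= 0 or clock_to_ghz <= 0:
        raise ValueError("clock rates must be positive")
    return time_s * clock_from_ghz / clock_to_ghz


# Illustrative input: GTX260 time for the 4096K FFT (version v, no -DTESRA).
gtx260_time = 0.0247
# 1.44 GHz is the conditional figure from the post, 1.30 GHz is the C1060 clock.
predicted_c1060 = scale_time_by_clock(gtx260_time, 1.44, 1.30)
print(f"predicted C1060 time: {predicted_c1060:.4f} sec/iter")
```

With these inputs the prediction comes out at about 0.027 sec/iter, which is close to the 0.0275 sec/iter later reported for the C1060, so the clock-rate explanation is at least plausible under these assumptions.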
Attached thumbnail: profile_k.PNG (40.0 KB)

Last fiddled with by frmky on 2009-11-10 at 21:43

#57 | msft | 2009-11-11, 07:26

Quote (originally posted by frmky):
The C1060s are only running at a clock rate of 1.3 GHz, so if your GTX260 is running at 1.44 GHz (deviceQuery will tell you), then the difference between our measured times for version u is explained simply by the difference in clock rates of the cards.

Exactly. I added a compile option, -DTESRA (version "k"-like code).

With -DTESRA: .0157 sec/iter for the 2048K FFT and .0415 sec/iter for the 4096K FFT on the GTX260.
Without -DTESRA: .0131 sec/iter for the 2048K FFT and .0247 sec/iter for the 4096K FFT on the GTX260.


Thank you,
Attached file: MacLucasFFTW.cuda.v.tar.gz (31.9 KB)

#58 | frmky | 2009-11-11, 08:57

Excellent! On the Tesla C1060, the version v 4096K FFT times are:

without TESRA: 0.0275 sec/iter
with TESRA: 0.0264 sec/iter

so it does match the speed of version k. I've also confirmed that restart works correctly. The 10000 iteration residue matches the mprime "interim Wd8 residue at iteration 10002." Is this just a difference in how iterations are counted in the two programs?

#59 | msft | 2009-11-11, 09:35

Quote (originally posted by frmky):
I've also confirmed that restart works correctly. The 10000 iteration residue matches the mprime "interim Wd8 residue at iteration 10002." Is this just a difference in how iterations are counted in the two programs?

I did not touch this part.

#60 | frmky | 2009-11-11, 09:45

OK, it's not really an issue anyway. Just have to remember that when comparing interim residues with Prime95. Actually, I should check that the final residues for composites match those of Prime95...
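For readers unfamiliar with what is being compared, here is a minimal sketch of the Lucas-Lehmer test and a Res64-style value, assuming Res64 means the low 64 bits of the final residue printed as hex (the usual meaning). It uses plain Python integers rather than the FFT-based squaring that MacLucasFFTW and Prime95 use, and the function names are hypothetical. It does not model the interim-residue offset discussed above, since which step counts as "iteration 1" is a per-program convention.

```python
def lucas_lehmer_residue(p):
    """Return the final Lucas-Lehmer residue s_(p-2) mod (2^p - 1)
    for an odd prime exponent p. A result of 0 means 2^p - 1 is prime."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s


def res64(residue):
    """Low 64 bits of the residue as 16 hex digits (assumed Res64 format)."""
    return f"{residue & 0xFFFFFFFFFFFFFFFF:016X}"


# Small exponents only; real tests use exponents in the millions.
for p in (7, 11, 13):
    r = lucas_lehmer_residue(p)
    print(p, "prime" if r == 0 else "composite", res64(r))
```

For a composite Mersenne number the final residue is nonzero, and two independent programs agreeing on its low 64 bits is strong evidence that both computed the same sequence.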

#61 | msft | 2009-11-11, 10:20

Quote (originally posted by frmky):
OK, it's not really an issue anyway. Just have to remember that when comparing interim residues with Prime95. Actually, I should check that the final residues for composites match those of Prime95...

I will check it. Please tell me the script (which command should I run?).

#62 | frmky | 2009-11-11, 11:30

I checked it. They do agree with the Res64 of Prime95. I've found a problem, though...

[childers test]$ ./MacLucas_v_T_0 23102129
cutilCheckMsg() CUTIL CUDA error: CUDA Kernel execution failed in file <MacLucasFFTW.cu>, line 1218 : invalid configuration argument.

#63 | msft | 2009-11-11, 12:18

Quote (originally posted by frmky):
I checked it. They do agree with the Res64 of Prime95. I've found a problem, though...
[childers test]$ ./MacLucas_v_T_0 23102129
cutilCheckMsg() CUTIL CUDA error: CUDA Kernel execution failed in file <MacLucasFFTW.cu>, line 1218 : invalid configuration argument.

I cannot reproduce it.
Is it 100% repeatable?
What about after a machine reboot?

#64 | henryzz | 2009-11-11, 17:37

Quote (originally posted by msft):
I did not touch this part.

It's a feature of mlucas (actually, I suspect it's possibly a feature of Prime95, as I think glucas might have had the same problem).
When doing double-checks of Mersenne primes, they had to fiddle with this.

#65 | frmky | 2009-11-11, 20:25

Quote (originally posted by msft):
I cannot reproduce it.
Is it 100% repeatable?
What about after a machine reboot?

The non-TESRA version works fine, but the TESRA version seems numerically unstable. One compile would work on some GPUs but not others. I'd simply recompile and that would shuffle the GPUs it would work on. When it failed, it did so because the err returned was larger than kErrLimit, so it would keep increasing n until the grid size (n/512) of the rftfsub_kernel call was too large.
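The failure mode described here can be sketched numerically. The example below assumes that n is doubled on each retry (the post does not say how n grows), that the kernel is launched with a one-dimensional grid of n/512 blocks, and that the per-dimension grid limit is 65535 blocks, which is the limit on compute capability 1.x devices such as the C1060 and GTX260. The function name is hypothetical.

```python
MAX_GRID_DIM_X = 65535   # per-dimension grid limit on compute capability 1.x
THREADS_PER_BLOCK = 512  # from the post: grid size is n/512


def first_invalid_fft_length(n_start, max_grid=MAX_GRID_DIM_X,
                             threads=THREADS_PER_BLOCK):
    """Double n (assumed growth rule) until n/threads exceeds the grid
    limit, and return the first n whose launch would be rejected."""
    n = n_start
    while n // threads <= max_grid:
        n *= 2
    return n


# Start from a 2048K FFT and keep retrying with a larger n.
bad_n = first_invalid_fft_length(2048 * 1024)
print(f"n = {bad_n} needs {bad_n // THREADS_PER_BLOCK} blocks, "
      f"limit is {MAX_GRID_DIM_X}")
```

Under these assumptions the launch first becomes invalid at n = 2^25, where n/512 = 65536 exceeds the limit by one block, which is consistent with the "invalid configuration argument" error reported earlier in the thread.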

I modified the source so that when TESRA is defined, it uses the normalize_kernel from version k rather than breaking it into the three sequential calls. This works correctly on all of the GPUs in the S1070. I'm not sure why breaking it into three kernel calls was causing a problem. Attached is the source as modified.
Attached file: MacLucasFFTW.cu.gz (11.0 KB)

#66 | msft | 2009-11-12, 01:22

Hi, henryzz
Quote (originally posted by henryzz):
It's a feature of mlucas (actually, I suspect it's possibly a feature of Prime95, as I think glucas might have had the same problem).
When doing double-checks of Mersenne primes, they had to fiddle with this.

I understand.

Thank you,
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
