mersenneforum.org  

2009-11-12, 01:27   #67
msft
Jul 2009
Tokyo
262₁₆ Posts

Hi, frmky
Quote:
Originally Posted by frmky
The non-TESRA version works fine, but the TESRA version seems numerically unstable. One compile would work on some GPUs but not others. I'd simply recompile and that would shuffle the GPUs it would work on. When it failed, it did so because the err returned was larger than kErrLimit, so it would keep increasing n until the grid size (n/512) of the rftfsub_kernel call was too large.

The GTX 2xx's maximum block count is 65536 (sorry, this number is from a Japanese book), and 4194304/512 < 65536.
But I will change this in the next version.
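
As a rough check of the arithmetic above, here is a minimal C sketch of the grid-size test. The helper names are hypothetical and do not come from the MacLucasFFTW source. The limit of 65535 blocks per grid dimension is what NVIDIA documents for compute capability 1.x devices (the post cites 65536), and the doubling of n is an assumption about how the length grows after a failed error check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 512 threads per block, taken from the n/512 grid size quoted above. */
enum { THREADS_PER_BLOCK = 512 };

/* Assumption: per-dimension grid limit for compute capability 1.x (GTX 2xx).
   NVIDIA documents 65535; the post above cites 65536. */
enum { MAX_GRID_DIM = 65535 };

/* Number of blocks a one-dimensional launch over n elements would request. */
static size_t grid_blocks(size_t n) {
    return n / THREADS_PER_BLOCK;
}

/* True if a launch over n elements stays within the grid limit. */
static bool grid_fits(size_t n) {
    return grid_blocks(n) <= MAX_GRID_DIM;
}

/* Hypothetical helper: starting from `start`, double n (an assumed growth
   rule) until the launch no longer fits, and return that n. */
static size_t first_oversized_length(size_t start) {
    size_t n = start;
    while (grid_fits(n)) {
        n *= 2;
    }
    return n;
}
```

With these assumptions, grid_blocks(4194304) is 8192, well inside the limit, and the launch only stops fitting once n reaches 33554432. That is consistent with the grid size being where the failure showed up rather than its cause.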
Quote:
Originally Posted by frmky
I modified the source so that when TESRA is defined, it uses the normalize_kernel from version k rather than breaking it into the three sequential calls. This works correctly on all of the GPUs in the S1070. I'm not sure why breaking it into three kernel calls was causing a problem. Attached is the source as modified.

Version "k" on my GTX260 returns the correct result every time, without exception.

But the "CUDA Programming Guide" says:
Quote:
4.4.2 Synchronization Function
void __syncthreads();
synchronizes all threads in a block. Once all threads have reached this point,
execution resumes normally.
"All threads in a block" means only the threads of one block, not all threads on the device.
Quote:
D.2.1 cudaThreadSynchronize()
cudaError_t cudaThreadSynchronize(void);
blocks until the device has completed all preceding requested tasks.
cudaThreadSynchronize() returns an error if one of the preceding tasks failed.
What do you think?
2009-11-12, 02:08   #68
frmky
Jul 2003
So Cal
101001100111₂ Posts

Quote:
Originally Posted by msft
Hi, frmky

The GTX 2xx's maximum block count is 65536 (sorry, this number is from a Japanese book), and 4194304/512 < 65536.
But I will change this in the next version.

That wasn't the real problem. It's just where it failed.

Quote:
Originally Posted by msft
What do you think?

I really don't know. cudaThreadSynchronize() is the right thing to do there. And of course the non-TESRA version works perfectly with the three part normalization, with a couple transposes put in. But at least it's easy to avoid. I've started four double-check runs around 23 million with the modified version v I posted earlier. We'll have the results in a few days.
2009-11-12, 02:16   #69
msft
Jul 2009
Tokyo
2×5×61 Posts

Quote:
Originally Posted by frmky
I really don't know. cudaThreadSynchronize() is the right thing to do there. And of course the non-TESRA version works perfectly with the three part normalization, with a couple transposes put in. But at least it's easy to avoid. I've started four double-check runs around 23 million with the modified version v I posted earlier. We'll have the results in a few days.

No problem, it is entirely your decision.
Thank you,
2009-11-12, 05:12   #70
msft
Jul 2009
Tokyo
2×5×61 Posts

I tested frmky's version of the source code both with and without -DTESTRA.
From M131101 to M1548619, the checksums after 1000 iterations match those from Glucas, so the results are correct.
Thank you, frmky

Is "WD1" something to do with "public key cryptography"?
Is it important?
I use the Res64 results from Glucas and Mlucas.
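
For readers unfamiliar with the residue comparison mentioned above, the following is a minimal C sketch of the Lucas-Lehmer iteration for very small exponents. It is not how Glucas, Mlucas or MacLucasFFTW compute it: those programs use FFT-based multiplication on numbers with millions of bits, and Res64 refers to the low 64 bits of the final residue. The function name ll_residue is hypothetical, and the sketch assumes 3 <= p <= 32 so that every product fits in 64 bits.

```c
#include <stdint.h>

/* Lucas-Lehmer residue for M_p = 2^p - 1.
   Assumption: 3 <= p <= 32, so s < 2^32 and s * s + m fits in a uint64_t.
   For an odd prime p, the result is 0 exactly when M_p is prime. */
static uint64_t ll_residue(unsigned p) {
    const uint64_t m = (UINT64_C(1) << p) - 1;  /* the Mersenne number M_p */
    uint64_t s = 4;                             /* starting value s_0 = 4 */
    for (unsigned i = 0; i < p - 2; ++i) {
        /* s_{i+1} = s_i^2 - 2 (mod M_p); adding m keeps the value non-negative */
        s = (s * s + m - 2) % m;
    }
    return s;
}
```

Two programs that run the same number of iterations on the same exponent should agree on the residue, which is the basis for comparing checksums after 1000 iterations and for double-check runs.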
2009-11-15, 07:07   #71
msft
Jul 2009
Tokyo
262₁₆ Posts

Hi,

Here is a new version of MacLucasFFTW.

It fixes a checkpoint bug (a bug in the original MacLucasFFTW).

Thank you,
Attached Files: MacLucasFFTW.cuda.y.tar.gz (32.3 KB)
2009-11-15, 21:48   #72
frmky
Jul 2003
So Cal
2663₁₀ Posts

Three of the four double-check runs matched the previous residue. This gives me reasonable confidence that the fourth is correct as well:

M23102129
M23102213
M23102267
M23102341

I'll switch to version y.
2009-11-16, 02:38   #73
msft
Jul 2009
Tokyo
2×5×61 Posts

Hi, frmky

I am running one double-check around 23 million.

Thank you,
2009-11-16, 14:01   #74
Cruelty
May 2005
2²×11×37 Posts

OT: can't wait to see some FFT implementation using OpenCL or DirectCompute 11, finally I would utilize my HD5870
BTW: AMD plans to expand developer tools for both
Attached Thumbnails: amd.jpg (47.9 KB)
2009-11-17, 04:36   #75
msft
Jul 2009
Tokyo
1001100010₂ Posts

Hi, Cruelty

Quote:
Originally Posted by Cruelty
OT: can't wait to see some FFT implementation using OpenCL or DirectCompute 11, finally I would utilize my HD5870
BTW: AMD plans to expand developer tools for both

I hope they release ACML-GPU-FFT.
2009-11-17, 11:02   #76
msft
Jul 2009
Tokyo
2·5·61 Posts

Quote:
Originally Posted by msft
I am running one double-check around 23 million.

My GTX260 result:

M23103809

PrimeNet supports MacLucasFFTW results, thank you.
2009-11-17, 12:54   #77
nucleon
Mar 2003
Melbourne
5×103 Posts

How long are the 20M exponent double checks taking to complete?

-- Craig

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
