mersenneforum.org  

2011-12-30, 10:03   #56
LaurV
Romulan Interpreter

"name field"
Jun 2011
Thailand

41×251 Posts

Quote:
Originally Posted by Dubslow
For me at least, I can (almost) max out my one GPU (460) with one of my four CPU cores, so mfaktc/TF makes more sense. I think it varies more with hardware setup than with actual stats and total throughput etc.. (Do you type a .. ?)

Note to flash: For reference, PrimeNet reports expected 5 days for 25M, and 19 days for 45M.
I wouldn't exactly call the 460 a "Fermi": it uses the GF104 chip, on which a double-precision multiply takes about 4 times as long as a single-precision multiply. For that card, TF is more profitable. On a "real" Fermi (5x0 with the GF110 chip, Tesla, Quadro, newer stuff with double precision inside), TF becomes "just a bit" faster (due to the higher clock), but CudaLucas becomes MUCH faster.

At the DC front, anyhow, it makes no sense to TF anything beyond 68-69 bits, regardless of what GPU you have. Look at the GPU-2-72 status: people found a DC factor every 1.5-2.2 days on average, and the "low-level" bits (65-68 bits) are at "end of life". For 69, 70 and more bits, it will take even more time per factor. So why should I (here "I" means "any owner of a Fermi GPU card") waste twice the time doing TF at the DC front, when I can directly LL-DC those exponents (that is, LL at the DC front)? That clears one exponent EVERY day, and a bit more: it leaves a CPU core free for P95 DC or P-1, or whatever.
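Why each extra bit level costs more per factor can be seen from the usual GIMPS rules of thumb: TF from 2^b to 2^(b+1) takes about twice the work of the level below, and the chance of a factor in that range is roughly 1/b. The sketch below assumes only those two heuristics (the function names are hypothetical) and prints the relative work per expected factor at each level.

```python
# Rough TF economics sketch. Assumptions (rules of thumb, not measurements):
#   * trial factoring from 2^b to 2^(b+1) takes about twice the work of the level below
#   * the chance that a factor lies in [2^b, 2^(b+1)) is roughly 1/b


def relative_tf_work(bits: int) -> float:
    """Relative work to TF one exponent from 2^bits to 2^(bits+1)."""
    return 2.0 ** bits


def factor_chance(bits: int) -> float:
    """Heuristic probability of a factor in [2^bits, 2^(bits+1))."""
    return 1.0 / bits


def work_per_expected_factor(bits: int) -> float:
    """Expected TF work spent per factor found at this bit level."""
    return relative_tf_work(bits) / factor_chance(bits)


if __name__ == "__main__":
    base = work_per_expected_factor(65)
    for b in range(65, 71):
        ratio = work_per_expected_factor(b) / base
        print(f"{b}->{b + 1} bits: {ratio:6.2f}x the work per factor of 65->66")
```

Under these assumptions, each step up (e.g. 68 to 69 bits) roughly doubles the GPU time per factor found, which is why the DC front runs out of worthwhile TF so quickly.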

At the LL front things are different, because a factor found by TF (every 3-5 days with an average GPU, as it seems now, or say every 2-3 days with a high-end GPU) saves TWO tests (LLs) AND some P-1 testing on CPU. That is, every factor found saves about 10 days of LL work with the BEST GPU around, or two months of work with the best CPU around (one core). As long as TF still finds factors faster than that (more often than one every 10 days), we should "raise" the bit level and do TF on GPUs.
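The break-even rule in the two paragraphs above fits in a few lines. This is a minimal sketch that uses only the figures quoted in the post (days per factor found, GPU-days saved per factor); the function name is hypothetical.

```python
def tf_pays_off(days_per_factor: float, gpu_days_saved_per_factor: float) -> bool:
    """TF is worth it if it finds a factor in less GPU time than that factor saves."""
    return days_per_factor < gpu_days_saved_per_factor


# DC front on a Fermi card: a factor every 1.5-2.2 days, but each factor only
# saves one LL double-check, which the same card finishes in about a day.
for days in (1.5, 2.2):
    print(f"DC front, factor every {days} days:", tf_pays_off(days, 1.0))

# LL front: a factor every 3-5 days (average GPU) saves two LL tests plus some
# P-1, roughly 10 days of work on the best GPU.
for days in (3.0, 5.0):
    print(f"LL front, factor every {days} days:", tf_pays_off(days, 10.0))
```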

But we should do LL tests with CudaLucas for all "optimum FFT lengths", regardless of whether they are at the LL front or the DC front. People don't really get how CudaLucas works, and why the time per test is almost constant over a very long range of exponents, then instantly doubles for the next exponent. CL uses FFT lengths that are powers of 2, whereas P95 has a much finer "granulation" of FFT lengths. As a graphic, it would look like the attached picture. That is, CL is "not optimum" in the purple areas: it could use a smaller FFT and get the test done faster. Unfortunately, the LL front is now exactly in such a "purple" area.
(I deliberately left the numbers off the graphic, in fact I deleted them, as the numbers vary depending on hardware.)
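To illustrate the stairs, the sketch below picks, for a given exponent, the smallest power-of-2 FFT length and the smallest length from a finer set. Both the capacity of 18 bits per FFT word and the multiplier set {1, 3, 5, 7} are assumptions for illustration only, not the actual limits or length tables used by CUDALucas or Prime95.

```python
BITS_PER_WORD = 18.0  # assumed average bits per FFT word (illustrative only)


def smallest_length(p: int, multiplier: int) -> int:
    """Smallest length multiplier * 2^k that can hold exponent p under the assumption."""
    n = multiplier
    while n * BITS_PER_WORD < p:
        n *= 2
    return n


def pow2_fft_length(p: int) -> int:
    """Power-of-2 lengths only, as described in the post."""
    return smallest_length(p, 1)


def finer_fft_length(p: int, multipliers=(1, 3, 5, 7)) -> int:
    """A finer, hypothetical granulation: lengths of the form m * 2^k."""
    return min(smallest_length(p, m) for m in multipliers)


for p in (25_000_000, 45_000_000, 54_000_000):
    print(f"p = {p:,}: power of 2 -> {pow2_fft_length(p) // 1024}K, "
          f"finer -> {finer_fft_length(p) // 1024}K")
```

In this model a 45M exponent is pushed up to a 4096K power-of-2 FFT although a 2560K length would do; that gap is what the purple areas represent.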

The times for P95 are also in stairs, but with a finer granulation, as P95 "adapts" the FFT size to the exponent size much better than CL does. But CL does the multiplications in parallel, getting a better time per iteration, and P95 does not. For the same FFT size, the time per iteration is (in theory) the same, regardless of the exponent. The total time increases a little as the exponent increases, because a bigger exponent needs more iterations. That is why the stairs are not horizontal. Both for P95 and CL, each stair becomes "optimum" at its "end" (marked green on the CL graphic).
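The slope of each stair follows from the fact that a Lucas-Lehmer test of M_p takes p - 2 squarings. The sketch below models the time per squaring as proportional to n·log2(n) for an FFT of length n; that cost model, the 18 bits-per-word capacity and the COST_UNIT constant are assumptions, so only the shape of the curve is meaningful.

```python
import math

BITS_PER_WORD = 18.0  # assumed capacity per FFT word (illustrative only)
COST_UNIT = 1e-9      # hypothetical seconds per n*log2(n); depends on the hardware


def pow2_fft_length(p: int) -> int:
    """Smallest power-of-2 FFT length assumed able to hold exponent p."""
    n = 1
    while n * BITS_PER_WORD < p:
        n *= 2
    return n


def seconds_per_iteration(n: int) -> float:
    """Same for every exponent sharing this FFT length: the flat part of a stair."""
    return COST_UNIT * n * math.log2(n)


def ll_test_seconds(p: int) -> float:
    """p - 2 iterations, so the total still creeps up with p within one stair."""
    return (p - 2) * seconds_per_iteration(pow2_fft_length(p))


edge = int(2**21 * BITS_PER_WORD)  # last exponent fitting length 2^21 in this model
print("within a stair:", ll_test_seconds(30_000_000) < ll_test_seconds(edge))
print("jump at the edge:", round(ll_test_seconds(edge + 1) / ll_test_seconds(edge), 2))
```

The exponent at the edge of a stair is the "green" point: it uses the whole FFT length, while the very next exponent pays for a length twice as large.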
Attached Thumbnail: CL P95 FFT Time.jpg (34.7 KB)

2011-12-30, 18:40   #57
Dubslow
Basketry That Evening!

"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3·29·83 Posts

Quote:
Originally Posted by LaurV
At DC front, anyhow, it makes no sense to TF anything behind 68-69 bits, regardless of what GPU you have.
I agree here.


My only point is that it makes more sense for me to run mfaktc than CUDALucas, regardless of what assignments people are doing or should be doing etc. I can get (almost) full GPU utilization with only one of four cores with mfaktc. Therefore I run mfaktc. This decision has nothing to do with GIMPS/PrimeNet assignments/status.

2011-12-30, 19:44   #58
Brain

Dec 2009
Peine, Germany

331 Posts

Quote:
Originally Posted by LaurV
People don't really get it how CudaLucas works, and why the time per test is almost constant for a very long range of exponents, then it is instantly doubling for the next exponent. CL is using FFT which is powers of 2 in length, contrary to P95 which has a finer "granulation" of FFT.
This no longer applies to CUDALucas >= 1.4: msft implemented non-power-of-2 FFTs. But this version is only a few days old and is still being tested.

Last fiddled with by Brain on 2011-12-30 at 19:45

2011-12-30, 20:14   #59
Dubslow
Basketry That Evening!

"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

16065₈ Posts

Quote:
Originally Posted by Brain
Hi,
here an updated version of the GPU Computing Guide.

Changes:
- New versions of mfaktc, mfakto and CUDALucas. Links to all binaries...
- Missing CUDA 3.2/4.0 libs for CUDALucas can be downloaded, see page 2

Please check for major bugs. If valid maybe an admin could update the stickies...

Happy new year, Brain
The most recent mfakto readme says something to the effect of "With 11.07+ in Win, you do not need the SDK. Same for 11.11+ in Linux." I would double check to be sure though.

What about CUDALucas 1.3? Is that of use?

Also, I would consider removing "MOST NEEDED GIMPS WORK TYPE" from CUDALucas. Because all TF <60M has been moved to GPU only, one could make a decent argument that we're short on TF. GPU272 is barely keeping up with the 45M-55M work, much less the current wavefront. (Obviously what I say is not final, but I think it's worth consideration.)

Suggestion: Move the link for LESS_CLASSES mfaktc to the remarks section, next to where you talk about efficiency. (Maybe specifically mention LMH?)

Last fiddled with by Dubslow on 2011-12-30 at 20:15

2011-12-31, 09:50   #60
f11ksx

Dec 2011

13₁₀ Posts
V1.41

CudaLucas v1.41 is running pretty well !
9.3 ms/iter for 54M exponent on GTX-580 card.

Thanks a lot.
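For scale, a per-iteration time converts to a full test time because a Lucas-Lehmer test of M_p runs p - 2 iterations. A minimal sketch, assuming the exponent is exactly 54,000,000 (the post only says "54M"):

```python
def ll_test_days(p: int, ms_per_iter: float) -> float:
    """Wall-clock days for a Lucas-Lehmer test of M_p at a fixed time per iteration."""
    iterations = p - 2
    return iterations * ms_per_iter / 1000.0 / 86_400.0


print(f"{ll_test_days(54_000_000, 9.3):.2f} days")  # prints "5.81 days"
```

So one test takes a little under 6 days on the GTX 580, and two such tests come to about 11.6 days, the same order as the ~10 days per saved factor estimated earlier in the thread.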

2011-12-31, 11:19   #61
Brain

Dec 2009
Peine, Germany

331 Posts

Quote:
Originally Posted by f11ksx
CudaLucas v1.41 is running pretty well !
9.3 ms/iter for 54M exponent on GTX-580 card.

Thanks a lot.
Maybe you could/should join the discussion in the CUDALucas thread. Could you test and confirm there that CUDALucas >= 1.4 now uses CPU resources when the -c flag is set?

Quote:
Originally Posted by Dubslow
What about CUDALucas 1.3? Is that of use?
There are two 1.3 versions: an older one by Ethan (EO), which is a tuned 1.2b but laggy for me, and another 1.3 by msft, which has additional timing output. As there is a 1.4 (by msft), I'd like to skip 1.3 to avoid confusion...

2011-12-31, 15:21   #62
Brain

Dec 2009
Peine, Germany

331 Posts
GPU Computing Guide Update to v 0.07a

Changes:
- CUDALucas 1.4.2
- mfakto requirements

GIMPS GPU Computing Cheat Sheet (pdf)

Last fiddled with by Brain on 2012-08-05 at 10:06

2012-01-29, 14:50   #63
Brain

Dec 2009
Peine, Germany

331 Posts
GPU Computing Guide Update to v 0.08

Changes:
- CUDALucas 1.48
- mfaktc for CUDA 4.1

Now, we should really update the sticky post #1 attachments. Otherwise, I'd prefer no such file to having outdated files...

GIMPS GPU Computing Cheat Sheet (pdf)

Last fiddled with by Brain on 2012-08-05 at 10:07

2012-01-29, 16:23   #64
ET_
Banned

"Luigi"
Aug 2002
Team Italia

11401₈ Posts

Quote:
Originally Posted by Brain
Changes:
- CUDALucas 1.48
- mfaktc for CUDA 4.1

Now, we should really update the sticky post #1 attachments. Otherwise, I'd prefer no such file to having outdated files...
Thank you Brain.

I was wondering whether the "Restrictions" on FFT size (CUDALucas 1.48) still hold, as it now supports non-power-of-2 FFT sizes.

Another question to the forum readers: when you say "Compilable with CUDA Toolkit 3.1" do you mean "the source code compiles, but won't work with CUDA Toolkit < 3.1"?

Luigi

Last fiddled with by ET_ on 2012-01-29 at 16:23

2012-01-29, 16:52   #65
Brain

Dec 2009
Peine, Germany

331₁₀ Posts

Quote:
Originally Posted by ET_
Thank you Brain.

I was wondering whether the "Restrictions" on FFT size (CUDALucas 1.48) still hold, as it now supports non-power-of-2 FFT sizes.
I have no reliable values, as I never tested 8M FFTs and above. My GTX 560 Ti 1GB will probably be memory limited. Treat the FFT borders as guidelines; I took them from one of msft's posts.

Quote:
Originally Posted by ET_
Another question to the forum readers: when you say "Compilable with CUDA Toolkit 3.1" do you mean "the source code compiles, but won't work with CUDA Toolkit < 3.1"?

Luigi
Oh, it was just my reaction to today's msft response to CUDA-Compatibility-Mode which works only in CUDA 3.1.
I only compiled it for CUDA 4.0 and 4.1.

2012-01-29, 17:31   #66
ET_
Banned

"Luigi"
Aug 2002
Team Italia

5·7·139 Posts

Quote:
Originally Posted by Brain
I have no reliable values, as I never tested 8M FFTs and above. My GTX 560 Ti 1GB will probably be memory limited. Treat the FFT borders as guidelines; I took them from one of msft's posts.


Oh, it was just my reaction to today's msft response to CUDA-Compatibility-Mode which works only in CUDA 3.1.
I only compiled it for CUDA 4.0 and 4.1.
Shoichiro just answered my question. CUDALucas v1.48 is useless with my CUDA Toolkit 3.0 and CC 1.3, so I will stick with v1.3 for now.

Luigi

Last fiddled with by ET_ on 2012-01-29 at 17:31

All times are UTC.




Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
