#56
Romulan Interpreter
"name field"
Jun 2011
Thailand
41·251 Posts
Quote:
At the DC front, anyhow, it makes no sense to TF anything beyond 68-69 bits, regardless of which GPU you have. Look at the GPU-2-72 status: people have found a DC factor every 1.5-2.2 days on average, and the "low-level" bits (65-68 bits) are at "end of life". For 69, 70 and more bits, each factor will take even longer. So why should I (here "I" means "any owner of a Fermi GPU card") spend double the time doing TF at the DC front, when I can LL-DC those exponents directly (that is, run LL at the DC front)? That way I get rid of one exponent EVERY day, and a CPU core stays free for P95 DC, P-1, or whatever.

At the LL front things are different, because a factor found by TF (every 3-5 days with an average GPU, as it seems now, or say every 2-3 days with a high-end GPU) saves TWO LL tests AND some P-1 testing on the CPU. That is, every factor found saves about 10 days of LL work with the BEST GPU around, or two months of work with the best CPU around (one core). As long as TF still finds factors faster than that (more often than one factor per 10 days), we should "raise" the bit level and keep doing TF on GPUs.

But we should do LL tests with CudaLucas for all "optimum FFT lengths", regardless of whether they are at the LL front or the DC front. People don't really understand how CudaLucas works, and why the time per test is almost constant over a very long range of exponents and then suddenly doubles at the next exponent. CL uses FFTs whose lengths are powers of 2, unlike P95, which has a much finer "granulation" of FFT sizes. Drawn as a graph, it would look like the attached picture. That is, CL is "not optimum" in the purple areas: it could use a smaller FFT and get the test done faster. Unfortunately, the LL front is right now exactly in such a "purple" area. (I did not put any numbers on the graphic; in fact I deleted them on purpose, as the numbers will vary depending on hardware.)
The times on P95 also form stairs, but with a finer granulation, as P95 "adapts" the FFT size to the exponent much better than CL does. But CL does the multiplication in parallel, getting a better time per iteration, while P95 does not. For the same FFT size, the time per iteration is (theoretically) the same regardless of the exponent. The total time increases slightly as the exponent increases, because a bigger exponent needs more iterations. That is why the stairs are not horizontal. They become "optimum" at their "ends" (marked green on the CL graphic), both for P95 and CL.

#57
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3×29×83 Posts
Quote:
My only point is that it makes more sense for me to run mfaktc than CUDALucas, regardless of what assignments people are doing or should be doing, etc. With mfaktc I can get (almost) full GPU utilization while using only one of my four CPU cores. Therefore I run mfaktc. This decision has nothing to do with GIMPS/PrimeNet assignments/status.

#58
Dec 2009
Peine, Germany
331 Posts
Quote:
Last fiddled with by Brain on 2011-12-30 at 19:45

#59
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
1C35₁₆ Posts
Quote:
What about CUDALucas 1.3? Is that of use? Also, I would consider removing "MOST NEEDED GIMPS WORK TYPE" from CUDALucas: because all TF below 60M has been moved to GPU only, one could make a decent argument that we're short on TF. GPU272 is barely keeping up with the 45M-55M work, let alone the current wavefront. (Obviously what I say is not final, but I think it's worth considering.)

Suggestion: Move the link for LESS_CLASSES mfaktc to the remarks section, next to where you talk about efficiency. (Maybe specifically mention LMH?)

Last fiddled with by Dubslow on 2011-12-30 at 20:15
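Whether GPUs should do TF or LL at a given front comes down to the break-even rule from #56: keep raising the TF bit level while a GPU finds factors faster than it could clear the same exponents by LL, counting two LL tests saved per factor at the LL front and one at the DC front. A minimal sketch, assuming the common heuristic that each extra bit level doubles the TF work per exponent and that a factor turns up in a bit level with probability of roughly 1/bits. The function names and the example figures (taken loosely from #56) are hypothetical and illustrative, not measurements.

```python
def tf_is_worthwhile(days_per_factor: float, ll_days: float, tests_saved: int) -> bool:
    """True if TF is expected to clear exponents faster than LL testing them.

    One factor removes `tests_saved` LL tests: 2 at the LL front (first test
    plus double-check), 1 at the DC front. P-1 savings are ignored here.
    """
    return days_per_factor < tests_saved * ll_days


def days_per_factor(bits: int, base_bits: int, base_days: float) -> float:
    """Expected GPU-days per factor when TF-ing up to 2^bits.

    Heuristic (assumption): each extra bit level doubles the TF work per
    exponent, and the chance of a factor in a bit level is roughly 1/bits,
    so cost per factor scales like 2^bits * bits from a measured base level.
    """
    return base_days * 2 ** (bits - base_bits) * bits / base_bits


def tf_limit(base_bits: int, base_days: float, ll_days: float,
             tests_saved: int, max_bits: int = 80):
    """Highest bit level still worth TF-ing to, or None if base_bits is not."""
    limit = None
    for bits in range(base_bits, max_bits + 1):
        cost = days_per_factor(bits, base_bits, base_days)
        if not tf_is_worthwhile(cost, ll_days, tests_saved):
            break  # cost per factor only grows with bits, so stop here
        limit = bits
    return limit


if __name__ == "__main__":
    # Illustrative figures loosely based on post #56, not measurements.
    print("LL front:", tf_limit(68, 3.0, 5.0, tests_saved=2))
    print("DC front:", tf_limit(68, 2.0, 1.0, tests_saved=1))
```

With 3 GPU-days per factor at 68 bits and 5 GPU-days per LL test, `tf_limit(68, 3.0, 5.0, 2)` returns 69: under this model the next level (70 bits) would cost about 12.4 GPU-days per factor, more than the 10 days of LL work a factor saves. At the DC front, with about 2 days per factor against one LL-DC per day, it returns None, which matches the argument in #56.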

#60
Dec 2011
158 Posts |
CudaLucas v1.41 is running pretty well!
9.3 ms/iter for a 54M exponent on a GTX 580 card. Thanks a lot.
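For scale: an LL test of M_p = 2^p - 1 performs p - 2 modular squarings, so 9.3 ms/iter at a 54M exponent works out to roughly 54,000,000 × 9.3 ms ≈ 502,000 s, or about 5.8 days per test. The sketch below shows the iteration itself using Python big integers (only practical for small p; CUDALucas performs the same squarings with FFT-based multiplication on the GPU) together with the timing arithmetic. The function names are hypothetical, and the estimate ignores checkpointing and other overhead.

```python
def lucas_lehmer(p: int) -> bool:
    """Lucas-Lehmer test for M_p = 2^p - 1, with p an odd prime.

    Runs p - 2 iterations of s -> s^2 - 2 (mod M_p), starting from s = 4;
    M_p is prime exactly when the final residue is 0.
    """
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def ll_test_days(p: int, ms_per_iter: float) -> float:
    """Wall-clock days for a full LL test of M_p at a fixed time per iteration."""
    return (p - 2) * ms_per_iter / 1000 / 86_400


if __name__ == "__main__":
    small = [p for p in (3, 5, 7, 11, 13, 17, 19, 23, 29, 31) if lucas_lehmer(p)]
    print("Mersenne primes among small p:", small)
    print(f"54M exponent at 9.3 ms/iter: {ll_test_days(54_000_000, 9.3):.2f} days")
```

`ll_test_days(54_000_000, 9.3)` comes to about 5.81 days, so a first-time test plus its double-check would take roughly 11.6 GPU-days on that card.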

#61
Dec 2009
Peine, Germany
331 Posts
Quote:
There are two 1.3 versions: one by Ethan (EO), which is older (a tuned 1.2b) but laggy for me, and another 1.3 version by msft, which has additional timing output. As there is a 1.4 (by msft), I'd like to skip 1.3 to avoid confusion...

#62
Dec 2009
Peine, Germany
331 Posts
Last fiddled with by Brain on 2012-08-05 at 10:06

#63
Dec 2009
Peine, Germany
14B₁₆ Posts
Changes:
- CUDALucas 1.48
- mfaktc for CUDA 4.1

Now, we should really update the sticky post #1 attachments. Otherwise, I'd prefer having no such file to having outdated files...

GIMPS GPU Computing Cheat Sheet (pdf)
Last fiddled with by Brain on 2012-08-05 at 10:07

#64
Banned
"Luigi"
Aug 2002
Team Italia
5·7·139 Posts
Quote:
I was wondering if the "Restrictions" on FFT size (CUDALucas 1.48) still hold, as it now supports non-power-of-2 FFT sizes.

Another question to the forum readers: when you say "Compilable with CUDA Toolkit 3.1", do you mean "the source code compiles, but won't work with CUDA Toolkit < 3.1"?

Luigi
Last fiddled with by ET_ on 2012-01-29 at 16:23

#65
Dec 2009
Peine, Germany
331 Posts
Quote:
Quote:
I only compiled it for CUDA 4.0 and 4.1.

#66
Banned
"Luigi"
Aug 2002
Team Italia
5·7·139 Posts
Quote:
Luigi
Last fiddled with by ET_ on 2012-01-29 at 17:31