|
|
#45 |
|
Jul 2009
Tokyo
1001100010₂ Posts |
|
|
|
|
|
|
#46 |
|
Dec 2011
13₁₀ Posts |
Thank you for the answers
|
|
|
|
|
|
#47 | |
|
"Jerry"
Nov 2011
Vancouver, WA
463₁₆ Posts |
So I finally got around to installing CUDALucas 1.2b to use with my GTX 580s. I've been TFing 8 instances with two HD 5870s and two 580s. The 580s are faster.
I dropped one instance of mfaktc for CUDALucas and I can't believe how fast it is for LL testing. I haven't run a full 4 cores on one LL in a while, but even with 3 instances of mfaktc running, the LL is only going to take ~60 hours. I set 3 cores to run mfaktc and one core for CUDALucas. On my QX9650, if I don't set it up that way the system gets slow, because TFing on the nVidia cards drives the CPU cores to 100%.
So, the reason for my post is that I kinda feel like I'm wasting time using CPUs to LL or TF anymore. I have several systems that are running LLs that might be better off doing something else. I know it's better to have them do something rather than just sit, but in the time it takes them to do one LL I could finish all my current assignments with CUDA (and that's just using the one 580). Once 580s and 590s (and whatever else is coming) drop in price, we're going to be able to make a huge dent in LLing and TFing. And the other systems can work on P-1 or easier DC checks.
I can't wait to pick up some more cards that can run CUDA. Hopefully Windows 8 will fix the Bulldozer problem so I can use some of that system for LL or TF also. Just curious what everyone's thoughts are on this?
Quote:
Last fiddled with by flashjh on 2011-12-29 at 06:48 Reason: Find out what runs 1.3 DPTF |
|
|
|
|
|
|
#48 | |
|
Feb 2004
240₈ Posts |
Quote:
I'm a bit curious about your setup:
1) What's the size of your exponent? Are we talking LL or DC here?
2) How long does your CPU take for the same exponent?
3) Did you consider that your CPU, if it's a 4-core, could do 4 tests in parallel?
4) While using CUDALucas, what is the performance of your mfaktc instance? Exponent size, bits factored, and SievePrimes depth?
Also, when using CUDALucas the CPU core basically does nothing; you can run an LL test on it with little to no impact on performance.
Thanks, |
|
|
|
|
|
|
#49 | |||||
|
"Jerry"
Nov 2011
Vancouver, WA
1,123 Posts |
I'm a bit curious about your setup:
This is a QX9650 with 8GB DDR2-1066, 2 MSI GTX 580s, GA-EP45-UD3P - Boot overclock is 9.0 multiplier, 450FSB, memory set to 2.40B. Then I downclock with EasyTune6 to 290FSB - I haven't figured out why I get much better performance with that and it stays a lot cooler. I have the 3 mfaktc instances all using cores 1-3, not individually assigned and CUDA assigned to core 4. All 3 mfaktc use GPU 1 and CUDA uses GPU 2. Quote:
Quote:
Quote:
Quote:
mfakto1:
Code:
class | candidates | time | ETA | avg. rate | SievePrimes | CPU wait
657/4620 | 1.91G | 10.812s | 2h28m | 176.80M/s | 6153 | 2.40%
Code:
class | candidates | time | ETA | avg. rate | SievePrimes | CPU wait
2316/4620 | 1.89G | 11.658s | 1h33m | 161.81M/s | 7033 | 2.11%
Code:
class | candidates | time | ETA | avg. rate | SievePrimes | CPU wait
2280/4620 | 1.89G | 12.222s | 1h38m | 154.34M/s | 7033 | 2.19%
Code:
Iteration 13990000 M( 4524XXXX )C, 0x0fc83c04f4e74388, n = 4194304, CUDALucas v1.2b (0:52 real, 5.1693 ms/iter, ETA 44:52:20)
Quote:
Last fiddled with by flashjh on 2011-12-29 at 15:05 Reason: add multiplier |
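The ETA in the CUDALucas status line quoted above can be sanity-checked from the iteration count and the ms/iter figure. The following is only a rough sketch: the exponent is masked in the post (4524XXXX), so `EXPONENT_APPROX` is a rounded stand-in and not the real exponent, and the helper names are made up for illustration.

```python
# Rough ETA check for the CUDALucas status line quoted in the post.
# ASSUMPTION: the exponent is masked as 4524XXXX, so a rounded stand-in of
# 45,240,000 is used here; it is not the real exponent.
EXPONENT_APPROX = 45_240_000   # hypothetical stand-in for M(4524XXXX)
ITERATION = 13_990_000         # from the status line
MS_PER_ITER = 5.1693           # from the status line


def remaining_seconds(exponent: int, iteration: int, ms_per_iter: float) -> float:
    """An LL test needs about `exponent` iterations; return seconds left."""
    return (exponent - iteration) * ms_per_iter / 1000.0


def as_hms(seconds: float) -> str:
    """Format a duration in seconds as H:MM:SS."""
    total = int(round(seconds))
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


eta = remaining_seconds(EXPONENT_APPROX, ITERATION, MS_PER_ITER)
full_run_hours = EXPONENT_APPROX * MS_PER_ITER / 1000.0 / 3600.0
print("ETA ~", as_hms(eta))
print(f"full test ~ {full_run_hours:.0f} hours")
```

With the rounded exponent this lands within a second or two of the reported ETA of 44:52:20, and the full run works out to roughly 65 hours, which is in line with the "~60 hours" mentioned earlier in the thread.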
|||||
|
|
|
|
|
#50 |
|
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
1110000110101₂ Posts |
On a 2600, I can get one of those LLs done in slightly less than a month, so three per month with one core for mfaktc. That fourth core of yours is doing literally nothing at the moment; Task Manager should be reporting 1 or 2% usage. If you run LL on that core, memory restrictions will reduce mfakto throughput by 1 or 2%, which is minor compared to the LL work you'd be doing. It may or may not affect CUDALucas, and if it does, the effect will be even smaller than on mfakto.
Some notes: CUDALucas, I believe, is up to version 1.4. Also, CUDALucas in general gets around 1/5th to 1/4th of the throughput of mfakt*, measured in PrimeNet's GHz-Days metric. This is because the LL test is only sort of parallelizable, whereas TF is so-called 'embarrassingly parallel'. Thus most people run mfakt* on the GPUs, and keep the LL on the CPU simply because that's what it's most efficient at. Some people do use CUDALucas anyway because they don't care about PrimeNet GHz-Days, and there's also the fact that P-1 factoring currently has no GPU equivalent and PrimeNet always has need there.
If you can't wait for LL on CPU, then do P-1 factoring with that extra core. (Or TF-LMH, but P-1 would be more useful, I think.) (Edit: You could also run DCs.)
Last fiddled with by Dubslow on 2011-12-29 at 15:27 |
|
|
|
|
|
#51 |
|
Dec 2011
1101₂ Posts |
For information: I run LL tests with CUDALucas in 5 days for exponents around 50.xxx.xxx, on a GTX 580 card.
Last fiddled with by f11ksx on 2011-12-29 at 22:02 |
|
|
|
|
|
#52 | |
|
Romulan Interpreter
"name field"
Jun 2011
Thailand
41·251 Posts |
Quote:
Currently I am doing 130 hours per LL in the higher 50M area, and 24 hours per DC in the 28M-32M area, per GPU, with a single copy of CL running on each GPU, which almost maxes out the GPU. Unfortunately mfaktc does not seem to take full advantage of the Fermis: the internal memory is not used at all, and it relies on the CPU for filtering. I need to put all 4/8 cores into 4 or 6 copies of mfaktc to be able to max out the two GPUs with them, and in that case the computer can't do anything else without decreasing the GPU occupancy. To keep the GPUs at max, I need to keep the computer "idle".
That is why I would prefer to use CL for DC on one GPU, and two or three copies of mfaktc to TF at the LL front on the second GPU. This gives the optimum performance. At the DC front you can clear one expo per day per GPU. This is the fastest-ever method to clear exponents. With trial factoring at the DC front you will NOT find a factor each day. Some days you can test 50 exponents for 2-3 bit levels, or combinations of these (100-300 GHz-days/day), and find 1, 2, or 3 factors, but for the next 5, 7, 15, etc. days you will find none. TF is a "lucky draw". DC is "sure". With DC at the DC front, you will clear one exponent per day, per GPU, no question!
And (AND!) this will leave your CPU free, so you can still do some P-1 testing on it. Or another DC, if you like, using P95: on a 3 GHz processor you will get about 15-20 ms per iteration using one core, so you can get one DC out every week, or every two weeks. That is, with a Fermi and one (ONE!) CPU core, you can clear 35 expos per month, at least. That is if you decide to work at the DC front. If you decide on the LL front, things are a bit different, and I explained them (more than once) in the GPU-2-72 topic. |
|
|
|
|
|
|
#53 |
|
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3·29·83 Posts |
For me at least, I can (almost) max out my one GPU (460) with one of my four CPU cores, so mfaktc/TF makes more sense. I think it varies more with hardware setup than with actual stats and total throughput etc. (Do you type a .. ?)
Note to flash: For reference, PrimeNet reports expected 5 days for 25M, and 19 days for 45M. Last fiddled with by Dubslow on 2011-12-30 at 06:38 Reason: flash! read this again! |
|
|
|
|
|
#54 | |
|
"Jerry"
Nov 2011
Vancouver, WA
1,123 Posts |
Quote:
|
|
|
|
|
|
|
#55 |
|
Dec 2009
Peine, Germany
14B₁₆ Posts |
Hi,
here is an updated version of the GPU Computing Guide.
Changes:
- New versions of mfaktc, mfakto and CUDALucas. Links to all binaries...
- Missing CUDA 3.2/4.0 libs for CUDALucas can be downloaded, see page 2
Please check for major bugs. If valid, maybe an admin could update the stickies...
Happy new year,
Brain
GIMPS GPU Computing Cheat Sheet (pdf)
Last fiddled with by Brain on 2012-08-05 at 10:06 |
|
|
|