#529 | Mar 2007 | 179 Posts
I haven't played games with the priorities, but I have successfully run CUDALucas and mfaktc concurrently on the same card. With respect to a production-worthy binary, I returned a successful double-check residue with apsen's build from post #518.
#530 | Dec 2010 | Monticello | 5·359 Posts
It also needs cudafftw64_40_17 from Nvidia. Mangle Detachments wouldn't let me upload it... it's too big! On a GTX480, also running mfaktc, I'm getting 40 or 50 ms per iteration. Exponent size is 25.0M, for an LL-D done in perhaps 300 hours. That about right?

Last fiddled with by Christenson on 2011-08-16 at 02:47. Reason: Found out why I can't mangle that detachment!
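A quick sanity check of that estimate (a back-of-the-envelope sketch; `ll_hours` is just an illustrative helper name, using the numbers from the post):

```python
# Sanity-check the LL-D time estimate: one Lucas-Lehmer test of exponent p
# takes roughly p iterations, so total time ~ p * time_per_iteration.
def ll_hours(exponent, ms_per_iter):
    """Estimated wall-clock hours for an LL test at a given ms/iteration."""
    return exponent * ms_per_iter / 1000 / 3600

p = 25_000_000  # 25.0M exponent => ~25.0M iterations

for ms in (40, 50):
    print(f"{ms} ms/iter -> about {ll_hours(p, ms):.0f} hours")
# 40 ms/iter -> about 278 hours
# 50 ms/iter -> about 347 hours
```

So at 40-50 ms per iteration, "perhaps 300 hours" is indeed about right.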
#531 | "Dave" | Sep 2005 | UK | 5330₈ Posts
That's very slow. On a GTX-570 I was completing a 26M exponent in about 33 hours. That was using the stock CUDALucas on Linux.
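For comparison, the per-iteration time implied by Dave's figure (another quick sketch; `ms_per_iter` is just an illustrative helper name):

```python
# Back out the average per-iteration time implied by "26M exponent in 33 hours"
# on a dedicated GTX-570, and compare with the 40-50 ms on the shared GTX480.
def ms_per_iter(exponent, hours):
    """Average milliseconds per LL iteration, given total wall-clock hours."""
    return hours * 3600 * 1000 / exponent

gtx570 = ms_per_iter(26_000_000, 33)  # dedicated GTX-570, stock CUDALucas
print(f"GTX-570: {gtx570:.1f} ms/iter, vs 40-50 ms on the shared GTX480")
# GTX-570: 4.6 ms/iter, vs 40-50 ms on the shared GTX480
```

That is roughly a 10x gap, far more than sharing the card with mfaktc alone would explain.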
#532 | Dec 2010 | Monticello | 11100000011₂ Posts
Ideas? Have I saturated the card's memory? (mfaktc is busy TF'ing something for OBD to 84 bits, and will be done with the assignment in 8 days or so.) Or did I simply not assign cudaLucas enough priority with respect to mfaktc? The machine is 4-core Win64, running P95 on the other 3 cores. I ran out of experimentation time last night; I need to remember how I got the temperature out of the GPU last time.
#533 | "Ethan O'Connor" | Oct 2002 | GIMPS since Jan 1996 | 134₈ Posts
If memory serves me correctly, mfaktc keeps the ALUs and the host-to-device memory controllers pretty busy, and executes its inner loops almost entirely in shared memory -- I don't think there's terribly much GPU left over if you're already running mfaktc on a fast host processor.

The priority you give to the two executables shouldn't make much difference, as CUDALucas uses essentially zero host CPU. Because it eliminates one device-to-host memory copy in the inner loop, you may want to give my build from http://www.mersenneforum.org/showpos...&postcount=524 a try -- I've completed successful doublechecks on 25056947 and 25038353 and verified all the known Mersenne primes between 216091 and 20996011 with it.

On an unrelated topic, CUDALucas on my card is much less prone to errors with overclocking than mfaktc, which surprises me. Both of the doublechecks I mentioned were run with the core clock at 775 MHz, while mfaktc starts failing selftests around 700 MHz on the same card.
#534 | Dec 2010 | Monticello | 5·359 Posts
The host CPU isn't terribly fast -- it only managed to use about 40% of the GPU capacity with mfaktc.

To my mind, your overclocking result says that mfaktc stresses different parts of the GPU. I suppose that when I next visit that machine, I'll double-check my timing, try your build, and see if it is significantly faster. Is it worth forcing mfaktc and cudaLucas to run on separate cores?
#535 | "Oliver" | Mar 2005 | Germany | 457₁₆ Posts
Hi!

CUDALucas is memory-bandwidth limited; that's why those 2000-series Teslas aren't much faster than a GTX [45][78]0. I would assume that CUDALucas doesn't stress the ALU/FPU that hard.

Oliver
#536 | Dec 2010 | Monticello | 5·359 Posts
Any idea what my context switches might cost? Good ideas on how to reduce their number? (The system has 2-4 GB of real memory; mfaktc on one core cannot keep up, so it certainly could be made to buffer a bit more and swap out of the GPU less often.)
#537 | "Ethan O'Connor" | Oct 2002 | GIMPS since Jan 1996 | 2²·23 Posts
-Ethan

PS. I haven't been keeping up on goings-on in the mfaktc world for a while; hope things are good :)
#538 | Nov 2010 | Germany | 3·199 Posts
Regarding priorities and OpenCL: at least in the current OpenCL version 1.1 there is no mention of them. It's possible that AMD will provide them as a non-standard extension, but otherwise the host processes need to do that scheduling themselves. This is a little bit like I/O: two processes writing data to the same disk will get the writes done independently of the process priorities.

Context switches: they happen only on the CPU. A running GPU kernel will not be interrupted when its host process gets deactivated or even paged out. I know of no way to interrupt a running OpenCL kernel (maybe it is possible with CUDA or lower-level GPU access methods). As the GPU runs the kernels one after the other anyway, context switches should not be a big problem here.
#539 | Dec 2010 | Monticello | 703₁₆ Posts
Context switches, in the classical sense, only happen on the CPU. But I need some word to describe what happens when we switch between the CUDALucas kernel and data and the mfaktc kernel and data on the GPU. Whatever this mechanism is called, it becomes more important when mfaktc adds a second kernel to the GPU for sieving.

What my multicore machine is doing right now is running both CUDALucas and mfaktc, with Prime95 on the other 3 cores. The mfaktc performance was definitely bound by how quickly it could sieve on the one core; I estimated it was using 40% of the GPU's power. Timing CUDALucas with my watch, I estimated 300 hours to finish a 25M LL test. mfaktc seemed to speed up as I killed off the stuff I'd had to do to get the cufft DLLs installed from Nvidia. I'd like CUDALucas to replace the CPU core I have lost to mfaktc.

The GPU card is an unmarked GTX480; Nvidia seems to be handing them out to the right people, such as FIRST Robotics teams at competitions, to whet the appetite of the future market.
Similar Threads
| Thread | Thread Starter | Forum | Replies | Last Post |
| Don't DC/LL them with CudaLucas | LaurV | Data | 131 | 2017-05-02 18:41 |
| CUDALucas / cuFFT Performance on CUDA 7 / 7.5 / 8 | Brain | GPU Computing | 13 | 2016-02-19 15:53 |
| CUDALucas: which binary to use? | Karl M Johnson | GPU Computing | 15 | 2015-10-13 04:44 |
| settings for cudaLucas | fairsky | GPU Computing | 11 | 2013-11-03 02:08 |
| Trying to run CUDALucas on Windows 8 CP | Rodrigo | GPU Computing | 12 | 2012-03-07 23:20 |