mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

rysk 2011-10-15 04:04

So I thought I had mfaktc running correctly, but the times are extremely slow. Can anyone shed some light on what I should do? Or is it just my video card? I'm using the win64 files from [url]http://www.mersenneforum.org/mfaktc/[/url], running Win7 Home Premium with an i7 860 (2.8GHz) processor and 8 GB of RAM. Here are my results....


[CODE]
C:\mfaktc>mfaktc-win-64.exe
mfaktc v0.17-Win (64bit built)

Compiletime options
THREADS_PER_BLOCK 256
SIEVE_SIZE_LIMIT 32kiB
SIEVE_SIZE 193154bits
SIEVE_SPLIT 250
MORE_CLASSES enabled

Runtime options
SievePrimes 25000
SievePrimesAdjust 1
NumStreams 3
CPUStreams 3
GridSize 3
WorkFile worktodo.txt
Checkpoints enabled
Stages enabled
StopAfterFactor bitlevel
PrintMode full
AllowSleep no

CUDA device info
name GeForce GT 220
compute capability 1.2
maximum threads per block 512
number of multiprocessors 6 (48 shader cores)
clock rate 1335MHz

CUDA version info
binary compiled for CUDA 3.20
CUDA driver version 4.0
CUDA runtime version 3.20

Automatic parameters
threads per grid 786432

running a simple selftest...
Selftest statistics
number of tests 31
successfull tests 31

selftest PASSED!

got assignment: exp=59193781 bit_min=70 bit_max=71
tf(59193781, 70, 71, ...);
k_min = 9972260602920
k_max = 19944521211061
Using GPU kernel "71bit_mul24"

found a valid checkpoint file!
last finished class was: 2891
found 0 factor(s) already

class | candidates | time | avg. rate | SievePrimes | ETA | avg. wait
2895/4620 | 463.99M | 31.665s | 14.65M/s | 25000 | 3h08m | 44549us
2900/4620 | 459.28M | 31.348s | 14.65M/s | 28125 | 3h06m | 44115us
2903/4620 | 454.56M | 31.027s | 14.65M/s | 31640 | 3h04m | 43606us
2904/4620 | 450.63M | 30.653s | 14.70M/s | 35595 | 3h01m | 42871us
2915/4620 | 445.91M | 30.108s | 14.81M/s | 40044 | 2h57m | 41872us
2916/4620 | 441.19M | 29.793s | 14.81M/s | 45049 | 2h55m | 41221us
2919/4620 | 437.26M | 29.527s | 14.81M/s | 50680 | 2h53m | 40526us
2924/4620 | 433.32M | 29.264s | 14.81M/s | 57015 | 2h51m | 39789us
2928/4620 | 428.61M | 28.947s | 14.81M/s | 64141 | 2h48m | 39010us
2931/4620 | 424.67M | 28.684s | 14.81M/s | 72158 | 2h46m | 38187us
2936/4620 | 420.74M | 28.419s | 14.80M/s | 81177 | 2h44m | 37308us
2939/4620 | 416.81M | 28.177s | 14.79M/s | 91324 | 2h42m | 36416us
2943/4620 | 413.66M | 28.000s | 14.77M/s | 102739 | 2h41m | 35478us
2960/4620 | 409.73M | 27.807s | 14.73M/s | 115581 | 2h39m | 34563us
2963/4620 | 405.80M | 27.530s | 14.74M/s | 130028 | 2h37m | 33433us
2964/4620 | 402.65M | 27.282s | 14.76M/s | 146281 | 2h35m | 32178us
2975/4620 | 398.72M | 26.983s | 14.78M/s | 164566 | 2h33m | 30871us
2979/4620 | 395.58M | 26.856s | 14.73M/s | 185136 | 2h32m | 29675us
2988/4620 | 393.22M | 26.663s | 14.75M/s | 200000 | 2h31m | 28695us
2991/4620 | 393.22M | 26.580s | 14.79M/s | 200000 | 2h30m | 28547us
class | candidates | time | avg. rate | SievePrimes | ETA | avg. wait
2996/4620 | 393.22M | 26.581s | 14.79M/s | 200000 | 2h29m | 28537us
2999/4620 | 393.22M | 26.640s | 14.76M/s | 200000 | 2h29m | 28644us
3000/4620 | 393.22M | 26.856s | 14.64M/s | 200000 | 2h30m | 29019us
3003/4620 | 393.22M | 26.632s | 14.76M/s | 200000 | 2h28m | 28620us
3008/4620 | 393.22M | 26.610s | 14.78M/s | 200000 | 2h28m | 28575us
3015/4620 | 393.22M | 26.662s | 14.75M/s | 200000 | 2h27m | 28676us
3020/4620 | 393.22M | 26.694s | 14.73M/s | 200000 | 2h27m | 28751us[/CODE]
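
(Editor's note, for readers following the log above: the k_min/k_max values come from the form of Mersenne factors. Any factor q of 2^p - 1 must satisfy q = 2kp + 1, so trial factoring the bit range 70 to 71 corresponds to k running from roughly 2^70/(2p) up to 2^71/(2p); mfaktc then nudges the bounds slightly to align with its 4620 residue classes. A minimal Python sketch of the arithmetic:

```python
# Factors q of a Mersenne number 2^p - 1 have the form q = 2*k*p + 1.
# Trial factoring the bit range [70, 71) therefore scans k from about
# 2^70 / (2p) up to 2^71 / (2p).
p = 59193781              # exponent from the assignment in the log
bit_min, bit_max = 70, 71

k_min = (1 << bit_min) // (2 * p)
k_max = (1 << bit_max) // (2 * p)

print("k_min ~", k_min)   # close to the logged 9972260602920
print("k_max ~", k_max)   # close to the logged 19944521211061
```

The exact bounds in the log differ by a few thousand because mfaktc rounds k to class boundaries.)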

delta_t 2011-10-15 05:47

From the looks of it you have it set up and running properly; your video card is just slow. You may be able to squeeze out a little more performance by decreasing the GridSize.

Dubslow 2011-10-15 06:35

Well, you're using a GT 220, a near-bottom-of-its-generation card from two generations ago, while the CPU is a top-of-the-line part only one generation old. The avg. wait column indicates how long the CPU waits on the GPU; if it's greater than about 1000, then the CPU is waiting a lot, which means the GPU is overwhelmed. SievePrimes controls how much sieving work is done on the CPU; that's why the program auto-adjusted it up to 200,000 (the default is 25,000, and 5,000 is the minimum). Given the very nice CPU and much slower GPU, I'd say these numbers are to be expected. I don't think there's much you can do besides buying new hardware, sorry :P. Other forum members, does this analysis seem correct?
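
(Editor's note: for anyone who wants to experiment, these are the relevant knobs in mfaktc.ini, with the values this log happens to be running; they are illustrative starting points, not recommendations, so check the comments in your own mfaktc.ini for valid ranges and benchmark one change at a time.)

```
# illustrative mfaktc.ini fragment, values taken from the log above
SievePrimes=25000        # CPU-side sieving depth; auto-raised when SievePrimesAdjust=1
SievePrimesAdjust=1
NumStreams=3             # CUDA streams used to overlap CPU and GPU work
GridSize=3               # smaller values mean smaller batches per kernel launch
```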

Having reread what delta_t said, his advice seems best. (I'm not as familiar with GridSize as with the other parameters, so if he says so, go for it.)

rysk 2011-10-15 12:52

thanks guys. I'll play around with it. Maybe I'll try to pick up a new card. :grin:

bcp19 2011-10-15 19:27

I downloaded the 32 bit version of CudaLucas and unzipped it, but when I try to run it I get an error "cufft32_32_16.dll not found". I have tried searching both the forum and the web for this but have had no success. Can anyone point me in the right direction?

Christenson 2011-10-15 19:35

@Dubslow: Good...
@BCP19: I get that file by installing the developer's kit from nvidia...it may be posted here somewhere, too.

bcp19 2011-10-15 20:24

I installed the Cuda Toolkit 4.0 and still get the same error :/

kar_bon 2011-10-15 21:00

[QUOTE=bcp19;274652]I downloaded the 32 bit version of CudaLucas and unzipped it, but when I try to run it I get an error "cufft32_32_16.dll not found". I have tried searching both the forum and the web for this but have had no success. Can anyone point me in the right direction?[/QUOTE]

The zip from [url=http://www.mersenneforum.org/showpost.php?p=273900&postcount=363]this post[/url] contains "cudart32_32_16.dll". Perhaps that helps.

Dubslow 2011-10-15 23:20

Nope, cudart and cudafft are different. There are various 64-bit cudafft files floating around the forum, but I haven't yet found a 32-bit one. I'll keep looking.

Edit: I'm pretty sure the file you need is in the download you installed, but CUDALucas won't find it unless it's in the same directory. Do you remember "where" you installed CUDA 4.0, or does somebody else know where it installs? Also could a mod move this to the CUDALucas thread?

kladner 2011-10-16 01:54

"Nope, cudart and cudafft are different"

That's "cufft", in various permutations of "cufft_##_##_##.dll", right?

I need a 32bit version of that.

Christenson 2011-10-16 02:27

Right...you need cufft_32_xx_xx.dll. You can find it in the place where nvidia installed the developer's kit: just go to the root of the installation and search for files named cufft. IIRC, it installs in either C:\nvidia or C:\program files\nvidia. The simple thing to do is to copy it into the directory with CUDALucas; the more complicated way is to ensure that that binary directory is on your PATH when you call CUDALucas, either with a set command in your batch file, by adding it to the global PATH environment variable, or possibly via the settings for your desktop icon.
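
(Editor's note: to make the batch-file option concrete, here is a sketch of a Windows batch wrapper. The toolkit path shown is an assumption, not confirmed by the thread; substitute wherever your CUDA install actually put the cufft DLL.)

```
@echo off
rem Hypothetical wrapper for CUDALucas -- adjust the path to your CUDA install.
rem "set" here only affects this session, not the global PATH variable.
set PATH=%PATH%;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\bin
CUDALucas.exe
```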

