mersenneforum.org

mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

Dubslow 2013-01-08 00:12

[QUOTE=kracker;323986]I see. So it's not really a literal "max" it's just "over this, gains almost useless unless.."[/QUOTE]

Well... sort of. :smile: Like I said, the "gains almost useless" point is (quite a bit) lower than the max he mentions; the max refers to the memory required to process the "standard" amount of relative primes in one pass (where "standard" is a hand-waving over-simplification).

Xyzzy 2013-01-08 00:46

[QUOTE]So it's not really a literal "max" it's just "over this, gains almost useless unless.."[/QUOTE][url]http://www.mersenneforum.org/showpost.php?p=282335&postcount=10[/url]

James Heinrich 2013-01-08 00:56

[QUOTE=kracker;323986]I see. So it's not really a literal "max" it's just "over this, gains almost useless unless.."[/QUOTE]For any given bounds, a certain amount of RAM is required to run a given number of relative primes at once. Normally Prime95 runs several passes, each with as many RPs as it has RAM for, to complete a full set of 480 relative primes. I don't believe Prime95 will let you run P-1 if you don't have enough RAM to run at least 8 RPs at once (hence the "minimum" value). Each pass has some (small) overhead, so fewer passes means a slightly faster run. The "maximum" value represents running all 480 RPs in one pass. Under certain semi-rare conditions Prime95 will select a number of relative primes other than 480, but 480 is the "normal" value.

However, the P-1 bounds are partially selected based on the amount of RAM available, so a machine with 512MB allocated and another with 20GB allocated won't pick the same bounds for the same exponent. The one with more RAM will pick higher bounds, run a little [i]slower[/i], but have a higher chance of a factor. If they were forced to use the same bounds, the more-RAM machine would run the assignment slightly faster.
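The pass arithmetic described in this post can be sketched as follows. This is only a rough illustration, not Prime95's actual logic: the function name `passes_needed` is hypothetical, the figures 480 and 8 are taken from the post, and the memory cost per relative prime (which depends on the exponent and bounds) is not modelled at all.

```python
import math

TOTAL_RELATIVE_PRIMES = 480  # the "normal" count mentioned in the post
MIN_RPS_PER_PASS = 8         # the minimum the post believes Prime95 requires


def passes_needed(rps_per_pass, total=TOTAL_RELATIVE_PRIMES):
    """Return how many passes are needed to cover `total` relative primes
    when `rps_per_pass` of them fit in RAM at once (hypothetical helper)."""
    if rps_per_pass < MIN_RPS_PER_PASS:
        raise ValueError("not enough RAM for the assumed minimum of 8 RPs")
    # More room than 480 RPs does not reduce the pass count below one.
    rps_per_pass = min(rps_per_pass, total)
    return math.ceil(total / rps_per_pass)


if __name__ == "__main__":
    for rps in (8, 48, 160, 480):
        print(f"{rps:3d} RPs at once -> {passes_needed(rps)} pass(es)")
```

At the "minimum" of 8 RPs this gives 60 passes, and at the "maximum" of 480 RPs a single pass, which is why extra RAM beyond that point cannot reduce the per-pass overhead any further.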

Dubslow 2013-01-08 01:40

[QUOTE=James Heinrich;323994]run a little [i]slower[/i], but have a higher chance of a factor.[/QUOTE]

...the end result being that you get more factors per cpu time. :smile:

ixfd64 2013-01-08 06:59

[QUOTE=ixfd64;319724]I've set up my CUDA environment, but I get the following errors when I try to compile mfaktc 0.19: [see attachment]

Anyone know what I'm doing wrong?

Edit: I've changed the item type to CUDA C/C++ and the platform to VC90, and I've also installed Visual C++ 2008. However, it's still complaining of an issue with the "atomicInc" function. Anyone know how to resolve this?[/QUOTE]

OK, I've decided to try compiling mfaktc again. The error went away after I changed the code generation parameter to "compute_11,sm_11" as suggested. However, I'm getting a bunch of new errors:

[QUOTE]1>------ Build started: Project: mfaktc_0.20, Configuration: Debug Win32 ------
1> Compiling CUDA source file tf_96bit_base_math.cu...
1>
1> C:\Users\danny\Desktop\mfaktc-0.20\src>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\bin\nvcc.exe" -gencode=arch=compute_11,code=\"sm_11,compute_11\" --use-local-env --cl-version 2008 -ccbin "c:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include" -G --keep-dir "Debug" -maxrregcount=0 --machine 32 --compile -g -DWIN32 -D_DEBUG -D_WINDOWS -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd " -o "Debug\tf_96bit_base_math.cu.obj" "C:\Users\danny\Desktop\mfaktc-0.20\src\tf_96bit_base_math.cu"
1>C:/Users/danny/Desktop/mfaktc-0.20/src/tf_96bit_base_math.cu(21): error : identifier "int96" is undefined
1>C:/Users/danny/Desktop/mfaktc-0.20/src/tf_96bit_base_math.cu(21): error : identifier "int96" is undefined
1>C:/Users/danny/Desktop/mfaktc-0.20/src/tf_96bit_base_math.cu(33): error : incomplete type is not allowed
1>C:/Users/danny/Desktop/mfaktc-0.20/src/tf_96bit_base_math.cu(33): error : identifier "int96" is undefined
1>C:/Users/danny/Desktop/mfaktc-0.20/src/tf_96bit_base_math.cu(33): error : identifier "a" is undefined
1>C:/Users/danny/Desktop/mfaktc-0.20/src/tf_96bit_base_math.cu(35): error : expected a ";"
1>C:/Users/danny/Desktop/mfaktc-0.20/src/tf_96bit_base_math.cu(170): warning : parsing restarts here after previous syntax error
1>C:/Users/danny/Desktop/mfaktc-0.20/src/tf_96bit_base_math.cu(194): error : expected a declaration
1>C:/Users/danny/Desktop/mfaktc-0.20/src/tf_96bit_base_math.cu(281): warning : parsing restarts here after previous syntax error
1>C:/Users/danny/Desktop/mfaktc-0.20/src/tf_96bit_base_math.cu(304): error : expected a declaration
1>C:/Users/danny/Desktop/mfaktc-0.20/src/tf_96bit_base_math.cu(342): warning : parsing restarts here after previous syntax error
1>C:/Users/danny/Desktop/mfaktc-0.20/src/tf_96bit_base_math.cu(384): error : expected a declaration
1>C:/Users/danny/Desktop/mfaktc-0.20/src/tf_96bit_base_math.cu(418): warning : parsing restarts here after previous syntax error
1>C:/Users/danny/Desktop/mfaktc-0.20/src/tf_96bit_base_math.cu(453): error : expected a declaration
1>C:/Users/danny/Desktop/mfaktc-0.20/src/tf_96bit_base_math.cu(21): warning : function "cmp_ge_96" was declared but never referenced
1>C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\extras\visual_studio_integration\MSBuildExtensions\CUDA 5.0.targets(592,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\bin\nvcc.exe" -gencode=arch=compute_11,code=\"sm_11,compute_11\" --use-local-env --cl-version 2008 -ccbin "c:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include" -G --keep-dir "Debug" -maxrregcount=0 --machine 32 --compile -g -DWIN32 -D_DEBUG -D_WINDOWS -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd " -o "Debug\tf_96bit_base_math.cu.obj" "C:\Users\danny\Desktop\mfaktc-0.20\src\tf_96bit_base_math.cu"" exited with code 2.
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========[/QUOTE]

Anyone know what I'm doing wrong? For the record, I'm using the CUDA 5.0 toolkit.

Dubslow 2013-01-08 07:02

Looks like you're missing a header file.

James Heinrich 2013-01-08 13:45

[QUOTE=James Heinrich;323926]Now that everyone has access to v0.20, I'd like to ask for a new round of benchmarks from everyone so I can update my [url=http://www.mersenne.ca/mfaktc.php#benchmark]GPU-TF benchmark page[/url].[/QUOTE]Thanks for the 10 benchmarks I've received so far. Unfortunately they've all been in the GTX 5xx series (550, 560, 570, 580). I'd be very interested in benchmarks from people with 400- and 600-series cards, please.

TheJudger 2013-01-08 15:23

Hi,

[QUOTE=ixfd64;324018]OK, I've decided to try compiling mfaktc again. The error went away after I changed the code generation parameter to "compute_11,sm_11" as suggested. However, I'm getting a bunch of new errors:



Anyone know what I'm doing wrong? For the record, I'm using the CUDA 5.0 toolkit.[/QUOTE]

You should enable code generation for newer GPU types, too. This code will run faster on those cards.
The problem is that you are trying to compile a file that is only meant to be part of other files, so you can't compile tf_96bit_base_math.cu standalone. This file contains common code that is used by other .cu files (tf_96bit.cu, tf_barrett96.cu and tf_barrett96_gs.cu). Take a look at the Makefile to see those dependencies. I'm not using the Microsoft IDE, so I have no project file for you. I use GNU Make on Windows, too.
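To illustrate the point for an IDE project, here is a small sketch that works out which files should be handed to the compiler on their own. It assumes the shared code is pulled in by textual inclusion, as the post suggests; the `INCLUDES` table and the `translation_units` helper are hypothetical and list only the files named above, not the full mfaktc source tree.

```python
# Hypothetical model of the layout described in the post: each .cu file
# mapped to the fragments it is assumed to pull in. Only the file names
# mentioned in the post are listed; the real Makefile is authoritative.
INCLUDES = {
    "tf_96bit.cu": ["tf_96bit_base_math.cu"],
    "tf_barrett96.cu": ["tf_96bit_base_math.cu"],
    "tf_barrett96_gs.cu": ["tf_96bit_base_math.cu"],
    "tf_96bit_base_math.cu": [],
}


def translation_units(includes):
    """Return the files that should be compiled on their own, i.e. every
    file that is not itself included by another file."""
    fragments = {frag for deps in includes.values() for frag in deps}
    return sorted(name for name in includes if name not in fragments)


if __name__ == "__main__":
    for name in translation_units(INCLUDES):
        print("compile:", name)
```

Under these assumptions tf_96bit_base_math.cu never appears in the list, so in a Visual Studio project it would be excluded from the build rather than given the "CUDA C/C++" item type.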

Oliver

P.S. I plan to upgrade my Windows to CUDA 5.0 within the next few days so I can provide CUDA 5.0 executables, too.

TObject 2013-01-08 20:19

[QUOTE=TheJudger;323840]As usual: finish your current assignment and upgrade to mfaktc 0.20 after that.[/QUOTE]

It will take me a couple of months to complete some of my longer running assignments. So I would like to check again, is fiddling with checkpoint files strongly discouraged?

Thank you

TObject 2013-01-08 20:45

BTW, I just tried running 0.19 and 0.20 side by side: no problem, everything appears operational (although at a slight [about 5% at first glance] loss in overall efficiency).

The 0.20 is insanely fast.

Thank you very much.

LaurV 2013-01-09 07:59

[QUOTE=TObject;324071]The 0.20 is insanely fast.[/QUOTE]

Well, not really (LaurV grumpy now! :razz:)

Everybody seems to miss the fact that on the old 0.18 [B]you had to run more instances to max the GPU[/B]. Comparing the new one with the old one "side by side" is like comparing plums with mangoes: when they are green they look exactly the same except the size. Mangoes are 4 times bigger.

With the old version I was able to get 340-360 GHzDays/day from a single card, running 3 or 4 instances, in 3 (non-HT) or, respectively, 2 (HT) overclocked CPU cores.

With the new one I am able to get 390-410 GHzDays/day for the same exponent range and the same bit levels, with NO CPU participation.

You cannot get more from a card than it can give, apart from small optimizations. Mfaktc is now a (brilliant) mature product. Maybe small future optimizations will make it a bit better and a bit faster, but you shouldn't expect future versions to be 100 times faster. Or 10 times faster. Or even 3 times faster! Which I did not expect from 0.20 either, of course. It is just using the card better, for a small surplus of speed.
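As a quick check of how small that surplus is, the per-card figures quoted above can be compared directly. This is only a back-of-the-envelope sketch: the two ranges are the ones given in this post, and `gain_range` is a hypothetical helper, not a measurement tool.

```python
# Per-card throughput ranges quoted in the post, in GHzDays/day.
OLD_RANGE = (340.0, 360.0)  # old version, 3-4 instances plus CPU cores
NEW_RANGE = (390.0, 410.0)  # 0.20, single instance, no CPU participation


def gain_range(old, new):
    """Return the (smallest, largest) speed-up factor consistent with the
    two ranges: worst case is new-low vs old-high, best case the reverse."""
    smallest = new[0] / old[1]
    largest = new[1] / old[0]
    return smallest, largest


if __name__ == "__main__":
    low, high = gain_range(OLD_RANGE, NEW_RANGE)
    print(f"speed-up factor between {low:.2f}x and {high:.2f}x")
```

With these figures the factor lands somewhere between roughly 1.08 and 1.21, which is far from the 2-4 times suggested by a single-instance comparison.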

Of course, if you max out the card (like 97-100% busy) with a single instance, then such a run would be "insanely fast", theoretically 2-3-4 times faster than the old version for one instance. It is the same as using more cores in P95 to LL/DC the same exponent: the time per iteration halves, or is 3-4 times shorter (and the LL test faster), depending on how many cores you use.

Put 3-4 instances of the new mfaktc on the same card, and you will see that the times are comparable. The old one was losing time on CPU/GPU communication, which is "solved" by GPU sieving in the new version. That is where the "additional" speed comes from (plus other small things :razz:).

[B]The biggest advantage[/B] of the new version (as I repeatedly said in the past when we were talking about what I want and what we should expect from newer versions) is that [B]IT LEAVES YOUR CPU FREE[/B], besides the fact that it is "a little bit" faster :razz:.

For me, this (leaving the CPU free) is the [URL="http://en.wikipedia.org/wiki/Manna"]manna from heaven[/URL]! (As everybody knows, my systems are all CPU-bottlenecked.) Now I can run P-1, LL, DC or aliquots on the CPU, which I could not do before. THIS IS THE BIG ADVANTAGE. For which I bow again to the people who made this possible. :bow:

