mersenneforum.org  

Old 2013-01-08, 00:12   #2047
Dubslow
Basketry That Evening!
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

1C35₁₆ Posts

Quote:
Originally Posted by kracker View Post
I see. So it's not really a literal "max" it's just "over this, gains almost useless unless.."
Well... sort of. Like I said, the "gains almost useless" point is (quite a bit) lower than the max he mentions; the max refers to the memory required to process the "standard" amount of relative primes in one pass (where "standard" is a hand-waving over-simplification).
Old 2013-01-08, 00:46   #2048
Xyzzy
 
"Mike"
Aug 2002

2×23×179 Posts

Quote:
So it's not really a literal "max" it's just "over this, gains almost useless unless.."
http://www.mersenneforum.org/showpos...5&postcount=10
Old 2013-01-08, 00:56   #2049
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

6535₈ Posts

Quote:
Originally Posted by kracker View Post
I see. So it's not really a literal "max" it's just "over this, gains almost useless unless.."
For any given bounds, there is a certain amount of RAM required to run a given number of relative primes at once. Normally Prime95 runs several passes, each with as many RPs as it has RAM for, to complete a full set of 480 relative primes. I don't believe Prime95 will let you run P-1 if you don't have enough RAM to run at least 8 RPs at once (hence the "minimum" value). Each pass has some (small) overhead, so fewer passes means a slightly faster run. The "maximum" value represents running all 480 RPs in one pass. Under certain semi-rare conditions, Prime95 will select a number of relative primes other than 480, but that's the "normal" value.

However, the P-1 bounds are partially selected based on the amount of RAM available, so a machine with 512MB allocated and another with 20GB allocated won't pick the same bounds for the same exponent. The one with more RAM will pick higher bounds, run a little slower, but have a higher chance of a factor. If they were forced to use the same bounds, the more-RAM machine would run the assignment slightly faster.
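
To make the pass arithmetic above concrete, here is a minimal sketch. It is not Prime95's code. The function names are made up for illustration, and tying the figure 480 to the count of residues below 2310 = 2·3·5·7·11 that are coprime to it is an assumption of this sketch, not something stated in the post.

```python
from math import gcd, ceil

def count_relative_primes(d: int) -> int:
    """Count the integers in [1, d) that share no factor with d."""
    return sum(1 for k in range(1, d) if gcd(k, d) == 1)

def passes_needed(total_rps: int, rps_per_pass: int) -> int:
    """Passes required when only rps_per_pass relative primes fit in RAM at once."""
    if rps_per_pass <= 0:
        raise ValueError("rps_per_pass must be positive")
    return ceil(total_rps / rps_per_pass)

# Assumption: the "normal" 480 corresponds to residues coprime to 2310.
TOTAL_RPS = count_relative_primes(2310)

# 8 is the "minimum" case from the post, 480 the "maximum" (single pass) case.
for fit in (8, 60, 240, 480):
    print(f"{fit:3d} RPs at once -> {passes_needed(TOTAL_RPS, fit)} pass(es)")
```

With 8 RPs at a time this gives 60 passes, and with all 480 it gives one. That is the sense in which the "maximum" is the point past which extra RAM no longer reduces the pass count for the same bounds.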
Old 2013-01-08, 01:40   #2050
Dubslow
Basketry That Evening!
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3·29·83 Posts

Quote:
Originally Posted by James Heinrich View Post
run a little slower, but have a higher chance of a factor.
...the end result being that you get more factors per unit of CPU time.
Old 2013-01-08, 06:59   #2051
ixfd64
Bemusing Prompter
 
"Danny"
Dec 2002
California

95B₁₆ Posts

Quote:
Originally Posted by ixfd64 View Post
I've set up my CUDA environment, but I get the following errors when I try to compile mfaktc 0.19: [see attachment]

Anyone know what I'm doing wrong?

Edit: I've changed the item type to CUDA C/C++ and the platform to VC90, and I've also installed Visual C++ 2008. However, it's still complaining of an issue with the "atomicInc" function. Anyone know how to resolve this?
OK, I've decided to try compiling mfaktc again. The error went away after I changed the code generation parameter to "compute_11,sm_11" as suggested. However, I'm getting a bunch of new errors:

Quote:
1>------ Build started: Project: mfaktc_0.20, Configuration: Debug Win32 ------
1> Compiling CUDA source file tf_96bit_base_math.cu...
1>
1> C:\Users\danny\Desktop\mfaktc-0.20\src>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\bin\nvcc.exe" -gencode=arch=compute_11,code=\"sm_11,compute_11\" --use-local-env --cl-version 2008 -ccbin "c:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include" -G --keep-dir "Debug" -maxrregcount=0 --machine 32 --compile -g -DWIN32 -D_DEBUG -D_WINDOWS -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd " -o "Debug\tf_96bit_base_math.cu.obj" "C:\Users\danny\Desktop\mfaktc-0.20\src\tf_96bit_base_math.cu"
1>C:/Users/danny/Desktop/mfaktc-0.20/src/tf_96bit_base_math.cu(21): error : identifier "int96" is undefined
1>C:/Users/danny/Desktop/mfaktc-0.20/src/tf_96bit_base_math.cu(21): error : identifier "int96" is undefined
1>C:/Users/danny/Desktop/mfaktc-0.20/src/tf_96bit_base_math.cu(33): error : incomplete type is not allowed
1>C:/Users/danny/Desktop/mfaktc-0.20/src/tf_96bit_base_math.cu(33): error : identifier "int96" is undefined
1>C:/Users/danny/Desktop/mfaktc-0.20/src/tf_96bit_base_math.cu(33): error : identifier "a" is undefined
1>C:/Users/danny/Desktop/mfaktc-0.20/src/tf_96bit_base_math.cu(35): error : expected a ";"
1>C:/Users/danny/Desktop/mfaktc-0.20/src/tf_96bit_base_math.cu(170): warning : parsing restarts here after previous syntax error
1>C:/Users/danny/Desktop/mfaktc-0.20/src/tf_96bit_base_math.cu(194): error : expected a declaration
1>C:/Users/danny/Desktop/mfaktc-0.20/src/tf_96bit_base_math.cu(281): warning : parsing restarts here after previous syntax error
1>C:/Users/danny/Desktop/mfaktc-0.20/src/tf_96bit_base_math.cu(304): error : expected a declaration
1>C:/Users/danny/Desktop/mfaktc-0.20/src/tf_96bit_base_math.cu(342): warning : parsing restarts here after previous syntax error
1>C:/Users/danny/Desktop/mfaktc-0.20/src/tf_96bit_base_math.cu(384): error : expected a declaration
1>C:/Users/danny/Desktop/mfaktc-0.20/src/tf_96bit_base_math.cu(418): warning : parsing restarts here after previous syntax error
1>C:/Users/danny/Desktop/mfaktc-0.20/src/tf_96bit_base_math.cu(453): error : expected a declaration
1>C:/Users/danny/Desktop/mfaktc-0.20/src/tf_96bit_base_math.cu(21): warning : function "cmp_ge_96" was declared but never referenced
1>C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\extras\visual_studio_integration\MSBuildExtensions\CUDA 5.0.targets(592,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\bin\nvcc.exe" -gencode=arch=compute_11,code=\"sm_11,compute_11\" --use-local-env --cl-version 2008 -ccbin "c:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include" -G --keep-dir "Debug" -maxrregcount=0 --machine 32 --compile -g -DWIN32 -D_DEBUG -D_WINDOWS -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd " -o "Debug\tf_96bit_base_math.cu.obj" "C:\Users\danny\Desktop\mfaktc-0.20\src\tf_96bit_base_math.cu"" exited with code 2.
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
Anyone know what I'm doing wrong? For the record, I'm using the CUDA 5.0 toolkit.
Old 2013-01-08, 07:02   #2052
Dubslow
Basketry That Evening!
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

16065₈ Posts

Looks like you're missing a header file.
Old 2013-01-08, 13:45   #2053
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

11×311 Posts

Quote:
Originally Posted by James Heinrich View Post
Now that everyone has access to v0.20, I'd like to ask for a new round of benchmarks from everyone so I can update my GPU-TF benchmark page.
Thanks for the 10 benchmarks I've received so far. Unfortunately they've all been in the GTX 5xx series (550, 560, 570, 580). I'd be very interested in benchmarks from people with 400- and 600-series cards, please.
Old 2013-01-08, 15:23   #2054
TheJudger
 
"Oliver"
Mar 2005
Germany

457₁₆ Posts

Hi,

Quote:
Originally Posted by ixfd64 View Post
OK, I've decided to try compiling mfaktc again. The error went away after I changed the code generation parameter to "compute_11,sm_11" as suggested. However, I'm getting a bunch of new errors:



Anyone know what I'm doing wrong? For the record, I'm using the CUDA 5.0 toolkit.
You should enable code generation for newer GPU types, too; the code will run faster on those cards.
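
As a rough sketch of what "code generation for newer GPU types" means on the nvcc command line: each target gets its own -gencode option, in the same arch/code form that appears in the build log quoted earlier. The helper name and the target list below are hypothetical; which targets mfaktc's makefile actually lists is not shown in this thread, and the installed toolkit has to support whatever is requested.

```python
def gencode_flags(compute_capabilities):
    """Build one nvcc -gencode option per compute capability, e.g. "2.0" -> sm_20."""
    flags = []
    for cc in compute_capabilities:
        digits = cc.replace(".", "")
        if not digits.isdigit():
            raise ValueError(f"bad compute capability: {cc!r}")
        flags.append(f"-gencode=arch=compute_{digits},code=sm_{digits}")
    return flags

# Hypothetical target list: 1.1 from the quoted post plus two newer generations.
TARGETS = ["1.1", "2.0", "3.0"]
print(" ".join(gencode_flags(TARGETS)))
```

In the Visual Studio CUDA settings, the equivalent is listing several compute/sm pairs in the code generation field rather than only "compute_11,sm_11".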
The problem is that you are trying to compile a file that is only part of another file, so you can't compile tf_96bit_base_math.cu standalone. This file contains common code that is used by other .cu files (tf_96bit.cu, tf_barrett96.cu and tf_barrett96_gs.cu). Take a look at the makefile to see those dependencies. I'm not using the Microsoft IDE, so I have no project file for you; I use GNU Make on Windows, too.
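
For anyone building an IDE project by hand, one way to find such "fragment" files is to scan the sources for .cu files that other .cu files pull in. The sketch below assumes the sharing is done with a plain #include "....cu" directive, which this thread does not confirm, and the demo uses made-up file contents rather than the real mfaktc sources. The function name is hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Matches lines such as: #include "tf_96bit_base_math.cu"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+\.cu)"', re.MULTILINE)

def split_sources(src_dir):
    """Return (standalone, fragments): .cu files to compile vs. ones included by others."""
    cu_files = sorted(Path(src_dir).glob("*.cu"))
    included = set()
    for path in cu_files:
        text = path.read_text(errors="replace")
        included.update(Path(name).name for name in INCLUDE_RE.findall(text))
    standalone = [p.name for p in cu_files if p.name not in included]
    fragments = [p.name for p in cu_files if p.name in included]
    return standalone, fragments

# Demo on a throwaway directory with invented contents (not the real sources).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "tf_96bit_base_math.cu").write_text("/* shared helpers */\n")
    Path(tmp, "tf_96bit.cu").write_text('#include "tf_96bit_base_math.cu"\n')
    Path(tmp, "tf_barrett96.cu").write_text('#include "tf_96bit_base_math.cu"\n')
    standalone, fragments = split_sources(tmp)
    print("compile:", standalone)
    print("exclude from build:", fragments)
```

Files reported as fragments would then be kept in the project for reference but excluded from the build, so that only the files that include them are handed to nvcc.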

Oliver

P.S. I plan to upgrade my Windows to CUDA 5.0 within the next few days so I can provide CUDA 5.0 executables, too.

Last fiddled with by TheJudger on 2013-01-08 at 15:24
Old 2013-01-08, 20:19   #2055
TObject
 
Feb 2012

195₁₆ Posts

Quote:
Originally Posted by TheJudger View Post
As usual: finish your current assignment and upgrade to mfaktc 0.20 after that.
It will take me a couple of months to complete some of my longer-running assignments. So I would like to check again: is fiddling with checkpoint files strongly discouraged?

Thank you
Old 2013-01-08, 20:45   #2056
TObject
 
Feb 2012

3⁴·5 Posts
Thumbs up

BTW, I just tried running 0.19 and 0.20 side by side: no problem; everything appears operational (although at a slight loss in overall efficiency, about 5% at first glance).

The 0.20 is insanely fast.

Thank you very much.
Old 2013-01-09, 07:59   #2057
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

10010110110111₂ Posts

Quote:
Originally Posted by TObject View Post
The 0.20 is insanely fast.
Well, not really (LaurV grumpy now!)

Everybody seems to miss the fact that on the old 0.18 you had to run more instances to max the GPU. Comparing the new one with the old one "side by side" is like comparing plums with mangoes: when they are green they look exactly the same except the size. Mangoes are 4 times bigger.

With the old version I was able to get 340-360 GHzDays/day from a single card, running 3 or 4 instances on 3 (non-HT) or 2 (HT) overclocked CPU cores, respectively.

With the new one I am able to get 390-410 GHzDays/Day for the same exponent range and the same bit levels, with NO CPU participation.
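
Putting those two ranges side by side as plain arithmetic (the figures are the ones quoted above; the helper names are only for illustration):

```python
OLD_RANGE = (340.0, 360.0)  # GHzDays/day, old version, 3-4 instances plus CPU cores
NEW_RANGE = (390.0, 410.0)  # GHzDays/day, 0.20, no CPU participation

def midpoint(rng):
    """Middle of a (low, high) range."""
    return sum(rng) / 2

def percent_gain(old, new):
    """Relative throughput gain of new over old, in percent."""
    return (new / old - 1.0) * 100.0

typical = percent_gain(midpoint(OLD_RANGE), midpoint(NEW_RANGE))
low = percent_gain(OLD_RANGE[1], NEW_RANGE[0])   # best old vs. worst new
high = percent_gain(OLD_RANGE[0], NEW_RANGE[1])  # worst old vs. best new
print(f"gain: about {typical:.0f}% (between {low:.0f}% and {high:.0f}%)")
```

So the whole-card gain is on the order of 8% to 21%, not a multiple, once the old version is given enough instances to saturate the GPU.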

You cannot get more from a card than it can give, apart from small optimizations. Mfaktc is now a (brilliant) mature product; maybe small future optimizations will make it a bit better and a bit faster, but don't expect future versions to be 100 times faster. Or 10 times faster. Or even 3 times faster! Which I did not expect from 0.20 either, of course. It is just using the card better, for a small surplus of speed.

Of course, if you max out the card (97-100% busy) with a single instance, then such a run would be "insanely fast", theoretically 2-4 times faster than one instance of the old version. It is the same as using more cores in P95 to LL/DC the same exponent: the time per iteration halves, or gets 3-4 times shorter (and the LL test finishes faster), depending on how many cores you use.

Put 3-4 instances of the new mfaktc on the same card, and you will see that the times are comparable. The old one was losing time on CPU/GPU communication, which is "solved" by GPU sieving in the new version. That is where the "additional" speed comes from (plus other small things).

The biggest advantage of the new version (as I repeatedly said in the past when we were talking about what I wanted and what we should expect from newer versions) is that IT LEAVES YOUR CPU FREE, besides the fact that it is "a little bit" faster.

For me, this (leaving the CPU free) is manna from heaven! (As everybody knows, my systems are all CPU-bottlenecked.) Now I can run P-1, or LL, or DC, or aliquots on the CPU, which I could not before. THIS IS THE BIG ADVANTAGE. For which I bow again to the people who made this possible.

Last fiddled with by LaurV on 2013-01-09 at 08:18

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.