|
|
#243 |
|
Jun 2005
81₁₆ Posts |
Thanks Luigi, glad to see that it at least works for one other person.
Henry - we'll need more detail on your system and what isn't working. I don't know of any specific limits based on GPU type, but I'm just building the code that Oliver's written so he might have a better idea. |
|
|
|
|
|
#244 | |
|
Banned
"Luigi"
Aug 2002
Team Italia
61·79 Posts |
Quote:
If you are running under 32-bit, there could be a bug in the parsing routine that I developed...
- What kind of error do you get?
- Does the executable ever start?
- Does the CUDA-related printout show?
- Does the line tf(exponent, bit_min, bit_max) show correctly?
Luigi |
|
|
|
|
|
|
#245 | |
|
"Oliver"
Mar 2005
Germany
11·101 Posts |
Hi David,
Quote:
As long as your GPU has compute capability >= 1.1 it should work. Exponents must be < 2^32 (not depending on GPU!)
---
Kevin:
- did you increase THREADS_PER_BLOCK to 512? (from Luigi's output I think you did)
- did you compile the code without '--maxrregcount=16'?
If this is the case: throw away the current Windows binary and ignore all "no factor" results from this binary.
Both together is a bad idea: older GPUs will run out of registers and do only half of the work! It seems to run twice as fast on those GPUs, but it actually processes only half of the dataset... This will work fine on GT200 but not on older GPUs.
I recommend THREADS_PER_BLOCK = 256 and compiling with '--maxrregcount=16'!
Oliver
Last fiddled with by TheJudger on 2010-06-01 at 16:20 |
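A rough way to see why the two settings interact is to count registers per block. The sketch below assumes the register file sizes NVIDIA documents for compute capability 1.x (8192 registers per multiprocessor for 1.0/1.1, 16384 for 1.2/1.3, i.e. GT200) and ignores allocation granularity. The helper `block_fits` and the 20-registers-per-thread figure are hypothetical, chosen only for illustration.

```python
# 32-bit registers available per multiprocessor, by compute capability.
# 1.2/1.3 corresponds to GT200; 1.0/1.1 covers the older GPUs in this thread.
REGISTERS_PER_SM = {"1.0": 8192, "1.1": 8192, "1.2": 16384, "1.3": 16384}


def block_fits(compute_capability: str, threads_per_block: int, regs_per_thread: int) -> bool:
    """Return True if one block's registers fit into a multiprocessor's register file.

    Simplified model: real hardware rounds allocations up, so this is optimistic.
    """
    needed = threads_per_block * regs_per_thread
    return needed <= REGISTERS_PER_SM[compute_capability]


if __name__ == "__main__":
    # Recommended setting: 256 threads, capped at 16 registers per thread.
    print(block_fits("1.1", 256, 16))
    # 512 threads with a hypothetical uncapped kernel using 20 registers per thread.
    print(block_fits("1.1", 512, 20))
    # The same launch on a GT200-class GPU.
    print(block_fits("1.3", 512, 20))
```

Under this model, 512 threads only fit on a compute capability 1.1 device if each thread uses at most 16 registers, which is consistent with the recommendation to keep '--maxrregcount=16'.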
|
|
|
|
|
|
#246 |
|
Banned
"Luigi"
Aug 2002
Team Italia
61×79 Posts |
I have just benchmarked this executable against Prime95 on exponent 130631869, 63-64 bits, getting 241" on mfaktc and 493" on Prime95_64 25.11
Luigi |
|
|
|
|
|
#247 | |
|
Banned
"Luigi"
Aug 2002
Team Italia
61·79 Posts |
Quote:
THREADS_PER_BLOCK = 256 in both 0.06 and 0.07.
Code:
C:\Users\adm\Documents\luigi\mfaktc>mfaktc-hack-64.exe 3321928097 1 7
mfaktc v0.06
Compiletime Options
  THREADS_PER_GRID 983040
  THREADS_PER_BLOCK 256
  SIEVE_SIZE_LIMIT 32kiB
  SIEVE_SIZE 230945bits
  USE_PINNED_MEMORY enabled
  USE_ASYNC_COPY enabled
  VERBOSE_TIMING disabled
  SELFTEST disabled
  MORE_CLASSES disabled
Runtime Options
  SievePrimes 55000
  SievePrimesAdjust 1
  WARNING: Cannot read CudaStreams from mfaktc.ini, using default value
  CudaStreams 2
CUDA device info
  name: GeForce 9500M GS
  compute capabilities: 1.1
  maximum threads per block: 512
  number of multiprocessors: 4 (32 shader cores)
  clock rate: 950MHz
tf(3321928097, 1, 71);
k_min = 0
k_max = 355393490239
Last fiddled with by ET_ on 2010-06-01 at 16:33 |
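The k_min and k_max values in this log follow from the fact that any factor of 2^p - 1 has the form 2kp + 1, so a bit range on the factor translates into a range on k. The sketch below assumes the bounds are simply floor(2^bits / (2p)). The function name `k_bounds` is hypothetical, and mfaktc itself may additionally align k to its class boundaries.

```python
def k_bounds(exponent: int, bit_min: int, bit_max: int) -> tuple[int, int]:
    """Return (k_min, k_max) such that factor candidates 2*k*exponent + 1
    cover roughly the range 2**bit_min .. 2**bit_max.

    Assumption: plain floor division, no alignment to sieve classes.
    """
    if exponent <= 0 or bit_min < 0 or bit_max < bit_min:
        raise ValueError("need exponent > 0 and 0 <= bit_min <= bit_max")
    k_min = (1 << bit_min) // (2 * exponent)
    k_max = (1 << bit_max) // (2 * exponent)
    return k_min, k_max


if __name__ == "__main__":
    # Values taken from the tf(...) line in the log above.
    print(k_bounds(3321928097, 1, 71))
```

For this run, `k_bounds(3321928097, 1, 71)` gives `(0, 355393490239)`, matching the printed k_min and k_max.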
|
|
|
|
|
|
#248 |
|
"Oliver"
Mar 2005
Germany
11×101 Posts |
|
|
|
|
|
|
#249 |
|
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2·3³·109 Posts |
Win-64 using kjaget's binary
Code:
mfaktc v0.07
Compiletime Options
  THREADS_PER_GRID 983040
  THREADS_PER_BLOCK 256
  SIEVE_SIZE_LIMIT 32kiB
  SIEVE_SIZE 230945bits
  VERBOSE_TIMING disabled
  SELFTEST disabled
  MORE_CLASSES disabled
Runtime Options
  WARNING: Read SievePrimes=250000 from mfaktc.ini, using max value (100000)
  SievePrimes 100000
  SievePrimesAdjust 1
  NumStreams 5
  WorkFile worktodo.txt
CUDA device info
  name: GeForce 8600 GTS
  compute capabilities: 1.1
  maximum threads per block: 512
  number of multiprocessors: 4 (32 shader cores)
  clock rate: 1450MHz
After more tests I have discovered that anything above M31 fails to parse. It crashes whenever it does something from worktodo.txt.
I am sure I used to use a higher SievePrimes a while back, since my graphics card is so slow. Why has the limit been set so low?
Also, I would like to test exponents < 1 million, so if the next binary to be posted could have that limit removed, I will test that. I only know a 6-digit prime from memory and would like to be able to use it for tests without having to look up a prime.
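One plausible reason for a cutoff exactly at M31 (an assumption, since the thread does not show the parsing code) is that the command-line value passes through a signed 32-bit integer, which tops out at 2^31 - 1 = 2147483647. The Python sketch below mimics that wraparound and adds a range check matching the stated limit of exponents < 2^32. Both helper names are hypothetical.

```python
def as_int32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def parse_exponent(text: str) -> int:
    """Parse a decimal exponent and accept only 0 < p < 2**32."""
    p = int(text, 10)
    if not 0 < p < 2**32:
        raise ValueError(f"exponent out of range: {p}")
    return p


if __name__ == "__main__":
    # 2**31 - 1 survives a signed 32-bit round trip unchanged.
    print(as_int32(2147483647))
    # The exponent from this thread does not: it comes back negative.
    print(as_int32(3321928097))
    # Parsing into an unsigned range keeps the value intact.
    print(parse_exponent("3321928097"))
```

If the real parser behaves like `as_int32`, any exponent above 2147483647 would turn negative before it reaches the factoring code, which would fit the symptom described here.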
|
|
|
|
|
|
#250 | |
|
Banned
"Luigi"
Aug 2002
Team Italia
61·79 Posts |
Quote:
Code:
Factor=bla,3321928097,1 69
Luigi |
|
|
|
|
|
|
#251 | |
|
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2×3³×109 Posts |
Quote:
I also had to add the highlighted comma to your post. Even the large exponent worked that way. Only the command-line parsing fails above MM31. |
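The missing comma matters because the worktodo line appears to be a strict comma-separated record. The sketch below assumes a four-field layout, `Factor=<id>,<exponent>,<bit_min>,<bit_max>`, inferred from the line posted in this thread. `parse_factor_line` is a hypothetical helper, not mfaktc's actual parser.

```python
def parse_factor_line(line: str) -> tuple[str, int, int, int]:
    """Parse 'Factor=<id>,<exponent>,<bit_min>,<bit_max>'.

    Assumed layout, inferred from the thread; raises ValueError on anything else.
    """
    key, sep, rest = line.strip().partition("=")
    if key != "Factor" or not sep:
        raise ValueError("line must start with 'Factor='")
    fields = rest.split(",")
    if len(fields) != 4:
        raise ValueError(f"expected 4 comma-separated fields, got {len(fields)}")
    assignment_id, exponent, bit_min, bit_max = fields
    return assignment_id, int(exponent), int(bit_min), int(bit_max)


if __name__ == "__main__":
    # With the comma added between the two bit levels, the line parses.
    print(parse_factor_line("Factor=bla,3321928097,1,69"))
    # As originally posted, "1 69" is a single field, so the line is rejected.
    try:
        parse_factor_line("Factor=bla,3321928097,1 69")
    except ValueError as err:
        print("rejected:", err)
```

A parser that rejects malformed lines with a clear message is easier to debug than one that crashes, which is the behaviour reported for worktodo.txt earlier in the thread.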
|
|
|
|
|
|
#252 | |
|
Jun 2005
129₁₀ Posts |
Quote:
Will that combination cause problems on older GPUs, or will it only happen if THREADS_PER_BLOCK is 512 and the nvcc option isn't included? Sorry about the confusion - hopefully I managed to luck out and not cause any problems here. Maybe it would be a good idea for me to build a self-test version and distribute that as well, just to make sure everything is working?
Last fiddled with by kjaget on 2010-06-01 at 19:40 |
|
|
|
|
|
|
#253 |
|
Jun 2005
81₁₆ Posts |
More info. When building the .cu file, I see this message:
Code:
nvcc -m64 -O2 -c tf_72bit.cu --ptxas-options=-v -ccbin="C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin" -DWIN64 -Xcompiler /EHsc,W3,/nologo,/Ox,/GL
tf_72bit.cu
tmpxft_00000588_00000000-3_tf_72bit.cudafe1.gpu
tmpxft_00000588_00000000-8_tf_72bit.cudafe2.gpu
ptxas info : Compiling entry function '_Z5mfaktj5int72Pji6int144S0_'
ptxas info : Used 16 registers, 80+72 bytes smem, 48 bytes cmem[1]
tmpxft_00000588_00000000-3_tf_72bit.cudafe1.cpp
tmpxft_00000588_00000000-13_tf_72bit.ii |
|
|
|