Thanks Luigi, glad to see that at least it works for one other person.
Henry - we'll need more detail on your system and on what isn't working. I don't know of any specific limits based on GPU type, but I'm just building the code that Oliver wrote, so he might have a better idea.
[QUOTE=henryzz;216929]Whats the exponent limit on old cards(8600 GTS)? This binary didn't work with billion digit exponents although it did work with M165150761.[/QUOTE]
Are you running on 32-bit or 64-bit Windows? If you are running under 32-bit, there could be a bug in the parsing routine that I developed...
- What kind of error do you get?
- Does the executable ever start?
- Does the CUDA-related printout show?
- Does the line tf(exponent, bit_min, bit_max) show correctly?
Luigi
Hi David,
[QUOTE=henryzz;216929]Whats the exponent limit on old cards(8600 GTS)? This binary didn't work with billion digit exponents although it did work with M165150761.[/QUOTE]
A bit more specific, please! As long as your GPU has compute capability >= 1.1 it should work. Exponents must be < 2^32 (regardless of the GPU!)
---
Kevin:
- did you increase THREADS_PER_BLOCK to 512? (from Luigi's output I think you did)
- did you compile the code without '--maxrregcount=16'?
[B]If this is the case: throw away the current Windows binary and ignore all "no factor" results from this binary.[/B] :sad:
Both together is a bad idea: older GPUs will run out of registers and do only half of the work! It seems to run twice as fast on those GPUs, but actually it processes only half of the dataset... This will work fine on GT200 but not on older GPUs.
I recommend THREADS_PER_BLOCK = 256 and compiling with '--maxrregcount=16'!
Oliver
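Oliver's warning comes down to register-file arithmetic. A minimal sketch, assuming the per-multiprocessor register counts from NVIDIA's CUDA programming guide (8192 for compute capability 1.0/1.1 parts such as the 8600 GTS and 9500M GS, 16384 for 1.2/1.3 parts such as GT200) and ignoring register allocation granularity: at 16 registers per thread a 512-thread block needs exactly 8192 registers, so a kernel built with 17 or more registers per thread cannot launch 512-thread blocks on the older cards, while GT200 still has room. `block_fits` and `regs_per_sm` are hypothetical names, not part of mfaktc.

```c
#include <stdbool.h>

/* Registers per multiprocessor by compute capability (values from the
   CUDA programming guide). Returns 0 for capabilities not modelled here. */
static unsigned long regs_per_sm(int cc_major, int cc_minor)
{
    if (cc_major == 1)
        return cc_minor <= 1 ? 8192ul : 16384ul; /* G80/G92 vs. GT200 */
    if (cc_major == 2)
        return 32768ul;                          /* Fermi */
    return 0;
}

/* Hypothetical check: can one block of threads_per_block threads, each using
   regs_per_thread registers, be resident on a single multiprocessor?
   First-order estimate only: allocation granularity is ignored. */
bool block_fits(int cc_major, int cc_minor,
                unsigned threads_per_block, unsigned regs_per_thread)
{
    unsigned long needed = (unsigned long)threads_per_block * regs_per_thread;
    return needed <= regs_per_sm(cc_major, cc_minor);
}
```

For example, `block_fits(1, 1, 512, 16)` holds, `block_fits(1, 1, 512, 17)` does not, and `block_fits(1, 3, 512, 17)` holds again. A block that does not fit makes the kernel launch fail rather than run slowly, which is why an unchecked launch can look like a speed-up.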
I had just benchmarked this executable against Prime95 on exponent 130631869, 63-64 bits, getting 241 s with mfaktc and 493 s with Prime95_64 25.11.
Luigi
[QUOTE=TheJudger;216942]Hi David,
A bit more specific, please! As long as your GPU has compute capability >= 1.1 it should work. Exponents must be < 2^32 (regardless of the GPU!)
---
Kevin:
- did you increase THREADS_PER_BLOCK to 512? (from Luigi's output I think you did)
- did you compile the code without '--maxrregcount=16'?
[B]If this is the case: throw away the current Windows binary and ignore all "no factor" results from this binary.[/B] :sad:
Both together is a bad idea: older GPUs will run out of registers and do only half of the work! It seems to run twice as fast on those GPUs, but actually it processes only half of the dataset... This will work fine on GT200 but not on older GPUs.
I recommend THREADS_PER_BLOCK = 256 and compiling with '--maxrregcount=16'!
Oliver[/QUOTE]
Note that the previous Windows executable also reported [COLOR="Red"]maximum[/COLOR] threads per block = 512 (that is the device limit)... THREADS_PER_BLOCK is 256 in both 0.06 and 0.07.
[code]
C:\Users\adm\Documents\luigi\mfaktc>mfaktc-hack-64.exe 3321928097 1 7
mfaktc v0.06

Compiletime Options
  THREADS_PER_GRID    983040
  THREADS_PER_BLOCK   256
  SIEVE_SIZE_LIMIT    32kiB
  SIEVE_SIZE          230945bits
  USE_PINNED_MEMORY   enabled
  USE_ASYNC_COPY      enabled
  VERBOSE_TIMING      disabled
  SELFTEST            disabled
  MORE_CLASSES        disabled

Runtime Options
  SievePrimes         55000
  SievePrimesAdjust   1
WARNING: Cannot read CudaStreams from mfaktc.ini, using default value
  CudaStreams         2

CUDA device info
  name:                      GeForce 9500M GS
  compute capabilities:      1.1
  maximum threads per block: 512
  number of multiprocessors: 4 (32 shader cores)
  clock rate:                950MHz

tf(3321928097, 1, 71);
 k_min = 0
 k_max = 355393490239
[/code]
Luigi
[QUOTE=ET_;216949]Note that also previous Windows executable had [COLOR="Red"]maximum[/COLOR] threads per block = 512...
THREADS_PER_BLOCK =256 in both 0.06 and 0.07. Luigi[/QUOTE]
OK, my fault. Let's wait for David's and Kevin's information.
Oliver
Win-64 using kjaget's binary
[code]
mfaktc v0.07

Compiletime Options
  THREADS_PER_GRID    983040
  THREADS_PER_BLOCK   256
  SIEVE_SIZE_LIMIT    32kiB
  SIEVE_SIZE          230945bits
  VERBOSE_TIMING      disabled
  SELFTEST            disabled
  MORE_CLASSES        disabled

Runtime Options
WARNING: Read SievePrimes=250000 from mfaktc.ini, using max value (100000)
  SievePrimes         100000
  SievePrimesAdjust   1
  NumStreams          5
  WorkFile            worktodo.txt

CUDA device info
  name:                      GeForce 8600 GTS
  compute capabilities:      1.1
  maximum threads per block: 512
  number of multiprocessors: 4 (32 shader cores)
  clock rate:                1450MHz
[/code]
With a billion-digit exponent, it fails to parse the exponent from the command line. After more tests I have discovered that it fails to parse anything above M31. It crashes whenever it runs something from worktodo.txt.
I am sure I used to use higher SievePrimes values a while back, since my graphics card is so slow. Why has the limit been lowered so much?
Also, I would like to test exponents below 1 million, so if the next binary to be posted could have that limit removed I will test that. I only know a 6-digit prime from memory and would like to be able to use it for tests without having to look up a prime. :smile:
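The thread doesn't show mfaktc's parsing code, so this is only a guess at one common cause of a cutoff at M31 = 2^31 - 1: reading the exponent into a signed 32-bit type, for example with `atoi`, or with `strtol` on Windows, where `long` stays 32 bits even in 64-bit builds. A minimal sketch of a parser that accepts the whole range Oliver gives (exponents below 2^32); `parse_exponent` is a hypothetical name and the lower bound of 2 is an arbitrary assumption.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parser: accepts a decimal exponent in [2, 2^32 - 1].
   strtoull is used because 'unsigned long long' is at least 64 bits
   everywhere, while 'long' is only 32 bits on Windows. */
int parse_exponent(const char *s, uint32_t *out)
{
    char *end;
    unsigned long long v;

    if (s == NULL || *s < '0' || *s > '9')   /* reject empty, sign, spaces */
        return -1;
    errno = 0;
    v = strtoull(s, &end, 10);
    if (errno == ERANGE || *end != '\0')     /* overflow or trailing junk */
        return -1;
    if (v < 2 || v > UINT32_MAX)             /* exponents must be < 2^32 */
        return -1;
    *out = (uint32_t)v;
    return 0;
}
```

With this, `parse_exponent("3321928097", &e)` succeeds, while `"4294967296"` (2^32) is rejected.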
[QUOTE=henryzz;216962]Win-64 using kjaget's binary
[code]
mfaktc v0.07

Compiletime Options
  THREADS_PER_GRID    983040
  THREADS_PER_BLOCK   256
  SIEVE_SIZE_LIMIT    32kiB
  SIEVE_SIZE          230945bits
  VERBOSE_TIMING      disabled
  SELFTEST            disabled
  MORE_CLASSES        disabled

Runtime Options
WARNING: Read SievePrimes=250000 from mfaktc.ini, using max value (100000)
  SievePrimes         100000
  SievePrimesAdjust   1
  NumStreams          5
  WorkFile            worktodo.txt

CUDA device info
  name:                      GeForce 8600 GTS
  compute capabilities:      1.1
  maximum threads per block: 512
  number of multiprocessors: 4 (32 shader cores)
  clock rate:                1450MHz
[/code]
With a billion-digit exponent, it fails to parse the exponent from the command line. After more tests I have discovered that it fails to parse anything above M31. It crashes whenever it runs something from worktodo.txt.
I am sure I used to use higher SievePrimes values a while back, since my graphics card is so slow. Why has the limit been lowered so much?
Also, I would like to test exponents below 1 million, so if the next binary to be posted could have that limit removed I will test that. I only know a 6-digit prime from memory and would like to be able to use it for tests without having to look up a prime. :smile:[/QUOTE]
Did you put the factor into the worktodo.txt file, like [code]Factor=bla,3321928097,1 69[/code]?
Luigi
[quote=ET_;216976]Did you put the factor into the worktodo.txt file, like
[code]Factor=bla,3321928097,1[COLOR=Red],[/COLOR]69[/code]? Luigi[/quote]
[COLOR=Black]It works[/COLOR] with the extra bla; I also had to add the highlighted comma to your post. :smile: Even the large exponent worked that way. Only the command-line parsing doesn't work above MM31.
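The line that worked here has the layout `Factor=<id>,<exponent>,<bit_min>,<bit_max>`: an ID field in front and commas between all four fields. A minimal sketch of a strict parser for that layout (hypothetical `parse_factor_line` and `struct tf_job`; mfaktc's own worktodo reader isn't shown in the thread and may accept other variants). It rejects the original `1 69` instead of misreading it, and the 72-bit cap on the bit levels is an assumption taken from the `tf_72bit` kernel name.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One trial-factoring job; hypothetical type, not mfaktc's own. */
struct tf_job {
    char     id[64];          /* assignment ID, e.g. "bla" */
    uint32_t exponent;
    unsigned bit_min, bit_max;
};

/* Parse one unsigned decimal field at *p. Non-final fields must end in ',';
   the final field may end the string or be followed by a line break. */
static int parse_field(const char **p, int last, unsigned long long max,
                       unsigned long long *out)
{
    char *end;

    if (**p < '0' || **p > '9')          /* no sign, no leading space */
        return -1;
    errno = 0;
    *out = strtoull(*p, &end, 10);
    if (errno == ERANGE || *out > max)
        return -1;
    if (last ? (*end != '\0' && *end != '\r' && *end != '\n') : (*end != ','))
        return -1;                       /* e.g. "1 69" fails here */
    *p = last ? end : end + 1;           /* step past the comma */
    return 0;
}

/* Parse "Factor=<id>,<exponent>,<bit_min>,<bit_max>". Returns 0 or -1. */
int parse_factor_line(const char *line, struct tf_job *job)
{
    const char *p, *comma;
    unsigned long long e, lo, hi;
    size_t len;

    if (strncmp(line, "Factor=", 7) != 0)
        return -1;
    p = line + 7;
    comma = strchr(p, ',');
    if (comma == NULL || comma == p)     /* the ID field is required */
        return -1;
    len = (size_t)(comma - p);
    if (len >= sizeof job->id)
        return -1;
    memcpy(job->id, p, len);
    job->id[len] = '\0';
    p = comma + 1;

    if (parse_field(&p, 0, UINT32_MAX, &e) != 0 ||
        parse_field(&p, 0, 72, &lo) != 0 ||   /* 72: assumed cap */
        parse_field(&p, 1, 72, &hi) != 0 ||
        lo >= hi)
        return -1;
    job->exponent = (uint32_t)e;
    job->bit_min = (unsigned)lo;
    job->bit_max = (unsigned)hi;
    return 0;
}
```

A strict parser like this reports a malformed line instead of guessing, which makes a typo such as the missing comma visible right away.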
[QUOTE=TheJudger;216942]
Kevin:
- did you increase THREADS_PER_BLOCK to 512? (from Luigi's output I think you did)
- did you compile the code without '--maxrregcount=16'?[/QUOTE]
It was compiled with THREADS_PER_BLOCK at 256 (the params.h file was unchanged). I missed the change that adds --maxrregcount=16 to the build script, so I did not compile with that option. Will that combination cause problems on older GPUs, or does the problem only occur if THREADS_PER_BLOCK is 512 and the nvcc option isn't included?
Sorry about the confusion - hopefully I got lucky and didn't cause any problems here. Maybe it would be a good idea for me to build a self-test version and distribute that as well, just to make sure everything is working?
More info. When building the .cu file, I see this output:
[CODE]
nvcc -m64 -O2 -c tf_72bit.cu --ptxas-options=-v -ccbin="C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin" -DWIN64 -Xcompiler /EHsc,W3,/nologo,/Ox,/GL
tf_72bit.cu
tmpxft_00000588_00000000-3_tf_72bit.cudafe1.gpu
tmpxft_00000588_00000000-8_tf_72bit.cudafe2.gpu
ptxas info    : Compiling entry function '_Z5mfaktj5int72Pji6int144S0_'
ptxas info    : [B]Used 16 registers[/B], 80+72 bytes smem, 48 bytes cmem[1]
tmpxft_00000588_00000000-3_tf_72bit.cudafe1.cpp
tmpxft_00000588_00000000-13_tf_72bit.ii
[/CODE]
I'm hoping the bolded section means that the exe I built is OK, since it didn't use more than 16 registers even though I didn't specify a limit on the command line.