Hi David,
[QUOTE=henryzz;216982]Just the commandline parsing doesn't work above MM31.[/QUOTE] OK, so it is only a command-line issue, not a problem in the GPU kernel. :smile: It seems to affect only the Windows versions. Kevin (kjaget) already sent me some comments about it. :tu: Oliver |
[QUOTE=kjaget;216993]It was compiled with THREADS_PER_BLOCK at 256 (the params.h file was unchanged). I missed seeing the change to include --maxrregcount=16 in the build script so did not compile using that option.
Will that combination cause problems on older GPUs, or will it only happen if THREADS_PER_BLOCK is 512 and the nvcc option isn't included? Sorry about the confusion - hopefully I managed to luck out and not cause any problems here. Maybe it would be a good idea for me to build a self-test version and distribute that as well just to make sure everything is working?[/QUOTE] Only the two together trigger the problem. Usually there should be no need to increase THREADS_PER_BLOCK above 256. If there are enough free resources available, multiple blocks will run at the same time (up to 3 or 4 blocks for compute capability 1.x). [QUOTE=kjaget;217013]More info. When building the .cu file, I see the message : [CODE]nvcc -m64 -O2 -c tf_72bit.cu --ptxas-options=-v -ccbin="C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin" -DWIN64 -Xcompiler /EHsc,W3,/nologo,/Ox,/GL
tf_72bit.cu
tmpxft_00000588_00000000-3_tf_72bit.cudafe1.gpu
tmpxft_00000588_00000000-8_tf_72bit.cudafe2.gpu
ptxas info : Compiling entry function '_Z5mfaktj5int72Pji6int144S0_'
ptxas info : [B]Used 16 registers[/B], 80+72 bytes smem, 48 bytes cmem[1]
tmpxft_00000588_00000000-3_tf_72bit.cudafe1.cpp
tmpxft_00000588_00000000-13_tf_72bit.ii[/CODE] I'm hoping the bolded section means that the exe I built is OK, since it didn't use more than 16 registers even though I didn't specify a limit on the command line.[/QUOTE] Yes, right! The current 71bit kernel needs only 16 registers by default. An older variant used more. The new 75/95 bit kernels need a little more. To be on the safe side I recommend adding the --maxrregcount=16 option anyway. On the older 71bit kernel this gave a performance improvement on my old 8400/8600 GPUs because it increases the occupancy (running 2 blocks at the same time instead of one). Does -O2 for nvcc give you a performance improvement? I had some issues with -O2 for nvcc on Linux/CUDA 2.3, and it didn't increase the performance. It just screwed up the code. :sad: Oliver |
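The register and block numbers in the post above can be checked with a rough occupancy estimate. The following is only a sketch in plain C, not code from mfaktc: the `sm_limits` struct and `resident_blocks` function are hypothetical names, and the per-multiprocessor limits used in the usage note (8192 registers and 768 threads for compute capability 1.0/1.1, 16384 registers and 1024 threads for 1.2/1.3, at most 8 blocks) are assumed from NVIDIA's published compute capability tables. It ignores shared memory and register allocation granularity, so real occupancy can be lower.

```c
/* Hypothetical per-multiprocessor limits; fill in from the CUDA documentation. */
typedef struct {
    int registers_per_sm;   /* 32-bit registers available per multiprocessor */
    int max_threads_per_sm; /* maximum resident threads per multiprocessor */
    int max_blocks_per_sm;  /* maximum resident blocks per multiprocessor */
} sm_limits;

/* Simplified estimate of how many blocks can be resident at once.
   A result of 0 means a single block does not fit, so the launch would fail. */
int resident_blocks(sm_limits sm, int threads_per_block, int regs_per_thread)
{
    int by_threads, by_regs, blocks;

    if (threads_per_block <= 0 || regs_per_thread <= 0)
        return 0;

    by_threads = sm.max_threads_per_sm / threads_per_block;
    by_regs = sm.registers_per_sm / (threads_per_block * regs_per_thread);

    blocks = by_threads < by_regs ? by_threads : by_regs;
    if (blocks > sm.max_blocks_per_sm)
        blocks = sm.max_blocks_per_sm;
    return blocks;
}
```

Under these assumed limits, 256 threads per block at 16 registers gives 2 resident blocks on a compute capability 1.1 GPU, and only 1 if the kernel needs 20 registers. With 512 threads per block, 16 registers still fits exactly, but 17 or more gives 0, which matches the statement that only THREADS_PER_BLOCK at 512 together with a missing register limit causes trouble on older GPUs.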
1 Attachment(s)
[QUOTE=TheJudger;217080]Hi David,
OK, so it is only a command-line issue, not a problem in the GPU kernel. :smile: It seems to affect only the Windows versions. Kevin (kjaget) already sent me some comments about it. :tu: Oliver[/QUOTE] Speaking of which, here's an update to fix that problem. As Oliver mentioned, the only change here is to fix exponents greater than 2.1 billion on the command line using -tf. I tested it with -tf on the exponent 3321928097 and it worked with no problems. As always, report any problems here. [ATTACH]5269[/ATTACH] |
Maybe you could include the same kernel compiled two different ways, then choose which one to use based on the number of GPU registers as reported by the Nvidia driver.
|
Hi,
[QUOTE=kjaget;217101]Speaking of which, here's an update to fix that problem. As Oliver mentioned, the only change here is to fix exponents greater than 2.1 billion on the command line using -tf. I tested it with -tf on the exponent 3321928097 and it worked with no problems. As always, report any problems here. [ATTACH]5269[/ATTACH][/QUOTE] JFYI: it was my fault, not Kevin's. The bug was in my code, but it shows up only in the Windows binary. The problem was signed vs. unsigned handling in the command-line parsing of the exponent. |
[quote=TheJudger;217110]Hi,
JFYI: it was my fault, not Kevin's. The bug was in my code, but it shows up only in the Windows binary. The problem was signed vs. unsigned handling in the command-line parsing of the exponent.[/quote] Does that mean the limit on exponents from the command line is now 2^32-1? |
Hi David,
[QUOTE=henryzz;217124]Does that mean the limit on exponents from the command line is now 2^32-1?[/QUOTE] Yes, but this limit is not specific to the command line. The limit for exponents is 2^32-1 in all cases by design. Oliver |
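The signed vs. unsigned parsing problem and the 2^32-1 limit discussed above can be illustrated with a small parser. This is a sketch, not the actual mfaktc fix, and `parse_exponent` is a hypothetical name. The thread does not show the original code, so the following is only one plausible explanation of the Windows-only behaviour: `long` is 32 bits on 64-bit Windows but 64 bits on 64-bit Linux, so a parse through a signed `long` tops out at 2^31-1 (about 2.1 billion) on Windows only. The sketch avoids that by parsing into `unsigned long long` and range-checking against `UINT32_MAX`.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal exponent into a 32-bit unsigned value.
   Returns 0 on success, -1 if the text is not a plain decimal number
   or if it exceeds 2^32-1. */
int parse_exponent(const char *text, uint32_t *out)
{
    char *end = NULL;
    unsigned long long value;

    /* Require a leading digit: rejects empty input, signs and whitespace. */
    if (text == NULL || out == NULL || text[0] < '0' || text[0] > '9')
        return -1;

    errno = 0;
    value = strtoull(text, &end, 10);
    if (errno != 0 || *end != '\0')
        return -1;              /* overflow or trailing garbage */
    if (value > UINT32_MAX)
        return -1;              /* above the 2^32-1 design limit */

    *out = (uint32_t)value;
    return 0;
}
```

With this helper, the test exponent 3321928097 from the thread parses successfully on any platform, while 4294967296 is rejected as out of range.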
Currently at OBD all the available assignments take numbers from 75 bits upwards. Based on my testing up to 70 bits, 75-76 will take me ~8.4 hours. I often can't guarantee that my PC will be running that long in one go, but I would like to help out a bit. Is there any chance of making partial bit levels available or adding some sort of save feature?
|
[QUOTE=henryzz;217260]Currently at OBD all the available assignments take numbers from 75 bits upwards. Based on my testing up to 70 bits, 75-76 will take me ~8.4 hours. I often can't guarantee that my PC will be running that long in one go, but I would like to help out a bit. Is there any chance of making partial bit levels available or adding some sort of save feature?[/QUOTE]
AFAIK, there is some resume capability coming in a release near you... :smile: Luigi |
Is it possible to make this work for bases other than 2? It would be nice for 881^11192861-1 and similar problems here:
[url]http://oddperfect.org/FermatQuotients3.html[/url] William |
Hi William,
[QUOTE=wblipp;217304]Is it possible to make this work for bases other than 2? It would be nice for 881^11192861-1 and similar problems here: [url]http://oddperfect.org/FermatQuotients3.html[/url] William[/QUOTE] At the very least, this won't be an easy task. Sorry, it is currently not on my to-do list. Oliver |
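For context on the question above, the underlying test does generalize on paper: for an odd prime p, any prime factor q of b^p-1 that does not divide b-1 has the form q = 2kp+1, and q divides b^p-1 exactly when b^p mod q = 1. The following is a CPU-only illustration of that test in plain C, not a description of how mfaktc works; `pow_mod` and `find_factor_k` are hypothetical names. It does no sieving, tries composite candidates too, and is limited to candidates below 2^32 so that the products fit in 64 bits.

```c
#include <stdint.h>

/* Compute base^exp mod m by square-and-multiply.
   Assumes m < 2^32 so that every product fits in a uint64_t. */
uint64_t pow_mod(uint64_t base, uint64_t exp, uint64_t m)
{
    uint64_t result = 1 % m;

    base %= m;
    while (exp > 0) {
        if (exp & 1)
            result = (result * base) % m;
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}

/* Return the smallest k in [1, max_k] such that q = 2*k*p + 1 divides
   base^p - 1, or 0 if none is found. Stops once q would reach 2^32.
   Candidates are not sieved, so q may be composite. */
uint64_t find_factor_k(uint64_t base, uint64_t p, uint64_t max_k)
{
    uint64_t k;

    for (k = 1; k <= max_k; k++) {
        uint64_t q = 2 * k * p + 1;
        if (q >= (1ULL << 32))
            break;
        if (pow_mod(base, p, q) == 1)
            return k;
    }
    return 0;
}
```

For example, `find_factor_k(2, 11, 10)` finds k = 1, since 23 divides 2^11-1 = 2047, and `find_factor_k(3, 5, 10)` finds k = 1, since 11 divides 3^5-1 = 242. A fast GPU kernel for bases other than 2, with the wide integer arithmetic it needs, is a separate and harder problem than this loop.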