mersenneforum.org  

Old 2010-06-02, 07:31   #254
TheJudger
 
 
"Oliver"
Mar 2005
Germany


Hi David,

Quote:
Originally Posted by henryzz View Post
Just the commandline parsing doesn't work above MM31.
OK, so it is only a command-line issue, not a problem with the GPU kernel.
It seems to affect only the Windows versions. Kevin (kjaget) already sent me some comments about it.

Oliver
Old 2010-06-02, 07:40   #255
TheJudger
 
 
"Oliver"
Mar 2005
Germany


Quote:
Originally Posted by kjaget View Post
It was compiled with THREADS_PER_BLOCK at 256 (the params.h file was unchanged). I missed seeing the change to include --maxrregcount=16 in the build script so did not compile using that option.

Will that combination cause problems on older GPUs, or will it only happen if THREADS_PER_BLOCK is 512 and the nvcc option isn't included?

Sorry about the confusion - hopefully I managed to luck out and not cause any problems here. Maybe it would be a good idea for me to build a self-test version and distribute that as well just to make sure everything is working?
Only the two together trigger the problem: THREADS_PER_BLOCK at 512 and no --maxrregcount option. Usually there should be no need to increase THREADS_PER_BLOCK above 256. If there are enough free resources available, multiple blocks will run at the same time (up to 3 or 4 blocks for compute capability 1.x).

Quote:
Originally Posted by kjaget View Post
More info. When building the .cu file, I see the message :

Code:
nvcc -m64 -O2 -c tf_72bit.cu --ptxas-options=-v -ccbin="C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin" -DWIN64 -Xcompiler /EHsc,W3,/nologo,/Ox,/GL tf_72bit.cu

tmpxft_00000588_00000000-3_tf_72bit.cudafe1.gpu
tmpxft_00000588_00000000-8_tf_72bit.cudafe2.gpu
ptxas info    : Compiling entry function '_Z5mfaktj5int72Pji6int144S0_'
ptxas info    : Used 16 registers, 80+72 bytes smem, 48 bytes cmem[1]
tmpxft_00000588_00000000-3_tf_72bit.cudafe1.cpp
tmpxft_00000588_00000000-13_tf_72bit.ii
I'm hoping the bolded section (the "Used 16 registers" line) means that the exe I built is OK, since it didn't use more than 16 registers even though I didn't specify a limit on the command line.
Yes, right! The current 71-bit kernel needs only 16 registers by default. An older variant used more. The new 75/95-bit kernel needs a little bit more. To be on the safe side, I recommend adding the --maxrregcount=16 option anyway. On the older 71-bit kernel this gave a performance improvement on my old 8400/8600 GPUs because it increases the occupancy (running 2 blocks instead of one at the same time).
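The register arithmetic behind this can be sketched in a few lines of C. This is a simplified model, not code from mfaktc: the per-multiprocessor limits are assumed values for compute capability 1.x (8192 registers and 768 threads for 1.0/1.1, 16384 registers and 1024 threads for 1.2/1.3, at most 8 blocks), shared memory and register allocation granularity are ignored, and `sm_limits` and `resident_blocks` are hypothetical names.

```c
#include <assert.h>

/* Simplified, assumed limits for one multiprocessor (SM) of a compute
 * capability 1.x GPU. Shared memory and register allocation granularity
 * are deliberately ignored in this sketch. */
struct sm_limits {
    int registers;   /* 32-bit registers per SM */
    int max_threads; /* resident threads per SM */
    int max_blocks;  /* resident blocks per SM */
};

static const struct sm_limits CC_1_0 = { 8192, 768, 8 };   /* CC 1.0/1.1 */
static const struct sm_limits CC_1_2 = { 16384, 1024, 8 }; /* CC 1.2/1.3 */

/* Hypothetical helper: how many blocks fit on one SM at the same time.
 * A result of 0 means a single block already needs more registers than
 * the SM has, so the kernel cannot be launched at all. */
int resident_blocks(struct sm_limits sm, int regs_per_thread,
                    int threads_per_block)
{
    int by_regs, by_threads, blocks;

    assert(regs_per_thread > 0 && threads_per_block > 0);

    by_regs = sm.registers / (regs_per_thread * threads_per_block);
    by_threads = sm.max_threads / threads_per_block;
    blocks = by_regs < by_threads ? by_regs : by_threads;
    return blocks < sm.max_blocks ? blocks : sm.max_blocks;
}
```

Under these assumptions, `resident_blocks(CC_1_0, 16, 256)` gives 2 blocks, while a kernel needing, say, 20 registers (an illustrative number) gives only 1 block at 256 threads and 0 blocks at 512 threads. That matches the observation that only 512 threads per block together with an uncapped register count causes trouble on older GPUs.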

Does -O2 for nvcc give you a performance improvement? I had some issues with -O2 for nvcc on Linux/CUDA 2.3, and it didn't increase the performance. It just screwed up the code.

Oliver
Old 2010-06-02, 11:22   #256
kjaget
 
 
Jun 2005


Quote:
Originally Posted by TheJudger View Post
Hi David,



OK, only a commandline issue, not a problem of the GPU kernel.
Seems that it affects only the Windows versions. Kevin (kjaget) already sent me some comments about it.

Oliver
Speaking of which, here's an update to fix that problem. As Oliver mentioned, the only change here is to fix exponents greater than 2.1 billion on the command line using -tf. I tested it using 3321928097 using -tf and it worked with no problems.

As always, report any problems here.

mfaktc-0.07p1-win64.zip
Old 2010-06-02, 11:48   #257
jasonp
Tribal Bullet
 
 
Oct 2004


Maybe you could include the same kernel compiled two different ways, then choose which one to use based on the number of GPU registers as reported by the Nvidia driver.
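A minimal sketch of that selection logic in plain C, assuming the register budget has already been queried from the driver (in the CUDA runtime, `cudaGetDeviceProperties` fills a `cudaDeviceProp` structure whose `regsPerBlock` field reports it). The enum, the function name and the idea of exactly two builds are hypothetical, and this is not how mfaktc is actually organised.

```c
#include <assert.h>

/* Two hypothetical builds of the same kernel. */
enum kernel_variant {
    KERNEL_REG_LIMITED, /* compiled with --maxrregcount=16 */
    KERNEL_UNLIMITED    /* compiled without a register cap */
};

/* Pick the uncapped build only if one full block of it fits into the
 * registers the device reports per block; otherwise fall back to the
 * register-limited build.
 *   regs_per_block    - value reported by the driver for the device
 *   threads_per_block - launch configuration (e.g. THREADS_PER_BLOCK)
 *   regs_unlimited    - registers per thread used by the uncapped build */
enum kernel_variant choose_kernel(int regs_per_block, int threads_per_block,
                                  int regs_unlimited)
{
    assert(threads_per_block > 0 && regs_unlimited > 0);

    if (regs_unlimited * threads_per_block <= regs_per_block)
        return KERNEL_UNLIMITED;
    return KERNEL_REG_LIMITED;
}
```

This only checks whether a block fits. It does not try to decide which variant is faster, which would need measurements on the GPU in question.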
Old 2010-06-02, 12:54   #258
TheJudger
 
 
"Oliver"
Mar 2005
Germany


Hi,

Quote:
Originally Posted by kjaget View Post
Speaking of which, here's an update to fix that problem. As Oliver mentioned, the only change here is to fix exponents greater than 2.1 billion on the command line using -tf. I tested it using 3321928097 using -tf and it worked with no problems.

As always, report any problems here.

Attachment 5269
JFYI: it was my fault, not Kevin's. The bug was in my code, but it occurs only in the Windows binary. The problem was signed vs. unsigned in the command-line parsing of the exponent.
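To illustrate this class of bug, here is a small C sketch. It is not the actual mfaktc code or fix, and both function names are hypothetical. One plausible reason for a Windows-only failure, assumed here and not confirmed in the thread, is that `long` is 32 bits on 64-bit Windows but 64 bits on 64-bit Linux, so a signed parse cannot represent values above 2^31-1 on Windows.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal exponent into an unsigned 32-bit
 * value. Returns 0 on success and -1 on malformed or out-of-range input. */
int parse_exponent(const char *text, uint32_t *out)
{
    char *end = NULL;
    unsigned long long value;

    assert(text != NULL && out != NULL);

    /* strtoull accepts a leading '-' or whitespace, so insist on a digit. */
    if (text[0] < '0' || text[0] > '9')
        return -1;

    errno = 0;
    value = strtoull(text, &end, 10);
    if (errno != 0 || *end != '\0' || value > UINT32_MAX)
        return -1;

    *out = (uint32_t)value;
    return 0;
}

/* What a signed 32-bit variable holds for the same bit pattern on the
 * usual two's-complement targets (the conversion is implementation-defined
 * in C). */
int32_t as_signed_32(uint32_t value)
{
    return (int32_t)value;
}
```

With this sketch, `parse_exponent("3321928097", &e)` succeeds and stores 3321928097, whereas the same bits read as a signed 32-bit integer come out as -973039199. Inputs of 2^32 = 4294967296 and above are rejected.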
Old 2010-06-02, 15:36   #259
henryzz
Just call me Henry
 
 
"David"
Sep 2007
Cambridge (GMT/BST)


Quote:
Originally Posted by TheJudger View Post
Hi,



JFYI: it was my fault, not Kevin's. The bug was in my code, but it occurs only in the Windows binary. The problem was signed vs. unsigned in the command-line parsing of the exponent.
Does that mean the limit on exponents from the command-line is now 2^32-1?
Old 2010-06-02, 15:47   #260
TheJudger
 
 
"Oliver"
Mar 2005
Germany


Hi David,

Quote:
Originally Posted by henryzz View Post
Does that mean the limit on exponents from the command-line is now 2^32-1?
Yes, but this limit is not specific to the command line.
The limit for exponents is 2^32-1 in all cases by design.

Oliver
Old 2010-06-03, 16:17   #261
henryzz
Just call me Henry
 
 
"David"
Sep 2007
Cambridge (GMT/BST)


Currently at OBD all the available assignments take numbers from 75 bits upward. Based on testing up to 70 bits, 75-76 will take me ~8.4 hours. I can't often guarantee that my PC will be running that long in one go, but I would like to help out a bit. Is there any chance of making partial bit levels available or having some sort of save feature?
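The extrapolation from lower bit levels can be sketched as follows. Candidate factors of a Mersenne number with exponent p have the form 2kp+1, so each additional bit level holds about twice as many candidates as the one before. The function below is a hypothetical helper that assumes constant throughput per candidate across bit levels, which may not hold where a different kernel takes over.

```c
#include <assert.h>

/* Hypothetical estimate: time for the bit level starting at `bits`
 * (e.g. 75 for 75-76), given a measured time `ref_hours` for the level
 * starting at `ref_bits`. Assumes the work doubles with every bit. */
double hours_for_bit_level(int ref_bits, double ref_hours, int bits)
{
    double hours = ref_hours;
    int b;

    assert(ref_hours >= 0.0);

    for (b = ref_bits; b < bits; b++)
        hours *= 2.0; /* one bit higher: twice the candidates */
    for (b = ref_bits; b > bits; b--)
        hours /= 2.0; /* one bit lower: half the candidates */
    return hours;
}
```

Under that assumption, 8.4 hours for 75-76 corresponds to roughly 8.4/64 hours, or about 8 minutes, for 69-70, and 76-77 would take about twice as long as 75-76.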
Old 2010-06-03, 16:27   #262
ET_
Banned
 
 
"Luigi"
Aug 2002
Team Italia


Quote:
Originally Posted by henryzz View Post
Currently at OBD all the available assignments take numbers from 75 bits upward. Based on testing up to 70 bits, 75-76 will take me ~8.4 hours. I can't often guarantee that my PC will be running that long in one go, but I would like to help out a bit. Is there any chance of making partial bit levels available or having some sort of save feature?
AFAIK, there is some resume capability coming in a release near you...


Luigi
Old 2010-06-03, 21:47   #263
wblipp
 
 
"William"
May 2003
New Haven


Is it possible to make this work for bases other than 2? It would be nice for 881^11192861-1 and similar problems here:

http://oddperfect.org/FermatQuotients3.html

William
Old 2010-06-04, 19:26   #264
TheJudger
 
 
"Oliver"
Mar 2005
Germany


Hi William,

Quote:
Originally Posted by wblipp View Post
Is it possible to make this work for bases other than 2? It would be nice for 881^11192861-1 and similar problems here:

http://oddperfect.org/FermatQuotients3.html

William
At the very least, this won't be an easy task.
Sorry, it is currently not on my to-do list.

Oliver

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.