mersenneforum.org  

2012-05-04, 08:16   #1
Brain
Dec 2009
Peine, Germany
101001011₂ Posts

GPU Computing Cheat Sheet (a.k.a. GPU Computing Guide)

Hi,

Here you can find the latest version of the PDF known as the "GPU Computing Cheat Sheet". It distills many GPU Computing thread posts onto a single sheet of paper.

Current version: 1.04
GIMPS GPU Computing Cheat Sheet latest (pdf)

All files

Bye, Brain

2012-06-07, 20:00   #2
Brain
Dec 2009
Peine, Germany
14B₁₆ Posts

GPU Computing Cheat Sheet Update to v1.01

Changes: mfakto 0.11 integrated
Please report errors / suggestions.

Last fiddled with by Brain on 2012-08-05 at 09:54

2012-08-01, 19:16   #3
Brain
Dec 2009
Peine, Germany
331 Posts

GPU Computing Cheat Sheet Update to v1.02

Quick update:
Changes: mfakto 0.12, CUDALucas 2.03

Last fiddled with by Brain on 2012-08-05 at 09:57

2012-08-01, 22:10   #4
Dubslow
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3·29·83 Posts

Some notes on CUDALucas 2.03: In many places, examples and instructions are for 2.01, not 2.03. All command line options are the same (except for -i which prints device info), however the work file requires pseudo-GIMPS format.
Code:
Test=<exp>
Test=<AID>,<exp>,<tf>[,<p-1>]
<AID> can be "N/A" (actually, it can be anything, AID isn't actually used anywhere in 2.03).
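
To make the two accepted line shapes concrete, here is a minimal Python sketch of a parser for them. It is not CUDALucas's actual parsing code: the function name `parse_test_line` and the dictionary keys are invented for illustration, and it assumes the `<exp>`, `<tf>` and `<p-1>` fields are plain integers.

```python
from typing import Optional


def parse_test_line(line: str) -> Optional[dict]:
    """Parse one pseudo-GIMPS work line (hypothetical helper, not CUDALucas code).

    Accepted shapes, as described in the post:
        Test=<exp>
        Test=<AID>,<exp>,<tf>[,<p-1>]
    Returns None for anything that does not match either shape.
    """
    line = line.strip()
    if not line.startswith("Test="):
        return None
    fields = line[len("Test="):].split(",")
    try:
        if len(fields) == 1:
            # Bare exponent form.
            return {"aid": None, "exp": int(fields[0]), "tf": None, "pm1": None}
        if len(fields) in (3, 4):
            # The AID is kept verbatim; it may be "N/A" or any other text.
            pm1 = int(fields[3]) if len(fields) == 4 else None
            return {"aid": fields[0], "exp": int(fields[1]),
                    "tf": int(fields[2]), "pm1": pm1}
    except ValueError:
        # Assumption: non-integer numeric fields are treated as a bad line.
        return None
    return None
```

For example, `parse_test_line("Test=N/A,26214407,68,1")` yields the AID as the string `"N/A"` and the other three fields as integers, while a two-field line is rejected.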

As of sometime before 2.00 but after 1.2, save files should be O(n) in size, where n is the FFT length. A length of 1474560 (1440K, though 2.03 isn't that smart (2.04 is!)) should have a save file a bit under 1.5 MB.

As for max FFT: threads is capped at 1024, and max FFT is capped at 64K*threads, or 64M. That assumes, of course, that there is sufficient memory; that's an excellent point. I would add that if a user gets an "over specifications Grid" error (as you once did), the solution is either to increase threads or to decrease the FFT length (again assuming sufficient memory). (That help message is added in 2.04 as well.)
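
The limit described above can be written down as a small check. This is only a sketch of the stated rule (FFT length at most 64K per thread, threads at most 1024); it does not model GPU memory, the function names are invented, and trying only power-of-two thread counts is an assumption of this sketch.

```python
MAX_THREADS = 1024               # thread cap stated in the post
MAX_FFT_PER_THREAD = 64 * 1024   # 64K FFT elements per thread, per the post


def fft_fits(fft_length: int, threads: int) -> bool:
    """Return True if fft_length is within the 64K * threads limit."""
    if not 1 <= threads <= MAX_THREADS:
        return False
    return fft_length <= MAX_FFT_PER_THREAD * threads


def min_threads_for(fft_length: int) -> int:
    """Smallest power-of-two thread count that allows fft_length, or 0 if none.

    Assumption: only power-of-two thread counts are tried.
    """
    threads = 1
    while threads <= MAX_THREADS:
        if fft_fits(fft_length, threads):
            return threads
        threads *= 2
    return 0
```

With these numbers the 1440K length (1474560) needs at least 32 threads, and anything above 64M fits no thread count at all, which matches the two remedies named above: raise threads or lower the FFT length.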

Also, thanks for the links to the .dlls. Whenever I feel like cleaning up the SourceForge files page, I'll make use of those. (LaurV was able to provide some of them, but not all.)

2012-08-02, 03:28   #5
LaurV
Romulan Interpreter
Jun 2011
Thailand
2²·2,137 Posts

This is a good sign. If this PDF file has appeared, it means we will soon have CudaLucas 2.04 made official with all bugs solved (I have been using it for ages; I don't remember ever using a different version), and mfaktc 0.19 made public (a few selected people are already testing the betas; see the discussions in the mfaktc subforum).

This usually happens. When we have new models of our products, we advertise the old models heavily and discount them deeply to get rid of them... I bought two of my laptops (over 10-11 years I changed laptops 4 times) just before new models were launched, which within a few months were better and cheaper than the price I paid for my already aged models, hehe... I still use the Sony FW46, Core 2 Duo, bought in 2009, just before the big launch of exactly the same laptop, at exactly the same price, but with a Core i7 and more memory.... (They were the only laptops with a full HD 1920 screen for years to come; now such things are very common, and they have the CUDA GTX 580M too, even better! But I have not got one yet, and don't plan to.)

And by the way, trying not to be totally off-topic, the "B" in the FFT length [edit: in the PDF file] makes no sense and is technically incorrect. Please correct that. The FFT length is not measured in bytes. In fact, each FFT "element" has 8 bytes, and what Dubslow said [edit: about the length of the save files] is therefore wrong: the 1440K FFT size (or 1474560 FFT size, or 1.44M FFT size, but NOT 1440KB FFT, nor 1.44MB FFT, these are WRONG) produces a save file of exactly 11 megabytes, if you do not compress it with gzip or some other compression algorithm (compressing it would be a very bad idea, because it would no longer be possible to directly compare residue files using a binary editor/viewer; someone did that in the past for earlier releases, and people, including me, got mad about it).

There is also something wrong with the latest versions related to the "big FFT" part: I have had in my hands GTX 580s with 1.5 GB and 3 GB, and Teslas with 6 GB, and I could NOT run any 100M-digit exponent. But this is more technical and was/will be discussed in the suitable threads; it has nothing to do with the advertising (i.e. the PDF file under discussion says what CL advertises; it is not your fault the program does not work as advertised, hehe).

Last fiddled with by LaurV on 2012-08-02 at 04:16 Reason: did a total amalgrammar, it seems I woke up in a bad mood today...

2012-08-02, 03:38   #6
Dubslow
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3·29·83 Posts

Umm... I never said anything except 1440K. The only place I see a B is 1.5MB. But you're right, I forgot that sizeof(double) != 1, it's 8 as you point out, so 11.8MB plus a few bytes overhead.
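
The arithmetic behind that correction is short enough to write out. The sketch below assumes, as the posts do, one 8-byte double per FFT element and ignores the few bytes of header overhead; the function name is invented.

```python
BYTES_PER_ELEMENT = 8  # one double per FFT element, as stated in the thread


def save_file_bytes(fft_length: int) -> int:
    """Approximate uncompressed save file size, ignoring header overhead."""
    return fft_length * BYTES_PER_ELEMENT


size = save_file_bytes(1474560)   # the 1440K FFT length from the thread
print(size)                       # 11796480 bytes
print(round(size / 1e6, 1))       # 11.8 (decimal megabytes)
print(size / 2**20)               # 11.25 (binary mebibytes)
```

So the "11.8MB" figure is in decimal megabytes, and the same length is 11.25 MiB in binary units.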

[OT]
As for 2.04, I've not seen any progress made, and I haven't seen flash online in a while. I haven't kept particularly close track of the issues either, since it appeared to be platform-specific. I too have been using the 2.04 beta since then (at least a month) without any file locking issues (or issues of any kind). LaurV, perhaps you could get MSVS/CUDA set up and restart the debugging? [/OT]

2012-08-02, 04:13   #7
LaurV
Romulan Interpreter
Jun 2011
Thailand
8548₁₀ Posts

Edited to explain (the two "[edit...]" brackets). Added text and corrected grammar as much as I could spot. I woke up in the wrong posture today... I'd better refrain from posting until after the fourth cup of coffee...

As for installing CUDA and MSVS, well, I already did, but besides trying to recompile some of flashjh's older releases, I didn't do much. I can't really find the time for programming at home (at the office, no way! Plenty of little things, and Thai minions are pissing me off every minute), and I still have a list of "things to program at home", including that P-1 stuff. I have not written a line of code for it in months, but for that project the story is different: besides scarce time, there is also scarce inspiration/knowledge. I am still playing with P-1 in pari/gp, trying to optimize it (from the theoretical point of view) as much as possible for Mersenne numbers, and trying to make it as parallel as possible, but besides multiplying primes in pairs on different threads there is not much to optimize. I have learned a lot from this, but the magic spark is still missing. That may be a good reason why other (cleverer) people haven't implemented P-1 on CUDA until now. If a spark comes, I may return to writing "close to the metal" (i.e. CUDA) again, but the chances are low for the time being.
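
For readers who have not seen it, a textbook P-1 stage 1 on a Mersenne number can be sketched as follows. This is not the pari/gp code mentioned above (which is not shown in the thread) and it is not parallel; it is plain Python with invented function names, using base 3 and the usual trick of including 2p in the exponent because every factor q of M_p (p prime) has the form q = 2kp + 1.

```python
from math import gcd


def primes_up_to(n: int) -> list:
    """Simple sieve of Eratosthenes returning all primes <= n."""
    if n < 2:
        return []
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, n + 1, i):
                sieve[j] = False
    return [i for i, is_p in enumerate(sieve) if is_p]


def pm1_stage1(p: int, b1: int, base: int = 3) -> int:
    """P-1 stage 1 on M_p = 2**p - 1; returns gcd(base**E - 1, M_p)."""
    m = (1 << p) - 1
    # Factors of M_p are 1 mod 2p, so the exponent 2*p comes for free.
    x = pow(base, 2 * p, m)
    for q in primes_up_to(b1):
        qk = q
        while qk * q <= b1:   # largest power of q not exceeding b1
            qk *= q
        x = pow(x, qk, m)
    return gcd(x - 1, m)
```

On the small case M29 = 233 · 1103 · 2089, `pm1_stage1(29, 10)` returns 486737 = 233 · 2089: 232 = 2³·29 and 2088 = 2³·3²·29 are covered by B1 = 10 plus the free factor 29, while 1102 = 2·19·29 is not. Every step is a modular multiplication mod M_p, which is why the LL multiplication code is the natural thing to reuse.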

2012-08-02, 04:26   #8
Dubslow
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3×29×83 Posts

Re the B thing: Ah yes, that makes much more sense. Please fix it Brain.

Re CUDALucas: flash did some programming, but none I couldn't have done myself. (It would have taken me quite a bit more time than him, though.) Unfortunately, until he returns, you're the only one (AFAIK) who is capable of compiling CUDALucas for Windows, and I can theoretically take care of the programming myself. (We might need to go through some extensive PM conversations as you copy/paste warnings/errors etc., but I should be able to do it with little more than the copy/pasting from you.)

Re P-1: Since P-1 is also a bunch of multiplication mod Mp, like the LL test, I think our best bet is to modify CUDALucas. First step would be to learn (and I mean learn) about this thingy. That would be something that would require some serious tutoring from the smart people, e.g. msft, ewmayer, Prime95, etc. This might be a good place to start, being the genesis of all modern LL programs, CUDALucas and Prime95 included (though perhaps excluding Mlucas, you'll have to ask ewmayer about that), and being written by Richard Crandall, one of the guys who came up with the IBDWT. (If you look closely, some of the comments and functions in CUDALucas.cu are actually verbatim (or close to it) leftovers from that link.)

PS: I have been considering starting such a tutorial thread, but between fixing YAFU's minrels, BOINCifying a modified Msieve (not to mention Prime95) and restarting university in less than a month, I figured it was too much.

PPS: Wiki says this: "If we perform carrying on the negacyclic convolution, the result is equivalent to the product of the inputs mod B^n + 1." together with "If we perform carrying on the cyclic convolution, the result is equivalent to the product of the inputs mod B^n − 1." But then it says: "In this algorithm, it will be more useful to compute the negacyclic convolution", but it seems to me that using the cyclic convolution would make more sense, since our test should be mod 2^p-1? Or is the negacyclic thingy still faster, and you just have to be sure to catch values that are 2^p and 2^p-1 (which should reduce to 1 and 0 mod 2^p-1)?

PPPS: How do you represent a bignum as an array of doubles? How many bits of the num does each double represent? (Do we assume a double has 64 bits of memory? Do you assume IEEE 754 format? Can you use the exponent bits, e.g. via shifts?)
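
On the PPS and PPPS questions, a small experiment may help. The sketch below uses plain Python integers and a schoolbook convolution (no FFT, no doubles), so it is not how CUDALucas or Prime95 do it. It splits two numbers into n digits in base B, convolves the digit vectors, and checks that evaluating the result (which is where the carrying happens) matches the ordinary product mod B^n − 1 for the cyclic case and mod B^n + 1 for the negacyclic case. All helper names are invented for illustration.

```python
def to_digits(x: int, base: int, n: int) -> list:
    """Split x into n little-endian digits in the given base (needs x < base**n)."""
    digits = []
    for _ in range(n):
        digits.append(x % base)
        x //= base
    return digits


def from_digits(digits: list, base: int) -> int:
    """Evaluate a digit vector; entries may exceed base or be negative."""
    return sum(d * base ** i for i, d in enumerate(digits))


def convolve(a: list, b: list, negacyclic: bool = False) -> list:
    """Schoolbook cyclic (or negacyclic) convolution of equal-length vectors."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                out[k] += a[i] * b[j]
            elif negacyclic:
                out[k - n] -= a[i] * b[j]   # wrap-around with sign -1
            else:
                out[k - n] += a[i] * b[j]   # wrap-around with sign +1
    return out


def check(x: int, y: int, base: int, n: int) -> tuple:
    """Return (cyclic_ok, negacyclic_ok) for the two congruences quoted above."""
    a, b = to_digits(x, base, n), to_digits(y, base, n)
    cyc = from_digits(convolve(a, b), base) % (base ** n - 1)
    neg = from_digits(convolve(a, b, negacyclic=True), base) % (base ** n + 1)
    return cyc == (x * y) % (base ** n - 1), neg == (x * y) % (base ** n + 1)
```

Since 2^p − 1 has the B^n − 1 shape, the cyclic form is the one that lines up with a Mersenne modulus. In a floating-point implementation each array element would hold only a small digit, well below the 53-bit mantissa of an IEEE 754 double, so that the sums of products still round to the correct integers; the exact digit size used by any particular program is not something this sketch tries to reproduce.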

Last fiddled with by Dubslow on 2012-08-02 at 05:16

2012-08-02, 17:26   #9
Brain
Dec 2009
Peine, Germany
331 Posts

GPU Computing Cheat Sheet Update to v1.02a

Quote:
Originally Posted by LaurV
And by the way, trying not to be totally off-topic, the "B" in the FFT length [edit: in the PDF file] makes no sense and is technically incorrect. Please correct that. The FFT length is not measured in bytes. In fact, each FFT "element" has 8 bytes, and what Dubslow said [edit: about the length of the save files] is therefore wrong: the 1440K FFT size (or 1474560 FFT size, or 1.44M FFT size, but NOT 1440KB FFT, nor 1.44MB FFT, these are WRONG) produces a save file of exactly 11 megabytes, if you do not compress it with gzip or some other compression algorithm (compressing it would be a very bad idea, because it would no longer be possible to directly compare residue files using a binary editor/viewer; someone did that in the past for earlier releases, and people, including me, got mad about it).
Ah, a simple typo. Save file size should have suggested that 2M FFT length needs 16 MB disk space. ;-)

I assume v1.03 will come out soon, given the upcoming mfaktc/o kernels and CL 2.04.

File now here.

Last fiddled with by Brain on 2012-08-05 at 09:58

2012-08-02, 17:31   #10
Brain
Dec 2009
Peine, Germany
513₈ Posts

Quote:
Originally Posted by Dubslow
Some notes on CUDALucas 2.03: In many places, examples and instructions are for 2.01, not 2.03. All command line options are the same (except for -i which prints device info), however the work file requires pseudo-GIMPS format.
Code:
Test=<exp>
Test=<AID>,<exp>,<tf>[,<p-1>]
<AID> can be "N/A" (actually, it can be anything, AID isn't actually used anywhere in 2.03).

Also, thanks for the links to the .dlls. Whenever I feel like cleaning up the SourceForge files page, I'll make use of those. (LaurV was able to provide some of them, but not all.)
I will mention the new file format in the next release.

The web space I use is very limited. I will have to remove older versions to publish new CUDA DLLs. I think they have a nice home over there at SourceForge.

2012-08-02, 17:35   #11
Dubslow
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3·29·83 Posts

No, I mean I don't have the dlls you've linked, so in the next few days I'll download those and update the SF page with them.

(And btw, with 2.04 the assignment format will be more flexible, among some other changes. I'll detail them when it's actually released, if the file locking ever gets fixed.)

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.