mersenneforum.org

mfakto: an OpenCL program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=15646)

KyleAskine 2012-05-07 00:55

I have uploaded my results when I run one instance alone, when I run two instances on the same card, and when I run four instances (two on each card).

As you guessed, transfer rate gets demolished with more than one.

Bdot 2012-05-07 10:28

[QUOTE=KyleAskine;298634]I have uploaded my results when I run one instance alone, when I run two instances on the same card, and when I run four instances (two on each card).

As you guessed, transfer rate gets demolished with more than one.[/QUOTE]

Ouch, I did not expect the copying performance to deteriorate so much ... at least we seem to have found the reason for the strange behavior.

OK, I just measured the same thing on my HD5770 here: I get ~2.1 GB/s with a single instance, 2x 190 MB/s with two instances, and 4x 54 MB/s with four.

I guess this is a serious scheduling issue inside the OpenCL runtime. I think I'll prepare a case for AMD ...

I then modified the code to ignore the number of compute units in the GPU and always run 2M FCs at once, which increased the copy performance to 2.4 GB/s, 2x 220 MB/s and 4x 105 MB/s. Certainly some improvement, but I guess I need to invest in the 4x24bit=3x32bit idea for data transfers.
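The "4x24bit=3x32bit idea" is not spelled out in the thread; presumably it means packing four 24-bit values into three 32-bit words (96 bits either way), which would cut the data copied to the GPU by 25%. A minimal Python sketch of one possible bit layout follows; the function names and the choice of putting the first value in the low bits are assumptions for illustration, not mfakto code.

```python
MASK24 = 0xFFFFFF
MASK32 = 0xFFFFFFFF


def pack4x24(a, b, c, d):
    """Pack four 24-bit values into three 32-bit words (4 x 24 = 3 x 32 = 96 bits)."""
    for v in (a, b, c, d):
        if not 0 <= v <= MASK24:
            raise ValueError(f"{v:#x} does not fit in 24 bits")
    w0 = a | ((b & 0xFF) << 24)           # bits 0-23: a, bits 24-31: low 8 bits of b
    w1 = (b >> 8) | ((c & 0xFFFF) << 16)  # bits 0-15: high 16 bits of b, bits 16-31: low 16 bits of c
    w2 = (c >> 16) | (d << 8)             # bits 0-7: high 8 bits of c, bits 8-31: d
    return w0 & MASK32, w1 & MASK32, w2 & MASK32


def unpack4x24(w0, w1, w2):
    """Inverse of pack4x24, i.e. what a kernel would do after the copy."""
    a = w0 & MASK24
    b = (w0 >> 24) | ((w1 & 0xFFFF) << 8)
    c = (w1 >> 16) | ((w2 & 0xFF) << 16)
    d = w2 >> 8
    return a, b, c, d
```

For example, `pack4x24(1, 2, 3, 4)` yields the words `0x02000001`, `0x00030000` and `0x00000400`. Whether 24 bits are enough depends on the range of the transferred values, which this sketch simply assumes.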

aketilander 2012-05-15 18:16

One of my oldest boxes has a GPU: [B]AMD Radeon X1650 Series[/B]. If I have understood correctly, this GPU cannot be used for TF. Just to make sure, I have installed mfakto 0.10p1 and the "Additional required software".

When I run the program with the -st option, I get the following output:

[code]mfakto 0.10p1-Win (32bit build)

Runtime options
Inifile mfakto.ini
SievePrimes 25000
SievePrimesAdjust 1
NumStreams 5
GridSize 4
WorkFile worktodo.txt
ResultsFile results.txt
Checkpoints enabled
CheckpointDelay 300s
Stages enabled
StopAfterFactor class
PrintMode full
AllowSleep yes
VectorSize 4
PreferKernel mfakto_cl_barrett79
SieveOnGPU no
Compiletime options
SIEVE_SIZE_LIMIT 32kiB
SIEVE_SIZE 193154bits
SIEVE_SPLIT 250
MORE_CLASSES enabled
Select device - GPU not found, fallback to CPU.
Get device info - Compiling kernels .
BUILD OUTPUT
Internal Error: as failed
END OF BUILD OUTPUT
init_CL(5, 0) failed[/code]

I just want to make sure that I cannot use this GPU ([B]AMD Radeon X1650 Series[/B]) for TF or any other GIMPS-related work. Is that so?

chalsall 2012-05-15 18:42

[QUOTE=aketilander;299545]I just want to make sure that I cannot use this GPU ([B]AMD Radeon X1650 Series[/B]) for TF or any other GIMPS-related work. Is that so?[/QUOTE]

This line should have given it away: "Select device - GPU not found, fallback to CPU."

flashjh 2012-05-15 18:55

You can visit James' website to see which cards will work.

[url]http://mersenne-aries.sili.net/mfaktc.php?sort=ghdpd&noN=1[/url]

Bdot 2012-05-15 21:24

[QUOTE=aketilander;299545]One of my oldest boxes has a GPU: [B]AMD Radeon X1650 Series. [/B]If I have understood it rightly this GPU cannot be used for TF.[/QUOTE]
The X1650 uses an RV535 GPU chip. The first AMD chips that support OpenCL are the RV7xx (R700) family. Therefore, no OpenCL program will run on this GPU. In fact, it is 3 generations too old (X1000 -> HD2000 -> HD3000 -> HD4000, which is the first generation with OpenCL support).

aketilander 2012-05-16 12:37

[QUOTE=Bdot;299564]The X1650 uses an RV535 GPU chip. The first AMD chips that support OpenCL are the RV7xx (R700) family. Therefore, no OpenCL program will run on this GPU. In fact, it is 3 generations too old (X1000 -> HD2000 -> HD3000 -> HD4000, which is the first generation with OpenCL support).[/QUOTE]

Thank you, chalsall, flashjh and Bdot. Your help was much appreciated!

Bdot 2012-05-20 21:58

v0.11
 
v0.11 is ready. Please get it from [URL]http://mersenneforum.org/mfakto/mfakto-0.11/[/URL]

What's new:
[LIST]
[*]24-bit barrett kernel for FCs up to 2^70 - very fast!
[*]15-bit barrett kernel for FCs up to 2^73 - almost as fast, especially on Cayman, where it has a 50% speedup over 0.10p1
[*]new [B]SievePrimesMin[/B] ini-file variable to replace the previously fixed value of 5000 (hard minimum is 256)
[*]new [B]V5UserID[/B] and [B]ComputerID[/B] ini-file variables that let you configure these IDs for the results file output (so far only useful for mersenne-aries.sili.net)
[*]new [B]TimeStampInResults[/B] ini-file variable to precede each result line with a time stamp
[*]new [B]ProgressHeader[/B] and [B]PrintFormat[/B] ini-file variables to adapt the information that is printed after each class is finished. See the included mfakto.ini file for details.
[*]On Linux: siever code is now compiled with gcc 4.6: ~10% faster sieve
[*]file locking: access to the worktodo and results files is now synchronized using a lock file (.lck appended to the file name)
[*]evaluation of the GHz-days of assignments, and current speed shown as GHz-days/day
[*]Ctrl-C handler now also active during the selftest, to get a summary of the tests completed so far
[*]new --perftest option to test the siever performance depending on SievePrimes and SieveSizeLimit (if that is not fixed at compile time)
[*]using a fixed power of 2 for the number of GPU threads (still set via GridSize)
[/LIST]
Source code is at [URL]https://github.com/Bdot42/mfakto[/URL], [URL="https://github.com/Bdot42/mfakto/zipball/v0.11"]v0.11[/URL]
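The file-locking item says only that a lock file with `.lck` appended to the file name is used. A common way to make such a lock safe between processes is to create the lock file atomically with `O_CREAT | O_EXCL`; the Python sketch below assumes that approach, and the names `file_lock`, `append_result`, `retries` and `delay` are hypothetical, not taken from mfakto.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def file_lock(path, retries=100, delay=0.05):
    """Serialize access to `path` via a lock file named `path + '.lck'`.

    os.O_CREAT | os.O_EXCL makes creation atomic: if the lock file already
    exists, os.open raises FileExistsError, so only one process holds it.
    """
    lock_path = path + ".lck"
    for _ in range(retries):
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            time.sleep(delay)  # another instance holds the lock; wait and retry
    else:
        raise TimeoutError(f"could not acquire {lock_path}")
    try:
        os.close(fd)
        yield
    finally:
        os.remove(lock_path)  # release the lock even if the body raised


def append_result(results_file, line):
    """Append one result line while holding the lock on the results file."""
    with file_lock(results_file):
        with open(results_file, "a") as f:
            f.write(line + "\n")
```

One known weakness of this scheme: if a process dies while holding the lock, the stale `.lck` file blocks every other instance until it is removed by hand.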

Note that the new fast kernels cannot be used without Stages=1, as they need to process each bit level separately. Also, because of the other new config variables, I suggest using the newly shipped ini file and adjusting it to your needs.
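To illustrate what "each bit level separately" means, here is a small Python sketch that splits a trial-factoring range into one stage per bit level and picks a kernel per stage. The selection rule is hypothetical; it only reuses the limits from the release notes (24-bit barrett up to 2^70, 15-bit barrett up to 2^73), and the kernel labels are descriptive strings, not mfakto kernel names.

```python
def plan_stages(bit_min, bit_max):
    """Split a TF range from 2^bit_min to 2^bit_max into one stage per bit level."""
    return [(b, b + 1) for b in range(bit_min, bit_max)]


def pick_kernel(bit_hi):
    """Hypothetical per-stage choice based only on the stage's upper bit limit."""
    if bit_hi <= 70:
        return "barrett 24-bit"   # FCs up to 2^70
    if bit_hi <= 73:
        return "barrett 15-bit"   # FCs up to 2^73
    return "some other kernel"    # larger FCs need a different kernel


for lo, hi in plan_stages(68, 72):
    print(f"2^{lo} .. 2^{hi}: {pick_kernel(hi)}")
```

Run as-is, this prints the 24-bit kernel for the stages 2^68..2^69 and 2^69..2^70, and the 15-bit kernel for 2^70..2^71 and 2^71..2^72.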

And, as usual, let me know if anything does not work as expected :smile:

LaurV 2012-05-21 03:12

You make me feel terribly sad that I don't have an AMD card... :smile:
I believe some of those are already implemented in mfaktc, but some are still missing, especially a lot of the "cosmetic" stuff... Do you still keep in touch with Oliver, or have you gone totally different ways now? It would be nice (for us, the blind users) if the two programs grew up together, so they don't become totally different things in a few years...

Bdot 2012-05-21 11:23

[QUOTE=LaurV;299942]You make me feel terribly sad that I don't have an AMD card... :smile:
I believe some of those are already implemented in mfaktc, but some are still missing, especially a lot of the "cosmetic" stuff... Do you still keep in touch with Oliver, or have you gone totally different ways now? It would be nice (for us, the blind users) if the two programs grew up together, so they don't become totally different things in a few years...[/QUOTE]

Hehe, mfaktc has the performance, mfakto has the fancy stuff?

I'm in contact with Oliver, and he said he'd merge the features into mfaktc [B]if users requested it explicitly.[/B] I understood he did not want to merge everything wholesale. But if you, the mfaktc users, tell him exactly which features you'd like to see in mfaktc, then he'd do it. In most cases I can easily extract the required changes; still, it is quite some effort on Oliver's side to build and test. As CUDA code is not as separated from the C code as OpenCL code is, merging may also be challenging in some cases.

TheJudger 2012-05-21 12:58

[QUOTE=Bdot;299918][*]new [B]SievePrimesMin[/B] ini-file variable to replace the previously fixed value of 5000 (hard minimum is 256)
[/QUOTE]
Let us extend this to SievePrimesMin + SievePrimesMax in mfakt?.ini:
SIEVE_PRIMES_MIN <= SievePrimesMin < SievePrimesMax <= SIEVE_PRIMES_MAX
with SIEVE_PRIMES_M[IN|AX] hardcoded and fixed, and SievePrimesM[in|ax] user-tunable in mfakt?.ini. (Something that I have on my todo list for 0.19.)
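The proposed constraint can be enforced when the ini file is read. A minimal Python sketch, assuming the user values are clamped into the compile-time range and an inverted pair is rejected; 256 is the hard minimum from the v0.11 notes, while the value of `SIEVE_PRIMES_MAX` and the function names are hypothetical.

```python
SIEVE_PRIMES_MIN = 256      # hard minimum quoted in the v0.11 release notes
SIEVE_PRIMES_MAX = 200000   # hypothetical compile-time maximum, for illustration only


def sieve_primes_bounds(user_min, user_max):
    """Clamp the ini-file values into the compile-time range and enforce
    SIEVE_PRIMES_MIN <= SievePrimesMin < SievePrimesMax <= SIEVE_PRIMES_MAX."""
    lo = max(SIEVE_PRIMES_MIN, user_min)
    hi = min(SIEVE_PRIMES_MAX, user_max)
    if lo >= hi:
        raise ValueError(f"need SievePrimesMin < SievePrimesMax, got {lo} >= {hi}")
    return lo, hi


def adjust_sieve_primes(current, lo, hi):
    """Keep an auto-adjusted SievePrimes value inside [lo, hi]."""
    return max(lo, min(current, hi))
```

Clamping silently is only one option; printing a warning or refusing to start on out-of-range values would satisfy the same inequality.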
[QUOTE=Bdot;299918][*]new [B]V5UserID[/B] and [B]ComputerID[/B] ini-file variables that let you configure these IDs for the results file output (so far only useful for mersenne-aries.sili.net)[*]new [B]TimeStampInResults[/B] ini-file variable to precede each result line with a time stamp
[/QUOTE]
I guess I can adopt those two easily in mfaktc.
[QUOTE=Bdot;299918][*]new [B]ProgressHeader [/B]and [B]PrintFormat [/B]ini-file variables to adapt the information that is printed after each class is finished. See the included mfakto.ini file for details.
[/QUOTE]
I have to look at this, fancy stuff! :smile:
[QUOTE=Bdot;299918][*]On Linux: Siever code is now compiled with gcc4.6: ~10% faster sieve
[/QUOTE]
mfaktc compiles fine with gcc 4.6 / CUDA >= 4.2. The sieve code is ~10% faster on my IVB compared to gcc 4.4. :cool:
[QUOTE=Bdot;299918][*]file locking: worktodo and results files accesses are now synchronized using a lock file (.lck appended to the file name).
[/QUOTE]
I have to check, but personally I'm not really a fan of file locking... too many failures in the past...

[QUOTE=LaurV;299942]You make me feel terribly sad that I don't have an AMD card... :smile:
I believe some of those are already implemented in mfaktc, but some are still missing, especially a lot of the "cosmetic" stuff... Do you still keep in touch with Oliver, or have you gone totally different ways now? It would be nice (for us, the blind users) if the two programs grew up together, so they don't become totally different things in a few years...[/QUOTE]

Yes, we talk to each other, usually via PM in German (which is easier for both of us, I guess). It is a good idea to keep mfaktc and mfakto similar or identical wherever that is doable. Of course this is not the case for the GPU code and the CUDA/OpenCL-specific stuff. And it is no secret that my focus is on performance, while I tend to ignore the "useless stuff" like a user interface. :blush:

Oliver


All times are UTC.
