mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing
2012-05-07, 00:55   #441
KyleAskine
Oct 2011
Maryland

2·5·29 Posts

I have uploaded my results when I run one instance alone, when I run two instances on the same card, and when I run four instances (two on each card).

As you guessed, transfer rate gets demolished with more than one.
Attached Files
File Type: zip PIresults.zip (24.4 KB, 82 views)
2012-05-07, 10:28   #442
Bdot
Nov 2010
Germany

3·199 Posts

Quote:
Originally Posted by KyleAskine
I have uploaded my results when I run one instance alone, when I run two instances on the same card, and when I run four instances (two on each card).

As you guessed, transfer rate gets demolished with more than one.
Ouch, I did not expect the copying performance to deteriorate so much... at least we seem to have found the reason for the strange behavior.

OK, I just measured the same thing on my HD5770 here, and I get ~2.1GB/s single instance, or 2x 190MB/s, 4x 54MB/s.
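Adding up the per-instance rates makes the loss clearer: the extra instances do not just share the bandwidth, they destroy most of it. The sketch below simply redoes that arithmetic for the HD5770 figures above, treating 2.1 GB/s as 2100 MB/s (decimal units are an assumption).

```python
# Aggregate host-to-GPU copy throughput for the HD5770 figures quoted above (MB/s).
measurements = {
    1: 2100,  # one instance: ~2.1 GB/s
    2: 190,   # two instances: 190 MB/s each
    4: 54,    # four instances: 54 MB/s each
}

def aggregate(instances: int, per_instance_mb_s: float) -> float:
    """Total throughput of all concurrently running instances."""
    return instances * per_instance_mb_s

single = aggregate(1, measurements[1])
for n, rate in measurements.items():
    total = aggregate(n, rate)
    print(f"{n} instance(s): {total:.0f} MB/s total, {single / total:.1f}x below one instance")
```

So two instances together move less than a fifth of what one instance moves alone (380 MB/s), and four move only about a tenth (216 MB/s).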

I guess this is some serious scheduling issue inside the OpenCL runtime. I think I'll prepare a case for AMD ...

I then modified the code to ignore the number of compute units in the GPU and always run 2M FCs at once, which increased the copy performance to 2.4GB/s, 2x220MB/s, 4x105MB/s. Certainly some improvement, but I guess I need to invest in the 4x24bit=3x32bit idea for data transfers.
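The "4x24bit=3x32bit" idea is presumably about packing: four 24-bit values occupy exactly 96 bits, i.e. three 32-bit words, so nothing is wasted compared with sending each value in its own 32-bit word. A minimal sketch of such packing; the function names and the little-endian bit layout are hypothetical and not mfakto's actual transfer format.

```python
MASK24 = (1 << 24) - 1
MASK32 = (1 << 32) - 1

def pack4x24(values):
    """Pack four 24-bit integers into three 32-bit words (value 0 in the lowest bits)."""
    if len(values) != 4 or any(not 0 <= v <= MASK24 for v in values):
        raise ValueError("need exactly four values in [0, 2^24)")
    bits = 0
    for i, v in enumerate(values):
        bits |= v << (24 * i)          # value i occupies bits 24*i .. 24*i+23
    return [(bits >> (32 * j)) & MASK32 for j in range(3)]

def unpack3x32(words):
    """Inverse of pack4x24: recover the four 24-bit values from three words."""
    if len(words) != 3:
        raise ValueError("need exactly three 32-bit words")
    bits = 0
    for j, w in enumerate(words):
        bits |= (w & MASK32) << (32 * j)
    return [(bits >> (24 * i)) & MASK24 for i in range(4)]

vals = [0x123456, 0xABCDEF, 0x000001, 0xFFFFFF]
words = pack4x24(vals)
assert unpack3x32(words) == vals
print([hex(w) for w in words])
```

Sending three words instead of four cuts the transferred volume by 25%, which is what counts when the copy rate is the bottleneck; the price is a little shifting and masking on the GPU side to unpack the values.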
2012-05-15, 18:16   #443
aketilander
"Åke Tilander"
Apr 2011
Sandviken, Sweden

2×283 Posts

One of my oldest boxes has a GPU: AMD Radeon X1650 Series. If I have understood it rightly, this GPU cannot be used for TF. Just to make sure, I have installed mfakto 0.10p1 and the "Additional required software".

When I run the program with the -st option, I get the following output:

Code:
mfakto 0.10p1-Win (32bit build)
 
Runtime options
  Inifile                   mfakto.ini
  SievePrimes               25000
  SievePrimesAdjust         1
  NumStreams                5
  GridSize                  4
  WorkFile                  worktodo.txt
  ResultsFile               results.txt
  Checkpoints               enabled
  CheckpointDelay           300s
  Stages                    enabled
  StopAfterFactor           class
  PrintMode                 full
  AllowSleep                yes
  VectorSize                4
  PreferKernel              mfakto_cl_barrett79
  SieveOnGPU                no
Compiletime options
  SIEVE_SIZE_LIMIT          32kiB
  SIEVE_SIZE                193154bits
  SIEVE_SPLIT               250
  MORE_CLASSES              enabled
Select device - GPU not found, fallback to CPU.
Get device info - Compiling kernels . 
 BUILD OUTPUT
Internal Error:  as failed
  END OF BUILD OUTPUT
init_CL(5, 0) failed
I just want to make sure that I cannot use this GPU (AMD Radeon X1650 Series) for TF or any other GIMPS related work. Is that so?
2012-05-15, 18:42   #444
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados

2·67·73 Posts

Quote:
Originally Posted by aketilander
I just want to make sure that I cannot use this GPU (AMD Radeon X1650 Series) for TF or any other GIMPS related work. Is that so?
This line should have given it away: "Select device - GPU not found, fallback to CPU."
2012-05-15, 18:55   #445
flashjh
"Jerry"
Nov 2011
Vancouver, WA

1123₁₀ Posts

You can visit James' website to see which cards will work.

http://mersenne-aries.sili.net/mfakt...rt=ghdpd&noN=1
2012-05-15, 21:24   #446
Bdot
Nov 2010
Germany

255₁₆ Posts

Quote:
Originally Posted by aketilander
One of my oldest boxes has a GPU: AMD Radeon X1650 Series. If I have understood it rightly this GPU cannot be used for TF.
The X1650 has got an RV535 GPU chip. The first chip that supports OpenCL is RV700. Therefore, no OpenCL program will run on this GPU. In fact, it is 3 generations too old (X1000 -> HD2000 -> HD3000 -> HD4000, which is the first generation for OpenCL).
2012-05-16, 12:37   #447
aketilander
"Åke Tilander"
Apr 2011
Sandviken, Sweden

1000110110₂ Posts

Quote:
Originally Posted by Bdot
The X1650 has got an RV535 GPU chip. The first chip that supports OpenCL is RV700. Therefore, no OpenCL program will run on this GPU. In fact, it is 3 generations too old (X1000 -> HD2000 -> HD3000 -> HD4000, which is the first generation for OpenCL).
Thank you, chalsall, flashjh and Bdot. Your help was much appreciated!
2012-05-20, 21:58   #448
Bdot
Nov 2010
Germany

3×199 Posts
v0.11

v0.11 is ready. Please get it from http://mersenneforum.org/mfakto/mfakto-0.11/

What's new:
  • 24-bit barrett kernel for FCs up to 2^70 - very fast!
  • 15-bit barrett kernel for FCs up to 2^73 - almost as fast; on Cayman in particular it is 50% faster than 0.10p1
  • new SievePrimesMin ini-file variable to replace the previously fixed value of 5000 (hard minimum is 256)
  • new V5UserID and ComputerID ini-file variables that let you configure these IDs for the results file output (so far only useful for mersenne-aries.sili.net)
  • new TimeStampInResults ini-file variable lets you have each result line preceded by a time stamp
  • new ProgressHeader and PrintFormat ini-file variables to adapt the information that is printed after each class is finished. See the included mfakto.ini file for details.
  • On Linux: Siever code is now compiled with gcc4.6: ~10% faster sieve
  • file locking: accesses to the worktodo and results files are now synchronized using a lock file (.lck appended to the file name).
  • evaluation of GHz-days of assignments, and current speed as GHz-days/day
  • Ctrl-C handler now also active in the selftest, printing a summary of the tests completed so far
  • new --perftest option to test the siever performance depending on SievePrimes and SieveSizeLimit (if that is not fixed at compile time)
  • using a fixed power of 2 for the number of GPU threads (still set via GridSize)
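For readers new to the kernels: a factor candidate (FC) of M_p = 2^p - 1 has the form q = 2kp + 1, and q divides M_p exactly when 2^p mod q = 1. "Barrett" refers to Barrett reduction, which computes x mod q with multiplications by a precomputed reciprocal instead of a division. The sketch below shows that check using Python's arbitrary-precision integers; it deliberately ignores the splitting of q into 24-bit or 15-bit limbs that makes the GPU kernels fast, and all function names are hypothetical.

```python
def barrett_setup(q):
    """Precompute the Barrett constant mu = floor(4^k / q), with k = bit length of q."""
    k = q.bit_length()
    return k, (1 << (2 * k)) // q

def barrett_reduce(x, q, k, mu):
    """Return x mod q for 0 <= x < 4^k without dividing by q."""
    est = (x * mu) >> (2 * k)   # est <= floor(x / q) and is at most 2 below it
    r = x - est * q
    while r >= q:               # at most two corrective subtractions
        r -= q
    return r

def pow2_mod(p, q):
    """Compute 2^p mod q by left-to-right square-and-multiply with Barrett reduction."""
    k, mu = barrett_setup(q)
    r = 1
    for bit in bin(p)[2:]:
        r = barrett_reduce(r * r, q, k, mu)          # square: r*r < q^2 < 4^k
        if bit == "1":
            r = barrett_reduce(2 * r, q, k, mu)      # double: 2r < 2q < 4^k
    return r

def divides_mersenne(p, k):
    """True if the factor candidate q = 2kp + 1 divides 2^p - 1."""
    q = 2 * k * p + 1
    return pow2_mod(p, q) == 1

print(divides_mersenne(11, 1))   # q = 23, a factor of M11 = 2047 = 23 * 89
```

The reason for the different bit limits is visible here: the kernels must hold products like r*r for q close to 2^70 or 2^73 in a fixed number of limbs, so the limb size caps the largest FC a kernel can handle.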
Source code is at https://github.com/Bdot42/mfakto, v0.11

Note that the new fast kernels cannot be used without Stages=1, as they need to process each bit level separately. Also, because of the other new config variables, I suggest using the newly shipped ini file and adjusting it to your needs.
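To see why stages matter: with Stages=1 an assignment such as "factor from 2^68 to 2^74" is worked one bit level at a time, so each level can use the fastest kernel whose limit covers it. A minimal sketch of that splitting, using the limits from the release notes (2^70 for the 24-bit kernel, 2^73 for the 15-bit kernel); the kernel labels, the preference order and using barrett79 as the fallback (suggested by the PreferKernel value in the selftest output earlier in the thread) are illustrative assumptions, not mfakto's actual selection code.

```python
# (label, highest bit level the kernel can handle); order = assumed preference.
KERNEL_LIMITS = [
    ("barrett24", 70),
    ("barrett15", 73),
    ("barrett79", 79),
]

def stages(bit_min, bit_max):
    """Split a trial-factoring range 2^bit_min .. 2^bit_max into one stage per bit level."""
    return [(b, b + 1) for b in range(bit_min, bit_max)]

def pick_kernel(stage_top):
    """Return the first kernel in KERNEL_LIMITS whose limit covers 2^stage_top."""
    for name, limit in KERNEL_LIMITS:
        if stage_top <= limit:
            return name
    raise ValueError(f"no kernel handles factors up to 2^{stage_top}")

for lo, hi in stages(68, 74):
    print(f"2^{lo} .. 2^{hi}: {pick_kernel(hi)}")
```

Without stages the whole range would be one job reaching 2^74, and only the slower general kernel could cover it; split per bit level, most of the work runs on the fast kernels.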

And, as usual, let me know if anything does not work as expected.
2012-05-21, 03:12   #449
LaurV
Romulan Interpreter
Jun 2011
Thailand

3·3,221 Posts

You make me feel terribly sad that I don't have an AMD card...
I believe some of those are already implemented in mfaktc, but some are still missing, especially a lot of "cosmetic" stuff... Do you still have a dialog with Oliver, or have you gone totally different ways now? It would be nice (for us, the blind users) if the two programs grew up together and didn't become totally different things in a few years...

Last fiddled with by LaurV on 2012-05-21 at 03:13
2012-05-21, 11:23   #450
Bdot
Nov 2010
Germany

3×199 Posts

Quote:
Originally Posted by LaurV
You make me feel terribly sad that I don't have an AMD card...
I believe some of those are already implemented in mfaktc, but some are still missing, especially a lot of "cosmetic" stuff... Do you still have a dialog with Oliver, or have you gone totally different ways now? It would be nice (for us, the blind users) if the two programs grew up together and didn't become totally different things in a few years...
Hehe, mfaktc has the performance, mfakto has the fancy stuff?

I'm in contact with Oliver, and he said he'd merge the features into mfaktc if users requested them explicitly. As I understood it, he did not want to simply merge everything. But if you, the mfaktc users, tell him exactly which features you'd like to see in mfaktc, then he'd do it. In most cases I can easily extract the required changes; still, building and testing them is quite some effort on Oliver's side. As CUDA code is not as cleanly separated from the C code as OpenCL code is, merging may also be challenging in some cases.
2012-05-21, 12:58   #451
TheJudger
"Oliver"
Mar 2005
Germany

11·101 Posts

Quote:
Originally Posted by Bdot
new SievePrimesMin ini-file variable to replace the previously fixed value of 5000 (hard minimum is 256)
Let us extend this to SievePrimesMin + SievePrimesMax in mfakt?.ini:
SIEVE_PRIMES_MIN <= SievePrimesMin < SievePrimesMax <= SIEVE_PRIMES_MAX
with SIEVE_PRIMES_M[IN|AX] hardcoded and fixed, and SievePrimesM[in|ax] user-tunable in mfakt?.ini. (This is something I have on my to-do list for 0.19.)
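The proposed rule boils down to validating the user's bounds against the compiled-in ones and then keeping the auto-adjusted SievePrimes value between them. A minimal sketch; only the hard minimum of 256 comes from the release notes, the maximum of 200000 is a placeholder, and clamping (rather than rejecting) bad values is an assumed design choice.

```python
SIEVE_PRIMES_MIN = 256        # hard minimum mentioned in the v0.11 notes
SIEVE_PRIMES_MAX = 200_000    # placeholder value, assumed for illustration

def validate_range(user_min, user_max):
    """Clamp user-tunable bounds so that
    SIEVE_PRIMES_MIN <= SievePrimesMin < SievePrimesMax <= SIEVE_PRIMES_MAX."""
    lo = max(SIEVE_PRIMES_MIN, min(user_min, SIEVE_PRIMES_MAX - 1))
    hi = min(SIEVE_PRIMES_MAX, max(user_max, lo + 1))
    return lo, hi

def clamp_sieve_primes(value, lo, hi):
    """Keep the (auto-adjusted) SievePrimes value inside [lo, hi]."""
    return max(lo, min(value, hi))

lo, hi = validate_range(5000, 100_000)
print(lo, hi, clamp_sieve_primes(25000, lo, hi))
```

Clamping keeps a slightly wrong ini file from stopping the program; printing a warning or refusing to start would be equally valid choices.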
Quote:
Originally Posted by Bdot
new V5UserID and ComputerID ini-file variables that let you configure these IDs for the results file output (so far only useful for mersenne-aries.sili.net)
new TimeStampInResults ini-file variable lets you have each result line preceded by a time stamp
I guess I can adopt those two easily in mfaktc.
Quote:
Originally Posted by Bdot
new ProgressHeader and PrintFormat ini-file variables to adapt the information that is printed after each class is finished. See the included mfakto.ini file for details.
I have to look at this, fancy stuff!
Quote:
Originally Posted by Bdot
On Linux: Siever code is now compiled with gcc4.6: ~10% faster sieve
mfaktc compiles fine with gcc 4.6 / CUDA >= 4.2. The sieve code is ~10% faster on my IVB compared to gcc 4.4.
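The siever being benchmarked here runs on the CPU (note SieveOnGPU = no in the selftest output above) and discards factor candidates q = 2kp + 1 that have a small prime factor, so the GPU only tests the survivors; SievePrimes sets how many small primes are used. A simplified sketch of sieving one block of k values, using the fact that an odd prime r not dividing p divides 2kp + 1 exactly when k ≡ -(2p)^(-1) (mod r). This is a plain Eratosthenes-style sieve with hypothetical names, not the optimized bit-array siever in mfaktc/mfakto; it needs Python 3.8+ for pow(x, -1, r).

```python
def small_odd_primes(count):
    """First `count` odd primes (trial division is fine for a small demo)."""
    primes, n = [], 3
    while len(primes) < count:
        if all(n % d for d in primes if d * d <= n):
            primes.append(n)
        n += 2
    return primes

def sieve_block(p, k0, n, sieve_primes):
    """Return the k in [k0, k0 + n) for which q = 2kp + 1 has no factor among
    the first `sieve_primes` odd primes (primes dividing p are skipped)."""
    alive = [True] * n
    for r in small_odd_primes(sieve_primes):
        if p % r == 0:
            continue
        k_hit = (-pow(2 * p, -1, r)) % r      # r | 2kp+1  <=>  k = k_hit (mod r)
        for i in range((k_hit - k0) % r, n, r):
            alive[i] = False                  # strike out every r-th k
    return [k0 + i for i in range(n) if alive[i]]

print(sieve_block(11, 1, 10, 3))   # k whose q = 22k + 1 avoids 3, 5 and 7
```

Raising SievePrimes removes more candidates but costs more CPU time per block, which is the trade-off SievePrimesAdjust and the perftest option are about. A real siever must also avoid discarding a candidate that is itself one of the small primes; this sketch ignores that case.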
Quote:
Originally Posted by Bdot
file locking: accesses to the worktodo and results files are now synchronized using a lock file (.lck appended to the file name).
I have to check, but personally I'm not really a fan of file locking... too many failures in the past...
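For context, a lock-file scheme like the one in v0.11 (name the lock after the protected file plus .lck) is usually built on an atomic "create only if absent" call. A minimal sketch of that pattern; the retry count, delay and function names are hypothetical, not mfakto's code.

```python
import os
import time
from contextlib import contextmanager

@contextmanager
def locked(path, retries=50, delay=0.1):
    """Hold an exclusive lock on `path` by atomically creating `path + '.lck'`."""
    lock = path + ".lck"
    for _ in range(retries):
        try:
            # O_CREAT | O_EXCL fails if the lock file already exists.
            fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            time.sleep(delay)
            continue
        os.close(fd)
        break
    else:
        raise TimeoutError(f"could not acquire {lock}")
    try:
        yield
    finally:
        os.remove(lock)   # release the lock even if the body raised

def append_result(path, line):
    """Append one result line to `path` while holding its lock."""
    with locked(path):
        with open(path, "a", encoding="utf-8") as f:
            f.write(line + "\n")
```

One well-known weakness of this approach: a .lck file left behind by a crashed process blocks every later writer until someone deletes it by hand, which this sketch does not try to detect.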

Quote:
Originally Posted by LaurV
You make me feel terribly sad that I don't have an AMD card...
I believe some of those are already implemented in mfaktc, but some are still missing, especially a lot of "cosmetic" stuff... Do you still have a dialog with Oliver, or have you gone totally different ways now? It would be nice (for us, the blind users) if the two programs grew up together and didn't become totally different things in a few years...
Yes, we talk to each other, usually via PM in German (which is easier for both of us, I guess). It is a good idea to keep mfaktc and mfakto similar or identical wherever that is doable. Of course this is not the case for the GPU code and the CUDA/OpenCL-specific stuff. And it is no secret that my focus is on performance, while I tend to ignore the "useless stuff" like a user interface.

Oliver

All times are UTC.




Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.