|
|
#441 |
|
Oct 2011
Maryland
2·5·29 Posts |
I have uploaded my results for running one instance alone, two instances on the same card, and four instances (two on each card).
As you guessed, the transfer rate gets demolished with more than one instance. |
|
|
|
|
|
#442 | |
|
Nov 2010
Germany
3·199 Posts |
Quote:
OK, I just measured the same thing on my HD5770 here, and I get ~2.1GB/s with a single instance, 2x 190MB/s with two instances, and 4x 54MB/s with four. I guess this is some serious scheduling issue inside the OpenCL runtime. I think I'll prepare a case for AMD ...
I then modified the code to ignore the number of compute units in the GPU and always run 2M FCs at once, which increased the copy performance to 2.4GB/s, 2x 220MB/s, and 4x 105MB/s. Certainly some improvement, but I guess I need to invest in the 4x24bit=3x32bit idea for data transfers. |
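To illustrate the 4x24bit=3x32bit idea mentioned above: if each transferred value fits in 24 bits, four of them can be packed into three 32-bit words, cutting the transfer volume by a quarter. The following is only a sketch of that packing, not mfakto's code; the function names `pack_4x24` and `unpack_4x24` and the bit layout (first value in the lowest bits) are assumptions made for the example.

```python
MASK24 = 0xFFFFFF      # largest value that fits in 24 bits
MASK32 = 0xFFFFFFFF    # largest value that fits in 32 bits


def pack_4x24(values):
    """Pack four 24-bit integers into three 32-bit words (hypothetical layout)."""
    if len(values) != 4 or any(not 0 <= v <= MASK24 for v in values):
        raise ValueError("need exactly four values in the range 0 .. 2**24 - 1")
    # Concatenate the four values into one 96-bit integer, values[0] lowest.
    combined = 0
    for i, v in enumerate(values):
        combined |= v << (24 * i)
    # Slice the 96 bits into three 32-bit words, lowest word first.
    return [(combined >> (32 * i)) & MASK32 for i in range(3)]


def unpack_4x24(words):
    """Inverse of pack_4x24: recover four 24-bit integers from three words."""
    if len(words) != 3 or any(not 0 <= w <= MASK32 for w in words):
        raise ValueError("need exactly three values in the range 0 .. 2**32 - 1")
    combined = 0
    for i, w in enumerate(words):
        combined |= w << (32 * i)
    return [(combined >> (24 * i)) & MASK24 for i in range(4)]


if __name__ == "__main__":
    sample = [0x123456, 0xABCDEF, 0x000001, 0xFFFFFF]
    packed = pack_4x24(sample)
    print([hex(w) for w in packed])
    print(unpack_4x24(packed) == sample)
```

Whether this pays off in practice depends on how cheaply the GPU kernel can unpack the words again, which the sketch does not address.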
|
|
|
|
|
|
#443 |
|
"Åke Tilander"
Apr 2011
Sandviken, Sweden
2×283 Posts |
One of my oldest boxes has a GPU: AMD Radeon X1650 Series. If I have understood correctly, this GPU cannot be used for TF. Just to make sure, I installed mfakto 0.10p1 and the "Additional required software".
When I run the program with -st, I get the following output: Code:
mfakto 0.10p1-Win (32bit build)

Runtime options
  Inifile                   mfakto.ini
  SievePrimes               25000
  SievePrimesAdjust         1
  NumStreams                5
  GridSize                  4
  WorkFile                  worktodo.txt
  ResultsFile               results.txt
  Checkpoints               enabled
  CheckpointDelay           300s
  Stages                    enabled
  StopAfterFactor           class
  PrintMode                 full
  AllowSleep                yes
  VectorSize                4
  PreferKernel              mfakto_cl_barrett79
  SieveOnGPU                no
Compiletime options
  SIEVE_SIZE_LIMIT          32kiB
  SIEVE_SIZE                193154bits
  SIEVE_SPLIT               250
  MORE_CLASSES              enabled
Select device - GPU not found, fallback to CPU.
Get device info - Compiling kernels .
BUILD OUTPUT
Internal Error: as failed
END OF BUILD OUTPUT
init_CL(5, 0) failed |
|
|
|
|
|
#444 |
|
If I May
"Chris Halsall"
Sep 2002
Barbados
2·67·73 Posts |
|
|
|
|
|
|
#445 |
|
"Jerry"
Nov 2011
Vancouver, WA
1123₁₀ Posts |
You can visit James' website to see which cards will work.
http://mersenne-aries.sili.net/mfakt...rt=ghdpd&noN=1 |
|
|
|
|
|
#446 |
|
Nov 2010
Germany
255₁₆ Posts |
The X1650 has got an RV535 GPU chip. The first chip that supports OpenCL is RV700. Therefore, no OpenCL program will run on this GPU. In fact, it is 3 generations too old (X1000 -> HD2000 -> HD3000 -> HD4000, which is the first generation for OpenCL).
|
|
|
|
|
|
#447 |
|
"Åke Tilander"
Apr 2011
Sandviken, Sweden
1000110110₂ Posts |
Thank you, chalsall, flashjh and Bdot. Your help was much appreciated!
|
|
|
|
|
|
#448 |
|
Nov 2010
Germany
3×199 Posts |
v0.11 is ready. Please get it from http://mersenneforum.org/mfakto/mfakto-0.11/
What's new:
Note that the new fast kernels cannot be used without Stages=1, as they need to process each bitlevel separately. Also, because of the other new config variables, I suggest using the newly shipped ini file and adjusting it to your needs. And, as usual, let me know if anything does not work as expected.
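For readers wondering what "process each bitlevel separately" amounts to: with Stages=1, an assignment that spans several bit levels is worked through one level at a time. A minimal sketch of that splitting follows; the helper name `split_into_stages` is made up for illustration and this is not taken from the mfakto source.

```python
def split_into_stages(bit_min, bit_max):
    """Split a trial-factoring range [2**bit_min, 2**bit_max) into
    single-bitlevel stages (hypothetical helper, for illustration only)."""
    if bit_min >= bit_max:
        raise ValueError("bit_min must be smaller than bit_max")
    # One stage per bit level: (b, b + 1) covers factors from 2**b to 2**(b+1).
    return [(b, b + 1) for b in range(bit_min, bit_max)]


if __name__ == "__main__":
    # An assignment from 68 to 71 bits becomes three separate stages.
    for lo, hi in split_into_stages(68, 71):
        print(f"stage: 2^{lo} .. 2^{hi}")
```

Each stage then covers exactly one bit level, which is the property the note above says the new kernels need.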
|
|
|
|
|
|
#449 |
|
Romulan Interpreter
Jun 2011
Thailand
3·3,221 Posts |
You make me feel terribly sad that I don't have an AMD card...
I believe some of those are already implemented in mfaktc, but some of them are still missing, especially a lot of the "cosmetic" stuff... Do you still have a dialog with Oliver, or have you gone totally different paths now? It would be nice (for us, the blind users) if the two programs grew up together and didn't become totally different things in a few years...
Last fiddled with by LaurV on 2012-05-21 at 03:13 |
|
|
|
|
|
#450 | |
|
Nov 2010
Germany
3×199 Posts |
Quote:
I'm in contact with Oliver, and he said he'd merge the stuff into mfaktc if users requested it explicitly. I understood he did not want to plainly merge everything. But if you, the mfaktc users, tell him exactly which features you'd like to see in mfaktc, then he'd do it. In most cases I can easily extract the changes that would be required; still, it is quite some effort on Oliver's side to build and test. As CUDA code is not as separated from the C code as OpenCL is, merging may also be challenging in some cases. |
|
|
|
|
|
|
#451 | ||||||
|
"Oliver"
Mar 2005
Germany
11·101 Posts |
Quote:
SIEVE_PRIMES_MIN <= SievePrimesMin < SievePrimesMax <= SIEVE_PRIMES_MAX
with SIEVE_PRIMES_M[IN|AX] hardcoded and fixed, and SievePrimesM[in|ax] user-tunable in mfakt?.ini. (Something that I have on my todo list for 0.19.)
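The ordering described here (compile-time limits bracketing the two user-tunable ini values) can be sketched as a simple check. This is only an illustration of the inequality, not code from mfaktc or mfakto; the function name `sieve_limits_ok` is hypothetical and the numeric limits below are placeholder values chosen for the example.

```python
# Placeholder compile-time limits; the real values are defined by the program.
SIEVE_PRIMES_MIN = 5000
SIEVE_PRIMES_MAX = 200000


def sieve_limits_ok(sieve_primes_min, sieve_primes_max,
                    hard_min=SIEVE_PRIMES_MIN, hard_max=SIEVE_PRIMES_MAX):
    """Return True if the user-tunable ini values satisfy
    hard_min <= SievePrimesMin < SievePrimesMax <= hard_max."""
    return hard_min <= sieve_primes_min < sieve_primes_max <= hard_max


if __name__ == "__main__":
    print(sieve_limits_ok(5000, 100000))    # inside the hard limits
    print(sieve_limits_ok(4000, 100000))    # SievePrimesMin below the hard minimum
    print(sieve_limits_ok(50000, 50000))    # min must be strictly below max
```

A program would presumably run such a check once after reading the ini file and fall back to defaults or abort if it fails; which of the two is an open design choice.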
Oliver |
||||||
|
|
|