#683
Nov 2010
Germany
1125₈ Posts
Quote:
If the CPU can handle it, you could even sieve 1000000 primes. This will almost be the point where M/s and GHz-days/day have the same numbers (i.e. you'd be close to 200 GHz-days/day, but you may have to add another CPU to get there).
Monitor the CPU and GPU load. If you want to max overall throughput, both should be close to 100%.
#684
Aug 2002
North San Diego County
5×137 Posts
Quote:
Code:
C:\hd>mfakto.hd4000-pi -d 11 -st
mfakto 0.12-Win-HD4000 (64bit build)
Runtime options
  Inifile             mfakto.ini
WARNING: Cannot read SievePrimesMin from inifile, using default value (5000)
  SievePrimesMin      5000
WARNING: Cannot read SievePrimesMax from inifile, using default value (1000000)
  SievePrimesMax      1000000
WARNING: Cannot read SievePrimes from inifile, using default value (25000)
  SievePrimes         25000
WARNING: Cannot read SievePrimesAdjust from inifile, using default value (0)
  SievePrimesAdjust   0
  NumStreams          1
  GridSize            1
WARNING: Cannot read WorkFile from inifile, using default (worktodo.txt)
  WorkFile            worktodo.txt
WARNING: Cannot read ResultsFile from inifile, using default (results.txt)
  ResultsFile         results.txt
WARNING: Cannot read Checkpoints from inifile, enabled by default
  Checkpoints         enabled
WARNING: Cannot read CheckpointDelay from inifile, set to 300s by default
  CheckpointDelay     300s
WARNING: Cannot read Stages from inifile, enabled by default
  Stages              enabled
WARNING: Cannot read StopAfterFactor from inifile, set to 1 by default
  StopAfterFactor     bitlevel
WARNING: Cannot read PrintMode from inifile, set to 0 by default
  PrintMode           full
  V5UserID            none
  ComputerID          none
WARNING: Cannot read AllowSleep from inifile, set to 0 by default
  AllowSleep          no
  TimeStampInResults  no
  VectorSize          1
WARNING: Cannot read GPUType from inifile, using default (AUTO)
  GPUType             AUTO
WARNING: Cannot read SieveOnGPU from inifile, set to 0 by default
  SieveOnGPU          no
WARNING: Cannot read SmallExp from inifile, set to 0 by default
  SmallExp            no
WARNING: Cannot read SieveCPUMask from inifile, set to 0 by default
  SieveCPUMask        0
Compiletime options
  SIEVE_SIZE_LIMIT    36kiB
  SIEVE_SIZE          289731bits
  SIEVE_SPLIT         250
  MORE_CLASSES        enabled
  CL_PERFORMANCE_INFO enabled (DEBUG option)
Select device - Get device info - Compiling kernels ..........
WARNING: Unknown GPU name, assuming VLIW5 type. Please post the device name "Intel(R) HD Graphics 4000 (Intel(R) Corporation)" to http://www.mersenneforum.org/showthread.php?t=15646 to have it added to mfakto. Set GPUType in mfakto.ini to select a GPU type yourself and avoid this warning.
OpenCL device info
  name                      Intel(R) HD Graphics 4000 (Intel(R) Corporation)
  device (driver) version   OpenCL 1.1 (9.17.10.2932)
  maximum threads per block 512
  maximum threads per grid  134217728
  number of multiprocessors 16 (1280 compute elements)
  clock rate                350MHz
Automatic parameters
  threads per grid          262144
  optimizing kernels for VLIW5
########## testcase 1/1559 ##########
Starting trial factoring M50804297 from 2^67 to 2^68 (0.59GHz-days)
k_min = 1599999998520 - k_max = 1900000000000
Using GPU kernel "barrett15_75"
done | ETA | GHz |time/class| #FCs | avg. rate | SieveP. |CPU idle
262144 FCs copied in 0.11 ms (9546.39 MB/s), proc'd in 47.81 ms (5.48 M/s)
Of course, while I was posting:
Code:
########## testcase 1/1559 ##########
Starting trial factoring M50804297 from 2^67 to 2^68 (0.59GHz-days)
k_min = 1599999998520 - k_max = 1900000000000
Using GPU kernel "barrett15_75"
done | ETA | GHz |time/class| #FCs | avg. rate | SieveP. |CPU idle
262144 FCs copied in 0.11 ms (9546.39 MB/s), proc'd in 47.81 ms (5.48 M/s)
Error -5: Copying h_ktab(clEnqueueWriteBuffer)
ERROR from tf_class.
Error exit as selftest failed
C:\hd>
Last fiddled with by sdbardwick on 2013-01-31 at 21:28
#685
Nov 2010
Germany
255₁₆ Posts
Quote:
That is the point where some serious debugging seems necessary ... or better drivers from Intel or AMD (I'm not even sure whose code is running).
The prospect of adding 5 or maybe 10 GHz-days/day is also not too encouraging ...
#686
Aug 2002
North San Diego County
685₁₀ Posts
I'd agree that waiting for Intel to go through another iteration of OpenCL development is the efficient choice. Now about that GPU sieving...
#687
"Mr. Meeseeks"
Jan 2012
California, USA
2168₁₀ Posts
Quote:
My CPU is definitely not the best, so I'll keep fiddling. What I need is
#688
Sep 2002
Austin, TX
561₁₀ Posts
I'm glad to see that GPU TF has come so far; thank you all for your effort. I'm trying out mfakto on an AMD A8-5600K APU. Looks promising, but CPU sieving appears to be a bottleneck.
Any hope of AMD OpenCL implementations getting LL testing and factoring with GPU sieving? It seems hard to believe that AMD's modern hardware doesn't have the same instructions as CUDA 2.x (double-precision floating-point operations).
#689
Romulan Interpreter
Jun 2011
Thailand
10010110110101₂ Posts
By the way, I accidentally ran into an OpenCL FFT implementation by Apple, which seems to be just the missing link to make "OpenCL_LL" (i.e. CUDALucas for Radeons). Maybe someone else stronger than me on the subject can have a look?
Last fiddled with by LaurV on 2013-02-25 at 14:28
#690
"Mr. Meeseeks"
Jan 2012
California, USA
878₁₆ Posts
Quote:
#691
Nov 2010
Germany
3·199 Posts
If you have all your CPU cores occupied with something (e.g. 2x mfakto + 2x prime95 on a quad core), then you need to manually select a suitable SievePrimes value and disable SievePrimesAdjust. AutoAdjust does not work on a fully loaded system - it will max out SievePrimes, which indeed results in a performance bottleneck.
Regarding GPU sieving: the GPU part of OpenCL is not all that different from the GPU part of CUDA; most of the differences can be solved by one or two dozen #defines. It is the CPU part of the two that differs so much. And a few concepts/abilities are hard to "translate" (assembler inlines, for instance). Anyway, I have George's GPU sieve running on OpenCL so that it provides some output. I still need to verify that the output is correct, and I need to adapt the kernels to read the raw sieve bitfield. As mfakto has many more kernels than mfaktc, I'm still thinking of a smart way to do this ...
OpenCL FFTs are available, not only from Apple but also from AMD. However, they are far from well-optimized, and I'm not sure if the ones that CuLu needs are there. Double precision is available on GCN and the higher-end previous-generation cards, but at a rather high penalty (1/4 of single-precision speed). Not sure if this allows for an efficient OcLu.
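As a rough illustration of the SievePrimes advice at the top of this post, the two settings might look like this in mfakto.ini. The key names are the ones mfakto prints at startup; the value 25000 is only the built-in default, used here as a placeholder and not as a recommendation. A suitable value has to be found by trial on the machine in question.

```ini
# Fixed sieve depth; 25000 is just mfakto's default, shown as a placeholder
SievePrimes=25000
# 0 switches the automatic adjustment off
SievePrimesAdjust=0
```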
#692
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3·29·83 Posts
nVidia's cards suffer a worse SP->DP penalty than 1/4th. We would be a lot happier if it was "only" 1/4th.
#693
Nov 2010
Germany
3×199 Posts
Quote:
?
BTW, I'm still busy with the GPU sieving for mfakto, and I could use some suggestions. I think I found out why my sieve seems to randomly kill FCs; it's code like this:
Code:
mask = 1 << i37;
mask |= (1 << i41) | (1 << i43);
mask |= (1 << i47) | (1 << i53);
mask |= (1 << i59) | (1 << i61);
So how can I get the code above to work correctly and efficiently on OpenCL? The best I could come up with is three instructions for one
Code:
mask = i37 > 31 ? 0 : (1 << i37);
mask |= (i41 > 31 ? 0 : (1 << i41)) | (i43 > 31 ? 0 : (1 << i43));
mask |= (i47 > 31 ? 0 : (1 << i47)) | (i53 > 31 ? 0 : (1 << i53));
mask |= (i59 > 31 ? 0 : (1 << i59)) | (i61 > 31 ? 0 : (1 << i61));
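One possible alternative to the compare-and-select guard, sketched here in plain C rather than OpenCL C and assuming every index stays below 64: do the shift in 64 bits and truncate the result to 32 bits. OpenCL C masks the shift count to the width of the shifted operand, so on a 32-bit value `1 << 37` behaves like `1 << 5` instead of producing 0; in a 64-bit shift the bit lands in positions 32..63 and is dropped by the truncation. The helper names `bit_or_zero` and `build_mask` are made up for this illustration and are not taken from mfakto.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from mfakto): returns 1 << i for i in 0..31
 * and 0 for i in 32..63. */
static uint32_t bit_or_zero(uint32_t i)
{
    assert(i < 64);                       /* assumption: index is below 64 */
    return (uint32_t)((uint64_t)1 << i);  /* bits 32..63 are truncated away */
}

/* Hypothetical helper: ORs together the bits for n sieve indices. */
static uint32_t build_mask(const uint32_t *idx, int n)
{
    uint32_t mask = 0;
    for (int k = 0; k < n; k++)
        mask |= bit_or_zero(idx[k]);
    return mask;
}
```

In OpenCL C the same expression would read `(uint)((ulong)1 << i)`. Whether a 64-bit shift is actually cheaper than the three-instruction version on a given GPU is not something this sketch can tell; that would have to be measured.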