#78 | ||
Nov 2010
Germany
255₁₆ Posts |
Quote:
I'll check if we can get rid of that.
Quote:
Now that is odd! The 72-bit kernel fails, but the vectorized versions of the same kernel succeed! I just compared the kernels, and there are no code differences. Plus, I can now reproduce it on my Linux box: I still had LD_LIBRARY_PATH pointing to 2.4, and that runs fine. When I point it to 2.5, the problem appears. Looks like an AMD APP issue; I'll check what I can do about it. Running 2.5 on the CPU also works fine ...

I already wanted to drop the single kernel because it is so much slower ... As you built your own binary anyway, go to mfaktc.c and comment out line 487 (removing the _71BIT_MUL24 kernel). Don't submit results from that build to PrimeNet; just use it to check what your GPU can do.
#79 |
Sep 2010
Annapolis, MD, USA
3³·7 Posts |
I rebuilt the program with the change you recommended. All the tests pass, including the large selftest. The card seems to do about 5-10M/s in the "lower" ranges (like below 75M) and is about 10% of that in the 332M+ range.
Code:
Selftest statistics
  number of tests           3637
  successfull tests         3637

selftest PASSED!

Thanks again!
#80 |
Sep 2010
Annapolis, MD, USA
3³×7 Posts |
The very first test I ran saved me an LL test, and of course saved someone else the LL-D down the road.
Code:
class     | candidates | time    | avg. rate | SievePrimes | ETA    | avg. wait
3760/4620 | 159.38M    | 16.721s | 9.53M/s   | 50000       | 49m53s | 90889us
3765/4620 | 159.38M    | 16.696s | 9.55M/s   | 50000       | 49m32s | 90729us
Result[00]: M40660811 has a factor: 490782599517282826471
found 1 factor(s) for M40660811 from 2^68 to 2^69 (partially tested) [mfakto 0.07 mfakto_cl_barrett79]
tf(): total time spent: 3h 53m 44.575s
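For anyone who wants to double-check a reported factor independently of the GPU: a number f divides M_p = 2^p - 1 exactly when 2^p ≡ 1 (mod f), and any factor of M_p (p an odd prime) has the form 2kp + 1 and is ±1 mod 8. The Python sketch below only illustrates those checks; the function names are made up for this example and are not part of mfakto.

```python
def divides_mersenne(p: int, f: int) -> bool:
    """Return True if f divides the Mersenne number 2**p - 1."""
    # Three-argument pow does modular exponentiation, so the
    # multi-million-bit number 2**p - 1 is never built.
    return f > 1 and pow(2, p, f) == 1


def has_factor_form(p: int, f: int) -> bool:
    """Necessary conditions for a factor of 2**p - 1 (p an odd prime):
    f = 2*k*p + 1 for some k >= 1, and f is +1 or -1 modulo 8."""
    return f > 1 and (f - 1) % (2 * p) == 0 and f % 8 in (1, 7)


if __name__ == "__main__":
    # Exponent and factor copied from the mfakto output above.
    p, f = 40660811, 490782599517282826471
    print("divides M_p:       ", divides_mersenne(p, f))
    print("form 2kp+1, +-1 mod 8:", has_factor_form(p, f))
    print("bit length:        ", f.bit_length())
```

The bit length tells you which range the factor belongs to: a factor between 2^68 and 2^69 has a bit length of 69.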
#81 | |
Nov 2010
Germany
3·199 Posts |
Quote:
BTW, at the expense of a little more CPU time you can speed up the tests a bit: set SievePrimes to 200000 and the siever will eliminate more candidates, so the GPU does not have to test them. What is mfakto's CPU load right now, and what is it with SievePrimes at 200k? 9.5 M/s is also not bad for an entry-level GPU; I guess it is at least twice as fast as one of your CPU cores.

Congrats also on the successful selftest. The speed of the tests does not depend much on the size of the exponent, but mainly on the kernel being used. The selftest runs each test with all kernels that can handle the required factor length. If you still have the selftest output, you should see that mfakto_cl_barrett79 is always close to 10 M/s, most others are a bit below that, and mfakto_cl_95 is slowly crawling along.
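To illustrate why a higher SievePrimes value means less work for the GPU: factor candidates of M_p have the form q = 2kp + 1 with q ≡ ±1 (mod 8), and any q divisible by a small prime can be discarded on the CPU before it is ever sent to the GPU. The Python sketch below is a toy model of that idea, not mfakto's actual siever; the function names, the starting k and the range size are assumptions picked for illustration. It also counts how many of the 4620 = 4·3·5·7·11 residue classes of k (the "class" column in the output above) can contain candidates at all.

```python
def small_primes(count: int) -> list[int]:
    """First `count` odd primes via trial division (enough for a toy example)."""
    primes: list[int] = []
    n = 3
    while len(primes) < count:
        if all(n % q for q in primes if q * q <= n):
            primes.append(n)
        n += 2
    return primes


def surviving_classes(p: int) -> int:
    """Count residues k mod 4620 for which q = 2*k*p + 1 is +-1 mod 8
    and not divisible by 3, 5, 7 or 11."""
    count = 0
    for k in range(4620):
        q = 2 * k * p + 1
        if q % 8 in (1, 7) and all(q % r for r in (3, 5, 7, 11)):
            count += 1
    return count


def survivors(p: int, k_start: int, k_count: int, sieve_primes: int) -> int:
    """Count candidates q = 2*k*p + 1 that pass the mod-8 test and have no
    divisor among the first `sieve_primes` odd primes."""
    primes = small_primes(sieve_primes)
    left = 0
    for k in range(k_start, k_start + k_count):
        q = 2 * k * p + 1
        if q % 8 in (1, 7) and all(q % r for r in primes):
            left += 1
    return left


if __name__ == "__main__":
    p = 40660811
    print(surviving_classes(p), "of 4620 classes can contain candidates")
    for n in (10, 100, 1000):
        print(n, "sieve primes:", survivors(p, 10**12, 20000, n), "of 20000 left")
```

Each additional sieve prime removes fewer candidates than the one before it, which is why raising SievePrimes eventually costs more CPU time than it saves on the GPU.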
#82 |
Nov 2010
Germany
3·199 Posts |
Did anyone else give mfakto a try? Any experiences to share (anything strange happening, suggestions you'd like to get included or excluded for the next versions, performance figures for other GPUs, ...)?
I'm running this version on a SuSE 11.4 box with AMD APP SDK 2.4, and when multiple instances are running I occasionally see one instance hang. It completely occupies one CPU core but uses no GPU resources. It is looping inside some kernel code and is immune to kill, kill -9, and attempts to attach a debugger or gcore. So far, a reboot is the only way I know to get rid of it. How can I find out where that hang occurs? And what else could I try to kick such a process without a reboot?
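One low-tech way to see where such a process is stuck, assuming a Linux /proc filesystem: the State line in /proc/<pid>/status shows whether the task is running (R) or in uninterruptible sleep (D), and /proc/<pid>/wchan and /proc/<pid>/stack name the kernel function it is in. The stack file is usually readable by root only and exists only on kernels built with stack tracing, so treat this as a sketch rather than a guaranteed recipe. The helper names are invented for this example.

```python
import os
import sys
from pathlib import Path


def parse_status(text: str) -> dict[str, str]:
    """Turn the 'Key:<tab>value' lines of /proc/<pid>/status into a dict."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def read_proc_file(pid: int, name: str) -> str:
    """Read /proc/<pid>/<name>; return a note instead of raising on failure."""
    try:
        return Path(f"/proc/{pid}/{name}").read_text().strip()
    except OSError as exc:
        return f"<unavailable: {exc.strerror}>"


def describe(pid: int) -> None:
    """Print the scheduler state and kernel-side location of a process."""
    status = parse_status(read_proc_file(pid, "status"))
    print("State:", status.get("State", "<unknown>"))
    print("wchan:", read_proc_file(pid, "wchan"))
    print("kernel stack:")
    print(read_proc_file(pid, "stack"))


if __name__ == "__main__":
    # Pass the PID of the hung instance; defaults to this script's own PID.
    describe(int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid())
```

Signals, including SIGKILL, are only acted on when the task returns from kernel code, which would explain why kill -9 has no effect on a process that never leaves the kernel.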
#83 |
Jun 2011
131 Posts |
I had the same experience as another poster: I had to recompile to reduce the number of threads per block and disable one kernel. Apart from that, AMD_APP refused to install on Win2008, so I had to swap the graphics cards between two machines so that the AMD one would be on Windows 7. The performance is about 20% of what I get out of a GeForce 8800 GTS (around 6 M/s compared to 29 M/s). I haven't played with the sieve parameter much; I just had to disable auto-adjust, as it raises the setting to the limit, slowing the testing to a crawl. If I lowered it below the default, I would probably get better overall performance.
Last fiddled with by apsen on 2011-08-25 at 17:31 |
#84 | |
Nov 2010
Germany
3×199 Posts |
Quote:
According to hwcompare, the 8800 GTS should be 3-4 times faster, so 8-10 M/s would be expected if OpenCL and my port were as efficient as Oliver's CUDA implementation.
Last fiddled with by Bdot on 2011-08-25 at 20:29 Reason: added hwcompare |
#85 |
Aug 2011
216 Posts |
Hi, I'm running mfakto on my HD6950 @ 912 MHz with Catalyst 11.8 and SDK 2.5. One thing I have seen is that it uses approx. 30 percent of my GPU and gives about 50M/s. Does anyone know how to make it fully use the GPU?
Thanks.
#86 | |
Jun 2011
131 Posts |
Quote:
#87 |
Dec 2003
Paisley Park & Neverland
5×37 Posts |
I get ~28M/s on my HD5670 / Phenom II 4 Core 925 with 2 Cores on P-1 tests, 1 Core LL-D; and 1 core is busy video editing. I'll look again when the video job is done.
#88 | |
Dec 2010
Monticello
703₁₆ Posts |
Quote:
50M/s is doing a bit better than my GTX440 under mfaktc, incidentally. Setting up both mfaktc and mfakto to sieve on the GPU is at least a dream for the developers.