#716
Nov 2010
Germany
3×199 Posts
Thanks to all testers who have responded so far! I'm happy that no new bugs have been discovered (apart from a cooling issue on Axelsson's HD 6970, which only appears at a load level the CPU-sieve versions never reached).

kracker reported that even the GPU-sieve version consumes a full CPU core. Could you all please check this again on your machines? For me, mfakto sits at 0.2% CPU. I think his Catalyst 1124.2 is the 13.3 beta driver (?), so maybe there is an issue with that, or with having two active GPUs (again, a driver issue).

Apart from that, everything seems to work well, but for many, performance does not seem to live up to expectations (5-10% slower than the CPU sieve, wherever the CPU could saturate the GPU). Let's see. I also got word from AMD that Catalyst 13.4 will contain a fix for a compiler bug. Once I can remove the workaround for that bug, I expect a 5% speedup. Performance on VLIW4/5 is rather poor because only vector size 2 works at the moment, and performance on GCN suffers from having to use the "second-best" kernel. These are tradeoffs for the prototype; there are still a few things left for me to do... Today I tested the Linux64 version, with results identical to Win64.

Edit: I just added VectorSize=4 on my HD 5770 (only for the TF kernel, not (yet) for the sieve): 2.5% faster.

Last fiddled with by Bdot on 2013-04-09 at 22:40
|
|
|
|
|
#717
"Mr. Meeseeks"
Jan 2012
California, USA
2³×271 Posts
Quote:
And yes, it is 13.3. I might try the stable driver later when I have time.
|
|
|
|
|
#718
Nov 2010
Germany
3·199 Posts
Quote:
Edit: Now I also see it in your emailed report: this was the CPU sieve, not the GPU sieve...

Last fiddled with by Bdot on 2013-04-09 at 23:59
|
|
|
|
|
#719
"Mr. Meeseeks"
Jan 2012
California, USA
2³·271 Posts
Quote:
Code:
# The barrett15_75 kernel is 1-2% faster if we can limit the exponent to
# 2^29 and k<2^60, using this switch (no effect on other kernels). The default
# keeps the original limits of exp<2^32 and k<2^64
#
# Default: SmallExp=0

SmallExp=0

# move the sieving to the GPU. This will free most of the CPU resources
#
# SieveOnGPU=1
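The SmallExp limits are easiest to read in terms of the usual form of Mersenne factor candidates, q = 2kp + 1. A minimal sketch, assuming the limits are exactly "exponent < 2^29 and k < 2^60" as the ini comment states, that checks whether a whole trial-factoring range stays inside them; the function names and the second example exponent are hypothetical, not mfakto code:

```python
# Hypothetical helpers, not taken from mfakto: check whether every factor
# candidate q = 2*k*p + 1 below 2^bit_max stays within the SmallExp=1 limits
# quoted in mfakto.ini (assumed: exponent < 2^29 and k < 2^60).
SMALL_EXP_LIMIT = 2**29
SMALL_K_LIMIT = 2**60


def max_k(p: int, bit_max: int) -> int:
    """Largest k with 2*k*p + 1 < 2**bit_max."""
    # For integers, 2kp + 1 < 2^b  <=>  2kp <= 2^b - 2.
    return (2**bit_max - 2) // (2 * p)


def fits_small_exp(p: int, bit_max: int) -> bool:
    """True if the exponent and every k up to bit_max respect the limits."""
    return p < SMALL_EXP_LIMIT and max_k(p, bit_max) < SMALL_K_LIMIT


# The assignment from this thread: M66065887 from 2^73 to 2^74.
print(fits_small_exp(66065887, 74))  # True
# A small (hypothetical) exponent taken to a high bit level runs out of k range.
print(fits_small_exp(1000003, 92))   # False
```

Under these assumptions, the thread's example assignment fits comfortably (k stays around 10^14); the k < 2^60 limit only matters for small exponents factored to very high bit levels.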
|
|
|
|
|
|
#720
Nov 2010
Germany
1125₈ Posts
Oh, you're right, I'm sorry. For the real tests (not the selftests), it reports "Using GPU kernel ..." and shows the kernel it would use for CPU sieving. Later, it switches to the hardcoded "cl_barrett32_77_gs" kernel.
I fooled myself. However, it is only reporting the wrong kernel; it does use the correct one. So in your case we're back to "potential driver issue". Also, the fact that 12.10 did not work is something I should probably test as well, even if only to document that we now need 13.x. Does anyone have a driver (Catalyst) version below 13.1 working with the new prototype? mfakto reports the version like this:

device (driver) version OpenCL 1.2 AMD-APP (1084.4) (1084.4)

Anyone below 1084.4?
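Answering "Anyone below 1084.4?" across several reports comes down to reading the build number out of that line. A minimal sketch, assuming the format is always "AMD-APP (major.minor)" as in the strings quoted in this thread; the function names are hypothetical:

```python
import re
from typing import Optional, Tuple

# Hypothetical helper (not part of mfakto): pull the AMD-APP build number out
# of the "device (driver) version" line, e.g.
#   "OpenCL 1.2 AMD-APP (1084.4) (1084.4)"
# Assumption: the first "AMD-APP (major.minor)" group is the build to compare.
_AMD_APP = re.compile(r"AMD-APP \((\d+)\.(\d+)\)")


def amd_app_build(line: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) of the AMD-APP build, or None if not found."""
    match = _AMD_APP.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def below(line: str, reference: Tuple[int, int] = (1084, 4)) -> bool:
    """True if the reported build is older than the reference build."""
    build = amd_app_build(line)
    # Integer tuples compare component-wise, so a build 1084.10 would
    # correctly sort after 1084.4 (a float comparison would not).
    return build is not None and build < reference


print(below("OpenCL 1.2 AMD-APP (1016.4) (1016.4 (VM))"))  # True
print(below("OpenCL 1.2 AMD-APP (1084.4) (1084.4)"))       # False
```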
|
|
|
|
|
#721
"Mr. Meeseeks"
Jan 2012
California, USA
4170₈ Posts
Quote:
|
|
|
|
|
|
#722
Nov 2010
Germany
3×199 Posts
Quote:
Try removing Catalyst again, then delete amdocl.dll and amdocl64.dll and reinstall Catalyst. AMD even provides a separate utility for a clean uninstall in preparation for a downgrade.
|
|
|
|
|
#723
"Mr. Meeseeks"
Jan 2012
California, USA
878₁₆ Posts
Quote:
EDIT: Finally... it works! The CPU usage is gone too... thanks, Bdot!

Last fiddled with by kracker on 2013-04-11 at 02:11
|
|
|
|
|
#724
Jul 2012
Sweden
42₁₀ Posts
Quote:
Code:
OpenCL device info
  name                          Cayman (Advanced Micro Devices, Inc.)
  device (driver) version       OpenCL 1.2 AMD-APP (1016.4) (1016.4 (VM))
  maximum threads per block     256
  maximum threads per grid      16777216
  number of multiprocessors     24 (1536 compute elements)
  clock rate                    880MHz

If I get some free time, I'll try upgrading the drivers to see whether the CPU usage goes down. But so far I like it a lot, even with my issues, and I would run it for production whenever I'm not using my computer, keeping the CPU sievers for when I am. The most I got from my system before was 120 GHz-days/day, running four instances; now it's doing 160 GHz-days/day straight out of the box...

Code:
running a simple selftest ...
got assignment: exp=66065887 bit_min=73 bit_max=74 (28.96 GHz-days)
Starting trial factoring M66065887 from 2^73 to 2^74 (28.96GHz-days)
Using GPU kernel "cl_barrett32_77"
No checkpoint file "M66065887.ckp" found.
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Apr 13 01:30 |  4049  87.8% |  16.276  31m44s |    160.12    82485    0.00%
M66065887 has a factor: 17587853595837070511807
found 1 factor for M66065887 from 2^73 to 2^74 (partially tested) [mfaktc mfakto 0.13pre3-Win cl_barrett32_77]
tf(): total time spent: 3h 49m 36.682s (181.60 GHz-days / day)
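The figures in this log are consistent with each other under two assumptions: that mfakto, like mfaktc, tests 960 of 4620 residue classes of k per assignment, and that the GHz-d/day column scales the 28.96 GHz-days credit by the time the last class took. A rough check under those assumptions (all names are hypothetical):

```python
# Back-of-the-envelope check of the log above. Assumptions (not confirmed by
# the thread): 960 classes are tested per assignment, and the GHz-d/day column
# is the assignment credit scaled by the time of the most recent class.
SECONDS_PER_DAY = 86400
CLASSES_TESTED = 960


def rate_from_class_time(credit: float, seconds_per_class: float) -> float:
    """GHz-days/day if every class takes seconds_per_class."""
    return credit / CLASSES_TESTED * SECONDS_PER_DAY / seconds_per_class


def eta_seconds(classes_done: int, seconds_per_class: float) -> float:
    """Remaining time if the remaining classes are as fast as the last one."""
    return (CLASSES_TESTED - classes_done) * seconds_per_class


def rate_from_total_time(credit: float, elapsed_seconds: float) -> float:
    """Full assignment credit divided by the elapsed wall time in days."""
    return credit / (elapsed_seconds / SECONDS_PER_DAY)


def min_sec(seconds: float) -> str:
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}m{secs:02d}s"


credit = 28.96                          # GHz-days, from "got assignment"
per_class = 16.276                      # seconds, "time" column for class 4049
done = round(0.878 * CLASSES_TESTED)    # 87.8% of the classes -> 843
elapsed = 3 * 3600 + 49 * 60 + 36.682   # "total time spent: 3h 49m 36.682s"

print(f"{rate_from_class_time(credit, per_class):.2f}")  # 160.14
print(min_sec(eta_seconds(done, per_class)))              # 31m44s
print(f"{rate_from_total_time(credit, elapsed):.2f}")    # 181.62
```

Under these assumptions the ETA reproduces the log's 31m44s, and the per-class rate lands within rounding of 160.12. The 181.60 on the last line is matched only by dividing the full 28.96 GHz-days by the elapsed time, even though the run appears to have stopped near 87.8% after finding the factor ("partially tested"); scaled by 0.878 it comes to about 159.5, in line with the per-class figure.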
|
|
|
|
|
|
#725
"Mr. Meeseeks"
Jan 2012
California, USA
2³·271 Posts
Is that an HD 6970 (based on the 1536 cores)? Shouldn't it do more?
My 7770 does around 120 GHz-days/day, and it's a low-to-mid-range model...
|
|
|
|
#726
Nov 2010
Germany
3×199 Posts
Quote:
Axelsson, regarding the slow response: try reducing GPUSieveSize and especially GPUSieveProcessSize in mfakto.ini; this should make the system more responsive. Then please check whether it really is CPU usage by mfakto, or just the GPU running at its limit (which also leads to sluggish screen updates).
|
|
|