#276
Nov 2010
Germany
3×199 Posts
Quote:
What type of CPU do you have? I've reduced the sieve size to fit most CPUs' 32 KiB L1 cache. If you have a CPU with a 64 KiB L1 cache, the siever might be slower... I've lost my Phenom machine (again), so I could not test that. As most Intel CPUs have just 32 KiB of L1 data cache, I found the optimum sieve size for them to be ~24 KiB. If you have a 64 KiB L1 cache machine, I can send you a special version, and I'll make a note for the next version to either adjust the size automatically or make it configurable. Also, for Bulldozer, I can create a 12 KiB siever version. Can you confirm that you still see the line Using GPU kernel "mfakto_cl_71" if you select that kernel to be run? And can you see a difference in GPU utilization?
Last fiddled with by Bdot on 2011-12-20 at 08:37
|
|
|
|
|
|
#277
Nov 2010
Germany
3×199 Posts
Quote:
|
|
|
|
|
|
#278
"Oliver"
Mar 2005
Germany
11·101 Posts
Well, I've built an mfaktc executable for nucleon's Bulldozer with a smaller sieve. It helps a little, but my sieve code runs really badly on Bulldozer: per clock, it reaches something like 1/4 to 1/3 of the speed of a current Intel CPU.
Oliver
|
|
|
|
|
#279
Oct 2011
Maryland
2×5×29 Posts
Quote:
I don't think it is a siever issue... My utilization is the same (around 90%) with both 0.09 and 0.10. I confirmed that it does say that it is using mfakto_cl_71.
|
|
|
|
|
|
#280
Nov 2010
Germany
3×199 Posts |
Yes, I've seen that reducing the sieve size any further dramatically reduces speed. By that logic, the Phenoms (64 KiB L1) should be the best at sieving if they get a 60 KiB siever...
|
|
|
|
|
|
#281
Nov 2010
Germany
3×199 Posts |
Quote:
Can you please PM me your email address? I'd like to send you something to test...
|
|
|
|
|
|
#282
Dec 2011
Ottawa, Canada
22 Posts
Yep, I don't have APP SDK 2.4 installed, AFAIK. I wanted to install 2.6, but the download link was corrupted, so I'm using 2.5.
In terms of stability, mfakto hasn't crashed in the last 10 or so hours. This coincides with changing my usage pattern from 2 instances + 1 instance to running only 1 instance on each card (so 1 + 1). From a resource standpoint, I'm using 3 cores of my i5 to feed the cards and 1 core to run Prime95. If I let Prime95 run on 2 cores, I get a major throughput hit in mfakto. Thanks for this version! I didn't want to have to do a driver rollback to run this on my main machine :)
|
|
|
|
|
#283
Nov 2010
Germany
3·199 Posts
Quote:
And another note: the aforementioned performance issue seems to be resolved. kyleaskine and flashjh are helping me test it, so I'll probably release a fix for it tomorrow, together with the Linux binary.
|
|
|
|
|
|
#284
Feb 2005
The Netherlands
2×109 Posts
When using CheckpointDelay=0 and PrintMode=1, the first column (class) of the output is always overwritten by the text 'CP written.', which makes it impossible to see which class is being tested.
|
|
|
|
|
|
#285
Nov 2010
Germany
255₁₆ Posts
Quote:
I'll think of a better way to indicate that a checkpoint was written. Thanks for the report.
Here's the fix for the performance issues. It contains just 2 kernel files, which replace the original files from the 0.10 package.
Last fiddled with by Bdot on 2011-12-21 at 14:00
|
|
|
|
|
|
#286
Nov 2010
Germany
3·199 Posts
Here comes the Linux version of mfakto 0.10. It has the performance issues resolved but is otherwise unchanged (it also keeps the 32 KiB sieve limit).
|
|
|
|