mersenneforum.org  

Old 2011-12-20, 08:24   #276
Bdot
 
Nov 2010
Germany

3×199 Posts

Quote:
Originally Posted by KyleAskine View Post
What is the MUL24 kernel?

In .09 I was using the mfakto_cl_71 for my 6950s with the shaders unlocked (so basically 6970s). I was getting around 140 M/s

With mfakto_cl_barrett79 I was getting around 120 M/s, so barrett was around 15-20% slower.

With .10 I seem to be getting around 120 M/s with both, so the mfakto_cl_71 seems to have gotten slower for me. The barrett kernel still runs at the same speed.

I am installing something ATM, but I can check again when I am done if you would like. Let me know if there are any screenshots or output files I can send you if that would help.

Edit: I just confirmed that at the same load, I now run around 20% slower with the new version.
mul24 kernel is the kernel "mfakto_cl_71".

What type of CPU do you have? I've reduced the sieve size to fit the 32 kB L1 cache of most CPUs. If you have a CPU with a 64 kB L1 cache, the siever might be slower ... I've lost my Phenom machine (again), so I could not test that. As most Intel CPUs have just 32 kB of L1 data cache, I found the optimum sieve size to be ~24 kB for those. If you have a machine with 64 kB of L1 cache, I can send you a special version, and I'll make a note for the next version to either adjust that automatically or make it configurable.

Also, for bulldozer, I can create a 12kiB-siever-version.

Can you confirm that you still see the line
Using GPU kernel "mfakto_cl_71" if you select that kernel to be run?

And can you see a difference in GPU utilization?

Last fiddled with by Bdot on 2011-12-20 at 08:37
Old 2011-12-20, 08:27   #277
Bdot
 
Nov 2010
Germany

3×199 Posts

Quote:
Originally Posted by therealwebs View Post
I'm running both mfakto win 0.09 and 0.10 on different PCs. I've noticed that mfakto 0.10 (x64) seems to crash fairly regularly. I'm using cat 11.12 with 2x5870s. Since I'm running remote, I haven't been able to monitor the circumstance of the crashes. Event viewer doesn't have anything helpful to add at the moment. I'll update if I can find a set of circumstances that cause the crash.
Make sure to not have AMD APP SDK 2.4 on your box.
Old 2011-12-20, 11:17   #278
TheJudger
 
"Oliver"
Mar 2005
Germany

11·101 Posts

Quote:
Originally Posted by Bdot View Post
Also, for bulldozer, I can create a 12kiB-siever-version.
Well, I've built an mfaktc executable for nucleon's Bulldozer with a smaller sieve. It helps a little bit, but my sieve code really runs badly on Bulldozer. Per clock, it manages something like 1/4 to 1/3 of a current Intel CPU.

Oliver
Old 2011-12-20, 11:22   #279
KyleAskine
 
Oct 2011
Maryland

2×5×29 Posts

Quote:
Originally Posted by Bdot View Post
mul24 kernel is the kernel "mfakto_cl_71".

What type of CPU do you have? I've reduced the sieve size to fit the 32 kB L1 cache of most CPUs. If you have a CPU with a 64 kB L1 cache, the siever might be slower ... I've lost my Phenom machine (again), so I could not test that. As most Intel CPUs have just 32 kB of L1 data cache, I found the optimum sieve size to be ~24 kB for those. If you have a machine with 64 kB of L1 cache, I can send you a special version, and I'll make a note for the next version to either adjust that automatically or make it configurable.

Also, for bulldozer, I can create a 12kiB-siever-version.

Can you confirm that you still see the line
Using GPU kernel "mfakto_cl_71" if you select that kernel to be run?

And can you see a difference in GPU utilization?
I have an i5-2500k.

I don't think it is a siever issue... my utilization is the same (around 90%) with both .09 and .10.

I confirmed that it does say that it is using mfakto_cl_71.
Old 2011-12-20, 15:28   #280
Bdot
 
Nov 2010
Germany

3×199 Posts

Quote:
Originally Posted by TheJudger View Post
Well, I've built an mfaktc executable for nucleon's Bulldozer with a smaller sieve. It helps a little bit, but my sieve code really runs badly on Bulldozer. Per clock, it manages something like 1/4 to 1/3 of a current Intel CPU.

Oliver
Yes, I've seen that reducing the sieve size any further dramatically reduces speed. In that respect, the Phenoms (64 kiB L1) should be best at sieving, if they get a 60 kiB siever ...
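
The cache sizes and sieve sizes mentioned so far can be collected into a small lookup. This is only a sketch built from the figures quoted in this thread (12 kiB for Bulldozer, about 24 kB for a 32 kB L1, 60 kiB for a 64 kiB L1). The table, the function name `pick_sieve_kib`, and the 16 kiB entry used for Bulldozer's L1 data cache are assumptions for illustration, not mfakto code or settings.

```python
# Illustrative only: sieve sizes quoted in this thread, keyed by L1 data cache size (kiB).
SIEVE_KIB_BY_L1_KIB = {
    16: 12,  # Bulldozer (assumed 16 kiB L1 data cache): proposed 12 kiB siever
    32: 24,  # most Intel CPUs: ~24 kB reported as the optimum
    64: 60,  # Phenom: suggested 60 kiB siever
}


def pick_sieve_kib(l1_data_kib: int) -> int:
    """Return the thread-quoted sieve size for the largest listed cache that fits."""
    fitting = [l1 for l1 in SIEVE_KIB_BY_L1_KIB if l1 <= l1_data_kib]
    if not fitting:
        raise ValueError("L1 data cache is smaller than any size discussed here")
    return SIEVE_KIB_BY_L1_KIB[max(fitting)]


if __name__ == "__main__":
    for l1 in (16, 32, 64):
        print(l1, "kiB L1 ->", pick_sieve_kib(l1), "kiB sieve")
```

A cache size that is not listed (say 48 kiB) falls back to the next smaller entry, which keeps the sieve inside L1 at the cost of leaving some cache unused.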
Old 2011-12-20, 15:31   #281
Bdot
 
Nov 2010
Germany

3×199 Posts

Quote:
Originally Posted by KyleAskine View Post
I have an i5-2500k.

I don't think it is a siever issue... my utilization is the same (around 90%) with both .09 and .10.

I confirmed that it does say that it is using mfakto_cl_71.
That is really sad, and it seems to depend on your GPUs - mfakto_cl_71 v0.10 on my box is faster than v0.09 ...

Can you please pm me your email address? I'd like to send you something to test ...
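
For reference, the slowdown reported earlier in the thread (around 140 M/s with 0.09 versus around 120 M/s with 0.10) can be put into numbers. This is a small worked calculation that takes those rough figures at face value; the function names are made up for the example.

```python
def slowdown_percent(old_rate: float, new_rate: float) -> float:
    """Percentage by which new_rate is lower than old_rate."""
    return (old_rate - new_rate) / old_rate * 100.0


def speedup_percent(old_rate: float, new_rate: float) -> float:
    """Percentage by which old_rate is higher than new_rate."""
    return (old_rate - new_rate) / new_rate * 100.0


if __name__ == "__main__":
    # Rates in M/s as reported for mfakto_cl_71 on the 6950s.
    print(round(slowdown_percent(140, 120), 1))  # 14.3
    print(round(speedup_percent(140, 120), 1))   # 16.7
```

So 0.10 is roughly 14% slower than 0.09, or equivalently 0.09 is roughly 17% faster than 0.10, depending on which version is taken as the baseline.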
Old 2011-12-20, 17:52   #282
therealwebs
 
Dec 2011
Ottawa, Canada

22 Posts

yep, don't have APP SDK 2.4 installed AFAIK. i wanted to install 2.6, but the download link was corrupted so i'm using 2.5.

in terms of stability, mfakto hasn't crashed in the last 10 or so hours. this is coinciding with changing my usage pattern from 2 instances+1 instance to running only 1 instance on each card (so 1+1). from a resource standpoint, i'm using 3 cores of my i5 to feed the cards and 1 core to run prime95. if i allow 2 cores of primes to run, i get a major throughput hit in mfakto.

thanks for this version! i didn't want to have to do a driver rollback to run this on my main machine :)
Old 2011-12-20, 23:45   #283
Bdot
 
Nov 2010
Germany

3·199 Posts

Quote:
Originally Posted by therealwebs View Post
yep, don't have APP SDK 2.4 installed AFAIK. i wanted to install 2.6, but the download link was corrupted so i'm using 2.5.

in terms of stability, mfakto hasn't crashed in the last 10 or so hours. this is coinciding with changing my usage pattern from 2 instances+1 instance to running only 1 instance on each card (so 1+1). from a resource standpoint, i'm using 3 cores of my i5 to feed the cards and 1 core to run prime95. if i allow 2 cores of primes to run, i get a major throughput hit in mfakto.

thanks for this version! i didn't want to have to do a driver rollback to run this on my main machine :)
If you could enable userdumper or some other tool to get a crash dump when it aborts next time, that would be really helpful. But of course I hope it does not crash again ;-)

And another note: the aforementioned performance issue seems resolved. kyleaskine and flashjh are helping me test it, so I'll probably release a fix for it tomorrow - together with the linux binary.
Old 2011-12-21, 11:45   #284
BigBrother
 
Feb 2005
The Netherlands

2×109 Posts

When using CheckpointDelay=0 and PrintMode=1, the first column (class) of the output is always overwritten by the text 'CP written.', which makes it impossible to see which class is being tested.
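
The effect can be illustrated with a tiny simulation. This is a sketch under the assumption that the checkpoint message is printed from column 0 over the status line that is already on screen; the status line layout and the helper name are invented for the example and are not taken from mfakto.

```python
def overwrite_at_start(line: str, text: str) -> str:
    """Simulate a terminal printing `text` from column 0 over an existing line."""
    return text + line[len(text):]


if __name__ == "__main__":
    # Hypothetical status line: class | progress | time | rate
    status = "   4612 |  0.1% | 3.210s | 120.00M/s"
    shown = overwrite_at_start(status, "CP written.")
    print(status)
    print(shown)  # the class column at the start is no longer visible
```

With a checkpoint written after every class, every status line ends up in this state, which matches the report that the class column is always hidden.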
Old 2011-12-21, 13:48   #285
Bdot
 
Nov 2010
Germany

255₁₆ Posts

Quote:
Originally Posted by BigBrother View Post
When using CheckpointDelay=0 and PrintMode=1, the first column (class) of the output is always overwritten by the text 'CP written.', which makes it impossible to see which class is being tested.
Hehe, that's a use-case that was not intended ... now that mfakto can delay writing the checkpoints, the idea is that they are written only occasionally.

I'll think of some better way to tell that a checkpoint was written.

Thanks for the report.


Here's the fix for the performance issues. It contains just two kernel files, which need to replace the original files from the 0.10 package.
Attached Files
File Type: zip mfakto-fix-0.10p1.zip (13.9 KB, 98 views)

Last fiddled with by Bdot on 2011-12-21 at 14:00
Old 2011-12-21, 13:55   #286
Bdot
 
Nov 2010
Germany

3·199 Posts
mfakto 0.10 - Linux version

Here comes the linux version of mfakto 0.10. It has the performance issues resolved, but is otherwise unchanged (also 32kiB sieve limit).
Attached Files
File Type: zip mfakto-0.10 - Linux.zip (133.7 KB, 96 views)

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.