mersenneforum.org  

2012-06-23, 21:27   #496
Bdot
 
Nov 2010
Germany
3×199 Posts

Latest test from dbaugh:
Quote:
Originally Posted by dbaugh
2097152 FCs copied in 1.35 ms (6201.88 MB/s), proc'd in 3.59 ms (584.32 M/s)
Both the transfer rate of about 6 GB/s and the theoretical peak throughput of 584 M/s are far beyond anything I've seen so far with mfakto. It shows there is no immediate need for a GCN-specific version; VectorSize=2 does the trick.
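The two rates in dbaugh's output can be re-derived from the count and the times it prints. A minimal sketch in Python, assuming each factor candidate (FC) is transferred as 4 bytes and that MB means 10^6 bytes. Neither is stated in the output; both are inferred because they reproduce the printed figure to within the rounding of the times.

```python
# Re-derive the rates from dbaugh's benchmark line.
# ASSUMPTIONS (not stated in the output): 4 bytes per FC, 1 MB = 10**6 bytes.
FCS = 2_097_152      # factor candidates per block (2**21), from the output
BYTES_PER_FC = 4     # assumed size of one FC as transferred
COPY_MS = 1.35       # printed copy time, rounded to two decimals
PROC_MS = 3.59       # printed processing time, rounded to two decimals


def transfer_rate_mb_s(count, bytes_each, millis):
    """Bytes moved per second, expressed in MB/s (10**6 bytes)."""
    return count * bytes_each / (millis / 1000.0) / 1e6


def throughput_m_s(count, millis):
    """Candidates processed per second, in millions."""
    return count / (millis / 1000.0) / 1e6


copy_rate = transfer_rate_mb_s(FCS, BYTES_PER_FC, COPY_MS)
proc_rate = throughput_m_s(FCS, PROC_MS)
print(f"copy: {copy_rate:.0f} MB/s, processing: {proc_rate:.0f} M/s")
```

Both results land within about 0.2% of the printed 6201.88 MB/s and 584.32 M/s; the small gap is consistent with the times being rounded to two decimals.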

His 7970 is overclocked from 925 to 1125MHz, but even at default clock this would be the fastest AMD GPU right now.

Thanks for your tests, I hope you can get close to this throughput when running multiple instances ...

2012-07-05, 06:53   #497
dbaugh
 
Aug 2005
118₁₀ Posts
mfakto switches

Is there a switch or something that will cause mfakto to include in the results file the datetime of when the result was written? This would be very useful for calculating wall time for different efforts and CPU/GPU loads.

2012-07-05, 11:21   #498
Bdot
 
Nov 2010
Germany
3·199 Posts

Quote:
Originally Posted by dbaugh
Is there a switch or something that will cause mfakto to include in the results file the datetime of when the result was written? This would be very useful for calculating wall time for different efforts and CPU/GPU loads.
Try adding this to mfakto.ini:
Code:
TimeStampInResults=1
How many instances do you currently run on your 7970? I assume to get to SievePrimes > 20k you'll need at least 4?

2012-07-05, 22:42   #499
dbaugh
 
Aug 2005
76₁₆ Posts

Thanks for the info. I am at 67% CPU load running 2x mfakto and "other stuff". With hyperthreading, 50% would mean no idle physical cores. Two instances of mfakto on the 7970 average 91% GPU load. SievePrimes has settled at around 500 on each. Setting the memory clock to 975 instead of 1575 costs me 0.03% in performance and saves me 3% in power usage.

2012-07-06, 13:53   #500
KyleAskine
 
Oct 2011
Maryland
2×5×29 Posts

Quote:
Originally Posted by dbaugh
Thanks for the info. I am at 67% CPU load running 2x mfakto and "other stuff". With hyperthreading, 50% would mean no idle physical cores. Two instances of mfakto on the 7970 average 91% GPU load. SievePrimes has settled at around 500 on each. Setting the memory clock to 975 instead of 1575 costs me 0.03% in performance and saves me 3% in power usage.
500 is probably too low to be maximally efficient.

I think once you go below 5k you are probably checking way too many non-primes on your graphics card.

Try raising that and see how it affects your throughput.
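
The trade-off behind this advice can be estimated with a little arithmetic. For a prime exponent p, factor candidates have the form q = 2kp + 1, and each odd sieve prime r other than p divides q for exactly one residue class of k modulo r, so sieving with r removes a fraction 1/r of the remaining candidates. The sketch below multiplies these factors for the first N odd primes. It assumes SievePrimes simply counts odd primes starting at 3; mfakto's real sieve removes some small primes beforehand through residue classes, so treat the absolute numbers as illustrative and only the trend as meaningful. The function names are made up for this example.

```python
import math


def first_odd_primes(count):
    """Return the first `count` odd primes using a sieve of Eratosthenes."""
    limit = 64
    while True:
        sieve = bytearray([1]) * (limit + 1)
        sieve[0:2] = b"\x00\x00"
        for i in range(2, math.isqrt(limit) + 1):
            if sieve[i]:
                # Clear every multiple of i starting at i*i.
                sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
        odd = [n for n in range(3, limit + 1) if sieve[n]]
        if len(odd) >= count:
            return odd[:count]
        limit *= 2  # not enough primes yet, sieve a larger range


def surviving_fraction(sieve_primes):
    """Fraction of candidates left for the GPU after sieving (simplified model)."""
    fraction = 1.0
    for r in first_odd_primes(sieve_primes):
        fraction *= 1.0 - 1.0 / r
    return fraction


for n in (500, 5000, 20000):
    print(f"SievePrimes={n:6d}: {surviving_fraction(n):.3f} of candidates reach the GPU")
```

A lower SievePrimes value leaves a larger share of candidates for the GPU, which is wasted GPU work whenever the CPU has spare capacity to sieve deeper.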

2012-07-07, 21:14   #501
Bdot
 
Nov 2010
Germany
3·199 Posts

Quote:
Originally Posted by dbaugh
Thanks for the info. I am at 67% CPU load running 2x mfakto and "other stuff". With hyperthreading, 50% would mean no idle physical cores. Two instances of mfakto on the 7970 average 91% GPU load. SievePrimes has settled at around 500 on each. Setting the memory clock to 975 instead of 1575 costs me 0.03% in performance and saves me 3% in power usage.
Quote:
Originally Posted by KyleAskine
500 is probably too low to be maximally efficient.

I think once you go below 5k you are probably checking way too many non-primes on your graphics card.

Try raising that and see how it affects your throughput.
Yes, 500 probably makes it rather inefficient. If you're at 67% total CPU load, adding one or two more mfakto instances would probably increase the throughput, even if they shared a physical core. There are a couple of instructions that the hyper-threads of one core can really execute in parallel, which can help quite a bit here. On the other hand, they share the L1 cache, which is heavily used when sieving. I'll see whether I can provide a binary that uses a little less cache and should therefore work better with hyper-threads.

2012-07-10, 02:37   #502
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA
23·271 Posts

Thanks Bdot, for your work on this!

(If only we had CLLucas.......... Just kidding :P)

2012-07-28, 10:52   #503
Bdot
 
Nov 2010
Germany
3·199 Posts

(moving this over from the CUDALucas thread)
Quote:
Originally Posted by kracker
I would like to know how well the 7850 performs, since I have a 7770 (one step down). I'm just curious how much of a performance increase going up to it would give (not that I'll get it).
I picked this card because it is the biggest AMD card that can run on a single 6-pin power cable, so I don't need to upgrade my PSU. It's factory-overclocked to 975MHz (from 860), resulting in 257M/s theoretical max throughput. At about the same power draw, my HD5770 delivered ~160M/s.

Selftest OK, GPU temp 60C at 99% load.

Regarding the comparison to the 7770, I've noted these figures:
Code:
BARRETT73_MUL15;  // 165M/s on HD7770, 258M/s on HD7850 (975MHz)
BARRETT79_MUL32;  // 137M/s            212M/s
BARRETT72_MUL24;  // 135M/s            209M/s
_71BIT_MUL24;     // 115M/s            178M/s
BARRETT92_MUL32;  // 106M/s            163M/s
That is a bit more than a 50% speedup. At default clock, it would be roughly 35%, in line with the GFLOPS ratio of 1.375.
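
The percentages can be checked directly from the table. The sketch below computes the measured speedup per kernel and then scales the HD7850 figures down to its 860 MHz default clock, assuming throughput scales linearly with core clock (an assumption made for illustration, not something measured in this thread).

```python
# Throughput in M/s from the table above: (HD7770, HD7850 at 975 MHz).
RATES = {
    "BARRETT73_MUL15": (165, 258),
    "BARRETT79_MUL32": (137, 212),
    "BARRETT72_MUL24": (135, 209),
    "_71BIT_MUL24": (115, 178),
    "BARRETT92_MUL32": (106, 163),
}
OC_MHZ = 975       # factory overclock of the HD7850 in the post
DEFAULT_MHZ = 860  # HD7850 default clock given in the post


def speedups(rates, clock_scale=1.0):
    """HD7850/HD7770 ratio per kernel, optionally rescaling the HD7850 clock."""
    return {
        kernel: hd7850 * clock_scale / hd7770
        for kernel, (hd7770, hd7850) in rates.items()
    }


measured = speedups(RATES)
# ASSUMPTION: linear scaling of throughput with core clock.
at_default = speedups(RATES, DEFAULT_MHZ / OC_MHZ)

for kernel in RATES:
    print(f"{kernel:16s} {measured[kernel]:.2f}x measured, "
          f"{at_default[kernel]:.2f}x estimated at default clock")
```

Under that assumption the default-clock ratios come out between roughly 1.36 and 1.38, close to the quoted GFLOPS ratio.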

And finally, the reason why my old 5770 lately ran as hot as 87C.
[Attached image: P1020464.jpg]

2012-07-28, 14:15   #504
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA
23·271 Posts

Nice... better performance than I thought.
My PSU only has 1 6pin, so... :D
This is something I NEE*zip* I wish I had.

P.S.: Have you tried getting the dust out of that fan? On one of my computers the CPU fan was full of dust and it ran at 75C under full load; after cleaning, it dropped (right now it is at 57C).

2012-07-28, 19:07   #505
kladner
 
"Kieren"
Jul 2011
In My Own Galaxy!
10011110101110₂ Posts

Quote:
Originally Posted by kracker
P.S.: Have you tried getting the dust out of that fan? On one of my computers the CPU fan was full of dust and it ran at 75C under full load; after cleaning, it dropped (right now it is at 57C).
Consider filtering the air intake fan(s) for the case, too.

Last fiddled with by kladner on 2012-07-28 at 19:07

2012-07-28, 22:21   #506
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA
23×271 Posts

Quote:
Originally Posted by kladner
Consider filtering the air intake fan(s) for the case, too.
Ahh, that's probably what I should do with mine :)

P.S.: I have a little problem... I usually find one factor a day at the range and depth I'm on now.... but for the last three days, NO FACTOR!! I don't know what's wrong...
...
...
...
...
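
A dry spell like this is less alarming than it feels. As a rough sketch, assuming factor finds are independent and arrive at a steady average of one per day (the rate mentioned in the post above), the number found in a given period follows a Poisson distribution, and the chance of finding none in three days is exp(-3):

```python
import math


def prob_no_factor(days, factors_per_day=1.0):
    """Poisson probability of zero finds in `days` at the given average rate."""
    return math.exp(-factors_per_day * days)


p = prob_no_factor(3)
print(f"P(no factor in 3 days) = {p:.1%}")
```

That is about 5%, so such a gap is uncommon but is not by itself a sign that anything is wrong.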

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.