mersenneforum.org  

Old 2014-11-05, 18:48   #1255
kracker
 
 
"Mr. Meeseeks"
Jan 2012
California, USA

23·271 Posts

Quote:
Originally Posted by kracker View Post
Well... this is depressing. I reinstalled the driver about three times now for my new 285, the driver crashes while running mfakto's test...

EDIT: Running the "regular" 0.14 mfakto on regular work doesn't crash... Weird.
UPDATE: It seems that only some of the tests in pre5 crash, which are:
Code:
m-gs-fulltest
m-gs-96-24
m-gs-128-16
m-gs-128-32
All the other ones work fine from what I can see now.
Old 2014-11-05, 21:14   #1256
Bdot
 
 
Nov 2010
Germany

3×199 Posts

Quote:
Originally Posted by NickOfTime View Post
260x
OK, it turns out that my guess was wrong: Bonaire still has the slow (4-cycle) int32 multiplications. I reverted the code base accordingly. Thanks for this test.

It's sad that AMD's versioning of GCN 1.0/1.1/1.2 does not seem to mean anything at all (at least nothing that I could use).
Old 2014-11-05, 21:29   #1257
Bdot
 
 
Nov 2010
Germany

3×199 Posts

Quote:
Originally Posted by NickOfTime View Post
Hmm, I tried running it concurrently on my 290x's in the 70M 74-bit range, and I start seeing fluctuations in GHz-d/d, GPU core clock and GPU utilization, resulting in what looks like an overall throughput of 900 GHz-d/d instead of the potential 1450 GHz-d/d...

On 0.14 I have 100% utilization, but the throughput drops by 25 GHz-d/d on both cards (525 to 500), so 1000 GHz-d/d out of a potential 1050...

Are we hitting a memory / bus bandwidth limit? Asus Z87-WS, i7-4770S, 8 GB DDR3-1866
Before I saw the test results of your x16 vs. x8 PCIe cards, I would have said that the bus etc. does not have any influence on mfakto. But seeing the x16 card consistently a few percent ahead of its x8 counterpart suggests it does make a difference.

Do I understand it right that each instance, when running alone would give ~725GHz, but when starting the other instance the speed drops to ~450GHz per card?

In this case I'd say this is AMD's Powertune technology in action. Maybe you can try to use Catalyst Control Center and set the power target to some percent higher and watch if the GHz-d/d output increases accordingly? But careful, if you do that over a longer period of time: the additional heat generation can be significant. I don't have a good explanation why the speed would drop below the 0.14 level, though. Maybe the 15-bit kernels have a bigger share of simple instructions that do not generate so much heat?
Edit: which settings did you use for this parallel TF test? Your files suggest you should be using something similar to m-gs-128-32.ini or m-gs-fulltest.ini for maximum performance.
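
The figures in this exchange are easy to sanity-check. Below is a minimal sketch, assuming the approximate numbers quoted above (~725 GHz-d/d per card alone, ~450 per card when both run); the function name is made up for illustration and is not part of mfakto:

```python
def throughput_loss(solo_rates, concurrent_rates):
    """Return (potential, observed, fraction_lost) for a set of cards.

    solo_rates: GHz-d/d of each card when it runs alone.
    concurrent_rates: GHz-d/d of each card when all cards run together.
    """
    potential = sum(solo_rates)
    observed = sum(concurrent_rates)
    return potential, observed, 1 - observed / potential


# Approximate figures as quoted in the thread, not new measurements.
potential, observed, lost = throughput_loss([725, 725], [450, 450])
print(f"{observed} of {potential} GHz-d/d, {lost:.0%} lost")
```

The same helper works for the 0.14 case by passing [525, 525] and [500, 500].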

Last fiddled with by Bdot on 2014-11-05 at 21:47
Old 2014-11-05, 21:35   #1258
Bdot
 
 
Nov 2010
Germany

3×199 Posts

Quote:
Originally Posted by kracker View Post
UPDATE: It seems that only some of the tests in pre5 crash, which are:
Code:
m-gs-fulltest
m-gs-96-24
m-gs-128-16
m-gs-128-32
All the other ones work fine from what I can see now.
These are the tests that use a bigger share of the GPU memory. Maybe downclock GPU memory a bit?

To ease the perftest, you can add a number on the mfakto command line for these tests:

mfakto -i m-gs-128-32.ini --perftest 2 > testresults/m-gs-128-32.log

This number is roughly the iteration count for each test. The default is 10, so a value of 2 runs about 5 times faster.
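
To run the shortened perftest over several ini files, the command lines can be generated instead of typed. This is only a sketch: it assumes every ini file is named like m-gs-128-32.ini and that the logs go to a testresults directory as in the command above; the helper name is hypothetical.

```python
import shlex

# Test names as listed earlier in this thread; the ".ini" suffix is assumed
# to apply to all of them.
INI_NAMES = ["m-gs-fulltest", "m-gs-96-24", "m-gs-128-16", "m-gs-128-32"]


def perftest_command(ini_name, iterations=2, log_dir="testresults"):
    """Build the shell line for one shortened perftest run."""
    argv = ["mfakto", "-i", f"{ini_name}.ini", "--perftest", str(iterations)]
    # Redirect stdout to a per-test log file, as in the command above.
    return f"{shlex.join(argv)} > {log_dir}/{ini_name}.log"


for name in INI_NAMES:
    print(perftest_command(name))
```

It only prints the lines, so they can be checked before pasting them into a shell or batch file.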

Edit: Does the driver survive running a TF test using m-gs-128-32.ini?

Last fiddled with by Bdot on 2014-11-05 at 21:45
Old 2014-11-05, 21:56   #1259
kracker
 
 
"Mr. Meeseeks"
Jan 2012
California, USA

23×271 Posts

Quote:
Originally Posted by Bdot View Post
Edit: Does the driver survive running a TF test using m-gs-128-32.ini?
I'll try that. Little cosmetic bug?
Attached thumbnail: erop.png (36.5 KB)
Old 2014-11-05, 22:11   #1260
NickOfTime
 
Apr 2014

2×3×7 Posts

Quote:
Originally Posted by Bdot View Post
Before I saw the test results of your x16 vs. x8 PCIe cards, I would have said that the bus etc. does not have any influence on mfakto. But seeing the x16 card consistently a few percent ahead of its x8 counterpart suggests it does make a difference.

Do I understand it right that each instance, when running alone would give ~725GHz, but when starting the other instance the speed drops to ~450GHz per card?

In this case I'd say this is AMD's Powertune technology in action. Maybe you can try to use Catalyst Control Center and set the power target to some percent higher and watch if the GHz-d/d output increases accordingly? But careful, if you do that over a longer period of time: the additional heat generation can be significant. I don't have a good explanation why the speed would drop below the 0.14 level, though. Maybe the 15-bit kernels have a bigger share of simple instructions that do not generate so much heat?
Edit: which settings did you use for this parallel TF test? Your files suggest you should be using something similar to m-gs-128-32.ini or m-gs-fulltest.ini for maximum performance.
Initially I started with GCN3 128/32, then 96/24 and 64/16, and then I changed GPUType to GCN and tried again... Looking at GPU-Z, GPU utilization was at 100% and then sometimes at zero. It would seem that the GPU kernel had finished and the card was waiting for a write or read buffer from main memory...
Old 2014-11-05, 22:35   #1261
Bdot
 
Bdot's Avatar
 
Nov 2010
Germany

597₁₀ Posts

Quote:
Originally Posted by kracker View Post
Little cosmetic bug?
Oh you guys with these tiny windows
But feel free to truncate the status lines ...
Old 2014-11-05, 22:44   #1262
Bdot
 
Bdot's Avatar
 
Nov 2010
Germany

3×199 Posts

Quote:
Originally Posted by NickOfTime View Post
Initially I started with GCN3 128/32, then 96/24 and 64/16, and then I changed GPUType to GCN and tried again... Looking at GPU-Z, GPU utilization was at 100% and then sometimes at zero. It would seem that the GPU kernel had finished and the card was waiting for a write or read buffer from main memory...
And ... what were the results? Did all these ini-values result in (about) the same? Does a single mfakto instance give ~725GHz?

I noticed on my system that the GPU throughput was lower when I allowed the CPU to go idle and spin down. Keeping one prime95 thread running gives the best results for me ... Probably the GPU results are served faster in this mode.

Finally, another test would be to run 2 mfakto instances per GPU: between the kernel invocations there is always a bit of a delay because the CPU has to set up the next one. On a faster card, these gaps may play a bigger role. So you could check if 4 instances give you a total of ~1400GHz ... If not, then check the power thing.
Old 2014-11-05, 22:56   #1263
Stef42
 
Feb 2012
the Netherlands

2×29 Posts

Here are results of the 280X, the CPU is a Pentium S775 dual-core (4GB ram).
Attached file: testresults_280x.zip (55.6 KB)
Old 2014-11-06, 00:15   #1264
kracker
 
kracker's Avatar
 
"Mr. Meeseeks"
Jan 2012
California, USA

23×271 Posts
Default

Quote:
Originally Posted by Bdot View Post
Oh you guys with these tiny windows
But feel free to truncate the status lines ...
That's the max width I have.
Also, I ran "regular" TF with the same ini for an hour, no crashes. Running --perftest crashes, but "--perftest 2" works...

Last fiddled with by kracker on 2014-11-06 at 00:16
Old 2014-11-06, 22:01   #1265
Bdot
 
Bdot's Avatar
 
Nov 2010
Germany

597₁₀ Posts

Quote:
Originally Posted by Stef42 View Post
Here are results of the 280X, the CPU is a Pentium S775 dual-core (4GB ram).
Thank you. These results also show signs of reaching the power limit. In the GCN2 test, cl_barrett15_82 is even faster than cl_barrett15_73/74 - this can only happen because cl_barrett15_82 was run before the others and was executed at higher clock speed.

Please also check out the Catalyst Control Center's Overclocking department (Graphics Overdrive). The safe method is to downclock the core, say from 1070MHz (is that your clock speed?) to 1000MHz, and see what single tests (like "mfakto -i m-cpu-GCN2.ini --perftest") return. If the results are almost unchanged, then you had reached the power limit. If they drop by ~6%, then you had not. The other way to test it would be to increase the power target to +10% and see if the results are better.
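
The ~6% figure follows from the clock ratio: 1 - 1000/1070 is about 6.5%. A minimal sketch of the check, assuming the speed scales linearly with the core clock; the function names and the 0.5 cut-off are hypothetical choices for illustration, not anything mfakto provides:

```python
def expected_drop(old_mhz, new_mhz):
    """Fractional slowdown expected if speed scales linearly with core clock."""
    return 1 - new_mhz / old_mhz


def was_power_limited(before, after, old_mhz, new_mhz, tolerance=0.5):
    """Guess whether the card was throttling at the old clock.

    before/after: perftest results (e.g. GHz-d/d) at the old and new clock.
    tolerance: hypothetical cut-off; a measured drop smaller than this share
    of the expected drop is read as "almost unchanged".
    """
    measured = 1 - after / before
    return measured < tolerance * expected_drop(old_mhz, new_mhz)


print(f"expected drop: {expected_drop(1070, 1000):.1%}")
```

Feed it the same single test run once at each clock. A result that barely moves suggests the card was already being held back at the higher clock.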

In case it is not the power target but the GPU temperature, you can try to manually set the fan speed to 80% and check if that yields an improvement. If so, then probably the cooler is not correctly seated (or poor thermal paste does not transfer the heat quickly enough).

Another indication of throttling is the fulltest's GPU sieve results. The very first test with the smallest sieve comes out fastest. Normally, way bigger sieve sizes are needed for best performance. It's just that the first test ran at full speed for a longer time.

On the other hand, by simple scaling of my HD7950's clock and number of compute elements, you should come out 11% ahead of me. In the GCN and GCN2 tests, this can be observed for the more difficult tasks, like 4000M. For the 2M TF, my card is 20% ahead. It almost looks like your card refuses to go faster than 500GHz-days/day - no matter which kernel. For 2M, they all have the same speed!
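
The "simple scaling" above is just clock times compute units. A rough sketch, assuming linear scaling in both; the compute unit counts are the published ones (32 for the R9 280X, 28 for the HD 7950), while the equal 1000MHz clocks are placeholders and not the actual settings of either card:

```python
def expected_speedup(clock_a_mhz, units_a, clock_b_mhz, units_b):
    """Ratio of card A's to card B's throughput under simple linear scaling."""
    return (clock_a_mhz * units_a) / (clock_b_mhz * units_b)


# Placeholder clocks (equal on both cards), so only the unit count differs.
ratio = expected_speedup(1000, 32, 1000, 28)
print(f"card A expected ahead by {ratio - 1:.1%}")
```

With real clocks plugged in, the ratio shifts accordingly; a measured result well below it for every kernel points at a cap rather than at the kernels themselves.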

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.