mersenneforum.org  

2014-10-24, 14:30   #1222
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

878₁₆ Posts

Quote:
Originally Posted by Bdot
OK, that is really interesting because it means that the GCN chips in the R290 are really different from its predecessors. Or in other words, I would need one in order to optimize for it.

Or ... OK. I will post another pre-release version on the weekend that has an enhanced --perftest mode that exactly measures each kernel. Together with a script and a set of ini files (and maybe some alternative kernels files) I should be able to provide an automatic test, if you'd be willing to run such a thing ...
Also... the low-end "older" GCN cards (77xx, 78xx etc.) have 1/16 DP, the high-end ones have 1/4 DP, and the "new" GCN cards (290(X)) have 1/8 DP....
2014-10-24, 18:49   #1223
AK76
 
Sep 2014

19 Posts
2 questions:

1. How do I factor bit levels below 64 with mfakto?

2. How do I run an LLR test on an ATI GPU?
2014-10-24, 19:14   #1224
Bdot
 
Nov 2010
Germany

3×199 Posts
Quote:
Originally Posted by kracker
Also... the low-end "older" GCN cards (77xx, 78xx etc.) have 1/16 DP, the high-end ones have 1/4 DP, and the "new" GCN cards (290(X)) have 1/8 DP....
Indeed. Would be good to see if 1/8 is sufficient to show an improvement over SP. 1/16 is definitely too slow. I'll add that to the tests I'm going to prepare.
2014-10-24, 19:31   #1225
Bdot
 
Nov 2010
Germany

1001010101₂ Posts

Quote:
Originally Posted by AK76
2 questions:

1. How do I factor bit levels below 64 with mfakto?

2. How do I run an LLR test on an ATI GPU?
1. From 2^60 upwards you just use it as you would for the higher levels. For large exponents it may be necessary to list each bit level separately in the worktodo file, otherwise mfakto may decide to combine the bit levels (e.g. it tries 2^60 to 2^63 in one go but does not find a kernel that can handle multiple bit levels at once). For very short tests, setting MoreClasses=0 will result in a good performance improvement.
Below 2^60 you need to switch to CPU sieving because no GPU-sieve-enabled kernel can handle those levels. MoreClasses=0 is not available for the CPU sieve, therefore mfakto may not be the best choice for such tests.

2. Maybe the GPU LL Testing FAQ needs an update to mention clLucas? There's an LL with OpenCL thread for it.
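
The advice in point 1 about listing each bit level separately could be sketched like this. It is only an illustration: it assumes the `Factor=<exponent>,<bit_min>,<bit_max>` worktodo line format known from mfaktc-style programs (please check the mfakto README for the exact syntax), and the function name and the exponent are made up for the example.

```python
def split_bit_levels(exponent, bit_min, bit_max):
    """Return one worktodo line per single bit level between bit_min and bit_max.

    Assumption: worktodo lines look like "Factor=<exponent>,<bit_min>,<bit_max>".
    """
    if bit_min >= bit_max:
        raise ValueError("bit_min must be smaller than bit_max")
    # One line per level, so the program never has to combine bit levels.
    return [f"Factor={exponent},{b},{b + 1}" for b in range(bit_min, bit_max)]


if __name__ == "__main__":
    # Hypothetical large exponent, factored from 2^60 to 2^63 one level at a time.
    for line in split_bit_levels(332192831, 60, 63):
        print(line)
```

The printed lines would then be pasted into the worktodo file in place of a single 60-to-63 entry.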
2014-10-25, 20:44   #1226
AK76
 
Sep 2014

10011₂ Posts

Quote:
Originally Posted by Bdot
There's an LL with OpenCL thread for it.
Thx for this info. My first LLR test on Ati took 5h and 30min.

M( 64847711 )C, 0xffffffff80000000, n = 131072, clLucas v1.02
2014-10-26, 03:24   #1227
VBCurtis
 
"Curtis"
Feb 2005
Riverside, CA

1001100000100₂ Posts

Quote:
Originally Posted by AK76
Thx for this info. My first LLR test on Ati took 5h and 30min.

M( 64847711 )C, 0xffffffff80000000, n = 131072, clLucas v1.02
If you read through that thread, you'll see that the FFT size your run used was much too small, making your result meaningless. A 64M test will not finish in 5 hrs on any card.
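
A quick back-of-the-envelope check shows why n = 131072 cannot work for this exponent. The sketch below assumes a double-precision FFT, where each of the n words is held in a float with a 53-bit significand; the 3584K length is only a hypothetical comparison value, not necessarily what clLucas would pick.

```python
DOUBLE_MANTISSA_BITS = 53  # significand size of an IEEE 754 double


def bits_per_word(exponent, fft_length):
    """Average number of bits of the Mersenne residue stored in each FFT word."""
    return exponent / fft_length


if __name__ == "__main__":
    exponent = 64847711
    # 131072 is the length reported in the result line; 3670016 (3584K) is a
    # hypothetical larger length used only for comparison.
    for n in (131072, 3670016):
        b = bits_per_word(exponent, n)
        verdict = "too many" if b > DOUBLE_MANTISSA_BITS else "fits in a double"
        print(f"n = {n}: {b:.2f} bits per word ({verdict})")
```

With n = 131072 each word would have to carry several hundred bits, far more than a double can hold even before any multiplication, so the residue printed by such a run cannot be trusted.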
2014-10-26, 10:17   #1228
AK76
 
Sep 2014

10011₂ Posts

I did not set the -f parameter myself; clLucas chose this number. I will run the same test one more time. My ATI card is not overclocked.
2014-10-26, 17:35   #1229
Bdot
 
Nov 2010
Germany

3·199 Posts
mfakto-0.15pre4 available

There's a new pre-version out on the ftp. It comes with a test script to run various settings and measure the different kernels.

To run this test, you should not need the computer for a while: the full test will take between one and two hours. During this time it would be best not to use the computer at all, if possible. Watching videos or running CPU or GPU hogs will definitely make the test results worthless.

To start the test, just run

perftestmfakto.cmd

After it has finished, just zip the testresults folder that it created and post it here or send it to me (email address in the README).

The test will notice that some ini files do not contain the "TestSievePrimes" parameter, which is intentional.
The test script may need adaptations if you want to use a specific GPU (then please add the appropriate -d switch).
The provided ini files may need adaptations if you want to run it on a predecessor of GCN (adapt VectorSize to your usual settings, most likely 4).
If you think other variations are worth testing, please go ahead (and let me know).
I successfully tested it on two different CPUs (-d c) - I hope it also works with the IntelHD devices.

Last fiddled with by Bdot on 2014-10-26 at 17:36 Reason: wording
2014-10-27, 14:42   #1230
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

2168₁₀ Posts

Attached Files
File Type: zip testresults_RadeonHD_7770.zip (39.8 KB, 168 views)
File Type: zip testresults_IntelHD4600.zip (31.5 KB, 69 views)
2014-10-27, 20:51   #1231
NickOfTime
 
Apr 2014

2·3·7 Posts
290x
Attached Files
File Type: 7z testresults 290x.7z (20.4 KB, 86 views)
2014-10-27, 22:28   #1232
Bdot
 
Nov 2010
Germany

1125₈ Posts

Very nice! All seems to work as expected. I'll need a bit more time to go through the results, but here are the first things I noticed:
  • 290x seems to have major improvements in int32 performance: the 32-bit kernels are now 20% faster than the 15-bit ones (on previous GCN, they are 15% slower). As the current code does not yet honor this, expect a 20% performance boost with the next mfakto version, and no more performance drop at the 73-bit-boundary.
  • 290x behaves pretty much like GCN regarding ini file settings: according to these tests, GPUSieveSize=126 and GPUSieveProcessSize=24 should be fastest on this card as well.
  • 290x: Measuring the CPU-sieve-based TF kernels only worked for the smallest exponent, the other results are way too low - either something overflowed, some throttling kicked in or my test did not fully utilize the GPU.
  • 1/8 DP rate on 290x is not sufficient to give DP calculations an advantage over SP. Therefore, only Tahiti and Malta chips will use DP in mfakto. Does anyone still have an HD5870/5850 sitting around? This one would also be a good candidate.
  • HD4600 worked well, delivering 18-19 GHz-days/day for current LL test range.
  • The performance dependency on the exponent size is stronger than I expected, e.g. 290x: 975GHz (2M), 770GHz (39M), 739GHz (78M), 684GHz (332M), 616GHz (4200M).
  • I forgot to include an m-gs-128-32.ini file. Could you please copy m-gs-128-16.ini to that name and set GPUSieveProcessSize=32? Then please run mfakto [-d ..] -i m-gs-128-32.ini --perftest > m-gs-128-32.log
    I think I know the outcome for HD7770, but HD4600 and R290x would be interesting.
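
The manual step in the last bullet (copy m-gs-128-16.ini and set GPUSieveProcessSize=32) could also be scripted along these lines. This is only a sketch: the helper name `set_ini_value` is made up for illustration, and it assumes the ini files consist of plain `key=value` lines, which is how the settings are quoted in this thread.

```python
import re
from pathlib import Path


def set_ini_value(ini_text, key, value):
    """Return ini_text with the line `key=...` set to `key=value`.

    Commented-out lines are left alone; if the key is absent, a new line
    is appended at the end.
    """
    pattern = re.compile(rf"^[ \t]*{re.escape(key)}[ \t]*=[^\r\n]*", re.MULTILINE)
    new_line = f"{key}={value}"
    if pattern.search(ini_text):
        return pattern.sub(new_line, ini_text)
    separator = "" if (not ini_text or ini_text.endswith("\n")) else "\n"
    return ini_text + separator + new_line + "\n"


if __name__ == "__main__":
    source = Path("m-gs-128-16.ini")
    if source.exists():
        # Write the modified copy next to the original ini file.
        updated = set_ini_value(source.read_text(), "GPUSieveProcessSize", 32)
        Path("m-gs-128-32.ini").write_text(updated)
```

After that, the perftest command from the bullet above can be run against the new m-gs-128-32.ini.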

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.