[QUOTE=Bdot;385956]OK, that is really interesting because it means that the GCN chips in the R290 are really different from its predecessors. Or in other words, I would need one in order to optimize for it.
Or ... OK. I will post another pre-release version on the weekend that has an enhanced --perftest mode that exactly measures each kernel. Together with a script and a set of ini files (and maybe some alternative kernels files) I should be able to provide an automatic test, if you'd be willing to run such a thing ...[/QUOTE] Also... the low end "older" GCN cards(77xx,78xx etc) have 1/16 DP, the high end ones have 1/4 DP, and the "new" GCN cards (290(X)) have 1/8 DP.... :mike: |
2 questions:
1. How do I factor bit levels below 64 with mfakto? 2. How do I run an LLR test on an ATI GPU? |
[QUOTE=kracker;385967]Also... the low end "older" GCN cards(77xx,78xx etc) have 1/16 DP, the high end ones have 1/4 DP, and the "new" GCN cards (290(X)) have 1/8 DP.... :mike:[/QUOTE]
Indeed. Would be good to see if 1/8 is sufficient to show an improvement over SP. 1/16 is definitely too slow. I'll add that to the tests I'm going to prepare. |
[QUOTE=AK76;386003]2 questions:
1. How do I factor bit levels below 64 with mfakto? 2. How do I run an LLR test on an ATI GPU?[/QUOTE]
1. From 2[sup]60[/sup] upwards, you use it just as for the higher levels. For large exponents, you may need to list each bit level separately in the worktodo file; otherwise mfakto may decide to combine the bit levels (e.g. try 2[sup]60[/sup] to 2[sup]63[/sup] in one go) and then fail to find a kernel that can handle multiple bit levels at once. For very short tests, setting MoreClasses=0 gives a good performance improvement. Below 2[sup]60[/sup] you need to switch to CPU sieving because no GPU-sieve-enabled kernel can handle those levels. MoreClasses=0 is not available for the CPU sieve, so mfakto may not be the best choice for such tests.
2. Maybe the [URL="http://mersenneforum.org/showthread.php?t=16142"]GPU LL Testing FAQ[/URL] needs an update to mention clLucas? There's an [URL="http://www.mersenneforum.org/showthread.php?t=18297&page=26"]LL with OpenCL[/URL] thread for it. |
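As a rough illustration of listing each bit level separately, the sketch below generates one worktodo line per bit level. It assumes the common `Factor=exponent,bit_min,bit_max` line layout; the helper name `split_bitlevels` and the exponent used are purely illustrative, so check the mfakto README for the exact format your version expects.

```python
def split_bitlevels(exponent, bit_min, bit_max):
    """Return one worktodo line per single bit level.

    Assumption: lines follow the 'Factor=exponent,bit_min,bit_max' layout.
    """
    if bit_min >= bit_max:
        raise ValueError("bit_min must be smaller than bit_max")
    # One line for each step bit_min..bit_min+1, ..., bit_max-1..bit_max
    return [f"Factor={exponent},{b},{b + 1}" for b in range(bit_min, bit_max)]


if __name__ == "__main__":
    # Example exponent only; replace it with your own assignment.
    for line in split_bitlevels(64847711, 60, 63):
        print(line)
```

The printed lines can be pasted into the worktodo file, so that mfakto never has to combine several bit levels into a single run.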
[QUOTE=Bdot;386006]There's an [URL="http://www.mersenneforum.org/showthread.php?t=18297&page=26"]LL with OpenCL[/URL] thread for it.[/QUOTE]
Thx for this info. My first LLR test on Ati took 5h and 30min. M( 64847711 )C, 0xffffffff80000000, n = 131072, clLucas v1.02 |
[QUOTE=AK76;386082]Thx for this info. My first LLR test on Ati took 5h and 30min.
M( 64847711 )C, 0xffffffff80000000, n = 131072, clLucas v1.02[/QUOTE] If you read through that thread, you'll see that the FFT size your run used was much too small, making your result meaningless. A 64M test will not finish in 5 hrs on any card. |
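To see why an FFT length of 131072 is far too small for this exponent, it may help to look at how many bits each FFT element would have to carry. The sketch below is plain arithmetic, not clLucas code; the limit of about 18 bits per element is an assumed rule of thumb for double-precision FFTs, and the real limit depends on the implementation and the FFT length.

```python
def bits_per_element(exponent, fft_length):
    """Average number of bits of the residue stored in each FFT element."""
    return exponent / fft_length


def min_power_of_two_fft(exponent, max_bits=18.0):
    """Smallest power-of-two FFT length keeping bits per element <= max_bits.

    Assumption: max_bits=18 is only a rough rule of thumb for double precision.
    """
    n = 1
    while exponent / n > max_bits:
        n *= 2
    return n


if __name__ == "__main__":
    p = 64847711
    print(f"n = 131072: {bits_per_element(p, 131072):.1f} bits per element")
    n = min_power_of_two_fft(p)
    print(f"n = {n}: {bits_per_element(p, n):.1f} bits per element")
```

With n = 131072, each element would need to hold several hundred bits, far more than a double can represent exactly, which is why the result of such a run cannot be trusted.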
I did not set the -f parameter myself; clLucas chose this number. I will run the same test one more time. My ATI is not overclocked.
|
mfakto-0.15pre4 available
There's a [URL="http://mersenneforum.org/mfakto/mfakto-0.15pre4/mfakto-0.15pre4.zip"]new pre-version[/URL] out on the ftp. It comes with a test script to run various settings and measure the different kernels.
To run this test, you should not need the computer for a while: the full test will take between one and two hours. During this time it would be best not to use the computer at all, if possible. Watching videos or running CPU or GPU hogs will definitely make the test results worthless.
To start the test, just run perftestmfakto.cmd. After it has finished, zip the testresults folder that it created and post it here or send it to me (email address in the README).
The test will notice that some ini files do not contain the "TestSievePrimes" parameter, which is intentional. The test script may need adapting if you want to use a specific GPU (then please add the appropriate -d switch). The provided ini files may need adapting if you want to run it on a predecessor of GCN (adapt VectorSize to your usual settings, most likely 4). If you think other variations are worth testing, please go ahead (and let me know :smile:).
I successfully tested it on two different CPUs (-d c) - I hope it also works with the IntelHD devices. |
2 Attachment(s)
:smile:
|
1 Attachment(s)
290x
|
Very nice! All seems to work as expected. I'll need a bit more time to go through the results. The first things I noticed:
[LIST]
[*]290x seems to have major improvements in int32 performance: the 32-bit kernels are now 20% faster than the 15-bit ones (on previous GCN, they are 15% slower). As the current code does not yet take this into account, expect a 20% performance boost with the next mfakto version, and no more performance drop at the 73-bit boundary.
[*]290x behaves pretty much like GCN regarding ini file settings: according to these tests, GPUSieveSize=126 and GPUSieveProcessSize=24 should be fastest on this card as well.
[*]290x: Measuring the CPU-sieve-based TF kernels only worked for the smallest exponent; the other results are way too low. Either something overflowed, some throttling kicked in, or my test did not fully utilize the GPU.
[*]The 1/8 DP rate on 290x is not sufficient to give DP calculations an advantage over SP. Therefore, only Tahiti and Malta chips will use DP in mfakto. Does anyone still have an HD5870/5850 sitting around? This one would also be a good candidate.
[*]HD4600 worked well, delivering 18-19 GHz-days/day for the current LL test range.
[*]The performance dependency on the exponent size is stronger than I expected, e.g. 290x: 975GHz (2M), 770GHz (39M), 739GHz (78M), 684GHz (332M), 616GHz (4200M).
[*]I forgot to include an m-gs-128-32.ini file. Could you please copy m-gs-128-16.ini and set GPUSieveProcessSize=32? Then, please run mfakto [-d ..] -i m-gs-128-32.ini --perftest > m-gs-128-32.log I think I know the outcome for HD7770, but HD4600 and R290x would be interesting.
[/LIST] |
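For anyone who prefers not to edit the file by hand, the copy-and-modify step for m-gs-128-32.ini could be sketched as follows. This is only an illustration: the helper `derive_ini` is hypothetical, and it assumes the ini file uses plain `Key=Value` lines as in the settings quoted in this thread.

```python
import re
from pathlib import Path


def derive_ini(src_text, key, value):
    """Return a copy of src_text with key set to value.

    Assumption: settings are plain 'Key=Value' lines; commented-out lines
    are left untouched. If the key is missing, it is appended at the end.
    """
    pattern = re.compile(rf"^([ \t]*{re.escape(key)}[ \t]*=).*$", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}{value}", src_text)
    if count == 0:
        new_text = src_text.rstrip("\n") + f"\n{key}={value}\n"
    return new_text


if __name__ == "__main__":
    src = Path("m-gs-128-16.ini")
    if src.exists():
        text = derive_ini(src.read_text(), "GPUSieveProcessSize", 32)
        Path("m-gs-128-32.ini").write_text(text)
        print("wrote m-gs-128-32.ini")
    else:
        print("m-gs-128-16.ini not found; run this from the test folder")
```

After that, the perftest command given in the post above can be run against the new file.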