6970
c:\mfakto-0.13pre5>mfakto-0.13pre5-pi-win64.exe --perftest
Runtime options
  Inifile mfakto.ini
  Verbosity 1
  SieveOnGPU yes
  GPUSievePrimes 82486
  GPUSieveSize 64Mi bits
  GPUSieveProcessSize 16Ki bits
  WorkFile worktodo.txt
  ResultsFile results.txt
  Checkpoints enabled
  CheckpointDelay 300s
  Stages enabled
  StopAfterFactor class
  PrintMode compact
  V5UserID none
  ComputerID none
  TimeStampInResults yes
  VectorSize 4
  GPUType VLIW4
  SmallExp no
Select device - Get device info - Compiling kernels.
Perftest
Generate list of the first 10^6 primes: 6913.06 ms
1. Sieve-Init (once per class, 960 times per test, avg. for 10 iterations)
Init_class(sieveprimes= 5000): 1.40 ms
Init_class(sieveprimes= 20000): 6.32 ms
Init_class(sieveprimes= 80000): 28.83 ms
Init_class(sieveprimes= 200000): 78.58 ms
Init_class(sieveprimes= 500000): 213.20 ms
Init_class(sieveprimes=1000000): 451.98 ms
2. Sieve (M/s)
Sieve size is fixed at compile time, cannot test with variable sizes. Just running 3 fixed tests.
SievePrimes: 256 396 611 945 1460 2257 3487 5389 8328 12871 19890 30738 47503 73411 113449 175323 270944 418716 647083 1000000
SieveSizeLimit
24 kiB 264.8 241.2 220.6 202.2 184.8 168.9 155.0 141.8 128.8 116.1 105.0 90.4 75.2 61.7 50.4 40.5 31.8 24.1 17.1 10.0
24 kiB 264.1 241.2 220.7 202.1 184.8 169.4 155.3 142.2 128.6 116.1 104.5 90.6 75.3 61.7 50.4 40.4 31.6 23.8 16.2 10.1
24 kiB 263.1 240.3 207.6 200.7 183.8 169.2 154.9 141.9 128.8 115.9 104.7 90.5 75.0 61.7 50.5 40.5 31.8 24.0 17.2 10.0
Best SieveSizeLimit for
SievePrimes: 256 396 611 945 1460 2257 3487 5389 8328 12871 19890 30738 47503 73411 113449 175323 270944 418716 647083 1000000
at kiB: 24 24 24 24 24 24 24 24 24 24 24 24 24 24 24 24 24 24 24 24
max M/s: 264.8 241.2 220.7 202.2 184.8 169.4 155.3 142.2 128.8 116.1 105.0 90.6 75.3 61.7 50.5 40.5 31.8 24.1 17.2 10.1
Survivors: 36.36% 34.06% 32.05% 30.28% 28.69% 27.27% 26.00% 24.84% 23.79% 22.82% 21.94% 21.12% 20.36% 19.67% 19.01% 18.40% 17.82% 17.29% 16.80% 16.32%
3. Memory copy to GPU (blocks of 8388608 bytes)
Standard copy, standard queue: 800 MB in 244.5 ms (3430.4 MB/s) (real)
Standard copy, profiled queue: 800 MB in 244.4 ms (3432.1 MB/s) (real)
  800 MB in 0.0 ms (103409861.9 MB/s) (profiled data)
  8 MB in 0.0 ms ( 1.$ MB/s) (profiled data, peak)
Standard copy, two queues: 800 MB in 194.5 ms (4312.0 MB/s) (real)
4. mfakto_cl_63 kernel soon
5. mfakto_cl_71 kernel soon
6. barrett_79 kernel soon
7. barrett_92 kernel soon
c:\mfakto-0.13pre5> |
6970
I see that the output is very different from MfaktC. I guess that will get cleaned up, and maybe get a similar kind of output?
CalcBitToClear 82688 primes: 250 us (330.752 M/s)
sieve using 262144 threads: 10.34 ms (25.3524 M/s), 6490.22 M FCs/s sieved
TF using 1048576 threads: 48.4379 ms (21.6478 M/s), 1385.46 M FCs/s TF'd (incl. sieving)
CalcBitToClear 82688 primes: 260.445 us (317.487 M/s)
sieve using 262144 threads: 10.3958 ms (25.2164 M/s), 6455.4 M FCs/s sieved
TF using 1048576 threads: 48.4279 ms (21.6523 M/s), 1385.75 M FCs/s TF'd (incl. sieving)
CalcBitToClear 82688 primes: 240.778 us (343.42 M/s)
sieve using 262144 threads: 10.4173 ms (25.1642 M/s), 6442.04 M FCs/s sieved
TF using 1048576 threads: 48.4308 ms (21.651 M/s), 1385.67 M FCs/s TF'd (incl. sieving)
CalcBitToClear 82688 primes: 266.556 us (310.209 M/s)
sieve using 262144 threads: 10.2207 ms (25.6484 M/s), 6566 M FCs/s sieved
TF using 1048576 threads: 48.4203 ms (21.6557 M/s), 1385.96 M FCs/s TF'd (incl. sieving)
CalcBitToClear 82688 primes: 268.556 us (307.899 M/s)
sieve using 262144 threads: 10.3352 ms (25.3641 M/s), 6493.22 M FCs/s sieved
TF using 1048576 threads: 48.4021 ms (21.6638 M/s), 1386.49 M FCs/s TF'd (incl. sieving)
CalcBitToClear 82688 primes: 289.445 us (285.678 M/s)
sieve using 262144 threads: 10.2964 ms (25.4597 M/s), 6517.67 M FCs/s sieved
TF using 1048576 threads: 48.4094 ms (21.6606 M/s), 1386.28 M FCs/s TF'd (incl. sieving)
CalcBitToClear 82688 primes: 273 us (302.886 M/s)
sieve using 262144 threads: 10.4549 ms (25.0738 M/s), 6418.9 M FCs/s sieved
TF using 1048576 threads: 48.4148 ms (21.6582 M/s), 1386.12 M FCs/s TF'd (incl. sieving)
Using Factor=218687F2FF894FA83B3425A0F89061D5,77115127,70,71 |
6970
1 Attachment(s)
-st run, log attached, selftest passed.
I will start 500 tests now, let them run to completion, then double-check those 500 with MfaktC on 2 Titans afterwards and send you the result. Great work on the new version. It seems to have a huge throughput on the AMD cards. |
They should be the same. What Bdot wants, I think, is -st2.
|
[QUOTE=Manpowre;340869]-st run, log attached, selftest passed.
I will start 500 tests now, let it run out, and then double check those 500 with the MfaktC on 2 titans afterwards, then send you the result. Great work on new version. it seems to have a huge throughput on the AMD cards..[/QUOTE] Thanks for your tests. I hope you're not using the -pi- version for the 500 tests. That version has the [B]P[/B]erformance[B]I[/B]nfo DEBUG option enabled, which considerably slows down overall processing but allows for exact measurement of the kernel runtime. That is also the reason why the output looks so different from mfaktc's. This binary is best used with CPU sieving (SieveOnGPU=0) and a short test (-st). I use this output to see which of the kernels runs at which speed on this particular hardware.
Try the binary without -pi- in its name, and the output should look familiar. And yes, throughput on high-end cards greatly benefits from GPU sieving. It should not be too long until I have finished my stuff to release 0.13. Then it would be good if every user sent the output of one of the runs to James to allow more accurate updates of [URL="http://www.mersenne.ca/mfaktc.php"]this page[/URL]. Maybe I'll put this as a requirement into the license :smile: |
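For anyone who wants to compare kernel speeds from a -pi- run without reading the log by eye, here is a rough Python sketch. It assumes the performance-info lines look like the ones pasted earlier in this thread ("TF using ... threads: ... ms (... M/s), ... M FCs/s"); the regular expression and the `summarize` helper are my own guesses for illustration, not anything shipped with mfakto.

```python
import re
from statistics import mean

# Sample lines copied from the -pi- output posted earlier in this thread.
SAMPLE = """\
sieve using 262144 threads: 10.34 ms (25.3524 M/s), 6490.22 M FCs/s sieved
TF using 1048576 threads: 48.4379 ms (21.6478 M/s), 1385.46 M FCs/s TF'd (incl. sieving)
sieve using 262144 threads: 10.3958 ms (25.2164 M/s), 6455.4 M FCs/s sieved
TF using 1048576 threads: 48.4279 ms (21.6523 M/s), 1385.75 M FCs/s TF'd (incl. sieving)
"""

# Assumed line shape (inferred from the sample above, not from mfakto's source):
# "<stage> using <n> threads: <t> ms (<r> M/s), <fc> M FCs/s ..."
LINE = re.compile(
    r"^(?P<stage>\w+) using (?P<threads>\d+) threads: "
    r"(?P<ms>[\d.]+) ms \([\d.]+ M/s\), (?P<fcs>[\d.]+) M FCs/s"
)


def summarize(text):
    """Return {stage: (mean runtime in ms, mean M FCs/s)} for matching lines."""
    samples = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            row = (float(m["ms"]), float(m["fcs"]))
            samples.setdefault(m["stage"], []).append(row)
    return {
        stage: (mean(ms for ms, _ in rows), mean(fcs for _, fcs in rows))
        for stage, rows in samples.items()
    }


if __name__ == "__main__":
    for stage, (ms, fcs) in summarize(SAMPLE).items():
        print(f"{stage}: {ms:.3f} ms avg, {fcs:.2f} M FCs/s avg")
```

Lines that do not match the assumed shape (for example the CalcBitToClear lines) are simply skipped, so the same function can be fed a whole logfile.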
[QUOTE=Bdot;340900]Thanks for your tests. I hope, you're not using the -pi- version for the 500 tests - this version has the [B]P[/B]erformance[B]I[/B]nfo DEBUG option enabled which considerably slows down overall processing, but allows for exact measurement of the kernel runtime. That is also the reason why the output looks so different to mfaktc. Best is to use this binary with CPU-sieving (SieveOnGPU=0) and a short test (-st). I use this output to see which of the kernels runs at which speed on this particular hardware.
Try the binary without -pi- in its name, and the output should look familiar. And yes, throughput on high-end cards greatly benefits from GPU sieving. It should not be too long until I finished my stuff to release 0.13. Then it would be good if every user sent the output of one of the runs to James to allow more accurate updates of [URL="http://www.mersenne.ca/mfaktc.php"]this page[/URL]. Maybe I put this as a requirement into the license :smile:[/QUOTE] Yes, I did run the 05 pi build, mfakto-0.13pre5-pi-win64 (not the 04 a shown in the report logfile; I copied the command from an earlier post, hehe).
-st: passed
-st2: passed. It wrote 201 MB of output into the logfile, hehe, but it passed all tests. It took many hours, around 10. I started it a few hours later than the first -st test I reported here, and it finished some time ago.
It would be great to have a timestamp at the beginning and at the end of the test, for both -st and -st2, plus the calculated total runtime, as it seems this software is really using the card to its full. Great job, Bdot. I hope I can improve CUDALucas during the summer to the same extent; however, it's going to take time :) I'm really impressed here. |
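Until the program prints its own timestamps, the start/end/total-runtime request above can be approximated from the outside with a small wrapper. This is only a sketch: `run_timed` and the script name `timed_run.py` are made up for illustration, and the wrapper knows nothing about mfakto itself; it just times whatever command it is given.

```python
import subprocess
import sys
import time
from datetime import datetime


def run_timed(cmd):
    """Run cmd, print start/end timestamps, return (exit code, elapsed seconds)."""
    print("start:", datetime.now().isoformat(timespec="seconds"))
    t0 = time.monotonic()  # monotonic clock: unaffected by wall-clock changes
    code = subprocess.run(cmd).returncode
    elapsed = time.monotonic() - t0
    print("end:  ", datetime.now().isoformat(timespec="seconds"))
    print(f"total runtime: {elapsed:.1f} s")
    return code, elapsed


# Usage (hypothetical script name):
#   python timed_run.py mfakto-0.13pre5-pi-win64.exe -st2
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_timed(sys.argv[1:])[0])
```

The wrapper passes the child's output through untouched, so redirecting it into a logfile works the same as before.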
[QUOTE=Bdot;340900]
Try the binary without -pi- in its name, and the output should look familiar. [/QUOTE] Ahh, that's why the ATI card doesn't clock up when I run the -pi- build, hehe. OK, I'll run the 500 tests without -pi- and then double-check them with the Titans. Thanks. |
HD7790 results
[url]https://www.box.com/s/pmxo4x26k5g2lcry46mk[/url]
st+st2+perftest :smile: |
[QUOTE=Cruelty;340982][url]https://www.box.com/s/pmxo4x26k5g2lcry46mk[/url]
st+st2+perftest :smile:[/QUOTE] Interesting. I was considering that instead of the 7770, how many GHZ/days do you get on it? |
[QUOTE=kracker;340983]Interesting. I was considering that instead of the 7770, how many GHZ/days do you get on it?[/QUOTE]
Actually, I did it out of curiosity. Let me know which tests you would like me to perform and I'll do them in the late evening (CET). |
[QUOTE=Cruelty;340982][URL]https://www.box.com/s/pmxo4x26k5g2lcry46mk[/URL]
st+st2+perftest :smile:[/QUOTE] Thanks a lot for these tests! Seems to be a pretty good card, this HD7790. I've added "Bonaire" to the list of known GPUs - not sure how I missed it when I added the latest models ... But this really has been the last change for the 0.13 release! [QUOTE=kracker;340983]Interesting. I was considering that instead of the 7770, how many GHZ/days do you get on it?[/QUOTE] At 1200MHz (reference default is 1000MHz), this card delivers about 148% of the throughput of your 7770@1100MHz (def 1000), or 98% of my 7850@1050MHz (def 860). This is within 5% of the theoretical #-of-cores x clockspeed comparison. |
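The #-of-cores x clockspeed comparison can be reproduced in a few lines of Python. Note the assumptions: the stream-processor counts (640, 896, 1024) are the public spec-sheet values for these cards and are not stated in this thread, and `theoretical_ratio` is just a helper name chosen for the sketch. The clocks and the measured 148% / 98% figures are taken from the post above.

```python
# Stream-processor counts: public spec-sheet values (assumption, not from this thread).
SHADERS = {"HD 7770": 640, "HD 7790": 896, "HD 7850": 1024}


def theoretical_ratio(card, mhz, ref_card, ref_mhz):
    """Ratio of (cores x clock) between a card and a reference card."""
    return (SHADERS[card] * mhz) / (SHADERS[ref_card] * ref_mhz)


# (reference card, its clock in MHz, measured throughput ratio from the post)
comparisons = [("HD 7770", 1100, 1.48), ("HD 7850", 1050, 0.98)]

if __name__ == "__main__":
    for ref, ref_mhz, measured in comparisons:
        theory = theoretical_ratio("HD 7790", 1200, ref, ref_mhz)
        deviation = measured / theory - 1
        print(f"HD 7790@1200 vs {ref}@{ref_mhz}: "
              f"theory {theory:.1%}, measured {measured:.0%}, "
              f"deviation {deviation:+.1%}")
```

Under these assumptions the theoretical ratios come out at roughly 153% and 100%, so the measured figures sit about 3% and 2% below theory, consistent with the "within 5%" statement.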