|
|
#12 |
|
May 2011
Orange Park, FL
2⁵·29 Posts |
5000. Two instances of mfaktc on GTX-580 with core-i7 970. No affinities.
|
|
|
|
|
|
#13 |
|
Dec 2011
1216 Posts |
~20,000 running 3 instances of mfaktc on a GTX 580. My CPU is an i5-3570K running at 4100MHz. GPU use is near 100%.
Last fiddled with by Jaxon on 2012-07-21 at 13:06 |
|
|
|
|
|
#14 |
|
Einyen
Dec 2003
Denmark
D7C₁₆ Posts |
GeForce GTX 460 running on a Q9450 (Yorkfield):
1 instance of mfaktc, 80M exponent, 71-72 bits: SievePrimes auto-adjusts to 5000. Avg rate 125M/s. GPU load 73-75%. CPU load 25% (1 full core).
2 instances of mfaktc, 80M exponent, 71-72 bits: SievePrimes auto-adjusts to ~17000-19000. Avg rate 2 x 85M/s. GPU load 95-99%. CPU load 50% (2 cores).
Last fiddled with by ATH on 2012-07-21 at 13:41 |
|
|
|
|
|
#15 |
|
Jun 2005
2018 Posts |
As with everyone else, depends on the number of instances of mfaktc.
I have a GTX560ti448 (basically a 570) with an OCd i5-750. 1 or 2 instances don't max out the GPU even at sieve primes = 5000. Going from 1 to 2 instances doubles my effective throughput. With 3 instances I can max the GPU so I only get a ~27% speedup; sieve primes are in the 10-12,000 range. Going to 4 instances gives me a ~10% speedup; sieve primes go up to ~3x,xxx (from memory, might not be 100% accurate).
For your entertainment, here's how number of candidates is influenced by sieve primes on a really slow GPU. Candidates is in millions, time is run time per class and rate is M candidates / sec. This is from a 49M exponent from 70 to 71. I only ran 1 class per sieve primes value so there's some noise in the timings, but the candidates should be correct. Looks like ~5-6% reduction in candidates every time you double sieve primes, at least in the 5-200K range that mfaktc uses.
Code:
Sieve Primes  candidates    time  rate
       5,000      643.83  79.133  8.14
       5,625      636.49  78.161  8.14
       6,328      628.62  77.283  8.13
       7,119      621.81  76.271  8.15
       7,500      618.14  75.288  8.21
       8,008      614.47  75.369  8.15
       8,437      611.32  74.453  8.21
       9,009      607.65  74.525  8.15
       9,491      604.5   73.627  8.21
      10,135      600.83  73.66   8.16
      10,677      597.69  72.794  8.21
      11,401      594.02  72.872  8.15
      12,011      590.87  71.963  8.21
      12,826      587.2   72.067  8.15
      13,512      584.58  71.202  8.21
      14,429      580.91  71.366  8.14
      15,201      578.29  70.436  8.21
      16,232      574.62  70.002  8.21
      17,101      572     69.666  8.21
      18,261      568.85  69.897  8.14
      19,238      566.23  70.685  8.01
      20,000      564.13  68.713  8.21
      20,543      562.56  68.4    8.22
      21,642      559.94  68.201  8.21
      22,500      558.37  68.01   8.21
      23,110      556.79  67.644  8.23
      24,347      554.17  67.5    8.21
      25,998      551.03  67.644  8.15
      27,339      548.41  66.802  8.21
      29,247      545.78  66.935  8.15
      30,813      543.16  66.158  8.21
      32,902      540.02  66.301  8.14
      34,664      537.4   65.458  8.21
      37,014      534.77  65.492  8.17
      38,997      532.15  64.818  8.21
      41,640      529.53  64.805  8.17
      43,871      526.91  64.181  8.21
      46,845      524.29  66.124  7.93
      49,354      522.19  63.611  8.21
      50,000      521.14  63.479  8.21
      52,700      519.05  63.399  8.19
      55,523      516.95  64.691  7.99
      56,250      516.42  62.906  8.21
      59,287      514.33  63.179  8.14
      62,463      512.23  62.4    8.21
      63,281      511.18  62.267  8.21
      66,697      509.08  62.493  8.15
      70,270      506.99  61.759  8.21
      71,191      506.46  61.692  8.21
      75,000      504.37  61.443  8.21
      75,034      504.37  61.935  8.14
      79,053      502.27  61.185  8.21
      80,089      501.74  61.122  8.21
      84,413      499.65  61.352  8.14
      88,934      498.07  60.674  8.21
      90,100      497.55  60.612  8.21
      94,964      495.45  60.851  8.14
     100,000      493.36  60.711  8.13
     100,050      493.36  60.102  8.21
     101,362      492.83  60.043  8.21
     106,834      490.73  60.273  8.14
     112,500      489.16  59.756  8.19
     112,556      488.64  59.528  8.21
     114,032      488.64  59.53   8.21
     120,188      486.54  59.694  8.15
     126,562      484.44  59.048  8.20
     126,625      484.44  59.021  8.21
     128,286      483.92  58.958  8.21
     135,211      482.34  59.352  8.13
     142,382      480.25  58.512  8.21
     142,453      480.25  58.509  8.21
     144,321      479.72  60.202  7.97
     152,112      478.15  60.392  7.92
     160,179      476.05  57.998  8.21
     160,259      476.05  57.998  8.21
     162,361      475.53  57.939  8.21
     171,126      473.96  58.494  8.10
     180,201      472.38  57.555  8.21
     180,291      472.38  59.309  7.96
     182,656      471.86  57.493  8.21
     192,516      469.76  57.5    8.17
     200,000      468.71  57.606  8.14
Last fiddled with by kjaget on 2012-07-23 at 15:13 |
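The "~5-6% per doubling" estimate can be checked against a few rows of the table. The sketch below is only an illustration: the function names are made up here, and the rows are the 5,000, 20,000, 50,000, 100,000 and 200,000 entries copied from the table. Since the gaps between rows are not all exact doublings, the drop is normalised by log2 of the sieve primes ratio.

```python
import math

# (sieve primes, candidates in millions), copied from the table above
ROWS = [
    (5_000, 643.83),
    (20_000, 564.13),
    (50_000, 521.14),
    (100_000, 493.36),
    (200_000, 468.71),
]


def reduction_per_doubling(n1, c1, n2, c2):
    """Percent drop in candidates per doubling of sieve primes between two rows."""
    doublings = math.log2(n2 / n1)           # e.g. 5,000 -> 20,000 is 2 doublings
    factor = (c2 / c1) ** (1.0 / doublings)  # surviving fraction per doubling
    return 100.0 * (1.0 - factor)


def all_reductions(rows=ROWS):
    """Per-doubling reduction for each pair of consecutive rows."""
    return [
        reduction_per_doubling(n1, c1, n2, c2)
        for (n1, c1), (n2, c2) in zip(rows, rows[1:])
    ]


if __name__ == "__main__":
    for (n1, _), (n2, _), r in zip(ROWS, ROWS[1:], all_reductions()):
        print(f"{n1:>7,} -> {n2:>7,}: {r:.1f}% fewer candidates per doubling")
```

On these rows the reduction works out to roughly 6% per doubling at the low end and about 5% near 200,000, so the benefit of each extra doubling shrinks slowly as sieve primes grows.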
|
|
|
|
|
#16 |
|
Nov 2010
Germany
3×199 Posts |
One HD5770 with 3 instances at a fixed SP of 55,000 each, consuming a little over 2 cores of a Phenom X4 955 (3.2GHz).
Another HD5770 with 3 instances at auto-adjust, SP ~100,000, with 3 full cores of a Xeon hex-core. As my GPUs deliver only about 150-160M/s I can trade the 3rd core's 2.6 CPU-GHz-days against 10-12 GPU-TF-GHz-days. With just 2 instances SP would settle at ~30,000.
An NV Quadro2000, 2x ~95,000 auto-adjust, 2 cores of a Xeon hex-core.
BTW, I'm working on porting rcv's code to OpenCL.
Last fiddled with by Bdot on 2012-07-23 at 20:03 |
|
|
|
|
|
#17 | |
|
Oct 2011
679₁₀ Posts |
Quote:
|
|
|
|
|
|
|
#18 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
17×487 Posts |
Thanks for all the data -- I was not expecting quite that much variety!
Just an update as to where I stand. I've built a test program that emulates the work required by a mfaktc GPU sieve. 95% of this code would be usable in a real mfaktc implementation.
On my 460, the GPU sieve reaches the point of diminishing returns at roughly 71265 primes (first prime is 13, last prime is 899851). The speed of the GPU sieve is such that it achieves as much throughput as a CPU sieve sieving 14177 primes (first prime is 13, last prime is 153877). I've got a few tweaks in mind that might improve things a tad.
Of course, this is all theory. A real implementation might run into roadblocks that I had not considered.
PM me if you'd like a copy of my test code. It is a blending of erato.cu from the "second dumbest cuda" thread and rcv's ideas as well as several ideas of my own. |
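To relate a "number of sieve primes" figure to its "first prime / last prime" range, a plain CPU-side sieve of Eratosthenes is enough. This is a minimal sketch with hypothetical helper names; it is not the test program described in the post and says nothing about how the GPU sieve itself works. It only counts the primes in a closed interval.

```python
import math


def primes_up_to(limit):
    """Plain sieve of Eratosthenes; returns all primes <= limit."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # Clear every multiple of p starting at p*p.
            count = len(range(p * p, limit + 1, p))
            is_prime[p * p :: p] = bytearray(count)
    return [i for i, flag in enumerate(is_prime) if flag]


def sieve_prime_count(first, last):
    """Number of primes p with first <= p <= last."""
    return sum(1 for p in primes_up_to(last) if p >= first)


if __name__ == "__main__":
    # Ranges quoted in the post: first prime 13, last prime 899851 or 153877.
    for last in (899851, 153877):
        print(f"primes in [13, {last}]: {sieve_prime_count(13, last)}")
```

Running it prints the prime counts for the two ranges quoted above, which can be compared with the 71265 and 14177 figures.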
|
|
|
|
|
#19 |
|
"Mr. Meeseeks"
Jan 2012
California, USA
37×59 Posts |
I don't run mfaktc since I don't have an Nvidia card... but I have mfakto.
Damn, wish the attachment file size limit was just a bit bigger...
|
|
|
|
|
|
#20 | |
|
Aug 2002
2·3²·13·37 Posts |
Quote:
|
|
|
|
|
|
|
#21 |
|
"Mr. Meeseeks"
Jan 2012
California, USA
887₁₆ Posts |
|
|
|
|
|
|
#22 |
|
Feb 2012
3⁴·5 Posts |
Also try GIF; it beats JPEG when there are a lot of continuous tones in the picture.
|
|
|
|