mersenneforum.org  

2012-07-21, 12:53   #12
Chuck
May 2011
Orange Park, FL
2⁵·29 Posts

Quote:
Originally Posted by Prime95
I'm working on GPU sieving and I'd like to know what sieve_primes setting most mfaktc users are using.

5000. Two instances of mfaktc on a GTX 580 with a Core i7-970. No affinities set.
Attachment: mfaktc.jpg (238.1 KB)
2012-07-21, 13:06   #13
Jaxon
Dec 2011
12₁₆ Posts

~20,000, running 3 instances of mfaktc on a GTX 580. My CPU is an i5-3570K running at 4100 MHz. GPU use is near 100%.
Attachment: Untitled.jpg (165.6 KB)

Last fiddled with by Jaxon on 2012-07-21 at 13:06
2012-07-21, 13:39   #14
ATH
Einyen
Dec 2003
Denmark
D7C₁₆ Posts

GeForce GTX 460 running on a Q9450 (Yorkfield):

1 instance of mfaktc, 80M exponent, 71-72 bits:
SievePrimes auto-adjusts to 5000. Avg rate 125M/s. GPU load 73-75%. CPU load 25% (1 full core).

2 instances of mfaktc, 80M exponent, 71-72 bits:
SievePrimes auto-adjusts to ~17000-19000. Avg rate 2 x 85M/s. GPU load 95-99%. CPU load 50% (2 cores).
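
For anyone who wants the scaling as a single number, here is a small Python calculation using only the two rates reported above. The variable names are made up for illustration, and it assumes the two instances really do sustain 85M/s each at the same time.

```python
# Rates as reported above, in millions of candidates per second.
single_rate = 125.0      # 1 instance
dual_rate = 2 * 85.0     # 2 instances at 85M/s each (assumed simultaneous)

# Combined throughput of 2 instances relative to 1 instance.
speedup = dual_rate / single_rate

print(f"combined rate with 2 instances: {dual_rate:.0f}M/s")
print(f"speedup over 1 instance: {speedup:.2f}x")
```

The second instance costs a second CPU core, so whether the extra GPU throughput is worth it depends on what that core would otherwise be doing.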

Last fiddled with by ATH on 2012-07-21 at 13:41
2012-07-23, 15:12   #15
kjaget
Jun 2005
201₈ Posts

As with everyone else, it depends on the number of instances of mfaktc.

I have a GTX 560 Ti 448 (basically a 570) with an overclocked i5-750.

1 or 2 instances don't max out the GPU even at sieve primes = 5000. Going from 1 to 2 instances doubles my effective throughput.

With 3 instances I can max out the GPU, so I only get a ~27% speedup; sieve primes are in the 10,000-12,000 range.

Going to 4 instances gives me a ~10% speedup; sieve primes go up to ~3x,xxx (from memory, might not be 100% accurate).

For your entertainment, here's how the number of candidates is influenced by sieve primes on a really slow GPU. Candidates are in millions, time is the run time per class in seconds, and rate is M candidates / sec. This is from a 49M exponent, from 70 to 71 bits. I only ran 1 class per sieve primes value, so there's some noise in the timings, but the candidate counts should be correct. It looks like a ~5-6% reduction in candidates every time you double sieve primes, at least in the 5-200K range that mfaktc uses.

Code:
Sieve Primes	candidates (M)	time (s)	rate (M/s)
5,000	643.83	79.133	8.14
5,625	636.49	78.161	8.14
6,328	628.62	77.283	8.13
7,119	621.81	76.271	8.15
7,500	618.14	75.288	8.21
8,008	614.47	75.369	8.15
8,437	611.32	74.453	8.21
9,009	607.65	74.525	8.15
9,491	604.5	73.627	8.21
10,135	600.83	73.66	8.16
10,677	597.69	72.794	8.21
11,401	594.02	72.872	8.15
12,011	590.87	71.963	8.21
12,826	587.2	72.067	8.15
13,512	584.58	71.202	8.21
14,429	580.91	71.366	8.14
15,201	578.29	70.436	8.21
16,232	574.62	70.002	8.21
17,101	572	69.666	8.21
18,261	568.85	69.897	8.14
19,238	566.23	70.685	8.01
20,000	564.13	68.713	8.21
20,543	562.56	68.4	8.22
21,642	559.94	68.201	8.21
22,500	558.37	68.01	8.21
23,110	556.79	67.644	8.23
24,347	554.17	67.5	8.21
25,998	551.03	67.644	8.15
27,339	548.41	66.802	8.21
29,247	545.78	66.935	8.15
30,813	543.16	66.158	8.21
32,902	540.02	66.301	8.14
34,664	537.4	65.458	8.21
37,014	534.77	65.492	8.17
38,997	532.15	64.818	8.21
41,640	529.53	64.805	8.17
43,871	526.91	64.181	8.21
46,845	524.29	66.124	7.93
49,354	522.19	63.611	8.21
50,000	521.14	63.479	8.21
52,700	519.05	63.399	8.19
55,523	516.95	64.691	7.99
56,250	516.42	62.906	8.21
59,287	514.33	63.179	8.14
62,463	512.23	62.4	8.21
63,281	511.18	62.267	8.21
66,697	509.08	62.493	8.15
70,270	506.99	61.759	8.21
71,191	506.46	61.692	8.21
75,000	504.37	61.443	8.21
75,034	504.37	61.935	8.14
79,053	502.27	61.185	8.21
80,089	501.74	61.122	8.21
84,413	499.65	61.352	8.14
88,934	498.07	60.674	8.21
90,100	497.55	60.612	8.21
94,964	495.45	60.851	8.14
100,000	493.36	60.711	8.13
100,050	493.36	60.102	8.21
101,362	492.83	60.043	8.21
106,834	490.73	60.273	8.14
112,500	489.16	59.756	8.19
112,556	488.64	59.528	8.21
114,032	488.64	59.53	8.21
120,188	486.54	59.694	8.15
126,562	484.44	59.048	8.20
126,625	484.44	59.021	8.21
128,286	483.92	58.958	8.21
135,211	482.34	59.352	8.13
142,382	480.25	58.512	8.21
142,453	480.25	58.509	8.21
144,321	479.72	60.202	7.97
152,112	478.15	60.392	7.92
160,179	476.05	57.998	8.21
160,259	476.05	57.998	8.21
162,361	475.53	57.939	8.21
171,126	473.96	58.494	8.10
180,201	472.38	57.555	8.21
180,291	472.38	59.309	7.96
182,656	471.86	57.493	8.21
192,516	469.76	57.5	8.17
200,000	468.71	57.606	8.14
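
The "reduction per doubling" estimate can be checked against the table with a few lines of Python. This is only a sketch: the rows are a hand-picked subset of the table above, the function name is made up, and it assumes the candidate count shrinks by a constant factor per doubling between any two sampled rows.

```python
import math

# (sieve primes, candidates in millions), copied from the table above.
rows = [
    (5_000, 643.83),
    (10_135, 600.83),
    (20_000, 564.13),
    (50_000, 521.14),
    (100_000, 493.36),
    (200_000, 468.71),
]

def reduction_per_doubling(sp_a, cand_a, sp_b, cand_b):
    """Fractional drop in candidates per doubling of sieve primes,
    assuming a constant factor per doubling between the two rows."""
    doublings = math.log2(sp_b / sp_a)
    factor = (cand_b / cand_a) ** (1.0 / doublings)
    return 1.0 - factor

# Reduction between each pair of neighbouring sample rows.
stepwise = [
    reduction_per_doubling(*a, *b) for a, b in zip(rows, rows[1:])
]
# Reduction averaged over the whole 5,000 to 200,000 range.
overall = reduction_per_doubling(*rows[0], *rows[-1])

for (a, b), r in zip(zip(rows, rows[1:]), stepwise):
    print(f"{a[0]:>7,} -> {b[0]:>7,}: {r:.1%} per doubling")
print(f"overall: {overall:.1%} per doubling")
```

Over the whole range this works out to a little under 6% per doubling, and the step-by-step figures shrink as sieve primes grow, so the benefit of each further doubling gets smaller.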

Last fiddled with by kjaget on 2012-07-23 at 15:13
2012-07-23, 20:00   #16
Bdot
Nov 2010
Germany
3×199 Posts

One HD 5770 with 3 instances at a fixed SP of 55,000 each, consuming a little over 2 cores of a Phenom X4 955 (3.2 GHz).

Another HD 5770 with 3 instances on auto-adjust, SP ~100,000, using 3 full cores of a Xeon hex-core. As my GPUs deliver only about 150-160M/s, I can trade the 3rd core's 2.6 CPU-GHz-days against 10-12 GPU-TF-GHz-days. With just 2 instances, SP would settle at ~30,000.

An NVIDIA Quadro 2000 with 2 instances on auto-adjust, SP ~95,000 each, using 2 cores of a Xeon hex-core.

BTW, I'm on rcv's code to port it to OpenCL.

Last fiddled with by Bdot on 2012-07-23 at 20:03
2012-07-25, 01:24   #17
bcp19
Oct 2011
679₁₀ Posts

Quote:
Originally Posted by Prime95
I'm working on GPU sieving and I'd like to know what sieve_primes setting most mfaktc users are using.

I think mfaktc can auto-adjust this value. If so, please report the sieve_primes value mfaktc thinks is optimal.

My goal is to create a GPU sieve that is as fast as the CPU sieve for most users. To do this, the GPU sieve must be fast enough so that it can sieve deeper than the CPU -- where the time saved by reduced TF candidates equals the time spent doing GPU sieving.

This may not be possible, but it is good to have goals!

I have 6 machines TFing using 7 cards, and each is set up differently:

2500K/480: 2 instances, SP=5000 (will up it to 3 after my 100M LL finishes and probably have SP=20,000-30,000)
2400/480: 3 instances, SP=11000 (when it had the 560 Ti, SP=33000)
Q8200/560 Ti: 3 instances, SP=14000
Q6600/560: 3 instances, SP=9500
Q8200/550 Ti: 3 instances, SP=40000
Phenom II 1055T/460: 3 instances, SP=35000
1055T/6770: 2 instances, SP=5000
2012-07-25, 02:31   #18
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
17×487 Posts

Thanks for all the data -- I was not expecting quite that much variety!

Just an update as to where I stand. I've built a test program that emulates the work required by an mfaktc GPU sieve. 95% of this code would be usable in a real mfaktc implementation.

On my 460, the GPU sieve reaches the point of diminishing returns at roughly 71265 primes (first prime is 13, last prime is 899851). The speed of the GPU sieve is such that it achieves as much throughput as a CPU sieve sieving 14177 primes (first prime is 13, last prime is 153877).
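
The prime counts quoted above can be cross-checked with a plain sieve of Eratosthenes. The Python below is a minimal sketch for that purpose only; it is not the test program or mfaktc code, and the function names are made up for illustration.

```python
import math

def primes_up_to(limit):
    """Return all primes <= limit (limit >= 1) using a plain
    sieve of Eratosthenes."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0:2] = b"\x00\x00"          # 0 and 1 are not prime
    for n in range(2, math.isqrt(limit) + 1):
        if is_prime[n]:
            # Clear every multiple of n starting at n*n.
            count = len(range(n * n, limit + 1, n))
            is_prime[n * n :: n] = bytearray(count)
    return [n for n in range(limit + 1) if is_prime[n]]

def count_sieve_primes(first, last):
    """Count the primes p with first <= p <= last."""
    return sum(1 for p in primes_up_to(last) if p >= first)

# Compare these against the 71265 and 14177 quoted above.
print(count_sieve_primes(13, 899851))
print(count_sieve_primes(13, 153877))
```

Starting at 13 simply skips 2, 3, 5, 7 and 11, so each count is five less than the total number of primes up to the last prime.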

I've got a few tweaks in mind that might improve things a tad.

Of course, this is all theory. A real implementation might run into roadblocks that I had not considered.


Quote:
Originally Posted by Bdot
BTW, I'm on rcv's code to port it to OpenCL.

PM me if you'd like a copy of my test code. It is a blend of erato.cu from the "second dumbest cuda" thread, rcv's ideas, and several ideas of my own.
2012-07-25, 16:08   #19
kracker
"Mr. Meeseeks"
Jan 2012
California, USA
37×59 Posts

I don't run mfaktc since I don't have an Nvidia card... but I have mfakto.
Damn, wish the attachment limit file size was just a bit bigger...
Attachment: mfak.jpg (231.2 KB)
2012-07-25, 18:54   #20
Xyzzy
Aug 2002
2·3²·13·37 Posts

Quote:
Damn, wish the attachment limit file size was just a bit bigger...

Just use a lower compression setting. The attached file at 123kb is very close to your original.
Attachment: mfak.jpg (122.7 KB)
2012-07-25, 19:07   #21
kracker
"Mr. Meeseeks"
Jan 2012
California, USA
887₁₆ Posts

Quote:
Originally Posted by Xyzzy
Just use a lower compression setting. The attached file at 123kb is very close to your original.

Ah I see. Thanks!
2012-07-25, 19:28   #22
TObject
Feb 2012
3⁴·5 Posts

Also try GIF; it beats JPEG when the picture has large areas of flat, uniform color.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
