mersenneforum.org  

Old 2013-05-27, 16:29   #804
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

3²×241 Posts

Quote:
Originally Posted by Bdot View Post
Did you all send a benchmark run to James?


I wonder if there is someone with a 7970 here...
Old 2013-05-28, 02:36   #805
lfm
 
Jul 2006
Calgary

110101001₂ Posts

Quote:
Originally Posted by Bdot View Post
Linux bits are there as well now.
Thanks, running the Linux v0.13 on a 5770 and getting about 115 GHz-days/day.
Old 2013-05-29, 17:00   #806
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

100001111001₂ Posts

I just noticed that there is a huge penalty when going to GPU sieving on my APU...
[Attached image: sieve.jpg]
Old 2013-05-29, 19:10   #807
LaurV
Romulan Interpreter
 
"name field"
Jun 2011
Thailand

23437₈ Posts

Quote:
Originally Posted by kracker View Post
I just noticed that there is a huge penalty when going to GPU sieving on my APU...
I believe you need to play with the SieveSize value for the sieve (for mfaktc at least, it has to be increased a lot for GPU sieving; otherwise the sieving is very fast and you end up with a lot of candidates and do a lot of useless exponentiations). But confirm this first with someone who runs mfakto; my experience is limited to mfaktc. Even if the values are right, you still have the CPU free and can compensate with GHz-days of P-1.
Old 2013-05-29, 20:31   #808
VictordeHolland
 
"Victor de Hollander"
Aug 2011
the Netherlands

10010011011₂ Posts

-st test on my 7950
Quote:
Selftest statistics
number of tests 3092
successful tests 3092

selftest PASSED!
And some stats for the stats junkies ;)

@800 MHz (AMD reference clock)
296 GHz-days/day (70-71 bit)
283 GHz-days/day (73-74 bit)

@900 MHz (factory clock)
335 GHz-days/day (70-71 bit)
318 GHz-days/day (73-74 bit)
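
As a rough check on how these numbers scale with clock speed, the four figures can be normalized per MHz. This is only an illustrative sketch: the dictionary layout and the function name `per_mhz` are assumptions made for the example, and only the four measurements come from the post.

```python
# Throughput figures quoted above for the 7950 (GHz-days/day),
# keyed by core clock in MHz and then by bit range.
results = {
    800: {"70-71": 296, "73-74": 283},
    900: {"70-71": 335, "73-74": 318},
}


def per_mhz(results):
    """Return GHz-days/day per MHz for every clock and bit range."""
    return {
        clock: {bits: rate / clock for bits, rate in ranges.items()}
        for clock, ranges in results.items()
    }


normalized = per_mhz(results)
for clock, ranges in sorted(normalized.items()):
    for bits, value in ranges.items():
        print(f"{clock} MHz, {bits} bit: {value:.3f} GHz-days/day per MHz")
```

The per-MHz values come out nearly identical at both clocks for each bit range, which suggests that throughput on this card scales close to linearly with core clock, at least between these two settings.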
Old 2013-06-03, 19:08   #809
Bdot
 
Nov 2010
Germany

3×199 Posts

Quote:
Originally Posted by lfm View Post
Thanks, running the Linux v0.13 on a 5770 and getting about 115 GHz-days/day.
Quote:
Originally Posted by kracker View Post
I just noticed that there is a huge penalty when going to GPU sieving on my APU...
Quote:
Originally Posted by LaurV View Post
I believe you need to play with the SieveSize value for the sieve (for mfaktc at least, it has to be increased a lot for GPU sieving; otherwise the sieving is very fast and you end up with a lot of candidates and do a lot of useless exponentiations). But confirm this first with someone who runs mfakto; my experience is limited to mfaktc. Even if the values are right, you still have the CPU free and can compensate with GHz-days of P-1.
Quote:
Originally Posted by VictordeHolland View Post
-st test on my 7950
And some stats for the stats junkies ;)

@800 MHz (AMD reference clock)
296 GHz-days/day (70-71 bit)
283 GHz-days/day (73-74 bit)

@900 MHz (factory clock)
335 GHz-days/day (70-71 bit)
318 GHz-days/day (73-74 bit)
Thank you all for your tests, and for providing the timings to James.

First of all, it's true that all VLIW5 and VLIW4 GPUs pay a big penalty for GPU sieving, much bigger than GCN. Therefore, their sweet spot is at a reduced GPUSievePrimes (compared to the default), whereas GCN cards reward an increase. However, I wonder if you really need to go as low as 50k for the APU in order to get the best out of it. I found ~70k for VLIW5 and ~110k for GCN to be optimal. I have just two cards, and it may look very different for you.

Second, I noticed considerable sensitivity to GPUSieveProcessSize; both my cards run best with 24.

Third, the AMD drivers. Sadly, the latest Windows drivers (13.4) make mfakto consume almost one CPU core, even when GPU sieving. When other programs (like Prime95) use a lot of CPU, mfakto's high CPU load goes away, but at the cost of some 10-20% throughput.


The 5770's 115 GHz-days/day are pretty close to what should be possible, as are the 7950's results. Which GPUSieve* values did you use?

The APU (6550D, right?) speed, however, is too far from the expected 42.5 GHz-days/day. Again, which GPUSieve* values did you use?

Generally, most low-end to mid-range GPUs will have much better throughput when CPU sieving, if you can spare enough CPU power to sustain SievePrimes of >20k (VLIW5) or >50k (GCN). I'm working on improving the sieve for the vector platforms, but v0.13 is not yet well optimized in this respect.
Old 2013-06-03, 23:00   #810
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

4171₈ Posts

Quote:
Originally Posted by Bdot View Post
The APU (6550D, right?) speed, however, is too far from the expected 42.5 GHz-days/day. Again, which GPUSieve* values did you use?

Generally, most low-end to mid-range GPUs will have much better throughput when CPU sieving, if you can spare enough CPU power to sustain SievePrimes of >20k (VLIW5) or >50k (GCN). I'm working on improving the sieve for the vector platforms, but v0.13 is not yet well optimized in this respect.
Tweaking them, I can only get up to ~37 GHz-days/day at most... I switched my display to the integrated GPU and I'm getting weird ghosting and artifacts on the display... Hmm, maybe...
Old 2013-06-04, 09:20   #811
Axelsson
 
Jul 2012
Sweden

2×3×7 Posts

I found a small bug...

Code:
got assignment: exp=33732341 bit_min=70 bit_max=71 (7.09 GHz-days)
Starting trial factoring M33732341 from 2^70 to 2^71 (7.09GHz-days)
Using GPU kernel "cl_barrett15_73_gs"
No checkpoint file "M33732341.ckp" found.
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Jun 04 08:32 | 1263  27.6% |  7.446   1h26m |     85.68    21813    0.00%
M33732341 has a factor: 1516555032424995693727

found 1 factor for M33732341 from 2^70 to 2^71 (partially tested) [mfakto 0.13-Win cl_barrett15_73_gs_4]
tf(): total time spent: 15m 19.113s (666.39 GHz-days / day)
It feels like a leeetle bit high speed, I'm running at 185-200 GHz-days / day right now.

In other news, I have had two reboots after the video driver crashed. After the reboot the machine starts up without any video output (it is still possible to log in via remote desktop). The next reboot brings the screen back. I'm writing it off as a thermal problem, but curiously both crashes happened when switching assignments, and there were no save files left.
It's been a couple of days since the last crash and everything is running stable right now. I'll report back if I find out more.

/Göran
Old 2013-06-04, 11:44   #812
Bdot
 
Nov 2010
Germany

3·199 Posts

Quote:
Originally Posted by Axelsson View Post
I found a small bug...
666.39 GHz-days / day
It feels like a leeetle bit high speed, I'm running at 185-200 GHz-days / day right now.
Yes, please keep trying, I'm sure there are more bugs in the code. This one, however, is intentional.

If PrimeNet accepted the factor as "F" (meaning found by TF), then you probably got credit of more than 8 GHz-days within 15 minutes. That is more like 800 GHz-days/day, but I cannot easily guess how much credit you'll get for a factor, so I calculate it as if you had completed the whole bit level within 15 min (or however long it took you to find the factor). In my experience that is about the lower limit of what you get as credit from PrimeNet.
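
The calculation described here can be sketched in a few lines. This is not mfakto's actual code, just the arithmetic as I understand it from the description; the function name `projected_rate` is made up for the example, and the inputs are the 7.09 GHz-days and 15m 19.113s from the log in the quoted post.

```python
SECONDS_PER_DAY = 86400


def projected_rate(assignment_ghz_days, elapsed_seconds):
    """GHz-days/day if the whole assignment is credited for the elapsed time."""
    return assignment_ghz_days * SECONDS_PER_DAY / elapsed_seconds


# Values taken from the log: full bit level worth 7.09 GHz-days,
# factor found after 15m 19.113s.
elapsed = 15 * 60 + 19.113
rate = projected_rate(7.09, elapsed)
print(f"{rate:.2f} GHz-days/day")
```

The result lands within a fraction of a GHz-day/day of the 666.39 shown in the log. The small gap is presumably because the log prints the assignment size rounded to two decimals.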

Quote:
Originally Posted by Axelsson View Post
In other news, I have had two reboots after the video driver crashed. After the reboot the machine starts up without any video output (it is still possible to log in via remote desktop). The next reboot brings the screen back. I'm writing it off as a thermal problem, but curiously both crashes happened when switching assignments, and there were no save files left.
It's been a couple of days since the last crash and everything is running stable right now. I'll report back if I find out more.

/Göran
This sounds a lot like a thermal or driver issue (which driver version are you on?). mfakto does not do much with the GPU when switching assignments. No resources are (de-)allocated; just a small initialization kernel is needed before each new exponent. And of course, the results and worktodo files are updated and the .ckp files deleted, so your disk has to spin up. During that millisecond your GPU may decide to clock down, and it has to ramp up again ... if the driver does not handle that up-and-down well ...

In that case you could try running two instances in parallel: the other one would keep the GPU busy while one saves its results, and otherwise they would split the GPU power evenly. (Evenly in terms of the number of kernel invocations, not necessarily performance.)

Edit: your GPUSievePrimes seems very low ... why?

Last fiddled with by Bdot on 2013-06-04 at 12:41
Old 2013-06-04, 11:54   #813
Bdot
 
Nov 2010
Germany

3·199 Posts

Quote:
Originally Posted by kracker View Post
Tweaking them, I can only get up to ~37 GHz-days/day at most... I switched my display to the integrated GPU and I'm getting weird ghosting and artifacts on the display... Hmm, maybe...
I recently ran mfakto on a "Loveland" (the GPU of the small E-350 APUs). Though it is VLIW5, it seemed to have trouble allocating enough registers. I had to switch to VectorSize=2 for some kernels. I wanted to dismiss this observation as something specific to "Loveland" and thus not worth spending much time on. But maybe it is a more general APU issue?

Please check whether VectorSize=2 or VectorSize=3 helps when GPU sieving. This would be unfortunate, as it leaves ~25% of the vector units unused, but it may still be better than having to spill registers to global memory ...
Old 2013-06-15, 06:13   #814
Jayder
 
Dec 2012

427₈ Posts

Will the "best fit" GPUSievePrimes value remain constant? Or will it change as you change bit level, exponent, or possibly kernel? For example, if a GPUSievePrimes value of 52000 works best on a 332M exponent going from 2^69 to 2^70, will 52000 also be the best for a 65M exponent and 2^73 to 2^74? What about an 8M exponent and 2^60 to 2^61? Etc.

Are there any good strategies for finding the best value? Other than intelligent trial and error.

I searched for an answer in the mfaktc thread, but it is kind of a massive thread. I also considered experimenting and finding out for myself, but it would take a very long time to find out on my slow GPU. Hopefully I am not missing an answer that is staring me in the face.
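
To make "intelligent trial and error" a bit more concrete, here is one way it could be organized: a coarse-to-fine scan that narrows the range around the best value in each round. This is only a sketch under assumptions. `measure` is a hypothetical callable that would run the program with a given GPUSievePrimes value and return the observed GHz-days/day, the throughput curve is assumed to have a single peak, and `fake_measure` is a made-up stand-in, not real data.

```python
def best_setting(candidates, measure):
    """Return (value, throughput) for the candidate with the highest throughput."""
    scores = {value: measure(value) for value in candidates}
    best = max(scores, key=scores.get)
    return best, scores[best]


def refine(measure, low, high, coarse_steps=5, rounds=3):
    """Coarse-to-fine scan: shrink the range around the best value each round."""
    best, score = None, None
    for _ in range(rounds):
        step = max((high - low) // (coarse_steps - 1), 1)
        candidates = list(range(low, high + 1, step))
        best, score = best_setting(candidates, measure)
        # Next round searches one step to either side of the current best.
        low, high = max(best - step, 1), best + step
    return best, score


def fake_measure(value):
    """Synthetic single-peaked curve with its maximum at 70000 (made up)."""
    return 100.0 - abs(value - 70000) / 1000.0


best, score = refine(fake_measure, 20000, 200000)
print(best, score)
```

Real measurements are noisy, so each call to `measure` would need to run long enough for the reported rate to settle. If the best value does turn out to depend on exponent, bit level, or kernel, the scan would have to be repeated for each combination.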
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
