mersenneforum.org mfakto: an OpenCL program for Mersenne prefactoring

2013-05-27, 16:29   #804
kracker

"Mr. Meeseeks"
Jan 2012
California, USA

3²×241 Posts

Quote:
 Originally Posted by Bdot Did you all send a benchmark run to James?

I wonder if there is someone with a 7970 here...

2013-05-28, 02:36   #805
lfm

Jul 2006
Calgary

110101001₂ Posts

Quote:
 Originally Posted by Bdot Linux bits are there as well now.
Thanks, running the Linux v0.13 on a 5770 and getting about 115 GHz-days/day.

2013-05-29, 17:00   #806
kracker

"Mr. Meeseeks"
Jan 2012
California, USA

100001111001₂ Posts

I just noticed this, but there is a huge penalty going to gpu sieve on my APU...

2013-05-29, 19:10   #807
LaurV
Romulan Interpreter

"name field"
Jun 2011
Thailand

23437₈ Posts

Quote:
 Originally Posted by kracker I just noticed this, but there is a huge penalty going to gpu sieve on my APU...
I believe you need to play with the SieveSize value for the sieve (for mfaktc at least, it has to be increased a lot for GPU sieving; otherwise the sieving is very fast, you end up with a lot of candidates, and you do a lot of useless exponentiations). But confirm this first with someone who runs mfakto; my experience is limited to mfaktc. Even if the values are right, you still have the CPU free and can make up the difference with GHz-days of P-1.
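
To illustrate the trade-off between sieving depth and the number of candidates that still need an exponentiation, here is an idealised Python sketch; it is not mfakto's sieve code. It assumes the sieve setting counts primes, and that 3, 5, 7 and 11 are already handled by the class selection so sieving starts at 13 (both are assumptions). Each sieve prime r removes one residue class of k modulo r from the candidates q = 2kp + 1, so the surviving fraction is the product of (1 - 1/r).

```python
from math import prod


def primes_up_to(limit: int) -> list[int]:
    """Plain sieve of Eratosthenes: all primes <= limit."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if flags[i]:
            # Clear every multiple of i starting at i*i.
            flags[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i, is_prime in enumerate(flags) if is_prime]


def survivor_fraction(sieve_primes: int, first_prime: int = 13) -> float:
    """Idealised fraction of candidates q = 2kp + 1 left after sieving with the
    first `sieve_primes` primes >= first_prime. Each sieve prime r (assumed
    different from p) rules out one residue class of k mod r, i.e. 1/r of k."""
    limit = 1_000
    while True:
        sieve = [r for r in primes_up_to(limit) if r >= first_prime]
        if len(sieve) >= sieve_primes:
            break
        limit *= 2  # not enough primes yet: widen the search range
    return prod(1 - 1 / r for r in sieve[:sieve_primes])


if __name__ == "__main__":
    for n in (5_000, 20_000, 50_000):
        print(f"{n:>6} sieve primes -> {survivor_fraction(n):.4f} of candidates survive")
```

The surviving fraction falls only slowly (roughly with the logarithm of the largest sieve prime), which is why the best setting is a balance between sieve cost and exponentiation cost rather than "as high as possible".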

2013-05-29, 20:31   #808
VictordeHolland

"Victor de Hollander"
Aug 2011
the Netherlands

10010011011₂ Posts

Self-test (-st) on my 7950:
Quote:
 Selftest statistics
  number of tests           3092
  successful tests          3092

 selftest PASSED!
And some stats for the stats junkies ;)

@800 MHz (AMD reference clock)
296 GHz-days/day (70-71 bit)
283 GHz-days/day (73-74 bit)

@900 MHz (factory clock)
335 GHz-days/day (70-71 bit)
318 GHz-days/day (73-74 bit)
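
A quick check on the 7950 numbers above: comparing the throughput gain with the clock gain shows how closely mfakto's speed tracks the core clock on this card. This is only arithmetic on the four figures reported in this post; the function name is made up for illustration.

```python
def clock_scaling(results: dict[str, dict[int, float]],
                  base: int, boosted: int) -> dict[str, tuple[float, float]]:
    """For each bit range, return (throughput ratio, clock ratio)."""
    clock_ratio = boosted / base
    return {label: (r[boosted] / r[base], clock_ratio) for label, r in results.items()}


# GHz-days/day reported above for the HD 7950, keyed by core clock in MHz
hd7950 = {
    "70-71 bit": {800: 296, 900: 335},
    "73-74 bit": {800: 283, 900: 318},
}

if __name__ == "__main__":
    for label, (speedup, clock) in clock_scaling(hd7950, 800, 900).items():
        print(f"{label}: throughput x{speedup:.3f}, clock x{clock:.3f}")
```

Both bit ranges gain about 12-13% for a 12.5% clock increase, i.e. throughput scales almost linearly with the core clock here.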

2013-06-03, 19:08   #809
Bdot

Nov 2010
Germany

3×199 Posts

Quote:
 Originally Posted by lfm Thanks, running the Linux v0.13 on a 5770 and getting about 115 GHz-days/day.
Quote:
 Originally Posted by kracker I just noticed this, but there is a huge penalty going to gpu sieve on my APU...
Quote:
 Originally Posted by LaurV I believe you need to play with the SieveSize value for the sieve (for mfaktc at least, it has to be increased a lot for GPU sieving; otherwise the sieving is very fast, you end up with a lot of candidates, and you do a lot of useless exponentiations). But confirm this first with someone who runs mfakto; my experience is limited to mfaktc. Even if the values are right, you still have the CPU free and can make up the difference with GHz-days of P-1.
Quote:
 Originally Posted by VictordeHolland -st test on my 7950 And some stats for the statsjunks ;) @800mhz (AMD reference clock) 296 GHzdays/day (70-71 bit) 283 GHzdays/day (73-74 bit) @900mhz (factory clock) 335 GHzdays/day (70-71 bit) 318 GHzdays/day (73-74 bit)
Thank you all for your tests, and for providing the timings to James.

First of all, it's true that all VLIW5 and VLIW4 GPUs pay a big penalty for GPU sieving, much bigger than GCN. Therefore, the sweet spot is with reduced GPUSievePrimes (compared to the default), whereas GCN cards benefit from an increase. However, I wonder if you really need to go as low as 50k for the APU to get the best out of it. I found ~70k for VLIW5 and ~110k for GCN to be optimal. I have just two cards, though, and it may look very different for you.
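
If throughput has a single sweet spot in GPUSievePrimes, with lower throughput on either side as described above, a coarse-to-fine sweep finds it with fewer benchmark runs than a fine fixed grid. This is a minimal Python sketch under that unimodality assumption. `measure` is a hypothetical callback you would write yourself (for example: run mfakto with the given value and read the GHz-d/day column); nothing here is part of mfakto, and the step sizes are arbitrary choices.

```python
from typing import Callable


def tune_sieve_primes(measure: Callable[[int], float], lo: int, hi: int,
                      step: int = 16_000, min_step: int = 1_000) -> int:
    """Coarse-to-fine search for the value in [lo, hi] with the highest
    measured throughput, assuming throughput is roughly unimodal."""
    cache: dict[int, float] = {}

    def score(value: int) -> float:
        # Benchmarks are slow, so never measure the same value twice.
        if value not in cache:
            cache[value] = measure(value)
        return cache[value]

    while True:
        best = max(range(lo, hi + 1, step), key=score)
        if step <= min_step:
            return best
        # Zoom in around the current best and use a finer step.
        lo, hi = max(lo, best - step), min(hi, best + step)
        step = max(min_step, step // 4)


if __name__ == "__main__":
    # Synthetic stand-in for a real benchmark, peaking at 70000 (hypothetical).
    def fake_benchmark(v: int) -> float:
        return 100 - ((v - 70_000) / 10_000) ** 2

    print(tune_sieve_primes(fake_benchmark, 10_000, 200_000))
```

With real measurements, run-to-run noise can make neighbouring values swap places, so it is worth re-measuring the final few candidates for longer before settling on one.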

Second, I noticed quite some sensitivity to GPUSieveProcessSize; both my cards run best with 24.

Third, the AMD drivers. Sadly, the latest Windows drivers (13.4) make mfakto consume almost a full CPU core, even when GPU sieving. When other programs (like Prime95) use a lot of CPU, mfakto's high CPU load goes away, but at the cost of some 10-20% throughput.

The 5770's 115 GHz-days/day is pretty close to what should be possible, as are the 7950's results. Which GPUSieve* values did you use?

The APU's speed (6550D, right?), however, is far below the expected 42.5 GHz-days/day. Again, which GPUSieve* values did you use?

Generally, most low- to mid-range GPUs will have much better throughput with CPU sieving, if you can spare enough CPU power to sustain SievePrimes of >20k (VLIW5) or >50k (GCN). I'm working on improving the sieve for the vector platforms, but v0.13 is not yet well optimized in this respect.

2013-06-03, 23:00   #810
kracker

"Mr. Meeseeks"
Jan 2012
California, USA

4171₈ Posts

Quote:
 Originally Posted by Bdot The APU's speed (6550D, right?), however, is far below the expected 42.5 GHz-days/day. Again, which GPUSieve* values did you use? Generally, most low- to mid-range GPUs will have much better throughput with CPU sieving, if you can spare enough CPU power to sustain SievePrimes of >20k (VLIW5) or >50k (GCN). I'm working on improving the sieve for the vector platforms, but v0.13 is not yet well optimized in this respect.
By tweaking them, I can only get up to ~37 GHz-days/day at most... I switched my display to the integrated GPU and I'm getting weird ghosting and artifacts on the display... Hmm, maybe...

2013-06-04, 09:20   #811
Axelsson

Jul 2012
Sweden

2×3×7 Posts

I found a small bug....

Code:
got assignment: exp=33732341 bit_min=70 bit_max=71 (7.09 GHz-days)
Starting trial factoring M33732341 from 2^70 to 2^71 (7.09GHz-days)
Using GPU kernel "cl_barrett15_73_gs"
No checkpoint file "M33732341.ckp" found.
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jun 04 08:32 | 1263 27.6% | 7.446 1h26m | 85.68 21813 0.00%
M33732341 has a factor: 1516555032424995693727
found 1 factor for M33732341 from 2^70 to 2^71 (partially tested) [mfakto 0.13-Win cl_barrett15_73_gs_4]
tf(): total time spent: 15m 19.113s (666.39 GHz-days / day)

It feels like a leeetle bit high speed; I'm running at 185-200 GHz-days/day right now.

In other news, I have had two reboots with a crashed video driver. After the reboot, the machine starts up without any video output (it's still possible to log in via remote desktop). The next reboot brings back the screen. I'm writing it off as a thermal problem, but curiously both crashes happened when switching assignments; there were no save files left. It's been a couple of days since the last crash and everything is running stably right now. I'll report back if I find out some more.
/Göran

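As a sanity check on Axelsson's log, the reported factor can be verified in a few lines of Python with the built-in three-argument `pow`. Every factor q of M(p) = 2^p - 1 has the form q = 2kp + 1 with q ≡ ±1 (mod 8), and q divides M(p) exactly when 2^p ≡ 1 (mod q). The 4620-class split of k is an assumption about how mfakto numbers its classes; the function name is hypothetical.

```python
def check_mersenne_factor(p: int, q: int, classes: int = 4620) -> dict:
    """Sanity checks on a reported factor q of M(p) = 2**p - 1."""
    k, rem = divmod(q - 1, 2 * p)
    return {
        "k": k,
        "form_2kp_plus_1": rem == 0,    # every factor of M(p) is 2kp + 1
        "q_mod_8": q % 8,               # ... and is 1 or 7 modulo 8
        "class": k % classes,           # assumed 4620-class split of k
        "bit_length": q.bit_length(),   # 71 means 2**70 <= q < 2**71
        "divides": pow(2, p, q) == 1,   # 2**p = 1 (mod q)  <=>  q | M(p)
    }


if __name__ == "__main__":
    print(check_mersenne_factor(33732341, 1516555032424995693727))
```

With these inputs, k comes out in class 1263, the class shown in the status line of the log, and the factor lies inside the 2^70 to 2^71 range that was being tested.
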
2013-06-04, 11:44   #812
Bdot

Nov 2010
Germany

3·199 Posts

Quote:
 Originally Posted by Axelsson I found a small bug.... 666.39 GHz-days / day It feels like a leeetle bit high speed, I'm running at 185-200 GHz-days / day right now.
Yes, please keep trying, I'm sure there are more bugs in the code. This one, however, is intentional.

If PrimeNet accepted the factor as "F" (meaning found by TF), then you probably got credit of more than 8 GHz-days within 15 minutes. That sounds more like 800 GHz-days/day, but I cannot easily predict how much credit you'll get for a factor, so I calculate the rate as if you had completed the whole bit level within 15 minutes (or however long it took you to find the factor). In my experience, that is about the lower limit of the credit PrimeNet gives.
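
The arithmetic behind the reported rate, as described above: the credit for the whole bit level is divided by the wall time actually spent, expressed in days. A minimal Python sketch (the function name is made up for illustration):

```python
def ghz_days_per_day(credit_ghz_days: float, elapsed_seconds: float) -> float:
    """Rate obtained if `credit_ghz_days` of work counts as done in
    `elapsed_seconds` of wall time (1 day = 86400 s)."""
    return credit_ghz_days * 86_400 / elapsed_seconds


if __name__ == "__main__":
    elapsed = 15 * 60 + 19.113  # "total time spent: 15m 19.113s"
    print(f"{ghz_days_per_day(7.09, elapsed):.2f}")  # full 2^70 to 2^71 bit level
    print(f"{ghz_days_per_day(8.0, elapsed):.2f}")   # "more than 8 GHz-days" of credit
```

The first line gives about 666.5 (the log shows 666.39 because the 7.09 GHz-days figure is itself rounded), and 8 GHz-days of credit over the same time works out to about 750 GHz-days/day.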

Quote:
 Originally Posted by Axelsson In other news, I have had two reboots with a crashed video driver. After the reboot, the machine starts up without any video output (it's still possible to log in via remote desktop). The next reboot brings back the screen. I'm writing it off as a thermal problem, but curiously both crashes happened when switching assignments; there were no save files left. It's been a couple of days since the last crash and everything is running stably right now. I'll report back if I find out some more. /Göran
This sounds a lot like thermal or driver issues (which driver version are you on?). mfakto does not do much with the GPU when switching assignments. No resources are allocated or deallocated; just a small initialization kernel is needed before each new exponent. And of course, the results and worktodo files are updated and the .ckp files deleted, so your disk has to spin up. During that millisecond your GPU may decide to lower its clocks, and then it has to ramp up again ... if the driver does not handle that up-and-down well ...

In that case, you could try running two instances in parallel: one would keep the GPU busy while the other saves its results, and otherwise they would split the GPU power evenly. (Evenly meaning the number of kernel invocations, not necessarily the performance.)

Edit: your GPUSievePrimes seems very low ... why?

Last fiddled with by Bdot on 2013-06-04 at 12:41

2013-06-04, 11:54   #813
Bdot

Nov 2010
Germany

3·199 Posts

Quote:
 Originally Posted by kracker By tweaking them, I can only get up to ~37 GHz-days/day at most... I switched my display to the integrated GPU and I'm getting weird ghosting and artifacts on the display... Hmm, maybe...
I recently ran mfakto on a "Loveland" (the GPU of the small E-350 APUs). Although it is VLIW5, it seemed to have trouble allocating enough registers, and I had to switch to VectorSize=2 for some kernels. I wanted to dismiss this observation as something specific to "Loveland" and thus not worth spending much time on. But maybe it is a more general APU issue?

Please check whether VectorSize=2 or VectorSize=3 helps when GPU sieving. That would not be ideal, as it leaves ~25% of the vector units unused, but it may still be better than having to spill registers to global memory ...

2013-06-15, 06:13   #814
Jayder

Dec 2012

427₈ Posts

Will the "best fit" GPUSievePrimes value remain constant? Or will it change as you change bit level, exponent, or possibly kernel? For example, if a GPUSievePrimes value of 52000 works best on a 332M exponent going from 2^69 to 2^70, will 52000 also be the best for a 65M exponent and 2^73 to 2^74? What about an 8M exponent and 2^60 to 2^61? Etc. Are there any good strategies for finding the best value, other than intelligent trial and error? I searched for an answer in the mfaktc thread, but it is a massive thread. I also considered experimenting and finding out for myself, but it would take a very long time on my slow GPU. Hopefully I am not missing an answer that is staring me in the face.


All times are UTC.
