mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfakto: an OpenCL program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=15646)

kracker 2013-05-27 16:29

[QUOTE=Bdot;341660]
Did you all send a benchmark run to James?[/QUOTE]

:tu:

[SIZE="1"]I wonder if there is someone with a 7970 here...
[/SIZE]

lfm 2013-05-28 02:36

[QUOTE=Bdot;341086]Linux bits are there as well now.
[/QUOTE]

Thanks, I'm running the Linux v0.13 on a 5770 and getting about 115 GHz-days/day.

kracker 2013-05-29 17:00

1 Attachment(s)
I just noticed this, but there is a huge penalty when switching to GPU sieving on my APU...

LaurV 2013-05-29 19:10

[QUOTE=kracker;341892]I just noticed this, but there is a huge penalty when switching to GPU sieving on my APU...[/QUOTE]
I believe you need to play with that SieveSize value for the sieve (for mfaktc at least, it has to be increased a lot for GPU sieving; otherwise the sieving is very fast, and you end up with a lot of candidates and do a lot of useless exponentiations). But confirm this first with someone who runs mfakto; my experience is limited to mfaktc. Even if the values are right, you still have the CPU free and can compensate with GHz-days of P-1.
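
A rough Python sketch of the trade-off described above: each odd prime r eliminates one in r of the factor candidates q = 2kp+1, so the share left for the exponentiation step shrinks as more primes are sieved, but with diminishing returns. The function names and the prime limits are illustrative assumptions, not mfaktc or mfakto settings, and the model ignores the q ≡ ±1 (mod 8) condition and the class-based pre-sieving those programs use.

```python
def primes_below(limit):
    """Return all primes smaller than limit (sieve of Eratosthenes, limit >= 2)."""
    flags = bytearray([1]) * limit
    flags[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for i in range(2, int(limit ** 0.5) + 1):
        if flags[i]:
            # Clear every multiple of i starting at i*i.
            flags[i * i::i] = bytearray(len(range(i * i, limit, i)))
    return [i for i, is_prime in enumerate(flags) if is_prime]


def surviving_fraction(primes):
    """Fraction of candidates q = 2kp+1 not divisible by any odd prime given.

    Assumes none of the sieve primes equals the exponent p, which holds for
    the small primes used here. The prime 2 is skipped because q is always odd.
    """
    fraction = 1.0
    for r in primes:
        if r > 2:
            fraction *= 1.0 - 1.0 / r
    return fraction


if __name__ == "__main__":
    # Hypothetical sieve depths, chosen only to show the trend.
    for limit in (1_000, 10_000, 100_000, 1_000_000):
        primes = primes_below(limit)
        print(f"{len(primes):6d} primes (< {limit}): "
              f"{surviving_fraction(primes):.4f} of candidates remain")
```

Each tenfold increase in sieve depth removes a smaller additional share of candidates, which is why the best setting depends on how expensive sieving is relative to exponentiation on a given GPU.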

VictordeHolland 2013-05-29 20:31

-st test on my 7950
[quote]
Selftest statistics
number of tests 3092
successful tests 3092

selftest PASSED!
[/quote]And some stats for the stats junkies ;)

@800 MHz (AMD reference clock)
296 GHz-days/day (70-71 bit)
283 GHz-days/day (73-74 bit)

@900 MHz (factory clock)
335 GHz-days/day (70-71 bit)
318 GHz-days/day (73-74 bit)

Bdot 2013-06-03 19:08

[QUOTE=lfm;341735]Thanks, I'm running the Linux v0.13 on a 5770 and getting about 115 GHz-days/day.[/QUOTE]

[QUOTE=kracker;341892]I just noticed this, but there is a huge penalty when switching to GPU sieving on my APU...[/QUOTE]

[QUOTE=LaurV;341903]I believe you need to play with that SieveSize value for the sieve (for mfaktc at least, it has to be increased a lot for GPU sieving; otherwise the sieving is very fast, and you end up with a lot of candidates and do a lot of useless exponentiations). But confirm this first with someone who runs mfakto; my experience is limited to mfaktc. Even if the values are right, you still have the CPU free and can compensate with GHz-days of P-1.[/QUOTE]

[QUOTE=VictordeHolland;341911]-st test on my 7950
And some stats for the stats junkies ;)

@800 MHz (AMD reference clock)
296 GHz-days/day (70-71 bit)
283 GHz-days/day (73-74 bit)

@900 MHz (factory clock)
335 GHz-days/day (70-71 bit)
318 GHz-days/day (73-74 bit)[/QUOTE]

Thank you all for your tests, and for providing the timings to James.

First of all, it's true that all VLIW5 and VLIW4 GPUs pay a big penalty for GPU sieving, much bigger than GCN. Therefore, their sweet spot is at a reduced GPUSievePrimes (compared to the default), whereas GCN cards benefit from an increase. However, I wonder if you really need to go as low as 50k for the APU in order to get the best out of it. I found ~70k for VLIW5 and ~110k for GCN to be optimal. I have just two cards, though, and it may look very different for you.

Second, I noticed considerable sensitivity to GPUSieveProcessSize; both my cards run best with 24.

Third, the AMD drivers. Sadly, the latest Windows drivers (13.4) make mfakto consume almost one full CPU core, even when GPU sieving. When other programs (like Prime95) use a lot of CPU, mfakto's high CPU load goes away, but at the cost of some 10-20% of throughput.


The 5770's 115 GHz-days/day is pretty close to what should be possible, as are the 7950's results. Which GPUSieve* values did you use?

The APU (6550D, right?) speed, however, is too far from the expected 42.5 GHz-days/day. Again, which GPUSieve* values did you use?

Generally, most low-end to mid-range GPUs will have much better throughput when CPU sieving, if you can spare enough CPU power to sustain SievePrimes of >20k (VLIW5) or >50k (GCN). I'm working on improving the sieve for the vector platforms, but v0.13 is not yet very optimized in this respect.

kracker 2013-06-03 23:00

[QUOTE=Bdot;342426]
The APU (6550D, right?) speed, however, is too far from the expected 42.5 GHz-days/day. Again, which GPUSieve* values did you use?

Generally, most low-end to mid-range GPUs will have much better throughput when CPU sieving, if you can spare enough CPU power to sustain SievePrimes of >20k (VLIW5) or >50k (GCN). I'm working on improving the sieve for the vector platforms, but v0.13 is not yet very optimized in this respect.[/QUOTE]

After tweaking them, I can only get up to around 37 GHz-days/day at most... I switched my display to the integrated GPU and I'm getting weird ghosting and artifacts on the display... Hmm, maybe...

Axelsson 2013-06-04 09:20

I found a small bug... :cool:

[CODE]
got assignment: exp=33732341 bit_min=70 bit_max=71 (7.09 GHz-days)
Starting trial factoring M33732341 from 2^70 to 2^71 (7.09GHz-days)
Using GPU kernel "cl_barrett15_73_gs"
No checkpoint file "M33732341.ckp" found.
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jun 04 08:32 | 1263 27.6% | 7.446 1h26m | 85.68 21813 0.00%
M33732341 has a factor: 1516555032424995693727

found 1 factor for M33732341 from 2^70 to 2^71 (partially tested) [mfakto 0.13-Win cl_barrett15_73_gs_4]
tf(): total time spent: 15m 19.113s ([COLOR=Red]666.39 GHz-days / day[/COLOR])
[/CODE]That seems a leeetle bit too fast; I'm running at 185-200 GHz-days/day right now.

In other news, I have had two reboots caused by a crashed video driver. After the reboot the machine starts up without any video output (it is still possible to log in via remote desktop). The next reboot brings back the screen. I'm writing it off as a thermal problem, but curiously both crashes happened when switching assignments; there were no save files left.
It's been a couple of days since the last crash and everything is running stably right now. I'll report back if I find out more.

/Göran

Bdot 2013-06-04 11:44

[QUOTE=Axelsson;342456]I found a small bug... :cool:
[COLOR=Red]666.39 GHz-days / day[/COLOR]
That seems a leeetle bit too fast; I'm running at 185-200 GHz-days/day right now.
[/QUOTE]
Yes, please keep trying, I'm sure there are more bugs in the code. This one, however, is intentional.

If PrimeNet accepted the factor as "F" (meaning found by TF), then you probably got credit of more than 8 GHz-days within 15 minutes. That sounds more like 800 GHz-days/day, but I cannot easily guess how much credit you'll get for a factor, so I calculate it as if you had completed the whole bit level in those 15 minutes (or however long it took you to find the factor). In my experience that is about the lower limit of what you get as credit from PrimeNet.
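
As a minimal sketch of that calculation (not the actual mfakto code; the function name is made up for illustration), the rate is simply the full bit-level credit scaled from the elapsed time to one day. With the 7.09 GHz-days and 15m 19.113s from the log above it lands within rounding of the reported 666.39:

```python
SECONDS_PER_DAY = 86400.0


def ghz_days_per_day(assignment_ghz_days, elapsed_seconds):
    """Throughput as if the whole assignment had been completed in elapsed_seconds."""
    return assignment_ghz_days * SECONDS_PER_DAY / elapsed_seconds


# Values taken from the log: 7.09 GHz-days for the 2^70 to 2^71 bit level,
# and a total time of 15m 19.113s until the factor was found.
elapsed = 15 * 60 + 19.113
rate = ghz_days_per_day(7.09, elapsed)

if __name__ == "__main__":
    print(f"{rate:.2f} GHz-days/day")
```

The small difference from the logged value is consistent with 7.09 being a rounded figure.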

[QUOTE=Axelsson;342456]
In other news, I have had two reboots caused by a crashed video driver. After the reboot the machine starts up without any video output (it is still possible to log in via remote desktop). The next reboot brings back the screen. I'm writing it off as a thermal problem, but curiously both crashes happened when switching assignments; there were no save files left.
It's been a couple of days since the last crash and everything is running stably right now. I'll report back if I find out more.

/Göran[/QUOTE]

This sounds a lot like a thermal or driver issue (which driver version are you on?). mfakto does not do much with the GPU when switching assignments. No resources are (de-)allocated; only a small initialization kernel is needed before each new exponent. And of course, the results and worktodo files are updated and the .ckp files deleted, so your disk has to spin up. During that millisecond your GPU may decide to lower its clock frequencies, and it has to ramp up again ... if the driver does not handle that up-and-down well ...

Then you could try to run two instances in parallel - the other one would keep the GPU busy while one saves the results, and otherwise they would evenly split the GPU power. (Evenly meaning the number of kernel invocations, not necessarily the performance.)

Edit: your GPUSievePrimes seems very low ... why?

Bdot 2013-06-04 11:54

[QUOTE=kracker;342439]After tweaking them, I can only get up to around 37 GHz-days/day at most... I switched my display to the integrated GPU and I'm getting weird ghosting and artifacts on the display... Hmm, maybe...[/QUOTE]

I recently ran mfakto on a "Loveland" (the GPU of the small E-350 APUs). Though it is VLIW5, it seemed to have trouble assigning enough registers. I had to switch to VectorSize=2 for some kernels. I wanted to disregard this observation as something specific to "Loveland" and thus not worth spending much time on. But maybe it is a more general APU issue?

Please check whether VectorSize=2 or VectorSize=3 helps when GPU sieving. That would be unfortunate, as it leaves ~25% of the vector units unused, but it may still be better than having to spill registers to global memory ...

Jayder 2013-06-15 06:13

Will the "best fit" GPUSievePrimes value remain constant? Or will it change as you change bit level, exponent, or possibly kernel? For example, if a GPUSievePrimes value of 52000 works best on a 332M exponent going from 2^69 to 2^70, will 52000 also be the best for a 65M exponent and 2^73 to 2^74? What about an 8M exponent and 2^60 to 2^61? Etc.

Are there any good strategies for finding the best value, other than intelligent trial and error?

I searched for an answer in the mfaktc thread, but it is kind of a massive thread. I also considered experimenting and finding out for myself, but it would take a very long time to find out on my slow GPU. Hopefully I am not missing an answer that is staring me in the face.

