mersenneforum.org

gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

preda 2018-07-13 12:13

[QUOTE=SELROC;491692]Currently testing the latest version.
It selected a 5M FFT for the current exponents (around 85M). The timing is 4-5 ms/it.
Waiting for completion.[/QUOTE]

Which GPU are you using?

In the recent changes, I replaced a set of precomputed trigonometric tables with some computed sin/cos. This moves some weight from memory to compute. I was surprised to see that on Vega64 the overall performance is about the same (i.e. not a huge penalty from the computed trig).
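
The trade-off described here can be sketched in plain Python. This is not gpuOwl's OpenCL code; the function names and the size N are illustrative assumptions. A precomputed table spends memory (and memory bandwidth) to avoid arithmetic, while computing sin/cos on demand does the opposite, and both should give the same twiddle factors.

```python
import cmath
import math

def twiddle_table(n):
    """Precompute all n twiddle factors exp(-2*pi*i*k/n) once (memory-heavy)."""
    return [cmath.exp(-2j * math.pi * k / n) for k in range(n)]

def twiddle_computed(k, n):
    """Compute one twiddle factor on demand from sin/cos (compute-heavy)."""
    angle = -2.0 * math.pi * k / n
    return complex(math.cos(angle), math.sin(angle))

N = 1024  # illustrative size, far smaller than a real multi-million-point FFT
table = twiddle_table(N)

# Both approaches should agree to within floating-point rounding.
max_diff = max(abs(table[k] - twiddle_computed(k, N)) for k in range(N))
print(f"max difference between table and computed values: {max_diff:.3e}")
```

Which variant is faster depends on whether the kernel is limited by memory bandwidth or by arithmetic throughput, which is consistent with the "about the same" result reported for Vega64.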

What was the timing on your GPU before, when using 8M FFT?

SELROC 2018-07-13 12:30

[QUOTE=preda;491694]Which GPU are you using?

In the recent changes, I replaced a set of precomputed trigonometric tables with some computed sin/cos. This moves some weight from memory to compute. I was surprised to see that on Vega64 the overall performance is about the same (i.e. not a huge penalty from the computed trig).

What was the timing on your GPU before, when using 8M FFT?[/QUOTE]


Always using the Asus Radeon RX580 8G (Ellesmere).

With 8M FFT the timing was 6-8 ms/it.

preda 2018-07-13 12:59

[QUOTE=SELROC;491695]Always using the Asus Radeon RX580 8G (Ellesmere).

With 8M FFT the timing was 6-8 ms/it.[/QUOTE]
OK, I'm glad it didn't get worse then :)

kriesel 2018-07-13 13:25

[QUOTE=preda;491693](roughly)
16M FFT: 7.8 ms/it
20M FFT: 9.8 ms/it
So it scales mostly linearly with the FFT size, which is about the best I could hope for. In fact, under 10 ms/it for a 100M-digit PRP is not a bad baseline.[/QUOTE]
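
A quick check of the "mostly linear" claim, using only the two timings quoted above. The helper names are illustrative, and the exponent used for the run-time estimate is a hypothetical round figure near the 100M-digit threshold, not one taken from the thread. The estimate assumes a PRP test of exponent p takes roughly p iterations.

```python
# Timings quoted in the thread (Vega 64), keyed by FFT size in M, in ms/it.
timings_ms = {16: 7.8, 20: 9.8}

def ms_per_m(fft_m, ms):
    """Cost per M of FFT length; roughly constant if scaling is linear."""
    return ms / fft_m

for fft_m, ms in timings_ms.items():
    print(f"{fft_m}M FFT: {ms_per_m(fft_m, ms):.4f} ms/it per M of FFT length")

def days_for_test(exponent, ms_per_it):
    """Estimated wall-clock days, assuming about `exponent` iterations in total."""
    return exponent * ms_per_it / 1000.0 / 86400.0

# Hypothetical exponent of about 332.2 million (roughly 100M decimal digits).
print(f"estimated run time: {days_for_test(332_200_000, 9.8):.1f} days")
```

The per-M cost comes out nearly identical for the two sizes, which is what linear scaling would predict.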

How do the V3.3 timings compare to the equivalent fft lengths in V1.9 and 2.0 on the same hardware? (default -block for V3.3 if convenient; your Vega 64?) Crude table follows, with what data I've been able to find from your previous posts.
[CODE]fft size   V1.9/2 (ms/it)   V3.3 (ms/it)
4M         1.63             ?
5000K/5M   2.43             2.5
8M         ?                ?
10M        na               ?
16M        ?                7.8
20M        na               9.77
[/CODE]
Are the checkpoint files compatible between V1.9 or V2.0 and V3.3, or should a user finish exponents begun in V1.9 or 2 before switching to V3.3?

SELROC 2018-07-13 13:38

[QUOTE=kriesel;491704]How do the V3.3 timings compare to the equivalent fft lengths in V1.9 and 2.0 on the same hardware? (default -block for V3.3 if convenient; your Vega 64?) Crude table follows, with what data I've been able to find from your previous posts.
[CODE]fft size   V1.9/2 (ms/it)   V3.3 (ms/it)
4M         1.63             ?
5000K/5M   2.43             2.5
8M         ?                ?
10M        na               ?
16M        ?                7.8
20M        na               9.77
[/CODE][/QUOTE]

What is the GPU?

[QUOTE]Are the checkpoint files compatible between V1.9 or V2.0 and V3.3, or should a user finish exponents begun in V1.9 or 2 before switching to V3.3?[/QUOTE]

I think v1.9 and 2.0 checkpoint files are not compatible with v3.3, and worse, v3.3 then restarts the computation from zero.

kriesel 2018-07-13 14:06

[QUOTE=SELROC;491706]What is the GPU?
[/QUOTE]Preda's air-cooled Vega 64, as indicated in his prior posts.
My question about V1.9/2 compatibility with V3.3 was addressed to Preda, in response to SELROC's previously posted concern about compatibility. A clear statement on compatibility from Preda, who wrote and tested the code, would settle it, in my opinion.

Or are you saying, SELROC, that you've tested with v1.9 or 2.x checkpoint files and the exponent restarts from iteration 0 in V3.3? I've thought your statements about it up to now were questions or doubts, not test results.

SELROC 2018-07-13 14:13

[QUOTE=kriesel;491719]Preda's air-cooled Vega 64, as indicated in his prior posts.
My question about V1.9/2 compatibility with V3.3 was addressed to Preda, in response to SELROC's previously posted concern about compatibility. A clear statement on compatibility from Preda, who wrote and tested the code, would settle it, in my opinion.

Or are you saying, SELROC, that you've tested with v1.9 or 2.x checkpoint files and the exponent restarts from iteration 0 in V3.3? I've thought your statements about it up to now were questions or doubts, not test results.[/QUOTE]

No, wait. I said that I tested all versions from v2.0 onward. At some point, I think in v2.1, the checkpoint file format changed, so all later versions are incompatible with v2.0 checkpoints. If you use v3.3 with a v2.0 checkpoint file, the computation will restart from 0.

SELROC 2018-07-13 14:16

[QUOTE=kriesel;491719]Preda's air-cooled Vega 64, as indicated in his prior posts.
...[/QUOTE]

There are really two models:

- Radeon Pro Vega 64 [url]https://www.videocardbenchmark.net/gpu.php?gpu=Radeon+Pro+Vega+64&id=3879[/url]

- Radeon RX Vega 64 [url]https://www.videocardbenchmark.net/gpu.php?gpu=Radeon+RX+Vega+64&id=3808[/url]

kriesel 2018-07-13 14:35

[QUOTE=SELROC;491723]There are really two models:

- Radeon Pro Vega 64 [URL]https://www.videocardbenchmark.net/gpu.php?gpu=Radeon+Pro+Vega+64&id=3879[/URL]

- Radeon RX Vega 64 [URL]https://www.videocardbenchmark.net/gpu.php?gpu=Radeon+RX+Vega+64&id=3808[/URL][/QUOTE]

My recollection from assembling the table of timings today is that Preda's is the second one, the RX.

kriesel 2018-07-13 14:45

[QUOTE=SELROC;491720]No, wait. I said that I tested all versions from v2.0 onward. At some point, I think in v2.1, the checkpoint file format changed, so all later versions are incompatible with v2.0 checkpoints. If you use v3.3 with a v2.0 checkpoint file, the computation will restart from 0.[/QUOTE]
That's sufficiently different from what I read and recalled that I went back through this thread looking for what I apparently missed. I did not find test results stated as such in pages 22-44 of this thread. Perhaps it's in another thread?

Anyway, thanks for the clear summary just now.
And in English, since my grasp of Italian is approximately zero.

So now, with V3.3's additional FFT lengths, in addition to a-d in [URL]http://www.mersenneforum.org/showpost.php?p=491433&postcount=465[/URL] (well, b and c still seem applicable), there's:
e) benchmark and compute whether it's quicker overall to finish an existing exponent in V1.9/2.0 or to start over in V3.3;
and (depending on whether the various V3.x FFT lengths are compatible and can be changed on the fly)
f) benchmark the various V3.x FFT lengths, and start over in, or switch midstream to, the fastest suitable V3.3 FFT length.
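
Item e) reduces to a small calculation once per-iteration timings are known. The sketch below is illustrative only: the function names are made up, and the example numbers are midpoints of the RX580 ranges quoted earlier in the thread (about 7 ms/it at 8M FFT before, about 4.5 ms/it at 5M FFT now). Treating the earlier 8M timing as the "old version" figure is an assumption.

```python
def remaining_ms(exponent, done_iterations, ms_per_it):
    """Time left in milliseconds, assuming about `exponent` iterations in total."""
    return (exponent - done_iterations) * ms_per_it

def better_to_restart(exponent, done_iterations, old_ms, new_ms):
    """True if restarting from iteration 0 in the new version finishes sooner."""
    finish_old = remaining_ms(exponent, done_iterations, old_ms)
    restart_new = remaining_ms(exponent, 0, new_ms)
    return restart_new < finish_old

def break_even_fraction(old_ms, new_ms):
    """Fraction already completed at which both options take equal time.

    From (1 - f) * old_ms = new_ms, so f = 1 - new_ms / old_ms.
    Below this fraction, restarting is quicker; above it, finishing is.
    """
    return max(0.0, 1.0 - new_ms / old_ms)

# Illustrative numbers only (see the assumptions in the lead-in).
print(f"break-even at {break_even_fraction(7.0, 4.5):.1%} completed")
print(better_to_restart(85_000_000, 10_000_000, old_ms=7.0, new_ms=4.5))
```

If the new version is not faster per iteration, the break-even fraction is 0 and finishing in the old version is never worse.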

SELROC, would you assemble and post a similar timings table vs. version and fft length for your RX580?

preda 2018-07-13 15:02

[QUOTE=kriesel;491704]How do the V3.3 timings compare to the equivalent fft lengths in V1.9 and 2.0 on the same hardware? (default -block for V3.3 if convenient; your Vega 64?) Crude table follows, with what data I've been able to find from your previous posts.
[CODE]fft size   V1.9/2 (ms/it)   V3.3 (ms/it)
4M         1.63             ?
5000K/5M   2.43             2.5
8M         ?                ?
10M        na               ?
16M        ?                7.8
20M        na               9.77
[/CODE][/QUOTE]

It's not straightforward to compare just milliseconds. For one, in the past I was using ROCm, but now I'm on amdgpu-pro 18.20. The compiler optimizations are critical (i.e. ROCm vs. amdgpu-pro) and can easily have a 10%-20% impact on performance. I'm waiting for ROCm to support the OS version I'm using (Ubuntu 18.04) to try it again and see how it compares.

Second, the new version is much more flexible in terms of FFT sizes. The code is simpler, cleaner and easier to evolve. The old version could not do 100M-digits at all.

So, in my personal opinion and without hard numbers, I would say that the new version is better architecturally, and not worse performance-wise.

[QUOTE]
Are the checkpoint files compatible between V1.9 or V2.0 and V3.3, or should a user finish exponents begun in V1.9 or 2 before switching to V3.3?[/QUOTE]I do not know; I would need to dig back through the sources to check this.

Every version has backwards compatibility ("can read") with a few past versions of the savefiles. It is possible to build a "chain" of versions that would move a savefile forward all the way, but probably not in a single step.
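
The "chain" idea can be pictured as a shortest-path search over a "can read" relation. The compatibility map below is entirely hypothetical (invented for illustration, not gpuOwl's real matrix), and the sketch assumes each program version rewrites the savefile in its own format after loading it.

```python
from collections import deque

# HYPOTHETICAL map: program version -> savefile versions it can read.
# These entries are invented and do not describe real gpuOwl releases.
CAN_READ = {
    "2.1": {"2.0", "2.1"},
    "3.0": {"2.1", "3.0"},
    "3.3": {"3.0", "3.3"},
}

def upgrade_chain(savefile_version, target_version, can_read=CAN_READ):
    """Return the shortest list of program versions to run in order, or None."""
    queue = deque([(savefile_version, [])])
    seen = {savefile_version}
    while queue:
        current, path = queue.popleft()
        if current == target_version:
            return path
        # Try every program version able to read the current savefile format.
        for program, readable in can_read.items():
            if current in readable and program not in seen:
                seen.add(program)
                queue.append((program, path + [program]))
    return None  # no chain exists; finish the exponent with the old version

print(upgrade_chain("2.0", "3.3"))  # ['2.1', '3.0', '3.3'] with the map above
```

A result of None corresponds to the fallback of finishing the exponent in the old version.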

I would recommend to the user: try it, and if the new version does not read the old savefile, finish that exponent with the old version and afterwards start a new exponent with the new version.


All times are UTC.