mersenneforum.org  

Old 2018-07-13, 12:13   #474
preda
"Mihai Preda"
Apr 2015
3×457 Posts

Quote:
Originally Posted by SELROC View Post
Currently testing latest.
It selected a 5M FFT for the current 85M exponents. The timing is 4-5 ms/it.
Waiting for completion.
Which GPU are you using?

In the recent changes, I replaced a set of precomputed trigonometric tables with some computed sin/cos. This moves some weight from memory to compute. I was surprised to see that on Vega64 the overall performance is about the same (i.e. not a huge penalty from the computed trig).
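
To make the table-versus-computed trade-off concrete, here is a minimal sketch in Python. It is not GpuOwl's OpenCL code; the function names `twiddle_table` and `twiddle_computed` are invented for this illustration. One version stores every FFT twiddle factor up front (memory traffic), the other evaluates sin/cos each time a factor is needed (extra arithmetic).

```python
import cmath
import math

def twiddle_table(n):
    """Precompute all n-th roots of unity once (memory-heavy approach)."""
    return [cmath.exp(-2j * math.pi * k / n) for k in range(n)]

def twiddle_computed(n, k):
    """Compute a single twiddle factor on demand (compute-heavy approach)."""
    angle = -2.0 * math.pi * k / n
    return complex(math.cos(angle), math.sin(angle))

n = 16  # tiny size, for illustration only
table = twiddle_table(n)

# Both approaches should agree to within floating-point rounding.
max_diff = max(abs(table[k] - twiddle_computed(n, k)) for k in range(n))
print(f"largest difference over {n} twiddles: {max_diff:.2e}")
```

Which approach wins on a given GPU depends on whether the kernel is limited by memory bandwidth or by arithmetic, which is why the timing had to be measured rather than predicted.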

What was the timing on your GPU before, when using 8M FFT?

Old 2018-07-13, 12:30   #475
SELROC
10010011011101₂ Posts

Quote:
Originally Posted by preda View Post
Which GPU are you using?

In the recent changes, I replaced a set of precomputed trigonometric tables with some computed sin/cos. This moves some weight from memory to compute. I was surprised to see that on Vega64 the overall performance is about the same (i.e. not a huge penalty from the computed trig).

What was the timing on your GPU before, when using 8M FFT?

Always using the Asus Radeon RX580 8G (Ellesmere).

With 8M FFT the timing was 6-8 ms/it.

Old 2018-07-13, 12:59   #476
preda
"Mihai Preda"
Apr 2015
3×457 Posts

Quote:
Originally Posted by SELROC View Post
Always using the Asus Radeon RX580 8G (Ellesmere).

With 8M FFT the timing was 6-8 ms/it.
OK, I'm glad it didn't get worse then :)

Old 2018-07-13, 13:25   #477
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5437₁₀ Posts

Quote:
Originally Posted by preda View Post
(roughly)
16M FFT: 7.8 ms/it.
20M FFT: 9.8 ms/it
So it scales mostly linearly with the FFT size, which is about the best I could hope for. In fact under 10ms/it for 100M-digit PRP is not a bad baseline.
How do the V3.3 timings compare to the equivalent fft lengths in V1.9 and 2.0 on the same hardware? (default -block for V3.3 if convenient; your Vega 64?) Crude table follows, with what data I've been able to find from your previous posts.
Code:
fft size  V1.9/2  V3.3
4M        1.63     ?
5000K/5M  2.43    2.5
8M        ?        ?
10M       na       ?
16M       ?       7.8
20M       na      9.77
Are the checkpoint files compatible between V1.9 or V2.0 and V3.3, or should a user finish exponents begun in V1.9 or 2 before switching to V3.3?
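
As a quick check of the quoted scaling claim, the sketch below divides the two V3.3 timings by their FFT sizes and then estimates how long a 100M-digit PRP test would take. It assumes roughly one iteration per bit of the exponent and that the 20M FFT timing applies to such an exponent, as the quoted post suggests; the variable names are made up for this example.

```python
import math

# ms per iteration for V3.3, taken from the table above (FFT size in M -> ms/it).
timings_ms = {16: 7.8, 20: 9.77}

# If cost is linear in FFT size, ms/it divided by FFT size is roughly constant.
per_m = {fft: ms / fft for fft, ms in timings_ms.items()}
for fft, value in per_m.items():
    print(f"{fft}M FFT: {value:.4f} ms/it per M")

# Smallest exponent p for which 2^p - 1 has at least 10^8 decimal digits.
# 2^p - 1 has floor(p * log10(2)) + 1 digits.
min_exponent = math.ceil((10**8 - 1) / math.log10(2))

# Assumption: about one iteration per bit of the exponent, at the 20M FFT rate.
seconds = min_exponent * timings_ms[20] / 1000
days = seconds / 86400
print(f"exponent >= {min_exponent}: about {days:.1f} days at {timings_ms[20]} ms/it")
```

Real candidates are prime exponents slightly above this lower bound, so the estimate is a floor, not a prediction for a specific exponent.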

Old 2018-07-13, 13:38   #478
SELROC
797 Posts

Quote:
Originally Posted by kriesel View Post
How do the V3.3 timings compare to the equivalent fft lengths in V1.9 and 2.0 on the same hardware? (default -block for V3.3 if convenient; your Vega 64?) Crude table follows, with what data I've been able to find from your previous posts.
Code:
fft size  V1.9/2  V3.3
4M        1.63     ?
5000K/5M  2.43    2.5
8M        ?        ?
10M       na       ?
16M       ?       7.8
20M       na      9.77
What is the GPU?

Quote:
Are the checkpoint files compatible between V1.9 or V2.0 and V3.3, or should a user finish exponents begun in V1.9 or 2 before switching to V3.3?
I think v1.9 and v2.0 checkpoints are not compatible with v3.3; worse, v3.3 restarts the computation from zero.

Old 2018-07-13, 14:06   #479
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1010100111101₂ Posts

Quote:
Originally Posted by SELROC View Post
What is the GPU?
Preda's air cooled Vega 64, as indicated in his prior posts.
Asking about V1.9/2 compatibility to V3.3 was a question to Preda, to address SELROC's previously posted concern about compatibility. A clear statement on compatibility from Preda, who wrote and tested the code, would settle it, in my opinion.

Or are you saying, SELROC, that you've tested with v1.9 or 2.x checkpoint files and the exponent restarts from iteration 0 in V3.3? I've thought your statements about it up to now were questions or doubts, not test results.

Last fiddled with by kriesel on 2018-07-13 at 14:08

Old 2018-07-13, 14:13   #480
SELROC
22·5·7·67 Posts

Quote:
Originally Posted by kriesel View Post
Preda's air cooled Vega 64, as indicated in his prior posts.
Asking about V1.9/2 compatibility to V3.3 was a question to Preda, to address SELROC's previously posted concern about compatibility. A clear statement on compatibility from Preda, who wrote and tested the code, would settle it, in my opinion.

Or are you saying, SELROC, that you've tested with v1.9 or 2.x checkpoint files and the exponent restarts from iteration 0 in V3.3? I've thought your statements about it up to now were questions or doubts, not test results.
No wait, I said that I tested all versions from v2.0 onward, and at some point, I think in v2.1, the checkpoint file format changed. So all versions after that are not compatible with v2.0, and if you use v3.3 with a v2.0 checkpoint file, the computation will restart from 0.

Old 2018-07-13, 14:16   #481
SELROC
61·97 Posts

Quote:
Originally Posted by kriesel View Post
Preda's air cooled Vega 64, as indicated in his prior posts.
...
There are really two models:

- Radeon Pro Vega 64 https://www.videocardbenchmark.net/g...ega+64&id=3879

- Radeon RX Vega 64 https://www.videocardbenchmark.net/g...ega+64&id=3808

Old 2018-07-13, 14:35   #482
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
12475₈ Posts

Quote:
Originally Posted by SELROC View Post
There are really two models:

- Radeon Pro Vega 64 https://www.videocardbenchmark.net/g...ega+64&id=3879

- Radeon RX Vega 64 https://www.videocardbenchmark.net/g...ega+64&id=3808
My recollection from assembling the table of timings today is that Preda's is the second; RX.

Old 2018-07-13, 14:45   #483
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts

Quote:
Originally Posted by SELROC View Post
No wait, I said that I tested all versions from v2.0 onward, and at some point, I think in v2.1, the checkpoint file format changed. So all versions after that are not compatible with v2.0, and if you use v3.3 with a v2.0 checkpoint file, the computation will restart from 0.
That's sufficiently different from what I read and recalled that I went back through this thread looking for what I apparently missed. I did not find test results stated as such in pages 22-44 of this thread. Perhaps they're in another thread? Or something.

Anyway, thanks for the clear summary just now.
And in English, since my grasp of Italian is approximately zero.

So, now, with V3.3's additional fft lengths, in addition to a-d in http://www.mersenneforum.org/showpos...&postcount=465 (well, b&c seem still applicable)
there's e) benchmark and compute whether it's quicker overall to finish an existing exponent in V1.9/2.0 or start over in V3.3
and (depending on whether the various V3.x fft lengths are compatible and can be changed on the fly)
f) benchmark in the various V3.x lengths, and start over in, or switch midstream to, the fastest suitable V3.3 fft length.
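
Step e) comes down to simple arithmetic, sketched below. The figures are assumptions for illustration: an 85M exponent, 7.0 ms/it as the midpoint of the 6-8 ms/it reported earlier for the 8M FFT, and 4.5 ms/it as the midpoint of the 4-5 ms/it reported for the 5M FFT. It also assumes both versions need about one iteration per bit of the exponent; the function names are invented for this example.

```python
def hours(iterations, ms_per_it):
    """Convert an iteration count at a given ms/it into hours."""
    return iterations * ms_per_it / 1000 / 3600

def restart_is_faster(exponent, iterations_done, old_ms, new_ms):
    """True if restarting from 0 in the new version beats finishing in the old one."""
    finish_old = hours(exponent - iterations_done, old_ms)
    restart_new = hours(exponent, new_ms)
    return restart_new < finish_old

def break_even_fraction(old_ms, new_ms):
    """Fraction already done beyond which finishing in the old version wins."""
    return max(0.0, 1.0 - new_ms / old_ms)

exponent = 85_000_000          # assumed exponent size
old_ms, new_ms = 7.0, 4.5      # assumed midpoints of the reported ranges

print(restart_is_faster(exponent, 10_000_000, old_ms, new_ms))  # early in the test
print(restart_is_faster(exponent, 60_000_000, old_ms, new_ms))  # late in the test
print(f"break-even at {break_even_fraction(old_ms, new_ms):.0%} done")
```

Measured timings from the user's own GPU should replace the assumed values before drawing any conclusion.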

SELROC, would you assemble and post a similar timings table vs. version and fft length for your RX580?

Last fiddled with by kriesel on 2018-07-13 at 14:56

Old 2018-07-13, 15:02   #484
preda
"Mihai Preda"
Apr 2015
3·457 Posts

Quote:
Originally Posted by kriesel View Post
How do the V3.3 timings compare to the equivalent fft lengths in V1.9 and 2.0 on the same hardware? (default -block for V3.3 if convenient; your Vega 64?) Crude table follows, with what data I've been able to find from your previous posts.
Code:
fft size  V1.9/2  V3.3
4M        1.63     ?
5000K/5M  2.43    2.5
8M        ?        ?
10M       na       ?
16M       ?       7.8
20M       na      9.77
It's not straightforward to compare just milliseconds. For one, in the past I was using ROCm, but now I'm on amdgpu-pro 18.20. The compiler optimizations are critical (i.e. ROCm vs. amdgpu-pro) and can easily have a 10%-20% impact on performance. I'm waiting for ROCm to support the OS version I'm using (Ubuntu 18.04) so I can try it again and see how it compares.

Second, the new version is much more flexible in terms of FFT sizes. The code is simpler, cleaner and easier to evolve. The old version could not do 100M-digits at all.

So, in my personal opinion and without hard numbers, I would say that the new version is better architecturally, and not worse performance-wise.

Quote:
Are the checkpoint files compatible between V1.9 or V2.0 and V3.3, or should a user finish exponents begun in V1.9 or 2 before switching to V3.3?
I do not know; I would need to dig back through the sources to check this.

Every version has backwards compatibility ("can read") with a few past versions of the savefiles. It is possible to build a "chain" of versions that would move a savefile forward all the way, but probably not in a single step.

My recommendation to the user: try it, and if the new version does not read the old savefile, finish the exponent with the old version and afterwards start a new exponent with the new version.
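
The "chain" idea can be pictured as a shortest-path search over a "can read" relation. The sketch below is only an illustration: the `CAN_READ` table is entirely hypothetical and does not describe what any real GpuOwl version can load, and `upgrade_chain` is a made-up helper name.

```python
from collections import deque

# HYPOTHETICAL data: for each version, the older savefile versions it can read.
# This does NOT reflect actual GpuOwl compatibility.
CAN_READ = {
    "2.0": ["1.9"],
    "2.1": ["2.0", "1.9"],
    "3.0": ["2.1"],
    "3.3": ["3.0"],
}

def upgrade_chain(start, target, can_read):
    """Return the shortest list of versions to step through, or None if no chain exists."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current == target:
            return path
        # Any version that can read the current savefile is a possible next step.
        for newer, readable in can_read.items():
            if current in readable and newer not in seen:
                seen.add(newer)
                queue.append(path + [newer])
    return None

print(upgrade_chain("1.9", "3.3", CAN_READ))
```

With the hypothetical table above, the search skips 2.0 because 2.1 can read a 1.9 savefile directly. Each step in a chain would still mean running that intermediate version long enough to write a savefile in its own format.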

Last fiddled with by preda on 2018-07-13 at 15:18

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.