#474
"Mihai Preda"
Apr 2015
3×457 Posts
Quote:
In the recent changes, I replaced a set of precomputed trigonometric tables with sin/cos values computed on the fly. This shifts some load from memory to compute. I was surprised to see that on Vega64 the overall performance is about the same (i.e. no huge penalty from the computed trig). What was the timing on your GPU before, when using 8M FFT?
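To make the memory-versus-compute trade concrete, here is a minimal sketch in Python. It is not gpuOwL's code (the real kernels are OpenCL and use a proper FFT); it uses a naive DFT, and all function names are made up for illustration. The only point is that the same twiddle factors can either be read from a precomputed table or recomputed with sin/cos each time they are needed.

```python
import cmath
import math


def twiddle_table(n):
    """Precompute all n twiddle factors exp(-2*pi*i*k/n) once (costs memory)."""
    return [cmath.exp(-2j * math.pi * k / n) for k in range(n)]


def twiddle_computed(k, n):
    """Compute one twiddle factor on demand (costs a sin and a cos per use)."""
    angle = -2.0 * math.pi * k / n
    return complex(math.cos(angle), math.sin(angle))


def dft(x, twiddle):
    """Naive O(n^2) DFT; `twiddle(i)` must return exp(-2*pi*i*I/n) for index i."""
    n = len(x)
    return [sum(x[j] * twiddle((j * k) % n) for j in range(n)) for k in range(n)]


def dft_with_table(x):
    """Variant that looks every twiddle factor up in a precomputed table."""
    table = twiddle_table(len(x))
    return dft(x, lambda i: table[i])


def dft_with_computed_trig(x):
    """Variant that recomputes every twiddle factor when it is needed."""
    n = len(x)
    return dft(x, lambda i: twiddle_computed(i, n))
```

Both variants return the same transform up to rounding; which one is faster on a given GPU depends on whether memory bandwidth or arithmetic is the bottleneck, which is what the question above is probing.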
#475
10010011011101₂ Posts
Quote:
Always using the Asus Radeon RX580 8G (Ellesmere). With 8M FFT the timing was 6-8 ms/it.
#476
"Mihai Preda"
Apr 2015
3×457 Posts
#477
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5437₁₀ Posts
Quote:
Code:
fft size    V1.9/2   V3.3
4M          1.63     ?
5000K/5M    2.43     2.5
8M          ?        ?
10M         na       ?
16M         ?        7.8
20M         na       9.77
#478
797 Posts
Quote:
Quote:
#479
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1010100111101₂ Posts
Preda's air-cooled Vega 64, as indicated in his prior posts.
Asking about V1.9/2 compatibility with V3.3 was a question to Preda, to address SELROC's previously posted concern about compatibility. A clear statement on compatibility from Preda, who wrote and tested the code, would settle it, in my opinion. Or are you saying, SELROC, that you've tested with v1.9 or 2.x checkpoint files and the exponent restarts from iteration 0 in V3.3? Up to now I had taken your statements about it as questions or doubts, not test results.
Last fiddled with by kriesel on 2018-07-13 at 14:08
#480
2²·5·7·67 Posts
Quote:
#481
61·97 Posts
Quote:
- Radeon Pro Vega 64 https://www.videocardbenchmark.net/g...ega+64&id=3879
- Radeon RX Vega 64 https://www.videocardbenchmark.net/g...ega+64&id=3808
#482
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
12475₈ Posts
Quote:
#483
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts
Quote:
Anyway, thanks for the clear summary just now. And in English, since my grasp of Italian is approximately zero.
So, now, with V3.3's additional fft lengths, in addition to a-d in http://www.mersenneforum.org/showpos...&postcount=465 (well, b and c still seem applicable), there's:
e) benchmark and compute whether it's quicker overall to finish an existing exponent in V1.9/2.0 or start over in V3.3, and
f) (depending on whether the various V3.x fft lengths are compatible and can be changed on the fly) benchmark in the various V3.x lengths, and start over in, or switch midstream to, the fastest suitable V3.3 fft length.
SELROC, would you assemble and post a similar table of timings vs. version and fft length for your RX580?
Last fiddled with by kriesel on 2018-07-13 at 14:56
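The "finish in the old version or start over in the new one" comparison in item e) is simple arithmetic once the per-iteration timings are known. A rough sketch in Python follows; the function names are invented for illustration, timings are assumed to be in ms/it, and the test is assumed to need about one iteration per unit of the exponent, which is only approximately right for LL and PRP tests.

```python
MS_PER_HOUR = 3_600_000


def hours_needed(iterations, ms_per_iter):
    """Wall-clock hours to run `iterations` iterations at `ms_per_iter` ms each."""
    return iterations * ms_per_iter / MS_PER_HOUR


def finish_or_restart(exponent, done, old_ms, new_ms):
    """Compare finishing in the old version with restarting from 0 in the new one.

    Assumes roughly `exponent` iterations in total (an approximation).
    Returns (choice, hours_to_finish_old, hours_to_restart_new).
    """
    finish_old = hours_needed(exponent - done, old_ms)
    restart_new = hours_needed(exponent, new_ms)
    if finish_old <= restart_new:
        choice = "finish in old version"
    else:
        choice = "restart in new version"
    return choice, finish_old, restart_new


def break_even_fraction(old_ms, new_ms):
    """Fraction of the test that must already be done for finishing to win."""
    return max(0.0, 1.0 - new_ms / old_ms)


# Hypothetical exponent and progress; timings are the 5M row quoted earlier
# in the thread (2.43 for V1.9/2, 2.5 for V3.3), assumed to be ms/it.
print(finish_or_restart(90_000_000, 30_000_000, 2.43, 2.5))
print(break_even_fraction(2.43, 2.5))
```

When the new version is no faster per iteration than the old one at the same fft length, the break-even fraction is 0, meaning restarting never pays off on speed alone; restarting only becomes attractive if the new version is substantially faster.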
#484
"Mihai Preda"
Apr 2015
3·457 Posts
Quote:
Second, the new version is much more flexible in terms of FFT sizes. The code is simpler, cleaner and easier to evolve. The old version could not do 100M-digits at all. So, in my personal opinion and without hard numbers, I would say that the new version is better architecturally, and not worse performance-wise.
Quote:
Every version has backwards compatibility ("can read") with a few past versions of the savefiles. It is possible to build a "chain" of versions that would move a savefile forward all the way, but probably not in a single step. My recommendation to the user: try it, and if the new version does not read the old savefile, finish the exponent with the old version and afterwards start a new exponent with the new version.
Last fiddled with by preda on 2018-07-13 at 15:18
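The "chain of versions" idea can be sketched as a shortest-path search. The Python below is purely illustrative: the compatibility map and the version labels `v1`..`v4` are invented, not taken from any real release, and it assumes each program version writes savefiles in its own format after reading an older one.

```python
from collections import deque

# HYPOTHETICAL map: program version -> savefile versions it can read.
# These labels are made up for illustration and do not describe real releases.
CAN_READ = {
    "v2": {"v1", "v2"},
    "v3": {"v2", "v3"},
    "v4": {"v3", "v4"},
}


def upgrade_chain(savefile_version, target_version, can_read=CAN_READ):
    """Return the shortest list of program versions to run, in order, so that a
    savefile ends up in `target_version` format, or None if no chain exists.

    Assumes each program rewrites the savefile in its own format.
    """
    queue = deque([(savefile_version, [])])
    seen = {savefile_version}
    while queue:
        current, chain = queue.popleft()
        if current == target_version:
            return chain
        for program, readable in can_read.items():
            if current in readable and program not in seen:
                seen.add(program)
                queue.append((program, chain + [program]))
    return None


print(upgrade_chain("v1", "v4"))  # chain of intermediate versions to run
print(upgrade_chain("v0", "v4"))  # None: nothing reads a v0 savefile
```

In practice the simpler advice above (try the new version, and fall back to finishing with the old one) avoids having to know the compatibility map at all.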