mersenneforum.org  

Old 2018-07-13, 15:14   #485
preda
 
 
"Mihai Preda"
Apr 2015

3·457 Posts

Quote:
Originally Posted by kriesel View Post
f) benchmark in the various V3.x lengths, and start over in, or switch midstream to, the fastest suitable V3.3 fft length.

Because there are very big jumps between the supported FFT sizes (4-5-8-10-16-20), and because these particular NPOTs (non-power-of-two sizes) perform quite well, no "performance inversion" is expected at all -- that is, no situation where a larger FFT would be faster than a smaller one. So the fastest FFT is the smallest FFT that can handle the exponent. Of course anybody can experiment, e.g. using -fft +/-1 to verify this.

(the situation is different in CUDA, where the FFT sizes are "compact" (small jumps) and some sizes are particularly inefficient, so sometimes it makes sense to go to a bit larger FFT than strictly needed).
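
The rule "the fastest FFT is the smallest FFT that can handle the exponent" can be sketched in a few lines of Python. This is only an illustration, not gpuOwl's actual selection code. It assumes the 4-5-8-10-16-20 series means multiples of 2^20 words, and `max_bits_per_word` is a hypothetical cutoff the caller has to supply, since the real limit depends on the program version and FFT size.

```python
# Illustrative sketch only; not taken from the gpuOwl sources.
M = 1 << 20  # assumption: "4M" means 4 * 2**20 words

# The FFT sizes named in the post (4-5-8-10-16-20), in words, ascending.
FFT_SIZES = [4 * M, 5 * M, 8 * M, 10 * M, 16 * M, 20 * M]


def bits_per_word(exponent: int, fft_words: int) -> float:
    """Average number of exponent bits carried by each FFT word."""
    return exponent / fft_words


def smallest_fft(exponent: int, max_bits_per_word: float) -> int:
    """Return the smallest listed FFT size that stays within the cutoff.

    max_bits_per_word is a hypothetical threshold chosen by the caller.
    """
    for size in FFT_SIZES:
        if bits_per_word(exponent, size) <= max_bits_per_word:
            return size
    raise ValueError("exponent too large for the listed FFT sizes")


if __name__ == "__main__":
    # Assumed cutoff of 18 bits/word, purely for demonstration.
    size = smallest_fft(85_000_000, max_bits_per_word=18.0)
    print(f"{size // M}M")
```

With the assumed cutoff of 18 bits/word, an 85M exponent skips 4M (about 20.3 bits/word) and lands on 5M (about 16.2 bits/word).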

Last fiddled with by preda on 2018-07-13 at 15:14
Old 2018-07-13, 19:05   #486
SELROC
 

739 Posts

Quote:
Originally Posted by kriesel View Post
Preda's air cooled Vega 64, as indicated in his prior posts.
Asking about V1.9/2 compatibility to V3.3 was a question to Preda, to address SELROC's previously posted concern about compatibility. A clear statement on compatibility from Preda, who wrote and tested the code, would settle it, in my opinion.

Or are you saying, SELROC, that you've tested with v1.9 or 2.x checkpoint files and the exponent restarts from iteration 0 in V3.3? I've thought your statements about it up to now were questions or doubts, not test results.
Quote:
Originally Posted by kriesel View Post
That's sufficiently different from what I read and recalled, that I went back through this thread looking for what I apparently missed. I did not find such test results stated as such, in pages 22-44 of this thread. Perhaps it's in another thread? Or something.

Anyway, thanks for the clear summary just now.
And in English, since my grasp of Italian is approximately zero.

So, now, with V3.3's additional fft lengths, in addition to a-d in http://www.mersenneforum.org/showpos...&postcount=465 (well, b&c seem still applicable)
there's e) benchmark and compute whether it's quicker overall to finish an existing exponent in V1.9/2.0 or start over in V3.3
and (depending on whether the various V3.x fft lengths are compatible and can be changed on the fly)
f) benchmark in the various V3.x lengths, and start over in, or switch midstream to, the fastest suitable V3.3 fft length.

SELROC, would you assemble and post a similar timings table vs. version and fft length for your RX580?

This is what I think: as long as you select the same FFT size, v3.3 is faster than previous versions.
The 4M size is too small for an 85M exponent: I tested it and it failed, with bits/word > 20.
The 5M size is the fastest for 85M-86M exponents.
Old 2018-07-13, 20:20   #487
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,437 Posts
Default

In V1.9, -fft M61 -size 4M can handle exponents up to ~83.87M (the limit is somewhere below 83871433).
On an RX550, 4M M61 is ~7.4-9.2% slower than V2.0's 5000K DP; on an RX480, 4M M61 is about 6.3% faster than V2.0's 5000K DP (Windows 7 x64, same GPU, driver, etc.).

Code:
gpuOwL v2.0- GPU Mersenne primality checker
Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
Note: using long carry and fused tail kernels
OpenCL compilation in 1598 ms, with " -DEXP=83871259u  -I. -cl-fast-relaxed-math -cl-kernel-arg-info "
PRP-3: FFT 5000K (625 * 4096 * 2) of 83871259 (16.38 bits/word) [2018-07-11 09:20:32 Central Daylight Time]
OK 75500000 / 83871259 [90.02%], 5.25 ms/it [5.18, 5.81], check 4.76s; ETA 0d 12:12; 68406df4ababed55 [00:45:22]
OK 76000000 / 83871259 [90.61%], 5.25 ms/it [5.18, 5.72], check 4.80s; ETA 0d 11:28; 26f19477440580e1 [01:29:10]
OK 76500000 / 83871259 [91.21%], 5.25 ms/it [5.19, 5.82], check 4.79s; ETA 0d 10:45; e9bf0d6048a79a9b [02:13:00]

Code:
gpuOwL v1.9- GPU Mersenne primality checker
Radeon (TM) RX 480 Graphics 36 @28:0.0, Ellesmere 1266MHz

OpenCL compilation in 9860 ms, with "-I. -cl-fast-relaxed-math -cl-std=CL2.0  -DEXP=83870041u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFGT_61=1 -DLOG_ROOT2=49u "
Warning: high word size of 20.00 bits may result in errors
Note: using long carry kernels
PRP-3: FFT 4M (1024 * 2048 * 2) of 83870041 (20.00 bits/word) [2018-07-13 13:41:07 Central Daylight Time]
OK  3710000 / 83870041 [ 4.42%], 4.92 ms/it [4.87, 5.34] CV 2.1%, check 4.09s; ETA 4d 13:33; 1683dd3b253ee628 [13:44:44]
OK  3720000 / 83870041 [ 4.44%], 4.93 ms/it [4.88, 5.35] CV 2.2%, check 3.89s; ETA 4d 13:41; d286de5a16debe4d [13:45:38]
OK  3740000 / 83870041 [ 4.46%], 4.92 ms/it [4.87, 5.36] CV 2.1%, check 4.08s; ETA 4d 13:35; b84105343fdda8a3 [13:47:20]
OK  3760000 / 83870041 [ 4.48%], 4.93 ms/it [4.87, 5.36] CV 2.3%, check 3.88s; ETA 4d 13:42; 9c0b1d4ddcf1bd2c [13:49:03]
OK  3780000 / 83870041 [ 4.51%], 4.93 ms/it [4.88, 5.29] CV 2.1%, check 4.03s; ETA 4d 13:46; 44b788f4b55511a2 [13:50:45]
OK  3800000 / 83870041 [ 4.53%], 4.93 ms/it [4.88, 5.28] CV 2.0%, check 3.90s; ETA 4d 13:37; 6b319d3fa95c5f7a [13:52:28]
OK  3850000 / 83870041 [ 4.59%], 4.95 ms/it [4.89, 5.81] CV 2.8%, check 3.91s; ETA 4d 13:56; 9fbf9129fb4516a8 [13:56:39]
OK  3900000 / 83870041 [ 4.65%], 4.99 ms/it [4.88, 6.78] CV 5.4%, check 3.96s; ETA 4d 14:55; 08250aa4bd6bf9c1 [14:00:53]
OK  3950000 / 83870041 [ 4.71%], 4.97 ms/it [4.88, 5.98] CV 3.3%, check 3.98s; ETA 4d 14:14; b6958fdc5d218140 [14:05:05]
OK  4000000 / 83870041 [ 4.77%], 4.93 ms/it [4.88, 5.39] CV 2.2%, check 3.88s; ETA 4d 13:27; 079471495f27e70b [14:09:15]
OK  4050000 / 83870041 [ 4.83%], 4.96 ms/it [4.87, 6.11] CV 3.8%, check 3.90s; ETA 4d 14:03; 8b6870aa4b408e7f [14:13:28]
OK  4100000 / 83870041 [ 4.89%], 4.98 ms/it [4.88, 5.53] CV 3.2%, check 3.90s; ETA 4d 14:16; 4cb590ea14c87340 [14:17:40]
OK  4150000 / 83870041 [ 4.95%], 4.93 ms/it [4.88, 5.38] CV 2.2%, check 4.09s; ETA 4d 13:12; ba6740911ef7be07 [14:21:51]
(see also http://www.mersenneforum.org/showpos...&postcount=370)
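
The bits/word figures and the roughly 6.3% difference can be rechecked from the two logs above. A small Python sketch follows; the inputs are numbers copied from the logs, the helper names are made up for this example, and "faster" is taken to mean less time per iteration relative to the slower run, which is the reading that reproduces the quoted figure.

```python
def bits_per_word(exponent: int, fft_words: int) -> float:
    """Exponent bits divided by the number of FFT words."""
    return exponent / fft_words


def percent_faster(slow_ms: float, fast_ms: float) -> float:
    """Time saved per iteration by the faster run, as a percentage of the slower one."""
    return (slow_ms - fast_ms) / slow_ms * 100.0


# V2.0 log: FFT 5000K (625 * 4096 * 2) of 83871259
v20_words = 625 * 4096 * 2
# V1.9 log: FFT 4M (1024 * 2048 * 2) of 83870041
v19_words = 1024 * 2048 * 2

print(f"{bits_per_word(83871259, v20_words):.2f}")  # log reports 16.38
print(f"{bits_per_word(83870041, v19_words):.2f}")  # log reports 20.00
print(f"{percent_faster(5.25, 4.92):.1f}%")         # 5.25 ms/it vs 4.92 ms/it
```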
Old 2018-07-13, 21:42   #488
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,437 Posts

Quote:
Originally Posted by kriesel View Post
SELROC, what gpu model(s) are you wanting to display stats for?

gpuOwL runs on Intel and AMD but not currently NVIDIA.
Here are some things to try on for gpu monitoring related to gpuOwL on linux:
http://www.rkblog.rk.edu.pl/w/p/moni...e-under-linux/

https://support.amd.com/en-us/kb-art...emMonitor.aspx is Windows 7 specific

Never mind about the AMD System Monitor utility. It is V1.0.0.9 from 2012, and it reliably produced only an error message box on my AMD GPU system in Win 7 x64.
Old 2018-07-14, 06:59   #489
SELROC
 

1111101110010₂ Posts

Quote:
Originally Posted by SELROC View Post
This is what I think: as long as you select the same FFT size, v3.3 is faster than previous versions.
The 4M size is too small for an 85M exponent: I tested it and it failed, with bits/word > 20.
The 5M size is the fastest for 85M-86M exponents.

Started a 150M exponent: gpuowl selected an 8M FFT, timing is 6-7 ms/it, ETA 11d.
Old 2018-07-14, 08:44   #490
preda
 
 
"Mihai Preda"
Apr 2015

2533₈ Posts
GPU memory usage

I made a little theoretical estimation of the amount of GPU memory used by openowl.

When working with an FFT of size N (i.e. N words), the total amount of GPU memory is [a bit more than] 8*8*N bytes.

For example when doing a 20M FFT, this comes to about 1.25GB,
and for a 5M FFT about 320MB.
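
The estimate is easy to reproduce. A minimal Python sketch, assuming that "5M" and "20M" mean multiples of 2^20 words and that the MB/GB figures are binary units; the function name is made up for this example, and the result is only the 8*8*N lower bound, not a measured value.

```python
MIB = 1 << 20  # bytes per MiB; also used for "M" words (assumption: 5M = 5 * 2**20)
GIB = 1 << 30


def gpu_memory_bytes(fft_words: int) -> int:
    """Lower-bound estimate from the post: 8 * 8 * N bytes for an N-word FFT."""
    return 8 * 8 * fft_words


for size_m in (5, 20):
    est = gpu_memory_bytes(size_m * MIB)
    print(f"{size_m}M FFT: about {est / MIB:.0f} MiB ({est / GIB:.2f} GiB)")
```

Under those assumptions this gives 320 MiB for 5M and 1280 MiB (1.25 GiB) for 20M, matching the figures above.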

Last fiddled with by preda on 2018-07-14 at 08:44
Old 2018-07-14, 09:29   #491
SELROC
 

2·11·421 Posts

Quote:
Originally Posted by preda View Post
I made a little theoretical estimation of the amount of GPU memory used by openowl.

When working with an FFT of size N (i.e. N words), the total amount of GPU memory is [a bit more than] 8*8*N bytes.

For example when doing a 20M FFT, this comes to about 1.25GB,
and for a 5M FFT about 320MB.

Hi Mihai, I still have to find a GPU monitoring tool for Debian. Right now I am only monitoring the GPU temperatures.
lm-sensors is so slow that it is almost unusable for adequate monitoring, and anyway it does not show GPU RAM.
Old 2018-07-14, 11:44   #492
SELROC
 

23×3×19 Posts

Hardware: Asus Radeon RX580 8G (Ellesmere)

Quote:
Originally Posted by SELROC View Post
Started a 150M exponent: gpuowl selected an 8M FFT, timing is 6-7 ms/it, ETA 11d.

Started yet another exponent, at 300M this time: FFT 20M, timing 20-21 ms/it, ETA 69d.
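
The ETAs follow directly from the iteration time, since the logs earlier in the thread count iterations up to the exponent itself. A rough Python check, using the round exponent figures quoted here (the exact exponents were not posted) and assuming a constant time per iteration; the function name is made up for this example.

```python
def eta_days(remaining_iterations: int, ms_per_iteration: float) -> float:
    """Estimated remaining run time in days at a constant iteration time."""
    seconds = remaining_iterations * ms_per_iteration / 1000.0
    return seconds / 86_400


# Round figures from the posts; the exact exponents were not given.
print(f"{eta_days(150_000_000, 6.5):.1f} days")   # 150M exponent at 6-7 ms/it
print(f"{eta_days(300_000_000, 20.0):.1f} days")  # 300M exponent at 20-21 ms/it
```

This gives about 11.3 and 69.4 days, in line with the reported ETAs of 11d and 69d.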
Old 2018-07-14, 11:49   #493
preda
 
 
"Mihai Preda"
Apr 2015

3·457 Posts

Quote:
Originally Posted by SELROC View Post
Hardware: Asus Radeon RX580 8G (Ellesmere)
Started yet another exponent, at 300M this time: FFT 20M, timing 20-21 ms/it, ETA 69d.

On 20M it may be worth doing slightly higher exponents (332M), which reach into the "100M digits" domain. You can get such exponents from the "manual assignments" page, under "first time 100M digits PRP".
Old 2018-07-14, 11:52   #494
preda
 
 
"Mihai Preda"
Apr 2015

3×457 Posts

Quote:
Originally Posted by SELROC View Post
Hi Mihai, I still have to find a GPU monitoring tool for Debian. Right now I am only monitoring temperatures of the GPU.
lm-sensors is so slow that it is almost unusable for adequate monitoring, and anyway it does not show GPU RAM.

I don't have a good solution myself. If you use ROCm, it may be worth submitting a feature request to rocm-smi. I think some information about allocated GPU RAM can be gleaned from clinfo.
That's why my memory info is "theoretical", not reported from the GPU.
Old 2018-07-14, 12:08   #495
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,437 Posts
V3.3 openowl build fail on Windows; help?

The build fails in msys2/mingw64 (there does not seem to be a make available there); see the attachment for the warnings/errors.
Attachment: openowl-builderror.png (59.3 KB)

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.