mersenneforum.org  

2019-12-07, 22:07   #1508
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁶×5×17 Posts
What variables determine max P-1 exponent for two stages

Several scenarios, different observed limits on exponent in gpuowl P-1, several differences, some matches

A) Windows 7 Pro, dual old Xeon 4-core processors, 24GB system ram, NVIDIA GTX1080Ti with 11gb gpu ram, gpuowl v6.7-4-g278407a, -maxAlloc 10240
Observed maximum exponent for completing two stages with GPU72 bounds, at least 514M but less than 520M, testing 517M next
From the gpuowl.log,
2019-11-10 01:45:55 520000009 5580000 99.94%; 26842 us/sq; ETA 0d 00:01; 1baeff7353d9f091
2019-11-10 01:47:23 Not enough GPU memory, will skip stage2. Please wait for stage1 GCD

B) Colab, and so Ubuntu linux on VM, 2 cores Xeon, KVM hypervisor, ~12.7GB system ram, NVIDIA Tesla K80 with 12gb gpu ram, gpuowl: Fan Ming's build for Colab, -maxAlloc 10240 https://www.mersenneforum.org/showpo...&postcount=379
Observed maximum exponent for completing two stages with GPU72 bounds, at least 564M, upper bound TBD, testing 665M now

C) different Windows 7 Pro system, dual old Xeon 6-core processors, 12GB system ram, AMD RX480 with 8GB gpu ram, Gpuowl V6.6-5-667954b, -maxAlloc not an available option in that version
Observed maximum exponent for completing two stages with GPU72 bounds, at least 500M (gpu-z indicated 3.7GB used in stage 2), upper bound TBD, testing 530M now

D) AMD RX550 with 4GB gpu ram, Gpuowl v6.7-4-g278407a,
Observed maximum exponent for completing two stages with GPU72 bounds, at least 150M, upper bound 224M, testing 180M now

E) AMD RX550 with 2GB gpu ram, gpuowl-v6.10-9-g54cba1d -maxAlloc 1900
Observed maximum exponent for completing two stages with GPU72 bounds, at least 24M, upper bound TBD

F) NVIDIA GTX1060 3GB, gpuowl v6.9, -maxAlloc 3000
Observed maximum exponent for completing two stages with GPU72 bounds, <24M, not useful for GIMPS wavefront

G) NVIDIA GTX1080 with 8GB gpu ram,
Observed maximum exponent for completing two stages with GPU72 bounds, at least 499M, upper bound TBD

Any ideas why the difference in observed limit, especially for the same -maxAlloc? I'd really appreciate Preda's thoughts on what determines that. There's clearly more to it than only gpu ram. If maxAlloc controlled max exponent, the limits on scenarios A and B would be very similar or identical, and they're not.
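As a rough way to reason about the memory side of the question, here is a minimal sketch. It is not gpuowl's actual allocation logic. It ASSUMES that each P-1 stage 2 buffer holds one 8-byte double per FFT word and that the -maxAlloc value is in MiB; the function names are hypothetical, and the FFT lengths 28672K and 36864K are only sample inputs.

```python
def buffer_mib(fft_k):
    """Size in MiB of one buffer, ASSUMING 8 bytes (one double) per FFT word.

    fft_k is the FFT length in K words (for example 28672 for a 28672K FFT).
    """
    return fft_k * 1024 * 8 / (1024 * 1024)


def buffers_that_fit(max_alloc_mib, fft_k):
    """How many such buffers fit under a memory cap given in MiB (assumed unit)."""
    return int(max_alloc_mib // buffer_mib(fft_k))


# Sample inputs only; how many buffers stage 2 really needs is not stated in this thread.
for fft_k in (28672, 36864):
    print(fft_k, buffer_mib(fft_k), buffers_that_fit(10240, fft_k))
```

Under these assumptions the buffer count depends on the FFT length as well as on -maxAlloc, so a jump to the next FFT length reduces the number of buffers that fit under the same cap.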

Re unknown linux system ram amount, how do I ask the Colab VM's linux that?
After a web search, tried
!lshw -short
!sudo lshw -short
!sudo dmidecode -t memory
and got
bash: lshw: command not found
sudo: lshw: command not found
sudo: dmidecode: command not found

Finally got it with !cat /proc/meminfo, since top is not part of the gpuowl scripts.
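For anyone wanting the number from inside a notebook cell rather than by eye, a small sketch of parsing /proc/meminfo follows. The helper names are hypothetical; the only thing relied on is the standard "Name:   value kB" layout of that file on Linux.

```python
def parse_meminfo(text):
    """Return a dict mapping /proc/meminfo field names to their integer values.

    Most fields are in kB; a few (such as HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def total_ram_gib(path="/proc/meminfo"):
    """Total system RAM in GiB, read from MemTotal (reported in kB)."""
    with open(path) as f:
        return parse_meminfo(f.read())["MemTotal"] / (1024 ** 2)
```

On a Linux VM, calling total_ram_gib() should give the system RAM figure without needing lshw, dmidecode, or top.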
2019-12-08, 13:43   #1509
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁶·5·17 Posts
Dept of corrections

Quote:
Originally Posted by kriesel
Several scenarios, different observed limits on exponent in gpuowl P-1, several differences, some matches

A) Windows 7 Pro, dual old Xeon 4-core processors, 24GB system ram, NVIDIA GTX1080Ti with 11gb gpu ram, gpuowl v6.7-4-g278407a, -maxAlloc 10240
Observed maximum exponent for completing two stages with GPU72 bounds, at least 510M but less than 514M, testing 511M next.
The FFT length transition from 28672K to 36864K is sizable, and the help output gives 28672K's max exponent as 510.47M.
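To put the FFT length limit in numbers, here is a small sketch that computes average bits per FFT word as exponent divided by FFT length in words. The function name is hypothetical. The formula is checked against the "17.13 bits/word" that gpuowl logs for 89796247 at FFT 5120K elsewhere in this thread.

```python
def bits_per_word(exponent, fft_k):
    """Average bits per FFT word for a given exponent at an FFT of fft_k * 1024 words."""
    return exponent / (fft_k * 1024)


# Matches the gpuowl log line "89796247 FFT 5120K ... 17.13 bits/word".
print(round(bits_per_word(89796247, 5120), 2))

# The 510.47M limit quoted for the 28672K FFT corresponds to about 17.39 bits/word.
print(round(bits_per_word(510_470_000, 28672), 2))

# The next FFT length is larger by this factor, and so is any per-word buffer.
print(36864 / 28672)
```

So an exponent just above 510.47M moves to an FFT roughly 1.29 times larger, which would fit an observed stage 2 limit that sits at the FFT transition rather than at a round exponent value.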

Last fiddled with by kriesel on 2019-12-08 at 13:45
2019-12-08, 19:51   #1510
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
7543₁₀ Posts
New gpuOwl version

A new version of gpuOwl is ready with some fairly decent speed and power improvements. The key change was merging the transpose and middle steps into one kernel. This reduces memory bandwidth requirements by 33%.

On my test Radeon VII system running a 5M FFT, the one-instance timing was 950us and the two-instance timing was 1720us. Timing with the new code is 839us, along with a reduction of 17 watts.

For Linux, download the source from preda's github gpuowl page. Here is a link to the Windows executable: https://www.dropbox.com/s/w9qnxd02ql...l-win.exe?dl=0

To activate the new code, add "-use MERGED_MIDDLE" to the command line. Do not run 2 instances; it will reduce throughput.

The new code is still undergoing some minor tuning and cleanup, especially with regard to architectures other than Radeon VII. It needs QA on P-1 tasks.

Bonus: I added the -cleanup command line argument to delete PRP save files at the end of the run for composite results.
2019-12-08, 20:23   #1511
paulunderwood
Sep 2002
Database er0rr
2²×941 Posts

Quote:
Originally Posted by Prime95
A new version of gpuOwl is ready with some fairly decent speed and power improvements. The key change was merging the transpose and middle steps into one kernel. This reduces memory bandwidth requirements by 33%.
A very nice speed up from 990us to 860us

Last fiddled with by paulunderwood on 2019-12-08 at 20:34
2019-12-08, 22:32   #1512
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁶×5×17 Posts

Quote:
Originally Posted by paulunderwood
A very nice speed up from 990us to 860us
Or 1720/2=860, to 839, 2.5% higher throughput. There's also around 6% single-instance speedup for p=89796247, fft 5M, gtx1080, Win7 Pro, gpuowl v6.11-71-g7e02b07, and perhaps a little more to come. (That commit took different -use input)
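The arithmetic behind those percentages, as a short sketch (the function names are hypothetical; the timings are the ones Prime95 posted for the Radeon VII at 5M FFT):

```python
def effective_us_per_iter(us_per_iter, instances=1):
    """Per-iteration time per unit of work when several instances run side by side."""
    return us_per_iter / instances


def throughput_gain_percent(old_us, new_us):
    """Percent increase in iterations per second when time per iteration drops."""
    return (old_us / new_us - 1.0) * 100.0


two_instance = effective_us_per_iter(1720, instances=2)   # 860.0 us effective
print(round(throughput_gain_percent(two_instance, 839), 1))  # vs. old two-instance setup
print(round(throughput_gain_percent(950, 839), 1))           # vs. old single instance
```

The first figure is the 2.5% quoted above; the second shows the larger gain for anyone who was running a single instance before.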

Last fiddled with by kriesel on 2019-12-08 at 23:20
2019-12-08, 22:37   #1513
xx005fs
"Eric"
Jan 2018
USA
2²·53 Posts

Quote:
Originally Posted by Prime95
A new version of gpuOwl is ready with some fairly decent speed and power improvements. The key change was merging the transpose and middle steps into one kernel. This reduces memory bandwidth requirements by 33%.
Awesome! I will test on my Titan V which is severely bounded by memory bandwidth and I'll also experiment with the P100s on Colab with this version when I have time.

UPDATE 1: Significant speed up with my Vega 64, going from 2100us/it to 1870us/it. Not bad at all.

UPDATE 2: Getting the following error on my Nvidia GPUs. This is on Windows 10 1909 with an Nvidia Titan V, driver version 440.97, using George's executable:
Code:
OpenCL compilation error -11 (args -DEXP=94205039u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0x8.2daa303b0ff18p-3 -DIWEIGHT_STEP=0xf.a6a9923a8d87p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DMERGED_MIDDLE=1 -DORIG_X2=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0)
2019-12-08 14:43:55 <kernel>:1117:11: error: must specify '#pragma OPENCL EXTENSION cl_khr_int64_base_atomics: enable' before using this atomic operation
  atom_add(&localSum, sum);
          ^
<kernel>:1120:39: error: must specify '#pragma OPENCL EXTENSION cl_khr_int64_base_atomics: enable' before using this atomic operation
  if (get_local_id(0) == 0) { atom_add(&out[0], localSum); }
                                      ^

2019-12-08 14:43:55 Exception gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:226 build
2019-12-08 14:43:55 Bye

Last fiddled with by xx005fs on 2019-12-08 at 22:48
2019-12-08, 23:17   #1514
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5440₁₀ Posts

Quote:
Originally Posted by Prime95
A new version of gpuOwl is ready with some fairly decent speed and power improvements. ...

For Linux, download the source from preda's github gpuowl page. Here is a link to the Windows executable: https://www.dropbox.com/s/w9qnxd02ql...l-win.exe?dl=0
Code:
C:\Users\ken\Documents\gwtest>gpuowl-win -time -iters 10000 -use NO_ASM
2019-12-08 17:07:54 gpuowl v6.11-64-g01a9778-dirty
2019-12-08 17:07:54 Note: no config.txt file found
2019-12-08 17:07:54 config: -time -iters 10000 -use NO_ASM
2019-12-08 17:07:54 89796247 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.13 bits/word
2019-12-08 17:07:56 OpenCL args "-DEXP=89796247u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0xe.a6216bdf4fcdp-3 -DIWEIGHT_STEP=0x8.bce25ec56bc2p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DNO_ASM=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-12-08 17:07:56 OpenCL compilation error -11 (args -DEXP=89796247u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0xe.a6216bdf4fcdp-3 -DIWEIGHT_STEP=0x8.bce25ec56bc2p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DNO_ASM=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0)
2019-12-08 17:07:56 <kernel>:1117:11: error: must specify '#pragma OPENCL EXTENSION cl_khr_int64_base_atomics: enable' before using this atomic operation
  atom_add(&localSum, sum);
          ^
<kernel>:1120:39: error: must specify '#pragma OPENCL EXTENSION cl_khr_int64_base_atomics: enable' before using this atomic operation
  if (get_local_id(0) == 0) { atom_add(&out[0], localSum); }
                                      ^

2019-12-08 17:07:56 Exception gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:226 build
2019-12-08 17:07:56 Bye

C:\Users\ken\Documents\gwtest>gpuowl-win -time -iters 10000 -use NO_ASM,MERGED_MIDDLE
2019-12-08 17:07:56 gpuowl v6.11-64-g01a9778-dirty
2019-12-08 17:07:57 Note: no config.txt file found
2019-12-08 17:07:57 config: -time -iters 10000 -use NO_ASM,MERGED_MIDDLE
2019-12-08 17:07:57 89796247 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.13 bits/word
2019-12-08 17:07:58 OpenCL args "-DEXP=89796247u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0xe.a6216bdf4fcdp-3 -DIWEIGHT_STEP=0x8.bce25ec56bc2p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DNO_ASM=1 -DMERGED_MIDDLE=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-12-08 17:07:58 OpenCL compilation error -11 (args -DEXP=89796247u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0xe.a6216bdf4fcdp-3 -DIWEIGHT_STEP=0x8.bce25ec56bc2p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DNO_ASM=1 -DMERGED_MIDDLE=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0)
2019-12-08 17:07:58 <kernel>:1117:11: error: must specify '#pragma OPENCL EXTENSION cl_khr_int64_base_atomics: enable' before using this atomic operation
  atom_add(&localSum, sum);
          ^
<kernel>:1120:39: error: must specify '#pragma OPENCL EXTENSION cl_khr_int64_base_atomics: enable' before using this atomic operation
  if (get_local_id(0) == 0) { atom_add(&out[0], localSum); }
                                      ^

2019-12-08 17:07:59 Exception gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:226 build
2019-12-08 17:07:59 Bye
Preda had made a pragma fix for his latest commit, 6.11-71-g7e02b07, which apparently didn't make it into prime95's Windows build yet.
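If it helps when comparing builds, here is a small sketch that scans a saved gpuowl log for this kind of compiler message and lists which OpenCL extensions the driver asked to have enabled. It is only a log-reading helper; the function name and regular expression are assumptions made for illustration, based on the error text pasted above, and it says nothing about how Preda's fix is implemented.

```python
import re

# Matches compiler messages of the form seen in the log above:
#   must specify '#pragma OPENCL EXTENSION <name>: enable' before using ...
PRAGMA_ERROR = re.compile(
    r"must specify '#pragma OPENCL EXTENSION (\w+)\s*:\s*enable'"
)


def missing_extensions(log_text):
    """Return the sorted, de-duplicated extension names the compiler asked for."""
    return sorted(set(PRAGMA_ERROR.findall(log_text)))


sample = (
    "<kernel>:1117:11: error: must specify '#pragma OPENCL EXTENSION "
    "cl_khr_int64_base_atomics: enable' before using this atomic operation\n"
    "<kernel>:1120:39: error: must specify '#pragma OPENCL EXTENSION "
    "cl_khr_int64_base_atomics: enable' before using this atomic operation\n"
)
print(missing_extensions(sample))
```

An empty list from a newer build's log would suggest the pragma problem is gone on that driver.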
2019-12-08, 23:28   #1515
xx005fs
"Eric"
Jan 2018
USA
2²×53 Posts

Quote:
Originally Posted by kriesel
Preda had made a pragma fix for his latest commit, 6.11-71-g7e02b07, which apparently didn't make it into prime95's Windows build yet.
I see. Any build instruction for windows then? It would be great if I can figure out how to build on windows myself :D
2019-12-08, 23:42   #1516
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
7543₁₀ Posts

Quote:
Originally Posted by kriesel
Preda had made a pragma fix for his latest commit, 6.11-71-g7e02b07, which apparently didn't make it into prime95's Windows build yet.
I'm waiting for one last merge before making another Windows executable.
2019-12-09, 00:19   #1517
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁶×5×17 Posts

Quote:
Originally Posted by xx005fs
I see. Any build instruction for windows then? It would be great if I can figure out how to build on windows myself :D
Probably best to let Preda and Prime95 get back into sync first.

But in general, for relatively recent gpuowl versions, on Windows,
do steps 1 through 4 of kracker's instructions at https://www.mersenneforum.org/showpo...&postcount=356
(The AMD APP SDK 3.0 link has gone dead. See for example https://github.com/fireice-uk/xmr-stak/issues/1511 or https://en.wikipedia.org/wiki/AMD_APP_SDK)

Install git on msys2
This may not be the whole story for setting up for compiles.

In an msys2 cmd prompt box from here on:
# to refresh a git working folder:
git pull https://github.com/preda/gpuowl

#or to new folder that has not been a git folder before:
git clone https://github.com/preda/gpuowl

cd gpuowl
make gpuowl-win.exe

To use the executable, switch to an NT command prompt box. It won't run in the msys2 context.
Msys2 is a linux like environment. The executable is a Windows executable. It's a sort of cross-compile.

I usually run gpuowl-win.exe -h immediately, both to save the help output and to verify that the newly compiled program works well enough to identify the gpus on the build box. Since it's OpenCL based, it's the same build whether used on AMD or NVIDIA gpus.

Last fiddled with by kriesel on 2019-12-09 at 00:29
2019-12-09, 00:26   #1518
kracker
"Mr. Meeseeks"
Jan 2012
California, USA
878₁₆ Posts

Quote:
Originally Posted by kriesel
To use the executable, switch to an NT command prompt box. It won't run in the msys2 context.
It'll run, you just have to tell it to look for it in the "current" folder like "./gpuowl-win.exe" for example.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.