#1508 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts |
Several scenarios, different observed limits on exponent in gpuowl P-1, several differences, some matches

A) Windows 7 Pro, dual old Xeon 4-core processors, 24GB system ram, NVIDIA GTX1080Ti with 11GB gpu ram, gpuowl v6.7-4-g278407a, -maxAlloc 10240
Observed maximum exponent for completing two stages with GPU72 bounds, at least 514M but less than 520M, testing 517M next
From the gpuowl.log,
2019-11-10 01:45:55 520000009 5580000 99.94%; 26842 us/sq; ETA 0d 00:01; 1baeff7353d9f091
2019-11-10 01:47:23 Not enough GPU memory, will skip stage2. Please wait for stage1 GCD

B) Colab, and so Ubuntu linux on VM, 2 cores Xeon, KVM hypervisor, ~12.7GB system ram, NVIDIA Tesla K80 with 12GB gpu ram, gpuowl: Fan Ming's build for Colab, -maxAlloc 10240
https://www.mersenneforum.org/showpo...&postcount=379
Observed maximum exponent for completing two stages with GPU72 bounds, at least 564M, upper bound TBD, testing 665M now

C) different Windows 7 Pro system, dual old Xeon 6-core processors, 12GB system ram, AMD RX480 with 8GB gpu ram, Gpuowl V6.6-5-667954b, -maxAlloc not an available option in that version
Observed maximum exponent for completing two stages with GPU72 bounds, at least 500M (gpu-z indicated 3.7GB used in stage 2), upper bound TBD, testing 530M now

D) AMD RX550 with 4GB gpu ram, Gpuowl v6.7-4-g278407a
Observed maximum exponent for completing two stages with GPU72 bounds, at least 150M, upper bound 224M, testing 180M now

E) AMD RX550 with 2GB gpu ram, gpuowl-v6.10-9-g54cba1d, -maxAlloc 1900
Observed maximum exponent for completing two stages with GPU72 bounds, at least 24M, upper bound TBD

F) NVIDIA GTX1060 3GB, gpuowl v6.9, -maxAlloc 3000
Observed maximum exponent for completing two stages with GPU72 bounds, <24M, not useful for GIMPS wavefront

G) NVIDIA GTX1080 with 8GB gpu ram
Observed maximum exponent for completing two stages with GPU72 bounds, at least 499M, upper bound TBD

Any ideas why the difference in observed limit, especially for the same -maxAlloc? I'd really appreciate Preda's thoughts on what determines that. There's clearly more to it than only gpu ram.
If maxAlloc controlled max exponent, the limits on scenarios A and B would be very similar or identical, and they're not.

Re unknown linux system ram amount, how do I ask the Colab VM's linux that? After a web search, tried
!lshw -short
!sudo lshw -short
!sudo dmidecode -t memory
and got
bash: lshw: command not found
sudo: lshw: command not found
sudo: dmidecode: command not found
finally got it with
!cat /proc/meminfo
since top is not part of the gpuowl scripts
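For anyone repeating the Colab RAM check above, here is a minimal Python sketch of the same /proc/meminfo lookup. The function names `parse_meminfo` and `total_ram_gib` are made up for illustration and are not part of gpuowl or Colab. It assumes the usual Linux layout of one `Name: value kB` entry per line.

```python
def parse_meminfo(text):
    """Map /proc/meminfo field names to their integer values (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def total_ram_gib(path="/proc/meminfo"):
    """Read MemTotal (reported in kB) from the given file and convert it to GiB."""
    with open(path) as f:
        return parse_meminfo(f.read())["MemTotal"] / (1024 ** 2)
```

In a Colab cell, `print(total_ram_gib())` should then report the VM's system RAM without needing lshw or dmidecode.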
#1509 | |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts |
Quote:
Last fiddled with by kriesel on 2019-12-08 at 13:45 |
#1510 |
P90 years forever!
Aug 2002
Yeehaw, FL
7543₁₀ Posts |
A new version of gpuOwl is ready with some fairly decent speed and power improvements. The key change was merging the transpose and middle steps into one kernel. This reduces memory bandwidth requirements by 33%.
On my test Radeon VII system running a 5M FFT, the one-instance timing was 950us and the two-instance timing was 1720us. The timing with the new code is 839us, along with a reduction of 17 watts.
For Linux, download the source from preda's github gpuowl page. Here is a link to the Windows executable: https://www.dropbox.com/s/w9qnxd02ql...l-win.exe?dl=0
To activate the new code, add "-use MERGED_MIDDLE" to the command line. Do not run 2 instances; it will reduce throughput.
The new code is still undergoing some minor tuning and cleanup, especially with regard to architectures other than Radeon VII. It needs QA on P-1 tasks.
Bonus: I added the -cleanup command line argument to delete PRP save files at the end of the run for composite results. |
#1511 | |
Sep 2002
Database er0rr
2×3²×11×19 Posts |
Quote:
Last fiddled with by paulunderwood on 2019-12-08 at 20:34 |
#1512 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5437₁₀ Posts |
Or, comparing 1720/2 = 860us per iteration with two instances against 839us with the new code, 2.5% higher throughput. There's also around a 6% single-instance speedup for p=89796247, fft 5M, gtx1080, Win7 Pro, gpuowl v6.11-71-g7e02b07, and perhaps a little more to come. (That commit took different -use input)
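The arithmetic behind those percentages can be checked with a short Python sketch. It uses only the Radeon VII timings quoted in this thread (950us single instance, 1720us with two instances, 839us with the new code). The helper name `speedup_percent` is made up for illustration.

```python
def speedup_percent(old_us, new_us):
    """Percent throughput gain when time per iteration drops from old_us to new_us."""
    return (old_us / new_us - 1.0) * 100.0


# Timings quoted in the thread for a Radeon VII at 5M FFT.
single_old_us = 950.0        # one instance, old code
dual_old_us = 1720.0 / 2     # two instances, effective time per iteration
merged_us = 839.0            # one instance with -use MERGED_MIDDLE

print(f"vs. one old instance:  {speedup_percent(single_old_us, merged_us):.1f}%")
print(f"vs. two old instances: {speedup_percent(dual_old_us, merged_us):.1f}%")
```

The second figure is the one that matters for total throughput, since two old instances were the previous best configuration on that card.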
Last fiddled with by kriesel on 2019-12-08 at 23:20 |
#1513 | |
"Eric"
Jan 2018
USA
2²×53 Posts |
Quote:
UPDATE 1: Significant speedup with my Vega 64, going from 2100us/it to 1870us/it. Not bad at all.
UPDATE 2: I'm getting an error on my Nvidia GPUs. The following is from Windows 10 1909 with an Nvidia Titan V running driver version 440.97, using George's executable:
Code:
OpenCL compilation error -11 (args -DEXP=94205039u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0x8.2daa303b0ff18p-3 -DIWEIGHT_STEP=0xf.a6a9923a8d87p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DMERGED_MIDDLE=1 -DORIG_X2=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0)
2019-12-08 14:43:55 <kernel>:1117:11: error: must specify '#pragma OPENCL EXTENSION cl_khr_int64_base_atomics: enable' before using this atomic operation
atom_add(&localSum, sum);
^
<kernel>:1120:39: error: must specify '#pragma OPENCL EXTENSION cl_khr_int64_base_atomics: enable' before using this atomic operation
if (get_local_id(0) == 0) { atom_add(&out[0], localSum); }
^
2019-12-08 14:43:55 Exception gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:226 build
2019-12-08 14:43:55 Bye
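When sifting through long build logs like the one above, a small Python sketch can pull out which OpenCL extension pragmas the compiler is asking for. The regular expression is written against the exact wording of the error lines in this log and may not match messages from other drivers. The function name `missing_extensions` is made up for illustration. It only reports what the compiler requested and does not patch the kernel source.

```python
import re

# Matches the NVIDIA compiler message seen in the log above, e.g.
# "must specify '#pragma OPENCL EXTENSION cl_khr_int64_base_atomics: enable' before ..."
PRAGMA_RE = re.compile(
    r"must specify '#pragma OPENCL EXTENSION\s+(\w+)\s*:\s*enable'"
)


def missing_extensions(log_text):
    """Return the sorted, de-duplicated extension names the compiler asked to enable."""
    return sorted(set(PRAGMA_RE.findall(log_text)))
```

Run against the log above, this would list `cl_khr_int64_base_atomics` once, even though the message appears for both `atom_add` calls.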
Last fiddled with by xx005fs on 2019-12-08 at 22:48 |
#1514 | |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts |
Quote:
Code:
C:\Users\ken\Documents\gwtest>gpuowl-win -time -iters 10000 -use NO_ASM
2019-12-08 17:07:54 gpuowl v6.11-64-g01a9778-dirty
2019-12-08 17:07:54 Note: no config.txt file found
2019-12-08 17:07:54 config: -time -iters 10000 -use NO_ASM
2019-12-08 17:07:54 89796247 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.13 bits/word
2019-12-08 17:07:56 OpenCL args "-DEXP=89796247u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0xe.a6216bdf4fcdp-3 -DIWEIGHT_STEP=0x8.bce25ec56bc2p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DNO_ASM=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-12-08 17:07:56 OpenCL compilation error -11 (args -DEXP=89796247u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0xe.a6216bdf4fcdp-3 -DIWEIGHT_STEP=0x8.bce25ec56bc2p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DNO_ASM=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0)
2019-12-08 17:07:56 <kernel>:1117:11: error: must specify '#pragma OPENCL EXTENSION cl_khr_int64_base_atomics: enable' before using this atomic operation
atom_add(&localSum, sum);
^
<kernel>:1120:39: error: must specify '#pragma OPENCL EXTENSION cl_khr_int64_base_atomics: enable' before using this atomic operation
if (get_local_id(0) == 0) { atom_add(&out[0], localSum); }
^
2019-12-08 17:07:56 Exception gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:226 build
2019-12-08 17:07:56 Bye
C:\Users\ken\Documents\gwtest>gpuowl-win -time -iters 10000 -use NO_ASM,MERGED_MIDDLE
2019-12-08 17:07:56 gpuowl v6.11-64-g01a9778-dirty
2019-12-08 17:07:57 Note: no config.txt file found
2019-12-08 17:07:57 config: -time -iters 10000 -use NO_ASM,MERGED_MIDDLE
2019-12-08 17:07:57 89796247 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.13 bits/word
2019-12-08 17:07:58 OpenCL args "-DEXP=89796247u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0xe.a6216bdf4fcdp-3 -DIWEIGHT_STEP=0x8.bce25ec56bc2p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DNO_ASM=1 -DMERGED_MIDDLE=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-12-08 17:07:58 OpenCL compilation error -11 (args -DEXP=89796247u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0xe.a6216bdf4fcdp-3 -DIWEIGHT_STEP=0x8.bce25ec56bc2p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DNO_ASM=1 -DMERGED_MIDDLE=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0)
2019-12-08 17:07:58 <kernel>:1117:11: error: must specify '#pragma OPENCL EXTENSION cl_khr_int64_base_atomics: enable' before using this atomic operation
atom_add(&localSum, sum);
^
<kernel>:1120:39: error: must specify '#pragma OPENCL EXTENSION cl_khr_int64_base_atomics: enable' before using this atomic operation
if (get_local_id(0) == 0) { atom_add(&out[0], localSum); }
^
2019-12-08 17:07:59 Exception gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:226 build
2019-12-08 17:07:59 Bye
#1515 | |
"Eric"
Jan 2018
USA
2²·53 Posts |
Quote:
#1516 |
P90 years forever!
Aug 2002
Yeehaw, FL
19·397 Posts |
#1517 | |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts |
Quote:
But in general, for relatively recent gpuowl versions, on Windows, do steps 1 through 4 of kracker's instructions at https://www.mersenneforum.org/showpo...&postcount=356
(The AMD APP SDK 3.0 link has gone dead. See for example https://github.com/fireice-uk/xmr-stak/issues/1511 or https://en.wikipedia.org/wiki/AMD_APP_SDK)
Install git on msys2
This may not be the whole story for setting up for compiles.

In an msys2 cmd prompt box from here on:
# to refresh a git working folder:
git pull https://github.com/preda/gpuowl
# or to new folder that has not been a git folder before:
git clone https://github.com/preda/gpuowl
cd gpuowl
make gpuowl-win.exe

To use the executable, switch to an NT command prompt box. It won't run in the msys2 context. Msys2 is a linux-like environment. The executable is a Windows executable. It's a sort of cross-compile.
I usually run gpuowl-win.exe -h immediately, both to save its output and to verify the newly compiled program is working well enough to identify gpus on the build box.
Since it's OpenCL based, it's the same build whether used on AMD or NVIDIA gpus.
Last fiddled with by kriesel on 2019-12-09 at 00:29 |
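The fetch-and-build steps above could be wrapped in a small Python sketch. This is only one possible arrangement. It assumes git and make are already on the PATH of the msys2 shell it runs from. The names `build_commands` and `run_all` are made up for illustration. It uses `git -C` and `make -C` instead of the `cd gpuowl` step in the post, which is a substitution of mine and not how the post does it.

```python
import os
import subprocess

REPO_URL = "https://github.com/preda/gpuowl"


def build_commands(workdir="gpuowl", exists=None):
    """Return the fetch and build commands as argument lists, without running them.

    If exists is None, the presence of workdir/.git decides between pull and clone.
    """
    if exists is None:
        exists = os.path.isdir(os.path.join(workdir, ".git"))
    if exists:
        fetch = ["git", "-C", workdir, "pull", REPO_URL]   # refresh a working folder
    else:
        fetch = ["git", "clone", REPO_URL, workdir]        # first-time checkout
    build = ["make", "-C", workdir, "gpuowl-win.exe"]
    return [fetch, build]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_all(build_commands())` from the folder that holds (or will hold) the gpuowl checkout would fetch and build. The resulting gpuowl-win.exe still has to be run from an NT command prompt as described above.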
#1518 |
"Mr. Meeseeks"
Jan 2012
California, USA
23×271 Posts |