mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing > GpuOwl

Old 2019-05-29, 00:50   #1222
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,419 Posts
Windows 7 x64 build of gpuowl-v6.5-61-g5c0db85

I don't know what they all are, or which if any the end users shouldn't mess with, but I found these in the gpuowl.cl file:
Code:
OLD_ISBIG
ORIG_SQ
ORIG_X2 INLINE_X2 FMA_X2
NEWEST_FFT8 NEW_FFT8
NEWEST_FFT5 NEW_FFT5 OLD_FFT5
NEWEST_FFT10 NEW_FFT10 OLD_FFT10
ALT_RESTRICT
ORIG_PAIRSQ
ORIG_PAIRMUL
TEST_KERNEL
MIDDLE_MUL_LOOP
WIDTH
SMALL_HEIGHT
MIDDLE
NH
Attached Files
File Type: 7z gpuowl-win.7z (393.6 KB, 78 views)
Old 2019-05-30, 20:22   #1223
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1010100101011₂ Posts
gpuowl attempts on Intel IGP and CPU

Code:
gpuowl v6.5-61-g5c0db85

Command line options:

-dir <folder>      : specify work directory (containing worktodo.txt, results.txt, config.txt, gpuowl.log)
-user <name>       : specify the user name.
-cpu  <name>       : specify the hardware name.
-time              : display kernel profiling information.
-fft <size>        : specify FFT size, such as: 5000K, 4M, +2, -1.
-block <value>     : PRP GEC block size. Default 1000. Smaller block is slower but detects errors sooner.
-log <step>        : log every <step> iterations, default 20000. Multiple of 10000.
-carry long|short  : force carry type. Short carry may be faster, but requires high bits/word.
-B1                : P-1 B1 bound, default 500000
-B2                : P-1 B2 bound, default B1 * 30
-rB2               : ratio of B2 to B1. Default 30, used only if B2 is not explicitly set
-prp <exponent>    : run a single PRP test and exit, ignoring worktodo.txt
-pm1 <exponent>    : run a single P-1 test and exit, ignoring worktodo.txt
-results <file>    : name of results file, default 'results.txt'
-iters <N>         : run next PRP test for <N> iterations and exit. Multiple of 10000.
-use NEW_FFT8,OLD_FFT5,NEW_FFT10: comma separated list of defines, see the #if tests in gpuowl.cl (used for perf tuning).
-device <N>        : select a specific device:
 0 : Intel(R) UHD Graphics 630-24x1100-
 1 : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz-12x2200-
 2 : GeForce GTX 1050 Ti-6x1620-
Tried running gpuowl-win v6.5-c48d46f on OpenCL device 0, the UHD 630 IGP, on a laptop. It seemed to work (slowly, as any IGP is slow), but repeatedly produced EE (a failed error check) after the first 2000 iterations.
Code:
2019-05-30 13:17:51 config: -device 0 
2019-05-30 13:17:51 85469147 FFT 4608K: Width 256x4, Height 64x4, Middle 9; 18.11 bits/word
2019-05-30 13:17:51 using short carry kernels
2019-05-30 13:18:42 OpenCL compilation in 50608 ms, with "-DEXP=85469147u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-30 13:18:44 85469147.owl not found, starting from the beginning.
2019-05-30 13:25:50 85469147 EE     2000  0.00%; 95.53 ms/sq; ETA 94d 11:54; 91e7259a0ae0534b (check 96.17s)
2019-05-30 13:25:50 85469147.owl not found, starting from the beginning.
2019-05-30 13:32:39 85469147 EE     2000  0.00%; 156.09 ms/sq; ETA 154d 09:38; 91e7259a0ae0534b (check 96.44s)
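The ETA in these lines looks like it is simply the remaining iterations times the reported ms/sq. A minimal Python sketch of that arithmetic, assuming a PRP test needs roughly one squaring per bit of the exponent (the helper name `eta_days` is made up for illustration and is not part of gpuowl):

```python
def eta_days(exponent, done, ms_per_sq):
    """Estimate remaining run time in days, assuming ~one squaring per exponent bit."""
    remaining = exponent - done
    return remaining * ms_per_sq / 1000.0 / 86400.0


# Values taken from the two UHD 630 log lines above.
print(round(eta_days(85469147, 2000, 95.53), 1))
print(round(eta_days(85469147, 2000, 156.09), 1))
```

Both results land within a few minutes of the logged ETAs (94d 11:54 and 154d 09:38); the small difference is presumably rounding of the displayed ms/sq.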
Tried running gpuowl-win v6.5-c48d46f on OpenCL device 1, the i7-8750H CPU, on a laptop. It did not get far.
Code:
>gpuowl-win -device 1 -fft +1 -carry short
2019-05-30 15:03:16 gpuowl v6.5-c48d46f
2019-05-30 15:03:16 Note: no config.txt file found
2019-05-30 15:03:16 config: -device 1 -fft +1 -carry short
2019-05-30 15:03:16 85469147 FFT 4608K: Width 64x4, Height 256x4, Middle 9; 18.11 bits/word
2019-05-30 15:03:16 using short carry kernels
2019-05-30 15:03:18 OpenCL compilation error -11 (args -DEXP=85469147u -DWIDTH=256u -DSMALL_HEIGHT=1024u -DMIDDLE=9u  -I. -cl-fast-relaxed-math -cl-std=CL2.0)
2019-05-30 15:03:18 Compilation started
Compilation done
Linking started
Linking done
Device build started
Failed to build device program
Error: unimplemented function(s) used:
_Z18work_group_barrierj12memory_scope is undefined
CompilerException Failed to parse IR

2019-05-30 15:03:18 Exception 9gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:220 build
2019-05-30 15:03:18 Bye
Tried running gpuowl-win v6.5-61-g5c0db85 on the i7-8750H laptop CPU. Same issue as for the earlier version.
Code:
>gpuowl-win -device 1 -fft +1 -carry short -use ORIG_X2
2019-05-30 15:15:53 gpuowl v6.5-61-g5c0db85
2019-05-30 15:15:53 Note: no config.txt file found
2019-05-30 15:15:53 config: -device 1 -fft +1 -carry short -use ORIG_X2
2019-05-30 15:15:53 85469147 FFT 4608K: Width 64x4, Height 256x4, Middle 9; 18.11 bits/word
2019-05-30 15:15:53 using short carry kernels
2019-05-30 15:15:53 OpenCL args "-DEXP=85469147u -DWIDTH=256u -DSMALL_HEIGHT=1024u -DMIDDLE=9u -DFRAC=2089525580236878279ul -DWEIGHT_STEP=0xe.cab3fdd2379b8p-3 -DIWEIGHT_STEP=0x8.a747b4917f72p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DINVWEIGHT_LIMIT=0xe.38e38e38e38ep-29 -DORIG_X2=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-30 15:15:54 OpenCL compilation error -11 (args -DEXP=85469147u -DWIDTH=256u -DSMALL_HEIGHT=1024u -DMIDDLE=9u -DFRAC=2089525580236878279ul -DWEIGHT_STEP=0xe.cab3fdd2379b8p-3 -DIWEIGHT_STEP=0x8.a747b4917f72p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DINVWEIGHT_LIMIT=0xe.38e38e38e38ep-29 -DORIG_X2=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0)
2019-05-30 15:15:54 Compilation started
Compilation done
Linking started
Linking done
Device build started
Failed to build device program
Error: unimplemented function(s) used:
_Z18work_group_barrierj12memory_scope is undefined
CompilerException Failed to parse IR

2019-05-30 15:15:54 Exception 9gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:215 build
2019-05-30 15:15:54 Bye
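Judging from the log above, `-use ORIG_X2` on the command line shows up as `-DORIG_X2=1` in the OpenCL args. A small Python sketch of that mapping follows; it only mirrors what the log shows, and the function name `use_to_defines` is hypothetical, not gpuowl's actual code:

```python
def use_to_defines(use_arg):
    """Turn a comma-separated -use list into OpenCL -D compiler options."""
    names = [name.strip() for name in use_arg.split(",") if name.strip()]
    return " ".join(f"-D{name}=1" for name in names)


# Single define, as in the run above.
print(use_to_defines("ORIG_X2"))
# The example list from the -use help text.
print(use_to_defines("NEW_FFT8,OLD_FFT5,NEW_FFT10"))
```

Each name then becomes visible to the `#if` tests in gpuowl.cl that the help text mentions.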
Old 2019-05-31, 03:57   #1224
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,419 Posts
more on the uhd630 gpuowl attempts

gpuowl attempts on the i7-8750H's UHD 630 IGP (OpenCL device 0) were unsuccessful in various ways:
Code:
>gpuowl-win-c48d46f -device 0 -fft +0 -carry short
2019-05-30 13:17:51 Note: no config.txt file found
2019-05-30 13:17:51 config: -device 0 
2019-05-30 13:17:51 85469147 FFT 4608K: Width 256x4, Height 64x4, Middle 9; 18.11 bits/word
2019-05-30 13:17:51 using short carry kernels
2019-05-30 13:18:42 OpenCL compilation in 50608 ms, with "-DEXP=85469147u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-30 13:18:44 85469147.owl not found, starting from the beginning.
2019-05-30 13:25:50 85469147 EE     2000  0.00%; 95.53 ms/sq; ETA 94d 11:54; 91e7259a0ae0534b (check 96.17s)
2019-05-30 13:25:50 85469147.owl not found, starting from the beginning.
2019-05-30 13:32:39 85469147 EE     2000  0.00%; 156.09 ms/sq; ETA 154d 09:38; 91e7259a0ae0534b (check 96.44s)
(Then some successful iterations on the GTX 1050 Ti, device 2, then back to the IGP, device 0.)
Code:
>gpuowl-win-c48d46f -device 0 -fft +1 -carry short
2019-05-30 17:47:08 gpuowl v6.5-c48d46f
2019-05-30 17:47:08 Note: no config.txt file found
2019-05-30 17:47:08 config: -device 0 -fft +1 -carry short
2019-05-30 17:47:08 85469147 FFT 4608K: Width 64x4, Height 256x4, Middle 9; 18.11 bits/word
2019-05-30 17:47:08 using short carry kernels
2019-05-30 17:48:01 OpenCL compilation in 53016 ms, with "-DEXP=85469147u -DWIDTH=256u -DSMALL_HEIGHT=1024u -DMIDDLE=9u  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-30 17:48:03 85469147.owl loaded: k 223000, block 1000, res64 6dc0ba3dd68cf05d
2019-05-30 17:50:30 85469147 EE loaded: 223000, blockSize 1000, ee2866e4a4297374 (expected 6dc0ba3dd68cf05d)
2019-05-30 17:50:30 Exiting because "error on load"
2019-05-30 17:50:30 Bye

>gpuowl-win-c48d46f -device 0 -fft +3 -carry short
2019-05-30 17:52:14 gpuowl v6.5-c48d46f
2019-05-30 17:52:14 Note: no config.txt file found
2019-05-30 17:52:14 config: -device 0 -fft +3 -carry short
2019-05-30 17:52:14 85469147 FFT 4608K: Width 512x8, Height 8x8, Middle 9; 18.11 bits/word
2019-05-30 17:52:14 using short carry kernels
2019-05-30 17:52:55 OpenCL compilation in 40489 ms, with "-DEXP=85469147u -DWIDTH=4096u -DSMALL_HEIGHT=64u -DMIDDLE=9u  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-30 17:52:57 85469147.owl loaded: k 223000, block 1000, res64 6dc0ba3dd68cf05d
Abort was called at 74 line in file:
D:\qb\workspace\19992\src\vpg-compute-neo\runtime/command_stream/linear_stream.h

>gpuowl-win-c48d46f -device 0 -fft +2 -carry short
2019-05-30 17:54:32 gpuowl v6.5-c48d46f
2019-05-30 17:54:32 Note: no config.txt file found
2019-05-30 17:54:32 config: -device 0 -fft +2 -carry short
2019-05-30 17:54:32 85469147 FFT 4608K: Width 64x8, Height 64x8, Middle 9; 18.11 bits/word
2019-05-30 17:54:32 using short carry kernels
2019-05-30 17:56:02 OpenCL compilation in 88926 ms, with "-DEXP=85469147u -DWIDTH=512u -DSMALL_HEIGHT=512u -DMIDDLE=9u  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-30 17:56:03 85469147.owl loaded: k 223000, block 1000, res64 6dc0ba3dd68cf05d
(no progress indicated for 4 hours, no response to CTRL-C, igp is busy; terminated process in Task Manager)
>time
The current time is: 22:16:37.56

>gpuowl-win-c48d46f -device 0 -fft +0 -carry long
2019-05-30 22:26:15 gpuowl v6.5-c48d46f
2019-05-30 22:26:15 Note: no config.txt file found
2019-05-30 22:26:15 config: -device 0 -fft +0 -carry long
2019-05-30 22:26:15 85469147 FFT 4608K: Width 256x4, Height 64x4, Middle 9; 18.11 bits/word
2019-05-30 22:26:15 using long carry kernels
2019-05-30 22:27:06 OpenCL compilation in 50507 ms, with "-DEXP=85469147u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-30 22:27:08 85469147.owl loaded: k 223000, block 1000, res64 6dc0ba3dd68cf05d
2019-05-30 22:29:46 85469147 EE loaded: 223000, blockSize 1000, 5ba05a0a832d8141 (expected 6dc0ba3dd68cf05d)
2019-05-30 22:29:46 Exiting because "error on load"
2019-05-30 22:29:46 Bye
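The FFT sizes and bits/word figures in these logs can be reproduced from the -D values. The relation below (size = 2 x WIDTH x SMALL_HEIGHT x MIDDLE) is inferred from the numbers in the logs rather than taken from the gpuowl source, and the function names are made up for illustration:

```python
def fft_words(width, small_height, middle):
    """FFT length in words; the factor 2 is inferred from the logged sizes."""
    return 2 * width * small_height * middle


def bits_per_word(exponent, width, small_height, middle):
    """Average number of exponent bits stored per FFT word."""
    return exponent / fft_words(width, small_height, middle)


# -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u, logged as "FFT 4608K".
words = fft_words(1024, 256, 9)
print(words // 1024, "K")
print(round(bits_per_word(85469147, 1024, 256, 9), 2))
```

The other layouts tried here (256 x 1024, 512 x 512, 4096 x 64, all with Middle 9) give the same product, which is why every run reports 4608K and 18.11 bits/word.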
Old 2019-05-31, 05:06   #1225
SELROC
 

2×3²×223 Posts

It has been said earlier that gpuowl needs a discrete gpu. Your device 0 is an integrated gpu with shared memory.


https://www.notebookcheck.net/Intel-....257928.0.html
Old 2019-05-31, 12:20   #1226
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,419 Posts

Quote:
Originally Posted by SELROC View Post
It has been said earlier that gpuowl needs a discrete gpu. Your device 0 is an integrated gpu with shared memory.

https://www.notebookcheck.net/Intel-....257928.0.html
Using IGPs takes some memory bandwidth and TDP budget away from prime95/mprime on the CPU, whether it's TF or something else on the IGP. Sometimes it's a net gain though.
Some earlier IGPs lacked DP, so could run mfakto but not gpuowl. The UHD 630's OpenCL report indicates DP capability (as does the HD 620's). From GPU-Z's Advanced tab for OpenCL:
Code:
General
Platform Name    Intel(R) OpenCL
Platform Vendor    Intel(R) Corporation
Platform Profile    FULL_PROFILE
Platform Version    OpenCL 2.1 
Vendor    Intel(R) Corporation
Device Name    Intel(R) UHD Graphics 630
Version    OpenCL 2.1 NEO 
Driver Version    23.20.16.4973
C Version    OpenCL C 2.1 
IL Version    SPIR-V_1.0 
Profile    FULL_PROFILE
Global Memory Size    6497 MB
Clock Frequency    1100 MHz
Compute Units    24
Device Available    Yes
Compiler Available    Yes
Linker Available    Yes
Preferred Synchronization    User
CMD Queue Properties    Out of Order, Profiling
SVM Capabilities    Coarse, Fine, Atomics
DP Capability    Denorm, INF NAN, Round Nearest, Round Zero, Round INF, FMA
SP Capability    Denorm, INF NAN, Round Nearest, Round Zero, Round INF, FMA
Half FP Capability    Denorm, INF NAN, Round Nearest, Round Zero, Round INF, FMA
Address Bits    64
Preferred On-Device Queue    128 KB
Global Memory Cache    512 KB (RW Cache)
Global Memory Cacheline    0 KB
Preferred Global Atomic Alignment    0
Preferred Local Atomic Alignment    0
Preferred Platform Atomic Alignment    0
Local Memory    Local (64 KB)
Memory Alignment    1024 bits
Pitch Alignment    4 pixels
Built-in Kernels    block_motion_estimate_intel;block_advanced_motion_estimate_check_intel;block_advanced_motion_estimate_bidirectional_check_intel;
Little Endian    Yes
Error Correction    No
Execution Capability    Kernel
Unified Memory    Yes
Image Support    Yes

Limits
Max Device Events    1024
Max Device Queues    1
Max On-Device Queue    65536 KB
Preferred Max Variable Size    3406522368 Bytes
Max Memory Allocation    3248 MB
Max Constant Buffer    3326682 KB
Max Constant Args    8
Max Pipe Args    16
Max Pipe Reservations    1
Max Pipe Packet Size    1024 Bytes
Max Read Image Args    128
Max Write Image Args    128
Max Read-Write Image Args    0
Max Samplers    16
Max Work Item Dims    3
Max Write Image Args    128

Native Vectors
Native Vector Width (CHAR)    16
Native Vector Width (SHORT)    8
Native Vector Width (INT)    4
Native Vector Width (LONG)    1
Native Vector Width (FLOAT)    1
Native Vector Width (DOUBLE)    1
Native Vector Width (HALF)    8
Preferred Vector Width (CHAR)    16
Preferred Vector Width (SHORT)    8
Preferred Vector Width (INT)    4
Preferred Vector Width (LONG)    1
Preferred Vector Width (FLOAT)    1
Preferred Vector Width (DOUBLE)    1
Preferred Vector Width (HALF)    8

Extensions
cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_depth_images cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_media_block_io cl_intel_driver_diagnostics cl_intel_device_side_avc_motion_estimation cl_khr_priority_hints cl_khr_subgroups cl_khr_il_program cl_khr_fp64 cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_advanced_motion_estimation cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_intel_dx9_media_sharing cl_khr_dx9_media_sharing cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_intel_d3d11_nv12_media_sharing cl_intel_simultaneous_sharing
It had been a long time since I last tested gpuowl on an IGP. As I recall, it ran there back in the LL days. The UHD 630's TF performance is low (typically under 22 GHzD/day), as is true of all IGPs I've tried or seen benchmarks of. So not a priority.
Old 2019-06-01, 20:13   #1227
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,419 Posts
M61 transform & NVIDIA?

Back in gpuowl V1.9, there were four transform types: SP, DP, M31, and M61. M61 could go a bit higher on exponent than DP of the same length but was not nearly as fast on AMD, with its 1:16 DP:SP ratio.

Now in V6.5, gpuowl runs on OpenCL 1.2 or above on NVIDIA. Most NVIDIA GPUs have a lower DP:SP ratio than AMD does; specifically, GTX 10xx is 1:32.
If the M61 transform were available in gpuowl v6.x, it might be faster on NVIDIA than DP is.
See first attachment of https://www.mersenneforum.org/showpo...35&postcount=2, and https://www.mersenneforum.org/showpo...31&postcount=8
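For readers unfamiliar with the idea: an M61 transform works in integers modulo the Mersenne prime 2^61 - 1, so it needs no double-precision floating point at all. The basic building block is a modular multiply that reduces with shifts and adds, because 2^61 ≡ 1 (mod M61). The Python sketch below shows only that reduction step; it is an illustration under that assumption, not the code gpuowl V1.9 used, and `mulmod_m61` is a made-up name:

```python
M61 = (1 << 61) - 1  # the Mersenne prime 2^61 - 1


def mulmod_m61(a, b):
    """Multiply a*b modulo 2^61 - 1 for 0 <= a, b <= M61, using shifts and adds."""
    p = a * b                      # up to 122 bits
    r = (p & M61) + (p >> 61)      # fold high bits down: 2^61 == 1 (mod M61)
    r = (r & M61) + (r >> 61)      # second fold, result is at most M61 + 1
    if r >= M61:
        r -= M61                   # final conditional subtraction
    return r


a, b = 123456789012345678, 987654321098765432
print(mulmod_m61(a, b) == (a * b) % M61)
```

On a GPU the 122-bit product would have to be assembled from 32-bit or 64-bit integer multiplies, so whether this beats DP on a given card depends on its integer throughput as well as its DP:SP ratio.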
Old 2019-06-03, 19:55   #1228
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

152B₁₆ Posts

The latest makefile seems to get the strip right on Windows; it requires specifying the target as gpuowl-win.exe.
Code:
$ make gpuowl-win.exe
cat head.txt gpuowl.cl tail.txt > gpuowl-wrap.cpp
echo \"`git describe --long --dirty --always`\" > version.new
diff -q -N version.new version.inc >/dev/null || mv version.new version.inc
echo Version: `cat version.inc`
Version: "v6.5-75-g4902439-dirty"
g++ -MT Pm1Plan.o -MMD -MP -MF .d/Pm1Plan.Td -Wall -O2 -std=c++17   -c -o Pm1Plan.o Pm1Plan.cpp
g++ -MT GmpUtil.o -MMD -MP -MF .d/GmpUtil.Td -Wall -O2 -std=c++17   -c -o GmpUtil.o GmpUtil.cpp
g++ -MT Worktodo.o -MMD -MP -MF .d/Worktodo.Td -Wall -O2 -std=c++17   -c -o Worktodo.o Worktodo.cpp
g++ -MT common.o -MMD -MP -MF .d/common.Td -Wall -O2 -std=c++17   -c -o common.o common.cpp
g++ -MT main.o -MMD -MP -MF .d/main.Td -Wall -O2 -std=c++17   -c -o main.o main.cpp
g++ -MT Gpu.o -MMD -MP -MF .d/Gpu.Td -Wall -O2 -std=c++17   -c -o Gpu.o Gpu.cpp
g++ -MT clwrap.o -MMD -MP -MF .d/clwrap.Td -Wall -O2 -std=c++17   -c -o clwrap.o clwrap.cpp
g++ -MT Task.o -MMD -MP -MF .d/Task.Td -Wall -O2 -std=c++17   -c -o Task.o Task.cpp
g++ -MT checkpoint.o -MMD -MP -MF .d/checkpoint.Td -Wall -O2 -std=c++17   -c -o checkpoint.o checkpoint.cpp
g++ -MT timeutil.o -MMD -MP -MF .d/timeutil.Td -Wall -O2 -std=c++17   -c -o timeutil.o timeutil.cpp
g++ -MT Args.o -MMD -MP -MF .d/Args.Td -Wall -O2 -std=c++17   -c -o Args.o Args.cpp
g++ -MT state.o -MMD -MP -MF .d/state.Td -Wall -O2 -std=c++17   -c -o state.o state.cpp
g++ -MT Signal.o -MMD -MP -MF .d/Signal.Td -Wall -O2 -std=c++17   -c -o Signal.o Signal.cpp
g++ -MT FFTConfig.o -MMD -MP -MF .d/FFTConfig.Td -Wall -O2 -std=c++17   -c -o FFTConfig.o FFTConfig.cpp
g++ -MT clpp.o -MMD -MP -MF .d/clpp.Td -Wall -O2 -std=c++17   -c -o clpp.o clpp.cpp
g++ -MT gpuowl-wrap.o -MMD -MP -MF .d/gpuowl-wrap.Td -Wall -O2 -std=c++17   -c -o gpuowl-wrap.o gpuowl-wrap.cpp
g++ -o gpuowl-win.exe Pm1Plan.o GmpUtil.o Worktodo.o common.o main.o Gpu.o clwrap.o Task.o checkpoint.o timeutil.o Args.o state.o Signal.o FFTConfig.o clpp.o gpuowl-wrap.o -lstdc++fs -lOpenCL -lgmp -pthread -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L/c/Windows/System32 -L. -static
strip gpuowl-win.exe
What does it mean that it's labeled dirty? Perhaps that the conversion to u32 is not complete?

Code:
>gpuowl-win -prp 3321928097
2019-06-03 14:22:00 gpuowl v6.5-75-g4902439-dirty
2019-06-03 14:22:00 Exception St12out_of_range: stol
2019-06-03 14:22:00 Bye

>gpuowl-win -prp 2147483659
2019-06-03 14:28:16 gpuowl v6.5-75-g4902439-dirty
2019-06-03 14:28:17 Exception St12out_of_range: stol
2019-06-03 14:28:17 Bye

>gpuowl-win -prp 2147483647 -use FMA_X2
2019-06-03 14:29:52 gpuowl v6.5-75-g4902439-dirty
2019-06-03 14:29:52 Note: no config.txt file found
2019-06-03 14:29:52 config: -prp 2147483647 -use FMA_X2
2019-06-03 14:29:52 2147483647 FFT 147456K: Width 512x8, Height 256x8, Middle 9; 14.22 bits/word
2019-06-03 14:29:52 using long carry kernels
2019-06-03 14:30:00 OpenCL args "-DEXP=2147483647u -DWIDTH=4096u -DSMALL_HEIGHT=2048u -DMIDDLE=9u -DWEIGHT_STEP=0xd.b745787f2c4cp-3 -DIWEIGHT_STEP=0x9.550d2c9e837e8p-4 -DWEIGHT_BIGSTEP=0x8.b95c1e3ea8bd8p-3 -DIWEIGHT_BIGSTEP=0xe.ac0c6e7dd2438p-4 -DFMA_X2=1 -DFMA_X2=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-06-03 14:30:04 OpenCL compilation in 4704 ms
2019-06-03 14:30:28 2147483647.owl not found, starting from the beginning.
2019-06-03 14:42:03 2147483647 OK     2000  0.00%; 162.835 ms/sq; ETA 4047d 06:30; fb12c8169932aa03 (check 172.72s)
(Above was on an RX480.
2147483647 < 2^31 < 2147483659;
log10(2^3321928097 - 1) > 10^9)
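The two facts in that parenthesis can be checked with a few lines of Python. The `stol` out_of_range exception would be consistent with `long` being 32 bits on 64-bit Windows, so any exponent above 2^31 - 1 fails to parse; that reading is an assumption, not something the log states. The helper names below are made up for illustration:

```python
import math

INT32_MAX = 2**31 - 1


def fits_in_int32(n):
    """True if n can be stored in a signed 32-bit integer."""
    return -2**31 <= n <= INT32_MAX


def mersenne_digits(p):
    """Decimal digits of 2^p - 1 (same as for 2^p, which is never a power of 10)."""
    return math.floor(p * math.log10(2)) + 1


for p in (2147483647, 2147483659, 3321928097):
    print(p, fits_in_int32(p), mersenne_digits(p))
```

Only the first exponent fits in 32 bits, which matches the one run above that got past argument parsing, and the last one yields a number of more than 10^9 decimal digits.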

Last fiddled with by kriesel on 2019-06-03 at 20:29
Old 2019-06-03, 21:39   #1229
preda
 
 
"Mihai Preda"
Apr 2015

3×457 Posts

Quote:
Originally Posted by kriesel View Post
What does it mean that it's labeled dirty?
Dirty means that there are uncommitted local changes (edits) to some files. If the build is done from exactly the version that is checked out, then it's not dirty.

I tried to fix the stol(); please retry with a >2G exponent.
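For reference, the version string comes from `git describe --long --dirty --always` in the makefile output quoted earlier, whose usual shape is tag, commits since the tag, abbreviated hash prefixed with `g`, and an optional `-dirty` suffix. A small Python sketch that splits such a string (the function name `parse_describe` is hypothetical):

```python
import re

# tag-<commits since tag>-g<abbreviated hash>[-dirty]
DESCRIBE_RE = re.compile(
    r"(?P<tag>.+)-(?P<count>\d+)-g(?P<hash>[0-9a-f]+)(?P<dirty>-dirty)?"
)


def parse_describe(version):
    """Split a `git describe --long --dirty` string into its parts."""
    m = DESCRIBE_RE.fullmatch(version)
    if m is None:
        # With --always and no reachable tag, git prints only a hash.
        raise ValueError(f"not in tag-N-gHASH form: {version!r}")
    return {
        "tag": m.group("tag"),
        "commits_since_tag": int(m.group("count")),
        "hash": m.group("hash"),
        "dirty": m.group("dirty") is not None,
    }


print(parse_describe("v6.5-75-g4902439-dirty"))
```

So v6.5-75-g4902439-dirty reads as: 75 commits after tag v6.5, at commit 4902439, built from a working tree with local modifications.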
Old 2019-06-03, 23:44   #1230
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,419 Posts

Quote:
Originally Posted by preda View Post
I tried to fix the stol(), please re-try with a >2G exponent.
Looks good. The timings don't, though: eleven to 23.5 years for these on an RX480.
Code:
>gpuowl-win -prp 2147483659 -use FMA_X2
2019-06-03 17:40:09 gpuowl v6.5-76-g1ca08e2-dirty
2019-06-03 17:40:09 Note: no config.txt file found
2019-06-03 17:40:09 config: -prp 2147483659 -use FMA_X2
2019-06-03 17:40:09 2147483659 FFT 147456K: Width 512x8, Height 256x8, Middle 9; 14.22 bits/word
2019-06-03 17:40:09 using long carry kernels
2019-06-03 17:40:16 OpenCL args "-DEXP=2147483659u -DWIDTH=4096u -DSMALL_HEIGHT=2048u -DMIDDLE=9u -DWEIGHT_STEP=0xd.b7456bd211bf8p-3 -DIWEIGHT_STEP=0x9.550d353e7752p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DFMA_X2=1 -DFMA_X2=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-06-03 17:40:21 OpenCL compilation in 4679 ms
2019-06-03 17:40:47 2147483659.owl not found, starting from the beginning.
2019-06-03 17:52:25 2147483659 OK     2000  0.00%; 161.868 ms/sq; ETA 4023d 06:15; 25ac32a404e8574e (check 171.25s)
^CTerminate batch job (Y/N)? n

>gpuowl-win -prp 3321928097 -use ORIG_X2
2019-06-03 17:53:53 gpuowl v6.5-76-g1ca08e2-dirty
2019-06-03 17:53:53 Note: no config.txt file found
2019-06-03 17:53:53 config: -prp 3321928097 -use ORIG_X2
2019-06-03 17:53:53 3321928097 FFT 196608K: Width 512x8, Height 256x8, Middle 12; 16.50 bits/word
2019-06-03 17:53:53 using long carry kernels
2019-06-03 17:53:59 OpenCL args "-DEXP=3321928097u -DWIDTH=4096u -DSMALL_HEIGHT=2048u -DMIDDLE=12u -DWEIGHT_STEP=0xb.4feacf46035b8p-3 -DIWEIGHT_STEP=0xb.50b39ab42445p-4 -DWEIGHT_BIGSTEP=0xe.ac0c6e7dd2438p-3 -DIWEIGHT_BIGSTEP=0x8.b95c1e3ea8bd8p-4 -DORIG_X2=1 -DORIG_X2=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-06-03 17:54:03 OpenCL compilation in 4318 ms
2019-06-03 17:54:38 3321928097.owl not found, starting from the beginning.
2019-06-03 18:20:55 3321928097 OK     2000  0.00%; 222.996 ms/sq; ETA 8573d 19:14; 5388b104718177b6 (check 237.96s)
2019-06-03 18:24:37 Stopping, please wait..
2019-06-03 18:28:33 3321928097 OK     3000  0.00%; 221.702 ms/sq; ETA 8524d 01:06; faa54e1e75915eab (check 235.93s)
2019-06-03 18:28:37 Exiting because "stop requested"
2019-06-03 18:28:37 Bye
In the first case, 2000 iterations × 162 ms/sq + 171 s = 495 s, but the elapsed time was >690 s.
In the second case, 2000 iterations × 223 ms/sq + 238 s = 684 s, but the elapsed time was 18:20:55 - 17:54:38 = 1577 s.
GPU RAM usage was ~6 GB in the second case.
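The comparison above can be redone mechanically from the log timestamps. A short Python sketch, using only values copied from the two logs; the helper names are made up for illustration:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def elapsed_s(start, end):
    """Wall-clock seconds between two log timestamps."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()


def expected_s(iters, ms_per_sq, check_s):
    """Time accounted for by the logged squaring rate plus the logged check."""
    return iters * ms_per_sq / 1000.0 + check_s


# (exponent, "starting from the beginning" time, first OK time, iters, ms/sq, check s)
runs = [
    ("2147483659", "2019-06-03 17:40:47", "2019-06-03 17:52:25", 2000, 161.868, 171.25),
    ("3321928097", "2019-06-03 17:54:38", "2019-06-03 18:20:55", 2000, 222.996, 237.96),
]
for name, start, end, iters, ms, check in runs:
    wall = elapsed_s(start, end)
    accounted = expected_s(iters, ms, check)
    print(name, "elapsed", int(wall), "s, accounted", round(accounted), "s, gap", round(wall - accounted), "s")
```

That leaves roughly 200 s unexplained in the first run and nearly 900 s in the second, before the first OK line.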
Old 2019-06-04, 11:42   #1231
preda
 
 
"Mihai Preda"
Apr 2015

3·457 Posts

In a recent commit, the timing display is changed from ms/sq to us/sq ("micros") :)
Code:
2019-06-04 21:46:15 r7u 85504057 OK 78643000 91.97%;  794 us/sq; ETA 0d 01:31; 3dad4b579a2cd95c (check 0.97s)
2019-06-04 21:47:01 r7u 85504057    78700000 92.04%;  811 us/sq; ETA 0d 01:32; 13b0dc053fd74724

Last fiddled with by preda on 2019-06-04 at 11:47
Old 2019-06-04, 15:21   #1232
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,419 Posts
Kudos to the contributors

I read through the commit listings back to mid-January and saw that Preda had acknowledged there numerous contributions made by several individuals. A crude summary follows:
Code:
valeriob01  -w argument; readme.md work; description of cmd line arguments 
            & updates, display of parameters; primenet.py date & time;
            makefile fix
 
k3ack3r     fix some msys2 warnings; update makefile

chengsun    fix alignment violation causing OUT_OF_RESOURCES error on NVIDIA
            GPUs
 
sillygitter add -iters argument

gwoltman    allow making small test kernels; new X2 definition; fft8 cleanup +
            documentation; new sq macro; overhaul/comment fft5/fft10 macros;
            improved pairSq and pairMul; faster 6m fft using new fft12 middle;
            new 5.5m fft using new fft11 middle; increased precision of fft11 
            constants; inline X2; fft7 middle step; shorter multiply chains in
            middle
Thanks to you all!

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.