-   -   gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

kriesel 2019-05-29 00:50

Windows 7 x64 build of gpuowl-v6.5-61-g5c0db85
 
1 Attachment(s)
I don't know what they all are, or which if any the end users shouldn't mess with, but I found these in the gpuowl.cl file:
[CODE]OLD_ISBIG
ORIG_SQ
ORIG_X2 INLINE_X2 FMA_X2
NEWEST_FFT8 NEW_FFT8
NEWEST_FFT5 NEW_FFT5 OLD_FFT5
NEWEST_FFT10 NEW_FFT10 OLD_FFT10
ALT_RESTRICT
ORIG_PAIRSQ
ORIG_PAIRMUL
TEST_KERNEL
MIDDLE_MUL_LOOP
WIDTH
SMALL_HEIGHT
MIDDLE
NH

[/CODE]

kriesel 2019-05-30 20:22

gpuowl attempts on Intel IGP and CPU
 
[CODE]gpuowl v6.5-61-g5c0db85

Command line options:

-dir <folder> : specify work directory (containing worktodo.txt, results.txt, config.txt, gpuowl.log)
-user <name> : specify the user name.
-cpu <name> : specify the hardware name.
-time : display kernel profiling information.
-fft <size> : specify FFT size, such as: 5000K, 4M, +2, -1.
-block <value> : PRP GEC block size. Default 1000. Smaller block is slower but detects errors sooner.
-log <step> : log every <step> iterations, default 20000. Multiple of 10000.
-carry long|short : force carry type. Short carry may be faster, but requires high bits/word.
-B1 : P-1 B1 bound, default 500000
-B2 : P-1 B2 bound, default B1 * 30
-rB2 : ratio of B2 to B1. Default 30, used only if B2 is not explicitly set
-prp <exponent> : run a single PRP test and exit, ignoring worktodo.txt
-pm1 <exponent> : run a single P-1 test and exit, ignoring worktodo.txt
-results <file> : name of results file, default 'results.txt'
-iters <N> : run next PRP test for <N> iterations and exit. Multiple of 10000.
-use NEW_FFT8,OLD_FFT5,NEW_FFT10: comma separated list of defines, see the #if tests in gpuowl.cl (used for perf tuning).
-device <N> : select a specific device:
0 : Intel(R) UHD Graphics 630-24x1100-
1 : Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz-12x2200-
2 : GeForce GTX 1050 Ti-6x1620-[/CODE]Tried running gpuowl-win v6.5-c48d46f on OpenCL device 0, the UHD630, on a laptop. It seemed to work (slowly, as any IGP is slow), but repeatedly produced EE after the first 2000 iterations.
[CODE]2019-05-30 13:17:51 config: -device 0
2019-05-30 13:17:51 85469147 FFT 4608K: Width 256x4, Height 64x4, Middle 9; 18.11 bits/word
2019-05-30 13:17:51 using short carry kernels
2019-05-30 13:18:42 OpenCL compilation in 50608 ms, with "-DEXP=85469147u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-30 13:18:44 85469147.owl not found, starting from the beginning.
2019-05-30 13:25:50 85469147 EE 2000 0.00%; 95.53 ms/sq; ETA 94d 11:54; 91e7259a0ae0534b (check 96.17s)
2019-05-30 13:25:50 85469147.owl not found, starting from the beginning.
2019-05-30 13:32:39 85469147 EE 2000 0.00%; 156.09 ms/sq; ETA 154d 09:38; 91e7259a0ae0534b (check 96.44s)
[/CODE]Tried running gpuowl-win v6.5-c48d46f on OpenCL device 1, the i7-8750H CPU, on the laptop. It did not get far.[CODE]>gpuowl-win -device 1 -fft +1 -carry short
2019-05-30 15:03:16 gpuowl v6.5-c48d46f
2019-05-30 15:03:16 Note: no config.txt file found
2019-05-30 15:03:16 config: -device 1 -fft +1 -carry short
2019-05-30 15:03:16 85469147 FFT 4608K: Width 64x4, Height 256x4, Middle 9; 18.11 bits/word
2019-05-30 15:03:16 using short carry kernels
2019-05-30 15:03:18 OpenCL compilation error -11 (args -DEXP=85469147u -DWIDTH=256u -DSMALL_HEIGHT=1024u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0)
2019-05-30 15:03:18 Compilation started
Compilation done
Linking started
Linking done
Device build started
Failed to build device program
Error: unimplemented function(s) used:
_Z18work_group_barrierj12memory_scope is undefined
CompilerException Failed to parse IR

2019-05-30 15:03:18 Exception 9gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:220 build
2019-05-30 15:03:18 Bye[/CODE]Tried running gpuowl-win v6.5-61-g5c0db85 on the i7-8750H laptop CPU. Same issue as with the earlier version.
[CODE]>gpuowl-win -device 1 -fft +1 -carry short -use ORIG_X2
2019-05-30 15:15:53 gpuowl v6.5-61-g5c0db85
2019-05-30 15:15:53 Note: no config.txt file found
2019-05-30 15:15:53 config: -device 1 -fft +1 -carry short -use ORIG_X2
2019-05-30 15:15:53 85469147 FFT 4608K: Width 64x4, Height 256x4, Middle 9; 18.11 bits/word
2019-05-30 15:15:53 using short carry kernels
2019-05-30 15:15:53 OpenCL args "-DEXP=85469147u -DWIDTH=256u -DSMALL_HEIGHT=1024u -DMIDDLE=9u -DFRAC=2089525580236878279ul -DWEIGHT_STEP=0xe.cab3fdd2379b8p-3 -DIWEIGHT_STEP=0x8.a747b4917f72p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DINVWEIGHT_LIMIT=0xe.38e38e38e38ep-29 -DORIG_X2=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-30 15:15:54 OpenCL compilation error -11 (args -DEXP=85469147u -DWIDTH=256u -DSMALL_HEIGHT=1024u -DMIDDLE=9u -DFRAC=2089525580236878279ul -DWEIGHT_STEP=0xe.cab3fdd2379b8p-3 -DIWEIGHT_STEP=0x8.a747b4917f72p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DINVWEIGHT_LIMIT=0xe.38e38e38e38ep-29 -DORIG_X2=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0)
2019-05-30 15:15:54 Compilation started
Compilation done
Linking started
Linking done
Device build started
Failed to build device program
Error: unimplemented function(s) used:
_Z18work_group_barrierj12memory_scope is undefined
CompilerException Failed to parse IR

2019-05-30 15:15:54 Exception 9gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:215 build
2019-05-30 15:15:54 Bye[/CODE]
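A note on the FFT lines in these logs: judging only from the numbers printed, the FFT size appears to be WIDTH x SMALL_HEIGHT x MIDDLE x 2 words, and bits/word is the exponent divided by that word count. The sketch below checks that reading against the logged values. The function names are made up for illustration, and the formula is inferred from the log output, not taken from the gpuowl source.

```python
def fft_words(width: int, small_height: int, middle: int) -> int:
    """Word count inferred from the -DWIDTH/-DSMALL_HEIGHT/-DMIDDLE values (assumption)."""
    return width * small_height * middle * 2


def bits_per_word(exponent: int, words: int) -> float:
    """Average number of bits of the Mersenne exponent stored per FFT word."""
    return exponent / words


if __name__ == "__main__":
    # Values from the log: -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u, exponent 85469147
    words = fft_words(1024, 256, 9)
    print(f"FFT {words // 1024}K, {bits_per_word(85469147, words):.2f} bits/word")
```

The same arithmetic reproduces the 4608K and 18.11 bits/word figures for both the 256x4 / 64x4 and the 64x4 / 256x4 layouts, since only the product of the dimensions matters here.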

kriesel 2019-05-31 03:57

more on the uhd630 gpuowl attempts
 
The gpuowl attempts on the i7-8750H's UHD630 IGP (OpenCL device 0) were unsuccessful in various ways:
[CODE]>gpuowl-win-c48d46f -device 0 -fft +0 -carry short
2019-05-30 13:17:51 Note: no config.txt file found
2019-05-30 13:17:51 config: -device 0
2019-05-30 13:17:51 85469147 FFT 4608K: Width 256x4, Height 64x4, Middle 9; 18.11 bits/word
2019-05-30 13:17:51 using short carry kernels
2019-05-30 13:18:42 OpenCL compilation in 50608 ms, with "-DEXP=85469147u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-30 13:18:44 85469147.owl not found, starting from the beginning.
2019-05-30 13:25:50 85469147 EE 2000 0.00%; 95.53 ms/sq; ETA 94d 11:54; 91e7259a0ae0534b (check 96.17s)
2019-05-30 13:25:50 85469147.owl not found, starting from the beginning.
2019-05-30 13:32:39 85469147 EE 2000 0.00%; 156.09 ms/sq; ETA 154d 09:38; 91e7259a0ae0534b (check 96.44s)
[/CODE](Then some successful iterations on the GTX 1050 Ti, device 2, then a return to the IGP, device 0.)
[CODE]
>gpuowl-win-c48d46f -device 0 -fft +1 -carry short
2019-05-30 17:47:08 gpuowl v6.5-c48d46f
2019-05-30 17:47:08 Note: no config.txt file found
2019-05-30 17:47:08 config: -device 0 -fft +1 -carry short
2019-05-30 17:47:08 85469147 FFT 4608K: Width 64x4, Height 256x4, Middle 9; 18.11 bits/word
2019-05-30 17:47:08 using short carry kernels
2019-05-30 17:48:01 OpenCL compilation in 53016 ms, with "-DEXP=85469147u -DWIDTH=256u -DSMALL_HEIGHT=1024u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-30 17:48:03 85469147.owl loaded: k 223000, block 1000, res64 6dc0ba3dd68cf05d
2019-05-30 17:50:30 85469147 EE loaded: 223000, blockSize 1000, ee2866e4a4297374 (expected 6dc0ba3dd68cf05d)
2019-05-30 17:50:30 Exiting because "error on load"
2019-05-30 17:50:30 Bye

>gpuowl-win-c48d46f -device 0 -fft +3 -carry short
2019-05-30 17:52:14 gpuowl v6.5-c48d46f
2019-05-30 17:52:14 Note: no config.txt file found
2019-05-30 17:52:14 config: -device 0 -fft +3 -carry short
2019-05-30 17:52:14 85469147 FFT 4608K: Width 512x8, Height 8x8, Middle 9; 18.11 bits/word
2019-05-30 17:52:14 using short carry kernels
2019-05-30 17:52:55 OpenCL compilation in 40489 ms, with "-DEXP=85469147u -DWIDTH=4096u -DSMALL_HEIGHT=64u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-30 17:52:57 85469147.owl loaded: k 223000, block 1000, res64 6dc0ba3dd68cf05d
Abort was called at 74 line in file:
D:\qb\workspace\19992\src\vpg-compute-neo\runtime/command_stream/linear_stream.h

>gpuowl-win-c48d46f -device 0 -fft +2 -carry short
2019-05-30 17:54:32 gpuowl v6.5-c48d46f
2019-05-30 17:54:32 Note: no config.txt file found
2019-05-30 17:54:32 config: -device 0 -fft +2 -carry short
2019-05-30 17:54:32 85469147 FFT 4608K: Width 64x8, Height 64x8, Middle 9; 18.11 bits/word
2019-05-30 17:54:32 using short carry kernels
2019-05-30 17:56:02 OpenCL compilation in 88926 ms, with "-DEXP=85469147u -DWIDTH=512u -DSMALL_HEIGHT=512u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-30 17:56:03 85469147.owl loaded: k 223000, block 1000, res64 6dc0ba3dd68cf05d
(no progress indicated for 4 hours, no response to CTRL-C, igp is busy; terminated process in Task Manager)
>time
The current time is: 22:16:37.56

>gpuowl-win-c48d46f -device 0 -fft +0 -carry long
2019-05-30 22:26:15 gpuowl v6.5-c48d46f
2019-05-30 22:26:15 Note: no config.txt file found
2019-05-30 22:26:15 config: -device 0 -fft +0 -carry long
2019-05-30 22:26:15 85469147 FFT 4608K: Width 256x4, Height 64x4, Middle 9; 18.11 bits/word
2019-05-30 22:26:15 using long carry kernels
2019-05-30 22:27:06 OpenCL compilation in 50507 ms, with "-DEXP=85469147u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-30 22:27:08 85469147.owl loaded: k 223000, block 1000, res64 6dc0ba3dd68cf05d
2019-05-30 22:29:46 85469147 EE loaded: 223000, blockSize 1000, 5ba05a0a832d8141 (expected 6dc0ba3dd68cf05d)
2019-05-30 22:29:46 Exiting because "error on load"
2019-05-30 22:29:46 Bye[/CODE]
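For anyone wanting to scan logs like these for failed checks, here is a minimal sketch that pulls the fields out of a v6.5-style progress line and reports whether it was OK or EE. The regular expression and the function name are my own assumptions based only on the lines shown in this thread; other gpuowl versions print different formats, and lines such as "EE loaded: ..." are deliberately not matched.

```python
import re

# Pattern for lines such as:
# 2019-05-30 13:25:50 85469147 EE 2000 0.00%; 95.53 ms/sq; ETA 94d 11:54; 91e7259a0ae0534b (check 96.17s)
PROGRESS_RE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<exponent>\d+) (?P<status>OK|EE)\s+(?P<iteration>\d+) "
    r"(?P<percent>[\d.]+)%; (?P<ms_per_sq>[\d.]+) ms/sq; "
    r"ETA (?P<eta>\d+d \d{2}:\d{2}); "
    r"(?P<res64>[0-9a-f]{16}) \(check (?P<check_s>[\d.]+)s\)$"
)


def parse_progress(line: str):
    """Return a dict of fields for a progress line, or None if the line is something else."""
    m = PROGRESS_RE.match(line.strip())
    if m is None:
        return None
    d = m.groupdict()
    d["exponent"] = int(d["exponent"])
    d["iteration"] = int(d["iteration"])
    d["ms_per_sq"] = float(d["ms_per_sq"])
    d["check_s"] = float(d["check_s"])
    return d


if __name__ == "__main__":
    sample = ("2019-05-30 13:25:50 85469147 EE 2000 0.00%; 95.53 ms/sq; "
              "ETA 94d 11:54; 91e7259a0ae0534b (check 96.17s)")
    rec = parse_progress(sample)
    print(rec["status"], rec["iteration"], rec["res64"])
```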

SELROC 2019-05-31 05:06

It has been said earlier that gpuowl needs a discrete gpu. Your device 0 is an integrated gpu with shared memory.


[url]https://www.notebookcheck.net/Intel-UHD-Graphics-630-GPU-Benchmarks-and-Specs.257928.0.html[/url]

kriesel 2019-05-31 12:20

[QUOTE=SELROC;518181]It has been said earlier that gpuowl needs a discrete gpu. Your device 0 is an integrated gpu with shared memory.

[URL]https://www.notebookcheck.net/Intel-UHD-Graphics-630-GPU-Benchmarks-and-Specs.257928.0.html[/URL][/QUOTE]Using IGPs takes some memory bandwidth and TDP budget away from prime95/mprime on the cpu, whether it's TF or something else on the IGP. Sometimes it's a net gain though.
Some earlier IGPs lacked DP, so they could run mfakto but not gpuowl. The UHD630's OpenCL driver reports DP capability (as does the HD620's). From GPU-Z's Advanced tab for OpenCL:[CODE]General
Platform Name Intel(R) OpenCL
Platform Vendor Intel(R) Corporation
Platform Profile FULL_PROFILE
Platform Version OpenCL 2.1
Vendor Intel(R) Corporation
Device Name Intel(R) UHD Graphics 630
Version OpenCL 2.1 NEO
Driver Version 23.20.16.4973
C Version OpenCL C 2.1
IL Version SPIR-V_1.0
Profile FULL_PROFILE
Global Memory Size 6497 MB
Clock Frequency 1100 MHz
Compute Units 24
Device Available Yes
Compiler Available Yes
Linker Available Yes
Preferred Synchronization User
CMD Queue Properties Out of Order, Profiling
SVM Capabilities Coarse, Fine, Atomics
[B]DP Capability Denorm, INF NAN, Round Nearest, Round Zero, Round INF, FMA[/B]
SP Capability Denorm, INF NAN, Round Nearest, Round Zero, Round INF, FMA
Half FP Capability Denorm, INF NAN, Round Nearest, Round Zero, Round INF, FMA
Address Bits 64
Preferred On-Device Queue 128 KB
Global Memory Cache 512 KB (RW Cache)
Global Memory Cacheline 0 KB
Preferred Global Atomic Alignment 0
Preferred Local Atomic Alignment 0
Preferred Platform Atomic Alignment 0
Local Memory Local (64 KB)
Memory Alignment 1024 bits
Pitch Alignment 4 pixels
Built-in Kernels block_motion_estimate_intel;block_advanced_motion_estimate_check_intel;block_advanced_motion_estimate_bidirectional_check_intel;
Little Endian Yes
Error Correction No
Execution Capability Kernel
Unified Memory Yes
Image Support Yes

Limits
Max Device Events 1024
Max Device Queues 1
Max On-Device Queue 65536 KB
Preferred Max Variable Size 3406522368 Bytes
Max Memory Allocation 3248 MB
Max Constant Buffer 3326682 KB
Max Constant Args 8
Max Pipe Args 16
Max Pipe Reservations 1
Max Pipe Packet Size 1024 Bytes
Max Read Image Args 128
Max Write Image Args 128
Max Read-Write Image Args 0
Max Samplers 16
Max Work Item Dims 3
Max Write Image Args 128

Native Vectors
Native Vector Width (CHAR) 16
Native Vector Width (SHORT) 8
Native Vector Width (INT) 4
Native Vector Width (LONG) 1
Native Vector Width (FLOAT) 1
[B]Native Vector Width (DOUBLE) 1[/B]
Native Vector Width (HALF) 8
Preferred Vector Width (CHAR) 16
Preferred Vector Width (SHORT) 8
Preferred Vector Width (INT) 4
Preferred Vector Width (LONG) 1
Preferred Vector Width (FLOAT) 1
[B]Preferred Vector Width (DOUBLE) 1[/B]
Preferred Vector Width (HALF) 8

Extensions
cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_depth_images cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_media_block_io cl_intel_driver_diagnostics cl_intel_device_side_avc_motion_estimation cl_khr_priority_hints cl_khr_subgroups cl_khr_il_program cl_khr_fp64 cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_advanced_motion_estimation cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_intel_dx9_media_sharing cl_khr_dx9_media_sharing cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_intel_d3d11_nv12_media_sharing cl_intel_simultaneous_sharing
[/CODE]It had been a long time since I'd last tested gpuowl on an IGP. As I recall, it ran there back in the LL days. The UHD630's TF performance is low (typically under 22 GhD/day), as is true of all IGPs I've tried or seen benchmarks of, so this is not a priority.
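The DP capability can also be read off the extension list: `cl_khr_fp64` is the standard OpenCL extension name for double-precision support. Below is a small sketch that checks an extension string like the one in the dump above; the function name is hypothetical and the string is an abbreviated excerpt. Note that a device advertising DP is only a necessary condition; as the logs above show, it does not mean gpuowl produces correct results on it.

```python
def has_fp64(extensions: str) -> bool:
    """True if the space-separated OpenCL extension string lists cl_khr_fp64."""
    return "cl_khr_fp64" in extensions.split()


if __name__ == "__main__":
    # Abbreviated excerpt of the UHD630 extension list shown above
    uhd630 = "cl_khr_fp16 cl_khr_icd cl_khr_subgroups cl_khr_il_program cl_khr_fp64 cl_intel_planar_yuv"
    print("double precision advertised:", has_fp64(uhd630))
```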

kriesel 2019-06-01 20:13

M61 transform & NVIDIA?
 
Back in gpuowl V1.9, there were four transform types, SP, DP, M31, and M61. M61 could go a bit higher on exponent than DP of the same length but was not nearly as fast, on AMD with its 1:16 DP:SP ratio.

Now in V6.5, gpuowl runs on OpenCL 1.2 or above on NVIDIA. Most NVIDIA GPUs have a lower DP:SP ratio than AMD does; specifically, GTX10xx is 1:32.
If the M61 transform were available in gpuowl v6.x, it might be faster than DP on NVIDIA.
See first attachment of [url]https://www.mersenneforum.org/showpost.php?p=488535&postcount=2[/url], and [url]https://www.mersenneforum.org/showpost.php?p=498231&postcount=8[/url]
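To illustrate why an M61 transform would not depend on the DP:SP ratio: arithmetic modulo 2[SUP]61[/SUP]-1 needs only integer multiplies, shifts and adds. The sketch below shows just that modular reduction, as a generic technique. It is not the gpuowl V1.9 code, which is not shown in this thread, and the function names are mine.

```python
M61 = (1 << 61) - 1  # the Mersenne prime 2**61 - 1


def reduce_m61(x: int) -> int:
    """Reduce 0 <= x < 2**122 modulo M61 using only shifts, masks and adds."""
    # 2**61 is congruent to 1 mod M61, so the high bits can be folded onto the low bits.
    x = (x & M61) + (x >> 61)
    x = (x & M61) + (x >> 61)  # second fold absorbs the possible carry
    return 0 if x == M61 else x


def mulmod_m61(a: int, b: int) -> int:
    """Multiply two residues (0 <= a, b < 2**61) modulo M61."""
    return reduce_m61(a * b)


if __name__ == "__main__":
    a, b = 1234567890123456789, 987654321098765432
    print(mulmod_m61(a, b) == (a * b) % M61)
```

On a GPU the 61-bit by 61-bit product would have to be assembled from narrower integer multiplies, so whether this ends up faster than DP on a given card is something only a benchmark could tell.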

kriesel 2019-06-03 19:55

The latest makefile seems to get the strip right on Windows, but requires specifying the target as gpuowl-win.exe.[CODE]
$ make gpuowl-win.exe
cat head.txt gpuowl.cl tail.txt > gpuowl-wrap.cpp
echo \"`git describe --long --dirty --always`\" > version.new
diff -q -N version.new version.inc >/dev/null || mv version.new version.inc
echo Version: `cat version.inc`
Version: "v6.5-75-g4902439-dirty"
g++ -MT Pm1Plan.o -MMD -MP -MF .d/Pm1Plan.Td -Wall -O2 -std=c++17 -c -o Pm1Plan.o Pm1Plan.cpp
g++ -MT GmpUtil.o -MMD -MP -MF .d/GmpUtil.Td -Wall -O2 -std=c++17 -c -o GmpUtil.o GmpUtil.cpp
g++ -MT Worktodo.o -MMD -MP -MF .d/Worktodo.Td -Wall -O2 -std=c++17 -c -o Worktodo.o Worktodo.cpp
g++ -MT common.o -MMD -MP -MF .d/common.Td -Wall -O2 -std=c++17 -c -o common.o common.cpp
g++ -MT main.o -MMD -MP -MF .d/main.Td -Wall -O2 -std=c++17 -c -o main.o main.cpp
g++ -MT Gpu.o -MMD -MP -MF .d/Gpu.Td -Wall -O2 -std=c++17 -c -o Gpu.o Gpu.cpp
g++ -MT clwrap.o -MMD -MP -MF .d/clwrap.Td -Wall -O2 -std=c++17 -c -o clwrap.o clwrap.cpp
g++ -MT Task.o -MMD -MP -MF .d/Task.Td -Wall -O2 -std=c++17 -c -o Task.o Task.cpp
g++ -MT checkpoint.o -MMD -MP -MF .d/checkpoint.Td -Wall -O2 -std=c++17 -c -o checkpoint.o checkpoint.cpp
g++ -MT timeutil.o -MMD -MP -MF .d/timeutil.Td -Wall -O2 -std=c++17 -c -o timeutil.o timeutil.cpp
g++ -MT Args.o -MMD -MP -MF .d/Args.Td -Wall -O2 -std=c++17 -c -o Args.o Args.cpp
g++ -MT state.o -MMD -MP -MF .d/state.Td -Wall -O2 -std=c++17 -c -o state.o state.cpp
g++ -MT Signal.o -MMD -MP -MF .d/Signal.Td -Wall -O2 -std=c++17 -c -o Signal.o Signal.cpp
g++ -MT FFTConfig.o -MMD -MP -MF .d/FFTConfig.Td -Wall -O2 -std=c++17 -c -o FFTConfig.o FFTConfig.cpp
g++ -MT clpp.o -MMD -MP -MF .d/clpp.Td -Wall -O2 -std=c++17 -c -o clpp.o clpp.cpp
g++ -MT gpuowl-wrap.o -MMD -MP -MF .d/gpuowl-wrap.Td -Wall -O2 -std=c++17 -c -o gpuowl-wrap.o gpuowl-wrap.cpp
g++ -o gpuowl-win.exe Pm1Plan.o GmpUtil.o Worktodo.o common.o main.o Gpu.o clwrap.o Task.o checkpoint.o timeutil.o Args.o state.o Signal.o FFTConfig.o clpp.o gpuowl-wrap.o -lstdc++fs -lOpenCL -lgmp -pthread -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L/c/Windows/System32 -L. -static
strip gpuowl-win.exe[/CODE]What does it mean that it's labeled dirty? Perhaps that the conversion to u32 is not complete?

[CODE]>gpuowl-win -prp 3321928097
2019-06-03 14:22:00 gpuowl v6.5-75-g4902439-dirty
2019-06-03 14:22:00 Exception St12out_of_range: stol
2019-06-03 14:22:00 Bye

>gpuowl-win -prp 2147483659
2019-06-03 14:28:16 gpuowl v6.5-75-g4902439-dirty
2019-06-03 14:28:17 Exception St12out_of_range: stol
2019-06-03 14:28:17 Bye

>gpuowl-win -prp 2147483647 -use FMA_X2
2019-06-03 14:29:52 gpuowl v6.5-75-g4902439-dirty
2019-06-03 14:29:52 Note: no config.txt file found
2019-06-03 14:29:52 config: -prp 2147483647 -use FMA_X2
2019-06-03 14:29:52 2147483647 FFT 147456K: Width 512x8, Height 256x8, Middle 9; 14.22 bits/word
2019-06-03 14:29:52 using long carry kernels
2019-06-03 14:30:00 OpenCL args "-DEXP=2147483647u -DWIDTH=4096u -DSMALL_HEIGHT=2048u -DMIDDLE=9u -DWEIGHT_STEP=0xd.b745787f2c4cp-3 -DIWEIGHT_STEP=0x9.550d2c9e837e8p-4 -DWEIGHT_BIGSTEP=0x8.b95c1e3ea8bd8p-3 -DIWEIGHT_BIGSTEP=0xe.ac0c6e7dd2438p-4 -DFMA_X2=1 -DFMA_X2=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-06-03 14:30:04 OpenCL compilation in 4704 ms
2019-06-03 14:30:28 2147483647.owl not found, starting from the beginning.
2019-06-03 14:42:03 2147483647 OK 2000 0.00%; 162.835 ms/sq; ETA 4047d 06:30; fb12c8169932aa03 (check 172.72s)[/CODE](The above was on an RX480. Note that
2147483647 < 2[SUP]31[/SUP] < 2147483659, and
log10(2[SUP]3321928097[/SUP]-1) > 10[SUP]9[/SUP], i.e. that Mersenne number has over a billion decimal digits.)
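The two inequalities above can be checked with a short sketch. It also shows why a signed 32-bit parse would reject the larger exponents while an unsigned 32-bit value holds them. That the stol failure comes from a 32-bit `long` on this Windows build is my assumption; the function names are only illustrative.

```python
import math

I32_MAX = 2**31 - 1   # largest signed 32-bit value
U32_MAX = 2**32 - 1   # largest unsigned 32-bit value


def fits_i32(n: int) -> bool:
    """True if n is representable as a signed 32-bit integer."""
    return -2**31 <= n <= I32_MAX


def fits_u32(n: int) -> bool:
    """True if n is representable as an unsigned 32-bit integer."""
    return 0 <= n <= U32_MAX


def mersenne_decimal_digits(p: int) -> int:
    """Number of decimal digits of 2**p - 1, computed via logarithms."""
    return math.floor(p * math.log10(2)) + 1


if __name__ == "__main__":
    for p in (2147483647, 2147483659, 3321928097):
        print(p, "i32" if fits_i32(p) else "no i32", "u32" if fits_u32(p) else "no u32")
    print(mersenne_decimal_digits(3321928097) > 10**9)
```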

preda 2019-06-03 21:39

[QUOTE=kriesel;518470]What does it mean that it's labeled dirty?[/QUOTE]
Dirty means that there are uncommitted local changes (edits) to some files. If the build is done from exactly the version that is checked out, then it's not dirty.

I tried to fix the stol(), please re-try with a >2G exponent.
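For reference, the version string comes from `git describe --long --dirty --always` (see the makefile output earlier in the thread), which yields tag, number of commits since the tag, abbreviated hash prefixed with "g", and an optional "-dirty" suffix. A minimal sketch for splitting such a string follows; the function name is hypothetical, and the hash-only form that `--always` produces when no tag exists is not handled.

```python
import re

DESCRIBE_RE = re.compile(
    r"^(?P<tag>.+)-(?P<commits>\d+)-g(?P<hash>[0-9a-f]+)(?P<dirty>-dirty)?$"
)


def parse_describe(version: str):
    """Split a 'git describe --long --dirty' string; returns None if it does not match."""
    m = DESCRIBE_RE.match(version.strip().strip('"'))
    if m is None:
        return None
    return {
        "tag": m.group("tag"),
        "commits_since_tag": int(m.group("commits")),
        "hash": m.group("hash"),
        "dirty": m.group("dirty") is not None,
    }


if __name__ == "__main__":
    print(parse_describe("v6.5-75-g4902439-dirty"))
    print(parse_describe("v6.5-61-g5c0db85"))
```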

kriesel 2019-06-03 23:44

[QUOTE=preda;518473]I tried to fix the stol(), please re-try with a >2G exponent.[/QUOTE]Looks good. The timings don't, though: eleven to 23.5 years for these on an RX480.[CODE]>gpuowl-win -prp 2147483659 -use FMA_X2
2019-06-03 17:40:09 gpuowl v6.5-76-g1ca08e2-dirty
2019-06-03 17:40:09 Note: no config.txt file found
2019-06-03 17:40:09 config: -prp 2147483659 -use FMA_X2
2019-06-03 17:40:09 2147483659 FFT 147456K: Width 512x8, Height 256x8, Middle 9; 14.22 bits/word
2019-06-03 17:40:09 using long carry kernels
2019-06-03 17:40:16 OpenCL args "-DEXP=2147483659u -DWIDTH=4096u -DSMALL_HEIGHT=2048u -DMIDDLE=9u -DWEIGHT_STEP=0xd.b7456bd211bf8p-3 -DIWEIGHT_STEP=0x9.550d353e7752p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DFMA_X2=1 -DFMA_X2=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-06-03 17:40:21 OpenCL compilation in 4679 ms
2019-06-03 17:40:47 2147483659.owl not found, starting from the beginning.
2019-06-03 17:52:25 2147483659 OK 2000 0.00%; 161.868 ms/sq; ETA 4023d 06:15; 25ac32a404e8574e (check 171.25s)
^CTerminate batch job (Y/N)? n

>gpuowl-win -prp 3321928097 -use ORIG_X2
2019-06-03 17:53:53 gpuowl v6.5-76-g1ca08e2-dirty
2019-06-03 17:53:53 Note: no config.txt file found
2019-06-03 17:53:53 config: -prp 3321928097 -use ORIG_X2
2019-06-03 17:53:53 3321928097 FFT 196608K: Width 512x8, Height 256x8, Middle 12; 16.50 bits/word
2019-06-03 17:53:53 using long carry kernels
2019-06-03 17:53:59 OpenCL args "-DEXP=3321928097u -DWIDTH=4096u -DSMALL_HEIGHT=2048u -DMIDDLE=12u -DWEIGHT_STEP=0xb.4feacf46035b8p-3 -DIWEIGHT_STEP=0xb.50b39ab42445p-4 -DWEIGHT_BIGSTEP=0xe.ac0c6e7dd2438p-3 -DIWEIGHT_BIGSTEP=0x8.b95c1e3ea8bd8p-4 -DORIG_X2=1 -DORIG_X2=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-06-03 17:54:03 OpenCL compilation in 4318 ms
2019-06-03 17:54:38 3321928097.owl not found, starting from the beginning.
2019-06-03 18:20:55 3321928097 OK 2000 0.00%; 222.996 ms/sq; ETA 8573d 19:14; 5388b104718177b6 (check 237.96s)
2019-06-03 18:24:37 Stopping, please wait..
2019-06-03 18:28:33 3321928097 OK 3000 0.00%; 221.702 ms/sq; ETA 8524d 01:06; faa54e1e75915eab (check 235.93s)
2019-06-03 18:28:37 Exiting because "stop requested"
2019-06-03 18:28:37 Bye[/CODE]In the first case, 2000 iterations x 162 ms/sq + 171 sec for the check = 495 sec, but the elapsed time was > 690 sec.
In the second case, 2000 iterations x 223 ms/sq + 238 sec for the check = 684 sec, but the elapsed time was 18:20:55 - 17:54:38 = 1577 sec.
GPU RAM usage was ~6 GB in the second case.
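The arithmetic above can be reproduced from the log timestamps with a short sketch. I am taking the ".owl not found, starting from the beginning" line as the start of iterating, which is an assumption, and the helper names are made up.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def elapsed_s(start: str, end: str) -> int:
    """Wall-clock seconds between two log timestamps."""
    return int((datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds())


def expected_s(iterations: int, ms_per_sq: float, check_s: float) -> float:
    """Time implied by the reported per-iteration speed plus the reported check time."""
    return iterations * ms_per_sq / 1000.0 + check_s


if __name__ == "__main__":
    cases = [
        ("2147483659", "2019-06-03 17:40:47", "2019-06-03 17:52:25", 161.868, 171.25),
        ("3321928097", "2019-06-03 17:54:38", "2019-06-03 18:20:55", 222.996, 237.96),
    ]
    for name, start, end, ms, check in cases:
        print(name, "expected", round(expected_s(2000, ms, check)), "s,",
              "elapsed", elapsed_s(start, end), "s")
```

The gap between the two figures is time the log lines do not account for.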

preda 2019-06-04 11:42

In a recent commit, the timing display was changed from ms/sq to us/sq ("micros") :)
[CODE]
2019-06-04 21:46:15 r7u 85504057 OK 78643000 91.97%; 794 us/sq; ETA 0d 01:31; 3dad4b579a2cd95c (check 0.97s)
2019-06-04 21:47:01 r7u 85504057 78700000 92.04%; 811 us/sq; ETA 0d 01:32; 13b0dc053fd74724
[/CODE]

kriesel 2019-06-04 15:21

Kudos to the contributors
 
I read through the commit listings back to mid-January, and saw that Preda had acknowledged there numerous contributions made by several individuals. A crude summary follows:[CODE]valeriob01    -w argument; readme.md work; description of cmd line arguments
              & updates, display of parameters; primenet.py date & time;
              makefile fix

k3ack3r       fix some msys2 warnings; update makefile

chengsun      fix alignment violation causing OUT_OF_RESOURCES error on NVIDIA
              GPUs

sillygitter   add -iters argument

gwoltman      allow making small test kernels; new X2 definition; fft8 cleanup +
              documentation; new sq macro; overhaul/comment fft5/fft10 macros;
              improved pairSq and pairMul; faster 6m fft using new fft12 middle;
              new 5.5m fft using new fft11 middle; increased precision of fft11
              constants; inline X2; fft7 middle step; shorter multiply chains in
              middle[/CODE]Thanks to you all!


All times are UTC.