|
|
#1541 |
|
"Robert Gerbicz"
Oct 2005
Hungary
2×743 Posts |
Of course, if res64=0 you need to check the full residue to see whether res=0 really holds. For much larger p>2^64 you could see (multiple) interim res64=0 values during a PRP test.
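A minimal sketch of that two-step check, assuming the full residue is available as a Python integer (real programs hold it as an array of words; the function names here are hypothetical):

```python
MASK64 = (1 << 64) - 1  # selects the low 64 bits of a residue


def res64(residue):
    """Return the low 64 bits of a full residue (the usual res64)."""
    return residue & MASK64


def is_truly_zero(residue):
    """Cheap res64 test first; confirm against the full residue only on a hit."""
    if res64(residue) != 0:
        return False      # low 64 bits nonzero, so the residue is nonzero
    return residue == 0   # res64 == 0 alone is not proof; check every bit


# 2^64 has res64 == 0 but is not zero, so the full check rejects it.
print(is_truly_zero(1 << 64))
```

The point of the ordering is that the 64-bit comparison is nearly free, while the full comparison only runs in the rare case where res64 is zero.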
|
|
|
|
|
|
#1542 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
19·397 Posts |
Notes on the new MERGED_MIDDLE code. There are many implementations buried in the code. The fastest implementation depends on the memory bus width and bandwidth and GPU architecture and maybe the cache architecture.
The benefits of MERGED_MIDDLE really kick in for FFTs with a WIDTH >= 256 and SMALL_HEIGHT >= 256. To find the best implementation for your GPU, benchmark using each of these options: WORKINGIN, WORKINGIN1, WORKINGIN1A, WORKINGIN2, WORKINGIN3, WORKINGIN4, WORKINGIN5. Then benchmark again using each of these options: WORKINGOUT, WORKINGOUT0, WORKINGOUT1, WORKINGOUT1A, WORKINGOUT2, WORKINGOUT3, WORKINGOUT4, WORKINGOUT5. Once you've determined the best implementations, add the best WORKINGIN and WORKINGOUT options to your production config.txt file. The default is WORKINGIN3 and WORKINGOUT3. If we can obtain some consistent data, we can select different default values for non-AMD GPUs. So let us know your GPU and your timings. Thanks. |
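A small sketch for generating the two benchmark passes described above. It only builds option strings and does not run anything. Passing the options comma-separated through -use together with MERGED_MIDDLE is an assumption, and the helper name is hypothetical:

```python
# Option names exactly as listed in the post above.
WORKING_IN = ["WORKINGIN", "WORKINGIN1", "WORKINGIN1A", "WORKINGIN2",
              "WORKINGIN3", "WORKINGIN4", "WORKINGIN5"]
WORKING_OUT = ["WORKINGOUT", "WORKINGOUT0", "WORKINGOUT1", "WORKINGOUT1A",
               "WORKINGOUT2", "WORKINGOUT3", "WORKINGOUT4", "WORKINGOUT5"]


def use_arguments(options):
    """Build one '-use' argument string per option (assumed syntax)."""
    return [f"-use MERGED_MIDDLE,{option}" for option in options]


# First pass: one run per WORKINGIN variant; second pass: one per WORKINGOUT.
for line in use_arguments(WORKING_IN) + use_arguments(WORKING_OUT):
    print(line)
```

Each printed line would be appended to whatever fixed benchmark command is in use, so that only the one option changes between runs.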
|
|
|
|
|
#1543 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts |
Quote:
Yes, there's a very slight chance that a res64 zero is correct for a nonzero full residue. One place it shows up is in penultimate residues. It's also true that eventually we will reach a point where early residues will correctly have values that are currently treated as errors. This occurs within the 2^32 capability of Mlucas. (See the attachment at https://www.mersenneforum.org/showpo...72&postcount=9) Before the probability of zero or 2 res64 becomes high, the project is likely to switch to a longer residue for such checks, say res128.
With all due respect, Dr. Gerbicz, none of us will need to worry about residues for p>2^64, or likely 2^48. TF is feasible with the right software up to a point, but P-1 or primality testing exponents of order 2^64 is quite out of reach and will be for more than my lifetime and many others'.
In GIMPS we're dealing with p<2^32 and generally <2^30 (the mersenne.org exponent limit for PRP, LL, or P-1 results acceptance is 10^9), with most current activity other than my limits testing or the 100Mdigit attempts occurring at the wavefront <2^26.6. A single 2^30 exponent PRP takes several months on the fastest available GPUs. P-1 factoring to feasible limits imposed by memory and software takes weeks on most hardware if not all. The scaling for primality testing and P-1 is roughly p^2.1: p~2^32 takes years, p~2^33 decades (longer than hardware lifetime), and would require FFT lengths longer than available in gpuowl or CUDALucas.
Last fiddled with by kriesel on 2019-12-09 at 21:42 |
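To make the p^2.1 scaling concrete, here is a rough sketch. The 3-month figure for a 2^30 exponent is an assumed stand-in for "several months", not a measured value, and the function name is hypothetical:

```python
SCALING_EXPONENT = 2.1   # run time grows roughly as p^2.1 (from the post)
MONTHS_AT_2_30 = 3.0     # ASSUMPTION: stand-in for "several months" at p ~ 2^30


def relative_cost(bits_from, bits_to, exponent=SCALING_EXPONENT):
    """Cost ratio when the exponent grows from 2^bits_from to 2^bits_to."""
    return 2.0 ** ((bits_to - bits_from) * exponent)


for bits in (31, 32, 33):
    factor = relative_cost(30, bits)
    years = MONTHS_AT_2_30 * factor / 12.0
    print(f"p ~ 2^{bits}: about {factor:.1f}x the 2^30 cost, ~{years:.1f} years")
```

Under that assumption each doubling of p costs a bit more than 4x, which is how "months" at 2^30 becomes "years" at 2^32 and "decades" at 2^33.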
|
|
|
|
|
|
#1544 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts |
Compiling Gpuowl https://www.mersenneforum.org/showpo...4&postcount=21
added to reference content. Probably has errors or omissions. I'll fix them as they are identified. |
|
|
|
|
|
#1545 |
|
"Eric"
Jan 2018
USA
324₈ Posts |
Getting this error when trying to use -nospin as argument:
Code:
2019-12-09 19:19:27 gpuowl v6.11-71-g7e02b07
2019-12-09 19:19:27 Argument '-nospin' '' not understood
2019-12-09 19:19:27 Exiting because "args"
2019-12-09 19:19:27 Bye |
|
|
|
|
|
#1546 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts |
Quote:
Code:
$ make gpuowl-win.exe
cat head.txt gpuowl.cl tail.txt > gpuowl-wrap.cpp
echo \"`git describe --long --dirty --always`\" > version.new
diff -q -N version.new version.inc >/dev/null || mv version.new version.inc
echo Version: `cat version.inc`
Version: "v6.11-79-g0c139c4"
g++ -MT Pm1Plan.o -MMD -MP -MF .d/Pm1Plan.Td -Wall -O2 -std=c++17 -c -o Pm1Plan.o Pm1Plan.cpp
g++ -MT GmpUtil.o -MMD -MP -MF .d/GmpUtil.Td -Wall -O2 -std=c++17 -c -o GmpUtil.o GmpUtil.cpp
g++ -MT Worktodo.o -MMD -MP -MF .d/Worktodo.Td -Wall -O2 -std=c++17 -c -o Worktodo.o Worktodo.cpp
In file included from Worktodo.cpp:6:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~
g++ -MT common.o -MMD -MP -MF .d/common.Td -Wall -O2 -std=c++17 -c -o common.o common.cpp
In file included from common.cpp:4:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~
g++ -MT main.o -MMD -MP -MF .d/main.Td -Wall -O2 -std=c++17 -c -o main.o main.cpp
In file included from main.cpp:8:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~
g++ -MT Gpu.o -MMD -MP -MF .d/Gpu.Td -Wall -O2 -std=c++17 -c -o Gpu.o Gpu.cpp
In file included from ProofSet.h:6,
from Gpu.cpp:4:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~
g++ -MT clwrap.o -MMD -MP -MF .d/clwrap.Td -Wall -O2 -std=c++17 -c -o clwrap.o clwrap.cpp
In file included from clwrap.cpp:4:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~
g++ -MT Task.o -MMD -MP -MF .d/Task.Td -Wall -O2 -std=c++17 -c -o Task.o Task.cpp
In file included from Task.cpp:7:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~
g++ -MT checkpoint.o -MMD -MP -MF .d/checkpoint.Td -Wall -O2 -std=c++17 -c -o checkpoint.o checkpoint.cpp
In file included from checkpoint.h:5,
from checkpoint.cpp:3:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~
g++ -MT timeutil.o -MMD -MP -MF .d/timeutil.Td -Wall -O2 -std=c++17 -c -o timeutil.o timeutil.cpp
g++ -MT Args.o -MMD -MP -MF .d/Args.Td -Wall -O2 -std=c++17 -c -o Args.o Args.cpp
In file included from Args.cpp:4:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:11: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~
g++ -MT state.o -MMD -MP -MF .d/state.Td -Wall -O2 -std=c++17 -c -o state.o state.cpp
g++ -MT Signal.o -MMD -MP -MF .d/Signal.Td -Wall -O2 -std=c++17 -c -o Signal.o Signal.cpp
g++ -MT FFTConfig.o -MMD -MP -MF .d/FFTConfig.Td -Wall -O2 -std=c++17 -c -o FFTConfig.o FFTConfig.cpp
g++ -MT AllocTrac.o -MMD -MP -MF .d/AllocTrac.Td -Wall -O2 -std=c++17 -c -o AllocTrac.o AllocTrac.cpp
g++ -MT gpuowl-wrap.o -MMD -MP -MF .d/gpuowl-wrap.Td -Wall -O2 -std=c++17 -c -o gpuowl-wrap.o gpuowl-wrap.cpp
g++ -o gpuowl-win.exe Pm1Plan.o GmpUtil.o Worktodo.o common.o main.o Gpu.o clwrap.o Task.o checkpoint.o timeutil.o Args.o state.o Signal.o FFTConfig.o AllocTrac.o gpuowl-wrap.o -lstdc++fs -lOpenCL -lgmp -pthread -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L/c/Windows/System32 -L. -static
strip gpuowl-win.exe
Following is a test of the OpenCL version check on a Quadro 2000, which indicates 1.1/1.2 in gpu-z.
Code:
c:\Users\Ken\Documents\gpuowl\v6.11-79-g0c139c4>gpuowl-win -time -iters 10000 -use NO_ASM
2019-12-09 22:33:39 gpuowl v6.11-79-g0c139c4
2019-12-09 22:33:39 config.txt: -device 1 -user kriesel -cpu condorette/q2000
2019-12-09 22:33:39 condorette/q2000 config: -time -iters 10000 -use NO_ASM
2019-12-09 22:33:39 condorette/q2000 89796247 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.13 bits/word
2019-12-09 22:33:40 condorette/q2000 OpenCL args "-DEXP=89796247u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0xe.a6216bdf4fcdp-3 -DIWEIGHT_STEP=0x8.bce25ec56bc2p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DNO_ASM=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-12-09 22:33:40 condorette/q2000 OpenCL compilation error -11 (args -DEXP=89796247u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0xe.a6216bdf4fcdp-3 -DIWEIGHT_STEP=0x8.bce25ec56bc2p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DNO_ASM=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0)
2019-12-09 22:33:40 condorette/q2000 <kernel>:13:9: warning: GpuOwl requires OpenCL 200, found 110
#pragma message "GpuOwl requires OpenCL 200, found " STR(__OPENCL_VERSION__)
^
<kernel>:14:2: error: OpenCL >= 2.0 required
#error OpenCL >= 2.0 required
^
<kernel>:2777:66: error: use of undeclared identifier 'memory_scope_device'
work_group_barrier(CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE, memory_scope_device);
^
<kernel>:2786:66: error: use of undeclared identifier 'memory_scope_device'
work_group_barrier(CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE, memory_scope_device);
^
<kernel>:2845:12: warning: implicit declaration of function 'atomic_load' is invalid in C99
while(!atomic_load((atomic_uint *) &ready[gr - 1]));
^
<kernel>:2845:25: error: use of undeclared identifier 'atomic_uint'
while(!atomic_load((atomic_uint *) &ready[gr - 1]));
^
<kernel>:2845:38: error: expected expression
while(!atomic_load((atomic_uint *) &ready[gr - 1]));
^
<kernel>:2846:5: warning: implicit declaration of function 'atomic_store' is invalid in C99
atomic_store((atomic_uint *) &ready[gr - 1], 0);
^
<kernel>:2846:19: error: use of undeclared identifier 'atomic_uint'
atomic_store((atomic_uint *) &ready[gr - 1], 0);
^
<kernel>:2846:32: error: expected expression
atomic_store((atomic_uint *) &ready[gr - 1], 0);
^
<kernel>:2919:25: error: use of undeclared identifier 'atomic_uint'
while(!atomic_load((atomic_uint *) &ready[gr - 1]));
^
<kernel>:2919:38: error: expected expression
while(!atomic_load((atomic_uint *) &ready[gr - 1]));
^
<kernel>:2920:19: error: use of undeclared identifier 'atomic_uint'
atomic_store((atomic_uint *) &ready[gr - 1], 0);
^
<kernel>:2920:32: error: expected expression
2019-12-09 22:33:40 condorette/q2000 Exception gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:234 build
2019-12-09 22:33:40 condorette/q2000 Bye
|
|
|
|
|
|
|
#1547 | |
|
"Sam Laur"
Dec 2018
Turku, Finland
475₈ Posts |
Quote:
RTX 2080, clock pinned to 1920 MHz, Linux. Command line options: -yield -log 10000 -prp 89796247 -fft +2 -iters 50000 -use NO_ASM,MERGED_MIDDLE, except for the two baseline timings (3807 and 3808 µs), which were run without MERGED_MIDDLE. Each run then added one IN and one OUT setting.
For whatever reason, the differences were really small on this card: 0.35% between the highest and lowest value, and if the one outlier (IN1A and OUT1A chosen) is taken out, the rest are within 0.19%.
None of the WORKINGOUT0 tests would run; an error occurred:
2019-12-10 04:14:53 Exception gpu_error: OUT_OF_RESOURCES carryA at clwrap.cpp:304 run
The smallest value was 3680 µs, which was reached with several different combinations. I have attached the full array of timings to this message. |
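As a quick worked comparison using only the figures quoted above (3807 µs baseline, 3680 µs best merged result), a sketch with hypothetical helper names:

```python
BASELINE_US = 3807     # per-iteration time without MERGED_MIDDLE (from the post)
BEST_MERGED_US = 3680  # smallest per-iteration time with MERGED_MIDDLE


def percent_faster(baseline_us, candidate_us):
    """Reduction in time per iteration, as a percentage of the baseline."""
    return 100.0 * (baseline_us - candidate_us) / baseline_us


def iterations_per_second(time_us):
    """Convert microseconds per iteration into iterations per second."""
    return 1_000_000.0 / time_us


gain = percent_faster(BASELINE_US, BEST_MERGED_US)
print(f"MERGED_MIDDLE saves about {gain:.2f}% per iteration")
print(f"{iterations_per_second(BEST_MERGED_US):.1f} it/s at the best setting")
```

So the gain from merging (a little over 3%) is roughly ten times the 0.35% spread among the IN/OUT variants on this card.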
|
|
|
|
|
|
#1548 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts |
Quote:
Lots of differences, of course: GPU, OS, pinned clock, exponent. Try some repeatability runs. Also, I think it's a rectangular array with one more row and column than you allowed for. George gave a list of ins and a list of outs, but there's also the null entry for in (baseline in) and for out (baseline out). And it appears from my recent test that the minimum in and the minimum out don't necessarily do even better in combination.
Last fiddled with by kriesel on 2019-12-10 at 06:50 |
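A sketch of that full rectangular array, with None standing for the null (baseline) entry on each axis. Joining the names into a single -use value is an assumption about syntax, and whether a blank entry behaves differently from the default is not settled here:

```python
from itertools import product

# None is the "null entry": no WORKINGIN / WORKINGOUT option given at all.
INS = [None, "WORKINGIN", "WORKINGIN1", "WORKINGIN1A", "WORKINGIN2",
       "WORKINGIN3", "WORKINGIN4", "WORKINGIN5"]
OUTS = [None, "WORKINGOUT", "WORKINGOUT0", "WORKINGOUT1", "WORKINGOUT1A",
        "WORKINGOUT2", "WORKINGOUT3", "WORKINGOUT4", "WORKINGOUT5"]


def use_value(working_in, working_out):
    """Comma-separated -use value for one cell of the grid (assumed syntax)."""
    chosen = [opt for opt in (working_in, working_out) if opt is not None]
    return ",".join(["NO_ASM", "MERGED_MIDDLE"] + chosen)


grid = [use_value(i, o) for i, o in product(INS, OUTS)]
print(len(grid), "combinations")  # (7 + 1) rows x (8 + 1) columns
```

Running every cell rather than picking the best row and best column separately matters because, as noted, the best IN and best OUT do not necessarily combine into the best pair.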
|
|
|
|
|
|
#1549 | |
|
"Sam Laur"
Dec 2018
Turku, Finland
317 Posts |
Already did, but of course not enough (five each of "without merge", IN1A+OUT1A, and IN3+OUT5). At least that time, the results varied by at most 2 µs from run to run. The advantage of benchmarking on Linux is that the results are more predictable: it's less likely that the OS starts indexing, going through updates, or scanning for viruses in the background.
Quote:
|
|
|
|
|
|
|
#1550 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts |
Quote:
I guess George's post means that with MERGED_MIDDLE the default is the 3's (WORKINGIN3 and WORKINGOUT3); otherwise, with baseline NO_ASM only, the middle is not merged, so the prior code runs with no in or out options and hence no 3's. (Who leaves indexing and autoupdates turned on?)
Last fiddled with by kriesel on 2019-12-10 at 10:16 |
|
|
|
|
|
|
#1551 | |
|
"Sam Laur"
Dec 2018
Turku, Finland
317₁₀ Posts |
Quote:
Don't get me started on quantization noise... I'm used to getting reliable and repeatable results when timing other programs, mostly mfaktc, but I have to admit these are exceptionally steady, about one digit more than I'm used to getting. Maybe I should start doubting the method, and use some sort of external timer as well, instead of blindly trusting the internal timer within the program. But that's way too much effort to sink into a quick test like this.
Not by my own choice of course, but the Win10 box I have at work has autoupdates forced on by group policy (corporate IT). Not sure about search though. And yeah, likewise the antivirus software (F-Secure) is forced always on. I still manage to run prime95 on it, but there the iteration timings are anything but stable. |
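One way such an external timer could look, as a rough sketch: wrap the whole run in a wall-clock measurement and divide by the iteration count. The function names are hypothetical, and the figure includes startup and kernel compilation, so it will read higher than the program's own per-iteration timing:

```python
import subprocess
import time


def wall_clock_seconds(command):
    """Run a command to completion and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True)
    return time.perf_counter() - start


def microseconds_per_iteration(elapsed_seconds, iterations):
    """Average time per iteration, including any startup overhead."""
    return elapsed_seconds * 1_000_000.0 / iterations
```

Usage would be along the lines of `microseconds_per_iteration(wall_clock_seconds(cmd), 50000)`, where `cmd` is the benchmark command as a list of arguments. Comparing two runs with different iteration counts would let the fixed startup cost be subtracted out.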
|
|
|
|