gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

LaurV 2020-04-16 16:54

[QUOTE=preda;542607]1. does clinfo run, detecting the GPUs?
2. does gpuowl -h run, printing the list of GPUs?
3. what fails?[/QUOTE]
Managed to make it run. I only had to look in the mirror to see where the dumb was. For some odd reason, I was using very old drivers (397), and when I decided to use the new owl freshly compiled by the people here, I had to upgrade the drivers to the new 445.7x (?) available last week. Either something went wrong at that time, or the driver didn't work properly. I suspect the latter, because they just released 445.87 yesterday, and with this it works. It looks like a fast fix on their side, so maybe something was indeed wrong with the drivers.

Now, the comments... Be ready... It is coming... [U]Now[/U]...

The owl is about 7% faster than cudaLucas running with old drivers. And trying cudaLucas with newer drivers, I suddenly remembered why I kept the old ones :picard:... all drivers from the 4xx generation are about 6% slower when running cudaLucas on the 1080Ti and 2080Ti. I had already forgotten this, because lately I have only played with mfaktc/gpu72.

So, with the new drivers, in the race between the owl and cudaLucas, the owl comes out about 13% faster. This is good. Also, the cards run much cooler, about 9-10°C cooler than cudaLucas runs them. Also good.

[ATTACH]22045[/ATTACH]
(Ignore the iteration at 300k; that is because we were doing something else more important at that time, for a few seconds.)

But then, the CPU activity seems strange in that graph: the temperature jumps up and down like somebody is stealing ticks from P95. And voila, the owl gets a full core for itself. (In the photo, 5% and 45% are due to HT; this CPU has 10 physical cores, so it is 10% and 90% occupancy.)

[ATTACH]22046[/ATTACH]

That's not good. As proof, P95 performance goes down 10% compared with the case when cudaLucas runs. That is 20% if two copies (two cards) run. And if your CPU has only 4 cores, then the performance of P95 goes down 50%, because gpuOwl (2 copies) can monopolize 2 of the 4 cores. That's bad.
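The percentages above follow from each gpuOwl instance occupying one physical core. A minimal Python sketch of that arithmetic; the function name is illustrative, and it assumes P95 throughput scales linearly with the number of free physical cores, which is only an approximation.

```python
def p95_loss_fraction(physical_cores: int, gpuowl_instances: int) -> float:
    """Fraction of P95 throughput lost if each gpuOwl instance pins one core.

    Assumption: P95 scales linearly with available physical cores.
    """
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    # An instance cannot take more cores than the CPU has.
    busy = min(gpuowl_instances, physical_cores)
    return busy / physical_cores


# The three cases mentioned in the post: 10 cores with 1 or 2 instances,
# and a 4-core CPU with 2 instances.
for cores, instances in [(10, 1), (10, 2), (4, 2)]:
    loss = p95_loss_fraction(cores, instances)
    print(f"{cores} cores, {instances} gpuOwl instance(s): about {loss:.0%} lost")
```

On a real system the loss can differ from this estimate, since memory bandwidth and hyper-threading also affect P95.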

[ATTACH]22047[/ATTACH]
(in this picture, P95 runs parallel with 1 gpuOwl thread then with 2 cudaLucas threads, then back, then some other tests)


So, if you have Nvidia cards and want to use gpuOwl AND run P95 at the same time, you have to see which way you get more output: running P95 at lower performance with the faster owl on the GPU(s), or running P95 at full speed with the (slower) cudaLucas (as cudaLucas does not take [U]any[/U] CPU resources).

Temperature is also a thing, but there is no dilemma here: if you have a common water circuit, then the CPU running cooler also results in the GPUs running cooler. To make sure, a separate water circuit will be needed (which I will set up, probably during the weekend).

And checkpoint file names suck... where is the "exponent.iteration.residue.ll" naming convention? :razz:
And where is the shift? :rant:

Prime95 2020-04-16 17:18

[QUOTE=LaurV;542874]Now, the comments... Be ready... It is coming... [U]Now[/U]..[/QUOTE]

Everybody's a critic :)

Try the -yield option to tell gpuowl to yield the CPU to other tasks. I don't recall all the details but I believe this is really an nVidia openCL bug.

LaurV 2020-04-16 17:36

[QUOTE=Prime95;542879]
Try the -yield option [/QUOTE]
No difference. (this was in fact written in the help, being dumb again, or too tired, 0:35AM here, going to bed after this...)

But wait! What? WHAT? WTF? :shock:

[ATTACH]22048[/ATTACH]
(cL is running on the other card; regardless of whether gpuOwl is running or not, it now takes a CPU core.)

It does seem to be an NV problem indeed. I didn't have this in the past. Maybe I am dumb, but I will have to dig up the old drivers to verify.

kriesel 2020-04-17 16:54

All on the same RX550 gpu and host system, that has reliably run without GEC errors for multiple 5M PRP first-tests on v6.11-134:
90710093 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.30 bits/word
92858651 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.71 bits/word
93461911 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.83 bits/word
93873049 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.90 bits/word
94418047 began FFT 5120K: Width 256x4, Height 64x4, Middle 10; 18.01 bits/word, ran ok. finished at 6M because of problems encountered at 7M on 131.5M, requiring -fft +6
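The bits/word figures in these lines can be reproduced by dividing the exponent by the FFT length in words. A small Python sketch; the relation FFT length = 2 x width x height x middle is an assumption inferred from the log lines themselves (256x4, 64x4 and 10 give 5120K), and the function names are illustrative.

```python
def fft_words(width: int, height: int, middle: int) -> int:
    # Assumption inferred from the logs: 2 * 1024 * 256 * 10 = 5120K words.
    return 2 * width * height * middle


def bits_per_word(exponent: int, words: int) -> float:
    # Average number of bits of the Mersenne exponent carried per FFT word.
    return exponent / words


words = fft_words(256 * 4, 64 * 4, 10)
print(f"FFT length: {words // 1024}K words")

for exponent in (90710093, 92858651, 93461911, 93873049, 94418047):
    print(f"{exponent}: {bits_per_word(exponent, words):.2f} bits/word")
```

The same formula gives 16.93 for exponent 852348659 at 48M (4K:12:512), matching the Radeon VII log later in the thread.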

All runs -use NO_ASM
V6.11-134 on rx550 from start on 94741139; no GEC to 2.8M iterations; this was at 6M fft, 18370 us/iter due to problems seen with 7M fft (leftover -fft +6 config.txt content)
V6.11-257 continuation on same rx550, 5M fft 1K:5:512; 14310 us/iter to 6.0544M iterations
V6.11-257 continuation on same RX550, no fft specification; 1K:10:256 chosen by program; ~13712 us/iter, 9 GEC errors by 22.1M iterations.
V6.11-259 continuation with -use STATS on same RX550, 1K:10:256, ~16542 us/iter, no additional GEC through 25.64M iterations.
V6.11-259 continuation without -use STATS underway now, 13750 usec/iter

kriesel 2020-04-17 19:57

gpuowl-win v6.11-264-g5c977d4 build
 
2 Attachment(s)
Nice, gracefully deals with the ASM issue. That may have been there a while, I haven't tried it recently until now.

[CODE]2020-04-17 14:17:34 roa/radeonvii ASM compilation failed, retrying compilation using NO_ASM[/CODE]
Please fix the GitHub source for this previously reported minor issue:
[CODE]Gpu.cpp: In member function 'void Gpu::printRoundoff(u32)':
Gpu.cpp:851:35: error: 'M_PI' was not declared in this scope; did you mean 'M_PIl'?
851 | double beta = sdev * (sqrt(6) / M_PI);
| ^~~~
| M_PIl
make: *** [Makefile:30: Gpu.o] Error 1[/CODE]
Please fix the readme.md, which says P-1 iterations for LL; it's P-2. (See [URL]https://www.mersenne.org/various/math.php#lucas-lehmer[/URL])

-use doc from the top of the gpuowl.cl source code; note that there are additional flags derived from these optional inputs:
[CODE]// gpuOwl, an OpenCL Mersenne primality test.
// Copyright Mihai Preda and George Woltman.

/* List of user-serviceable -use flags and their effects

DEBUG : enable asserts. Slow, but allows to verify that all asserts hold.

NO_ASM : request to not use any inline __asm()
NO_OMOD: do not use GCN output modifiers in __asm()

OUT_WG,OUT_SIZEX,OUT_SPACING <AMD default is 256,32,4> <nVidia default is 256,4,1 but needs testing>
IN_WG,IN_SIZEX,IN_SPACING <AMD default is 256,32,1> <nVidia default is 256,4,1 but needs testing>

UNROLL_WIDTH <nVidia default>
NO_UNROLL_WIDTH <AMD default>

OLD_FFT8 <default>
NEWEST_FFT8
NEW_FFT8

OLD_FFT5
NEW_FFT5 <default>
NEWEST_FFT5

NEW_FFT10 <default>
OLD_FFT10

CARRY32 <AMD default for PRP when appropriate>
CARRY64 <nVidia default>, <AMD default for PM1 when appropriate>

CARRYM32
CARRYM64 <default>

ORIG_SLOWTRIG
NEW_SLOWTRIG <default> // Our own sin/cos implementation
ROCM_SLOWTRIG // Use ROCm's private reduced-argument sin/cos

ROCM31 <AMD default> // Enable workaround for ROCm 3.1 bug affecting kcos()
NO_ROCM31 <nVidia default>

---- P-1 below ----

NO_P2_FUSED_TAIL // Do not use the big kernel tailFusedMulDelta

*/
[/CODE]Somehow describing the available values and restrictions would be a useful addition. Granted, we don't want to absorb too much time of the coders in documentation of transient states. However, more readily available info would aid testing support by others and reduce lost time by users.

-use STATS and ROUNDOFF are not listed above. Have they been removed?


There do seem to be some oddities, like getting the message both CARRY32 and CARRY64 have been specified when CARRY32 is specified, or out of resources message and terminate when -use STATS attempted.
[CODE]2020-04-17 14:37:29 gpuowl v6.11-264-g5c977d4-dirty
2020-04-17 14:37:29 config: -device 1 -user kriesel -cpu roa/radeonvii-w2
2020-04-17 14:37:29 device 1, unique id ''
2020-04-17 14:37:29 roa/radeonvii-w2 852348659 FFT: 48M 4K:12:512 (16.93 bpw)
2020-04-17 14:37:29 roa/radeonvii-w2 Expected maximum carry32: 70BA0000
2020-04-17 14:37:41 roa/radeonvii-w2 OpenCL args "-DEXP=852348659u -DWIDTH=4096u -DSMALL_HEIGHT=512u -DMIDDLE=12u -DWEIGHT_STEP=0x8.5ee83d8b97248p-3 -DIWEIGHT_STEP=0xf.4a97a185410b8p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-04-17 14:37:41 roa/radeonvii-w2 ASM compilation failed, retrying compilation using NO_ASM
2020-04-17 14:37:57 roa/radeonvii-w2 OpenCL compilation in 16.49 s
2020-04-17 14:38:07 roa/radeonvii-w2 852348659 OK 165317600 loaded: blockSize 400, ad90fdb1696eabf0
2020-04-17 14:38:25 roa/radeonvii-w2 852348659 OK 165318400 19.40%; 12853 us/it; ETA 102d 04:56; 8dcd685e25e59f08 (check 7.34s) 4 errors
2020-04-17 14:38:53 roa/radeonvii-w2 852348659 OK 165320000 19.40%; 12831 us/it; ETA 102d 00:41; d1ff263c2a76e8c1 (check 7.27s) 4 errors
2020-04-17 14:40:31 roa/radeonvii-w2 Stopping, please wait..
2020-04-17 14:40:42 roa/radeonvii-w2 852348659 OK 165328000 19.40%; 12834 us/it; ETA 102d 01:12; f095cd3f6ba99a30 (check 6.65s) 4 errors
2020-04-17 14:40:42 roa/radeonvii-w2 Exiting because "stop requested"
2020-04-17 14:40:42 roa/radeonvii-w2 Bye

2020-04-17 14:40:47 gpuowl v6.11-264-g5c977d4-dirty
2020-04-17 14:40:47 config: -device 1 -user kriesel -cpu roa/radeonvii-w2 -use NO_ASM,CARRY32,STATS,DEBUG
2020-04-17 14:40:47 device 1, unique id ''
2020-04-17 14:40:47 roa/radeonvii-w2 852348659 FFT: 48M 4K:12:512 (16.93 bpw)
2020-04-17 14:40:47 roa/radeonvii-w2 Expected maximum carry32: 70BA0000
2020-04-17 14:40:59 roa/radeonvii-w2 OpenCL args "-DEXP=852348659u -DWIDTH=4096u -DSMALL_HEIGHT=512u -DMIDDLE=12u -DWEIGHT_STEP=0x8.5ee83d8b97248p-3 -DIWEIGHT_STEP=0xf.4a97a185410b8p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DCARRY32=1 -DDEBUG=1 -DNO_ASM=1 -DSTATS=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-04-17 14:41:00 roa/radeonvii-w2 ASM compilation failed, retrying compilation using NO_ASM
2020-04-17 14:41:00 roa/radeonvii-w2 OpenCL compilation error -11 (args -DEXP=852348659u -DWIDTH=4096u -DSMALL_HEIGHT=512u -DMIDDLE=12u -DWEIGHT_STEP=0x8.5ee83d8b97248p-3 -DIWEIGHT_STEP=0xf.4a97a185410b8p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DCARRY32=1 -DDEBUG=1 -DNO_ASM=1 -DSTATS=1 -cl-fast-relaxed-math -cl-std=CL2.0 -DNO_ASM=1)
2020-04-17 14:41:00 roa/radeonvii-w2 C:\Users\ken\AppData\Local\Temp\\OCL2832T1.cl:82:2: error: Conflict: both CARRY32 and CARRY64 requested
#error Conflict: both CARRY32 and CARRY64 requested
^
1 error generated.

error: Clang front-end compilation failed!
Frontend phase failed compilation.
Error: Compiling CL to IR

2020-04-17 14:41:00 roa/radeonvii-w2 Exception gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:247 build
2020-04-17 14:41:00 roa/radeonvii-w2 Bye

2020-04-17 14:41:33 gpuowl v6.11-264-g5c977d4-dirty
2020-04-17 14:41:33 config: -device 1 -user kriesel -cpu roa/radeonvii-w2 -use NO_ASM,STATS,DEBUG
2020-04-17 14:41:33 device 1, unique id ''
2020-04-17 14:41:33 roa/radeonvii-w2 852348659 FFT: 48M 4K:12:512 (16.93 bpw)
2020-04-17 14:41:33 roa/radeonvii-w2 Expected maximum carry32: 70BA0000
2020-04-17 14:41:45 roa/radeonvii-w2 OpenCL args "-DEXP=852348659u -DWIDTH=4096u -DSMALL_HEIGHT=512u -DMIDDLE=12u -DWEIGHT_STEP=0x8.5ee83d8b97248p-3 -DIWEIGHT_STEP=0xf.4a97a185410b8p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DDEBUG=1 -DNO_ASM=1 -DSTATS=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-04-17 14:42:09 roa/radeonvii-w2 OpenCL compilation in 24.22 s
2020-04-17 14:42:13 roa/radeonvii-w2 Exception gpu_error: OUT_OF_RESOURCES carryFused at clwrap.cpp:310 run
2020-04-17 14:42:13 roa/radeonvii-w2 Bye

2020-04-17 14:42:31 gpuowl v6.11-264-g5c977d4-dirty
2020-04-17 14:42:31 config: -device 1 -user kriesel -cpu roa/radeonvii-w2 -use NO_ASM,STATS,DEBUG,ROUNDOFF
2020-04-17 14:42:31 device 1, unique id ''
2020-04-17 14:42:31 roa/radeonvii-w2 852348659 FFT: 48M 4K:12:512 (16.93 bpw)
2020-04-17 14:42:31 roa/radeonvii-w2 Expected maximum carry32: 70BA0000
2020-04-17 14:42:43 roa/radeonvii-w2 OpenCL args "-DEXP=852348659u -DWIDTH=4096u -DSMALL_HEIGHT=512u -DMIDDLE=12u -DWEIGHT_STEP=0x8.5ee83d8b97248p-3 -DIWEIGHT_STEP=0xf.4a97a185410b8p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DDEBUG=1 -DNO_ASM=1 -DROUNDOFF=1 -DSTATS=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-04-17 14:43:06 roa/radeonvii-w2 OpenCL compilation in 23.37 s
2020-04-17 14:43:11 roa/radeonvii-w2 Exception gpu_error: OUT_OF_RESOURCES carryFused at clwrap.cpp:310 run
2020-04-17 14:43:11 roa/radeonvii-w2 Bye

2020-04-17 14:43:40 gpuowl v6.11-264-g5c977d4-dirty
2020-04-17 14:43:40 config: -device 1 -user kriesel -cpu roa/radeonvii-w2 -use NO_ASM,STATS
2020-04-17 14:43:40 device 1, unique id ''
2020-04-17 14:43:40 roa/radeonvii-w2 852348659 FFT: 48M 4K:12:512 (16.93 bpw)
2020-04-17 14:43:40 roa/radeonvii-w2 Expected maximum carry32: 70BA0000
2020-04-17 14:43:52 roa/radeonvii-w2 OpenCL args "-DEXP=852348659u -DWIDTH=4096u -DSMALL_HEIGHT=512u -DMIDDLE=12u -DWEIGHT_STEP=0x8.5ee83d8b97248p-3 -DIWEIGHT_STEP=0xf.4a97a185410b8p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DNO_ASM=1 -DSTATS=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-04-17 14:44:10 roa/radeonvii-w2 OpenCL compilation in 18.00 s
2020-04-17 14:44:14 roa/radeonvii-w2 Exception gpu_error: OUT_OF_RESOURCES carryFused at clwrap.cpp:310 run
2020-04-17 14:44:14 roa/radeonvii-w2 Bye

2020-04-17 14:44:54 gpuowl v6.11-264-g5c977d4-dirty
2020-04-17 14:44:54 config: -device 1 -user kriesel -cpu roa/radeonvii-w2 -use NO_ASM,DEBUG
2020-04-17 14:44:54 device 1, unique id ''
2020-04-17 14:44:54 roa/radeonvii-w2 852348659 FFT: 48M 4K:12:512 (16.93 bpw)
2020-04-17 14:44:54 roa/radeonvii-w2 Expected maximum carry32: 70BA0000
2020-04-17 14:45:06 roa/radeonvii-w2 OpenCL args "-DEXP=852348659u -DWIDTH=4096u -DSMALL_HEIGHT=512u -DMIDDLE=12u -DWEIGHT_STEP=0x8.5ee83d8b97248p-3 -DIWEIGHT_STEP=0xf.4a97a185410b8p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DDEBUG=1 -DNO_ASM=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-04-17 14:45:31 roa/radeonvii-w2 OpenCL compilation in 24.75 s
2020-04-17 14:45:43 roa/radeonvii-w2 852348659 OK 165328000 loaded: blockSize 400, f095cd3f6ba99a30
2020-04-17 14:46:07 roa/radeonvii-w2 852348659 OK 165328800 19.40%; 17545 us/it; ETA 139d 12:21; ebb1fef7de6d72cf (check 9.52s) 4 errors
[/CODE]
Debug overhead seems to be around 35%; roundoff 0.03%, based on very brief trials.
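The overhead estimate can be checked against the us/it figures in the log above (12853 us/it without DEBUG, 17545 us/it with NO_ASM,DEBUG). A minimal Python sketch; the function names are illustrative, and the ETA helper assumes the test needs roughly as many iterations as the exponent, which agrees with the 19.40% progress shown.

```python
def overhead_percent(baseline_us: float, measured_us: float) -> float:
    # Relative slowdown of the measured run versus the baseline run.
    return (measured_us - baseline_us) / baseline_us * 100.0


def eta_days(exponent: int, iteration: int, us_per_iter: float) -> float:
    # Assumption: about `exponent` iterations in total.
    remaining = exponent - iteration
    seconds = remaining * us_per_iter / 1_000_000
    return seconds / 86_400


debug_overhead = overhead_percent(12853, 17545)
print(f"DEBUG overhead: about {debug_overhead:.1f}%")

days = eta_days(852348659, 165318400, 12853)
print(f"ETA without DEBUG: about {days:.1f} days")
```

With these two samples the overhead comes out slightly above 36%, and the ETA is close to the 102 days printed by the program. Both are based on very short runs, so the figures are rough.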

preda 2020-04-17 22:30

[QUOTE=kriesel;542984]

-use STATS and ROUNDOFF are not listed above. Have they been removed?

There do seem to be some oddities, like getting the message both CARRY32 and CARRY64 have been specified when CARRY32 is specified, or out of resources message and terminate when -use STATS attempted.[/QUOTE]

(the small problems fixed in a recent commit: README p-1, M_PI)

- ROUNDOFF does not exist anymore, it's called STATS now.
- with that exponent, CARRY64 is required so the software inserts it for you. At the same time, you manually request CARRY32. This situation is reported as a conflict. This hand-holding has the advantage to protect against users specifying invalid CARRY32 which would be severe for a naked LL
- do you have more details about the "out of resources" with STATS?
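The conflict described above can be sketched as follows. This is not gpuowl's actual code, only a hypothetical Python illustration of the behaviour as explained: the program adds CARRY64 when the exponent requires it, and a manual CARRY32 request then collides with it. How gpuowl decides that CARRY64 is required is not shown in this thread, so it is passed in as a plain boolean here.

```python
from typing import Iterable, Set


def resolve_carry_flags(user_flags: Iterable[str], carry64_required: bool) -> Set[str]:
    """Hypothetical model of the CARRY32/CARRY64 hand-holding."""
    flags = set(user_flags)
    if carry64_required:
        # Inserted automatically when the exponent needs the wider carry.
        flags.add("CARRY64")
    if "CARRY32" in flags and "CARRY64" in flags:
        # Same wording as the #error seen in the OpenCL build log.
        raise ValueError("Conflict: both CARRY32 and CARRY64 requested")
    return flags


print(sorted(resolve_carry_flags(["NO_ASM", "STATS"], carry64_required=True)))

try:
    resolve_carry_flags(["NO_ASM", "CARRY32"], carry64_required=True)
except ValueError as err:
    print("rejected:", err)
```

Failing loudly instead of silently dropping one of the flags is what protects a run from an invalid CARRY32.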

kriesel 2020-04-17 23:34

[QUOTE=preda;543008](the small problems fixed in a recent commit: README p-1, M_PI)

- ROUNDOFF does not exist anymore, it's called STATS now.
- with that exponent, CARRY64 is required so the software inserts it for you. At the same time, you manually request CARRY32. This situation is reported as a conflict. This hand-holding has the advantage to protect against users specifying invalid CARRY32 which would be severe for a naked LL[/QUOTE]Thanks for the above.
[QUOTE]- do you have more details about the "out of resources" with STATS?[/QUOTE]I don't have any more than what the console showed in the preceding post's long CODE section. gpuowl.log says the same; see following.[CODE]2020-04-17 14:43:40 config: -device 1 -user kriesel -cpu roa/radeonvii-w2 -use NO_ASM,STATS
2020-04-17 14:43:40 config: ;,DEBUG,ROUNDOFF
2020-04-17 14:43:40 device 1, unique id ''
2020-04-17 14:43:40 roa/radeonvii-w2 852348659 FFT: 48M 4K:12:512 (16.93 bpw)
2020-04-17 14:43:40 roa/radeonvii-w2 Expected maximum carry32: 70BA0000
2020-04-17 14:43:52 roa/radeonvii-w2 OpenCL args "-DEXP=852348659u -DWIDTH=4096u -DSMALL_HEIGHT=512u -DMIDDLE=12u -DWEIGHT_STEP=0x8.5ee83d8b97248p-3 -DIWEIGHT_STEP=0xf.4a97a185410b8p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DNO_ASM=1 -DSTATS=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-04-17 14:44:10 roa/radeonvii-w2 OpenCL compilation in 18.00 s
2020-04-17 14:44:14 roa/radeonvii-w2 Exception gpu_error: OUT_OF_RESOURCES carryFused at clwrap.cpp:310 run
2020-04-17 14:44:14 roa/radeonvii-w2 Bye
[/CODE]I'll try -use STATS on some other exponents / fft lengths.

kriesel 2020-04-18 17:40

-use STATS through the occurrence of a GEC error
 
Finally caught a GEC error with -use STATS in effect, after ~ 30 hours, on gpuowl-win v6.11-259, rx550 that was reliable previously. Maybe the fft length's exponent limit is just slightly too high and this exponent is pushing it. See also [URL]https://mersenneforum.org/showpost.php?p=542963&postcount=2094[/URL] and [URL]https://mersenneforum.org/showthread.php?t=25452[/URL]

The program's help output says the 5M fft is usable up to 95.71M.[CODE]
2020-04-18 00:29:06 condorella/rx550 94741139 OK 28880000 30.48%; 16548 us/it; ETA 12d 14:44; dee8912001b65af1 (check 6.82s) 9 errors
2020-04-18 00:40:08 condorella/rx550 Roundoff: N=40500, max 0.300651, avg 0.205114, sdev 0.012225 (0.059601, 0.061244), max-round 0.400715
2020-04-18 00:40:08 condorella/rx550 Carry: N=40499, max 3ddbbf85, avg 2b51cb2f; CarryM: N=1, max 81bd1be4, avg 81bd1be4
2020-04-18 00:40:15 condorella/rx550 94741139 OK 28920000 30.53%; 16549 us/it; ETA 12d 14:35; 75ec23b835c4110e (check 6.91s) 9 errors
2020-04-18 00:51:17 condorella/rx550 Roundoff: N=40500, max 0.287802, avg 0.205098, sdev 0.012112 (0.059053, 0.060665), max-round 0.398885
2020-04-18 00:51:17 condorella/rx550 Carry: N=40499, max 3ab5e0b9, avg 2b54d6af; CarryM: N=1, max 8a7e172e, avg 8a7e172e
2020-04-18 00:51:23 condorella/rx550 94741139 OK 28960000 30.57%; 16543 us/it; ETA 12d 14:17; bd54d7293f55a663 (check 6.80s) 9 errors
2020-04-18 01:02:25 condorella/rx550 Roundoff: N=40500, max 0.303200, avg 0.205065, sdev 0.012201 (0.059499, 0.061136), max-round 0.400285
2020-04-18 01:02:25 condorella/rx550 Carry: N=40499, max 3c80220d, avg 2b50c51e; CarryM: N=1, max 904d2cef, avg 904d2cef
2020-04-18 01:02:32 condorella/rx550 94741139 OK 29000000 30.61%; 16550 us/it; ETA 12d 14:14; 4abacbf0300b8d02 (check 6.78s) 9 errors
2020-04-18 01:13:34 condorella/rx550 Roundoff: N=40500, [B]max 0.507677[/B], avg 0.205098, sdev 0.012272 (0.059834, 0.061490), max-round 0.401449
2020-04-18 01:13:34 condorella/rx550 Carry: N=40499, max 3ca2ec25, avg 2b5134d2; CarryM: N=1, max 8ee0956e, avg 8ee0956e
2020-04-18 01:13:41 condorella/rx550 94741139 [COLOR=Red][B]EE[/B][/COLOR] 29040000 30.65%; 16547 us/it; ETA 12d 14:00; 94dffebb91e5bac9 (check 6.79s) 9 errors
2020-04-18 01:13:48 condorella/rx550 94741139 OK 29000000 loaded: blockSize 400, 4abacbf0300b8d02
2020-04-18 01:24:50 condorella/rx550 Roundoff: N=40928, max 0.308405, avg 0.205040, sdev 0.012282 (0.059900, 0.061560), max-round 0.401552
2020-04-18 01:24:50 condorella/rx550 Carry: N=40926, max 3ca2ec25, avg 2b4e7a6d; CarryM: N=2, max 879ffbad, avg 71a4f7ac
2020-04-18 01:24:57 condorella/rx550 94741139 OK 29040000 30.65%; 16552 us/it; ETA 12d 14:05; 94dffebb91e5bac9 (check 6.81s) 10 errors
2020-04-18 01:35:59 condorella/rx550 Roundoff: N=40500, max 0.289172, avg 0.205136, sdev 0.012255 (0.059742, 0.061392), max-round 0.401219
2020-04-18 01:35:59 condorella/rx550 Carry: N=40499, max 3cd22042, avg 2b55bcd1; CarryM: N=1, max 8e3bb9df, avg 8e3bb9df
2020-04-18 01:36:06 condorella/rx550 94741139 OK 29080000 30.69%; 16549 us/it; ETA 12d 13:50; 1af282f82b04e1e9 (check 6.79s) 10 errors
2020-04-18 01:47:08 condorella/rx550 Roundoff: N=40500, max 0.286431, avg 0.205259, sdev 0.012291 (0.059879, 0.061538), max-round 0.401912
2020-04-18 01:47:08 condorella/rx550 Carry: N=40499, max 3c82a79c, avg 2b52a12b; CarryM: N=1, max 7cc1bbcb, avg 7cc1bbcb
2020-04-18 01:47:14 condorella/rx550 94741139 OK 29120000 30.74%; 16545 us/it; ETA 12d 13:35; ca53d4b9ba5f4404 (check 6.80s) 10 errors
2020-04-18 01:58:16 condorella/rx550 Roundoff: N=40500, max 0.311799, avg 0.205057, sdev 0.012195 (0.059469, 0.061105), max-round 0.400171
2020-04-18 01:58:16 condorella/rx550 Carry: N=40499, max 390cef07, avg 2b50e612; CarryM: N=1, max 81a290db, avg 81a290db
2020-04-18 01:58:23 condorella/rx550 94741139 OK 29160000 30.78%; 16549 us/it; ETA 12d 13:29; c18cada5e14a957b (check 6.79s) 10 errors
2020-04-18 02:09:25 condorella/rx550 Roundoff: N=40500, max 0.306561, avg 0.205154, sdev 0.012206 (0.059498, 0.061135), max-round 0.400454
2020-04-18 02:09:25 condorella/rx550 Carry: N=40499, max 3b2efd5d, avg 2b527db0; CarryM: N=1, max 82169439, avg 82169439
2020-04-18 02:09:32 condorella/rx550 94741139 OK 29200000 30.82%; 16544 us/it; ETA 12d 13:12; e8410c7d73b8f089 (check 6.81s) 10 errors
2020-04-18 02:20:34 condorella/rx550 Roundoff: N=40500, max 0.299344, avg 0.205238, sdev 0.012232 (0.059601, 0.061244), max-round 0.400957
2020-04-18 02:20:34 condorella/rx550 Carry: N=40499, max 39891e5b, avg 2b510501; CarryM: N=1, max 7aa1e3fc, avg 7aa1e3fc[/CODE]
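To spot outliers like the 0.507677 above without reading the log by eye, the Roundoff lines can be filtered with a short script. A Python sketch, assuming the line layout shown in this post; the regular expressions and function name are my own, and BBCode tags are stripped first because the forum copy of the log contains them.

```python
import re

BBCODE_RE = re.compile(r"\[/?[A-Za-z]+(?:=[^\]]*)?\]")
ROUNDOFF_RE = re.compile(
    r"Roundoff: N=(\d+), max ([\d.]+), avg ([\d.]+), sdev ([\d.]+)"
)


def parse_roundoff(line: str):
    """Return the Roundoff statistics of a log line, or None if it has none."""
    match = ROUNDOFF_RE.search(BBCODE_RE.sub("", line))
    if match is None:
        return None
    n, max_err, avg, sdev = match.groups()
    return {"n": int(n), "max": float(max_err), "avg": float(avg), "sdev": float(sdev)}


sample = [
    "2020-04-18 01:02:25 condorella/rx550 Roundoff: N=40500, max 0.303200, "
    "avg 0.205065, sdev 0.012201 (0.059499, 0.061136), max-round 0.400285",
    "2020-04-18 01:13:34 condorella/rx550 Roundoff: N=40500, [B]max 0.507677[/B], "
    "avg 0.205098, sdev 0.012272 (0.059834, 0.061490), max-round 0.401449",
]

for line in sample:
    stats = parse_roundoff(line)
    if stats and stats["max"] > 0.5:
        print("suspicious max roundoff:", stats["max"])
```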

ewmayer 2020-04-18 19:37

Quibble - reported roundoff error should never be > 0.5, as the fractional part is computed as abs(x - rnd(x)).

preda 2020-04-18 21:57

I see in this case that the residue was correct when the error was reported (because the line with "OK" on the same iteration has the same residue). Is this a pattern -- do you see the same for the previous errors?
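Checking for that pattern in a longer log can be automated. A Python sketch, assuming the status-line layout seen in the quoted log; the regular expressions and function name are illustrative, not part of gpuowl.

```python
import re

BBCODE_RE = re.compile(r"\[/?[A-Za-z]+(?:=[^\]]*)?\]")
STATUS_RE = re.compile(
    r"(\d+) (OK|EE)\s+(\d+)\s+[\d.]+%;.*?;\s([0-9a-f]{16}) \(check"
)


def ee_residue_matches(lines):
    """For each EE line, report whether the later OK line at the same
    iteration shows the same residue. Returns (iteration, same) pairs."""
    pending = {}  # iteration -> residue printed on the EE line
    results = []
    for line in lines:
        match = STATUS_RE.search(BBCODE_RE.sub("", line))
        if match is None:
            continue  # "loaded", Roundoff and Carry lines are skipped
        _exponent, status, iteration, residue = match.groups()
        iteration = int(iteration)
        if status == "EE":
            pending[iteration] = residue
        elif iteration in pending:
            results.append((iteration, pending.pop(iteration) == residue))
    return results


log = [
    "2020-04-18 01:13:41 condorella/rx550 94741139 [COLOR=Red][B]EE[/B][/COLOR] 29040000 30.65%; 16547 us/it; ETA 12d 14:00; 94dffebb91e5bac9 (check 6.79s) 9 errors",
    "2020-04-18 01:13:48 condorella/rx550 94741139 OK 29000000 loaded: blockSize 400, 4abacbf0300b8d02",
    "2020-04-18 01:24:57 condorella/rx550 94741139 OK 29040000 30.65%; 16552 us/it; ETA 12d 14:05; 94dffebb91e5bac9 (check 6.81s) 10 errors",
]
print(ee_residue_matches(log))
```

For the three lines above this reports iteration 29040000 with matching residues, which is the case described here.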

The most likely explanation is still GPU error (either memory-related or processor related). Do you have another similar GPU to try on, for comparison?

The roundoff being large is most likely a red herring here.

[QUOTE=kriesel;543080]Finally caught a GEC error with -use STATS in effect, after ~ 30 hours, on gpuowl-win v6.11-259, rx550 that was reliable previously. Maybe the fft length's exponent limit is just slightly too high and this exponent is pushing it. See also [URL]https://mersenneforum.org/showpost.php?p=542963&postcount=2094[/URL] and [URL]https://mersenneforum.org/showthread.php?t=25452[/URL]

The program's help output says the 5M fft is usable up to 95.71M.[CODE]
2020-04-18 01:13:34 condorella/rx550 Roundoff: N=40500, [B]max 0.507677[/B], avg 0.205098, sdev 0.012272 (0.059834, 0.061490), max-round 0.401449
2020-04-18 01:13:34 condorella/rx550 Carry: N=40499, max 3ca2ec25, avg 2b5134d2; CarryM: N=1, max 8ee0956e, avg 8ee0956e
2020-04-18 01:13:41 condorella/rx550 94741139 [COLOR=Red][B]EE[/B][/COLOR] 29040000 30.65%; 16547 us/it; ETA 12d 14:00; 94dffebb91e5bac9 (check 6.79s) 9 errors
2020-04-18 01:13:48 condorella/rx550 94741139 OK 29000000 loaded: blockSize 400, 4abacbf0300b8d02
2020-04-18 01:24:50 condorella/rx550 Roundoff: N=40928, max 0.308405, avg 0.205040, sdev 0.012282 (0.059900, 0.061560), max-round 0.401552
2020-04-18 01:24:50 condorella/rx550 Carry: N=40926, max 3ca2ec25, avg 2b4e7a6d; CarryM: N=2, max 879ffbad, avg 71a4f7ac
2020-04-18 01:24:57 condorella/rx550 94741139 OK 29040000 30.65%; 16552 us/it; ETA 12d 14:05; 94dffebb91e5bac9 (check 6.81s) 10 errors
[/CODE][/QUOTE]

preda 2020-04-18 22:06

[QUOTE=ewmayer;543087]Quibble - reported roundoff error should never be > 0.5, as the fractional part is computed as abs(x - rnd(x)).[/QUOTE]

More precisely, given reverse-weight "w" and FFT-output word "x", the error is computed as:

abs(FMA(x, w, -rint(x * w)));

which, arguably, can be larger than 0.5.
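A small constructed example of how that can happen: rint() sees the already rounded product, while the FMA subtracts from the exact product. The Python sketch below uses exact fractions to stand in for the FMA; the values of x and w are chosen only to trigger the effect and are not taken from gpuowl, and the excess over 0.5 is tiny here, so this does not by itself explain a value like 0.507677.

```python
from fractions import Fraction

# Both values are exactly representable as doubles (illustrative choices).
x = float(2**27 + 1)
w = (2**27 + 1) * 2.0**-29

rounded_product = x * w           # the double nearest to x*w: 33554432.5
nearest = round(rounded_product)  # ties go to even, like rint(): 33554432

# Exact x*w, as an FMA would see it before its single final rounding.
exact_product = Fraction(x) * Fraction(w)

fma_style_error = abs(exact_product - nearest)            # 1/2 + 2**-29
plain_error = abs(Fraction(rounded_product) - nearest)    # exactly 1/2

print("FMA-style error:", float(fma_style_error))
print("abs(p - rint(p)) on the rounded product:", float(plain_error))
```

The exact product lies just above the tie point, the rounded product lands exactly on it, and rounding the tie to the even neighbour leaves the FMA-style error slightly above 0.5.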


All times are UTC.