[QUOTE=moebius;556560]If it is a PRP test, there should be a 107826457-old.owl file in the directory of the exponent, For LL I don't know.[/QUOTE]
Yes, the file is there. Should I throw away the non-old file, and rename the old file to 107826457.owl ? |
[QUOTE=Viliam Furik;556562]Yes, the file is there. Should I throw away the non-old file, and rename the old file to 107826457.owl ?[/QUOTE]
Yes, but make a copy of both files beforehand, then first try the original 107826457.owl at restart. |
[QUOTE=moebius;556564]Yes, but make a copy of both files beforehand, then first try the original 107826457.owl at restart.[/QUOTE]
It worked! Thank you very much! |
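For anyone who wants to script the recovery steps from the exchange above (copy both savefiles, try the original first, and only then swap in the old one), here is a minimal Python sketch. It assumes the savefiles are named `<exponent>.owl` and `<exponent>-old.owl` as in this thread. The function names and the `backup` folder name are made up for illustration and are not part of gpuowl.

```python
import shutil
from pathlib import Path


def backup_savefiles(exp_dir, exponent):
    """Copy <exponent>.owl and <exponent>-old.owl into a backup subfolder.

    Returns the names of the files that were actually copied.
    """
    exp_dir = Path(exp_dir)
    backup_dir = exp_dir / "backup"  # hypothetical folder name
    backup_dir.mkdir(exist_ok=True)
    copied = []
    for name in (f"{exponent}.owl", f"{exponent}-old.owl"):
        src = exp_dir / name
        if src.exists():
            shutil.copy2(src, backup_dir / name)  # copy2 keeps timestamps
            copied.append(name)
    return copied


def promote_old_savefile(exp_dir, exponent):
    """Replace <exponent>.owl with <exponent>-old.owl.

    Only do this after the original savefile has failed at restart.
    """
    exp_dir = Path(exp_dir)
    old = exp_dir / f"{exponent}-old.owl"
    current = exp_dir / f"{exponent}.owl"
    if not old.exists():
        raise FileNotFoundError(old)
    old.replace(current)  # overwrites the non-old file
    return current
```

Run `backup_savefiles` first, restart gpuowl with the untouched files, and call `promote_old_savefile` only if that restart fails.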
We are experiencing a laggy display using gpuowl with Linux on an Nvidia card. We didn't notice any lag with our AMD cards. Is there any way to fix this or diagnose what we are doing wrong?
:mike: |
[QUOTE=Xyzzy;556573]We are experiencing a laggy display using gpuowl with Linux on an Nvidia card. We didn't notice any lag with our AMD cards. Is there any way to fix this or diagnose what we are doing wrong?
:mike:[/QUOTE] You might try a lower -block value, something like 200, 100, or 50. This should lower the number of kernels that are queued to the GPU at once, with the side effect of slightly reducing efficiency. Anyway, if a lower block size fixes your problem, I can look into making that a special behavior for Nvidia, without the need to change the block size. |
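A rough way to see why a smaller -block value might help with display lag is to estimate how long one batch of queued work keeps the GPU busy. The sketch below assumes, purely for illustration, that about one block of iterations is queued at a time. That is a guess, not a description of gpuowl's actual queuing. The 1000 us/it figure is also a placeholder, so substitute the us/it value from your own log.

```python
def batch_time_ms(block_size, us_per_iter):
    """Estimated GPU busy time in milliseconds for one queued batch.

    Assumes one batch is block_size iterations (an illustrative assumption).
    """
    return block_size * us_per_iter / 1000.0


if __name__ == "__main__":
    us_per_iter = 1000.0  # placeholder; read the real value from your log
    for block in (400, 200, 100, 50):
        print(f"-block {block}: about {batch_time_ms(block, us_per_iter):.0f} ms per batch")
```

Under these assumptions the busy time scales linearly with the block size, so halving -block halves the longest stretch during which the display has to wait.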
[QUOTE=preda;556619][B]but it has the side effect of reducing a bit the efficiency.[/B] Anyway, if a lower block-size fixed your problem, [B]I can look into making that a special behavior for Nvidia without needing to change the block-size.[/B][/QUOTE]
Please don't do this! I think it depends on the specific Nvidia card he uses. Not every gpuowl user is familiar enough with the settings to notice on their own that the default block size has to be changed in order to use the full compute power. For example, the current Win64 gpuowl version 6.11-380 is around 100 us/it slower on a Vega 64 than version 6.11-364 at FFT 5.5M, and I don't know the reason for this behaviour. |
Had no issues with this version running LL/P-1... what am I doing wrong here?
(same result without -use arguments)
[code]
2020-09-13 09:08:43 gpuowl v6.11-380-g79ea0cc
2020-09-13 09:08:43 config: -use OUT_SIZEX=8,OUT_SPACING=2 -user kracker -cpu core -cleanup
2020-09-13 09:08:43 config: -prp 104914741
2020-09-13 09:08:43 device 0, unique id ''
2020-09-13 09:08:43 core 104914741 FFT: 5.50M 1K:11:256 (18.19 bpw)
2020-09-13 09:08:43 core Expected maximum carry32: 506F0000
2020-09-13 09:08:44 core OpenCL args "-DEXP=104914741u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=11u -DPM1=0 -DAMDGPU=1 -DMM2_CHAIN=1u -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0xc.04912aa6417p-4 -DIWEIGHT_STEP_MINUS_1=-0xd.b9d67ad213798p-5 -DOUT_SIZEX=8 -DOUT_SPACING=2 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-09-13 09:08:44 core ASM compilation failed, retrying compilation using NO_ASM
2020-09-13 09:08:48 core OpenCL compilation in 3.81 s
2020-09-13 09:08:50 core 104914741 OK 0 loaded: blockSize 400, 0000000000000003
2020-09-13 09:08:50 core validating proof residues for power 8
2020-09-13 09:08:50 core Proof using power 8
2020-09-13 09:08:54 core 104914741 EE 800 0.00%; 3902 us/it; ETA 4d 17:43; 281087c3716953d2 (check 1.69s)
2020-09-13 09:08:56 core 104914741 OK 0 loaded: blockSize 400, 0000000000000003
2020-09-13 09:09:01 core 104914741 EE 800 0.00%; 3903 us/it; ETA 4d 17:44; 281087c3716953d2 (check 1.69s) 1 errors
2020-09-13 09:09:03 core 104914741 OK 0 loaded: blockSize 400, 0000000000000003
2020-09-13 09:09:03 core Stopping, please wait..
2020-09-13 09:09:06 core 104914741 EE 400 0.00%; 3906 us/it; ETA 4d 17:50; 036344df470abd74 (check 1.67s) 2 errors
2020-09-13 09:09:06 core 3 sequential errors, will stop.
2020-09-13 09:09:06 core Exiting because "too many errors"
2020-09-13 09:09:06 core Bye
[/code] |
[QUOTE=kracker;556882]2020-09-13 09:09:06 core 3 sequential errors, will stop.
2020-09-13 09:09:06 core Exiting because "too many errors"
2020-09-13 09:09:06 core Bye[/QUOTE]
I have had the same problem with 3 errors only once, with gpuowl PRP, when starting over on an exponent with a Tesla K80 in Colab (Ubuntu). But I think the people at Google immediately threw the defective card into the trash can. |
I don't know. I verified that the residue you see at #800 (281087c3716953d2) is correct (I get the same), so the error is affecting the check or something around it, not the core computation. Can you run the exponent on a different GPU?
[QUOTE=kracker;556882]Had no issues with this version running LL/P-1... what am I doing wrong here?
(same result without -use arguments)
[code]
2020-09-13 09:08:43 gpuowl v6.11-380-g79ea0cc
2020-09-13 09:08:43 config: -use OUT_SIZEX=8,OUT_SPACING=2 -user kracker -cpu core -cleanup
2020-09-13 09:08:43 config: -prp 104914741
2020-09-13 09:08:43 device 0, unique id ''
2020-09-13 09:08:43 core 104914741 FFT: 5.50M 1K:11:256 (18.19 bpw)
2020-09-13 09:08:43 core Expected maximum carry32: 506F0000
2020-09-13 09:08:44 core OpenCL args "-DEXP=104914741u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=11u -DPM1=0 -DAMDGPU=1 -DMM2_CHAIN=1u -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0xc.04912aa6417p-4 -DIWEIGHT_STEP_MINUS_1=-0xd.b9d67ad213798p-5 -DOUT_SIZEX=8 -DOUT_SPACING=2 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-09-13 09:08:44 core ASM compilation failed, retrying compilation using NO_ASM
2020-09-13 09:08:48 core OpenCL compilation in 3.81 s
2020-09-13 09:08:50 core 104914741 OK 0 loaded: blockSize 400, 0000000000000003
2020-09-13 09:08:50 core validating proof residues for power 8
2020-09-13 09:08:50 core Proof using power 8
2020-09-13 09:08:54 core 104914741 EE 800 0.00%; 3902 us/it; ETA 4d 17:43; 281087c3716953d2 (check 1.69s)
2020-09-13 09:08:56 core 104914741 OK 0 loaded: blockSize 400, 0000000000000003
2020-09-13 09:09:01 core 104914741 EE 800 0.00%; 3903 us/it; ETA 4d 17:44; 281087c3716953d2 (check 1.69s) 1 errors
2020-09-13 09:09:03 core 104914741 OK 0 loaded: blockSize 400, 0000000000000003
2020-09-13 09:09:03 core Stopping, please wait..
2020-09-13 09:09:06 core 104914741 EE 400 0.00%; 3906 us/it; ETA 4d 17:50; 036344df470abd74 (check 1.67s) 2 errors
2020-09-13 09:09:06 core 3 sequential errors, will stop.
2020-09-13 09:09:06 core Exiting because "too many errors"
2020-09-13 09:09:06 core Bye
[/code][/QUOTE] |
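To compare the residues from a log like the one quoted above against a run on another machine, the status lines can be pulled out with a small parser. This is a minimal sketch whose regular expression is modelled only on the OK/EE lines shown in this thread. Other gpuowl versions may format these lines differently, and the function names are hypothetical.

```python
import re

# Matches status lines such as:
#   "... core 104914741 EE 800 0.00%; 3902 us/it; ...; 281087c3716953d2 (check 1.69s)"
#   "... core 104914741 OK 0 loaded: blockSize 400, 0000000000000003"
STATUS_RE = re.compile(
    r"\b(?P<exponent>\d+)\s+(?P<status>OK|EE)\s+(?P<iteration>\d+)\b"
    r".*?\b(?P<residue>[0-9a-f]{16})\b"
)


def parse_status_lines(log_text):
    """Return a list of (status, iteration, residue) tuples from a log."""
    rows = []
    for line in log_text.splitlines():
        m = STATUS_RE.search(line)
        if m:
            rows.append((m["status"], int(m["iteration"]), m["residue"]))
    return rows


def residues_at(log_text, iteration):
    """Return the set of residues reported at a given iteration."""
    return {res for _, it, res in parse_status_lines(log_text) if it == iteration}
```

Calling `residues_at` on two logs with the same iteration number and comparing the returned sets shows whether both runs agree on the residue, which is the comparison described above for iteration 800.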
[QUOTE=kracker;556882]Had no issues with this version running LL/P-1... what am I doing wrong here?[/QUOTE]
Check CPU-side component temperatures, especially system RAM.
I had a system with similar problems, eventually traced to RAM sticks running hotter than their rated operating temperature range because of a bad fan. PRP with GEC detects errors most completely and most frequently; LL only sometimes spots them, and P-1 does not spot them at all. As I recall, PRP computations occur on the GPU and its RAM, and the data gets brought over to the CPU side for the GEC. The GPU data can be fine while an error is introduced in the CPU RAM. |
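For readers unfamiliar with GEC (the Gerbicz error check), the idea can be shown with a toy Python model. This is a sketch of the general technique on a small Mersenne number, not gpuowl's implementation. The function name, the tiny block size, and the bit-flip fault are all invented for illustration. The check relies on the identity d_new = d_prev^(2^L) * u_0 (mod N), where d is the running product of the residue taken every L squarings.

```python
def first_failed_block(p, block, n_blocks, corrupt_at=None):
    """Square 3 repeatedly mod 2^p - 1 and verify each block Gerbicz-style.

    Returns the 1-based index of the first block whose check fails,
    or None if every check passes. If corrupt_at is given, one bit of
    the residue is flipped right after that iteration to simulate a
    hardware error.
    """
    n = (1 << p) - 1
    u0 = 3
    x = u0   # current residue u_i = 3^(2^i) mod n
    d = u0   # running product of the residue at every block boundary
    iteration = 0
    for k in range(1, n_blocks + 1):
        d_prev = d
        for _ in range(block):
            x = x * x % n
            iteration += 1
            if iteration == corrupt_at:
                x ^= 1  # simulated single-bit memory error
        d = d_prev * x % n
        # Independent path: raise the previous product to 2^block, times u0.
        expected = pow(d_prev, 1 << block, n) * u0 % n
        if d != expected:
            return k
    return None
```

With `p=127`, `block=10` and `n_blocks=5`, a clean run passes every check, while a bit flip injected at iteration 15 is caught when the second block is verified. The same mechanism explains the scenario described above: if the data is corrupted on either side of the comparison, for example in system RAM, the check fails even though the main computation was correct.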
[QUOTE=preda;556910]I don't know. I verified that the residue you see at #800 (281087c3716953d2) is correct (I get the same), so the error is affecting the check or something around it, not the core computation. Can you run the exponent on a different GPU?[/QUOTE]
Sadly, no.
[QUOTE=kriesel;556911]Check cpu-side component temperatures especially system ram. I had a system with similar problems, traced eventually to hot ram sticks because of a bad fan. (Hotter than rated operating temperature range.)[/QUOTE]
99% sure I don't have any issues there. |