Old 2020-09-09, 16:23   #2432
Viliam Furik
 
Jul 2018
Martin, Slovakia

2³·3³ Posts
Default

Quote:
Originally Posted by moebius View Post
If it is a PRP test, there should be a 107826457-old.owl file in the directory of the exponent. For LL, I don't know.
Yes, the file is there. Should I throw away the non-old file and rename the old file to 107826457.owl?
Old 2020-09-09, 16:29   #2433
moebius
 
Jul 2009
Germany

2×3³×7 Posts
Default

Quote:
Originally Posted by Viliam Furik View Post
Yes, the file is there. Should I throw away the non-old file and rename the old file to 107826457.owl?
Yes, but make a copy of both files first, then try the original 107826457.owl at restart before anything else.
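
For anyone following along, the advice above can be sketched as a short Python script. This is only an illustration and not part of gpuowl: the function names and the backup directory are made up, and it assumes both checkpoint files sit in the per-exponent directory as described above.

```python
import shutil
from pathlib import Path


def backup_checkpoints(exponent_dir, exponent, backup_dir):
    """Copy <exponent>.owl and <exponent>-old.owl into backup_dir before touching them."""
    src = Path(exponent_dir)
    dst = Path(backup_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in (f"{exponent}.owl", f"{exponent}-old.owl"):
        path = src / name
        if path.exists():
            shutil.copy2(path, dst / name)  # copy2 also preserves timestamps
            copied.append(name)
    return copied


def fall_back_to_old(exponent_dir, exponent):
    """Replace <exponent>.owl with <exponent>-old.owl (run backup_checkpoints first)."""
    src = Path(exponent_dir)
    old = src / f"{exponent}-old.owl"
    current = src / f"{exponent}.owl"
    old.replace(current)  # overwrites the current checkpoint
```

The idea is to call `backup_checkpoints` once, restart with the untouched 107826457.owl, and only call `fall_back_to_old` if that restart fails.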

Last fiddled with by moebius on 2020-09-09 at 16:30
Old 2020-09-09, 17:09   #2434
Viliam Furik
 
Jul 2018
Martin, Slovakia

D8₁₆ Posts
Default

Quote:
Originally Posted by moebius View Post
Yes, but make a copy of both files first, then try the original 107826457.owl at restart before anything else.
It worked! Thank you very much!
Old 2020-09-09, 17:12   #2435
Xyzzy
 
"Mike"
Aug 2002

17031₈ Posts
Default

We are experiencing a laggy display using gpuowl with Linux on an Nvidia card. We didn't notice any lag with our AMD cards. Is there any way to fix this or diagnose what we are doing wrong?

Old 2020-09-10, 08:46   #2436
preda
 
"Mihai Preda"
Apr 2015

1291₁₀ Posts
Default

Quote:
Originally Posted by Xyzzy View Post
We are experiencing a laggy display using gpuowl with Linux on an Nvidia card. We didn't notice any lag with our AMD cards. Is there any way to fix this or diagnose what we are doing wrong?

You might try a lower -block value, something like 200, 100, or 50. This should lower the number of kernels that are queued to the GPU at once, but it has the side effect of reducing efficiency a bit. Anyway, if a lower block size fixes your problem, I can look into making that a special behavior for Nvidia without needing to change the block size.
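
A rough way to see why a smaller -block could help with display lag, sketched in Python. It assumes, as a simplification that the thread does not confirm, that one block's worth of iterations is queued at once and that the display has to wait behind it. The function name and the 1000 us/it figure are hypothetical; 400 is the block size that shows up in gpuowl logs.

```python
def worst_case_queue_ms(block_size, us_per_iteration):
    """Upper bound on how long one queued block occupies the GPU, in milliseconds."""
    return block_size * us_per_iteration / 1000.0


if __name__ == "__main__":
    US_PER_ITERATION = 1000  # hypothetical; read the real value from your own log
    for block_size in (400, 200, 100, 50):
        ms = worst_case_queue_ms(block_size, US_PER_ITERATION)
        print(f"-block {block_size}: up to {ms:.0f} ms of queued work")
```

Under that assumption, halving the block size halves the longest stretch during which the desktop cannot get GPU time, at the cost of the efficiency loss mentioned above.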
Old 2020-09-10, 14:16   #2437
moebius
 
Jul 2009
Germany

572₈ Posts
Default

Quote:
Originally Posted by preda View Post
but it has the side effect of reducing efficiency a bit. Anyway, if a lower block size fixes your problem, I can look into making that a special behavior for Nvidia without needing to change the block size.
Please don't do this! I think it depends on the specific Nvidia card he uses. Not every gpuowl user is familiar enough with the settings to notice for themselves that the default block size has to be changed in order to use the full compute power.
For example, the current Win64 gpuowl version 6.11-380 is around 100 us/it slower on a Vega 64 than version 6.11-364 at FFT 5.5M, and I don't know the reason for this behaviour.

Last fiddled with by moebius on 2020-09-10 at 14:27
Old 2020-09-13, 16:41   #2438
kracker
ἀβουλία
 
"Mr. Meeseeks"
Jan 2012
California, USA

4170₈ Posts
Default

Had no issues with this version running LL/P-1... what am I doing wrong here?
(same result without -use arguments)
Code:
2020-09-13 09:08:43 gpuowl v6.11-380-g79ea0cc
2020-09-13 09:08:43 config: -use OUT_SIZEX=8,OUT_SPACING=2 -user kracker -cpu core -cleanup
2020-09-13 09:08:43 config: -prp 104914741
2020-09-13 09:08:43 device 0, unique id ''
2020-09-13 09:08:43 core 104914741 FFT: 5.50M 1K:11:256 (18.19 bpw)
2020-09-13 09:08:43 core Expected maximum carry32: 506F0000
2020-09-13 09:08:44 core OpenCL args "-DEXP=104914741u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=11u -DPM1=0 -DAMDGPU=1 -DMM2_CHAIN=1u -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0xc.04912aa6417p-4 -DIWEIGHT_STEP_MINUS_1=-0xd.b9d67ad213798p-5 -DOUT_SIZEX=8 -DOUT_SPACING=2  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-09-13 09:08:44 core ASM compilation failed, retrying compilation using NO_ASM
2020-09-13 09:08:48 core OpenCL compilation in 3.81 s
2020-09-13 09:08:50 core 104914741 OK        0 loaded: blockSize 400, 0000000000000003
2020-09-13 09:08:50 core validating proof residues for power 8
2020-09-13 09:08:50 core Proof using power 8
2020-09-13 09:08:54 core 104914741 EE      800   0.00%; 3902 us/it; ETA 4d 17:43; 281087c3716953d2 (check 1.69s)
2020-09-13 09:08:56 core 104914741 OK        0 loaded: blockSize 400, 0000000000000003
2020-09-13 09:09:01 core 104914741 EE      800   0.00%; 3903 us/it; ETA 4d 17:44; 281087c3716953d2 (check 1.69s) 1 errors
2020-09-13 09:09:03 core 104914741 OK        0 loaded: blockSize 400, 0000000000000003
2020-09-13 09:09:03 core Stopping, please wait..
2020-09-13 09:09:06 core 104914741 EE      400   0.00%; 3906 us/it; ETA 4d 17:50; 036344df470abd74 (check 1.67s) 2 errors
2020-09-13 09:09:06 core 3 sequential errors, will stop.
2020-09-13 09:09:06 core Exiting because "too many errors"
2020-09-13 09:09:06 core Bye
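
For comparing runs like the one above, here is a small Python sketch that pulls the status, iteration, speed, and residue out of the progress lines. The regular expression is fitted only to the lines in this log and is an assumption about the format in general. The remaining-time estimate, (exponent minus iteration) times us/it, is likewise a guess that happens to land close to the ETA gpuowl printed.

```python
import re

# Matches progress lines such as
# "2020-09-13 09:08:54 core 104914741 EE      800   0.00%; 3902 us/it; ETA 4d 17:43; 281087c3716953d2 (check 1.69s)"
LINE_RE = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) \S+ (?P<exponent>\d+) (?P<status>OK|EE)\s+"
    r"(?P<iteration>\d+)\s+[\d.]+%; (?P<us_per_it>\d+) us/it; "
    r"ETA [^;]+; (?P<residue>[0-9a-f]{16})"
)


def parse_progress(line):
    """Return a dict for a progress line, or None for any other log line."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "exponent": int(m["exponent"]),
        "status": m["status"],          # "OK" or "EE" (error)
        "iteration": int(m["iteration"]),
        "us_per_it": int(m["us_per_it"]),
        "residue": m["residue"],
    }


def remaining_seconds(rec):
    """Estimated time left, assuming about one iteration per bit of the exponent."""
    return (rec["exponent"] - rec["iteration"]) * rec["us_per_it"] / 1e6
```

Run over this log, it shows the same residue 281087c3716953d2 at iteration 800 on both attempts, each flagged EE.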
Old 2020-09-13, 18:37   #2439
moebius
 
Jul 2009
Germany

2×3³×7 Posts
Angry

Quote:
Originally Posted by kracker View Post
2020-09-13 09:09:06 core 3 sequential errors, will stop.
2020-09-13 09:09:06 core Exiting because "too many errors"
2020-09-13 09:09:06 core Bye
I have had the same problem with 3 errors only once, with gpuowl/PRP when starting over on an exponent with a Tesla K80 on Colab (Ubuntu). But I think the people at Google immediately threw the defective card into the trash can.

Last fiddled with by moebius on 2020-09-13 at 18:39
Old 2020-09-13, 23:07   #2440
preda
 
"Mihai Preda"
Apr 2015

1,291 Posts
Default

I don't know. I verified that the residue you see at #800 (281087c3716953d2) is correct (I get the same), so the error is affecting the check or something around it, not the core computation. Can you run the exponent on a different GPU?

Quote:
Originally Posted by kracker View Post
Had no issues with this version running LL/P-1... what am I doing wrong here?
(same result without -use arguments)
Old 2020-09-13, 23:31   #2441
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2²×1,151 Posts
Default

Quote:
Originally Posted by kracker View Post
Had no issues with this version running LL/P-1... what am I doing wrong here?
Check CPU-side component temperatures, especially system RAM.
I had a system with similar problems, eventually traced to hot RAM sticks because of a bad fan (hotter than the rated operating temperature range).

PRP with GEC (the Gerbicz error check) detects errors most completely and most often; LL only sometimes spots them, and P-1 does not spot them. As I recall, PRP computations occur on the GPU and its RAM, and the data gets brought over to the CPU side for the GEC. The GPU data can be fine, with the error introduced in the CPU RAM.
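
To make the GEC idea concrete, here is a toy Python sketch of a Gerbicz-style check on a base-3 PRP squaring chain. It is not gpuowl's code: the function name and parameters are made up, it checks after every block (real implementations check far less often), and the error is injected artificially.

```python
def prp_with_gec(p, block_size, iterations, inject_error_at=None):
    """Square 3 repeatedly mod 2**p - 1, verifying every block with a Gerbicz-style check.

    Returns True if every block check passed, False at the first failed check.
    """
    n = (1 << p) - 1
    x = 3  # x_i = 3**(2**i) mod n, starting at i = 0
    d = 3  # running product of x at block boundaries (includes x_0)
    for i in range(1, iterations + 1):
        x = x * x % n
        if i == inject_error_at:
            x = (x + 1) % n  # simulate a hardware error in the residue
        if i % block_size == 0:
            d_prev, d = d, d * x % n
            # Squaring d_prev block_size times moves every factor one block
            # forward; multiplying by 3 restores the missing x_0 factor.
            if d != 3 * pow(d_prev, 1 << block_size, n) % n:
                return False
    return True
```

For example, `prp_with_gec(127, 8, 64)` passes every check, while `prp_with_gec(127, 8, 64, inject_error_at=13)` fails at the check after iteration 16. The check compares two independently computed values, so corruption on either side of the comparison is reported as an error even when the squaring chain itself is correct.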
Old 2020-09-14, 23:46   #2442
kracker
ἀβουλία
 
"Mr. Meeseeks"
Jan 2012
California, USA

2³×271 Posts
Default

Quote:
Originally Posted by preda View Post
I don't know. I verified that the residue you see at #800 (281087c3716953d2) is correct (I get the same), so the error is affecting the check or something around it, not the core computation. Can you run the exponent on a different GPU?
Sadly, no.

Quote:
Originally Posted by kriesel View Post
Check CPU-side component temperatures, especially system RAM.
I had a system with similar problems, eventually traced to hot RAM sticks because of a bad fan (hotter than the rated operating temperature range).
99% sure I don't have any issues there.