mersenneforum.org  

Old 2020-01-09, 12:39   #1728
preda
 
"Mihai Preda"
Apr 2015

19×53 Posts

Try using -yield, which is specifically designed to address CUDA CPU usage.

Quote:
Originally Posted by wfgarnett3
I downloaded the gpuowl-v6.11-112-gf1b00d1.7z Windows exe file that kriesel posted.

However, it seems to use CPU time (I checked Task Manager to verify).

Whenever I have it running on the GPU and Prime95 on the CPU the iteration time on Prime95 slows down. Once I stop gpuowl Prime95 goes back to its normal iteration time, but as soon as I restart gpuowl Prime95 slows back down.

This is my first real usage of gpuowl (I am "double-checking" a PRP number just to test the software out).

This never happened when I double-checked LL with CudaLucas: CudaLucas (or mfaktc for TF) running on the GPU never affected Prime95 running simultaneously on the CPU.

Can someone tell me why this happens?

EVGA GeForce GTX 1050 SC GAMING (2GB GDDR5)
Part number: 02G-P4-6152-KR

Dell Desktop Tower with Windows 10
Intel i3-4150 @ 3.5GHz
Memory: 8.00 GB
Old 2020-01-09, 12:46   #1729
wfgarnett3
 
"William Garnett III"
Oct 2002
Bensalem, PA

2²×3×7 Posts

Quote:
Originally Posted by preda
Try using -yield, which is specifically designed to address CUDA CPU usage.

-yield does not seem to help - Prime95 still gets slowed.

(Note I am actually going to sleep right now so will be offline but feel free for you or anyone else to comment on what may be happening)
Old 2020-01-09, 15:40   #1730
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3815₁₀ Posts

Quote:
Originally Posted by wfgarnett3
-yield does not seem to help - Prime95 still gets slowed.

(Note I am actually going to sleep right now so will be offline but feel free for you or anyone else to comment on what may be happening)

An unfortunate feature of prime95 is that, without hyperthreading enabled on the host, occupying one CPU core with something else can cost an entire prime95 worker's output, however many cores that worker uses. A little CPU usage by gpuowl is normal even with -yield: some CPU cycles go to saving checkpoints to disk, screen output, the GEC (Gerbicz error check), and so on. But if -yield is in config.txt or on the command line, usage should drop below the full CPU core or hyperthread that gpuowl occupies without that option. How much were you seeing without -yield, and how much with?

Last fiddled with by kriesel on 2020-01-09 at 15:42
Old 2020-01-09, 21:12   #1731
mrh
 
"mrh"
Oct 2018
Temecula, ca

110111₂ Posts

On my Ubuntu system, ROCm got upgraded (by apt upgrade) to what looks to be 3.0.6, and now in some instances gpuowl crashes immediately with this message:

Code:
Memory access fault by GPU node-1 (Agent handle: 0x56153da65fb0) on address 0x7f6250a00000. Reason: Page not present or supervisor privilege.
This is from the kernel:
Code:
[10312.567135] amdgpu 0000:04:00.0: [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:8 pasid:32769, for process gpuowl pid 10653 thread gpuowl pid 10653)
[10312.567142] amdgpu 0000:04:00.0:   in page starting at address 0x00007f6250a00000 from client 27
[10312.567146] amdgpu 0000:04:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00801030
[10312.567150] amdgpu 0000:04:00.0:      MORE_FAULTS: 0x0
[10312.567154] amdgpu 0000:04:00.0:      WALKER_ERROR: 0x0
[10312.567157] amdgpu 0000:04:00.0:      PERMISSION_FAULTS: 0x3
[10312.567160] amdgpu 0000:04:00.0:      MAPPING_ERROR: 0x0
[10312.567164] amdgpu 0000:04:00.0:      RW: 0x0

-pm1 works fine, -prp 133331333 works fine, but -prp on most other numbers crashes.

This was with a clean build from github. As far as I can tell nothing happened to the card, mfacto also seems to work just fine.

Any ideas about what to do? Is 3.0.6 a bad version to be using?
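
For anyone comparing fault dumps like the one above across driver versions, here is a minimal Python sketch that pulls the `NAME: 0x...` fields out of the amdgpu kernel lines. It only extracts the values; it does not interpret them, and the regular expression is an assumption based solely on the line layout shown in this post (other kernel versions may print the fields differently). The function name `parse_fault_fields` is made up for illustration.

```python
import re

# Matches status lines such as
# "[10312.567157] amdgpu 0000:04:00.0:      PERMISSION_FAULTS: 0x3"
# ASSUMPTION: field names are upper-case and values are hex, as in the post.
FIELD_RE = re.compile(
    r"amdgpu\s+[0-9a-fA-F:.]+:\s+([A-Z][A-Z0-9_]*):\s*(0x[0-9a-fA-F]+)"
)


def parse_fault_fields(log_text):
    """Return {FIELD_NAME: integer value} for each 'NAME: 0x...' line."""
    fields = {}
    for line in log_text.splitlines():
        match = FIELD_RE.search(line)
        if match:
            fields[match.group(1)] = int(match.group(2), 16)
    return fields


# A few lines copied from the kernel output quoted in this post.
SAMPLE = """\
[10312.567142] amdgpu 0000:04:00.0:   in page starting at address 0x00007f6250a00000 from client 27
[10312.567146] amdgpu 0000:04:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00801030
[10312.567157] amdgpu 0000:04:00.0:      PERMISSION_FAULTS: 0x3
[10312.567164] amdgpu 0000:04:00.0:      RW: 0x0
"""

for name, value in parse_fault_fields(SAMPLE).items():
    print(f"{name} = {value:#x}")
```

Feeding it the saved output of `dmesg` from a working and a failing setup makes it easy to diff the two dictionaries.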
Old 2020-01-09, 21:18   #1732
preda
 
"Mihai Preda"
Apr 2015

19×53 Posts
automatic initial P-1 for PRP

For a task of the form:
PRP=XXXXXXXX,1,2,91408469,-1,77,1
where the final integer (let's call it "wantsPm1") is "1" instead of the usual "0", indicating that P-1 testing is desired,

gpuowl will automatically expand the task into a P-1 task and a PRP task with "wantsPm1" set to 0.

It works like this:
- gpuowl reads the first good line from worktodo.txt
- if that line is a PRP with wantsPm1 non-zero, two new tasks are *appended* to worktodo.txt (i.e. at the end)
- after which the PRP task that had wantsPm1 set is deleted from worktodo.txt
- loop to find the first task in worktodo.txt

Note that this re-orders the tasks in worktodo.txt, because the "expanded" tasks are always added to the end.
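
The steps above can be sketched roughly as follows. This is only an illustration in Python, not gpuowl's actual code. The post does not show what the appended P-1 line looks like, so `hypothetical_pm1_line` emits an obvious placeholder; the function names are made up, and treating every non-blank line as a "good" line is a further assumption.

```python
def hypothetical_pm1_line(prp_line):
    # ASSUMPTION: placeholder only. The real P-1 task format is not given
    # in the post; the exponent is taken from the fourth comma-separated
    # field of the example PRP line.
    exponent = prp_line.split(",")[3]
    return f"PM1_PLACEHOLDER,{exponent}"


def wants_pm1(line):
    """True for a PRP line whose final integer field is non-zero."""
    if not line.startswith("PRP="):
        return False
    try:
        return int(line.rsplit(",", 1)[-1]) != 0
    except ValueError:
        return False  # malformed final field: leave the line alone


def expand_worktodo(lines, make_pm1_line=hypothetical_pm1_line):
    """Apply the expansion loop to a list of worktodo lines."""
    tasks = [line.strip() for line in lines if line.strip()]
    while tasks and wants_pm1(tasks[0]):
        first = tasks[0]
        head, _ = first.rsplit(",", 1)
        tasks.append(make_pm1_line(first))  # new P-1 task, at the end
        tasks.append(f"{head},0")           # same PRP with wantsPm1 = 0
        del tasks[0]                        # drop the original PRP line
    return tasks


before = ["PRP=XXXXXXXX,1,2,91408469,-1,77,1", "OTHER_TASK"]
for line in expand_worktodo(before):
    print(line)
```

In this sketch the made-up `OTHER_TASK` line ends up first and the two expanded tasks follow it, which is the re-ordering described above. The loop stops because neither appended line has a non-zero wantsPm1.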

It is likely there are some bugs, please let me know if you see any.
Old 2020-01-09, 21:22   #1733
preda
 
"Mihai Preda"
Apr 2015

19·53 Posts

It is because of the upgrade to ROCm 3.0. I don't know exactly where the bug is (in gpuowl or in ROCm) because I haven't yet been able to upgrade to ROCm 3.0 to test. When I can install 3.0 I'll have a look. In the meantime, a solution is to move back to ROCm 2.10.

Quote:
Originally Posted by mrh
On my Ubuntu system, ROCm got upgraded (by apt upgrade) to what looks to be 3.0.6, and now in some instances gpuowl crashes immediately with this message:

Code:
Memory access fault by GPU node-1 (Agent handle: 0x56153da65fb0) on address 0x7f6250a00000. Reason: Page not present or supervisor privilege.
This is from the kernel:
Code:
[10312.567135] amdgpu 0000:04:00.0: [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:8 pasid:32769, for process gpuowl pid 10653 thread gpuowl pid 10653)
[10312.567142] amdgpu 0000:04:00.0:   in page starting at address 0x00007f6250a00000 from client 27
[10312.567146] amdgpu 0000:04:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00801030
[10312.567150] amdgpu 0000:04:00.0:      MORE_FAULTS: 0x0
[10312.567154] amdgpu 0000:04:00.0:      WALKER_ERROR: 0x0
[10312.567157] amdgpu 0000:04:00.0:      PERMISSION_FAULTS: 0x3
[10312.567160] amdgpu 0000:04:00.0:      MAPPING_ERROR: 0x0
[10312.567164] amdgpu 0000:04:00.0:      RW: 0x0

-pm1 works fine, -prp 133331333 works fine, but -prp on most other numbers crashes.

This was with a clean build from github. As far as I can tell nothing happened to the card, mfacto also seems to work just fine.

Any ideas about what to do? Is 3.0.6 a bad version to be using?
Old 2020-01-09, 21:31   #1734
mrh
 
"mrh"
Oct 2018
Temecula, ca

67₈ Posts

Ah, thanks! I rolled back to rocm 2.10 and I'm all better.
Old 2020-01-09, 21:40   #1735
PhilF
 
Feb 2005
Colorado

2³×3×19 Posts

Quote:
Originally Posted by preda
It is because of the upgrade to ROCm 3.0. I don't know exactly where the bug is (in gpuowl or in ROCm) because I haven't yet been able to upgrade to ROCm 3.0 to test. When I can install 3.0 I'll have a look. In the meantime, a solution is to move back to ROCm 2.10.

Wow, I must have gotten in just under the wire. My installation is recent (Dec. 30), and I loaded whatever ROCm version was available then. However, it does not have video drivers loaded, only OpenCL, and no monitor is attached to my Radeon VII. I've not had a problem.

How can I check my current ROCm version? It doesn't show up in dmesg or lsmod.
Old 2020-01-09, 21:56   #1736
mrh
 
"mrh"
Oct 2018
Temecula, ca

5×11 Posts

Quote:
Originally Posted by PhilF
Wow, I must have gotten in just under the wire. My installation is recent (Dec. 30), and I loaded whatever ROCm version was available then. However, it does not have video drivers loaded, only OpenCL, and no monitor is attached to my Radeon VII. I've not had a problem.

How can I check my current ROCm version? It doesn't show up in dmesg or lsmod.

I think the simplest is "dpkg -l |grep rocm"
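
If you would rather script that check, here is a minimal Python sketch along the same lines. It assumes a Debian or Ubuntu system where `dpkg -l` is available; the function names are made up for illustration. Unlike the grep, which matches anywhere in the line, this sketch only looks at the package name column.

```python
import subprocess


def matching_packages(dpkg_output, needle):
    """Return (name, version) pairs for installed packages whose name contains needle."""
    matches = []
    for line in dpkg_output.splitlines():
        parts = line.split()
        # Rows for installed packages start with the status "ii",
        # followed by the package name and its version.
        if len(parts) >= 3 and parts[0] == "ii" and needle in parts[1]:
            matches.append((parts[1], parts[2]))
    return matches


def installed_package_listing():
    """Run 'dpkg -l' and return its text output (Debian/Ubuntu only)."""
    result = subprocess.run(
        ["dpkg", "-l"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `matching_packages(installed_package_listing(), "rocm")` returns an empty list when no package with "rocm" in its name is installed.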
Old 2020-01-09, 22:07   #1737
PhilF
 
Feb 2005
Colorado

111001000₂ Posts

Quote:
Originally Posted by mrh
I think the simplest is "dpkg -l |grep rocm"

My dpkg output doesn't have any lines that contain rocm. Some do contain "amdgpu", but the only version number I can find is 19.30-934563, which doesn't shed any light on the rocm version as far as I can tell.

Code:
ii  amdgpu-core                    19.30-934563                all          Core meta package for unified amdgpu driver.
ii  amdgpu-dkms                    19.30-934563                all          amdgpu driver in DKMS format.
ii  amdgpu-pro-core                19.30-934563                all          Core meta package for Pro components of the unified amdgpu driver.
ii  amdgpu-pro-pin                 19.30-934563                all          Meta package to pin a specific amdgpu driver version.
ii  clinfo-amdgpu-pro              19.30-934563                amd64        AMD OpenCL info utility
ii  libdrm-amdgpu-amdgpu1:amd64    1:2.4.98-934563             amd64        Userspace interface to amdgpu-specific kernel DRM services -- runtime
ii  libdrm-amdgpu-common           1.0.0-934563                all          List of AMD/ATI cards' device IDs, revision IDs and marketing names
ii  libdrm2-amdgpu:amd64           1:2.4.98-934563             amd64        Userspace interface to kernel DRM services -- runtime
ii  libopencl1-amdgpu-pro:amd64    19.30-934563                amd64        AMD OpenCL ICD Loader library
ii  opencl-amdgpu-pro-comgr        19.30-934563                amd64        non-free AMD OpenCL ICD Loaders
ii  opencl-amdgpu-pro-icd          19.30-934563                amd64        non-free AMD OpenCL ICD Loaders

Last fiddled with by PhilF on 2020-01-09 at 22:08
Old 2020-01-09, 22:50   #1738
mrh
 
"mrh"
Oct 2018
Temecula, ca

5×11 Posts

Quote:
Originally Posted by PhilF
My dpkg output doesn't have any lines that contain rocm. Some do contain "amdgpu", but the only version number I can find is 19.30-934563, which doesn't shed any light on the rocm version as far as I can tell.

I'm not an AMD expert, but I think that indicates you are using the "AMD Pro" drivers (I think that is what they are called) rather than ROCm.

-mike

Last fiddled with by mrh on 2020-01-09 at 22:50

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.