#1728
"Mihai Preda"
Apr 2015
2²·19² Posts
Try using -yield, which is specifically designed to address CUDA CPU usage.
#1729
"William Garnett III"
Oct 2002
Langhorne, PA
5616 Posts
(Note: I am actually going to sleep right now, so I will be offline, but you or anyone else should feel free to comment on what may be happening.)
#1730
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2·29·127 Posts
An unfortunate feature of prime95 is that, without hyperthreading enabled on the host, occupying one cpu core with something else can cost an entire prime95 worker's output, however many cores that worker uses. A little cpu usage by gpuowl, even with -yield, is normal: some cpu cycles are used for saving checkpoints to disk, screen output, doing the GEC, etc. But if -yield is in config.txt or on the command line, cpu usage should be reduced from the full cpu core or hyperthread that is occupied without that option. How much were you seeing without -yield, and how much with?
Last fiddled with by kriesel on 2020-01-09 at 15:42
#1731
"mrh"
Oct 2018
Temecula, ca
2·3²·5 Posts
On my ubuntu system, rocm got upgraded (by apt upgrade) to what looks to be 3.0.6, and now in some instances gpuowl crashes immediately with this message:
Code:
Memory access fault by GPU node-1 (Agent handle: 0x56153da65fb0) on address 0x7f6250a00000. Reason: Page not present or supervisor privilege.
Code:
[10312.567135] amdgpu 0000:04:00.0: [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:8 pasid:32769, for process gpuowl pid 10653 thread gpuowl pid 10653)
[10312.567142] amdgpu 0000:04:00.0: in page starting at address 0x00007f6250a00000 from client 27
[10312.567146] amdgpu 0000:04:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00801030
[10312.567150] amdgpu 0000:04:00.0: MORE_FAULTS: 0x0
[10312.567154] amdgpu 0000:04:00.0: WALKER_ERROR: 0x0
[10312.567157] amdgpu 0000:04:00.0: PERMISSION_FAULTS: 0x3
[10312.567160] amdgpu 0000:04:00.0: MAPPING_ERROR: 0x0
[10312.567164] amdgpu 0000:04:00.0: RW: 0x0
-pm1 works fine, -prp 133331333 works fine, but -prp on most other numbers crashes. This was with a clean build from github. As far as I can tell nothing happened to the card; mfakto also seems to work just fine.
Any ideas about what to do? Is 3.0.6 a bad version to be using?
#1732
"Mihai Preda"
Apr 2015
5A4₁₆ Posts
For a task of the form:
PRP=XXXXXXXX,1,2,91408469,-1,77,1
i.e. note the final integer, let's call it "wantsPm1", being "1" instead of the usual "0". This indicates that P-1 testing is desired; gpuowl will automatically expand the task into a P-1 and a PRP with "wantsPm1" set to 0.
It works like this:
- gpuowl reads the first good line from worktodo.txt
- if that line is a PRP with wantsPm1 non-zero, two new tasks are *appended* to worktodo.txt (i.e. at the end)
- after which the PRP task that had wantsPm1 set is deleted from worktodo.txt
- loop to find the first task in worktodo.txt
I.e. this results in a re-ordering of the tasks in worktodo.txt, because the "expanded" tasks are always added to the end.
It is likely there are some bugs; please let me know if you see any.
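The expansion loop described above can be sketched in Python to show why the task order changes. This is only an illustration and not gpuowl's actual code: the function names are made up, "good line" is simplified to "any non-empty line", and the P-1 task is written as a placeholder string because its real worktodo.txt format is not given in this post.

```python
def wants_pm1(line):
    """True if the line is a PRP task whose final field (wantsPm1) is non-zero."""
    if not line.startswith("PRP="):
        return False
    fields = line.split(",")
    return len(fields) >= 7 and fields[-1].strip() != "0"


def next_task(lines):
    """Return (task_to_run, new_lines) after expanding PRP tasks that want P-1.

    `lines` is the content of worktodo.txt as a list of strings.
    Assumption: any non-empty line counts as a "good" line.
    """
    lines = list(lines)  # work on a copy
    while True:
        idx = next((i for i, l in enumerate(lines) if l.strip()), None)
        if idx is None:
            return None, lines  # nothing left to do
        line = lines[idx].strip()
        if not wants_pm1(line):
            return line, lines  # first good line is ready to run
        fields = line.split(",")
        exponent = fields[3]  # k,b,n,c layout: 1,2,<exponent>,-1
        # Placeholder only: the real P-1 line format is not shown in the post.
        pm1_line = f"<P-1 task for exponent {exponent}>"
        prp_line = ",".join(fields[:-1] + ["0"])  # same PRP, wantsPm1 cleared
        lines.extend([pm1_line, prp_line])  # append both tasks at the end
        del lines[idx]  # then drop the original PRP line
```

With a worktodo.txt holding the task above followed by a second PRP task whose last field is 0, `next_task` returns the second task as the one to run, and the expanded P-1 and PRP tasks for 91408469 end up after it.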
#1733
"Mihai Preda"
Apr 2015
2644₈ Posts
It is because of the upgrade to ROCm 3.0. I don't know exactly where the bug is (in gpuowl or in ROCm), because I haven't yet been able to upgrade to ROCm 3.0 to test. When I can install 3.0 I'll have a look. In the meantime, a workaround is to move back to ROCm 2.10.
#1734
"mrh"
Oct 2018
Temecula, ca
2·3²·5 Posts
Ah, thanks! I rolled back to rocm 2.10 and I'm all better.
#1735
"6800 descendent"
Feb 2005
Colorado
1011100010₂ Posts
How can I check my current ROCm version? It doesn't show up in dmesg or lsmod.
#1736
"mrh"
Oct 2018
Temecula, ca
90₁₀ Posts
#1737
"6800 descendent"
Feb 2005
Colorado
2×3²×41 Posts
My dpkg output doesn't have any lines that contain rocm. Some do contain "amdgpu", but the only version number I can find is 19.30-934563, which doesn't shed any light on the rocm version as far as I can tell.
Code:
ii  amdgpu-core                  19.30-934563     all    Core meta package for unified amdgpu driver.
ii  amdgpu-dkms                  19.30-934563     all    amdgpu driver in DKMS format.
ii  amdgpu-pro-core              19.30-934563     all    Core meta package for Pro components of the unified amdgpu driver.
ii  amdgpu-pro-pin               19.30-934563     all    Meta package to pin a specific amdgpu driver version.
ii  clinfo-amdgpu-pro            19.30-934563     amd64  AMD OpenCL info utility
ii  libdrm-amdgpu-amdgpu1:amd64  1:2.4.98-934563  amd64  Userspace interface to amdgpu-specific kernel DRM services -- runtime
ii  libdrm-amdgpu-common         1.0.0-934563     all    List of AMD/ATI cards' device IDs, revision IDs and marketing names
ii  libdrm2-amdgpu:amd64         1:2.4.98-934563  amd64  Userspace interface to kernel DRM services -- runtime
ii  libopencl1-amdgpu-pro:amd64  19.30-934563     amd64  AMD OpenCL ICD Loader library
ii  opencl-amdgpu-pro-comgr      19.30-934563     amd64  non-free AMD OpenCL ICD Loaders
ii  opencl-amdgpu-pro-icd        19.30-934563     amd64  non-free AMD OpenCL ICD Loaders
Last fiddled with by PhilF on 2020-01-09 at 22:08
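For anyone wanting to repeat this check in a script, here is a rough Python sketch that filters a `dpkg -l` listing for installed packages whose name contains a given string. It assumes the usual `dpkg -l` column order (status, name, version, ...) and that ROCm packages, when installed, have "rocm" in their names; the function names are made up for this example.

```python
import subprocess


def installed_packages(dpkg_listing, needle):
    """Return (name, version) pairs for installed packages whose name contains `needle`.

    `dpkg_listing` is the text printed by `dpkg -l`.
    """
    matches = []
    for line in dpkg_listing.splitlines():
        fields = line.split()
        # "ii" in the first column marks a package that is installed and configured.
        if len(fields) >= 3 and fields[0] == "ii" and needle.lower() in fields[1].lower():
            matches.append((fields[1], fields[2]))
    return matches


def rocm_packages():
    """Run `dpkg -l` on this machine and keep the packages with "rocm" in the name."""
    listing = subprocess.run(
        ["dpkg", "-l"], capture_output=True, text=True, check=True
    ).stdout
    return installed_packages(listing, "rocm")
```

An empty result from `rocm_packages()`, as with the listing above, would suggest the OpenCL stack came from the amdgpu-pro packages rather than from ROCm packages.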
#1738
"mrh"
Oct 2018
Temecula, ca
2×3²×5 Posts
-mike
Last fiddled with by mrh on 2020-01-09 at 22:50