mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GpuOwl (https://www.mersenneforum.org/forumdisplay.php?f=171)
-   -   gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

preda 2020-01-09 12:39

Try using -yield which is specially designed to address CUDA CPU usage.

[QUOTE=wfgarnett3;534666]I downloaded this gpuowl-v6.11-112-gf1b00d1.7z windows exe file that kriesel posted.

However, it seems to consume noticeable CPU time (I checked Task Manager to verify).

Whenever I have it running on the GPU and Prime95 on the CPU the iteration time on Prime95 slows down. Once I stop gpuowl Prime95 goes back to its normal iteration time, but as soon as I restart gpuowl Prime95 slows back down.

This is my first real usage of gpuowl (I am "double-checking" a PRP number just to test the software out).

This never happened when I double-checked LL with CUDALucas - CUDALucas (or mfaktc for TF) running on the GPU never affected Prime95 running simultaneously on the CPU.

Can someone tell me why this happens?

EVGA GeForce GTX 1050 SC GAMING (2GB GDDR5)
Part number: 02G-P4-6152-KR

Dell Desktop Tower with Windows 10
Intel i3-4150 @ 3.5GHz
Memory: 8.00 GB[/QUOTE]

wfgarnett3 2020-01-09 12:46

[QUOTE=preda;534667]Try using -yield which is specially designed to address CUDA CPU usage.[/QUOTE]

-yield does not seem to help - Prime95 still gets slowed.

(Note I am actually going to sleep right now so will be offline but feel free for you or anyone else to comment on what may be happening)

kriesel 2020-01-09 15:40

[QUOTE=wfgarnett3;534668]-yield does not seem to help - Prime95 still gets slowed.

(Note I am actually going to sleep right now so will be offline but feel free for you or anyone else to comment on what may be happening)[/QUOTE]An unfortunate quirk of prime95 is that, without hyperthreading enabled on the host, occupying one CPU core with something else can cost the output of an entire prime95 worker, however many cores that worker spans. A little CPU usage by gpuowl, even with -yield, is normal: some CPU cycles go to writing save checkpoints to disk, screen output, running the Gerbicz error check (GEC), etc. But if -yield is in config.txt or on the command line, usage should drop well below the full CPU core or hyperthread consumed without that option. How much CPU usage were you seeing without -yield, and how much with?
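The full-core usage described above typically comes from the GPU driver spin-waiting on queue completion, while a yielding wait sleeps between polls. A minimal Python sketch of the difference (illustrative only; gpuowl itself is C++/OpenCL, and the poll interval here is an arbitrary assumption):

```python
import time

def wait_busy(done):
    # Spin-wait: polls as fast as possible, pinning a full CPU
    # core or hyperthread until the GPU work completes.
    while not done():
        pass

def wait_yielding(done, interval=0.001):
    # Yielding wait: sleeps briefly between polls, leaving the core
    # free for other work (e.g. a prime95 worker) most of the time.
    while not done():
        time.sleep(interval)

# Hypothetical "GPU work" that finishes after 50 ms.
deadline = time.monotonic() + 0.05
wait_yielding(lambda: time.monotonic() >= deadline)
print("done")
```

The trade-off is latency: a yielding wait may notice completion up to one poll interval late, which is why some programs default to spinning.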

mrh 2020-01-09 21:12

On my ubuntu system, rocm got upgraded (by apt upgrade) to what looks to be 3.0.6, and now in some instances gpuowl crashes immediately with this message:

[CODE]Memory access fault by GPU node-1 (Agent handle: 0x56153da65fb0) on address 0x7f6250a00000. Reason: Page not present or supervisor privilege.[/CODE]

This from the kernel:
[CODE][10312.567135] amdgpu 0000:04:00.0: [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:8 pasid:32769, for process gpuowl pid 10653 thread gpuowl pid 10653)
[10312.567142] amdgpu 0000:04:00.0: in page starting at address 0x00007f6250a00000 from client 27
[10312.567146] amdgpu 0000:04:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00801030
[10312.567150] amdgpu 0000:04:00.0: MORE_FAULTS: 0x0
[10312.567154] amdgpu 0000:04:00.0: WALKER_ERROR: 0x0
[10312.567157] amdgpu 0000:04:00.0: PERMISSION_FAULTS: 0x3
[10312.567160] amdgpu 0000:04:00.0: MAPPING_ERROR: 0x0
[10312.567164] amdgpu 0000:04:00.0: RW: 0x0[/CODE]


-pm1 works fine, -prp 133331333 works fine, but -prp on most other numbers crashes.

This was with a clean build from GitHub. As far as I can tell nothing happened to the card; mfakto also seems to work just fine.

Any ideas about what to do? Is 3.0.6 a bad version to be using?

preda 2020-01-09 21:18

automatic initial P-1 for PRP
 
For a task of the form:
PRP=XXXXXXXX,1,2,91408469,-1,77,1
note the final integer, call it "wantsPm1": when it is "1" instead of the usual "0", this indicates that P-1 testing is desired.

gpuowl will automatically expand the task into a P-1 task and a PRP task with "wantsPm1" set to 0.

It works like this:
- gpuowl reads the first good line from worktodo.txt
- if that line is a PRP task with wantsPm1 non-zero, two new tasks are *appended* to worktodo.txt (i.e. at the end)
- the original PRP task with wantsPm1 set is then deleted from worktodo.txt
- gpuowl loops back to find the first task in worktodo.txt

In other words, this results in a re-ordering of the tasks in worktodo.txt, because the "expanded" tasks are always added at the end.

It is likely there are still some bugs; please let me know if you see any.
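The expansion steps above can be sketched in a few lines of Python (this is not gpuowl's actual code; the "Pminus1=" line format for the appended P-1 task is an assumption for illustration):

```python
# Sketch of the worktodo.txt expansion described above.
# Assumed field layout, following the example in the post:
#   PRP=<AID>,1,2,<exponent>,-1,<bits>,<wantsPm1>

def expand_first_task(lines):
    """Return the worktodo lines after one expansion pass."""
    for i, line in enumerate(lines):
        if not line.startswith("PRP="):
            continue  # skip non-PRP lines
        fields = line[len("PRP="):].split(",")
        if fields[-1].strip() != "1":
            return lines  # first PRP task does not request P-1
        # Append a P-1 task and a PRP task with wantsPm1 set to 0 ...
        pm1_task = "Pminus1=" + ",".join(fields[:-1])  # assumed P-1 format
        prp_task = "PRP=" + ",".join(fields[:-1] + ["0"])
        # ... then delete the original task, re-ordering the file.
        return lines[:i] + lines[i + 1:] + [pm1_task, prp_task]
    return lines

work = ["PRP=XXXXXXXX,1,2,91408469,-1,77,1"]
print(expand_first_task(work))
```

Running this on the example task yields the P-1 task followed by a PRP task ending in "0", matching the re-ordering behavior described in the post.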

preda 2020-01-09 21:22

It is because of the upgrade to ROCm 3.0. I don't know exactly where the bug is (in gpuowl or in ROCm), because I haven't yet been able to upgrade to ROCm 3.0 myself to test. When I can install 3.0 I'll take a look. In the meantime, a workaround is to roll back to ROCm 2.10.

[QUOTE=mrh;534703]On my ubuntu system, rocm got upgraded (by apt upgrade) to what looks to be 3.0.6, and now in some instances gpuowl crashes immediately with this message:

[CODE]Memory access fault by GPU node-1 (Agent handle: 0x56153da65fb0) on address 0x7f6250a00000. Reason: Page not present or supervisor privilege.[/CODE]

This from the kernel:
[CODE][10312.567135] amdgpu 0000:04:00.0: [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:8 pasid:32769, for process gpuowl pid 10653 thread gpuowl pid 10653)
[10312.567142] amdgpu 0000:04:00.0: in page starting at address 0x00007f6250a00000 from client 27
[10312.567146] amdgpu 0000:04:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00801030
[10312.567150] amdgpu 0000:04:00.0: MORE_FAULTS: 0x0
[10312.567154] amdgpu 0000:04:00.0: WALKER_ERROR: 0x0
[10312.567157] amdgpu 0000:04:00.0: PERMISSION_FAULTS: 0x3
[10312.567160] amdgpu 0000:04:00.0: MAPPING_ERROR: 0x0
[10312.567164] amdgpu 0000:04:00.0: RW: 0x0[/CODE]


-pm1 works fine, -prp 133331333 works fine, but -prp on most other numbers crashes.

This was with a clean build from GitHub. As far as I can tell nothing happened to the card; mfakto also seems to work just fine.

Any ideas about what to do? Is 3.0.6 a bad version to be using?[/QUOTE]

mrh 2020-01-09 21:31

Ah, thanks! I rolled back to rocm 2.10 and I'm all better.

PhilF 2020-01-09 21:40

[QUOTE=preda;534707]It is because of the upgrade to ROCm 3.0. I don't know where exactly is the bug (in gpuowl or in ROCm) because personally I couldn't yet upgrade to ROCm 3.0 to test. When I can install 3.0 I'll have a look. In the meantime a solution is to move back to ROCm 2.10 .[/QUOTE]

Wow, I must have gotten in just under the wire. My installation is recent (Dec. 30), and I loaded whatever ROCm version was available then. However, it does not have video drivers loaded, only OpenCL, and no monitor is attached to my Radeon VII. I've not had a problem.

How can I check my current ROCm version? It doesn't show up in dmesg or lsmod.

mrh 2020-01-09 21:56

[QUOTE=PhilF;534709]Wow, I must have gotten in just under the wire. My installation is recent (Dec. 30), and I loaded whatever ROCm version was available then. However, it does not have video drivers loaded, only OpenCL, and no monitor is attached to my Radeon VII. I've not had a problem.

How can I check my current ROCm version? It doesn't show up in dmesg or lsmod.[/QUOTE]

I think the simplest is "dpkg -l |grep rocm"

PhilF 2020-01-09 22:07

[QUOTE=mrh;534711]I think the simplest is "dpkg -l |grep rocm"[/QUOTE]

My dpkg output doesn't have any lines that contain rocm. Some do contain "amdgpu", but the only version number I can find is 19.30-934563, which doesn't shed any light on the rocm version as far as I can tell.

[code]ii amdgpu-core 19.30-934563 all Core meta package for unified amdgpu driver.
ii amdgpu-dkms 19.30-934563 all amdgpu driver in DKMS format.
ii amdgpu-pro-core 19.30-934563 all Core meta package for Pro components of the unified amdgpu driver.
ii amdgpu-pro-pin 19.30-934563 all Meta package to pin a specific amdgpu driver version.
ii clinfo-amdgpu-pro 19.30-934563 amd64 AMD OpenCL info utility
ii libdrm-amdgpu-amdgpu1:amd64 1:2.4.98-934563 amd64 Userspace interface to amdgpu-specific kernel DRM services -- runtime
ii libdrm-amdgpu-common 1.0.0-934563 all List of AMD/ATI cards' device IDs, revision IDs and marketing names
ii libdrm2-amdgpu:amd64 1:2.4.98-934563 amd64 Userspace interface to kernel DRM services -- runtime
ii libopencl1-amdgpu-pro:amd64 19.30-934563 amd64 AMD OpenCL ICD Loader library
ii opencl-amdgpu-pro-comgr 19.30-934563 amd64 non-free AMD OpenCL ICD Loaders
ii opencl-amdgpu-pro-icd 19.30-934563 amd64 non-free AMD OpenCL ICD Loaders
[/code]

mrh 2020-01-09 22:50

[QUOTE=PhilF;534713]My dpkg output doesn't have any lines that contain rocm. Some do contain "amdgpu", but the only version number I can find is 19.30-934563, which doesn't shed any light on the rocm version as far as I can tell.
[/QUOTE]

I'm not an AMD expert, but I think that indicates you are running the AMDGPU-PRO drivers (I believe that's what they're called) rather than ROCm.

-mike


All times are UTC.