[QUOTE=ewmayer;547734]Might be useful to find out more precisely where your card's max-mem-to-use lies - you could move the next PFactor line in your worktodo to make it the topmost line, save, kill the run, and restart with some high value of -maxAlloc. Wait 'til it hits stage 2 to see if it's slow; if so, kill and restart with a slightly lower -maxAlloc value until you see the speed come back to the normal non-swap level.[/QUOTE]We need to find a gpu-top program!
We'll probably just do the P-1 work on the CPU. Is there any chance George can add a work-type to Primenet to P-1 test first-run PRP tests before they are issued for PRP testing? :mike:
[QUOTE=Xyzzy;547747]We'll probably just do the P-1 work on the CPU.
Is there any chance George can add a work-type to Primenet to P-1 test first-run PRP tests before they are issued for PRP testing?[/QUOTE] I thought you said you wanted this new setup to run in set-it-and-forget-it mode ... lacking the Primenet worktype option you mention (which would indeed be useful), running P-1 on the CPU means manually fiddling all your gpuowl worktodo-file entries to have a trailing '0' before the program gets around to auto-splitting them into PFactor=... and PRP=...,0 pairs, or moving all the PFactor entries resulting from such auto-splitting into your mprime worktodo file.

Even with 'just' 4GB of card memory allocated for your runs, the resulting P-1 should be more or less as efficient as it can be: once the number of stage 2 buffers gets above the 50-100 range, there is really negligible performance gain to be had from using more buffers. But I would be interested in a comparison of P-1 runtime on your CPU vs. GPU, both using similar exponents (= same FFT lengths) and stage bounds. Do you have any recent P-1 work captured in your mprime logfile on that system?
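The manual fiddling described above could be scripted. A rough sketch of the idea in Python - note that the exact field layouts of PRP= and Pfactor= worktodo entries below are an assumption for illustration only; check your own worktodo files before using anything like this:

```python
# Sketch: split a gpuOwl-style PRP assignment into a Pfactor= line for
# mprime and a PRP=...,0 line (meaning "P-1 already done") for gpuOwl.
# The 7-field layout assumed here is illustrative, not authoritative.

def split_assignment(line):
    """Turn 'PRP=AID,1,2,<exp>,-1,<tf>,<tests_saved>' into a
    (pfactor_line, prp_line) pair, or None for other entry types."""
    head, _, fields = line.strip().partition("=")
    if head != "PRP":
        return None
    aid, k, b, exp, c, tf, saved = fields.split(",")[:7]
    pfactor = f"Pfactor={aid},{k},{b},{exp},{c},{tf},{saved}"
    # Trailing ',0' tells gpuOwl no further P-1 is wanted.
    prp = f"PRP={aid},{k},{b},{exp},{c},{tf},0"
    return pfactor, prp

if __name__ == "__main__":
    for out in split_assignment("PRP=0123ABCD,1,2,104975743,-1,77,2"):
        print(out)
```

Feeding each Pfactor line to mprime and each PRP=...,0 line back to gpuOwl would give the CPU-does-P-1 division of labor without waiting for the auto-split.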
[QUOTE=Xyzzy;547747]We need to find a gpu-top program!
:mike:[/QUOTE] [URL="https://awesomedetect.com/how-to-monitor-amd-ati-or-radeon-gpu-usage-in-linux/"]sudo radeontop[/URL] works for me.
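For a minimal DIY "gpu-top", the amdgpu kernel driver also exposes raw VRAM counters in sysfs. A small Python sketch (the sysfs paths are what amdgpu provides on typical systems, but the card index may differ on yours):

```python
# Minimal VRAM monitor sketch for amdgpu cards: read the byte counters
# the kernel driver exposes under /sys/class/drm/<card>/device/.
from pathlib import Path

def fmt_vram(used_bytes, total_bytes):
    """Render a one-line VRAM usage summary."""
    gib = 1024 ** 3
    pct = 100.0 * used_bytes / total_bytes
    return f"VRAM {used_bytes / gib:.2f}/{total_bytes / gib:.2f} GiB ({pct:.0f}%)"

def read_vram(card="card0"):
    base = Path("/sys/class/drm") / card / "device"
    used = int((base / "mem_info_vram_used").read_text())
    total = int((base / "mem_info_vram_total").read_text())
    return fmt_vram(used, total)

if __name__ == "__main__":
    try:
        print(read_vram())
    except OSError:
        # No amdgpu sysfs counters on this machine (or different card index)
        print("no amdgpu sysfs counters found")
```

Run under watch(1) this gives a crude live view of how close a -maxAlloc setting is pushing the card to its memory ceiling.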
Somewhere between the Linux kernel, ROCm 3.5.0 and gpuOwl there is no longer the creeping kworker CPU-hogger problem; there is no need to stop/start gpuOwl to get rid of it.
[QUOTE=paulunderwood;547847]Somewhere between the Linux kernel, ROCm 3.5.0 and gpuOwl there is no longer the creeping kworker CPU-hogger problem; There is no need to stop/start gpuOwl to get rid of it.[/QUOTE]
It's amdgpu (part of the Linux kernel) that has the fix. Most likely ROCm 3.5 has nothing to do with it, as it was fixed for me on ROCm 3.3 by updating the kernel. How do you find the performance of ROCm 3.5 compared to 3.3?
[QUOTE=preda;547855]It's the amdgpu (that is part of the Linux kernel) that has the fix. Most likely ROCm 3.5 has nothing to do with it, as it was fixed for me on ROCm 3.3 by updating the kernel.
How do you see the performance of ROCm 3.5 compared to 3.3?[/QUOTE] It is [I]slower[/I] with ROCm 3.5 :down: But I have compensated by overclocking the RAM. And that is another thing -- I get fewer GEC errors with ROCm 3.5 with the memory at 1200 -- in fact, one error in the last 6 tests. I am also adjusting sclk (3 or 4) and fans daily, depending on the ambient temperature. I never let the junction temperature go over 95°C.

Two instances at 5.5M FFT:
[CODE]sclk 3: 1423 µs/it
sclk 4: 1317 µs/it[/CODE]
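For a rough feel for those two timings, a back-of-envelope calculation (using a hypothetical ~105M iteration count as a stand-in for a wavefront PRP test; gpuOwl's own ETA accounting may differ):

```python
# Quick arithmetic on the sclk 3 vs sclk 4 timings quoted above.

def days_for(us_per_it, iterations):
    """Wall-clock days for a squaring chain at the given per-iteration time."""
    return us_per_it * iterations / 1e6 / 86400

def speedup(slow_us, fast_us):
    """Fractional speedup of the faster clock level over the slower one."""
    return slow_us / fast_us - 1

if __name__ == "__main__":
    iters = 104_975_743  # illustrative ~105M iteration count
    print(f"sclk 3: {days_for(1423, iters):.2f} days/test")
    print(f"sclk 4: {days_for(1317, iters):.2f} days/test")
    print(f"speedup at sclk 4: {speedup(1423, 1317):.1%}")
```

So the step from sclk 3 to sclk 4 buys roughly an 8% per-iteration speedup, which is the trade-off being weighed against junction temperature.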
[QUOTE=Xyzzy;547747]Is there any chance George can add a work-type to Primenet to P-1 test first-run PRP tests before they are issued for PRP testing?[/QUOTE]Isn't the work type in mprime/Prime95 enough? If not, Chris has a solution.
strange error
[CODE]2020-06-13 21:47:51 f582388172fd5d41 104975743 OK 104400000 99.45%; 1321 us/it; ETA 0d 00:13; 70d06a8a5f7db9ce (check 0.67s)
2020-06-13 21:51:17 f582388172fd5d41 104975743 OK 104600000 99.64%; 1025 us/it; ETA 0d 00:06; 21631a5aea1b4537 (check 0.41s)
2020-06-13 21:53:34 f582388172fd5d41 104975743 OK 104800000 99.83%; 684 us/it; ETA 0d 00:02; 35fc341e287f3108 (check 0.42s)
2020-06-13 21:55:34 f582388172fd5d41 CC 104975743 / 104975743, fd81bec8b0e1a661
2020-06-13 21:55:35 f582388172fd5d41 104975743 EE 104976000 100.00%; 684 us/it; ETA 0d 00:00; 5c9491730bed16cb (check 0.37s)
2020-06-13 21:55:35 f582388172fd5d41 104975743 OK 104800000 loaded: blockSize 400, 35fc341e287f3108
2020-06-13 21:56:44 f582388172fd5d41 104975743 OK 104900000 99.93%; 684 us/it; ETA 0d 00:01; 12a1da4dffe7c5d0 (check 0.45s) 1 errors
2020-06-13 21:57:36 f582388172fd5d41 CC 104975743 / 104975743, fd81bec8b0e1a661
[/CODE] I had just stopped instance 2 to let instance 1 catch up so I could have them synchronized. There appears to be no error because the res64s match. What gives?
[QUOTE=paulunderwood;547905][CODE]2020-06-13 21:47:51 f582388172fd5d41 104975743 OK 104400000 99.45%; 1321 us/it; ETA 0d 00:13; 70d06a8a5f7db9ce (check 0.67s)
2020-06-13 21:51:17 f582388172fd5d41 104975743 OK 104600000 99.64%; 1025 us/it; ETA 0d 00:06; 21631a5aea1b4537 (check 0.41s)
2020-06-13 21:53:34 f582388172fd5d41 104975743 OK 104800000 99.83%; 684 us/it; ETA 0d 00:02; 35fc341e287f3108 (check 0.42s)
2020-06-13 21:55:34 f582388172fd5d41 CC 104975743 / 104975743, fd81bec8b0e1a661
2020-06-13 21:55:35 f582388172fd5d41 104975743 EE 104976000 100.00%; 684 us/it; ETA 0d 00:00; 5c9491730bed16cb (check 0.37s)
2020-06-13 21:55:35 f582388172fd5d41 104975743 OK 104800000 loaded: blockSize 400, 35fc341e287f3108
2020-06-13 21:56:44 f582388172fd5d41 104975743 OK 104900000 99.93%; 684 us/it; ETA 0d 00:01; 12a1da4dffe7c5d0 (check 0.45s) 1 errors
2020-06-13 21:57:36 f582388172fd5d41 CC 104975743 / 104975743, fd81bec8b0e1a661
[/CODE] I had just stopped instance 2 to let instance 1 catch up so I could have them synchronized. There appears to be no error because the res64s match. What gives?[/QUOTE] I'm not sure how you could not notice that, but similar things happen to me too sometimes. After detecting the error, gpuOwl goes back to the last correct residue and starts again with doubled error-check frequency (instead of every 200,000 iterations it's 100,000; if there is another error, 50,000, and so on). Because it went back, the result is valid now, and that's the best thing about GEC. (I think the Jacobi check does a similar thing, but I don't remember seeing it on my computer, perhaps due to its lower detection rate.)
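The rollback-and-recheck behaviour discussed above rests on the Gerbicz identity: if checkpoints are taken every L squarings and folded into a running product d, then d_new ≡ d_old^(2^L) · x₀ (mod N) must hold exactly, so any corrupted block breaks the equation. A toy Python sketch on a small prime modulus (block size and numbers are illustrative, not gpuOwl's actual parameters or code):

```python
# Toy Gerbicz error check (GEC): every L squarings, fold the residue
# into a running product d and verify d_new == d_old^(2^L) * x0 (mod N).

def gec_run(x0, N, L, blocks, corrupt_block=None):
    """Run `blocks` blocks of L squarings mod N; return True iff every
    Gerbicz check passes. Optionally flip a bit after one block to
    simulate a hardware error."""
    x = x0
    d = x0  # running product of block-boundary residues
    for j in range(blocks):
        for _ in range(L):
            x = x * x % N
        if j == corrupt_block:
            x ^= 1  # simulated single-bit hardware error
        new_d = d * x % N
        # The Gerbicz identity: new_d must equal d^(2^L) * x0 (mod N)
        if new_d != pow(d, 1 << L, N) * x0 % N:
            return False  # error detected; a real run would roll back here
        d = new_d
    return True
```

The identity holds because x_{(j+1)L} = x_{jL}^(2^L), so raising the product of all earlier checkpoints to the 2^L folds each one into the next, leaving only an extra factor of x₀. The check costs one extra multiply per block plus one modular exponentiation per check, which is why GEC is so cheap relative to the squarings it protects.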
We set up a work pool directory today for our two cards. It works great!
Do we have to stop gpuowl to add work to the worktodo.txt file? We tried using a worktodo.add file but nothing has happened yet. (For now we are manually filling up the pool. Maybe later we will get the python thingie running.) :mike:
[QUOTE=Xyzzy;547909]We set up a work pool directory today for our two cards. It works great!
Do we have to stop gpuowl to add work to the worktodo.txt file? We tried using a worktodo.add file but nothing has happened yet.[/QUOTE] Are you running 1 job per card or 2? As I noted in an edit to my how-to-under-linux thread, even if your particular card gives no better total throughput running 2 jobs, it makes sense to do so as "crash-protection insurance" - one of the 2 jobs I run on my Haswell system's R7 coredumped the other night; if that had been the only instance, I would've lost ~10 hours of crunching.

You can fiddle with the worktodo file during the run - as long as your current assignment isn't just about to finish (or to complete a P-1 try) and you don't change line 1 of the file, that is safe.