mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GpuOwl (https://www.mersenneforum.org/forumdisplay.php?f=171)
-   -   gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

kriesel 2019-09-15 15:56

[QUOTE=Prime95;525871]I can program the server to email a user whenever a non-zero error count is reported (this feature exists now for prime95 LL tests).
[/QUOTE]This is great. Something I'd like to see added someday is a breakdown within the manual results. Right now all CPUs and GPUs reported manually go into one manual total.
It would be great if the user could view manual results with or without errors, DC mismatches, etc., broken down by the various computer/GPU-instance identifiers.

preda 2019-09-16 11:52

[QUOTE=kriesel;525873]Approximately reproduced in 6.10-1. Renaming to a filename that already exists, in this case in subfolder <exponent> is a problem.
[/QUOTE]

OK, this is caused by a broken implementation of filesystem::rename() on mingw64, which throws if the destination exists; I added a workaround.

The worktodo.bak left behind is sort of intended -- for the situation that GpuOwl makes a mess out of worktodo.txt for some reason, the user can recover.
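
For illustration only: gpuowl is C++ and the actual fix lives in its source, so the Python sketch below only shows the general shape of such a workaround, assuming the failure mode is a rename that raises when the destination already exists. The helper name rename_replacing is hypothetical.

```python
import os


def rename_replacing(src: str, dst: str) -> None:
    """Rename src to dst, replacing dst if it already exists.

    Hypothetical helper: mirrors a "remove the destination, then retry"
    workaround for a rename that refuses to overwrite.
    """
    try:
        os.rename(src, dst)
    except FileExistsError:
        # On Windows, os.rename raises if dst exists; on POSIX it overwrites
        # a file silently and this branch is never taken.
        os.remove(dst)
        os.rename(src, dst)
```

The remove-then-rename sequence is not atomic, so a crash between the two calls can leave no destination file at all; keeping a backup copy such as worktodo.bak covers that case. In Python itself, os.replace() already overwrites an existing file on both POSIX and Windows, so the explicit retry is shown only to mirror the workaround described above.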

preda 2019-09-16 12:52

[QUOTE=Prime95;525871]The count is useful for two reasons that I can see:

1) It lets the user monitor hardware health. This is especially nice for headless operations. Rather than ssh into each GPU machine and grepping the log files, I can program the server to email a user whenever a non-zero error count is reported (this feature exists now for prime95 LL tests).

2) It lets us spot double check these PRP results someday. The first prime95 implementation had some windows of vulnerability. If there are any vulnerabilities remaining, these machines would be the most likely to find them.[/QUOTE]

OK, added the nErrors back. Increased savefile version number (to 10), and added errors to results json.

mrh 2019-09-16 22:29

Does anyone know if P-1 works on NVIDIA GPUs? PRP seems to work fine for me, but when trying -pm1 it looks like stage1 makes it to about 100% and then the host (not the GPU) tries to allocate 70GB or so of memory and gets killed by oom_reaper. I can dig deeper, but maybe someone already knows what is wrong?

kriesel 2019-09-17 00:50

[QUOTE=mrh;525949]Does anyone know if P-1 works on NVIDIA GPUs? PRP seems to work fine for me, but when trying -pm1 it looks like stage1 makes it to about 100% and then the host (not the GPU) tries to allocate 70GB or so of memory and gets killed by oom_reaper. I can dig deeper, but maybe someone already knows what is wrong?[/QUOTE]
On Linux? What GPU, exponent, and bounds? I suggest starting with low known-factor test cases, then small-exponent real useful work, and working your way up. Maybe -maxAlloc would help. Or a bigger swap file. It's the gcd that gets done on a host CPU core (in gpuowl or cudapm1). I've not seen 70GB requirements here, on Win7, and have run gpuowl P-1 up to gputo72 bounds on 500M (but 600M finished stage 1 and gcd but would not do stage 2 on an 11GB GTX1080Ti). I've run 800M P-1 both stages in prime95 with an 8GB memory allowance on a 16GB RAM laptop (and 852M is in stage 2 now).

See the previous few pages of this thread and particularly
[URL]https://www.mersenneforum.org/showpost.php?p=525692&postcount=1360[/URL]
[URL]https://www.mersenneforum.org/showpost.php?p=525580&postcount=1358[/URL]

Perhaps also some of [URL]https://www.mersenneforum.org/showpost.php?p=521922&postcount=1[/URL], including the new Gpuowl P-1 run time scaling on AMD and NVIDIA [URL]https://www.mersenneforum.org/showpost.php?p=525955&postcount=17[/URL]

TF has dainty memory requirements, LL or PRP significant, and P-1 the biggest.

mrh 2019-09-17 03:08

[QUOTE=kriesel;525957]On Linux? What GPU, exponent, and bounds? I suggest starting with low known-factor test cases, then small-exponent real useful work, and working your way up. Maybe -maxAlloc would help. Or a bigger swap file. It's the gcd that gets done on a host CPU core (in gpuowl or cudapm1). I've not seen 70GB requirements here, on Win7, and have run gpuowl P-1 up to gputo72 bounds on 500M (but 600M finished stage 1 and gcd but would not do stage 2 on an 11GB GTX1080Ti). I've run 800M P-1 both stages in prime95 with an 8GB memory allowance on a 16GB RAM laptop (and 852M is in stage 2 now).

See the previous few pages of this thread and particularly
[URL]https://www.mersenneforum.org/showpost.php?p=525692&postcount=1360[/URL]
[URL]https://www.mersenneforum.org/showpost.php?p=525580&postcount=1358[/URL]

Perhaps also some of [URL]https://www.mersenneforum.org/showpost.php?p=521922&postcount=1[/URL], including the new Gpuowl P-1 run time scaling on AMD and NVIDIA [URL]https://www.mersenneforum.org/showpost.php?p=525955&postcount=17[/URL]

TF has dainty memory requirements, LL or PRP significant, and P-1 the biggest.[/QUOTE]

Thanks! Good info, I think that is enough for me to figure it out.

mrh 2019-09-17 04:16

[QUOTE=kriesel;525957]On Linux? What GPU, exponent, and bounds? I suggest starting with low known-factor test cases, then small-exponent real useful work, and working your way up. Maybe -maxAlloc would help. Or a bigger swap file. It's the gcd that gets done on a host CPU core (in gpuowl or cudapm1). I've not seen 70GB requirements here, on Win7, and have run gpuowl P-1 up to gputo72 bounds on 500M (but 600M finished stage 1 and gcd but would not do stage 2 on an 11GB GTX1080Ti). I've run 800M P-1 both stages in prime95 with an 8GB memory allowance on a 16GB RAM laptop (and 852M is in stage 2 now).

See the previous few pages of this thread and particularly
[URL]https://www.mersenneforum.org/showpost.php?p=525692&postcount=1360[/URL]
[URL]https://www.mersenneforum.org/showpost.php?p=525580&postcount=1358[/URL]

Perhaps also some of [URL]https://www.mersenneforum.org/showpost.php?p=521922&postcount=1[/URL], including the new Gpuowl P-1 run time scaling on AMD and NVIDIA [URL]https://www.mersenneforum.org/showpost.php?p=525955&postcount=17[/URL]

TF has dainty memory requirements, LL or PRP significant, and P-1 the biggest.[/QUOTE]

-maxAlloc was all that was needed. Thanks!

preda 2019-09-18 13:38

In the most recent commit I added savefile support for P-1 *first-stage only*. Files with the extension ".p1.owl" will be created. A save is done every 5 minutes and on manual exit (Ctrl-C). The log should contain messages indicating each save and the iteration at which it was done.

The B1 is saved in the savefile. If the actual B1 does not match the saved B1 the savefile will not be loaded, but also should not be overwritten -- instead the program will indicate the mismatch and exit.
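
The load rule above can be sketched roughly as follows. This is not gpuowl's code or its savefile format: the JSON layout, the function names and the B1Mismatch exception are all assumptions made for the example. It only illustrates "refuse to load on a B1 mismatch, leave the file untouched, report and stop".

```python
import json
import os


class B1Mismatch(Exception):
    """Raised when the requested B1 differs from the B1 in the savefile."""


def save_stage1(path, b1, iteration, data):
    """Write a (hypothetical) stage-1 savefile that records B1."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"B1": b1, "iteration": iteration, "data": data}, f)
    os.replace(tmp, path)  # swap in the new file in one step


def load_stage1(path, b1):
    """Return the saved state, or None if there is no savefile.

    On a B1 mismatch the file is neither loaded nor modified; the caller
    is expected to print the message and exit.
    """
    if not os.path.exists(path):
        return None
    with open(path) as f:
        state = json.load(f)
    if state["B1"] != b1:
        raise B1Mismatch(
            f"{path}: saved B1={state['B1']} but requested B1={b1}"
        )
    return state
```

Raising instead of silently starting over means a typo in B1 cannot destroy hours of stage-1 progress.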

preda 2019-09-19 15:07

savefiles for P-1
 
Finally added Ken's pet feature, savefiles for P-1, both stages. Here's a bit about how they're implemented:
For exponent N, there are:
stage1 savefile(s): <workdir>/N/N.p1.owl
stage2 savefile(s): <workdir>/N/N.p2.owl

stage1 is saved periodically (every 5 minutes), on Ctrl-C, and at the next-to-last iteration of stage1. stage2 is only saved when a "round" is completed (log lines like "86316851 P2 774/2880" indicate rounds being completed). Notably, stage2 is not saved on Ctrl-C.
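
As a rough sketch of the save rules just described (not the actual gpuowl code): the function names, the argument list, and the convention that iteration counts completed iterations are assumptions for the example. Only the path layout and the triggers come from the description above.

```python
SAVE_INTERVAL_S = 5 * 60  # stage 1 is saved every 5 minutes


def savefile_paths(workdir, exponent):
    """Return (stage1_path, stage2_path) as <workdir>/N/N.p1.owl and N.p2.owl."""
    base = f"{workdir}/{exponent}/{exponent}"
    return base + ".p1.owl", base + ".p2.owl"


def should_save_stage1(now, last_save, iteration, n_iterations, interrupted):
    """Stage 1: save on Ctrl-C, at the next-to-last iteration, or on the timer."""
    if interrupted:
        return True
    if iteration == n_iterations - 1:  # assumed meaning of "next-to-last"
        return True
    return now - last_save >= SAVE_INTERVAL_S


def should_save_stage2(round_completed, interrupted):
    """Stage 2: save only when a round completes; Ctrl-C alone does not save."""
    return round_completed
```

For example, savefile_paths("work", 86316851) gives "work/86316851/86316851.p1.owl" and "work/86316851/86316851.p2.owl".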

Feedback welcome; there may be bugs. Please test a few known-factor exponents and check that the factors are found, especially across reloads in stage2.

kriesel 2019-09-19 18:14

[QUOTE=preda;526134]Finally added [STRIKE]Ken's pet[/STRIKE][I][COLOR=Purple] a valuable[/COLOR][/I] feature, savefiles for P-1, both stages. Here's a bit about how they're implemented:
For exponent N, there are:
stage1 savefile(s): <workdir>/N/N.p1.owl
stage2 savefile(s): <workdir>/N/N.p2.owl

stage1 is saved periodically (every 5 minutes), on Ctrl-C, and at the next-to-last iteration of stage1. stage2 is only saved when a "round" is completed (log lines like "86316851 P2 774/2880" indicate rounds being completed). Notably, stage2 is not saved on Ctrl-C.

Feedback welcome; there may be bugs. Please test a few known-factor exponents and check that the factors are found, especially across reloads in stage2.[/QUOTE]
gpuowl V6.10-9-g54cba1d looks good so far.
Built no problem in msys2/mingw64, ran ok on Win7-x64 Pro RX480.
M24000577 P-1 stage 1 save at 5 minute intervals confirmed;
stage 2 save at rounds confirmed;
stage 2 gcd factor found confirmed;
rerun with existing save files begins at very late stage 1, redoes stage 2.
Presumably this could be used to run a second stage 2 with a larger bound fairly efficiently (it might require deleting the earlier stage 2 files, or a code modification to run from B2-old to the larger B2-new without repeating B1 to B2-old).
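
To make the B2-old to B2-new idea concrete, here is a toy textbook P-1 in Python. It is not how gpuowl implements either stage; the function names, base 3 and the tiny bounds are choices made for the example. It uses M29 = 233 × 1103 × 2089. Stage 1 with B1 = 5 uses the exponent E = 2·29·4·3·5 = 3480, which is a multiple of 233 − 1 = 2^3·29, so 233 shows up in the stage 1 gcd. Since 1103 − 1 = 2·19·29, that factor should only appear once the prime 19 is covered, which is exactly the range (10, 20] added by a larger B2.

```python
from math import gcd


def primes_in(lo, hi):
    """Primes q with lo < q <= hi (trial division, fine for toy bounds)."""
    return [q for q in range(max(lo + 1, 2), hi + 1)
            if all(q % d for d in range(2, int(q ** 0.5) + 1))]


def stage1(p, b1, base=3):
    """Return x = base^E mod M_p, with E = 2*p*prod(prime powers <= b1)."""
    m = (1 << p) - 1
    e = 2 * p
    for q in primes_in(1, b1):
        qk = q
        while qk * q <= b1:  # largest power of q not exceeding b1
            qk *= q
        e *= qk
    return pow(base, e, m)


def stage2(p, x, lo, hi):
    """gcd of M_p with the product of (x^q - 1) over primes lo < q <= hi."""
    m = (1 << p) - 1
    acc = 1
    for q in primes_in(lo, hi):
        acc = acc * (pow(x, q, m) - 1) % m
    return gcd(acc, m)


if __name__ == "__main__":
    p, m = 29, (1 << 29) - 1
    x = stage1(p, 5)                   # the value a stage 1 savefile would hold
    print("stage 1 gcd:", gcd(x - 1, m))
    print("stage 2 (5, 10]:", stage2(p, x, 5, 10))    # first run, B2-old = 10
    print("stage 2 (10, 20]:", stage2(p, x, 10, 20))  # extension to B2-new = 20
```

Because the stage 2 product over (B1, B2-new] is just the product over (B1, B2-old] times the product over (B2-old, B2-new], a rerun from the saved stage 1 residue only needs the new range. Any factor already found in stage 1 also divides every stage 2 gcd, since x − 1 divides x^q − 1.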

M1257787 PRP test went correctly, no file or folder test or rename etc issues.

M95m P-1 run in progress, stage one save on Ctrl-C confirmed

P-1 save and resume is useful as larger exponents increase total P-1 run time to days or weeks each. See [url]https://www.mersenneforum.org/showpost.php?p=525955&postcount=17[/url]

xx005fs 2019-09-21 05:47

CPU load using Nvidia GPUs
 
It looks like using an Nvidia GPU to do any work with GPUOWL maxes out one of my CPU cores, which has a noticeable impact on my CPU crunching performance and wastes heat and power. I have found some links that supposedly fix the issue (only for CUDA), but I don't know whether the fix would work with OpenCL apps like GPUOWL. If that CPU core could be freed, fewer compute cycles would be wasted overall.
Here's one of the links anyway.
[url]https://devtalk.nvidia.com/default/topic/755859/cpu-core-is-busy-while-gpu-runs-its-kernel/[/url]
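
One common explanation for this symptom is that the host thread spin-waits for the GPU instead of sleeping. Whether that applies here, and how GPUOWL waits, is an assumption. The Python sketch below only simulates the difference: a timer stands in for a GPU kernel, and wait_for and run_fake_kernel are made-up names.

```python
import threading
import time


def wait_for(done, poll_interval_s):
    """Poll until done is set and return the number of polls.

    poll_interval_s == 0 is a busy-wait that keeps one CPU core at 100%;
    a positive interval sleeps between polls and leaves the core mostly idle.
    """
    polls = 0
    while not done.is_set():
        polls += 1
        if poll_interval_s > 0:
            time.sleep(poll_interval_s)
    return polls


def run_fake_kernel(duration_s, poll_interval_s):
    """Simulate a GPU kernel of the given duration and wait for it to finish."""
    done = threading.Event()
    timer = threading.Timer(duration_s, done.set)  # stands in for the GPU
    timer.start()
    polls = wait_for(done, poll_interval_s)
    timer.join()
    return polls


if __name__ == "__main__":
    print("busy-wait polls:", run_fake_kernel(0.05, 0))
    print("sleep-poll polls:", run_fake_kernel(0.05, 0.01))
```

The trade-off is latency: sleeping between polls frees the core but can add up to one poll interval of delay after each kernel finishes.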


All times are UTC.