mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GpuOwl (https://www.mersenneforum.org/forumdisplay.php?f=171)
-   -   gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

preda 2019-04-13 07:10

[QUOTE=SELROC;513582]I have attempted to reproduce the segfault without success. I have fixed the Makefile by adding -lstdc++fs , the pull request is waiting.[/QUOTE]

Thanks Valerio, in fact adding -lstdc++fs fixes the segfault problem. Your pull request is merged now.

SELROC 2019-04-13 07:19

[QUOTE=preda;513584]Thanks Valerio, in fact adding -lstdc++fs fixes the segfault problem. Your pull request is merged now.[/QUOTE]


One thing I don't understand: building without -lstdc++fs throws an error at build time here, so I don't see how he ended up with a binary that segfaults.

preda 2019-04-13 07:20

[QUOTE=kriesel;513543]Same method as used for V6.4, produced[/QUOTE]
Ken, I'm looking into these, give me one moment.

M344587487 2019-04-13 09:24

That works nicely. The only issue I can see is that with the new way the version is handled, it doesn't seem to end up in the binary, so it's not in any submitted results.

preda 2019-04-13 09:49

[QUOTE=M344587487;513589]That works nicely. The only issue I can see is that with the new way the version is handled, it doesn't seem to end up in the binary, so it's not in any submitted results.[/QUOTE]

Why do you think the version is not in the binary?

There are two ways to build now, using make with Makefile, or using scons with SConstruct.
Both ways generate during the build a version string that is written to the file version.inc, which is included in the build thus the version is in the binary. The version should be reported in the results as well.

The version string is produced by:
git describe --long
and looks like:
v6.5-1-g168a15c

Where 'v6.5' is a tag in git, '1' is the number of commits after the tag, and '168a15c' is a commit hash.

Also, I'm changing the executable name back to 'gpuowl' from 'openowl', I hope that's not a problem.

SELROC 2019-04-13 10:21

[QUOTE=preda;513590]Why do you think the version is not in the binary?

There are two ways to build now, using make with Makefile, or using scons with SConstruct.
Both ways generate during the build a version string that is written to the file version.inc, which is included in the build thus the version is in the binary. The version should be reported in the results as well.

The version string is produced by:
git describe --long
and looks like:
v6.5-1-g168a15c

Where 'v6.5' is a tag in git, '1' is the number of commits after the tag, and '168a15c' is a commit hash.

Also, I'm changing the executable name back to 'gpuowl' from 'openowl', I hope that's not a problem.[/QUOTE]




not a problem if I don't care :-)


I have to edit all my scripts.

preda 2019-04-13 10:47

[QUOTE=kriesel;513543]Same method as used for V6.4, produced [errors][/QUOTE]
Please try again now.

Note, the build now uses a file version.inc which contains the version between quotes, e.g.:
"v6.5-5-ga574d99-dirty"

This file is normally generated by:
echo "`git describe --long --dirty`" > version.inc

I'm not sure how this works on Windows, but if you get
git describe --long --dirty
to work, just put that string between quotes in version.inc

(you can see this in Makefile)

If you get a binary built, and publish it, please use a proper version with the right git hash (which would allow in the future to track down a result to the exact source code that generated it)
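For builders without git, a sketch of the fallback described above (the placeholder string "v6.5-x-unknown" is my invention; substitute whatever labels your snapshot):

```shell
# Generate version.inc the way the Makefile does, falling back to a
# hand-written quoted string when git metadata is unavailable.
tmp=$(mktemp -d) && cd "$tmp"          # simulate a source tree without git
if ver=$(git describe --long --dirty 2>/dev/null); then
  echo "\"$ver\"" > version.inc
else
  echo "\"v6.5-x-unknown\"" > version.inc
fi
cat version.inc   # "v6.5-x-unknown" here, since this directory is not a repo
```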

kriesel 2019-04-13 13:35

[QUOTE=preda;513594]Please try again now.

Note, the build now uses a file version.inc which contains the version between quotes, e.g.:
"v6.5-5-ga574d99-dirty"

This file is normally generated by:
echo "`git describe --long --dirty`" > version.inc

I'm not sure how this works on Windows, but if you get
git describe --long --dirty
to work, just put that string between quotes in version.inc

(you can see this in Makefile)

If you get a binary built, and publish it, please use a proper version with the right git hash (which would allow in the future to track down a result to the exact source code that generated it)[/QUOTE]
I've never used git; I always download a zipfile. Clumsy perhaps but works and lets me dodge a learning curve for now.
I edit the makefile to add an openowl-win section, as shown, with a hardcoded hash and the necessary -static option. It would be great if you would add the openowl-win section, even if it is still written to use git, as it would mean less makefile editing for me every time. Usually the resulting makefile looks something like the following. After running make, I also typically run strip openowl.exe, which gives an executable of ~540K, down from the original ~1.4M, as for v6.4

[CODE]HEADERS = Background.h Pm1Plan.h GmpUtil.h Args.h checkpoint.h clwrap.h common.h kernel.h state.h timeutil.h tinycl.h Worktodo.h Gpu.h Primes.h Signal.h FFTConfig.h
SRCS = Pm1Plan.cpp GmpUtil.cpp Worktodo.cpp common.cpp gpuowl.cpp Gpu.cpp clwrap.cpp Task.cpp checkpoint.cpp timeutil.cpp Args.cpp Primes.cpp state.cpp Signal.cpp FFTConfig.cpp

# Edit the path in -L below if needed, to the folder containing OpenCL.dll on Windows or libOpenCL.so on UNIX.
# The included lib paths are for ROCm, AMDGPU-pro/Linux or MSYS-2/Windows.
LIBPATH = -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L/c/Windows/System32 -L.

#-fsanitize=leak

openowl: ${HEADERS} ${SRCS}
	g++ -Wall -O2 -std=c++17 -DREV=\"`git rev-parse --short HEAD``git diff-files --quiet || echo -mod`\" -Wall ${SRCS} -o openowl -lOpenCL -lgmp -pthread ${LIBPATH}

openowl-win: ${HEADERS} ${SRCS}
	g++ -Wall -O2 -std=c++17 -DREV=\"aa9f555f\" -Wall ${SRCS} -o openowl -lOpenCL -lgmp -pthread ${LIBPATH} -static
[/CODE]For the latest, I used the following. Not sure the -5 is right here. This time the executable is 4.2M before the strip, 1.4M after.

[CODE]HEADERS = Background.h Pm1Plan.h GmpUtil.h Args.h checkpoint.h clwrap.h common.h kernel.h state.h timeutil.h tinycl.h Worktodo.h Gpu.h Signal.h FFTConfig.h
SRCS = Pm1Plan.cpp GmpUtil.cpp Worktodo.cpp common.cpp gpuowl.cpp Gpu.cpp clwrap.cpp Task.cpp checkpoint.cpp timeutil.cpp Args.cpp state.cpp Signal.cpp FFTConfig.cpp

# Edit the path in -L below if needed, to the folder containing OpenCL.dll on Windows or libOpenCL.so on UNIX.
# The included lib paths are for ROCm, AMDGPU-pro/Linux or MSYS-2/Windows.
LIBPATH = -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L/c/Windows/System32 -L.

#-fsanitize=leak

openowl: ${HEADERS} ${SRCS}
	echo \"`git describe --long --dirty`\" > version.inc
	echo Version: `cat version.inc`
	g++ -Wall -O2 -std=c++17 -Wall ${SRCS} -o gpuowl -lOpenCL -lgmp -lstdc++fs -pthread ${LIBPATH}

openowl-win: ${HEADERS} ${SRCS}
	echo \"v6.5-5-1f401d5\" >version.inc
	echo Version: `cat version.inc`
	g++ -Wall -O2 -std=c++17 -Wall ${SRCS} -o gpuowl -lOpenCL -lgmp -lstdc++fs -pthread ${LIBPATH} -static
[/CODE] which then yields
[CODE]$ make openowl-win
echo \"v6.5-5-1f401d5\" >version.inc
echo Version: `cat version.inc`
Version: "v6.5-5-1f401d5"
g++ -Wall -O2 -std=c++17 -Wall Pm1Plan.cpp GmpUtil.cpp Worktodo.cpp common.cpp gpuowl.cpp Gpu.cpp clwrap.cpp Task.cpp checkpoint.cpp timeutil.cpp Args.cpp state.cpp Signal.cpp FFTConfig.cpp -o gpuowl -lOpenCL -lgmp -lstdc++fs -pthread -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L/c/Windows/System32 -L. -static
[/CODE]and a clean build. At least, no error messages, and -h ran.
[CODE]>gpuowl -h
2019-04-13 08:09:34 gpuowl v6.5-5-1f401d5
2019-04-13 08:09:34 config: -h

Command line options:

-dir <folder> : specify work directory (containing worktodo.txt, results.txt, config.txt, gpuowl.log)
-user <name> : specify the user name.
-cpu <name> : specify the hardware name.
-time : display kernel profiling information.
-fft <size> : specify FFT size, such as: 5000K, 4M, +2, -1.
-block <value> : PRP GEC block size. Default 400. Smaller block is slower but detects errors sooner.
-carry long|short : force carry type. Short carry may be faster, but requires high bits/word.
-B1 : P-1 B1, default 500000
-rB2 : ratio of B2 to B1, default 30
-prp <exponent> : run a single PRP test and exit, ignoring worktodo.txt
-pm1 <exponent> : run a single P-1 test and exit, ignoring worktodo.txt
-device <N> : select a specific device:
0 : Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
1 : gfx804-8x1203-@3:0.0 Radeon 550 Series

FFT Configurations:
FFT 8K [ 0.01M - 0.18M] 64-64
FFT 32K [ 0.05M - 0.68M] 64-256 256-64
FFT 48K [ 0.07M - 1.01M] 64-64-6
FFT 64K [ 0.10M - 1.34M] 64-512 512-64
FFT 72K [ 0.11M - 1.50M] 64-64-9
FFT 80K [ 0.12M - 1.66M] 64-64-10
FFT 128K [ 0.20M - 2.63M] 1K-64 64-1K 256-256
FFT 192K [ 0.29M - 3.91M] 64-256-6 256-64-6
FFT 256K [ 0.39M - 5.18M] 64-2K 256-512 512-256 2K-64
FFT 288K [ 0.44M - 5.81M] 64-256-9 256-64-9
FFT 320K [ 0.49M - 6.44M] 64-256-10 256-64-10
FFT 384K [ 0.59M - 7.69M] 64-512-6 512-64-6
FFT 512K [ 0.79M - 10.18M] 1K-256 256-1K 512-512 4K-64
FFT 576K [ 0.88M - 11.42M] 64-512-9 512-64-9
FFT 640K [ 0.98M - 12.66M] 64-512-10 512-64-10
FFT 768K [ 1.18M - 15.12M] 1K-64-6 64-1K-6 256-256-6
FFT 1M [ 1.57M - 20.02M] 1K-512 256-2K 512-1K 2K-256
FFT 1152K [ 1.77M - 22.45M] 1K-64-9 64-1K-9 256-256-9
FFT 1280K [ 1.97M - 24.88M] 1K-64-10 64-1K-10 256-256-10
FFT 1536K [ 2.36M - 29.72M] 64-2K-6 256-512-6 512-256-6 2K-64-6
FFT 2M [ 3.15M - 39.34M] 1K-1K 512-2K 2K-512 4K-256
FFT 2304K [ 3.54M - 44.13M] 64-2K-9 256-512-9 512-256-9 2K-64-9
FFT 2560K [ 3.93M - 48.90M] 64-2K-10 256-512-10 512-256-10 2K-64-10
FFT 3M [ 4.72M - 58.41M] 1K-256-6 256-1K-6 512-512-6 4K-64-6
FFT 4M [ 6.29M - 77.30M] 1K-2K 2K-1K 4K-512
FFT 4608K [ 7.08M - 86.70M] 1K-256-9 256-1K-9 512-512-9 4K-64-9
FFT 5M [ 7.86M - 96.07M] 1K-256-10 256-1K-10 512-512-10 4K-64-10
FFT 6M [ 9.44M - 114.74M] 1K-512-6 256-2K-6 512-1K-6 2K-256-6
FFT 8M [ 12.58M - 151.83M] 2K-2K 4K-1K
FFT 9M [ 14.16M - 170.28M] 1K-512-9 256-2K-9 512-1K-9 2K-256-9
FFT 10M [ 15.73M - 188.68M] 1K-512-10 256-2K-10 512-1K-10 2K-256-10
FFT 12M [ 18.87M - 225.32M] 1K-1K-6 512-2K-6 2K-512-6 4K-256-6
FFT 16M [ 25.17M - 298.13M] 4K-2K
FFT 18M [ 28.31M - 334.34M] 1K-1K-9 512-2K-9 2K-512-9 4K-256-9
FFT 20M [ 31.46M - 370.44M] 1K-1K-10 512-2K-10 2K-512-10 4K-256-10
FFT 24M [ 37.75M - 442.34M] 1K-2K-6 2K-1K-6 4K-512-6
FFT 36M [ 56.62M - 656.22M] 1K-2K-9 2K-1K-9 4K-512-9
FFT 40M [ 62.91M - 727.03M] 1K-2K-10 2K-1K-10 4K-512-10
FFT 48M [ 75.50M - 868.07M] 2K-2K-6 4K-1K-6
FFT 72M [113.25M - 1287.53M] 2K-2K-9 4K-1K-9
FFT 80M [125.83M - 1426.38M] 2K-2K-10 4K-1K-10
FFT 96M [150.99M - 1702.92M] 4K-2K-6
FFT 144M [226.49M - 2525.23M] 4K-2K-9
FFT 160M [251.66M - 2797.39M] 4K-2K-10
2019-04-13 08:09:39 Exiting because "help"
2019-04-13 08:09:39 Bye[/CODE]If you'll tell me whether the -5 is correct for that hash, I'll alter the makefile & rerun it if needed, and post the executable in a zipfile. I think the right zip contents would now be
gpuowl.exe
makefile (not really necessary, but shows how it was built)
primenet.py
README.md

M344587487 2019-04-13 14:27

[QUOTE=preda;513590]Why do you think the version is not in the binary?

There are two ways to build now, using make with Makefile, or using scons with SConstruct.
Both ways generate during the build a version string that is written to the file version.inc, which is included in the build thus the version is in the binary. The version should be reported in the results as well.

The version string is produced by:
git describe --long
and looks like:
v6.5-1-g168a15c

Where 'v6.5' is a tag in git, '1' is the number of commits after the tag, and '168a15c' is a commit hash.

Also, I'm changing the executable name back to 'gpuowl' from 'openowl', I hope that's not a problem.[/QUOTE]
I did a full git clone, tarred it up, and transferred it to the offline PC. Compiled with make verbatim; nothing shows up in the header when you run gpuowl or in the results.txt JSON data, just version:"". It's not a problem for now as I'm batching it manually; I'll just grep v6.5-1-g168a15c into the results. I'll investigate a bit more later.

[QUOTE=kriesel;513601]I've never used git; I always download a zipfile. Clumsy perhaps but works and lets me dodge a learning curve for now.
...[/QUOTE]

It can be as simple as:
[code]git clone https://github.com/preda/gpuowl[/code]You could get adventurous and download and compile in one go:
[code]git clone https://github.com/preda/gpuowl && cd gpuowl && make[/code]There's a bit more to git if you want that's only a search away, but if all you want to do is get the latest version and compile you can avoid the hassle by doing the above in a fresh directory every time. You could be a bit more efficient by only getting the latest commit:
[code]git clone --depth 1 https://github.com/preda/gpuowl && cd gpuowl && make[/code]

kriesel 2019-04-13 14:51

[QUOTE=M344587487;513605]
It can be as simple as:
[code]git clone https://github.com/preda/gpuowl[/code]You could get adventurous and download and compile in one go:
[code]git clone https://github.com/preda/gpuowl && cd gpuowl && make[/code]There's a bit more to git if you want that's only a search away, but if all you want to do is get the latest version and compile you can avoid the hassle by doing the above in a fresh directory every time. You could be a bit more efficient by only getting the latest commit:
[code]git clone --depth 1 https://github.com/preda/gpuowl && cd gpuowl && make[/code][/QUOTE]Would need to first install git, yes? in the msys/mingw environment that's atop Windows 7, where the compiling gets done, I think. It looks like it's already there in msys2/mingw.

This gives 5 ways on windows, some oriented to Win 10.
[URL]https://www.jamessturtevant.com/posts/5-Ways-to-install-git-on-Windows/[/URL]

SELROC 2019-04-13 17:47

[QUOTE=preda;513594]Please try again now.

Note, the build now uses a file version.inc which contains the version between quotes, e.g.:
"v6.5-5-ga574d99-dirty"

This file is normally generated by:
echo "`git describe --long --dirty`" > version.inc

I'm not sure how this works on Windows, but if you get
git describe --long --dirty
to work, just put that string between quotes in version.inc

(you can see this in Makefile)

If you get a binary built, and publish it, please use a proper version with the right git hash (which would allow in the future to track down a result to the exact source code that generated it)[/QUOTE]


I don't know what it means; it seems that a string is missing:

[CODE]echo "`git describe --long --dirty`" > version.inc
fatal: No names found, cannot describe anything.
echo Version: `cat version.inc`
Version: ""[/CODE]

preda 2019-04-13 23:14

[QUOTE=kriesel;513601]I've never used git; I always download a zipfile [...] [/QUOTE]
Ken, I updated the Makefile with the target gpuowl-win (changed the executable name accordingly, you'll get gpuowl-win, you need to rename it if the name is not good), added the strip too for the windows target.

If you don't have git, you need to put something manually in version.inc, changing this line:
echo "`git describe --long --dirty --always`" > version.inc
to something like:
echo "v6.5-8-g005297a" > version.inc

If you're building the version string manually, don't worry about the middle number, just replace it with 'x', such as: v6.5-x-g005297a

The convention now is, if there are uncommitted changes (local changes to the source), to suffix with -dirty: v6.5-x-g005297a-dirty, to indicate it's not exactly equal to that commit. (But for a public build it would be nice to be not-dirty, of course.)

You may try to install git, it may make the workflow a bit easier. After the initial "git clone" you'll only need "git pull" to get the delta.

preda 2019-04-13 23:19

[QUOTE=SELROC;513623]I don't know what it means; it seems that a string is missing:

echo "`git describe --long --dirty`" > version.inc
fatal: No names found, cannot describe anything.
echo Version: `cat version.inc`
Version: ""[/QUOTE]

Thanks! I didn't realize that the tag is not pushed by default to the remote on "git push"; fixed now. If you retry, the version should be properly set. Also added a check which rejects an empty VERSION at compilation, to prevent unintentionally building without a version.

So, "git describe" searches for a git tag where it takes the version string from (e.g. "v6.5"), and it didn't like that there was no tag at all, because I forgot to push the tag to remote.

SELROC 2019-04-14 01:27

[QUOTE=preda;513642]Thanks! I didn't realize that the tag is not pushed by default to remote on "git push", fixed now. If you retry, the version should be properly set now. Also added a check which rejects an empty VERSION on compilation, to prevent non-intentional build without version.

So, "git describe" searches for a git tag where it takes the version string from (e.g. "v6.5"), and it didn't like that there was no tag at all, because I forgot to push the tag to remote.[/QUOTE]


It has a git tag now, however it is not a version like x.y

M344587487 2019-04-14 07:31

[QUOTE=SELROC;513648]It has a git tag now, however it is not a version like x.y[/QUOTE]
It's just the commit hash which is all that's in version.inc. I'm prepending "6.5-x-" for now.


[QUOTE=kriesel;513606]Would need to first install git, yes? in the msys/mingw environment that's atop Windows 7, where the compiling gets done, I think. It looks like it's already there in msys2/mingw.

This gives 5 ways on windows, some oriented to Win 10.
[URL]https://www.jamessturtevant.com/posts/5-Ways-to-install-git-on-Windows/[/URL][/QUOTE]
If you already have it in mingw try that, otherwise I'd try the commandline bit of git for windows from your link.

SELROC 2019-04-14 07:38

[QUOTE=M344587487;513658]It's just the commit hash which is all that's in version.inc. I'm prepending "6.5-x-" for now.
[/QUOTE]


The commit hash is a precise way to specify a version; we probably need no more information for the purpose.

preda 2019-04-14 10:46

[QUOTE=SELROC;513648]It has a git tag now, however it is not a version like x.y[/QUOTE]
What do you mean, what is it like?

The version is supposed to look something like this:
v6.5-8-g005297a or v6.5-8-g005297a-dirty

Which gives at the same time a logical idea of the version (6.5), and access to the commit.

preda 2019-04-14 10:49

[QUOTE=M344587487;513658]It's just the commit hash which is all that's in version.inc. I'm prepending "6.5-x-" for now.
[/QUOTE]
It might have just the hash because you didn't pull the git tag ("v6.5") yet. Probably after the next git pull this will be solved.

SELROC 2019-04-14 11:03

[QUOTE=preda;513668]What do you mean, what is it like?

The version is supposed to look something like this:
v6.5-8-g005297a or v6.5-8-g005297a-dirty

Which gives at the same time a logical idea of the version (6.5), and access to the commit.[/QUOTE]




[url]https://github.com/preda/gpuowl/issues/43[/url]

M344587487 2019-04-14 11:52

[QUOTE=SELROC;513659]The commit hash is a precise way to specify a version; we probably need no more information for the purpose.[/QUOTE]
It is, but it's not human-readable. The format preda is using is x.y-z-hash; might as well follow convention.

[QUOTE=preda;513669]It might have just the hash because you didn't pull the git tag ("v6.5") yet. Probably after the next git pull this will be solved.[/QUOTE]That did it. I was on the latest commit and either pulled before you added the tag or "git pull origin master" doesn't get the tag.
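On the client side, you can ask for all tags explicitly rather than rely on what a branch pull happens to bring along (a sketch with a local path as the remote; the repo names are placeholders):

```shell
# Reproduce the situation: the maintainer tags after we already cloned,
# and an explicit --tags fetch brings the tag in.
tmp=$(mktemp -d) && cd "$tmp"
git init -q upstream
git -C upstream -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "first"
git clone -q upstream work
git -C upstream tag v6.5               # tag added after our clone
cd work
git fetch -q --tags origin
git tag -l                             # now shows v6.5
```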

SELROC 2019-04-14 12:48

[QUOTE=M344587487;513674]It is, but it's not human-readable. The format preda is using is x.y-z-hash; might as well follow convention.

That did it. I was on the latest commit and either pulled before you added the tag or "git pull origin master" doesn't get the tag.[/QUOTE]


Exactly: only a plain "git pull" gets the tag.

preda 2019-04-14 14:30

Warning for ROCm users: refrain from upgrading to recently-released ROCm 2.3, there is a 5% perf degradation. [url]https://github.com/RadeonOpenCompute/ROCm/issues/766[/url]

SELROC 2019-04-14 14:36

[QUOTE=preda;513683]Warning for ROCm users: refrain from upgrading to recently-released ROCm 2.3, there is a 5% perf degradation. [URL]https://github.com/RadeonOpenCompute/ROCm/issues/766[/URL][/QUOTE]


I regret it is too late :-)


How did you notice the regression?

preda 2019-04-14 20:51

[QUOTE=SELROC;513685]I regret it is too late :-)


How did you notice the regression?[/QUOTE]

From the timing information displayed, which jumped up right after upgrade. I downgraded, and the timing recovered.

SELROC 2019-04-15 16:41

[QUOTE=preda;513711]From the timing information displayed, which jumped up right after upgrade. I downgraded, and the timing recovered.[/QUOTE]


Which kernel are you using?


I have downgraded ROCm but the timing is the same; I am using kernel 5.0.0.

preda 2019-04-15 20:53

[QUOTE=SELROC;513773]Which kernel are you using?


I have downgraded ROCm but the timing is the same; I am using kernel 5.0.0.[/QUOTE]

I'm using kernel 5.0.7, but it shouldn't have much impact on performance.

I could downgrade to ROCm 2.2 by using this in /etc/apt/sources.list.d:
deb [arch=amd64] [url]http://repo.radeon.com/rocm/apt/2.2/[/url] xenial main

SELROC 2019-04-16 05:23

[QUOTE=preda;513792]I'm using kernel 5.0.7, but it shouldn't have much impact on performance.

I could downgrade to ROCm 2.2 by using this in /etc/apt/sources.list.d:
deb [arch=amd64] [URL]http://repo.radeon.com/rocm/apt/2.2/[/URL] xenial main[/QUOTE]


Where did you find the 5.0.7?

M344587487 2019-04-16 09:39

[QUOTE=SELROC;513823]Where did you find the 5.0.7?[/QUOTE]
On Ubuntu the easiest way is to use ukuu or install from .deb files here: [URL]https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0.7/[/URL]


Going the .deb route, you need to download all the amd64 .deb files that don't say lowlatency, then do dpkg -i *.deb and restart the PC. You'd need to apply any custom kernel parameters again if you use any. The new kernel should be the default, but assuming you're using grub you can select an older kernel at boot if desired, as they remain installed unless you uninstall them.


edit: I recommend ukuu, but you do need to add a PPA to install it, which can be found with a search.

SELROC 2019-04-16 09:54

[QUOTE=M344587487;513833]On Ubuntu the easiest way is to use ukuu or install from .deb files here: [URL]https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0.7/[/URL]


Going the .deb route, you need to download all the amd64 .deb files that don't say lowlatency, then do dpkg -i *.deb and restart the PC. You'd need to apply any custom kernel parameters again if you use any. The new kernel should be the default, but assuming you're using grub you can select an older kernel at boot if desired, as they remain installed unless you uninstall them.


edit: I recommend ukuu, but you do need to add a PPA to install it, which can be found with a search.[/QUOTE]




That wasn't necessary; I know how to install .deb packages.


Ukuu has become a For-Pay product, and it is a graphical tool, not useful for me.


[url]https://github.com/teejee2008/ukuu[/url]

M344587487 2019-04-16 10:11

[QUOTE=SELROC;513834]That wasn't necessary; I know how to install .deb packages.[/QUOTE]I try to assume nothing, to be as clear as possible. Taken a step further, should I have assumed you knew not to use the lowlatency debs?

[QUOTE=SELROC;513834]Ukuu has become a For-Pay product, and it is a graphical tool, not useful for me.


[URL]https://github.com/teejee2008/ukuu[/URL]
[/QUOTE]There is an element that is a paid product, but the core functionality is free. It has a command-line element; I've never seen the GUI.

SELROC 2019-04-16 10:15

[QUOTE=M344587487;513837]I try to assume nothing, to be as clear as possible. Taken a step further, should I have assumed you knew not to use the lowlatency debs?

There is an element that is a paid product, but the core functionality is free. It has a command-line element; I've never seen the GUI.[/QUOTE]


Nice to know, thanks for the info.

SELROC 2019-04-18 05:40

[QUOTE=SELROC;512961]I must note that the error (zero residue) has appeared for the first time to me with the radeon VII.[/QUOTE]


Another 6 days passed and I only got an EE on the first day (not the same error, as the residue was not 0000...); after that, no error until now.

SELROC 2019-04-18 12:16

[QUOTE=SELROC;514022]Another 6 days passed and I only got an EE on the first day (not the same error, as the residue was not 0000...); after that, no error until now.[/QUOTE]


Well, I meant no error yet. Life is fantastic with a Radeon VII.

xx005fs 2019-04-22 02:47

Weird issue running GPUOWL using Adrenaline 19 drivers
 
I am trying to run my Vega 56, BIOS-flashed to 64, on the most recent driver; with that I purged my old 18.9.3 driver, decided to try Adrenaline 19.4.1, and reinstalled AMD APP SDK 3.0 with it. When I start PRP testing on an exponent, the application would first compile the kernel and load the save file, and then it would start loading my GPU without displaying anything regarding what ms/it value it's currently at, or telling me whether it's passed the GEC for that 10000 iterations. Then when I press ctrl+c to force quit it, it would just load up one of my CPU cores and refuse to quit, while keeping my GPU loaded. When I turn on Task Manager, Task Manager would also freeze when I try to close it. Finally, when I click restart, the system would be stuck at "restarting" and stay there forever until I do a hard reset. What is going on, and will this issue be addressed?

I am running on Windows 10 with the newest update. I am also running gpuowl 6.2.
This issue has occurred before, and I have reinstalled drivers several times for my Vega card, yet it still persists.



Here's the log for that specific session, and this is all it has

[CODE]2019-04-21 18:45:30 gpuowl 6.2-e2ffe65
2019-04-21 18:45:30 RX Vega 56 -user ****** -cpu RX Vega 56 -device 0
2019-04-21 18:45:30 RX Vega 56 88686799 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 16.92 bits/word
2019-04-21 18:45:30 RX Vega 56 using short carry kernels
2019-04-21 18:45:32 RX Vega 56 OpenCL compilation in 1871 ms, with "-DEXP=88686799u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-04-21 18:45:32 RX Vega 56 88686799.owl loaded: k 33405600, block 400, res64 ecb2a7d36cbc599f[/CODE]

preda 2019-04-22 03:34

It looks like a problem below GpuOwl, maybe a driver issue. Did it ever work? with a different driver version?

[QUOTE=xx005fs;514359]I am trying to run my Vega 56, BIOS-flashed to 64, on the most recent driver; with that I purged my old 18.9.3 driver, decided to try Adrenaline 19.4.1, and reinstalled AMD APP SDK 3.0 with it. When I start PRP testing on an exponent, the application would first compile the kernel and load the save file, and then it would start loading my GPU without displaying anything regarding what ms/it value it's currently at, or telling me whether it's passed the GEC for that 10000 iterations. Then when I press ctrl+c to force quit it, it would just load up one of my CPU cores and refuse to quit, while keeping my GPU loaded. When I turn on Task Manager, Task Manager would also freeze when I try to close it. Finally, when I click restart, the system would be stuck at "restarting" and stay there forever until I do a hard reset. What is going on, and will this issue be addressed?

I am running on Windows 10 with the newest update. I am also running gpuowl 6.2.
This issue has occurred before, and I have reinstalled drivers several times for my Vega card, yet it still persists.



Here's the log for that specific session, and this is all it has

[CODE]2019-04-21 18:45:30 gpuowl 6.2-e2ffe65
2019-04-21 18:45:30 RX Vega 56 -user ****** -cpu RX Vega 56 -device 0
2019-04-21 18:45:30 RX Vega 56 88686799 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 16.92 bits/word
2019-04-21 18:45:30 RX Vega 56 using short carry kernels
2019-04-21 18:45:32 RX Vega 56 OpenCL compilation in 1871 ms, with "-DEXP=88686799u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-04-21 18:45:32 RX Vega 56 88686799.owl loaded: k 33405600, block 400, res64 ecb2a7d36cbc599f[/CODE][/QUOTE]

SELROC 2019-04-22 03:52

[QUOTE=xx005fs;514359]I am trying to run my Vega 56, BIOS-flashed to 64, on the most recent driver; with that I purged my old 18.9.3 driver, decided to try Adrenaline 19.4.1, and reinstalled AMD APP SDK 3.0 with it. When I start PRP testing on an exponent, the application would first compile the kernel and load the save file, and then it would start loading my GPU without displaying anything regarding what ms/it value it's currently at, or telling me whether it's passed the GEC for that 10000 iterations. Then when I press ctrl+c to force quit it, it would just load up one of my CPU cores and refuse to quit, while keeping my GPU loaded. When I turn on Task Manager, Task Manager would also freeze when I try to close it. Finally, when I click restart, the system would be stuck at "restarting" and stay there forever until I do a hard reset. What is going on, and will this issue be addressed?

I am running on Windows 10 with the newest update. I am also running gpuowl 6.2.
This issue has occurred before, and I have reinstalled drivers several times for my Vega card, yet it still persists.



Here's the log for that specific session, and this is all it has

[CODE]2019-04-21 18:45:30 gpuowl 6.2-e2ffe65
2019-04-21 18:45:30 RX Vega 56 -user ****** -cpu RX Vega 56 -device 0
2019-04-21 18:45:30 RX Vega 56 88686799 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 16.92 bits/word
2019-04-21 18:45:30 RX Vega 56 using short carry kernels
2019-04-21 18:45:32 RX Vega 56 OpenCL compilation in 1871 ms, with "-DEXP=88686799u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-04-21 18:45:32 RX Vega 56 88686799.owl loaded: k 33405600, block 400, res64 ecb2a7d36cbc599f[/CODE][/QUOTE]

[QUOTE=preda;514362]It looks like a problem below GpuOwl, maybe a driver issue. Did it ever work? with a different driver version?[/QUOTE]




It seems more like the GPU driver enters an error state. I fought with driver errors for a long time with amdgpu-pro, but in my case it was definitely a timeout error. After removing the PCIe risers and connecting the GPU to a PCIe 16x slot, the error was gone.

xx005fs 2019-04-22 19:25

[QUOTE=preda;514362]It looks like a problem below GpuOwl, maybe a driver issue. Did it ever work? with a different driver version?[/QUOTE]
GPUOWL worked with all the drivers before AMD's adrenaline 2019 updates (iirc 18.12.3), and currently when I am running 18.9.3 it works perfectly.



[QUOTE]It seems more like the GPU driver enters an error state. I fought with driver errors for a long time with amdgpu-pro, but in my case it was definitely a timeout error. After removing the PCIe risers and connecting the GPU to a PCIe 16x slot, the error was gone.[/QUOTE]
I have two GPUs in the system on a mainstream platform, so I am probably not going to have another full x16 slot to use. Nor am I using a PCIe riser for this.

SELROC 2019-04-23 06:56

[QUOTE=xx005fs;514410]GPUOWL worked with all the drivers before AMD's adrenaline 2019 updates (iirc 18.12.3), and currently when I am running 18.9.3 it works perfectly.




I have two GPUs in the system on a mainstream platform, so I am probably not going to have another full x16 slot to use. Nor am I using a PCIe riser for this.[/QUOTE]


The PCIe 8x slots may work (I have not tested them), but the GEC performance should be lower.


Sorry, I know nothing about Windows GPU drivers.

kriesel 2019-04-23 13:24

[QUOTE=SELROC;514457]The PCIe 8x slots may work (I have not tested them), but the GEC performance should be lower.

Sorry, I know nothing about Windows GPU drivers.[/QUOTE]
In my Windows experimentation with RX550 GPUs on PCIe slots, directly or via 1x/16x powered extenders, throughput in early gpuowl PRP/GEC versions (V1.9, V3.8) was hardly affected at all by PCIe width. I've settled on being an "only when necessary" adopter of GPU driver updates, after finding individual updates cost 0.5% or 5% of throughput. Occasionally, such as for gpuowl V2.0, the application requires a GPU driver update.

I've seen, though, for both NVIDIA and AMD GPUs, a decline over time in how many GPUs a given HP Z600 workstation chassis will reliably support. A system I ran 4 GPUs in for a while now occasionally hangs on the last RX550 in it, while the RX480 is still solid. I suspect the power supplies age and decline in usable output when running near full capacity 24/7 for months or years. Ventilation is limited, so component temperatures are high. I suggest a digital wattmeter, and ensuring the system runs at some margin below maximum wattage, perhaps 60-75% of max.

SELROC 2019-04-23 13:35

ROCm does not support PCIe risers (powered extenders); it needs something called "PCIe atomics".


I oversize the PSU and mount additional cooling fans.

kriesel 2019-04-23 16:00

[QUOTE=SELROC;514465]ROCm does not support PCIe risers (powered extenders); it needs something called "PCIe atomics".

I oversize the PSU and mount additional cooling fans.[/QUOTE]Right, I remember, and for Linux, ROCm's requirements are a definite consideration. An occasional reminder is probably a good thing. Re the PSU and fans: unfortunately in my HP Z600s, with their oddly shaped PSU and cramped case, upsizing the PSU or adding more fans is not feasible. I would if I could.

M344587487 2019-04-23 16:08

[QUOTE=SELROC;514465]ROCm does not support PCIe risers (powered extenders); it needs something called "PCIe atomics".


I oversize the PSU and mount additional cooling fans.[/QUOTE]
My Radeon VII card seems to work fine with ROCm on a powered riser. rocm-smi says unknown instead of the pcie speed but gpuowl ran happily for half an hour before I dismantled the setup.

SELROC 2019-04-23 16:40

[QUOTE=M344587487;514473]My Radeon VII card seems to work fine with ROCm on a powered riser. rocm-smi says unknown instead of the pcie speed but gpuowl ran happily for half an hour before I dismantled the setup.[/QUOTE]


The Radeon VII consumes 247 watts; putting one on a riser means you need two power connectors on the PSU, and the data transfer rate is lower, so GEC performance is lower.

M344587487 2019-04-23 18:19

Some top-end platinum PSUs have enough connectors for 6 cards with fully populated eight-pins and powered risers; there comes a point where you might as well go bold.



GEC performance when data transfer is limited sounds like an interesting thing to test: how much of an impact does it have? Would reducing the GEC frequency with the -blocks flag to mitigate this be detrimental to error checking in ways other than just taking longer before an error is detected?

SELROC 2019-04-23 18:28

[QUOTE=M344587487;514489]Some top-end platinum PSUs have enough connectors for 6 cards with fully populated eight-pins and powered risers; there comes a point where you might as well go bold.



GEC performance when data transfer is limited sounds like an interesting thing to test: how much of an impact does it have? Would reducing the GEC frequency with the -blocks flag to mitigate this be detrimental to error checking in ways other than just taking longer before an error is detected?[/QUOTE]


I have tested gpuowl with an RX580 on a riser and in a 16x slot; with the old software the difference in GEC timing was about 2 seconds. I have not redone the test with the new software.


That is normal, as the GEC moves data back and forth, so the transfer rate matters.

kriesel 2019-04-23 18:39

[QUOTE=M344587487;514473]My Radeon VII card seems to work fine with ROCm on a powered riser. rocm-smi says unknown instead of the pcie speed but gpuowl ran happily for half an hour before I dismantled the setup.[/QUOTE]
How about a full primality-test duration? In my experience (Windows, RX550, other differences), AMD GPUs and gpuowl can take hours to weeks to show issues.

M344587487 2019-04-23 19:00

Right now I can't install the riser but will test it when possible.

preda 2019-04-24 12:24

[QUOTE=M344587487;514489]
GEC performance as data transfer is limited sounds like an interesting thing to test, how much of an impact does it have? Would reducing the GEC frequency to mitigate this with the -blocks flag be detrimental to error checking in ways other than just taking longer before an error is detected?[/QUOTE]

I'm considering changing the default block size to 1000 (from the current 400), which would mean a check done every 1M iterations. This is because the Radeon VII is so fast and rather reliable that a default of 400 seems unnecessarily low. Of course the user can specify lower values such as 400, 200, or 100 if they suspect something or simply want frequent feedback.

In general GpuOwl does little transfer over PCIe, so putting the card in a less-than-16x slot should have a tiny impact. The check does become a bit slower, but it's tiny anyway.

SELROC 2019-04-24 12:58

[QUOTE=preda;514538]I'm considering changing the default block size to 1000 (from the current 400), which would mean a check done every 1M iterations. This is because the Radeon VII is so fast and rather reliable that a default of 400 seems unnecessarily low. Of course the user can specify lower values such as 400, 200, or 100 if they suspect something or simply want frequent feedback.

In general GpuOwl does little transfer over PCIe, so putting the card in a less-than-16x slot should have a tiny impact. The check does become a bit slower, but it's tiny anyway.[/QUOTE]


In fact I do have *occasional* errors with the Radeon VII; the RX580 never showed an error.
The last error on the Radeon VII was just an EE with a normal residue; it's hard to decipher what is going on, as I can't be watching all the time. But gpuowl recovered happily, so I only lost the last 400K iterations.
Those errors are so occasional that doubling the block size or setting it to 1000 has little impact overall. So it is good if you do it.

SELROC 2019-04-25 14:56

[QUOTE=preda;513683]Warning for ROCm users: refrain from upgrading to recently-released ROCm 2.3, there is a 5% perf degradation. [URL]https://github.com/RadeonOpenCompute/ROCm/issues/766[/URL][/QUOTE]


Good news!


[url]https://github.com/RadeonOpenCompute/ROCm/issues/766#issuecomment-486592049[/url]

kracker 2019-04-25 17:01

Tried to use the makefile for the first time under MSYS2/Windows... getting this:
[code]
i5-4670k@DESKTOP-H3R152O MINGW64 ~/gpuowl-master-t/gpuowl
$ make
echo \"`git describe --long --dirty --always`\" > version.inc
echo Version: `cat version.inc`
Version: "v6.5-24-g984cfc4"
g++ -Wall -O2 -std=c++17 -Wall Pm1Plan.cpp GmpUtil.cpp Worktodo.cpp common.cpp main.cpp Gpu.cpp clwrap.cpp Task.cpp checkpoint.cpp timeutil.cpp Args.cpp state.cpp Signal.cpp FFTConfig.cpp -o gpuowl -lOpenCL -lgmp -lstdc++fs -pthread -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L/c/Windows/System32 -L.
d000046.o:(.idata$5+0x0): multiple definition of `__imp___C_specific_handler'
d000043.o:(.idata$5+0x0): first defined here
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/lib/../lib/crt2.o: In function `pre_c_init':
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:146: undefined reference to `__p__fmode'
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/lib/../lib/crt2.o: In function `__tmainCRTStartup':
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:290: undefined reference to `_set_invalid_parameter_handler'
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:299: undefined reference to `__p__acmdln'
C:\msys64\tmp\ccyfHvwr.o:common.cpp:(.text+0x53c): undefined reference to `__imp___acrt_iob_func'
C:\msys64\tmp\ccrwc8MT.o:Args.cpp:(.text+0x29): undefined reference to `__imp___acrt_iob_func'
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/lib/../lib/libmingw32.a(lib64_libmingw32_a-merr.o): In function `_matherr':
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/merr.c:46: undefined reference to `__acrt_iob_func'
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/lib/../lib/libmingw32.a(lib64_libmingw32_a-pseudo-reloc.o): In function `__report_error':
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/pseudo-reloc.c:149: undefined reference to `__acrt_iob_func'
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/pseudo-reloc.c:150: undefined reference to `__acrt_iob_func'
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/lib/../lib/libmingwex.a(lib64_libmingwex_a-mingw_vfprintf.o): In function `__mingw_vfprintf':
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/stdio/mingw_vfprintf.c:53: undefined reference to `_lock_file'
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/stdio/mingw_vfprintf.c:55: undefined reference to `_unlock_file'
collect2.exe: error: ld returned 1 exit status
make: *** [Makefile:14: gpuowl] Error 1
[/code]

However I can compile it with no problems "manually".

preda 2019-04-25 20:30

[QUOTE=kracker;514682]Tried to use the makefile for the first time under MSYS2/Windows... getting this: [..]
However I can compile it with no problems "manually".[/QUOTE]

What is the difference when you compile manually? Do you add or remove some flags?

kracker 2019-04-25 23:29

[QUOTE=preda;514711]What is the difference when you compile manually? Do you add or remove some flags?[/QUOTE]


This works with no errors/warnings (after generating version.inc)

[code]
cd gpuowl
g++ -Wall -std=c++17 -c Pm1Plan.cpp
g++ -Wall -std=c++17 -c GmpUtil.cpp
g++ -Wall -std=c++17 -c Worktodo.cpp
g++ -Wall -std=c++17 -c common.cpp
g++ -Wall -std=c++17 -c main.cpp
g++ -Wall -std=c++17 -c Gpu.cpp
g++ -Wall -std=c++17 -c clwrap.cpp
g++ -Wall -std=c++17 -c Task.cpp
g++ -Wall -std=c++17 -c checkpoint.cpp
g++ -Wall -std=c++17 -c timeutil.cpp
g++ -Wall -std=c++17 -c Args.cpp
g++ -Wall -std=c++17 -c state.cpp
g++ -Wall -std=c++17 -c Signal.cpp
g++ -Wall -std=c++17 -c FFTConfig.cpp
g++ -o gpuowl.exe -static -std=c++17 Pm1Plan.o GmpUtil.o Worktodo.o common.o main.o Gpu.o clwrap.o Task.o checkpoint.o timeutil.o Args.o state.o Signal.o FFTConfig.o -lgmp -lstdc++fs /c/Windows/System32/OpenCL.dll
[/code]

kriesel 2019-05-02 04:12

[QUOTE=preda;490474]I added an initial CUDA backend to gpuOwl. I expect this to be rough, buggy and not-optimized yet, but it's a start.
...
- cudaOwl has a rich choice of FFT sizes (unlike openOwl). FFT selection is controlled with the "-fft" argument, allowing you to specify hard sizes such as 4096K or 4M, or delta steps from the "default" size for the exponent, such as +1 or -1.

A few nice things:
- it's possible to switch the savefile between CUDA/OpenCL in midflight.
- it's possible to change the FFT size in midflight.

Not so nice:
the performance on GTX 1080 is disappointing. 5.9ms/it at the PRP wavefront, 4480K FFT. (thus I don't think it's such a good idea to do PRP or LL on Nvidia yet. Probably TF is a better fit for the 32bit-oriented hardware).[/QUOTE]
Not sure why 5.9, but "not-optimized yet" probably covers it. CUDALucas 2.06 does LL at 4.37 ms/it on my GTX 1080 at a slightly higher FFT length, but probably didn't reach that performance level all at once (and lacks even the Jacobi check). cudaOwl reaching 74% of that from the start, with GEC, is not a bad effort at all.

[CODE]| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Dec 13 04:22:13 | M82599421 2000000 0x191f30c8ee1b9fe0 | 4608K 0.10938 4.3673 436.73s | 4:01:26:41 2.42% |
[/CODE]

SELROC 2019-05-05 15:48

[QUOTE=SELROC;514542]In fact I do have *occasional* errors with the Radeon VII; the RX580 never showed an error.
The last error on the Radeon VII was just an EE with a normal residue; it's hard to decipher what is going on, as I can't be watching all the time. But gpuowl recovered happily, so I only lost the last 400K iterations.
Those errors are so occasional that doubling the block size or setting it to 1000 has little impact overall. So it is good if you do it.[/QUOTE]




Experimenting with -block sizes for a 332M exponent:


1. the GEC time with block 400 is ~2.11 sec.
2. the GEC time with block 1000 is ~4.25 sec.


The GEC time varies with block size.

preda 2019-05-05 20:58

[QUOTE=SELROC;515840]Experimenting with -block sizes for a 332M exponent:


1. the GEC time with block 400 is ~2.11 sec.
2. the GEC time with block 1000 is ~4.25 sec.


The GEC time varies with block size.[/QUOTE]

Yes, because a check involves doing "block-size" additional iterations. E.g. with block=400, 400 additional iterations are done every 400^2 == 160K iterations, while with block=1000, 1000 additional iterations are done every 1000^2 == 1M iterations.
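To put rough numbers on that, here's an illustrative sketch (not GpuOwl code): a check costs "block" extra iterations once every block^2 iterations, so the relative overhead is 1/block.

```python
# Illustrative sketch of the GEC cost described above (not GpuOwl code):
# a check adds "block" extra iterations once every block^2 iterations,
# so the relative overhead of checking is block / block^2 = 1 / block.
def gec_overhead(block: int) -> float:
    return block / (block * block)

for block in (200, 400, 1000):
    print(f"block={block}: check every {block * block:,} iterations, "
          f"overhead {gec_overhead(block):.3%}")
```

So raising the default from 400 to 1000 makes each individual check take longer, while the total checking overhead actually drops, from 0.25% to 0.1%.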

SELROC 2019-05-06 06:15

[QUOTE=preda;515866]Yes, because a check involves doing "block-size" additional iterations. E.g. with block=400, 400 additional iterations are done every 400^2 == 160K iterations, while with block=1000, 1000 additional iterations are done every 1000^2 == 1M iterations.[/QUOTE]


That is, if I use a block of 2000 the GEC should take ~8-9 seconds. There is a trade-off to apply here.
Do I want long, infrequent checks or short, frequent checks?


When using a block of 1000 and a log of 20000, sometimes gpuowl fails to display the OK... (check ...) output, probably because it falls in between.

preda 2019-05-06 10:18

[QUOTE=SELROC;515914]That is, if I use a block of 2000 the GEC should take ~8-9 seconds. There is a trade-off to apply here.
Do I want long, infrequent checks or short, frequent checks?


When using a block of 1000 and a log of 20000, sometimes gpuowl fails to display the OK... (check ...) output, probably because it falls in between.[/QUOTE]

The check is done at block-size^2 (squared) intervals. So if block=1000, the check is done every 1M iterations. With a log of 20000, you will hit every 1M.

But what happens, for example, with a block of 400 and a log of 100K? The check will be done every 160K, and will be displayed correctly even if it doesn't hit a 'log' multiple of 100K. (At least that's the plan.)
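As a toy illustration of that schedule (hypothetical iteration counts, not GpuOwl output), you can list where the checks and the log lines fall:

```python
# Toy illustration of the schedule described above: with block=400 the
# check fires every 400^2 = 160K iterations, while a log interval of
# 100K fires at different points, so most checks land between log lines.
block, log_every, limit = 400, 100_000, 1_000_000
checks = set(range(block * block, limit + 1, block * block))
logs = set(range(log_every, limit + 1, log_every))
print(sorted(checks))         # 160000, 320000, ..., 960000
print(sorted(checks & logs))  # only 800000 falls exactly on a log point
```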

SELROC 2019-05-06 10:37

[QUOTE=preda;515923]The check is done at block-size^2 (squared) intervals. So if block=1000, the check is done every 1M iterations. With a log of 20000, you will hit every 1M.

But what happens, for example, with a block of 400 and a log of 100K? The check will be done every 160K, and will be displayed correctly even if it doesn't hit a 'log' multiple of 100K. (At least that's the plan.)[/QUOTE]


OK, so if you confirm that the check is displayed, I may have missed it.
Experimenting further ...

SELROC 2019-05-06 13:01

[QUOTE=preda;515923]The check is done at block-size^2 (squared) intervals. So if block=1000, the check is done every 1M iterations. With a log of 20000, you will hit every 1M.

But what happens, for example, with a block of 400 and a log of 100K? The check will be done every 160K, and will be displayed correctly even if it doesn't hit a 'log' multiple of 100K. (At least that's the plan.)[/QUOTE]


PS: The Mersenne number [url]https://www.mersenne.org/report_exponent/?exp_lo=332252533&full=1[/url] is composite.
Computation took ~15 days 10 hours on a Radeon VII.

Prime95 2019-05-06 14:53

[QUOTE=SELROC;515931]PS: The Mersenne number [url]https://www.mersenne.org/report_exponent/?exp_lo=332252533&full=1[/url] is composite.
Computation took ~15 days 10 hours on a Radeon VII.[/QUOTE]

Impressive speed.

IMO, far too little P-1 factoring was done. Were these bounds chosen to be optimal for all-Radeon testing? Might this indicate that prime95 should be used for P-1 prior to GPU PRP testing?

SELROC 2019-05-06 14:58

[QUOTE=Prime95;515935]Impressive speed.

IMO, far too little P-1 factoring was done. Were these bounds chosen to be optimal for all-Radeon testing? Might this indicate that prime95 should be used for P-1 prior to GPU PRP testing?[/QUOTE]




I am currently using ROCm 2.3 which has a performance regression. I bet that with ROCm 2.4 (if they fix the issue) the ETA for 332M will be around 13-14 days.

SELROC 2019-05-06 15:34

[QUOTE=Prime95;515935]Impressive speed.

IMO, far too little P-1 factoring was done. Were these bounds chosen to be optimal for all-Radeon testing? Might this indicate that prime95 should be used for P-1 prior to GPU PRP testing?[/QUOTE]


GpuOwl now supports P-1, so would I be better off doing P-1 before PRP?

R. Gerbicz 2019-05-06 15:51

[QUOTE=SELROC;515914]That is, if I use a block of 2000 the GEC should take ~8-9 seconds. There is a trade-off to apply here.
Do I want long, infrequent checks or short, frequent checks?
[/QUOTE]

Trade-off is a good word, because a larger block size L means more work at rollbacks, since you need to redo up to L^2 iterations.
It isn't that easy to choose the best L for a not-very-faulty card/CPU, but for example,
if you have 0.2 rollbacks per exponent (so basically one per 5 tests) at p ~ 1e8, then your optimal L value is 1000.
Interestingly, the exact formula is:

L = (2*p / #rollback)^(1/3),
where #rollback is the average number of rollbacks per exponent p (so this could even be higher than 1, for a faulty card).

P.S. Don't choose L > sqrt(p), because then you'd never perform a check, though this also depends on your implementation.
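Plugging the worked example into that formula (a sketch for illustration; the function name is mine):

```python
# Optimal GEC block size from the formula above:
# L = (2*p / nrollback)^(1/3), where nrollback is the average number
# of rollbacks per exponent p (it may exceed 1 on a faulty card).
def optimal_block(p: float, nrollback: float) -> float:
    return (2 * p / nrollback) ** (1 / 3)

# The example above: p ~ 1e8 with 0.2 rollbacks per test (one in five)
print(round(optimal_block(1e8, 0.2)))  # -> 1000
```

As expected, a flakier card (more rollbacks) pushes the optimal L down, i.e. toward more frequent checks.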

Prime95 2019-05-06 18:38

[QUOTE=SELROC;515937]At this time GpuOwl supports P-1, so do I better do p-1 before PRP ?[/QUOTE]

Yes, P-1 is highly recommended.

It's a question of "relative" performance. That is, if on your machine GpuOwl is 5x faster than prime95 at PRP but only 2x faster at P-1, then GpuOwl should be doing PRP 100% of the time. Use prime95 on your CPU to do all your P-1 work prior to having the GPU do the PRP.

Your P-1 bounds are "suspicious" in that when prime95 is used to double-check your result years down the road, I think prime95 will want to redo the P-1 to higher bounds.

IIUC, Preda has a somewhat different view on optimal P-1 bounds in that he believes Gerbicz error checking in the initial PRP test means double-checking is not necessary.

For reference, prime95 would use these bounds (assuming 1.8GB of memory).
Saving 1 LL/PRP test: B1=1,115,000 B2=14,495,000
Saving 2 LL/PRP tests: B1=2,395,000 B2=35,326,000

SELROC 2019-05-06 19:05

[QUOTE=Prime95;515949]Yes, P-1 is highly recommended.

It's a question of "relative" performance. That is, if on your machine GpuOwl is 5x faster than prime95 at PRP but only 2x faster at P-1, then GpuOwl should be doing PRP 100% of the time. Use prime95 on your CPU to do all your P-1 work prior to having the GPU do the PRP.

Your P-1 bounds are "suspicious" in that when prime95 is used to double-check your result years down the road, I think prime95 will want to redo the P-1 to higher bounds.

IIUC, Preda has a somewhat different view on optimal P-1 bounds in that he believes Gerbicz error checking in the initial PRP test means double-checking is not necessary.

For reference, prime95 would use these bounds (assuming 1.8GB of memory).
Saving 1 LL/PRP test: B1=1,115,000 B2=14,495,000
Saving 2 LL/PRP tests: B1=2,395,000 B2=35,326,000[/QUOTE]


The ETA for a 332M exponent is 2 months on a Radeon RX580 and 15 days on a Radeon VII.


I am using primenet.py to get assignments and return results; gpuowl runs in parallel, and the worktodo.txt file is checked every 2 hours. The assignment type is 153.



To do P-1, do I have to get a different assignment type?

Prime95 2019-05-06 19:13

[QUOTE=SELROC;515950]
To do P-1, do I have to get a different assignment type?[/QUOTE]

No, but prime95's worktodo.txt would be:

Pfactor=exponent,how_far_factored

I would not change your workflow until preda weighs in.

kriesel 2019-05-06 19:14

[QUOTE=SELROC;515950]
To do P-1, do I have to get a different assignment type?[/QUOTE]
It works on primenet to perform P-1 without an assignment while you hold a primality-test assignment. Manually submitting such a P-1 result will generate a message that there was an assignment-type mismatch for the exponent, but the result is accepted, computing credit is given, and the existing primality-test assignment for the same exponent is unaffected.

Similarly, if assigned P-1, some additional manual TF for the same exponent can be performed and reported before the assigned P-1 is. I've had success doing so and altering the bits-factored-to field of the P-1 worktodo entry to match, which CUDAPm1 uses to determine optimal bounds.

It's bad form to do a different primality test type than assigned.

SELROC 2019-05-06 19:20

[QUOTE=kriesel;515952]It works on primenet to perform P-1 without an assignment while you hold a primality-test assignment. Manually submitting such a P-1 result will generate a message that there was an assignment-type mismatch for the exponent, but the result is accepted, computing credit is given, and the existing primality-test assignment for the same exponent is unaffected.

Similarly, if assigned P-1, some additional manual TF for the same exponent can be performed and reported before the assigned P-1 is. I've had success doing so and altering the bits-factored-to field of the P-1 worktodo entry to match, which CUDAPm1 uses to determine optimal bounds.

It's bad form to do a different primality test type than assigned.[/QUOTE]


Thanks.
However, may I ask (@Preda) why P-1 on GpuOwl involves the CPU (which becomes hot) at some point near the end?

kriesel 2019-05-06 19:42

[QUOTE=SELROC;515953]Thanks.
However, may I ask (@Preda) why P-1 on GpuOwl involves the CPU (which becomes hot) at some point near the end?[/QUOTE]
If I recall correctly, neither CUDAPm1 nor gpuowl does the required GCD on the GPU; both instead use the CPU for that portion, via GMP.

That Preda is so efficient, he answered your question months before you asked.
[URL]https://www.mersenneforum.org/showpost.php?p=506749&postcount=946[/URL]
[QUOTE]It needs GMP (for the GCD done on the CPU, as was before with PRP-1)[/QUOTE]

preda 2019-05-06 21:33

[QUOTE=Prime95;515949]Yes, P-1 is highly recommended.

It's a question of "relative" performance. That is, if on your machine GpuOwl is 5x faster than prime95 at PRP but only 2x faster at P-1, then GpuOwl should be doing PRP 100% of the time. Use prime95 on your CPU to do all your P-1 work prior to having the the GPU do the PRP.

Your P-1 bounds are "suspicious" in that when prime95 is used to double-check your result years down the road, I think prime95 will want to redo the P-1 to higher bounds.

IIUC, Preda has a somewhat different view on optimal P-1 bounds in that he believes Gerbicz error checking in the initial PRP test means double-checking is not necessary.

For reference, prime95 would use these bounds (assuming 1.8GB of memory).
Saving 1 LL/PRP test: B1=1,115,000 B2=14,495,000
Saving 2 LL/PRP tests: B1=2,395,000 B2=35,326,000[/QUOTE]

GpuOwl can do P-1 now, and it's rather efficient at it -- especially stage2 on high-RAM GPUs such as R7 with 16GB. Two GCD per test are offloaded to the CPU (one after each stage).

OTOH GpuOwl doesn't do "preliminary P-1" ahead of a PRP test on its own initiative. One reason is that the already-done P-1 information is not included in the assignment line for a PRP test (unlike the already-done TF information, which is included), so GpuOwl cannot know whether P-1 is needed ahead of the PRP.

The solution, for a 100M exponent, is for the user to manually do "preliminary P-1" if desired, before the PRP: either by getting a manual P-1 assignment for the exponent, or by creating the worktodo line without an AID (without an assignment from the server) and still submitting the result at the end.

Prime95 2019-05-06 22:16

[QUOTE=preda;515970]GpuOwl can do P-1 now, and it's rather efficient at it -- especially stage2 on high-RAM GPUs .[/QUOTE]

How were the 100M P-1 bounds calculated? They were vastly smaller than prime95's optimal P-1 bounds.

Prime95 2019-05-06 22:20

[QUOTE=preda;515970]
OTOH GpuOwl doesn't do "preliminary P-1" ahead of a PRP test on its own initiative. One reason is that the already-done P-1 information is not included in the assignment line for a PRP test (unlike the already-done TF information, which is included), so GpuOwl cannot know whether P-1 is needed ahead of the PRP.[/QUOTE]

A PRP worktodo line looks like this:
PRP=k,b,n,c[,how_far_factored,tests_saved[,base,residue_type]][,known_factors]

If tests_saved is zero then P-1 has been done. If one, then this is a PRP double-check with no P-1 done yet. If two then this is a first-time PRP with no P-1 done yet.

I have not validated that the server is sending this data properly to prime95 (or formatting them properly in the manual reservations web page).
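A hedged sketch of reading that field (it assumes a bare line without the AID prefix that server-issued lines carry, and the helper name is made up):

```python
# Sketch of parsing the worktodo format quoted above:
#   PRP=k,b,n,c[,how_far_factored,tests_saved[,base,residue_type]][,known_factors]
# Assumes no AID prefix; pminus1_needed is a made-up helper name.
def pminus1_needed(line: str) -> bool:
    fields = line.split("=", 1)[1].split(",")
    # tests_saved is the sixth field when present; 0 means P-1 was done
    return len(fields) >= 6 and int(fields[5]) > 0

print(pminus1_needed("PRP=1,2,85128553,-1,76,0"))  # False: P-1 already done
print(pminus1_needed("PRP=1,2,85128553,-1,76,2"))  # True: first-time PRP, no P-1
```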

kriesel 2019-05-06 23:08

[QUOTE=Prime95;515974]A PRP worktodo line looks like this:
PRP=k,b,n,c[,how_far_factored,tests_saved[,base,residue_type]][,known_factors]

If tests_saved is zero then P-1 has been done. If one, then this is a PRP double-check with no P-1 done yet. If two then this is a first-time PRP with no P-1 done yet.

I have not validated that the server is sending this data properly to prime95 (or formatting them properly in the manual reservations web page).[/QUOTE]
prime95 via primenet: PRP=(aid redacted),1,2,79335979,-1,75,0,3,1
fresh from manual assign now: PRP=(aid redacted),1,2,85128553,-1,76,0

SELROC 2019-05-07 03:57

[QUOTE=kriesel;515952]It works on primenet to perform P-1 without an assignment while you hold a primality-test assignment. Manually submitting such a P-1 result will generate a message that there was an assignment-type mismatch for the exponent, but the result is accepted, computing credit is given, and the existing primality-test assignment for the same exponent is unaffected.

Similarly, if assigned P-1, some additional manual TF for the same exponent can be performed and reported before the assigned P-1 is. I've had success doing so and altering the bits-factored-to field of the P-1 worktodo entry to match, which CUDAPm1 uses to determine optimal bounds.

It's bad form to do a different primality test type than assigned.[/QUOTE]


This all requires manual intervention, which I am moving away from. I have other projects to follow, and I need GIMPS to work on its own.


It would be easier if the server could assign PRP with P-1; even better if GpuOwl could do P-1 before PRP.

preda 2019-05-07 08:30

[QUOTE=Prime95;515973]How were the 100M P-1 bounds calculated? They were vastly smaller than prime95's optimal P-1 bounds.[/QUOTE]

Which ones? (for which exponent, user)

GpuOwl always takes P-1 bounds from the user. They can be configured per-exponent or globally (per run). The user must configure (specify) at least B1. By default B2 == 30*B1, but this is also configurable, either as a ratio relative to B1 or as an absolute B2.

For exponents around 95M (the P-1 wavefront), I would consider normal bounds (assuming no previous P-1 done) anything with B1 between 500K and 2M, and rB2 between 10 and 25.

For exponents at 100M-digits, which might have lower TF, I would probably use B1 around 1M-3M and rB2 20 to 30.

Right now I'm not doing P-1 because the exponents at 85M where I PRP are fully P-1'ed. If I were doing 100M-digit exponents, I would probably allocate up to about 4% of the test time (about half a day on an R7?) for additional P-1 and TF before starting the PRP.
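The bound selection described above can be sketched like this (the function name is made up and this is illustrative only, not GpuOwl's actual code):

```python
# Sketch of the B1/B2 selection preda describes: the user must supply B1;
# B2 defaults to 30*B1 unless overridden by a ratio rB2 or an absolute B2.
# (pm1_bounds is a made-up name, not a GpuOwl function.)
def pm1_bounds(b1, rb2=None, b2=None):
    if b2 is None:
        b2 = b1 * (rb2 if rb2 is not None else 30)
    return b1, int(b2)

print(pm1_bounds(1_000_000))               # (1000000, 30000000) -- default rB2=30
print(pm1_bounds(1_000_000, rb2=20))       # (1000000, 20000000)
print(pm1_bounds(500_000, b2=14_495_000))  # (500000, 14495000)
```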

preda 2019-05-07 10:18

[QUOTE=Prime95;515974]A PRP worktodo line looks like this:
PRP=k,b,n,c[,how_far_factored,tests_saved[,base,residue_type]][,known_factors]

If tests_saved is zero then P-1 has been done. If one, then this is a PRP double-check with no P-1 done yet. If two then this is a first-time PRP with no P-1 done yet.

I have not validated that the server is sending this data properly to prime95 (or formatting them properly in the manual reservations web page).[/QUOTE]

Thank you, so the boolean information on whether any P-1 has been done is there in the assignment in the form of "tests_saved".

Note though that in this particular case (
[url]https://www.mersenne.org/report_exponent/?exp_lo=332252533&full=1[/url] ), P-1 has been done before the PRP although to insufficient bounds (not by GpuOwl).

I can change GpuOwl to auto-trigger P-1 for a PRP without any P-1; but I still can't handle the case of P-1 done with insufficient bounds.

SELROC 2019-05-07 10:57

[QUOTE=preda;516001]Thank you, so the boolean information on whether any P-1 has been done is there in the assignment in the form of "tests_saved".

Note though that in this particular case (
[URL]https://www.mersenne.org/report_exponent/?exp_lo=332252533&full=1[/URL] ), P-1 has been done before the PRP although to insufficient bounds (not by GpuOwl).

I can change GpuOwl to auto-trigger P-1 for a PRP without any P-1; but I still can't handle the case of P-1 done with insufficient bounds.[/QUOTE]


It seems the P-1 was done by MadPoo, so maybe ask him how it was done.

kriesel 2019-05-07 11:56

[QUOTE=preda;516001]Thank you, so the boolean information on whether any P-1 has been done is there in the assignment in the form of "tests_saved".

Note though that in this particular case (
[URL]https://www.mersenne.org/report_exponent/?exp_lo=332252533&full=1[/URL] ), P-1 has been done before the PRP although to insufficient bounds (not by GpuOwl).

I can change GpuOwl to auto-trigger P-1 for a PRP without any P-1; but I still can't handle the case of P-1 done with insufficient bounds.[/QUOTE]CUDAPm1 and prime95 include code for determining bounds to maximize the probable computing time savings. The source is out there for you to repurpose in gpuowl. I think primenet will do the right thing and indicate P-1 needed if the bounds of what was done previously are inadequate.

SELROC 2019-05-09 10:59

[QUOTE=SELROC;515936]I am currently using ROCm 2.3 which has a performance regression. I bet that with ROCm 2.4 (if they fix the issue) the ETA for 332M will be around 13-14 days.[/QUOTE]


Disappointingly, ROCm 2.4 has an even worse speed regression. I have scolded them on GitHub.


[url]https://github.com/RadeonOpenCompute/ROCm/issues/766#issuecomment-490484839[/url]

M344587487 2019-05-09 13:56

[QUOTE=SELROC;516214]Disappointingly, ROCm 2.4 has an even worse speed regression. I have scolded them on GitHub.


[URL]https://github.com/RadeonOpenCompute/ROCm/issues/766#issuecomment-490484839[/URL][/QUOTE]
Were they a child or a pet, I'm sure scolding would go a long way. In my experience everyone else tends to tell you to sling your hook. Now go to your room and think about what you've done ;)

SELROC 2019-05-09 14:11

[QUOTE=M344587487;516232]Were they a child or pet I'm sure scolding would go a long way. In my experience everyone else tends to tell you to sling your hook. Now go to your room and think about what you've done ;)[/QUOTE]


I hope that with this they get a sense of our dissatisfaction :-)

kriesel 2019-05-12 17:55

A little more effort toward NVIDIA support?
 
Happened to check the gpuowl github repository today, and saw this as the latest commit, relating to an attempt for the NVIDIA RTX2070.

Does it work on NVIDIA OpenCL yet? [URL]https://github.com/preda/gpuowl/commit/c48d46fdbcba6c490c439aa9b07eb4c40bcacae0[/URL]

kriesel 2019-05-12 18:27

[QUOTE=SELROC;516005]It seems the P-1 was done by MadPoo, so maybe ask him how it was done.[/QUOTE]MadPoo did a large batch of P-1 factoring attempts with very small P-1 bounds, for quick ~0.8% chances of finding factors (on 100M-digit exponents with primality tests completed or assigned and no prior P-1 factoring). See
[url]https://www.mersenneforum.org/showpost.php?p=513719&postcount=3[/url]

[url]https://www.mersenneforum.org/showpost.php?p=513803&postcount=8[/url]

chengsun 2019-05-12 21:31

[QUOTE=kriesel;516553]Happened to check the gpuowl github repository today, and saw this as the latest commit, relating to an attempt for the NVIDIA RTX2070.

Does it work on NVIDIA OpenCl yet? [URL]https://github.com/preda/gpuowl/commit/c48d46fdbcba6c490c439aa9b07eb4c40bcacae0[/URL][/QUOTE]


It now works on my Nvidia RTX 2070, as of that commit.


EDIT: I should clarify that I have only tested PRP, not P-1.




(Let me know if people would be interested in benchmarks.)

kriesel 2019-05-13 00:04

[QUOTE=chengsun;516563]It now works on my Nvidia RTX 2070, as of that commit.

EDIT: should clarify that I have only tested PRP, not P-1.

(Let me know if people would be interested in benchmarks.)[/QUOTE]Always! I'm sure iteration times for specific exponents will be of interest to RTX20xx owners, or those considering buying one, for comparison to CUDALucas on the same model. And congratulations on getting it to run.

kriesel 2019-05-13 01:28

[QUOTE=preda;516001]Thank you, so the boolean information on whether any P-1 has been done is there in the assignment in the form of "tests_saved".
[/QUOTE]Not boolean; integer. Tests-saved is 0, 1, or 2 as issued by PrimeNet: 2 means a found P-1 factor would save both the first primality test and the double check, so larger bounds are justified to increase the odds of finding a factor; 1 means a factor would save only the double check, so smaller bounds are justified; 0 means don't bother with any P-1, it's already been done (adequately, I think). CUDAPm1 (and I think prime95/mprime) can be pushed to use even higher bounds by specifying values from 3 to 9.
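The tests-saved values described above can be sketched as a small lookup. This is a hypothetical helper for illustration only, not code from any GIMPS client:

```python
# Hypothetical helper (not from any GIMPS client): interpreting the
# tests_saved field of a PrimeNet assignment, per the description above.
def p1_strategy(tests_saved: int) -> str:
    """Map tests_saved (0, 1, or 2 as issued by PrimeNet) to a P-1 strategy."""
    if tests_saved < 0:
        raise ValueError("tests_saved must be a non-negative integer")
    if tests_saved == 0:
        return "skip P-1; it has already been done adequately"
    if tests_saved == 1:
        return "modest bounds; a factor saves only the double check"
    # 2 (CUDAPm1 and prime95/mprime accept 3..9 to push bounds higher still)
    return "larger bounds; a factor saves the first test and the double check"
```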

xx005fs 2019-05-13 05:26

Windows Binary
 
Any prebuilt Windows binary for the newest commit? I am very keen to try out PRP on NVIDIA hardware.

preda 2019-05-13 07:52

[QUOTE=kriesel;516569]Not boolean; integer. Tests-saved is 0, 1, or 2 as issued by PrimeNet: 2 means a found P-1 factor would save both the first primality test and the double check, so larger bounds are justified to increase the odds of finding a factor; 1 means a factor would save only the double check, so smaller bounds are justified; 0 means don't bother with any P-1, it's already been done (adequately, I think). CUDAPm1 (and I think prime95/mprime) can be pushed to use even higher bounds by specifying values from 3 to 9.[/QUOTE]

So for a first-time PRP with no previous P-1, tests-saved is 2, OK.

I would like to know what happens if the same first-time PRP had P-1 done with B1=B2=100'000. Is tests-saved 2 or 0 in this case?

i.e. is an "insufficient bounds" P-1 treated the same as "no P-1" when handing out the PRP assignment?

SELROC 2019-05-13 07:59

[QUOTE=preda;516575]So for a first-time PRP with no previous P-1, tests-saved is 2, OK.

I would like to know what happens if the same first-time PRP had P-1 done with B1=B2=100'000. Is tests-saved 2 or 0 in this case?

i.e. is an "insufficient bounds" P-1 treated the same as "no P-1" when handing out the PRP assignment?[/QUOTE]


Hi Mihai,
I am doing TF up to 77 bits. TF from 76 to 77 bits takes 3 hours 40 minutes on the RX580.
How does that performance compare to P-1?

preda 2019-05-13 11:22

[QUOTE=SELROC;516577]Hi Mihai,
I am doing TF up to 77 bits. TF from 76 to 77 bits takes 3 hours 40 minutes on the RX580.
How does that performance compare to P-1?[/QUOTE]

What exponent size?

P-1 duration, and the probability of finding a factor, depend on the bounds. The time is often split roughly half-and-half between the first and second stages (B1, B2). The first stage is mostly linear in B1. Thus, running with B1=100'000 and B2=2M would take about one-tenth the time of B1=1M and B2=20M.

To get an idea of the duration of the first stage, take the duration of N iterations of PRP, where
N = B1 * 1.44 * 1.2 (approx.),
and double that to include the second stage.

One more thing: the second stage goes a bit faster when there's plenty of memory. I personally prefer 16GB for the second stage on GPU; if the GPU has only 8GB, I would probably use a lower B2/B1 ratio, e.g. 10.
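The rule of thumb above can be sketched in a few lines. This is a rough estimate only; `per_iter_s` is an assumed measured PRP iteration time, and the 1.44 * 1.2 factor and the "double for stage 2" heuristic come straight from the post:

```python
# Rough P-1 duration estimate from the rule of thumb above:
# stage-1 squarings ~= B1 * 1.44 * 1.2, and doubling that covers stage 2.
def p1_stage1_iters(b1: int) -> int:
    """Approximate number of stage-1 squarings for a given B1."""
    return round(b1 * 1.44 * 1.2)

def p1_total_hours(b1: int, per_iter_s: float) -> float:
    """Estimated total P-1 time in hours, doubling stage 1 to cover stage 2."""
    return 2 * p1_stage1_iters(b1) * per_iter_s / 3600

# Example: B1=1M at 2 ms/iteration -> about 1.73M stage-1 squarings,
# roughly 1.9 hours for both stages.
```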

SELROC 2019-05-13 11:40

[QUOTE=preda;516586]What exponent size?

P-1 duration, and the probability of finding a factor, depend on the bounds. The time is often split roughly half-and-half between the first and second stages (B1, B2). The first stage is mostly linear in B1. Thus, running with B1=100'000 and B2=2M would take about one-tenth the time of B1=1M and B2=20M.

To get an idea of the duration of the first stage, take the duration of N iterations of PRP, where
N = B1 * 1.44 * 1.2 (approx.),
and double that to include the second stage.

One more thing: the second stage goes a bit faster when there's plenty of memory. I personally prefer 16GB for the second stage on GPU; if the GPU has only 8GB, I would probably use a lower B2/B1 ratio, e.g. 10.[/QUOTE]


The current TF exponents are between 170M and 200M.

preda 2019-05-13 12:21

[QUOTE=SELROC;516587]The current TF exponents are between 170M and 200M.[/QUOTE]

[QUOTE]
I am doing TF up to 77 bits. TF from 76 to 77 bits takes 3 hours 40 minutes on the RX580.
How does that performance compare to P-1?
[/QUOTE]

So, the question might be: what is better to do for those exponents now, TF or P-1? My intuition says P-1, but somebody should probably work out the numbers. Maybe the order of (TF, P-1) doesn't make a huge difference for a few more bits of TF.

For P-1, the second question is what bounds to use. I don't know that either; the P-1 calculator can be used as a starting point.

I think you do get more credit for TF than P-1 per unit of time, though.

Prime95 2019-05-13 13:42

[QUOTE=preda;516575]
I would like to know what happens if the same first-time PRP had P-1 done with B1=B2=100'000. Is tests-saved 2 or 0 in this case?

i.e. is an "insufficient bounds" P-1 treated the same as "no P-1" when handing out the PRP assignment?[/QUOTE]

Yes, insufficient is treated as "no P-1". I may need to tweak the server's rules for what constitutes sufficient.

kriesel 2019-05-13 15:57

[QUOTE=preda;516591]So, the question might be: what is better to do for those exponents now, TF or P-1? My intuition says P-1, but somebody should probably work out the numbers. Maybe the order of (TF, P-1) doesn't make a huge difference for a few more bits of TF.

For P-1, the second question is what bounds to use. I don't know that either; the P-1 calculator can be used as a starting point.

I think you do get more credit for TF than P-1 per unit of time, though.[/QUOTE]It's best to do some of both, and using mersenne.ca's representation of gpu72 bounds for each is not bad.
In mfaktc and mfakto, and probably elsewhere, TF effort is proportional to 2^bit-level and inversely proportional to the exponent, plus slight, mostly offsetting effects: longer code sequences in the kernels at higher bit levels, but fewer candidate primes per linear interval at higher magnitudes. Details at [URL="https://www.mersenneforum.org/showpost.php?p=508523&postcount=6"]https://www.mersenneforum.org/showpost.php?p=508523&postcount=6[/URL]; relative runtime scaling data at [URL="https://www.mersenneforum.org/showpost.php?p=488519&postcount=2"]https://www.mersenneforum.org/showpost.php?p=488519&postcount=2[/URL], which shows that the variation of (time/bit-level * exponent / 2[SUP]bitlevel[/SUP]) is only about 5% over 7 bits of TF for a given exponent, or under 12% over 7 bits and a 4:1 range of exponents.
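The scaling described above can be sketched as follows. This is an illustrative approximation that ignores the small, mostly offsetting kernel-length and prime-density effects:

```python
# Relative TF effort per the scaling above: proportional to 2^bit_level
# and inversely proportional to the exponent. Ignores the small, mostly
# offsetting kernel-length and prime-density effects.
def tf_relative_effort(bit_level: int, exponent: int) -> float:
    """Unitless relative cost of taking `exponent` from bit_level-1 to bit_level."""
    return 2.0 ** bit_level / exponent

# Each additional bit level doubles the work for a given exponent,
# e.g. 76->77 bits on a 180M exponent costs twice as much as 75->76.
doubling = tf_relative_effort(77, 180_000_000) / tf_relative_effort(76, 180_000_000)
```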

On GPUs, one gets far more computing credit per day doing TF than anything else (P-1, LL, PRP), owing to the SP/DP or integer/DP-float performance ratios.
The credit ratio is in most cases ~8:1 TF/other on AMD, 11-16 on NVIDIA, and up to 40 on RTX20xx, while on CPUs I've seen it range from 0.7 to 1.3 or so. Not sure about the Radeon VII. One can generally get a rough sense of the ratio from Heinrich's TF and LL benchmark data for the same GPU if both are listed. Unfortunately they don't seem consistent with timings I've seen posted for the Radeon VII. Owners of newer cards, please submit benchmarks. [URL]https://www.mersenne.ca/cudalucas.php[/URL] [URL]https://www.mersenne.ca/mfaktc.php[/URL]

kriesel 2019-05-13 16:40

[QUOTE=SELROC;516587]The current TF exponents are between 170M and 200M.[/QUOTE]By using [url]https://www.mersenne.org/manual[/url][B]_gpu[/B]_assignment/

instead of [url]https://www.mersenne.org/manual_assignment/[/url]
we can get TF assignments down to about 93M in preparation for first-time tests, and help keep ahead of the P-1 and primality test wavefronts.

SELROC 2019-05-13 18:16

[QUOTE=kriesel;516623]By using [URL]https://www.mersenne.org/manual[/URL][B]_gpu[/B]_assignment/

instead of [URL]https://www.mersenne.org/manual_assignment/[/URL]
we can get TF assignments down to about 93M in preparation for first-time tests, and help keep ahead of the P-1 and primality test wavefronts.[/QUOTE]


I don't know how to instruct mfloop.py to request such exponents.


All times are UTC. The time now is 07:02.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.