[QUOTE=preda;490821]Do you have a computer with both AMD and Nvidia GPUs?
That would require an executable that is linked with both OpenCL and CUDA. That can be done, I just wasn't sure if there's a need for such a build.[/QUOTE]
Such systems are likely to be rare. The drivers don't get along well. There are videos online about the strange sequence of driver installation and card removal and replacement needed to get both to operate; results vary. [URL]https://www.reddit.com/r/nvidia/comments/62wr2a/nvidia_and_amd_gpus_in_the_same_system/[/URL]

In a mixed system, why wouldn't one run an OpenCL app (or several) for the AMD cards and a separate CUDA app (or several) for the NVIDIA cards? We already run one instance (or more) per GPU on all-AMD or all-NVIDIA systems. If contemplating a future program that manages multiple tasks across multiple GPUs, keeping track of the different GPU models and makes, and their mappings to OS-specific or driver-specific device numbering, could get complicated.
[QUOTE=preda;490821]Do you have a computer with both AMD and Nvidia GPUs?
That would require an executable that is linked with both OpenCL and CUDA. That can be done, I just wasn't sure if there's a need for such a build.[/QUOTE]
My system has only AMD GPUs. There are mixed AMD/NVIDIA systems, but they are unstable and are generally used only for mining.

Why not two executables, openowl/cudaowl? In that case you could choose, per device, which one to run.
[QUOTE=SELROC;490832]
Why not two executables, openowl/cudaowl? In that case you could choose, per device, which one to run.[/QUOTE]
Yes, that's my thinking too. The drawback is that cudaowl would only see the CUDA devices, which would affect the device list shown by -h; similarly, openowl would see only the OpenCL devices. Not a big issue, though.
[QUOTE=preda;490836]Yes, that's my thinking too. The drawback is that cudaowl would only see the CUDA devices, and this would affect the list in -h, and similarly for openowl seeing only OpenCL devices. Not a big issue though.[/QUOTE]
It sounds quite reasonable for each executable to see only the devices it can run on.
I ran Nvidia GPUs with AMD onboard graphics, but the AMD part was only [STRIKE]drawing[/STRIKE] generating the display. No compute.
[QUOTE=preda;490739]Yes, I'm aware of that, it's because of the recent backend split.[/QUOTE]
Today I am testing the gpuOwl OpenCL variant, version 3.1-mod. From the first tests it looks slightly faster ...
[QUOTE=SELROC;491273]Today I am testing the gpuOwl OpenCL variant, version 3.1-mod.
From the first tests it looks slightly faster ...[/QUOTE]
I have attempted to select the FFT size with the -fft argument. I tried 5000K and 5M, but it keeps using 8192K.
[QUOTE=SELROC;491331]I have attempted to select the FFT size with the -fft argument. I tried 5000K and 5M, but it keeps using 8192K.[/QUOTE]
In OpenCL there is currently limited support for selecting the FFT size: only 4M, 8M and 16M are implemented. The -fft command-line switch was ignored before, but as of a recent commit it can be used to select between these three sizes. I'm now working on adding new sizes.
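As an aside on the sizes mentioned in this exchange: 8192K and 8M denote the same FFT length, which is why requesting 8192K lands on the 8M transform. A minimal sketch (hypothetical helper names, not gpuOwl's actual parser) of how a -fft value like 5000K or 4M could be parsed and checked against the implemented sizes:

```python
# Hypothetical sketch of -fft argument handling; gpuOwl's real parser may differ.

IMPLEMENTED_FFT_SIZES = {4 * 1024 * 1024, 8 * 1024 * 1024, 16 * 1024 * 1024}  # 4M, 8M, 16M

def parse_fft(spec: str) -> int:
    """Turn a size string like '5000K', '4M' or '8192K' into a word count."""
    spec = spec.strip().upper()
    multipliers = {"K": 1024, "M": 1024 * 1024}
    if spec and spec[-1] in multipliers:
        return int(spec[:-1]) * multipliers[spec[-1]]
    return int(spec)

def select_fft(spec: str) -> int:
    """Return the requested size if it is one of the implemented transforms."""
    size = parse_fft(spec)
    if size not in IMPLEMENTED_FFT_SIZES:
        raise ValueError(f"FFT size {spec} not implemented; choose 4M, 8M or 16M")
    return size

# 8192K == 8M, so this request resolves to an implemented size:
print(parse_fft("8192K") == 8 * 1024 * 1024)
```

Under this reading, 5000K (5,120,000 words) and 5M are simply not in the implemented set, so the program falls back to a size that is.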
[QUOTE=preda;491333]In OpenCL right now there's limited support for selecting FFT size. Only 4M, 8M and 16M are implemented. The -fft command line switch was ignored before, but now (recent commit) can be used to select between these 3 sizes.
I'm working now towards adding new sizes.[/QUOTE]
Ok, so with version 3.2 I tried to specify an FFT of 4M and of 5M. It worked at the beginning, then it showed an error "EE loaded ...". I guess that I cannot reduce the FFT size when starting from a checkpoint (?)
[QUOTE=SELROC;491348]Ok, so I tried with version 3.2 to specify an FFT of 4M and 5M, which worked at the beginning, then it showed an error "EE loaded ...".
I guess that I cannot reduce the fft size when starting from a checkpoint (?)[/QUOTE]
You can reload with a different FFT size, but if you reduce it too much for the exponent, the computation no longer works. Look at the "bits/word" value: if it is higher than 19, the FFT is too small. 5M is not implemented yet. 4M is good up to about 77M - 78M exponents. You may try -fft 16M, which should work, although slowly.
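The bits/word figure above is just the exponent divided by the FFT length in words; a quick check of the limits quoted in this post (19 bits/word, 4M good up to roughly 78M):

```python
# bits/word = exponent / FFT length in words; above ~19 the FFT is too small.

def bits_per_word(exponent: int, fft_words: int) -> float:
    return exponent / fft_words

FFT_4M = 4 * 1024 * 1024    # 4,194,304 words
FFT_16M = 16 * 1024 * 1024  # 16,777,216 words

print(round(bits_per_word(77_000_000, FFT_4M), 2))   # ~18.36: fine for 4M
print(round(bits_per_word(85_000_000, FFT_4M), 2))   # ~20.27: over the limit
print(round(bits_per_word(85_000_000, FFT_16M), 2))  # ~5.07: works, but wasteful
```

The hard ceiling for 4M at 19 bits/word would be 19 × 4,194,304 ≈ 79.7M; quoting "about 77M - 78M" leaves some safety margin below that.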
[QUOTE=preda;491398]You can reload with different FFT size; but if you reduce it too much for the exponent the computation does not work anymore. Look for the value "bits/word", if that's higher than 19 then the FFT is too small.
5M is not implemented yet. 4M is good up to about 77M - 78M exponents. You may try an -fft 16M, and that should work although slowly.[/QUOTE]
I am computing 85M-86M, so 4M would fail (?), while 16M would slow things down. Looking forward to 5M (before automatic work fetch).