For what it's worth, [i]preda[/i] is doing interesting things with gpuowl, including some magical combination of PRP+P-1, which appears nearly ready for production.
|
[QUOTE=James Heinrich;498664]For what it's worth, [i]preda[/i] is doing interesting things with gpuowl, including some magical combination of PRP+P-1, which appears nearly ready for production.[/QUOTE]
The only problem is that such magics don't happen on CUDA :smile: |
[QUOTE=James Heinrich;498664]For what it's worth, [I]preda[/I] is doing interesting things with gpuowl, including some magical combination of PRP+P-1, which appears nearly ready for production.[/QUOTE]Yes, that work is very interesting, and is described in his recent posts in [URL]https://www.mersenneforum.org/showthread.php?t=22204&page=70[/URL]. Unfortunately Preda has abandoned efforts toward CUDA or OpenCL on NVIDIA and sold his NVIDIA test GPU.
There appears to be no appreciable Mersenne-searching software development activity for NVIDIA, for either PRP or P-1, with either CUDA or OpenCL. To my knowledge there's no usable PRP code for NVIDIA available at all. |
[QUOTE=aaronhaviland;498613]
The code I have on github should be an improvement based on the code from Sourceforge, but it's been so long, I don't remember what I actually improved. [/QUOTE]
I found reading the commit notes in your fork interesting. The "fencepost error" may account for some of the anomalies I've seen in the Sourceforge-version-derived Windows executables. Please review those notes and carry them forward!

I have a collection of reference material, specific to CUDAPm1 (Sourceforge versions), at [URL]https://www.mersenneforum.org/showthread.php?p=498673#post498673[/URL]
Post 7 is a summary/overview of testing I've done on CUDAPm1 v0.20, mostly the September 2013 CUDA 5.5 version, some the November 2013 CUDA 5.0 version, on Windows. Posts 8 and 9 are new and contain attachments showing detail, separately per GPU model, for 8 models ranging from 1 to 8 GB of GPU RAM. Total test effort was, I think, >1 GPU-year to date.

For most of that I have been able to submit at least stage 1 results to PrimeNet, and for many, stage 2. However, some runs failed before printing a stage 1 GCD result (factor or no factor found), and some runs failed at other points. Some were completed by moving to a different GPU. Others can't be completed that way either.

If anyone has a way of converting or moving a pre-GCD stage 1 run from CUDAPm1 to some other software that can perform the GCD check for a factor, please share, either here or by PM. (Or a CUDAPm1 Windows executable or source code that doesn't have that issue...)

These tests indicate that currently, 0 of 8 GPU models evaluated can complete stage 1 and 2 above exponent value ~433,000,000 (maybe as low as ~431M max for the GTX 1060 3GB). Prime95 can go higher, but is also capped well below the mersenne.org limit of 10[SUP]9[/SUP] (~595M, except FMA3-capable hardware ~920M). |
[QUOTE=kriesel;498677]
If anyone has a way of converting or moving a pre-gcd stage 1 run from CUDAPm1 to some other software that can perform the gcd check for a factor, please share, either here or by PM. (Or a CUDAPm1 Windows executable or source code that doesn't have that issue...) These tests indicate that currently, 0 of 8 gpu models evaluated can complete stage 1 and 2 above exponent value ~433,000,000 (maybe as low as ~431M max for the GTX1060 3gb). Prime95 can go higher, but is also capped well below the mersenne.org limit of 10[SUP]9[/SUP] (~595M, except FMA3-capable hardware ~920M).[/QUOTE] GpuOwl does the GCD on the CPU, using GMP, and it's pretty small and simple code, see e.g.: [url]https://github.com/preda/gpuowl/blob/master/GCD.cpp[/url] More work is probably in transforming the "balanced bits" from the GPU representation into "compact words" for the CPU. (i.e. importing the data GPU-to-CPU). After that, doing the GCD with GMP is easy. |
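As a rough illustration of the two steps described above (unpacking "balanced" GPU words into one ordinary integer, then taking the GCD on the CPU), here is a minimal Python sketch. It assumes the textbook IBDWT layout, where word i starts at bit ceil(p*i/n) and holds a signed digit; the actual memory layout used by GpuOwl or CUDAPm1 may differ, and every function name here is hypothetical. Python's built-in big integers stand in for GMP.

```python
from math import gcd

def bit_offsets(p, n):
    """Start bit of each of n variable-width words: ceil(p*i/n), for i = 0..n."""
    return [-(-p * i // n) for i in range(n + 1)]

def to_balanced(x, p, n):
    """Split 0 <= x < 2**p into n signed ("balanced") digits. Demo helper only."""
    offs = bit_offsets(p, n)
    digits, carry = [], 0
    for i in range(n):
        width = offs[i + 1] - offs[i]
        d = ((x >> offs[i]) & ((1 << width) - 1)) + carry
        carry = 0
        if d >= 1 << (width - 1):   # fold the upper half into negative values
            d -= 1 << width
            carry = 1
        digits.append(d)
    digits[0] += carry              # 2**p == 1 (mod 2**p - 1), so the carry wraps
    return digits

def from_balanced(digits, p):
    """Recombine signed digits into a residue in [0, 2**p - 1)."""
    offs = bit_offsets(p, len(digits))
    m = (1 << p) - 1
    return sum(d << off for d, off in zip(digits, offs)) % m

def factor_from_residue(digits, p):
    """GCD check on a stage 1 residue x: return gcd(x - 1, 2**p - 1) if nontrivial."""
    m = (1 << p) - 1
    x = from_balanced(digits, p)
    g = gcd((x - 1) % m, m)
    return g if 1 < g < m else None
```

For example, with p = 11 (2^11 - 1 = 2047 = 23 * 89), a residue of 24 unpacked from four balanced words gives gcd(23, 2047) = 23.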
[QUOTE=preda;498684]GpuOwl does the GCD on the CPU, using GMP, and it's pretty small and simple code, see e.g.:
[URL]https://github.com/preda/gpuowl/blob/master/GCD.cpp[/URL] More work is probably in transforming the "balanced bits" from the GPU representation into "compact words" for the CPU. (i.e. importing the data GPU-to-CPU). After that, doing the GCD with GMP is easy.[/QUOTE]
Thanks. Having no idea what gpuOwL's PRP-1 limits are, not having run it at all yet, and myself regarding it as a somewhat different animal than standalone P-1 run capability, I omitted it from the limit description.

I've looked at the various CUDA app code (CUDALucas, mfaktc, CUDAPm1) but have not yet managed to complete a build of any CUDA app (unmodified code) on Windows, nor spent much time trying. CUDAPm1 looked to me to be using GMP also. But the available Windows CUDAPm1 executables are linked to an old GMP version (2013 or earlier), and after looking through GMP's revision history of the past few years, I think there might be some issues due to that, which would not be present in gpuOwL, or in future builds of CUDAPm1 with a current GMP version for that matter.

I think a fair description of the CUDAPm1 testing I've done is partial factorial black box. I run almost entirely unique exponents on different systems containing different GPU models, with some system CPU models matching but mostly not. Slightly different exponents will sometimes fail on the same GPU model, but a different box can run them to completion. (Quadro 2000, ~84.8M exponent for example.) These differences could easily be the effect of some bug triggering on certain operands rather than sensitivity to memory sizes, CPU type, GPU type, OS, etc. |
[QUOTE=preda;498684]GpuOwl does the GCD on the CPU, using GMP, and it's pretty small and simple code, see e.g.:
[URL]https://github.com/preda/gpuowl/blob/master/GCD.cpp[/URL] More work is probably in transforming the "balanced bits" from the GPU representation into "compact words" for the CPU. (i.e. importing the data GPU-to-CPU). After that, doing the GCD with GMP is easy.[/QUOTE] libgmp-dev is a separate package that has become a dependency for gpuOwl on Debian and I think also on other flavors of Linux. If compilation fails, install libgmp-dev. |
@kriesel, I feel the pain of your testing; such a situation would have driven me mad.
(I understand this thread is about CUDA P-1, but diverting to GpuOwl: it does a normal P-1 first stage to any limit B1, within the general exponent limits (PRP) of GpuOwl; the "different beast" starts with the second stage of P-1. But there, GpuOwl can do any B2<=Exponent in B2 iterations during the PRP. So to clarify about GpuOwl:
- first stage is up to any B1, and just as efficient as any P-1 first stage.
- second stage is "fancy", and done in parallel with the PRP, but any B2<=Exponent can be covered in the first B2 iterations of the PRP.) |
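For readers unfamiliar with what a "normal" P-1 first stage to a limit B1 computes, here is a small Python sketch of the conventional method on a Mersenne number. It is not GpuOwl's code, and the function names are made up for illustration; the PRP-interleaved second stage described above is not shown. The exponent E is 2*p times every prime power not exceeding B1, and a factor q shows up whenever q - 1 divides E.

```python
from math import gcd

def primes_upto(n):
    """Primes <= n by trial division (fine for a toy B1)."""
    return [q for q in range(2, n + 1)
            if all(q % d for d in range(2, int(q ** 0.5) + 1))]

def pm1_stage1(p, b1, base=3):
    """Conventional P-1 stage 1 on M = 2**p - 1; returns gcd(base**E - 1, M)."""
    m = (1 << p) - 1
    e = 2 * p                      # every factor q of M has q == 1 (mod 2p)
    for q in primes_upto(b1):
        pw = q
        while pw * q <= b1:        # largest power of q not exceeding B1
            pw *= q
        e *= pw
    x = pow(base, e, m)
    return gcd(x - 1, m)           # 1: nothing found, m: everything found at once

print(pm1_stage1(11, 2))           # 2**11 - 1 = 23 * 89, and 23 - 1 = 2 * 11
```

A result of 1 means no factor was found at that B1, and a result equal to 2^p - 1 means B1 was large enough to catch all factors together, in which case a smaller B1 is needed to separate them.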
[QUOTE=SELROC;498690]libgmp-dev is a separate package that has become a dependency for gpuOwl on Debian and I think also on other flavors of Linux.
If compilation fails, install libgmp-dev.[/QUOTE] ...and presumably a whole Linux install on the Windows boxes? ;) This one, for Windows, is rather dated (2006): [url]https://cs.nyu.edu/~exact/core/gmp/index.html[/url] |
[QUOTE=preda;498691]@kriesel, I feel the pain for your testing, such a situation would have driven me mad.
(I understand this thread is about CUDA P-1, but diverting to GpuOwl, it does a normal P-1 first-stage to any limit B1, and within the general exponent limits (PRP) of GpuOwl; the "different beast" starts with the second stage P-1. But there, GpuOwl can do any B2<=Exponent in B2 iterations during the PRP. So to clarify about GpuOwl: - first stage is up to any B1, and just as efficient as any P-1 first-stage. - second-stage is "fancy", and done in parallel with the PRP, but any B2<=Exponent can be covered in the first B2 iterations of the PRP[/QUOTE] Re testing and sanity, it can try one's patience, yes, but fortunately I have a lot of that, and it's renewable within limits. Re gpuOwL B2, do I understand you correctly that its B2 is limited to no more than the exponent? (Seems reasonable.) If so I'll add that to the available software summary I maintain. |
[QUOTE=SELROC;498690]libgmp-dev is a separate package that has become a dependency for gpuOwl on Debian and I think also on other flavors of Linux.
If compilation fails, install libgmp-dev.[/QUOTE] All the online references to libgmp-dev I can find refer to Debian, Ubuntu, etc. I'm attempting gpuowl builds in MinGW64/MSYS2 atop Windows 7 x64. (Freshly updated tonight to current, and g++ is v8.2.0.)

$ g++ --version
g++.exe (Rev3, Built by MSYS2 project) 8.2.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

libgmp-10.dll is present in C:\msys64\mingw64\bin and is much older (added to the system in August 2018, file date is Jan 12 2017). |
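Since file dates say little about which GMP release a DLL actually contains, one way to check might be to read the version string the library itself exports. The sketch below uses Python's ctypes and relies on GMP exporting the `__gmp_version` string (exposed as `gmp_version` in gmp.h). The helper name is hypothetical, and it only reports on a shared library that can be loaded; it says nothing about a GMP copy statically linked into an existing executable.

```python
import ctypes
import ctypes.util

def gmp_runtime_version(path=None):
    """Return the version string of a GMP shared library, or None if unavailable.

    path: explicit library file (e.g. the libgmp-10.dll mentioned above);
    if omitted, the system search for a library named "gmp" is used.
    """
    name = path or ctypes.util.find_library("gmp")
    if name is None:
        return None
    try:
        lib = ctypes.CDLL(name)
        # GMP exports `const char * const __gmp_version`.
        raw = ctypes.c_char_p.in_dll(lib, "__gmp_version").value
        return raw.decode("ascii")
    except (OSError, ValueError, AttributeError):
        # Library not loadable, symbol missing, or null pointer.
        return None

print(gmp_runtime_version())
```

On the MSYS2 setup described above, the call would presumably be given the full path, for instance `gmp_runtime_version(r"C:\msys64\mingw64\bin\libgmp-10.dll")`, run from a 64-bit Python.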