[QUOTE=Prime95;497517]If I understand the 2080 architecture correctly, LL test speed could be improved (perhaps greatly) by going to 128-bit fixed point reals represented as four 32-bit integers. I investigated this somewhat 4 years ago when 32-bit adds had huge throughput advantage but 32-bit multiplies had no advantage compared to DP throughput. IIUC, in the 2080 both 32-bit adds and 32-bit multiplies have a huge throughput advantage compared to DP throughput.
The basic idea is that adding two 128-bit fixed point reals requires four 32-bit adds (with carries) plus some overhead for handling signs. Multiplying two 128-bit fixed point reals requires sixteen 32-bit multiplies, plus some adds, and some overhead for handling signs. Each FFT butterfly adds and subtracts FFT data values, which increases the maximum FFT data value by one bit. Thus, the fixed point reals must be shifted one bit prior to a butterfly (i.e. move the implied decimal point). This adds some additional overhead in implementing a fixed-point real FFT. My research indicated we could store as many as 51 bits of input data in each 128-bit fixed point real. This (51/128) is much more memory efficient than current DP FFTs, which store about 17 bits of data in each 64-bit double. Is there any flaw in my understanding of the 2080 architecture? Does anyone have time to explore the feasibility of this approach?[/QUOTE] Presumably, if this pans out, it would also be applicable to CUDAPm1. And to PRP, if anyone were to code that for CUDA. I remember reading something about the int and real circuits being independent enough, and about hybrid FFTs being possible using both at the same time, a while back, by Preda and others, but can't find it now. |
CUDALucas v2.06 verification
I feel we can stop regarding the May 5, 2017 version of CUDALucas 2.06 as beta software. It includes bad-residue checks that were not included in v2.05.1, and so is more reliable.
All GIMPS-discovered exponents verified before the release of CUDALucas v2.06 (May 5, 2017) have been verified again with that version on an NVIDIA GTX 1080. See the attachment at [URL]https://www.mersenneforum.org/showpost.php?p=506183&postcount=8[/URL] |
CUDALucas repositories maintenance needed.
Please see [URL]https://www.mersenneforum.org/showpost.php?p=509157&postcount=14[/URL]
Could one of those authorized update the readme on Sourceforge, add builds for recently released CUDA levels (Windows and Linux), update the mersenne.ca mirror, etc.? Perhaps the executables for earlier versions lacking the full complement of known-bad-interim-residue checks should be removed, or prominent warnings about those executables' limitations added. The known bad interim residues are 0x0000000000000000, 0x0000000000000002, and 0xfffffffffffffffc. |
d not c
Make that 0xfffffffffffffffd.
|
Was the binary for the 2.06 beta using CUDA v10.1 or 9.2 posted anywhere?
I can't seem to find it anywhere. TIA |
[QUOTE=tServo;509684]Was the binary for the 2.06 beta using CUDA v10.1 or 9.2 posted anywhere?
I can't seem to find it anywhere. TIA[/QUOTE]As far as I know, CUDA 9.1 is the last posted. Maybe Jerry (flashjh) would be willing to roll some new builds for 9.2, 10.x, and 8.0 into a zip file for Windows. |
CUDALucas CUDA 9 and 10 Windows builds needed; linux 10.x
1 Attachment(s)
The last CUDALucas builds for Windows were for up to CUDA 8, in 2017; CUDA 9.1 for Linux. Could someone please build and post versions for more recent CUDA levels and GPU models?
|
1 Attachment(s)
Oh, so CUDA 10.1 is out? Great... I guess?
I wonder if they have any compatibility between minor versions this time, as in, did I just waste time compiling on 10.0 only for the executable not to work with machines on 10.1?

So here's a Windows x64 / CUDA 10.0 / CUDALucas 2.06beta (2017-05-05, "r102" from Sourceforge) precompiled binary package. Visual Studio 2012 was used. Compiled for compute capabilities 5.0, 5.2, 5.3, 6.0, 6.1, 6.2, 7.0, and 7.5. I did some short self tests to see if anything is horribly broken, but nothing beyond that. Your mileage may vary.

Also included in the zip file are cudart64_100.dll and nvml.dll. The latter is included in the NVidia drivers, but the program seemed to like to have it in the program directory as well. (Fetch it from C:\Program Files\NVIDIA Corporation\NVSMI\nvml.dll to match your driver version, if you feel like it.) Also included are the source files, README etc. and the modified Makefile.win with which it was compiled. |
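The modified Makefile.win itself isn't reproduced in the post, but for readers wondering what targeting that list of compute capabilities involves: nvcc takes one -gencode pair per architecture. The invocation below is a hypothetical sketch under that assumption, with illustrative file names, not the actual Makefile.win rule.

```
# Hypothetical nvcc invocation covering compute capability 5.0 through 7.5.
# Each -gencode pair embeds native SASS for one architecture; the final
# compute_75,compute_75 pair also embeds PTX for forward compatibility.
nvcc -O2 ^
  -gencode arch=compute_50,code=sm_50 ^
  -gencode arch=compute_52,code=sm_52 ^
  -gencode arch=compute_53,code=sm_53 ^
  -gencode arch=compute_60,code=sm_60 ^
  -gencode arch=compute_61,code=sm_61 ^
  -gencode arch=compute_62,code=sm_62 ^
  -gencode arch=compute_70,code=sm_70 ^
  -gencode arch=compute_75,code=sm_75 ^
  -gencode arch=compute_75,code=compute_75 ^
  -o CUDALucas.exe CUDALucas.cu
```

Binaries without the PTX fallback fail to load on architectures newer than any they were compiled for, which is one reason these packages go stale as new GPU generations appear.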
[QUOTE=nomead;509739]Oh, so CUDA 10.1 is out? Great... I guess?
I wonder if they have any compatibility between minor versions this time, as in, did I just waste time compiling on 10.0 and then the executable won't work with machines on 10.1 ... So here's a Windows x64 / CUDA 10.0 / CUDALucas 2.06beta (2017-05-05, "r102" from Sourceforge) precompiled binary package. Visual Studio 2012 was used. Compiled for compute capability 5.0, 5.2, 5.3, 6.0, 6.1, 6.2, 7.0, 7.5. I did some short self tests to see if anything is horribly broken, but nothing beyond that. Your mileage may vary. Also included in the zip file are cudart64_100.dll and nvml.dll. The latter is included in the NVidia drivers but the program seemed to like to have it in the program directory as well. (Fetch it from C:\Program Files\NVIDIA Corporation\NVSMI\nvml.dll to match your driver version, if you feel like it). Also included are the source files, README etc. and the modified Makefile.win with which it was compiled.[/QUOTE] If there are bugs to be fixed in the NVIDIA libraries (and when have there not been?), yay for 10.1. Thanks for the build. People using this are likely to also need cufft64_100.dll, at least by analogy with levels 8 and below. |
[QUOTE=kriesel;509742] Thanks for the build. People using this are likely to also need cufft64_100.dll. At least by analogy with levels 8 and below.[/QUOTE]
Hmm, okay... but the earlier version of that DLL is not included in the CUDA 8.0 compiled package on Sourceforge either, and I found a reason for that: it's 97.3 MB. It's included with the GPU Computing Toolkit, but not the driver package. Even when zipped, it's still 74.4 MB, well over the attachment size limit here. |
[QUOTE=nomead;509753]Hmm, okay... but it is not (the earlier version) included in the CUDA 8.0 compiled package on Sourceforge. And I found a reason for that. It's 97.3 MB... It's included with the GPU Computing Toolkit, but not the driver package. Even when zipped, it's still 74.4 MB, well over the attachment size limit here.[/QUOTE]
An excellent reason to not include it. It's available at the mirror site, [url]https://download.mersenne.ca/CUDA-DLLs[/url] |