mersenneforum.org  

Old 2018-10-07, 15:24   #2707
kriesel
 

Quote:
Originally Posted by Prime95 View Post
If I understand the 2080 architecture correctly, LL test speed could be improved (perhaps greatly) by going to 128-bit fixed-point reals represented as four 32-bit integers. I investigated this somewhat 4 years ago, when 32-bit adds had a huge throughput advantage but 32-bit multiplies had no advantage over DP throughput. IIUC, in the 2080 both 32-bit adds and 32-bit multiplies have a huge throughput advantage over DP.

The basic idea is that adding two 128-bit fixed-point reals requires four 32-bit adds (with carries) plus some overhead for handling signs. Multiplying two 128-bit fixed-point reals requires sixteen 32-bit multiplies, plus some adds, and some overhead for handling signs.

Each FFT butterfly adds and subtracts FFT data values, which increases the maximum FFT data value by one bit. Thus, the fixed-point reals must be shifted one bit prior to a butterfly (i.e., the implied binary point moves). This adds some overhead to implementing a fixed-point real FFT.

My research indicated we could store as many as 51 bits of input data in each 128-bit fixed-point real. This (51/128) is much more memory-efficient than current DP FFTs, which store about 17 bits of data in each 64-bit double.

Is there any flaw in my understanding of the 2080 architecture? Does anyone have time to explore the feasibility of this approach?
Presumably, if this pans out, it would also be applicable to CUDAPm1, and to PRP if anyone were to code that for CUDA. I remember reading something a while back, by Preda and others, about the int and real circuits being independent enough that hybrid FFTs using both at the same time are possible, but I can't find it now.
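For concreteness, the limb arithmetic described in the quote can be sketched in plain C. This is a minimal sketch under stated assumptions: the struct layout, names, and sign convention here are illustrative, not anything from mprime or CUDALucas. A full multiply would form the sixteen 32-bit partial products the same way, accumulating them in 64-bit temporaries.

```c
#include <stdint.h>

/* A 128-bit fixed-point real stored as four 32-bit limbs, least
   significant first, with a separate sign. Layout and names are
   illustrative assumptions only. */
typedef struct {
    uint32_t limb[4];  /* limb[0] is the least significant word */
    int sign;          /* +1 or -1 */
} fp128;

/* Same-sign addition: four 32-bit adds with carry propagation.
   The sign-handling overhead mentioned above is omitted here. */
fp128 fp128_add_mag(fp128 a, fp128 b) {
    fp128 r;
    uint64_t carry = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t s = (uint64_t)a.limb[i] + b.limb[i] + carry;
        r.limb[i] = (uint32_t)s;
        carry = s >> 32;
    }
    r.sign = a.sign;
    return r;
}

/* Right-shift by one bit, i.e. move the implied binary point, as
   needed before a butterfly to absorb the one-bit growth from the
   add/subtract. */
fp128 fp128_shr1(fp128 a) {
    fp128 r = a;
    for (int i = 0; i < 4; i++) {
        uint32_t in = (i < 3) ? (a.limb[i + 1] << 31) : 0;
        r.limb[i] = (a.limb[i] >> 1) | in;
    }
    return r;
}
```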
Old 2019-01-27, 14:34   #2708
kriesel
 
CUDALucas v2.06 verification

I feel we can stop regarding the May 5, 2017 version of CUDALucas v2.06 as beta software. It includes bad-interim-residue checks that were not present in v2.05.1, and so is more reliable.

All GIMPS-discovered exponents verified before the May 5, 2017 release of CUDALucas v2.06 have been verified again with that version on an NVIDIA GTX 1080.

See the attachment at https://www.mersenneforum.org/showpo...83&postcount=8
Old 2019-02-22, 18:07   #2709
kriesel
 
CUDALucas repositories maintenance needed

Please see https://www.mersenneforum.org/showpo...7&postcount=14
Could one of those authorized update the readme on SourceForge, add builds for recently released CUDA levels (Windows and Linux), update the mersenne.ca mirror, etc.?

Perhaps the executables for earlier versions, which lack the full complement of known-bad-interim-residue checks, should be removed, or prominent warnings about those executables' limitations added. The known bad interim residues are 0x0000000000000000, 0x0000000000000002, 0xfffffffffffffffc.

Old 2019-02-22, 19:37   #2710
kriesel
 
d not c

Make that 0xfffffffffffffffd.
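With that correction, the full known-bad list is 0x0000000000000000, 0x0000000000000002, and 0xfffffffffffffffd. A check of the kind v2.06 performs could be sketched like this (an illustrative sketch only, not CUDALucas's actual code):

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative check against the known bad interim residues discussed
   above (with the corrected final value); not CUDALucas's actual code. */
bool is_known_bad_residue(uint64_t res) {
    return res == 0x0000000000000000ULL ||
           res == 0x0000000000000002ULL ||
           res == 0xfffffffffffffffdULL;
}
```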
Old 2019-02-28, 16:40   #2711
tServo
 

Was the binary for the 2.06 beta built with CUDA v10.1 or v9.2 posted anywhere?
I can't seem to find it anywhere.
TIA
Old 2019-02-28, 17:11   #2712
kriesel
 

Quote:
Originally Posted by tServo View Post
Was the binary for the 2.06 beta built with CUDA v10.1 or v9.2 posted anywhere?
I can't seem to find it anywhere.
TIA
As far as I know, CUDA 9.1 is the last posted. Maybe Jerry (flashjh) would be willing to roll some new builds for 9.2, 10.x, and 8.0 into a zip file for Windows.
Old 2019-02-28, 18:37   #2713
kriesel
 
CUDALucas CUDA 9 and 10 Windows builds needed; Linux 10.x

The last CUDALucas builds were for CUDA levels up to 8 (2017) for Windows, and up to 9.1 for Linux. Could someone please build and post binaries for more recent CUDA levels and GPU models?
Attached Files
File Type: pdf CUDA levels.pdf (11.5 KB, 55 views)
Old 2019-03-01, 02:33   #2714
nomead
 

Oh, so CUDA 10.1 is out? Great... I guess?

I wonder if they have any compatibility between minor versions this time, as in, did I just waste time compiling on 10.0 and then the executable won't work with machines on 10.1 ...

So here's a Windows x64 / CUDA 10.0 / CUDALucas 2.06beta (2017-05-05, "r102" from Sourceforge) precompiled binary package. Visual Studio 2012 was used. Compiled for compute capability 5.0, 5.2, 5.3, 6.0, 6.1, 6.2, 7.0, 7.5. I did some short self tests to see if anything is horribly broken, but nothing beyond that. Your mileage may vary.

Also included in the zip file are cudart64_100.dll and nvml.dll. The latter is included in the NVidia drivers but the program seemed to like to have it in the program directory as well. (Fetch it from C:\Program Files\NVIDIA Corporation\NVSMI\nvml.dll to match your driver version, if you feel like it).

Also included are the source files, README etc. and the modified Makefile.win with which it was compiled.
Attached Files
File Type: zip CUDALucas2.06beta-CUDA10.0-Windows-x64.zip (735.4 KB, 55 views)
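A multi-architecture build like the one described above would typically be produced with one nvcc fat-binary invocation per the compute capabilities listed. The command below is an assumption for illustration (file names, optimization flags, and the exact set of options in the actual Makefile.win may differ):

```shell
# Hypothetical nvcc build covering compute capabilities 5.0-7.5,
# linked against cuFFT as CUDALucas requires.
nvcc -O2 -o CUDALucas CUDALucas.cu \
    -gencode arch=compute_50,code=sm_50 \
    -gencode arch=compute_52,code=sm_52 \
    -gencode arch=compute_53,code=sm_53 \
    -gencode arch=compute_60,code=sm_60 \
    -gencode arch=compute_61,code=sm_61 \
    -gencode arch=compute_62,code=sm_62 \
    -gencode arch=compute_70,code=sm_70 \
    -gencode arch=compute_75,code=sm_75 \
    -lcufft
```

Each `-gencode` pair embeds machine code for one GPU generation in the same executable, which is why a single binary can serve Maxwell through Turing cards.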
Old 2019-03-01, 03:58   #2715
kriesel
 

Quote:
Originally Posted by nomead View Post
So here's a Windows x64 / CUDA 10.0 / CUDALucas 2.06beta (2017-05-05, "r102" from Sourceforge) precompiled binary package. ...
If there are bugs to be fixed in the NVIDIA libraries (and when have there not been?), yay for 10.1.

Thanks for the build. People using this are likely to also need cufft64_100.dll, at least by analogy with CUDA levels 8 and below.
Old 2019-03-01, 06:21   #2716
nomead
 

Quote:
Originally Posted by kriesel View Post
Thanks for the build. People using this are likely to also need cufft64_100.dll, at least by analogy with CUDA levels 8 and below.
Hmm, okay... but the earlier version of it is not included in the CUDA 8.0 compiled package on SourceForge, and I found a reason for that: it's 97.3 MB. It's included with the GPU Computing Toolkit, but not the driver package. Even zipped, it's still 74.4 MB, well over the attachment size limit here.
Old 2019-03-01, 06:51   #2717
kriesel
 

Quote:
Originally Posted by nomead View Post
Hmm, okay... but the earlier version of it is not included in the CUDA 8.0 compiled package on SourceForge, and I found a reason for that: it's 97.3 MB. It's included with the GPU Computing Toolkit, but not the driver package. Even zipped, it's still 74.4 MB, well over the attachment size limit here.
An excellent reason not to include it. It's available at the mirror site:

https://download.mersenne.ca/CUDA-DLLs