mersenneforum.org  

Old 2019-12-09, 18:04   #3235
paulunderwood
 
Sep 2002
Database er0rr

3761₁₀ Posts
Default

Quote:
Originally Posted by storm5510
"Make sure "non-free" is ticked in the aptitude..." I suppose this is something I will see along the line? If so, then this looks very simple. I didn't realize a compiler was incorporated into Linux, regardless of source. Mine is Ubuntu.
It'll be in the software center (possibly called synaptic, not aptitude) under repositories. It is crucial that it is checked and that you afterwards start the process with sudo apt-get update

Last fiddled with by paulunderwood on 2019-12-09 at 18:31
Old 2019-12-10, 00:51   #3236
storm5510
Random Account
 
Aug 2009

2²×3×163 Posts
Default

Aptitude was not installed on my system, so I ran the Nvidia update from a terminal window. There did not seem to be any problems.

I ran "make" and the last line said, "unsupported hardware" and ended there. So, I went back to my original mfaktc folder, tried the one there, and it runs. The Nvidia update gave it what it needed. I ran a self-test and everything was successful.

For James Heinrich's project, there does not seem to be a "less classes" version for Linux.
Old 2019-12-10, 01:27   #3237
paulunderwood
 
Sep 2002
Database er0rr

3,761 Posts
Default

Quote:
Originally Posted by storm5510
Aptitude was not installed on my system, so I ran the Nvidia update from a terminal window. There did not seem to be any problems.

I ran "make" and the last line said, "unsupported hardware" and ended there. So, I went back to my original mfaktc folder, tried the one there, and it runs. The Nvidia update gave it what it needed. I ran a self-test and everything was successful.

For James Heinrich's project, there does not seem to be a "less classes" version for Linux.
You can edit the Makefile in the "src" directory. What worked for me was this:

Code:
# Compiler settings for .cu files (CPU/GPU)
NVCC = nvcc -ccbin clang-3.8
NVCCFLAGS = $(CUDA_INCLUDE) --ptxas-options=-v
and

Code:
# generate code for various compute capabilities
#NVCCFLAGS += --generate-code arch=compute_11,code=sm_11 # CC 1.1, 1.2 and 1.3 GPUs will use this code (1.0 is not possible for mfaktc)
#NVCCFLAGS += --generate-code arch=compute_20,code=sm_20 # CC 2.x GPUs will use this code, one code fits all!
NVCCFLAGS += --generate-code arch=compute_30,code=sm_30 # all CC 3.x GPUs _COULD_ use this code 
#NVCCFLAGS += --generate-code arch=compute_35,code=sm_35 # but CC 3.5 (3.2?) _CAN_ use funnel shift which is useful for mfaktc
#NVCCFLAGS += --generate-code arch=compute_50,code=sm_50 # CC 5.x GPUs will use this code
Then I ran make, and it works on my card. Output of lspci | grep VGA:

Code:
01:00.0 VGA compatible controller: NVIDIA Corporation GK208 [GeForce GT 710B] (rev a1)
The self test works!

Last fiddled with by paulunderwood on 2019-12-10 at 01:28
Old 2019-12-29, 19:07   #3238
ixfd64
Bemusing Prompter
 
"Danny"
Dec 2002
California

5·479 Posts
Default

I noticed that compiling mfaktc for a system with a Tesla V100 and CUDA 10.x will result in a compilation error because the V100 only supports up to compute capability 7.0, unlike newer Volta cards. Running make build=cuda100 would result in an "Unsupported gpu architecture" error. Therefore, I updated my custom makefile to support CUDA 9.0 builds.
  • make build=cuda65 for CUDA 6.5 (supports compute capability 1.1 to 5.0)
  • make build=cuda80 for CUDA 8.0 (supports compute capability 2.0 to 6.1)
  • make build=cuda90 for CUDA 9.0 (supports compute capability 3.0 to 7.0)
  • make build=cuda100 for CUDA 10.0 (supports compute capability 3.5 to 7.5)
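
The ranges in the list above can be sketched as a small Python lookup. This is only an illustration: `BUILD_RANGES` and `builds_for` are hypothetical names, not part of mfaktc or the attached Makefile, and treating both ends of each range as inclusive is an assumption.

```python
# Hypothetical helper: which `make build=...` targets from the list above
# cover a given compute capability (CC)? Ranges are copied from the post
# and assumed to be inclusive at both ends.
BUILD_RANGES = {
    "cuda65": ((1, 1), (5, 0)),
    "cuda80": ((2, 0), (6, 1)),
    "cuda90": ((3, 0), (7, 0)),
    "cuda100": ((3, 5), (7, 5)),
}


def builds_for(cc):
    """Return the build targets whose CC range contains cc, e.g. cc=(7, 0)."""
    # Tuples compare lexicographically, so (3, 5) <= (7, 0) <= (7, 5) holds.
    return [name for name, (lo, hi) in BUILD_RANGES.items() if lo <= cc <= hi]


if __name__ == "__main__":
    print(builds_for((7, 0)))
```

The table only says which targets list a given compute capability; whether a target actually builds also depends on the CUDA toolkit installed on the machine.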

I've also updated my script for launching multiple mfaktc instances. It is now more compact and uses fewer variables.
Attached Files
File Type: zip Makefile.zip (1.5 KB, 240 views)
File Type: zip start-mfaktc.zip (474 Bytes, 240 views)

Last fiddled with by ixfd64 on 2019-12-29 at 19:10
Old 2019-12-30, 16:02   #3239
tServo
 
"Marv"
May 2009
near the Tannhäuser Gate

2×7×47 Posts
Default

Quote:
Originally Posted by ixfd64
I noticed that compiling mfaktc for a system with a Tesla V100 and CUDA 10.x will result in a compilation error because the V100 only supports up to compute capability 7.0, unlike newer Volta cards.
FYI: There were only 3 Volta cards made: Titan V, Tesla V100, and Quadro GV100.
Since they all use the same GPU, they are all compute capability 7.0.
Old 2020-01-27, 19:22   #3240
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·3²·7·43 Posts
Default GTX480 to 1023Mib GPUSieveSize

Quote:
Originally Posted by nomead
the attached binaries are for 64-bit CUDA 6.5, compute versions 20 30 35 37 50 and 52.
THANKS Sam! I finally got around to testing before deploying it today. Early results look good.
Code:
GTX 480, GPU clock 701Mhz, memory clock 924 Mhz

tune mfaktc.ini gpu sieve parameters in this order:
GPUSieveProcessSize
GPUSieveSize
GPUSievePrimes

Single application instance tests except where indicated dual, starting from
GPUSievePrimes=82486
GPUSieveSize=64
GPUSieveProcessSize=16
~330 GhzD/day, tune Mfaktc v0.21 Feb 5 2015 mfaktc-win-64.exe (cuda 6.5)

GPUSieveProcessSize=32 214.5
GPUSieveProcessSize=24 317.94
GPUSieveProcessSize=16 328.10 *
GPUSieveProcessSize=8 318.18

GPUSieveSize=128 329.85 *
GPUSieveSize=64 318.55
GPUSieveSize=96 327.13
GPUSieveSize=32 299.28
GPUSieveSize=16 267.98 (82% gpu load)

GPUSieveProcessSize=16, GPUSieveSize=128
GPUSievePrimes=90000 340.8
GPUSievePrimes=100000 339.80
GPUSievePrimes=80000 340.97
GPUSievePrimes=110000 338.37
GPUSievePrimes=85000 340.77
GPUSievePrimes=82000 341.08 * ~97% gpu load

2 instances 178+177.34 = 355.34 (GPU load 99%)

1 instance, GPUSievePrimes=82000, GPUSieveProcessSize=16, nomead mfaktc-more-cuda65-64
GPUSieveSize=128 346.27
GPUSieveSize=256 352.25
GPUSieveSize=512 356.48
GPUSieveSize=1023 357.98 99-100% gpu load on gpu-z
GPUSieveSize=1024 failed after 1 class, with error message:
ERROR: cudaGetLastError() returned 9: invalid configuration argument
GPUSieveSize=2047 not attempted

2 instances GPUSieveSize=1023, GPUSievePrimes=82000, GPUSieveProcessSize=16, nomead mfaktc-more-cuda65-64
100034929,75,76 first instance (also used in preceding single instance tests)  179.12
100108363,75,76 second instance  (also used in preceding double instance tests)  182.77
combined throughput  361.89 GhzD/day, consistent 100% gpu load in gpu-z
second instance raised throughput over single instance, 361.89/357.98 = 1.0109

Ratio between GPUSieveSize-optimized versions, 2 instances, "2047"/128 max, 361.89/355.34 = 1.0184
If it also adds 1.5-2% on my other old gpu models, it adds up to a nice little boost.
Perhaps the error relates to the CUDA 1D Linear Texture Size 134217728?
(134217728 bytes = 1024 Mib, i.e. mebibits) The rating for the GTX480 at mersenne.ca is ~368 GhzD/day, but is in practice a function of exponent and bit depth.
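
As a quick check of the arithmetic above, here is a minimal Python sketch. The function names are made up for illustration, and reading "Mib" as mebibits (2**20 bits) is an assumption about the unit, though it is the reading under which the quoted conversion comes out exact.

```python
# Check the unit conversion and throughput ratios quoted in this post.
TEXTURE_SIZE_BYTES = 134217728  # value quoted in the post


def bytes_to_mebibits(n_bytes):
    """Convert bytes to mebibits: 8 bits per byte, 2**20 bits per mebibit."""
    return n_bytes * 8 / 2**20


def ratio(new, old):
    """Throughput ratio, rounded to 4 decimals as in the post."""
    return round(new / old, 4)


if __name__ == "__main__":
    print(bytes_to_mebibits(TEXTURE_SIZE_BYTES))  # texture size in Mib
    print(ratio(361.89, 357.98))  # two instances vs one, GPUSieveSize=1023
    print(ratio(361.89, 355.34))  # two instances, 1023 vs 128
```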

Last fiddled with by kriesel on 2020-01-27 at 19:26
Old 2020-01-28, 15:24   #3241
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·3²·7·43 Posts
Default Quadro 5000 large GPUSieveSize tune

Quote:
Originally Posted by kriesel
If it also adds 1.5-2% on my other old gpu models, it adds up to a nice little boost.
Perhaps the error relates to the CUDA 1D Linear Texture Size 134217728?
(134217728 bytes = 1024 Mib, i.e. mebibits) The rating for the GTX480 at mersenne.ca is ~368 GhzD/day, but is in practice a function of exponent and bit depth.
Yes, other old models benefit. No, it's not the 1D Linear Texture Size, since other models have different limits but the same value for that.
Code:
Quadro 5000, GPU clock 513 Mhz, memory clock 750 Mhz

tune mfaktc.ini gpu sieve parameters in this order:
GPUSieveProcessSize
GPUSieveSize
GPUSievePrimes

starting from
GPUSievePrimes=82486
GPUSieveSize=64
GPUSieveProcessSize=16
184.23 GhzD/day, tune Mfaktc v0.21 Feb 5 2015 mfaktc-win-64.exe (cuda 6.5), 96-97% gpu load

Factor=97482377,74,75
GPUSieveProcessSize=32 119.69
GPUSieveProcessSize=24 184.07
GPUSieveProcessSize=16 184.75 *
GPUSieveProcessSize=8 178.41

Factor=100103767,75,76; GPUSieveProcessSize=16, GPUSievePrimes=82486
GPUSieveSize=128 188.19 (98% gpu load) *
GPUSieveSize=96 186.70 (98% gpu load)
GPUSieveSize=64 184.11 (97% gpu load)
GPUSieveSize=32 176.72 (93% gpu load)
GPUSieveSize=16 163.33 (88% gpu load)

GPUSieveProcessSize=16, GPUSieveSize=128
GPUSievePrimes=90000 188.32
GPUSievePrimes=100000 187.72
GPUSievePrimes=80000 188.73
GPUSievePrimes=85000 188.56
GPUSievePrimes=82000 188.38 ~98% gpu load
GPUSievePrimes=75000 188.64
GPUSievePrimes=78000 188.72
GPUSievePrimes=79000 188.80 *

2 instances 96.33 + 95.32 = 191.65 (GPU load 99-100%)

1 instance, GPUSievePrimes=79000, GPUSieveProcessSize=16, nomead mfaktc-more-cuda65-64
GPUSieveSize=128 189.25
GPUSieveSize=256 191.23 (99% gpu load)
GPUSieveSize=512 192.00
GPUSieveSize=1023 192.75 100% gpu load on gpu-z *
GPUSieveSize=1024 error
GPUSieveSize=2047 not tried

2 instances GPUSieveSize=1023, GPUSievePrimes=79000, GPUSieveProcessSize=16, nomead mfaktc-more-cuda65-64
97482377,74,75 first instance (also used in preceding tests)  99.12
100124191,73,74 second instance    (also used in preceding tests) 95.01
combined throughput 194.13 GhzD/day, consistent 100% gpu load in gpu-z, 194.13/192.75 = 1.0072 x single instance throughput

Ratio between GPUSieveSize-optimized versions, 2 instances, "2047"/128 max, 194.13/191.65 = 1.0129

Overall gain, 194.13/184.23 = 1.0537.
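
The tuning order used above (GPUSieveProcessSize, then GPUSieveSize, then GPUSievePrimes) amounts to a one-parameter-at-a-time search. A minimal Python sketch of that idea follows. `tune_in_order` and `measure` are hypothetical names; in real use `measure` would have to run mfaktc with the trial settings and read back GhzD/day, which is not shown here. `toy_measure` is a made-up stand-in so the sketch runs, not a model of real GPU behaviour.

```python
def tune_in_order(measure, start, candidates):
    """Sweep one parameter at a time, in the given order, keeping the best.

    measure:    hypothetical callback taking a settings dict and returning
                a throughput number (higher is better).
    start:      dict of initial settings.
    candidates: list of (name, values) pairs, in tuning order.
    """
    best = dict(start)
    best_score = measure(best)
    for name, values in candidates:
        for value in values:
            trial = dict(best, **{name: value})
            score = measure(trial)
            if score > best_score:
                best, best_score = trial, score
    return best, best_score


def toy_measure(s):
    # Toy stand-in, NOT real mfaktc behaviour: peaks at 16 / 128 / 79000.
    return (-abs(s["GPUSieveProcessSize"] - 16)
            - abs(s["GPUSieveSize"] - 128) / 8
            - abs(s["GPUSievePrimes"] - 79000) / 1000)


if __name__ == "__main__":
    start = {"GPUSievePrimes": 82486, "GPUSieveSize": 64,
             "GPUSieveProcessSize": 16}
    order = [
        ("GPUSieveProcessSize", [8, 16, 24, 32]),
        ("GPUSieveSize", [16, 32, 64, 96, 128]),
        ("GPUSievePrimes", [75000, 78000, 79000, 80000, 82000, 85000]),
    ]
    print(tune_in_order(toy_measure, start, order))
```

A search like this only finds the best value along each axis given the earlier choices, so it can miss combinations where the parameters interact.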
Old 2020-01-28, 15:27   #3242
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5418₁₀ Posts
Default Quadro 2000 large GPUSieveSize tune

Code:
tune mfaktc.ini gpu sieve parameters in this order:
GPUSieveProcessSize
GPUSieveSize
GPUSievePrimes

Q2000 gpu clock 625.9 Mhz, memory 652 Mhz

(GPUSieveSize=64, GPUSievePrimes=82486)
GPUSieveProcessSize=16 86.794
GPUSieveProcessSize=24 86.77
GPUSieveProcessSize=32 46.3
GPUSieveProcessSize=8 87.19 *

GPUSieveSize=128 87.56 *
GPUSieveSize=96  87.42
GPUSieveSize=32  86.34
GPUSieveSize=16  84.95

GPUSievePrimes=100000 87.75 *
GPUSievePrimes=120000 87.70
GPUSievePrimes=110000 87.72
GPUSievePrimes=95000  87.698
GPUSievePrimes=102000 87.75

above with cuda6.5win32 2015 executable: 88.09 *


GPUSieveProcessSize=8
GPUSieveSize=128
GPUSievePrimes=100000

mfaktc-more-cuda65-64 from nomead October 2019 post:
GPUSieveSize=128 88.44
GPUSieveSize=256 88.67
GPUSieveSize=511 88.79 *
GPUSieveSize=512 fail
higher values not tried


advantage to increased gpusievesize 88.79/88.09 =~ 1.0079
Old 2020-01-28, 17:43   #3243
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·3²·7·43 Posts
Default

Quote:
Originally Posted by petrw1
I've only tried it with P-1 on the CPU.
I don't have the tools (maybe the mental tools) to recompile mfaktc.

Larger gpuseiveprimes or gpusievprocesssize seem to have negligible impact for me.

Thanks
Spellcheck. It will probably ignoar misplled keywrds, and use the defaults instead.
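
One way to catch that kind of typo before a run is a small key check. The Python sketch below is an illustration only: `check_keys` is a hypothetical helper, the key list holds just the three names discussed in this thread (a real mfaktc.ini has more), and the parsing assumes plain `Key=value` lines with `#` comments.

```python
import difflib

# Only the three keys discussed in this thread; a real mfaktc.ini has more.
KNOWN_KEYS = ["GPUSievePrimes", "GPUSieveSize", "GPUSieveProcessSize"]


def check_keys(ini_text, known=KNOWN_KEYS):
    """Return (line_no, key, suggestion) for each key not found in `known`.

    Assumes simple `Key=value` lines with `#` comments. Keys are compared
    case-sensitively, which may be stricter than the program itself.
    """
    problems = []
    for line_no, line in enumerate(ini_text.splitlines(), start=1):
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if key not in known:
            match = difflib.get_close_matches(key, known, n=1, cutoff=0.6)
            problems.append((line_no, key, match[0] if match else None))
    return problems


if __name__ == "__main__":
    sample = "GPUSeivePrimes=100000\nGPUSieveSize=128\nGPUSievProcessSize=16\n"
    for problem in check_keys(sample):
        print(problem)
```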

Last fiddled with by kriesel on 2020-01-28 at 17:44
Old 2020-01-28, 17:45   #3244
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·3²·7·43 Posts
Default mfaktc retune with 2047-capable mfaktc on GTX1060

Code:
gtx1060 mfaktc tune with nomead's CUDA8 2047-capable executable
on Win7 X64 Pro

GPUSievePrimes=82486
GPUSieveSize=128
GPUSieveProcessSize=32 503 GhzD/day (32=max; increments of 8) *
GPUSieveProcessSize=24 492.7
GPUSieveProcessSize=16 492.7
GPUSieveProcessSize=8 489

GPUSieveProcessSize=32 503; GPUSievePrimes=82486
GPUSieveSize=128 503.5
GPUSieveSize=64 493.1
GPUSieveSize=256 509.46
GPUSieveSize=512 511.71
GPUSieveSize=1024 513.2
GPUSieveSize=2047 513.64 *

GPUSieveProcessSize=32, GPUSieveSize=2047
GPUSievePrimes=82486 513.64
GPUSievePrimes=90000 514.64
GPUSievePrimes=100000 515.2
GPUSievePrimes=110000 515.32
GPUSievePrimes=120000 514.71
GPUSievePrimes=115000 515.01
GPUSievePrimes=106000 515.63 *

2 instances: 258.04+258.03 = 516.07
516.07/515.63-1 = .085% gain from two instances over one
There is an additional gain if shutting one down for some sort of software maintenance; 
the other uses the full gpu while one is stopped, so no productive time lost.

Last fiddled with by kriesel on 2020-01-28 at 17:46
Old 2020-01-29, 02:00   #3245
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·3²·7·43 Posts
Default Quadro 4000 with 2047-Mib GPUSieveSize mfaktc

Code:
Q4000 mfaktc tuning

1) 2015 mfaktc v0.21
GPUSievePrimes=82486 (0-1075000)
GPUSieveSize=64 (4-128)
varying GPUSieveProcessSize (values required to be multiples of 8)
GPUSieveProcessSize=16 127.56 GhzD/day 
GPUSieveProcessSize=24 127.8 *
GPUSieveProcessSize=32 82.24
GPUSieveProcessSize=8  127.69

GPUSieveProcessSize=24, GPUSievePrimes=82486, vary GPUSieveSize
GPUSieveSize=64 127.8
GPUSieveSize=32 125.89
GPUSieveSize=96 123.72
GPUSieveSize=128 128.9 *


GPUSieveProcessSize=24, GPUSievePrimes varied, GPUSieveSize=128

GPUSievePrimes=82486 128.9 *
GPUSievePrimes=90000 128.87
GPUSievePrimes=70000 128.36
GPUSievePrimes=100000 126.97
GPUSievePrimes=86000 127.86

2) mfaktc-more-cuda65-64.exe from nomead allowing 2047Mib GPUSieveSize:
GPUSievePrimes=82486, GPUSieveProcessSize=24
Factor=95389123,75,76
GPUSieveSize=96 124.16
GPUSieveSize=192 124.83
GPUSieveSize=384 125.17
GPUSieveSize=768 125.33
GPUSieveSize=1024 failed

GPUSievePrimes=82486, GPUSieveProcessSize=16
GPUSieveSize=16 122.82
GPUSieveSize=32 126.09
GPUSieveSize=64 128.12
GPUSieveSize=128 129.16
GPUSieveSize=256 129.71
GPUSieveSize=512 129.99
GPUSieveSize=1008 130.13 (221MiB used) *

advantage due to increased GPUSieveSize 130.13/128.9 = 1.0095

2 instances
Factor=95389123,75,76  64.35
Factor=100110187,75,76 65.78
total 130.13
Two-instance gain = none.
GPU ram occupancy 389 MiB
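
Tuning notes in the format above are easy to scan by machine. The Python sketch below is a hypothetical helper, not anything shipped with mfaktc. It assumes each measurement line looks like `Key=value throughput`, optionally followed by notes, and it skips lines that do not fit, such as headers or "failed" entries.

```python
import re

# Matches lines such as "GPUSieveSize=128 129.16" or
# "GPUSieveSize=1008 130.13 (221MiB used) *"; trailing notes are ignored.
LINE_RE = re.compile(r"^(\w+)=(\d+)\s+(\d+(?:\.\d+)?)")


def best_setting(log_text):
    """Return (key, value, throughput) for the highest-throughput line."""
    rows = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            rows.append((m.group(1), int(m.group(2)), float(m.group(3))))
    return max(rows, key=lambda row: row[2]) if rows else None


if __name__ == "__main__":
    sample = """\
GPUSievePrimes=82486, GPUSieveProcessSize=16
GPUSieveSize=16 122.82
GPUSieveSize=32 126.09
GPUSieveSize=64 128.12
GPUSieveSize=128 129.16
GPUSieveSize=256 129.71
GPUSieveSize=512 129.99
GPUSieveSize=1008 130.13 (221MiB used) *
GPUSieveSize=1024 failed
"""
    print(best_setting(sample))
```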

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.