mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

paulunderwood 2019-12-09 18:04

[QUOTE=storm5510;532459]"Make sure "non-free" is ticked in the aptitude..." I suppose this is something I will see along the line? If so, then this looks very simple. I didn't realize a compiler was incorporated into Linux, regardless of source. Mine is [I]Ubuntu[/I].[/QUOTE]

It'll be in the software center (possibly called Synaptic rather than aptitude) under repositories. It is crucial that it is checked and that you then start the process with [c]sudo apt-get update[/c].

storm5510 2019-12-10 00:51

Aptitude was not installed on my system so I ran the Nvidia update from a terminal window. There did not seem to be any problems.

I ran "make" and the last line said, "unsupported hardware" and ended there. So, I went back to my original [I]mfaktc[/I] folder, tried the one there, and it runs. The Nvidia update gave it what it needed. I ran a self-test and everything was successful.

For James Heinrich's project, there does not seem to be a "less classes" version for Linux.

paulunderwood 2019-12-10 01:27

[QUOTE=storm5510;532511]Aptitude was not installed on my system so I ran the Nvidia update from a terminal window. There did not seem to be any problems.

I ran "make" and the last line said, "unsupported hardware" and ended there. So, I went back to my original [I]mfaktc[/I] folder, tried the one there, and it runs. The Nvidia update gave it what it needed. I ran a self-test and everything was successful.

For James Heinrich's project, there does not seem to be a "less classes" version for Linux.[/QUOTE]

You can edit the [c]Makefile[/c] in the "src" directory. What worked for me was this:

[CODE]# Compiler settings for .cu files (CPU/GPU)
# changed: -ccbin clang-3.8 tells nvcc to use clang 3.8 as the host compiler
NVCC = nvcc -ccbin clang-3.8
NVCCFLAGS = $(CUDA_INCLUDE) --ptxas-options=-v
[/CODE]

and

[CODE]# generate code for various compute capabilities
#NVCCFLAGS += --generate-code arch=compute_11,code=sm_11 # CC 1.1, 1.2 and 1.3 GPUs will use this code (1.0 is not possible for mfaktc)
#NVCCFLAGS += --generate-code arch=compute_20,code=sm_20 # CC 2.x GPUs will use this code, one code fits all!
NVCCFLAGS += --generate-code arch=compute_30,code=sm_30 # all CC 3.x GPUs _COULD_ use this code
#NVCCFLAGS += --generate-code arch=compute_35,code=sm_35 # but CC 3.5 (3.2?) _CAN_ use funnel shift which is useful for mfaktc
#NVCCFLAGS += --generate-code arch=compute_50,code=sm_50 # CC 5.x GPUs will use this code
[/CODE]

Then I ran [c]make[/c], and it works on my card, as reported by [c]lspci | grep VGA[/c]:

[CODE]01:00.0 VGA compatible controller: NVIDIA Corporation GK208 [GeForce GT 710B] (rev a1)
[/CODE]

The self test works!
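
For anyone repeating the Makefile edit for a different card, here is a minimal Python sketch that prints the [c]NVCCFLAGS[/c] lines in the same form as the Makefile excerpt above. The helper names ([c]generate_code_flag[/c], [c]nvccflags_lines[/c]) are made up for illustration and are not part of mfaktc; whether a given compute capability is accepted still depends on the installed nvcc.

```python
def generate_code_flag(compute_capability: str) -> str:
    """Turn a compute capability such as "3.0" into an nvcc --generate-code option."""
    major, minor = compute_capability.split(".")
    tag = f"{int(major)}{int(minor)}"  # "3.0" -> "30"
    return f"--generate-code arch=compute_{tag},code=sm_{tag}"


def nvccflags_lines(capabilities):
    """Build one Makefile line per compute capability (hypothetical helper)."""
    return [f"NVCCFLAGS += {generate_code_flag(cc)}" for cc in capabilities]


if __name__ == "__main__":
    # The post above kept only the CC 3.0 line active.
    for line in nvccflags_lines(["3.0"]):
        print(line)
```

Only the lines you print this way need to stay uncommented in the Makefile; the rest can be commented out with [c]#[/c] as in the post.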

ixfd64 2019-12-29 19:07

2 Attachment(s)
I noticed that compiling mfaktc for a system with a Tesla V100 and CUDA 10.x will result in a compilation error because the V100 only supports up to compute capability 7.0, unlike newer Volta cards. Running [c]make build=cuda100[/c] would result in an "Unsupported gpu architecture" error. Therefore, I updated my custom makefile to support CUDA 9.0 builds.
[LIST]
[*][c]make build=cuda65[/c] for CUDA 6.5 (supports compute capability 1.1 to 5.0)
[*][c]make build=cuda80[/c] for CUDA 8.0 (supports compute capability 2.0 to 6.1)
[*][c]make build=cuda90[/c] for CUDA 9.0 (supports compute capability 3.0 to 7.0)
[*][c]make build=cuda100[/c] for CUDA 10.0 (supports compute capability 3.5 to 7.5)
[/LIST]
I've also updated my script for launching multiple mfaktc instances. It is now more compact and uses fewer variables.
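
The build list above can be read as a lookup table. The following Python sketch only encodes the compute capability ranges exactly as listed in this post and reports which [c]build=[/c] targets cover a given card; the names [c]BUILD_RANGES[/c] and [c]builds_covering[/c] are illustrative, and the sketch says nothing about which CUDA toolkit is actually installed on the machine.

```python
# Compute capability ranges per build target, copied from the list in the post.
BUILD_RANGES = {
    "cuda65": ((1, 1), (5, 0)),
    "cuda80": ((2, 0), (6, 1)),
    "cuda90": ((3, 0), (7, 0)),
    "cuda100": ((3, 5), (7, 5)),
}


def builds_covering(cc):
    """Return the build targets whose listed range includes cc, given as (major, minor)."""
    return [name for name, (low, high) in BUILD_RANGES.items() if low <= cc <= high]


if __name__ == "__main__":
    print(builds_covering((7, 0)))  # compute capability 7.0, e.g. the Tesla V100
    print(builds_covering((1, 1)))  # oldest capability in the list
```

Tuples compare element by element, so (3, 5) sorts correctly between (3, 0) and (5, 0) without any string handling.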

tServo 2019-12-30 16:02

[QUOTE=ixfd64;533734]I noticed that compiling mfaktc for a system with a Tesla V100 and CUDA 10.x will result in a compilation error because the V100 only supports up to compute capability 7.0, unlike newer Volta cards.[/QUOTE]

FYI: There were only 3 Volta cards made: Titan V, Tesla V100, and Quadro GV100.
Since they all use the same GPU, they are all compute capability 7.0.

kriesel 2020-01-27 19:22

GTX480 to 1023Mib GPUSieveSize
 
[QUOTE=nomead;527423]the attached binaries are for 64-bit CUDA 6.5, compute versions 20 30 35 37 50 and 52.[/QUOTE]THANKS Sam! I finally got around to testing before deploying it today. Early results look good.[CODE]GTX 480, GPU clock 701Mhz, memory clock 924 Mhz

tune mfaktc.ini gpu sieve parameters in this order:
GPUSieveProcessSize
GPUSieveSize
GPUSievePrimes

Single application instance tests except where indicated dual, starting from
GPUSievePrimes=82486
GPUSieveSize=64
GPUSieveProcessSize=16
~330 GhzD/day, tune Mfaktc v0.21 Feb 5 2015 mfaktc-win-64.exe (cuda 6.5)

GPUSieveProcessSize=32 214.5
GPUSieveProcessSize=24 317.94
GPUSieveProcessSize=16 328.10 *
GPUSieveProcessSize=8 318.18

GPUSieveSize=128 329.85 *
GPUSieveSize=64 318.55
GPUSieveSize=96 327.13
GPUSieveSize=32 299.28
GPUSieveSize=16 267.98 (82% gpu load)

GPUSieveProcessSize=16, GPUSieveSize=128
GPUSievePrimes=90000 340.8
GPUSievePrimes=100000 339.80
GPUSievePrimes=80000 340.97
GPUSievePrimes=110000 338.37
GPUSievePrimes=85000 340.77
GPUSievePrimes=82000 341.08 * ~97% gpu load

2 instances 178+177.34 = [B]355.34[/B] (GPU load 99%)

1 instance, GPUSievePrimes=82000, GPUSieveProcessSize=16, nomead mfaktc-more-cuda65-64
GPUSieveSize=128 346.27
GPUSieveSize=256 352.25
GPUSieveSize=512 356.48
GPUSieveSize=1023 [B]357.98[/B] 99-100% gpu load on gpu-z
GPUSieveSize=1024 failed after 1 class, with error message:
ERROR: cudaGetLastError() returned 9: invalid configuration argument
GPUSieveSize=2047 not attempted

2 instances GPUSieveSize=1023, GPUSievePrimes=82000, GPUSieveProcessSize=16, nomead mfaktc-more-cuda65-64
100034929,75,76 first instance (also used in preceding single instance tests) 179.12
100108363,75,76 second instance (also used in preceding double instance tests) 182.77
combined throughput [B]361.89[/B] GhzD/day, consistent 100% gpu load in gpu-z
second instance raised throughput over single instance, 361.89/357.98 = 1.0109

Ratio between GPUSieveSize-optimized versions, 2 instances, "2047"/128 max, 361.89/355.34 = 1.0184
[/CODE]If it also adds 1.5-2% on my other old GPU models, it adds up to a nice little boost.
Perhaps the error relates to the CUDA 1D Linear Texture Size of 134217728?
(134217728 bytes = 1024 Mib.) The rating for the GTX480 at mersenne.ca is ~368 GhzD/day, but in practice it is a function of exponent and bit depth.
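
The unit conversion in the parenthetical can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming "Mib" here means mebibits (2^20 bits), which is what makes the number line up with the failing GPUSieveSize=1024.

```python
TEXTURE_LIMIT_BYTES = 134217728  # value quoted in the post (2**27)


def bytes_to_mebibits(n_bytes):
    """Convert bytes to mebibits: 8 bits per byte, 2**20 bits per mebibit."""
    return n_bytes * 8 / 2**20


def bytes_to_mebibytes(n_bytes):
    """Convert bytes to mebibytes (2**20 bytes each)."""
    return n_bytes / 2**20


if __name__ == "__main__":
    print(bytes_to_mebibits(TEXTURE_LIMIT_BYTES))   # 1024.0
    print(bytes_to_mebibytes(TEXTURE_LIMIT_BYTES))  # 128.0
```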

kriesel 2020-01-28 15:24

Quadro 5000 large GPUSieveSize tune
 
[QUOTE=kriesel;536057]If it also adds 1.5-2% on my other old GPU models, it adds up to a nice little boost.
Perhaps the error relates to the CUDA 1D Linear Texture Size of 134217728?
(134217728 bytes = 1024 Mib.) The rating for the GTX480 at mersenne.ca is ~368 GhzD/day, but in practice it is a function of exponent and bit depth.[/QUOTE]Yes, other old models benefit. No, it's not the 1D Linear Texture Size, since other models have different limits but the same value for that.
[CODE]Quadro 5000, GPU clock 513 Mhz, memory clock 750 Mhz

tune mfaktc.ini gpu sieve parameters in this order:
GPUSieveProcessSize
GPUSieveSize
GPUSievePrimes

starting from
GPUSievePrimes=82486
GPUSieveSize=64
GPUSieveProcessSize=16
184.23 GhzD/day, tune Mfaktc v0.21 Feb 5 2015 mfaktc-win-64.exe (cuda 6.5), 96-97% gpu load

Factor=97482377,74,75
GPUSieveProcessSize=32 119.69
GPUSieveProcessSize=24 184.07
GPUSieveProcessSize=16 184.75 *
GPUSieveProcessSize=8 178.41

Factor=100103767,75,76; GPUSieveProcessSize=16, GPUSievePrimes=82486
GPUSieveSize=128 188.19 (98% gpu load) *
GPUSieveSize=96 186.70 (98% gpu load)
GPUSieveSize=64 184.11 (97% gpu load)
GPUSieveSize=32 176.72 (93% gpu load)
GPUSieveSize=16 163.33 (88% gpu load)

GPUSieveProcessSize=16, GPUSieveSize=128
GPUSievePrimes=90000 188.32
GPUSievePrimes=100000 187.72
GPUSievePrimes=80000 188.73
GPUSievePrimes=85000 188.56
GPUSievePrimes=82000 188.38 ~98% gpu load
GPUSievePrimes=75000 188.64
GPUSievePrimes=78000 188.72
GPUSievePrimes=79000 188.80 *

2 instances 96.33 + 95.32 = 191.65 (GPU load 99-100%)

1 instance, GPUSievePrimes=79000, GPUSieveProcessSize=16, nomead mfaktc-more-cuda65-64
GPUSieveSize=128 189.25
GPUSieveSize=256 191.23 (99% gpu load)
GPUSieveSize=512 192.00
GPUSieveSize=1023 192.75 100% gpu load on gpu-z *
GPUSieveSize=1024 error
GPUSieveSize=2047 not tried

2 instances GPUSieveSize=1023, GPUSievePrimes=79000, GPUSieveProcessSize=16, nomead mfaktc-more-cuda65-64
97482377,74,75 first instance (also used in preceding tests) 99.12
100124191,73,74 second instance (also used in preceding tests) 95.01
combined throughput 194.13 GhzD/day, consistent 100% gpu load in gpu-z, 194.13/192.75 = 1.0072 x single instance throughput

Ratio between GPUSieveSize-optimized versions, 2 instances, "2047"/128 max, = 194.13/191.65 = 1.0129

Overall gain, 194.13/184.23 = 1.0537.
[/CODE]
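
The combined-throughput and ratio figures at the end of the Quadro 5000 notes can be reproduced as follows. This is a minimal sketch using the GhzD/day numbers from the block above; the function names are hypothetical.

```python
def combined(*instances):
    """Sum the per-instance throughputs, rounded to two decimals like the notes."""
    return round(sum(instances), 2)


def ratio(new, old):
    """Throughput ratio new/old; 1.0 means no change."""
    return new / old


if __name__ == "__main__":
    two_instances = combined(99.12, 95.01)          # GPUSieveSize=1023 build, 2 instances
    print(two_instances)                            # 194.13
    print(f"{ratio(two_instances, 192.75):.4f}")    # vs. one instance: 1.0072
    print(f"{ratio(two_instances, 191.65):.4f}")    # vs. 2 instances at GPUSieveSize=128: 1.0129
    print(f"{ratio(two_instances, 184.23):.4f}")    # vs. untuned starting point: 1.0537
```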

kriesel 2020-01-28 15:27

Quadro 2000 large GPUSieveSize tune
 
[CODE]tune mfaktc.ini gpu sieve parameters in this order:
GPUSieveProcessSize
GPUSieveSize
GPUSievePrimes

Q2000 gpu clock 625.9 Mhz, memory 652 Mhz

(GPUSieveSize=64, GPUSievePrimes=82486)
GPUSieveProcessSize=16 86.794
GPUSieveProcessSize=24 86.77
GPUSieveProcessSize=32 46.3
GPUSieveProcessSize=8 87.19 *

GPUSieveSize=128 87.56 *
GPUSieveSize=96 87.42
GPUSieveSize=32 86.34
GPUSieveSize=16 84.95

GPUSievePrimes=100000 87.75 *
GPUSievePrimes=120000 87.70
GPUSievePrimes=110000 87.72
GPUSievePrimes=95000 87.698
GPUSievePrimes=102000 87.75

above with cuda6.5win32 2015 executable: 88.09 *


GPUSieveProcessSize=8
GPUSieveSize=128
GPUSievePrimes=100000

mfaktc-more-cuda65-64 from nomead October 2019 post:
GPUSieveSize=128 88.44
GPUSieveSize=256 88.67
GPUSieveSize=511 88.79 *
GPUSieveSize=512 fail
higher values not tried


advantage to increased gpusievesize 88.79/88.09 =~ 1.0079
[/CODE]
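
The tuning procedure in these notes is a one-parameter-at-a-time sweep: vary one setting, keep the best value, then move on to the next in the stated order. Below is a minimal Python sketch of that selection step using the Quadro 2000 figures from the block above. The names [c]SWEEPS[/c], [c]best_setting[/c] and [c]tune[/c] are illustrative; the sketch does not run mfaktc, it only picks the maximum from numbers already measured.

```python
# Measured GhzD/day per setting, in the tuning order given in the notes.
SWEEPS = [
    ("GPUSieveProcessSize", {16: 86.794, 24: 86.77, 32: 46.3, 8: 87.19}),
    ("GPUSieveSize", {128: 87.56, 96: 87.42, 32: 86.34, 16: 84.95}),
    ("GPUSievePrimes", {100000: 87.75, 120000: 87.70, 110000: 87.72,
                        95000: 87.698, 102000: 87.75}),
]


def best_setting(results):
    """Return the setting with the highest throughput (first one wins a tie)."""
    return max(results, key=results.get)


def tune(sweeps):
    """Pick the best value for each parameter, one sweep at a time."""
    return {name: best_setting(results) for name, results in sweeps}


if __name__ == "__main__":
    for name, value in tune(SWEEPS).items():
        print(f"{name}={value}")
```

The picks match the starred entries in the notes; the GPUSievePrimes tie between 100000 and 102000 resolves to the first value listed.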

kriesel 2020-01-28 17:43

[QUOTE=petrw1;526915]I've only tried it with P-1 on the CPU.
I don't have the tools (maybe the mental tools) to recompile mfaktc.

Larger gpuseiveprimes or gpusievprocesssize seem to have negligible impact for me.

Thanks[/QUOTE]Spellcheck. It will probably [B]ignoar misplled keywrds[/B], and use the defaults instead.
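
A quick way to catch such typos before launching is to compare the keys in the ini file against the known names. The Python sketch below does that for the three sieve keywords discussed in this thread, using the standard library's [c]difflib[/c] to suggest the nearest correct spelling. The exact-match comparison is an assumption: whether mfaktc itself treats keywords case-sensitively is not stated here, and [c]check_key[/c] is a hypothetical helper.

```python
import difflib

# The three GPU sieve keywords tuned in this thread.
KNOWN_KEYS = ["GPUSievePrimes", "GPUSieveSize", "GPUSieveProcessSize"]


def check_key(key):
    """Return (is_known, suggestion) for one ini keyword."""
    if key in KNOWN_KEYS:
        return True, None
    # Compare in lowercase so a suggestion is found regardless of capitalization.
    lowered = {known.lower(): known for known in KNOWN_KEYS}
    close = difflib.get_close_matches(key.lower(), list(lowered), n=1, cutoff=0.8)
    return False, (lowered[close[0]] if close else None)


if __name__ == "__main__":
    for key in ["GPUSieveSize", "gpuseiveprimes", "gpusievprocesssize"]:
        known, suggestion = check_key(key)
        if known:
            print(f"{key}: ok")
        else:
            print(f"{key}: not recognized, did you mean {suggestion}?")
```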

kriesel 2020-01-28 17:45

mfaktc retune with 2047-capable mfaktc on GTX1060
 
[CODE]gtx1060 mfaktc tune with nomead's CUDA8 2047-capable executable
on Win7 X64 Pro

GPUSievePrimes=82486
GPUSieveSize=128
GPUSieveProcessSize=32 503 GhzD/day (32=max; increments of 8) *
GPUSieveProcessSize=24 492.7
GPUSieveProcessSize=16 492.7
GPUSieveProcessSize=8 489

GPUSieveProcessSize=32 503; GPUSievePrimes=82486
GPUSieveSize=128 503.5
GPUSieveSize=64 493.1
GPUSieveSize=256 509.46
GPUSieveSize=512 511.71
GPUSieveSize=1024 513.2
GPUSieveSize=2047 513.64 *

GPUSieveProcessSize=32, GPUSieveSize=2047
GPUSievePrimes=82486 513.64
GPUSievePrimes=90000 514.64
GPUSievePrimes=100000 515.2
GPUSievePrimes=110000 515.32
GPUSievePrimes=120000 514.71
GPUSievePrimes=115000 515.01
GPUSievePrimes=106000 515.63 *

2 instances: 258.04+258.03 = 516.07
516.07/515.63-1 = .085% gain from two instances over one
There is an additional gain if shutting one down for some sort of software maintenance;
the other uses the full gpu while one is stopped, so no productive time lost.
[/CODE]

kriesel 2020-01-29 02:00

Quadro 4000 with 2047-Mib GPUSieveSize mfaktc
 
[CODE]Q4000 mfaktc tuning

1) 2015 mfaktc v0.21
GPUSievePrimes=82486 (0-1075000)
GPUSieveSize=64 (4-128)
varying GPUSieveProcessSize (values required to be multiples of 8)
GPUSieveProcessSize=16 127.56 GhzD/day
GPUSieveProcessSize=24 127.8 *
GPUSieveProcessSize=32 82.24
GPUSieveProcessSize=8 127.69

GPUSieveProcessSize=24, GPUSievePrimes=82486, vary GPUSieveSize
GPUSieveSize=64 127.8
GPUSieveSize=32 125.89
GPUSieveSize=96 123.72
GPUSieveSize=128 128.9 *


GPUSieveProcessSize=24, GPUSievePrimes varied, GPUSieveSize=128

GPUSievePrimes=82486 128.9 *
GPUSievePrimes=90000 128.87
GPUSievePrimes=70000 128.36
GPUSievePrimes=100000 126.97
GPUSievePrimes=86000 127.86

2) mfaktc-more-cuda65-64.exe from nomead allowing 2047Mib GPUSieveSize:
GPUSievePrimes=82486, GPUSieveProcessSize=24
Factor=95389123,75,76
GPUSieveSize=96 124.16
GPUSieveSize=192 124.83
GPUSieveSize=384 125.17
GPUSieveSize=768 125.33
GPUSieveSize=1024 failed

GPUSievePrimes=82486, GPUSieveProcessSize=16
GPUSieveSize=16 122.82
GPUSieveSize=32 126.09
GPUSieveSize=64 128.12
GPUSieveSize=128 129.16
GPUSieveSize=256 129.71
GPUSieveSize=512 129.99
GPUSieveSize=1008 130.13 (221MiB used) *

advantage due to increased GPUSieveSize 130.13/128.9 = 1.0095

2 instances
Factor=95389123,75,76 64.35
Factor=100110187,75,76 65.78
total 130.13
Two-instance gain = none.
GPU ram occupancy 389 MiB
[/CODE]
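
The parameter limits scattered through these tuning notes (GPUSievePrimes 0-1075000, GPUSieveSize 4-128 for the 2015 v0.21 build, GPUSieveProcessSize in multiples of 8 up to 32) can be collected into a small pre-flight check. This is a sketch under those stated limits only; [c]validate[/c] and its [c]sieve_size_max[/c] parameter are hypothetical, and passing the check does not guarantee a run succeeds, since the notes show large GPUSieveSize values failing on some cards.

```python
# Limits as noted in the posts for the stock 2015 v0.21 build.
STOCK_LIMITS = {
    "GPUSievePrimes": (0, 1075000),
    "GPUSieveSize": (4, 128),
}
PROCESS_SIZE_STEP = 8   # "multiples of 8"
PROCESS_SIZE_MAX = 32   # "32=max"


def validate(settings, sieve_size_max=128):
    """Return a list of problems; an empty list means all three values are in range."""
    limits = dict(STOCK_LIMITS)
    limits["GPUSieveSize"] = (4, sieve_size_max)  # raise for the modified builds
    problems = []
    for key, (low, high) in limits.items():
        value = settings[key]
        if not low <= value <= high:
            problems.append(f"{key}={value} is outside {low}-{high}")
    process_size = settings["GPUSieveProcessSize"]
    if process_size % PROCESS_SIZE_STEP or not 0 < process_size <= PROCESS_SIZE_MAX:
        problems.append(
            f"GPUSieveProcessSize={process_size} must be a multiple of "
            f"{PROCESS_SIZE_STEP}, at most {PROCESS_SIZE_MAX}"
        )
    return problems


if __name__ == "__main__":
    tuned = {"GPUSievePrimes": 82486, "GPUSieveSize": 1008, "GPUSieveProcessSize": 16}
    print(validate(tuned))                       # flagged under the stock limit of 128
    print(validate(tuned, sieve_size_max=2047))  # accepted with the raised limit
```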


All times are UTC.