[QUOTE=storm5510;532459]"Make sure "non-free" is ticked in the aptitude..." I suppose this is something I will see along the line? If so, then this looks very simple. I didn't realize a compiler was incorporated into Linux, regardless of source. Mine is [I]Ubuntu[/I].[/QUOTE]
It'll be in the software center -- possibly called Synaptic, not Aptitude -- under repositories. It is crucial that it is ticked and that you afterwards run [c]sudo apt-get update[/c]
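A quick way to see whether the non-free component is already enabled is to look at the apt sources line. This is only a sketch with Debian-style naming; the sample sources line (including the release name) is inlined here as an assumption so the snippet runs anywhere, whereas on a real system you would grep /etc/apt/sources.list instead.

```shell
# Sketch: does an apt sources line already include the "non-free"
# component? (Sample line inlined; check /etc/apt/sources.list for real.)
line='deb http://deb.debian.org/debian bullseye main contrib non-free'
case "$line" in
  *non-free*) echo "non-free enabled; now run: sudo apt-get update" ;;
  *)          echo "add non-free to the sources line first" ;;
esac
```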
Aptitude was not installed on my system, so I ran the Nvidia update from a terminal window. There did not seem to be any problems.
I ran "make" and the last line said, "unsupported hardware" and ended there. So I went back to my original [I]mfaktc[/I] folder, tried the one there, and it runs. The Nvidia update gave it what it needed. I ran a self-test and everything was successful. For James Heinrich's project, there does not seem to be a "less classes" version for Linux.
[QUOTE=storm5510;532511]Aptitude was not installed on my system, so I ran the Nvidia update from a terminal window. There did not seem to be any problems.
I ran "make" and the last line said, "unsupported hardware" and ended there. So I went back to my original [I]mfaktc[/I] folder, tried the one there, and it runs. The Nvidia update gave it what it needed. I ran a self-test and everything was successful. For James Heinrich's project, there does not seem to be a "less classes" version for Linux.[/QUOTE]You can edit the [c]Makefile[/c] in the "src" directory. What worked for me was this:
[CODE]# Compiler settings for .cu files (CPU/GPU)
NVCC = nvcc [COLOR="Red"]-ccbin clang-3.8[/COLOR]
NVCCFLAGS = $(CUDA_INCLUDE) --ptxas-options=-v
[/CODE]and
[CODE]# generate code for various compute capabilities
[COLOR="red"]#[/COLOR]NVCCFLAGS += --generate-code arch=compute_11,code=sm_11 # CC 1.1, 1.2 and 1.3 GPUs will use this code (1.0 is not possible for mfaktc)
[COLOR="red"]#[/COLOR]NVCCFLAGS += --generate-code arch=compute_20,code=sm_20 # CC 2.x GPUs will use this code, one code fits all!
NVCCFLAGS += --generate-code arch=compute_30,code=sm_30 # all CC 3.x GPUs _COULD_ use this code
[COLOR="red"]#[/COLOR]NVCCFLAGS += --generate-code arch=compute_35,code=sm_35 # but CC 3.5 (3.2?) _CAN_ use funnel shift which is useful for mfaktc
[COLOR="red"]#[/COLOR]NVCCFLAGS += --generate-code arch=compute_50,code=sm_50 # CC 5.x GPUs will use this code
[/CODE]Then I ran [c]make[/c] and it works on my...
[C]lspci | grep VGA[/C]
[CODE]01:00.0 VGA compatible controller: NVIDIA Corporation GK208 [GeForce GT 710B] (rev a1)
[/CODE]The self test works!
I noticed that compiling mfaktc for a system with a Tesla V100 and CUDA 10.x will result in compilation errors because the V100 only supports up to compute capability 7.0, unlike newer Volta cards. Running [c]make build=cuda100[/c] would result in an "Unsupported gpu architecture" error. Therefore, I updated my custom makefile to support CUDA 9.0 builds.
[LIST]
[*][c]make build=cuda65[/c] for CUDA 6.5 (supports compute capability 1.1 to 5.0)
[*][c]make build=cuda80[/c] for CUDA 8.0 (supports compute capability 2.0 to 6.1)
[*][c]make build=cuda90[/c] for CUDA 9.0 (supports compute capability 3.0 to 7.0)
[*][c]make build=cuda100[/c] for CUDA 10.0 (supports compute capability 3.5 to 7.5)
[/LIST]I've also updated my script for launching multiple mfaktc instances. It is now more compact and uses fewer variables.
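The mapping above can be sketched as a tiny helper. The function name is hypothetical, and the version strings are assumptions about how a CUDA toolkit version might be passed in; only the build targets themselves come from the list above.

```shell
# Hypothetical helper: map a CUDA toolkit version string to the
# matching mfaktc build target from the list above.
pick_build() {
  case "$1" in
    6.5)  echo "make build=cuda65"  ;;
    8.0)  echo "make build=cuda80"  ;;
    9.*)  echo "make build=cuda90"  ;;
    10.*) echo "make build=cuda100" ;;
    *)    echo "no build target for CUDA $1" >&2; return 1 ;;
  esac
}
pick_build 9.0    # prints: make build=cuda90
```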
[QUOTE=ixfd64;533734]I noticed that compiling mfaktc for a system with a Tesla V100 and CUDA 10.x will result in compilation errors because the V100 only supports up to compute capability 7.0, unlike newer Volta cards.[/QUOTE]
FYI: There were only 3 Volta cards made: Titan V, Tesla V100, and Quadro GV100. Since they all use the same GPU, they are all compute capability 7.0.
GTX 480 to 1023 Mib GPUSieveSize
[QUOTE=nomead;527423]the attached binaries are for 64-bit CUDA 6.5, compute versions 20 30 35 37 50 and 52.[/QUOTE]THANKS Sam! I finally got around to testing before deploying it today. Early results look good.[CODE]GTX 480, GPU clock 701 MHz, memory clock 924 MHz
tune mfaktc.ini gpu sieve parameters in this order:
  GPUSieveProcessSize, GPUSieveSize, GPUSievePrimes
Single application instance tests except where indicated dual,
starting from GPUSievePrimes=82486 GPUSieveSize=64 GPUSieveProcessSize=16
~330 GhzD/day, tune Mfaktc v0.21 Feb 5 2015 mfaktc-win-64.exe (cuda 6.5)

GPUSieveProcessSize=32   214.5
GPUSieveProcessSize=24   317.94
GPUSieveProcessSize=16   328.10 *
GPUSieveProcessSize=8    318.18

GPUSieveSize=128         329.85 *
GPUSieveSize=64          318.55
GPUSieveSize=96          327.13
GPUSieveSize=32          299.28
GPUSieveSize=16          267.98 (82% gpu load)

GPUSieveProcessSize=16, GPUSieveSize=128
GPUSievePrimes=90000     340.8
GPUSievePrimes=100000    339.80
GPUSievePrimes=80000     340.97
GPUSievePrimes=110000    338.37
GPUSievePrimes=85000     340.77
GPUSievePrimes=82000     341.08 *  ~97% gpu load

2 instances: 178 + 177.34 = [B]355.34[/B] (GPU load 99%)

1 instance, GPUSievePrimes=82000, GPUSieveProcessSize=16, nomead mfaktc-more-cuda65-64
GPUSieveSize=128         346.27
GPUSieveSize=256         352.25
GPUSieveSize=512         356.48
GPUSieveSize=1023        [B]357.98[/B]  99-100% gpu load on gpu-z
GPUSieveSize=1024        failed after 1 class, with error message:
                         ERROR: cudaGetLastError() returned 9: invalid configuration argument
GPUSieveSize=2047        not attempted

2 instances, GPUSieveSize=1023, GPUSievePrimes=82000, GPUSieveProcessSize=16, nomead mfaktc-more-cuda65-64
100034929,75,76 first instance (also used in preceding single instance tests)   179.12
100108363,75,76 second instance (also used in preceding double instance tests)  182.77
combined throughput [B]361.89[/B] GhzD/day, consistent 100% gpu load in gpu-z

second instance raised throughput over single instance: 361.89/357.98 = 1.0109
Ratio between GPUSieveSize-optimized versions, 2 instances, "2047"/128 max: 361.89/355.34 = 1.0184
[/CODE]If it also adds 1.5-2% on my other old gpu models, it adds up to a nice little boost. Perhaps the error relates to the CUDA 1D Linear Texture Size, 134217728? (134217728 bytes = 1024 Mib.) The rating for the GTX 480 at mersenne.ca is ~368 GhzD/day, but in practice it is a function of exponent and bit depth.
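The unit conversion behind that texture-size hypothesis is easy to check. Assuming GPUSieveSize is specified in Mib, i.e. units of 2^20 bits (which is what the post's figures imply), the failing 1024 setting lands exactly on the 134217728-byte figure while 1023 sits just under it:

```shell
# GPUSieveSize is in Mib (2^20 bits); divide by 8 to get bytes.
echo $((1024 * 1048576 / 8))   # 134217728 = the CUDA 1D linear texture size quoted above
echo $((1023 * 1048576 / 8))   # 134086656, just under that limit
```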
Quadro 5000 large GPUSieveSize tune
[QUOTE=kriesel;536057]If it also adds 1.5-2% on my other old gpu models, it adds up to a nice little boost.
Perhaps the error relates to the CUDA 1D Linear Texture Size, 134217728? (134217728 bytes = 1024 Mib.) The rating for the GTX 480 at mersenne.ca is ~368 GhzD/day, but in practice it is a function of exponent and bit depth.[/QUOTE]Yes, other old models benefit. No, it's not the 1D Linear Texture Size, since other models fail at different GPUSieveSize limits but have the same value for that.
[CODE]Quadro 5000, GPU clock 513 MHz, memory clock 750 MHz
tune mfaktc.ini gpu sieve parameters in this order:
  GPUSieveProcessSize, GPUSieveSize, GPUSievePrimes
starting from GPUSievePrimes=82486 GPUSieveSize=64 GPUSieveProcessSize=16
184.23 GhzD/day, tune Mfaktc v0.21 Feb 5 2015 mfaktc-win-64.exe (cuda 6.5), 96-97% gpu load
Factor=97482377,74,75

GPUSieveProcessSize=32   119.69
GPUSieveProcessSize=24   184.07
GPUSieveProcessSize=16   184.75 *
GPUSieveProcessSize=8    178.41

Factor=100103767,75,76; GPUSieveProcessSize=16, GPUSievePrimes=82486
GPUSieveSize=128         188.19 (98% gpu load) *
GPUSieveSize=96          186.70 (98% gpu load)
GPUSieveSize=64          184.11 (97% gpu load)
GPUSieveSize=32          176.72 (93% gpu load)
GPUSieveSize=16          163.33 (88% gpu load)

GPUSieveProcessSize=16, GPUSieveSize=128
GPUSievePrimes=90000     188.32
GPUSievePrimes=100000    187.72
GPUSievePrimes=80000     188.73
GPUSievePrimes=85000     188.56
GPUSievePrimes=82000     188.38  ~98% gpu load
GPUSievePrimes=75000     188.64
GPUSievePrimes=78000     188.72
GPUSievePrimes=79000     188.80 *

2 instances: 96.33 + 95.32 = 191.65 (GPU load 99-100%)

1 instance, GPUSievePrimes=79000, GPUSieveProcessSize=16, nomead mfaktc-more-cuda65-64
GPUSieveSize=128         189.25
GPUSieveSize=256         191.23 (99% gpu load)
GPUSieveSize=512         192.00
GPUSieveSize=1023        192.75  100% gpu load on gpu-z *
GPUSieveSize=1024        error
GPUSieveSize=2047        not tried

2 instances, GPUSieveSize=1023, GPUSievePrimes=79000, GPUSieveProcessSize=16, nomead mfaktc-more-cuda65-64
97482377,74,75 first instance (also used in preceding tests)    99.12
100124191,73,74 second instance (also used in preceding tests)  95.01
combined throughput 194.13 GhzD/day, consistent 100% gpu load in gpu-z
194.13/192.75 = 1.0072 x single instance throughput
Ratio between GPUSieveSize-optimized versions, 2 instances, "2047"/128 max: 194.13/191.65 = 1.0129
Overall gain: 194.13/184.23 = 1.0537
[/CODE]
Quadro 2000 large GPUSieveSize tune
[CODE]tune mfaktc.ini gpu sieve parameters in this order:
  GPUSieveProcessSize, GPUSieveSize, GPUSievePrimes
Q2000 gpu clock 625.9 MHz, memory 652 MHz

(GPUSieveSize=64, GPUSievePrimes=82486)
GPUSieveProcessSize=16   86.794
GPUSieveProcessSize=24   86.77
GPUSieveProcessSize=32   46.3
GPUSieveProcessSize=8    87.19 *

GPUSieveSize=128         87.56 *
GPUSieveSize=96          87.42
GPUSieveSize=32          86.34
GPUSieveSize=16          84.95

GPUSievePrimes=100000    87.75 *
GPUSievePrimes=120000    87.70
GPUSievePrimes=110000    87.72
GPUSievePrimes=95000     87.698
GPUSievePrimes=102000    87.75

above with cuda6.5win32 2015 executable: 88.09 *
GPUSieveProcessSize=8 GPUSieveSize=128 GPUSievePrimes=100000

mfaktc-more-cuda65-64 from nomead October 2019 post:
GPUSieveSize=128         88.44
GPUSieveSize=256         88.67
GPUSieveSize=511         88.79 *
GPUSieveSize=512         fail
higher values not tried
advantage to increased GPUSieveSize 88.79/88.09 =~ 1.0079
[/CODE]
[QUOTE=petrw1;526915]I've only tried it with P-1 on the CPU.
I don't have the tools (maybe the mental tools) to recompile mfaktc. Larger gpuseiveprimes or gpusievprocesssize seem to have negligible impact for me. Thanks[/QUOTE]Spellcheck. It will probably [B]ignoar misplled keywrds[/B], and use the defaults instead. |
mfaktc retune with 2047-capable mfaktc on GTX1060
[CODE]gtx1060 mfaktc tune with nomead's CUDA8 2047-capable executable on Win7 x64 Pro

GPUSievePrimes=82486, GPUSieveSize=128
GPUSieveProcessSize=32   503 GhzD/day (32=max; increments of 8) *
GPUSieveProcessSize=24   492.7
GPUSieveProcessSize=16   492.7
GPUSieveProcessSize=8    489

GPUSieveProcessSize=32, GPUSievePrimes=82486
GPUSieveSize=128         503.5
GPUSieveSize=64          493.1
GPUSieveSize=256         509.46
GPUSieveSize=512         511.71
GPUSieveSize=1024        513.2
GPUSieveSize=2047        513.64 *

GPUSieveProcessSize=32, GPUSieveSize=2047
GPUSievePrimes=82486     513.64
GPUSievePrimes=90000     514.64
GPUSievePrimes=100000    515.2
GPUSievePrimes=110000    515.32
GPUSievePrimes=120000    514.71
GPUSievePrimes=115000    515.01
GPUSievePrimes=106000    515.63 *

2 instances: 258.04 + 258.03 = 516.07
516.07/515.63 - 1 = 0.085% gain from two instances over one
There is an additional gain when shutting one down for some sort of software
maintenance; the other uses the full gpu while one is stopped, so no
productive time is lost.
[/CODE]
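The two-instance gain quoted in the log comes straight from the throughput ratio; a one-liner to reproduce it (awk used only because plain shell arithmetic is integer-only):

```shell
# Reproduce the gain figure: two instances vs. the best single-instance result.
awk 'BEGIN { printf "%.3f%%\n", (516.07/515.63 - 1) * 100 }'   # prints: 0.085%
```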
Quadro 4000 with 2047-Mib GPUSieveSize mfaktc
[CODE]Q4000 mfaktc tuning

1) 2015 mfaktc v0.21
GPUSievePrimes=82486 (0-1075000), GPUSieveSize=64 (4-128),
varying GPUSieveProcessSize (value required to be a multiple of 8)
GPUSieveProcessSize=16   127.56 GhzD/day
GPUSieveProcessSize=24   127.8 *
GPUSieveProcessSize=32   82.24
GPUSieveProcessSize=8    127.69

GPUSieveProcessSize=24, GPUSievePrimes=82486, vary GPUSieveSize
GPUSieveSize=64          127.8
GPUSieveSize=32          125.89
GPUSieveSize=96          123.72
GPUSieveSize=128         128.9 *

GPUSieveProcessSize=24, GPUSievePrimes varied, GPUSieveSize=128
GPUSievePrimes=82486     128.9 *
GPUSievePrimes=90000     128.87
GPUSievePrimes=70000     128.36
GPUSievePrimes=100000    126.97
GPUSievePrimes=86000     127.86

2) mfaktc-more-cuda65-64.exe from nomead allowing 2047 Mib GPUSieveSize:
GPUSievePrimes=82486, GPUSieveProcessSize=24
Factor=95389123,75,76
GPUSieveSize=96          124.16
GPUSieveSize=192         124.83
GPUSieveSize=384         125.17
GPUSieveSize=768         125.33
GPUSieveSize=1024        failed

GPUSievePrimes=82486, GPUSieveProcessSize=16
GPUSieveSize=16          122.82
GPUSieveSize=32          126.09
GPUSieveSize=64          128.12
GPUSieveSize=128         129.16
GPUSieveSize=256         129.71
GPUSieveSize=512         129.99
GPUSieveSize=1008        130.13 (221 MiB used) *
advantage due to increased GPUSieveSize 130.13/128.9 = 1.0095

2 instances
Factor=95389123,75,76    64.35
Factor=100110187,75,76   65.78
total                    130.13
Two-instance gain = none. GPU ram occupancy 389 MiB
[/CODE]