mersenneforum.org  

Old 2022-10-14, 21:02   #3543
TheJudger
 
 
"Oliver"
Mar 2005
Germany

5·223 Posts

I guess you're almost expecting this:

It is exactly the same binary as used nearly two years ago for the RTX 3090.


Code:
mfaktc v0.22-pre8 (64bit built)
[...]
CUDA version info
  binary compiled for CUDA  11.10
  CUDA runtime version      11.10
  CUDA driver version       11.80

CUDA device info
  name                      NVIDIA GeForce RTX 4090
  compute capability        8.9
  max threads per block     1024
  max shared memory per MP  102400 byte
  number of multiprocessors 128
  clock rate (CUDA cores)   2520MHz
  memory clock rate:        10501MHz
  memory bus width:         384 bit
[...]
Starting trial factoring M66362159 from 2^74 to 2^75 (57.65 GHz-days)
 k_min =  142321062303420
 k_max =  284642124610180
Using GPU kernel "barrett76_mul32_gs"
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Oct 14 22:48 |    0   0.1% |  0.352   5m38s |   14741.08    82485    n.a.%
Oct 14 22:48 |    4   0.2% |  0.349   5m34s |   14867.79    82485    n.a.%
Oct 14 22:48 |    9   0.3% |  0.351   5m36s |   14783.07    82485    n.a.%
[...]
Oct 14 22:53 | 4617 100.0% |  0.354   0m00s |   14657.79    82485    n.a.%
no factor for M66362159 from 2^74 to 2^75 [mfaktc 0.22-pre8 barrett76_mul32_gs CUDA 11.10 arch 8.0] B29A657C
tf(): total time spent:  5m 39.575s
Looks like I have to rethink my default benchmark; the time per class is getting low.
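As a rough cross-check of the log above, the throughput column can be reproduced from the time per class. This is only a sketch: it assumes 960 classes are actually tested per bit level (the usual count when mfaktc works with 4620 classes; the number is not stated in the post), and the function name is made up for illustration.

```python
# Cross-check of the RTX 4090 log: GHz-d/day derived from the time per class.
# ASSUMPTION: 960 classes are tested per bit level (not stated in the post).
CLASSES_TESTED = 960
SECONDS_PER_DAY = 86400

def ghz_days_per_day(work_ghz_days, seconds_per_class, classes=CLASSES_TESTED):
    """Throughput implied by a steady time per class (hypothetical helper)."""
    total_seconds = seconds_per_class * classes
    return work_ghz_days * SECONDS_PER_DAY / total_seconds

total_seconds = 0.352 * CLASSES_TESTED      # first log line: 0.352 s per class
stock = ghz_days_per_day(57.65, 0.352)      # assignment is worth 57.65 GHz-days
print(f"estimated run time: {total_seconds:.0f} s")
print(f"estimated throughput: {stock:.0f} GHz-d/day")
```

Under that assumption the estimate lands within a fraction of a percent of the logged 14741.08 GHz-d/day, and the estimated run time of about 338 seconds agrees with the 5m38s ETA on the first log line.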
Power consumption is solid at 440+ Watts (using the default 450 Watt power target) while the GPU clock is around 2655 to 2670 MHz on this specific GPU. Lowering the power target to 300 Watts still yields ~87% of the stock performance for mfaktc!
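To put the power-target remark into numbers, here is a small efficiency estimate. It is a sketch under a loud assumption: the card is taken to draw exactly 440 W at stock and exactly 300 W when capped, which the post does not measure directly.

```python
# Performance per watt for the RTX 4090 run, stock versus 300 W power target.
# ASSUMPTION: actual draw equals 440 W at stock and 300 W when capped.
STOCK_GHZD_PER_DAY = 14741.08   # first line of the benchmark log
STOCK_WATTS = 440
CAPPED_WATTS = 300
CAPPED_FRACTION = 0.87          # "~87% of the stock performance"

stock_eff = STOCK_GHZD_PER_DAY / STOCK_WATTS
capped_eff = CAPPED_FRACTION * STOCK_GHZD_PER_DAY / CAPPED_WATTS
gain = capped_eff / stock_eff - 1

print(f"stock : {stock_eff:.1f} GHz-d/day per watt")
print(f"capped: {capped_eff:.1f} GHz-d/day per watt")
print(f"efficiency gain: {gain:.0%}")
```

With these inputs the capped setting comes out a little over a quarter more efficient per watt, at the cost of about 13% throughput.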

Fun fact: Not sure if this is intended by Nvidia but it looks like it is possible to turn on ECC for the GPU memory! Might be a hidden gem for GPU Owl!

Oliver
Old 2022-10-14, 23:57   #3544
James Heinrich
 
 
"James Heinrich"
May 2004
ex-Northern Ontario

7×13×47 Posts

Mfaktx benchmark page updated; thanks to Oliver for the benchmark data:
https://www.mersenne.ca/mfaktc.php
Old 2022-10-15, 02:44   #3545
xx005fs
 
"Eric"
Jan 2018
USA

223 Posts

Quote:
Fun fact: Not sure if this is intended by Nvidia but it looks like it is possible to turn on ECC for the GPU memory! Might be a hidden gem for GPU Owl!
Very much excited for that! Though I doubt it'll be much faster than the 6950xt.
Old 2022-10-15, 03:50   #3546
axn
 
 
Jun 2003

23×683 Posts

Quote:
Originally Posted by xx005fs View Post
Very much excited for that! Though I doubt it'll be much faster than the 6950xt.
A 6950xt is 3x faster compared to 3090. 4090 has a little over 2x the DP FLOPS of a 3090, but only about 10% more memory bandwidth. Best case, it will be 2x faster than 3090, so still 1.5x slower than 6950xt. Worst case, it is only 1.2x faster than 3090, so slower than 6950xt by 2x.
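The comparison can be written out as simple ratios. This sketch only restates the post's own estimates (none of these are measurements), normalised to the RTX 3090; the helper name is made up for illustration. Note that the worst case works out to 2.5x, which the post rounds down to roughly 2x.

```python
# Relative PRP throughput, normalised to the RTX 3090 = 1.0.
# All inputs are the post's estimates, not measurements.
RTX_3090 = 1.0
RX_6950XT = 3.0 * RTX_3090      # "A 6950xt is 3x faster compared to 3090"

def slowdown_vs_6950xt(speedup_over_3090):
    """How many times slower than the 6950 XT a card with this speedup is."""
    return RX_6950XT / (speedup_over_3090 * RTX_3090)

best = slowdown_vs_6950xt(2.0)   # limited by DP FLOPS: about 2x a 3090
worst = slowdown_vs_6950xt(1.2)  # the post's pessimistic case: 1.2x a 3090

print(f"best case : {best:.2f}x slower than the 6950 XT")
print(f"worst case: {worst:.2f}x slower than the 6950 XT")
```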
Old 2022-10-15, 04:00   #3547
LaurV
Romulan Interpreter
 
 
"name field"
Jun 2011
Thailand

41·251 Posts

Quote:
Originally Posted by TheJudger View Post
Fun fact: Not sure if this is intended by Nvidia but it looks like it is possible to turn on ECC for the GPU memory! Might be a hidden gem for GPU Owl!
With the lousy* FP64 performance? I don't think so...
Code:
FP32 (float) performance : 82.58 TFLOPS
FP64 (double) performance : 1,290 GFLOPS (1:64)
This is a TF/gaming/mining card. The current GEC (and certs) do quite a good job, so we may not need ECC for a while, hihihi. With hardware getting faster and faster, too, the work lost to errors won't be much.
By comparison, this is the good old Radeon VII:
Code:
FP32 (float) performance : 13.44 TFLOPS
FP64 (double) performance : 3.360 TFLOPS (1:4)
and these are the V100 and the A100, respectively:
Code:
FP32 (float) performance : 15.67 TFLOPS
FP64 (double) performance : 7.834 TFLOPS (1:2)
Code:
FP32 (float) performance : 19.49 TFLOPS
FP64 (double) performance : 9.746 TFLOPS (1:2)
----------
* compared with FP32, not with other older cards, which are still a lot slower, hehe
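The ratios in parentheses can be re-derived from the quoted peak figures. This is just a sanity check of the numbers in the post; the dictionary layout and function name are illustrative, and the values are vendor peak ratings rather than measured PRP throughput.

```python
# FP32 and FP64 peak throughput in TFLOPS, as quoted in the post.
cards = {
    "RTX 4090":   (82.58, 1.290),   # 1,290 GFLOPS = 1.290 TFLOPS
    "Radeon VII": (13.44, 3.360),
    "V100":       (15.67, 7.834),
    "A100":       (19.49, 9.746),
}

def fp32_to_fp64_ratio(fp32_tflops, fp64_tflops):
    """How many FP32 operations the card can do per FP64 operation."""
    return fp32_tflops / fp64_tflops

ratios = {name: round(fp32_to_fp64_ratio(*spec)) for name, spec in cards.items()}
for name, (fp32, fp64) in cards.items():
    print(f"{name:10s} FP64 {fp64:6.3f} TFLOPS  ratio 1:{ratios[name]}")
```

In absolute FP64 terms the RTX 4090 still sits well below the other three cards listed, which is the point being made about its usefulness for GPU Owl.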
Old 2022-10-15, 09:33   #3548
tuckerkao
 
"Tucker Kao"
Jan 2020
Head Base M168202123

370₁₆ Posts

Quote:
Originally Posted by axn View Post
A 6950xt is 3x faster compared to 3090. 4090 has a little over 2x the DP FLOPS of a 3090, but only about 10% more memory bandwidth. Best case, it will be 2x faster than 3090, so still 1.5x slower than 6950xt. Worst case, it is only 1.2x faster than 3090, so slower than 6950xt by 2x.
The AMD Radeon 7900 XT is going to be launched on Nov 3, 2022, and it should be in stock on most shelves by Dec 2022. I would like to know its PRP stats on GpuOwl.
Old 2022-11-03, 04:34   #3549
AlvinBunk
 
Dec 2017

9₁₀ Posts
fatal error C1021: invalid preprocessor command 'include_next'

Quote:
Originally Posted by TheJudger View Post
Hello!


  1. Installed Visual Studio 2017.8 "Community"
  2. Installed CUDA Toolkit 10 for Windows
  3. Installed MinGW as one of many options for GNU Make on Windows. In the MinGW folder I've copied bin/mingw32-make.exe to bin/make.exe because I'm lazy. Careful when updating mingw32-make.exe...
  4. Configure Environment for "x64 Native Tools Command Prompt" - add MinGW/bin and CUDA/bin to the PATH variable.
Oliver
I'm using a similar MAKE file to the above and am getting the following error when building in VS 2022 Professional:

C:\Mfaktc_Source\src>make -f Makefile.Test
cl /Ox /Oy /GL /W2 /fp:fast /I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8"\include /I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8"\include\cuda\std\detail\libcxx\include /nologo /c /Tp sieve.c
sieve.c
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include\cuda\std\detail\libcxx\include\stdio.h(107): fatal error C1021: invalid preprocessor command 'include_next'
make: *** [sieve.obj] Error 2


I have an RTX 3060 and an RTX 2060 with CUDA Toolkit 11.8 installed.
I suspect the Makefile needs to be modified, but I have no clue. I saw this post, and maybe someone can understand it fully:
https://forums.developer.nvidia.com/...da-code/120175
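One possible reading of the error: MSVC's cl does not implement the GCC-style `#include_next` directive, and the failing stdio.h lives under the second `/I` directory on the command line, so a plain `#include <stdio.h>` in sieve.c may be picking up that wrapper header instead of the system one. The sketch below only extracts the include directories from the command shown above and flags the one under `libcxx`; the helper names are made up for illustration, and dropping that `/I` entry from the Makefile is an inference from the log, not a fix confirmed in the thread.

```python
import re

# Compiler command from the post (raw strings keep the backslashes intact).
CMD = (
    r'cl /Ox /Oy /GL /W2 /fp:fast '
    r'/I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8"\include '
    r'/I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8"\include\cuda\std\detail\libcxx\include '
    r'/nologo /c /Tp sieve.c'
)

def include_dirs(cmdline):
    """Return the /I include directories from a cl command line."""
    # A token is a run of non-space characters and/or double-quoted chunks.
    tokens = re.findall(r'(?:[^\s"]|"[^"]*")+', cmdline)
    return [t[2:].replace('"', '') for t in tokens if t.startswith('/I')]

def suspect_dirs(cmdline):
    """Include directories that point inside the bundled libcxx headers."""
    return [d for d in include_dirs(cmdline) if 'libcxx' in d.lower()]

for d in suspect_dirs(CMD):
    print("candidate /I entry to remove:", d)
```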

Last fiddled with by Uncwilly on 2022-11-03 at 05:33 Reason: trimmed quote
Old 2022-11-03, 19:19   #3550
AlvinBunk
 
Dec 2017

3² Posts
re: invalid preprocessor command 'include_next'

Quote:
Originally Posted by AlvinBunk View Post
I'm using a similar MAKE file to the above and am getting the following error when building in VS 2022 Professional:
Attached is a copy of the Makefile I used for reference...
Makefile.txt

Last fiddled with by Uncwilly on 2022-11-03 at 20:13 Reason: trimmed excessive quote
Old 2022-11-03, 19:50   #3551
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1E90₁₆ Posts

RTX 3060 compatible builds exist and are available for downloading. Why build your own?
RTX 3060 is CC 8.6. You'll want to add a flags line for cc 8.6
https://www.nvidia.com/en-us/geforce...x-3060-3060ti/
RTX2080 is CC 7.5. You'll want to uncomment that flags line.
https://en.wikipedia.org/wiki/CUDA

Last fiddled with by kriesel on 2022-11-03 at 19:52
Old 2022-11-05, 02:11   #3552
AlvinBunk
 
Dec 2017

3² Posts

Quote:
Originally Posted by kriesel View Post
RTX 3060 compatible builds exist and are available for downloading. Why build your own?
RTX 3060 is CC 8.6. You'll want to add a flags line for cc 8.6
https://www.nvidia.com/en-us/geforce...x-3060-3060ti/
RTX2080 is CC 7.5. You'll want to uncomment that flags line.
https://en.wikipedia.org/wiki/CUDA
Okay, previously I tried a number of zip files and never got it working. I downloaded "mfaktc-0.21.win_cuda80-2047.zip" plus the dll, and get this output (invalid device function):

mfaktc v0.21 (64bit built)

Compiletime options
THREADS_PER_BLOCK 256
SIEVE_SIZE_LIMIT 32kiB
SIEVE_SIZE 230945bits
SIEVE_SPLIT 250
MORE_CLASSES disabled

Runtime options
SievePrimes 36000
SievePrimesAdjust 1
SievePrimesMin 5000
SievePrimesMax 100000
NumStreams 6
CPUStreams 4
GridSize 3
GPU Sieving enabled
GPUSievePrimes 128000
GPUSieveSize 96Mi bits
GPUSieveProcessSize 24Ki bits
Checkpoints enabled
CheckpointDelay 30s
WorkFileAddDelay 600s
Stages enabled
StopAfterFactor bitlevel
PrintMode compact
V5UserID AlvinBunk
ComputerID mfaktc
AllowSleep no
TimeStampInResults no

CUDA version info
binary compiled for CUDA 8.0
CUDA runtime version 8.0
CUDA driver version 12.0

CUDA device info
name NVIDIA GeForce RTX 3060
compute capability 8.6
max threads per block 1024
max shared memory per MP 102400 byte
number of multiprocessors 28
clock rate (CUDA cores) 1777MHz
memory clock rate: 7501MHz
memory bus width: 192 bit

Automatic parameters
threads per grid 917504
GPUSievePrimes (adjusted) 128566
GPUsieve minimum exponent 1706180

running a simple selftest...
ERROR: cudaGetLastError() returned 8: invalid device function



NOTE: My RTX 2060 (on the same host) runs version "mfaktc-0.21.win.cuda100" with no errors and the same "CUDA driver version".

Last fiddled with by AlvinBunk on 2022-11-05 at 02:26
Old 2022-11-05, 05:41   #3553
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2⁴·3·163 Posts

CUDA SDK 8 is required for GTX10xx. You need higher for RTX20xx, higher still for RTX30xx. (Assuming PTX is not there.) Read the last link I posted. SDK level != CC level.
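The rule behind this can be sketched as follows, assuming (as stated) that no PTX is embedded: native GPU code built for compute capability X.y runs only on devices of capability X.z with z ≥ y. The function names are made up for illustration, and the list of architectures in the CUDA 8.0 build is an assumption (that toolkit predates CC 7.x and 8.x); only "arch 8.0" is taken from the RTX 4090 log earlier in the thread.

```python
def cubin_runs_on(cubin_cc, device_cc):
    """True if native code built for cubin_cc can run on device_cc (no PTX)."""
    same_major = cubin_cc[0] == device_cc[0]
    return same_major and cubin_cc[1] <= device_cc[1]

def binary_runs_on(cubin_ccs, device_cc):
    """True if any architecture compiled into the binary matches the device."""
    return any(cubin_runs_on(cc, device_cc) for cc in cubin_ccs)

# From the RTX 4090 log: the binary reports "arch 8.0" on a CC 8.9 device.
print("arch 8.0 build on RTX 4090 (CC 8.9):", binary_runs_on([(8, 0)], (8, 9)))

# ASSUMPTION: a CUDA 8.0 build contains nothing newer than CC 6.x.
cuda80_build = [(5, 0), (6, 1)]
print("CUDA 8.0 build on RTX 3060 (CC 8.6):", binary_runs_on(cuda80_build, (8, 6)))
```

Under these assumptions the second case has no usable kernel image, which is consistent with the "invalid device function" error in the selftest.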

Last fiddled with by kriesel on 2022-11-05 at 05:43

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
