|
|
#3543 |
|
"Oliver"
Mar 2005
Germany
5·223 Posts |
I guess you're almost expecting this:
It is exactly the same binary as used nearly two years ago for the RTX 3090. Code:
mfaktc v0.22-pre8 (64bit built)
[...]
CUDA version info
  binary compiled for CUDA  11.10
  CUDA runtime version      11.10
  CUDA driver version       11.80
CUDA device info
  name                      NVIDIA GeForce RTX 4090
  compute capability        8.9
  max threads per block     1024
  max shared memory per MP  102400 byte
  number of multiprocessors 128
  clock rate (CUDA cores)   2520MHz
  memory clock rate:        10501MHz
  memory bus width:         384 bit
[...]
Starting trial factoring M66362159 from 2^74 to 2^75 (57.65 GHz-days)
 k_min = 142321062303420
 k_max = 284642124610180
Using GPU kernel "barrett76_mul32_gs"
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Oct 14 22:48 |    0   0.1% |  0.352   5m38s |  14741.08    82485    n.a.%
Oct 14 22:48 |    4   0.2% |  0.349   5m34s |  14867.79    82485    n.a.%
Oct 14 22:48 |    9   0.3% |  0.351   5m36s |  14783.07    82485    n.a.%
[...]
Oct 14 22:53 | 4617 100.0% |  0.354   0m00s |  14657.79    82485    n.a.%
no factor for M66362159 from 2^74 to 2^75 [mfaktc 0.22-pre8 barrett76_mul32_gs CUDA 11.10 arch 8.0] B29A657C
tf(): total time spent: 5m 39.575s

Power consumption is solid at 440+ Watts (using the default 450 Watts power target) while the GPU clock is around 2655 to 2670 MHz on this specific GPU. Lowering the power target to 300 Watts still yields ~87% of the stock performance for mfaktc!

Fun fact: Not sure if this is intended by Nvidia, but it looks like it is possible to turn on ECC for the GPU memory! Might be a hidden gem for GPU Owl!

Oliver |
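As a rough cross-check of the numbers in this post, the sketch below recomputes the average throughput from the assignment size and total run time, and the relative performance per watt at the lower power target. It assumes the power target is a fair proxy for actual draw (the post reports 440+ Watts at the 450 Watts target), and the function names are made up for illustration.

```python
# Figures copied from the mfaktc log and the post above.
WORK_GHZ_DAYS = 57.65            # credit for M66362159 from 2^74 to 2^75
TOTAL_SECONDS = 5 * 60 + 39.575  # "total time spent: 5m 39.575s"
SECONDS_PER_DAY = 86_400


def ghz_days_per_day(work_ghz_days: float, seconds: float) -> float:
    """Average throughput implied by finishing the work in the given time."""
    return work_ghz_days * SECONDS_PER_DAY / seconds


def relative_perf_per_watt(perf_fraction: float,
                           power_target_w: float,
                           stock_target_w: float) -> float:
    """Perf/W relative to stock, ASSUMING draw scales with the power target."""
    return perf_fraction * stock_target_w / power_target_w


average = ghz_days_per_day(WORK_GHZ_DAYS, TOTAL_SECONDS)
efficiency_gain = relative_perf_per_watt(0.87, 300.0, 450.0)

print(f"average throughput: {average:.0f} GHz-d/day")
print(f"perf per watt at 300 W vs 450 W target: {efficiency_gain:.3f}x")
```

The average lands inside the range of the per-class readings in the log, and under the stated assumption the 300 Watts setting is roughly 30% more efficient per watt than stock.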
|
|
|
|
|
#3544 |
|
"James Heinrich"
May 2004
ex-Northern Ontario
7×13×47 Posts |
Mfaktx benchmark page update, thanks Oliver for the benchmark data:
https://www.mersenne.ca/mfaktc.php |
|
|
|
|
|
#3545 | |
|
"Eric"
Jan 2018
USA
223 Posts |
Quote:
|
|
|
|
|
|
|
#3546 |
|
Jun 2003
23×683 Posts |
A 6950xt is 3x faster than a 3090. The 4090 has a little over 2x the DP FLOPS of a 3090, but only about 10% more memory bandwidth. Best case, it will be 2x faster than the 3090, so still 1.5x slower than the 6950xt. Worst case, it is only 1.2x faster than the 3090, so slower than the 6950xt by about 2.5x.
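Taking the speedups claimed in this post at face value, the gap between the 6950xt and the 4090 is just a ratio of the two figures. A small sketch (the constant and function names are illustrative, and the inputs are the post's estimates, not measurements):

```python
# Speedups relative to an RTX 3090 (= 1.0), as estimated in the post above.
RX_6950XT = 3.0
RTX_4090_BEST = 2.0
RTX_4090_WORST = 1.2


def times_slower(reference: float, other: float) -> float:
    """How many times slower `other` is than `reference`."""
    return reference / other


best_gap = times_slower(RX_6950XT, RTX_4090_BEST)
worst_gap = times_slower(RX_6950XT, RTX_4090_WORST)

print(f"best case:  4090 is {best_gap:.1f}x slower than the 6950xt")
print(f"worst case: 4090 is {worst_gap:.1f}x slower than the 6950xt")
```

With these inputs the best case gives 1.5x and the worst case comes out nearer 2.5x.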
|
|
|
|
|
|
#3547 | |
|
Romulan Interpreter
"name field"
Jun 2011
Thailand
41·251 Posts |
Quote:
FP64 performance? I don't think so...
Code:
FP32 (float) performance : 82.58 TFLOPS
FP64 (double) performance : 1,290 GFLOPS (1:64)
By comparison, this is the good old Radeon VII:
Code:
FP32 (float) performance : 13.44 TFLOPS
FP64 (double) performance : 3.360 TFLOPS (1:4)
Code:
FP32 (float) performance : 15.67 TFLOPS
FP64 (double) performance : 7.834 TFLOPS (1:2)
Code:
FP32 (float) performance : 19.49 TFLOPS
FP64 (double) performance : 9.746 TFLOPS (1:2)
* compared with FP32, not with other older cards, which are still a lot slower, hehe |
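The FP64 figures quoted above follow directly from the FP32 figure and the FP32:FP64 ratio. A minimal sketch, assuming the first block describes the RTX 4090 (the post does not label it explicitly); the dictionary and function names are illustrative:

```python
# (FP32 TFLOPS, FP32:FP64 ratio) as quoted in the post above.
# ASSUMPTION: the unlabeled first block is the RTX 4090.
QUOTED = {
    "RTX 4090": (82.58, 64),
    "Radeon VII": (13.44, 4),
}


def fp64_tflops(fp32_tflops: float, ratio: int) -> float:
    """FP64 throughput implied by an FP32 figure and a 1:ratio FP64 rate."""
    return fp32_tflops / ratio


for name, (fp32, ratio) in QUOTED.items():
    print(f"{name}: {fp64_tflops(fp32, ratio):.3f} TFLOPS FP64 (1:{ratio})")
```

On these figures the Radeon VII has well over twice the FP64 throughput despite about a sixth of the FP32 throughput.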
|
|
|
|
|
|
#3548 | |
|
"Tucker Kao"
Jan 2020
Head Base M168202123
370₁₆ Posts |
Quote:
|
|
|
|
|
|
|
#3549 | |
|
Dec 2017
9₁₀ Posts |
Quote:
C:\Mfaktc_Source\src>make -f Makefile.Test
cl /Ox /Oy /GL /W2 /fp:fast /I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8"\include /I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8"\include\cuda\std\detail\libcxx\include /nologo /c /Tp sieve.c
sieve.c
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include\cuda\std\detail\libcxx\include\stdio.h(107): fatal error C1021: invalid preprocessor command 'include_next'
make: *** [sieve.obj] Error 2

I have an RTX 3060 and an RTX 2060 with CUDA Toolkit 11.8 installed. I suspect the makefile needs to be modified, but I have no clue how. I saw this post, and maybe someone can understand it fully: https://forums.developer.nvidia.com/...da-code/120175
Last fiddled with by Uncwilly on 2022-11-03 at 05:33 Reason: trimmed quote |
|
|
|
|
|
|
#3550 | |
|
Dec 2017
3² Posts |
Quote:
Makefile.txt
Last fiddled with by Uncwilly on 2022-11-03 at 20:13 Reason: trimmed excessive quote |
|
|
|
|
|
|
#3551 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1E90₁₆ Posts |
RTX 3060 compatible builds exist and are available for downloading. Why build your own?
RTX 3060 is CC 8.6. You'll want to add a flags line for CC 8.6. https://www.nvidia.com/en-us/geforce...x-3060-3060ti/
RTX2080 is CC 7.5. You'll want to uncomment that flags line. https://en.wikipedia.org/wiki/CUDA
Last fiddled with by kriesel on 2022-11-03 at 19:52 |
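For reference, a "flags line" for a given compute capability can be derived mechanically. The sketch below builds nvcc's `--generate-code` option for CC 7.5 and CC 8.6; the `NVCCFLAGS +=` prefix is an assumption about how the makefile collects its flags, so match whatever form the existing lines in your makefile use.

```python
def gencode_flag(compute_capability: str) -> str:
    """Build an nvcc --generate-code option for a CC string such as "8.6"."""
    major, minor = compute_capability.split(".")
    arch = f"{int(major)}{int(minor)}"
    # Emits native code (sm_XX) only; no PTX is embedded by this form.
    return f"--generate-code arch=compute_{arch},code=sm_{arch}"


for cc in ("7.5", "8.6"):
    # ASSUMPTION: the makefile appends flags to a variable named NVCCFLAGS.
    print(f"NVCCFLAGS += {gencode_flag(cc)}")
```

The installed toolkit must itself know the requested architecture, otherwise nvcc rejects the flag.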
|
|
|
|
|
#3552 | |
|
Dec 2017
3² Posts |
Quote:
mfaktc v0.21 (64bit built)

Compiletime options
  THREADS_PER_BLOCK         256
  SIEVE_SIZE_LIMIT          32kiB
  SIEVE_SIZE                230945bits
  SIEVE_SPLIT               250
  MORE_CLASSES              disabled

Runtime options
  SievePrimes               36000
  SievePrimesAdjust         1
  SievePrimesMin            5000
  SievePrimesMax            100000
  NumStreams                6
  CPUStreams                4
  GridSize                  3
  GPU Sieving               enabled
  GPUSievePrimes            128000
  GPUSieveSize              96Mi bits
  GPUSieveProcessSize       24Ki bits
  Checkpoints               enabled
  CheckpointDelay           30s
  WorkFileAddDelay          600s
  Stages                    enabled
  StopAfterFactor           bitlevel
  PrintMode                 compact
  V5UserID                  AlvinBunk
  ComputerID                mfaktc
  AllowSleep                no
  TimeStampInResults        no

CUDA version info
  binary compiled for CUDA  8.0
  CUDA runtime version      8.0
  CUDA driver version       12.0

CUDA device info
  name                      NVIDIA GeForce RTX 3060
  compute capability        8.6
  max threads per block     1024
  max shared memory per MP  102400 byte
  number of multiprocessors 28
  clock rate (CUDA cores)   1777MHz
  memory clock rate:        7501MHz
  memory bus width:         192 bit

Automatic parameters
  threads per grid          917504
  GPUSievePrimes (adjusted) 128566
  GPUsieve minimum exponent 1706180

running a simple selftest...
ERROR: cudaGetLastError() returned 8: invalid device function

NOTE: I'm using my RTX 2060 (on the same host) using version "mfaktc-0.21.win.cuda100" with no errors and same "CUDA driver version".
Last fiddled with by AlvinBunk on 2022-11-05 at 02:26 |
|
|
|
|
|
|
#3553 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴·3·163 Posts |
CUDA SDK 8 is required for GTX10xx. You need higher for RTX20xx, higher still for RTX30xx. (Assuming PTX is not there.) Read the last link I posted. SDK level != CC level.
Last fiddled with by kriesel on 2022-11-05 at 05:43 |
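The point that SDK level and CC level are different things can be made concrete with a small lookup. The sketch below lists the earliest CUDA toolkit that can generate native code for a few compute capabilities and checks two builds mentioned in this thread. The table covers only the GPU families discussed here, and the names are illustrative. It says nothing about PTX, which, as noted above, is assumed to be absent.

```python
# Earliest CUDA toolkit able to emit native code for each compute capability.
# Keys are (major, minor) compute capabilities, values (major, minor) toolkits.
MIN_TOOLKIT = {
    (6, 1): (8, 0),    # Pascal, GTX 10xx
    (7, 5): (10, 0),   # Turing, RTX 20xx
    (8, 6): (11, 1),   # Ampere, RTX 30xx
    (8, 9): (11, 8),   # Ada, RTX 40xx
}


def toolkit_can_target(toolkit: tuple, cc: tuple) -> bool:
    """True if this toolkit version is new enough to build native code for cc."""
    return toolkit >= MIN_TOOLKIT[cc]


# The CUDA 8.0 build reported above, run on an RTX 3060 (CC 8.6).
print("CUDA 8.0 -> CC 8.6:", toolkit_can_target((8, 0), (8, 6)))
# The cuda100 build reported to work on an RTX 2060 (CC 7.5).
print("CUDA 10.0 -> CC 7.5:", toolkit_can_target((10, 0), (7, 5)))
```

A build from a toolkit that predates the GPU's architecture carries no native code for it, which is consistent with the "invalid device function" error reported for the CUDA 8.0 binary on the RTX 3060.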
|
|
|