|
|
#34 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
17217₈ Posts |
-maxAlloc matters mostly for P-1. PRP alone does not need much GPU RAM.
PRP proof power should have little or no influence on PRP iteration time. Is a monitor connected to the GPU being benchmarked? Is there any user activity, screen saver, other GPU app, etc.? Is it operating at reduced power or reduced clock rates? It may just be slow DP by design in the GPU chip used, or you got a bad unit. (I've seen some old GPUs half-fail and run GIMPS apps at much reduced speed, 50-60% of expected.) https://www.techpowerup.com/gpu-spec...7900-xtx.c3941 says FP64 3.838 TFLOPS and 960 GB/s memory bandwidth, so it should be comparable in performance to a Radeon VII (~525 usec/iter), and faster than a 6900XT (~625 usec/iter) observed on Windows. Maybe use mfakto on it a while and enjoy that FAST SP. Or return it for a refund while you still can. |
|
|
|
|
|
#35 | |
|
"Yuki@karoushi"
Feb 2020
Japan, Chiba pref
2·3·5 Posts |
Quote:
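Yes, 3.838 TFLOPS (1:16) is what it says there, but the actual performance is 1:32. See https://www.coelacanth-dream.com/pos...x11-dpfp-rate/ (a Japanese site, but you should be able to follow it) and https://www.coelacanth-dream.com/pos...ls/#fn:3 (source in Japanese), which says: "According to the LLVM patch, the FP32-to-FP64 (DP) throughput ratio of the RDNA 3 architecture was supposed to be 32:1, but the diagram in the slides shows DPFP (1) per SIMD64 (2xSIMD32), which raises the possibility that it is 64:1. The Transcendental Unit, which computes transcendental functions, trigonometric functions, reciprocal square roots and so on, is retained as SIMD8." In short: the LLVM patch implies FP64 at 1:32, but the slide shows one DPFP unit per SIMD64, so it may even be 1:64. TechPowerUp has these DP numbers wrong; 1.919 TFLOPS (FP64) is correct. |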
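The ratio argument can be checked with a few lines of arithmetic. This is only a sketch: it uses the single figure quoted in the thread (3.838 TFLOPS listed as 1:16), the FP32 peak is derived from that listing and not taken from a spec sheet, and the function name is made up for illustration.

```python
# Figure quoted in the thread: TechPowerUp lists FP64 at 3.838 TFLOPS, i.e. 1:16.
LISTED_FP64_TFLOPS = 3.838
LISTED_RATIO = 16

# FP32 peak implied by that listing (derived here, not an official number).
implied_fp32_tflops = LISTED_FP64_TFLOPS * LISTED_RATIO


def fp64_tflops(ratio: int) -> float:
    """FP64 peak if the card runs FP64 at 1:ratio of the implied FP32 peak."""
    return implied_fp32_tflops / ratio


for ratio in (16, 32, 64):
    print(f"1:{ratio} -> {fp64_tflops(ratio):.3f} TFLOPS FP64")
```

At 1:32 this gives the 1.919 TFLOPS figure from the post; if the 1:64 reading of the slide were right, the peak would be about 0.96 TFLOPS.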
|
|
|
|
|
|
#36 |
|
"Yuki@karoushi"
Feb 2020
Japan, Chiba pref
2×3×5 Posts |
Anyway, I also installed an AMD GPU monitor.
I attached a JPG file. Sorry it is not a screenshot; I photographed the monitor directly. I think the card is working properly. In online gpuowl benchmarks the XTX is hugely faster than the 4090. If anyone has an actual user benchmark on the XTX (not speculation), I would like to see it. The GPU monitor also works well. I don't use OC tools. I think I have to install another application or driver. |
|
|
|
|
|
#37 |
|
Mar 2022
Earth
176₈ Posts |
So the RTX 4090 beats the 7900 XTX in gpuowl?
|
|
|
|
|
|
#38 |
|
"Composite as Heck"
Oct 2017
2×5²×19 Posts |
Someone is getting inconsistent results trying to measure SP/FP from post #111 onwards. There's a small chance the issues are driver related (otherwise the 1:32 ratio is real and AMD might as well have kicked a puppy): https://www.phoronix.com/forums/foru...ormance/page12
There's a good chance AdrianBC is one of us, I'm almost certain they've at least read the DP emulation research on the forum. Crazy how easy it is to spot a wild mersennite. |
|
|
|
|
|
#39 | |
|
"Yuki@karoushi"
Feb 2020
Japan, Chiba pref
2×3×5 Posts |
Yes. In my environment, the RTX 4090 is slightly better.
If you need to study AI or use the RT cores, buy the RTX 4090 instead of the XTX. On PRP, though, performance is almost the same. Power draw is under 300 W on the RTX 4090 (GPU-Z) and almost 320 W on the XTX (amdgpu info). The Radeon 7000 series is the worst; there is no point in buying it. The best choice is an RTX 4090 or 4080, even with the 12VHPWR connector fire problem. I run mine 24/7 at PL70 with no problem.
Quote:
OpenCL is good on AMD. I really noticed that. But now there is nothing to it on AMD GPUs; they just come at a reasonable price. ------------------------------- I will change the settings or install other software and continue to try on the XTX. If PRP performance on the XTX improves, I will sell the RTX 4090 (the best TF machine). If not, I will sell the XTX. If you have advice on what to try, please reply. I will be in my hometown Dec 29th-Jan 7th, so I can't post replies or use the machine then. |
|
|
|
|
|
|
#40 |
|
"James Heinrich"
May 2004
ex-Northern Ontario
7×13×47 Posts |
I would love to see a 7900 XTX benchmark for mfakto if at all possible:
https://www.mersenne.ca/mfaktc.php#benchmark |
|
|
|
|
|
#41 | |
|
Mar 2022
Earth
2·3²·7 Posts |
Quote:
I don't get 525 usec/iter on my Radeon Pro VII using Windows... Is ~525 overclocked? If so, what program are you using?
Code:
PS C:\Users\Jesus\Desktop\Mersenne\GPU VII> .\gpuowl-win -prp 77936867
20230101 01:05:48 GpuOwl VERSION v7.2-112-gd6ad1e0-dirty
20230101 01:05:48 GpuOwl VERSION v7.2-112-gd6ad1e0-dirty
20230101 01:05:48 config: -user Magallan3s -maxAlloc 16G
20230101 01:05:48 config:
20230101 01:05:48 config: -prp 77936867
20230101 01:05:48 device 0, unique id ''
20230101 01:05:48 gfx906-0 77936867 FFT: 4M 1K:8:256 (18.58 bpw)
20230101 01:05:48 gfx906-0 77936867 OpenCL args "-DEXP=77936867u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=8u -DAMDGPU=1 -DMM_CHAIN=1u -DMM2_CHAIN=2u -DWEIGHT_STEP=0.33644726404543274 -DIWEIGHT_STEP=-0.25174750481886216 -DIWEIGHTS={0,-0.44011820345520131,-0.37306474779553728,-0.29798072935699788,-0.21390437908665341,-0.11975874301407295,-0.014337887291734644,-0.44814572555075455,} -DFWEIGHTS={0,0.78609128957452257,0.5950610473469905,0.42446232150303748,0.2721098723818392,0.1360521812214803,0.014546452690911484,0.81207258201996746,} -cl-std=CL2.0 -cl-finite-math-only "
20230101 01:05:48 gfx906-0 77936867 ASM compilation failed, retrying compilation using NO_ASM
20230101 01:05:51 gfx906-0 77936867 OpenCL compilation in 2.66 s
20230101 01:05:51 gfx906-0 77936867 maxAlloc: 16.0 GB
20230101 01:05:51 gfx906-0 77936867 P1(0) 0 bits
20230101 01:05:51 gfx906-0 77936867 PRP starting from beginning
20230101 01:05:51 gfx906-0 77936867 OK 0 on-load: blockSize 400, 0000000000000003
20230101 01:05:51 gfx906-0 77936867 validating proof residues for power 8
20230101 01:05:51 gfx906-0 77936867 Proof using power 8
20230101 01:05:52 gfx906-0 77936867 OK 800 0.00% 1579c241dc63eca6 623 us/it + check 0.30s + save 0.11s; ETA 13:30
20230101 01:05:58 gfx906-0 77936867 10000 fc4f135f7cf4ad29 622
20230101 01:06:04 gfx906-0 77936867 20000 3cd1bd9d5e09cbc5 623
20230101 01:06:10 gfx906-0 77936867 30000 c4e0ff35e3290d98 623
20230101 01:06:17 gfx906-0 77936867 40000 dffe1b1b0d748128 623
20230101 01:06:23 gfx906-0 77936867 50000 52e286945371ed29 623
20230101 01:06:29 gfx906-0 77936867 60000 0945da4dc08bdd95 623
20230101 01:06:35 gfx906-0 77936867 70000 7131fa4eb77f4bb2 623
20230101 01:06:37 gfx906-0 77936867 Stopping, please wait..
20230101 01:06:38 gfx906-0 77936867 OK 72800 0.09% a6b0d27f69718156 625 us/it + check 0.30s + save 0.11s; ETA 13:31
20230101 01:06:38 gfx906-0 Exiting because "stop requested"
20230101 01:06:38 gfx906-0 Bye
Last fiddled with by Magellan3s on 2023-01-01 at 07:09 |
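For a sense of what the gap between the logged 623 us/it and the quoted ~525 us/it means in wall-clock time, here is a rough sketch. It assumes a PRP test needs about as many iterations as the exponent and ignores the check and save overhead shown in the log; `eta_hours` is an illustrative helper, not part of gpuowl.

```python
def eta_hours(exponent: int, us_per_iter: float, done: int = 0) -> float:
    """Hours left for a PRP test needing about `exponent` iterations in total."""
    remaining = exponent - done
    return remaining * us_per_iter * 1e-6 / 3600


exponent = 77936867  # exponent from the log above
for label, us in (("logged 623 us/it", 623), ("quoted ~525 us/it", 525)):
    hours = eta_hours(exponent, us)
    print(f"{label}: about {hours:.1f} hours")
```

The 623 us/it case comes out at roughly 13.5 hours, in line with the ETA of 13:30 printed in the log; at ~525 us/it the same exponent would take roughly two hours less.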
|
|
|
|
|
|
#42 | |
|
Sep 2002
Database er0rr
11115₈ Posts |
Quote:
Then there is a control program in the bin directory of the ROCm installation under /opt. <path>/bin/rocm-smi --setfan 255 sets the fan to its maximum (255), and <path>/bin/rocm-smi --setsclk 4 will overclock it; the maximum is 5. You can also tweak the voltages down. Search the site for "radeon VII" and "voltages". Keep an eye on temperatures and remember that overclocking the GPU uses more electricity.
Last fiddled with by paulunderwood on 2023-01-01 at 07:44 |
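If you want to script those two settings, a small wrapper that only builds the command line (and never runs it) is a safe starting point. This is a sketch under assumptions: the 0..255 and 0..5 limits are taken from the post as written, `<path>` is left as the placeholder from the post and must be replaced with your own ROCm directory, and `rocm_smi_cmd` is a hypothetical helper name.

```python
from typing import List, Optional


def rocm_smi_cmd(rocm_bin: str, fan: Optional[int] = None,
                 sclk: Optional[int] = None) -> List[str]:
    """Build (but do not run) a rocm-smi command line."""
    cmd = [f"{rocm_bin}/rocm-smi"]
    if fan is not None:
        if not 0 <= fan <= 255:  # 255 is the maximum, per the post
            raise ValueError("fan must be in 0..255")
        cmd += ["--setfan", str(fan)]
    if sclk is not None:
        if not 0 <= sclk <= 5:  # the post gives 5 as the maximum level
            raise ValueError("sclk level must be in 0..5")
        cmd += ["--setsclk", str(sclk)]
    return cmd


ROCM_BIN = "<path>/bin"  # placeholder from the post; substitute your own path
print(" ".join(rocm_smi_cmd(ROCM_BIN, fan=255)))
print(" ".join(rocm_smi_cmd(ROCM_BIN, sclk=4)))
```

Printing the command first lets you check it before pasting it into a root shell; clock changes are worth applying by hand while watching temperatures.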
|
|
|
|
|
|
#43 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7,823 Posts |
Quote:
George has reported being able to run VRAM at 1200 MHz; my Radeon VII GPUs mostly range up to 1170 with Hynix and ~870 with Samsung (yes, considerably below nominal clock for Samsung, and it's still too error prone to run P-1 or LLDC). Typically gpuowl v6.11-38x is several percent faster than v7.x on the same work, hardware and tune, on Radeon VII and Windows. I have no Linux data. If the card is reliable enough, use a large block size, which reduces GEC checking overhead and lifts performance slightly. With everything else the same, including FFT length, ~74M runs at 580 usec/iter in LLDC or PRP. For comparison, the RX6900XT runs 705 usec/iter at -10% power and +7% VRAM (the allowed min and max respectively).
Last fiddled with by kriesel on 2023-01-01 at 09:39 |
|
|
|
|
|
|
#44 |
|
Jul 2009
Germany
706₁₀ Posts |
Contrary to expectations, the FP64/FP32 performance values are significantly worse than those of the previous generation. This could amount to AMD knowingly misleading customers.
Despite all this, the card is well suited for first-time PRP tests at the wavefront. I have changed the table. Yellow values are estimated. https://docs.google.com/spreadsheets...f=true&sd=true |
|
|
|