mersenneforum.org  


Old 2022-12-28, 02:06   #34
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

17217₈ Posts
Default

-maxAlloc matters mostly for P-1. PRP alone does not need much GPU ram.
PRP proof power would have little or no influence on PRP iteration time.
Is a monitor connected to the GPU being benchmarked?
Any user activity, screen saver, other GPU app, etc?
Is it operating at reduced power or reduced clock rates?

May just be slow DP by design in the GPU chip used, or you got a bad unit. (I've seen some old GPUs half-fail, and run GIMPS apps at much reduced speed, 50-60% of expected.)
https://www.techpowerup.com/gpu-spec...7900-xtx.c3941 says
FP64 3.838 TFLOPS; 960 GB/s memory bandwidth; so it should be comparable in performance to a Radeon VII (~525usec/iter), & faster than a 6900XT (~625 usec/iter) observed on Windows. Maybe use mfakto on it a while and enjoy that FAST SP. Or return it for a refund while you still can.
Old 2022-12-28, 03:11   #35
yuki0831
 
"Yuki@karoushi"
Feb 2020
Japan, Chiba pref

2·3·5 Posts
Default

Quote:
Originally Posted by kriesel View Post
-maxAlloc matters mostly for P-1. PRP alone does not need much GPU ram.
PRP proof power would have little or no influence on PRP iteration time.
Is a monitor connected to the GPU being benchmarked?
Any user activity, screen saver, other GPU app, etc?
Is it operating at reduced power or reduced clock rates?

May just be slow DP by design in the GPU chip used, or you got a bad unit. (I've seen some old GPUs half-fail, and run GIMPS apps at much reduced speed, 50-60% of expected.)
https://www.techpowerup.com/gpu-spec...7900-xtx.c3941 says
FP64 3.838 TFLOPS; 960 GB/s memory bandwidth; so it should be comparable in performance to a Radeon VII (~525usec/iter), & faster than a 6900XT (~625 usec/iter) observed on Windows. Maybe use mfakto on it a while and enjoy that FAST SP. Or return it for a refund while you still can.
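FP64 (double) performance
3.838 TFLOPS (1:16)
Yes, that is what TechPowerUp lists.
Actually, the real ratio is 1:32. See:
https://www.coelacanth-dream.com/pos...x11-dpfp-rate/
It is a Japanese site, but you should be able to follow it.

https://www.coelacanth-dream.com/pos...ls/#fn:3 (source in Japanese)
Translation of the relevant passage: "In the patches to LLVM, the FP32 to FP64 (DP) throughput ratio of the RDNA 3 architecture was said to be 32:1, but the diagram on the slide shows DPFP(1) per SIMD64 (2xSIMD32), which raises the possibility that it is 64:1 [3]. The Transcendental Unit, which computes transcendental functions, trigonometric functions, reciprocal square roots and so on, is kept as SIMD8."
In short: the LLVM patch says FP64 runs at 1:32, but the slide shows one DPFP unit per SIMD64, so it may actually be 1:64.

TechPowerUp has these DP numbers wrong.
1.919 TFLOPS (FP64) is correct.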
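
The numbers in this post are linked by simple arithmetic: 3.838 TFLOPS at a claimed 1:16 ratio implies an FP32 figure, and applying 1:32 to that same FP32 figure gives the 1.919 TFLOPS stated above. A minimal Python sketch of that check follows. The function names are made up for illustration, and the 1:64 case is only the possibility raised by the quoted article, not a measurement.

```python
def implied_fp32_tflops(fp64_tflops: float, ratio: int) -> float:
    """FP32 throughput implied by an FP64 figure quoted at 1:ratio."""
    return fp64_tflops * ratio


def fp64_tflops(fp32_tflops: float, ratio: int) -> float:
    """FP64 throughput when FP64 runs at 1:ratio of the FP32 rate."""
    return fp32_tflops / ratio


# TechPowerUp figure quoted in this thread: 3.838 TFLOPS FP64 at 1:16.
fp32 = implied_fp32_tflops(3.838, 16)

# Compare the listed ratio with the two ratios discussed in the post.
for ratio in (16, 32, 64):
    print(f"1:{ratio}: {fp64_tflops(fp32, ratio):.3f} TFLOPS FP64")
```

This only shows that the two quoted figures are consistent with one FP32 rate; which ratio the hardware really delivers has to come from a benchmark.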
Old 2022-12-28, 03:19   #36
yuki0831
 
"Yuki@karoushi"
Feb 2020
Japan, Chiba pref

2×3×5 Posts
Red face

Anyway, I also installed the AMD GPU monitor
and have attached JPG files.
Sorry that they are not screenshots; I photographed the monitor directly.
I think the card is working properly.

The gpuowl benchmark figures posted online for the XTX are far faster than the 4090.
If anyone has a real user benchmark on the XTX (not speculation),
I would like to see it.

The GPU monitor also works well. I don't use OC tools.
I think I may have to install another application or driver.
Attached Thumbnails
6165.jpg (205.4 KB, ID 27845)
6163.jpg (115.6 KB, ID 27846)
6162.jpg (127.2 KB, ID 27847)
Old 2022-12-28, 09:17   #37
Magellan3s
 
Mar 2022
Earth

2·3²·7 Posts
Lightbulb

So the Rtx 4090 beats the 7900 xtx in gpu owl?
Old 2022-12-28, 09:30   #38
M344587487
 
 
"Composite as Heck"
Oct 2017

2×5²×19 Posts
Default

Someone is getting inconsistent results trying to measure SP/FP, from post #111 onwards, there's a small chance the issues are driver related (otherwise the 1:32 ratio is real and AMD might as well have kicked a puppy): https://www.phoronix.com/forums/foru...ormance/page12

There's a good chance AdrianBC is one of us, I'm almost certain they've at least read the DP emulation research on the forum. Crazy how easy it is to spot a wild mersennite.
Old 2022-12-28, 12:33   #39
yuki0831
 
"Yuki@karoushi"
Feb 2020
Japan, Chiba pref

30₁₀ Posts
Unhappy

Quote:
Originally Posted by Magellan3s View Post
So the Rtx 4090 beats the 7900 xtx in gpu owl?
Yes. In my environment, the RTX 4090 is slightly better.

If you want to study AI or use the RT cores,
buy the RTX 4090 instead of the XTX.
On PRP, though, the performance is almost the same.

Power draw: RTX 4090 under 300 W (GPU-Z),
XTX almost 320 W (amdgpu info).

The Radeon 7000 series is the worst. There is no point in buying one. It is best to buy an RTX 4090 or 4080,
even with the 12VHPWR connector fire problem. I run mine 24/7 at PL70 with no problems.

Quote:
Someone is getting inconsistent results trying to measure SP/FP, from post #111 onwards, there's a small chance the issues are driver related (otherwise the 1:32 ratio is real and AMD might as well have kicked a puppy): https://www.phoronix.com/forums/foru...ormance/page12

There's a good chance AdrianBC is one of us, I'm almost certain they've at least read the DP emulation research on the forum. Crazy how easy it is to spot a wild mersennite.
I am really sad about the poor OpenCL and FP64 performance of the 7900 XTX.
OpenCL used to be AMD's strength; I was well aware of that. But now AMD GPUs have nothing going for them there.
All the card offers is a reasonable price.

-------------------------------
I will keep trying on the XTX, changing settings or installing other software. If PRP performance on the XTX improves,
I will sell the RTX 4090, which is the best TF machine. If not, I will sell the XTX.

If you have any advice on what to try, please reply.

I will be in my hometown from Dec 29th to Jan 7th, so I won't be able to reply or use the machine.
Old 2022-12-28, 15:58   #40
James Heinrich
 
 
"James Heinrich"
May 2004
ex-Northern Ontario

7×13×47 Posts
Default

I would love to see a 7900 XTX benchmark for mfakto if at all possible:
https://www.mersenne.ca/mfaktc.php#benchmark
Old 2023-01-01, 07:08   #41
Magellan3s
 
Mar 2022
Earth

2×3²×7 Posts
Default

Quote:
Originally Posted by kriesel View Post
FP64 3.838 TFLOPS; 960 GB/s memory bandwidth; so it should be comparable in performance to a Radeon VII (~525usec/iter), & faster than a 6900XT (~625 usec/iter) observed on Windows.


I don't get 525 usec/iter on my Radeon Pro VII using windows... Is ~525 overclocked? If so, what program are you using?


Code:
PS C:\Users\Jesus\Desktop\Mersenne\GPU VII> .\gpuowl-win -prp 77936867
20230101 01:05:48 GpuOwl VERSION v7.2-112-gd6ad1e0-dirty
20230101 01:05:48 GpuOwl VERSION v7.2-112-gd6ad1e0-dirty
20230101 01:05:48 config: -user Magallan3s  -maxAlloc 16G
20230101 01:05:48 config:
20230101 01:05:48 config: -prp 77936867
20230101 01:05:48 device 0, unique id ''
20230101 01:05:48 gfx906-0 77936867 FFT: 4M 1K:8:256 (18.58 bpw)
20230101 01:05:48 gfx906-0 77936867 OpenCL args "-DEXP=77936867u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=8u -DAMDGPU=1 -DMM_CHAIN=1u -DMM2_CHAIN=2u -DWEIGHT_STEP=0.33644726404543274 -DIWEIGHT_STEP=-0.25174750481886216 -DIWEIGHTS={0,-0.44011820345520131,-0.37306474779553728,-0.29798072935699788,-0.21390437908665341,-0.11975874301407295,-0.014337887291734644,-0.44814572555075455,} -DFWEIGHTS={0,0.78609128957452257,0.5950610473469905,0.42446232150303748,0.2721098723818392,0.1360521812214803,0.014546452690911484,0.81207258201996746,}  -cl-std=CL2.0 -cl-finite-math-only "
20230101 01:05:48 gfx906-0 77936867 ASM compilation failed, retrying compilation using NO_ASM
20230101 01:05:51 gfx906-0 77936867 OpenCL compilation in 2.66 s
20230101 01:05:51 gfx906-0 77936867 maxAlloc: 16.0 GB
20230101 01:05:51 gfx906-0 77936867 P1(0) 0 bits
20230101 01:05:51 gfx906-0 77936867 PRP starting from beginning
20230101 01:05:51 gfx906-0 77936867 OK         0 on-load: blockSize 400, 0000000000000003
20230101 01:05:51 gfx906-0 77936867 validating proof residues for power 8
20230101 01:05:51 gfx906-0 77936867 Proof using power 8
20230101 01:05:52 gfx906-0 77936867 OK       800   0.00% 1579c241dc63eca6  623 us/it + check 0.30s + save 0.11s; ETA 13:30
20230101 01:05:58 gfx906-0 77936867     10000 fc4f135f7cf4ad29  622
20230101 01:06:04 gfx906-0 77936867     20000 3cd1bd9d5e09cbc5  623
20230101 01:06:10 gfx906-0 77936867     30000 c4e0ff35e3290d98  623
20230101 01:06:17 gfx906-0 77936867     40000 dffe1b1b0d748128  623
20230101 01:06:23 gfx906-0 77936867     50000 52e286945371ed29  623
20230101 01:06:29 gfx906-0 77936867     60000 0945da4dc08bdd95  623
20230101 01:06:35 gfx906-0 77936867     70000 7131fa4eb77f4bb2  623
20230101 01:06:37 gfx906-0 77936867 Stopping, please wait..
20230101 01:06:38 gfx906-0 77936867 OK     72800   0.09% a6b0d27f69718156  625 us/it + check 0.30s + save 0.11s; ETA 13:31
20230101 01:06:38 gfx906-0 Exiting because "stop requested"
20230101 01:06:38 gfx906-0 Bye
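
The per-iteration times in a log like the one above can be averaged and turned into a rough run-time estimate. The sketch below is in Python and assumes the log keeps the exact layout shown here (date, time, device, exponent, iteration, residue, us/it); the helper name `progress_timings` is hypothetical and not part of gpuowl. It only reads the plain progress lines, not the "OK" check lines.

```python
# Three progress lines copied from the log above.
LOG = """\
20230101 01:05:58 gfx906-0 77936867     10000 fc4f135f7cf4ad29  622
20230101 01:06:04 gfx906-0 77936867     20000 3cd1bd9d5e09cbc5  623
20230101 01:06:10 gfx906-0 77936867     30000 c4e0ff35e3290d98  623
"""


def progress_timings(log_text):
    """Return (exponent, [us_per_iteration, ...]) from plain progress lines."""
    exponent = None
    timings = []
    for line in log_text.splitlines():
        fields = line.split()
        # Plain progress lines have exactly 7 whitespace-separated fields,
        # and the sixth one is a 16-hex-digit residue.
        if len(fields) != 7 or len(fields[5]) != 16:
            continue
        try:
            int(fields[5], 16)          # reject non-hex "residues"
            line_exponent = int(fields[3])
            us_per_it = int(fields[6])
        except ValueError:
            continue
        exponent = line_exponent
        timings.append(us_per_it)
    return exponent, timings


exponent, timings = progress_timings(LOG)
mean_us = sum(timings) / len(timings)
# A PRP test takes roughly one squaring iteration per bit of the exponent.
hours = exponent * mean_us / 1e6 / 3600
print(f"mean {mean_us:.1f} us/it, roughly {hours:.1f} h for exponent {exponent}")
```

The estimate ignores check and save overhead, so it should land slightly under the ETA that the program itself prints.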

Last fiddled with by Magellan3s on 2023-01-01 at 07:09
Old 2023-01-01, 07:39   #42
paulunderwood
 
 
Sep 2002
Database er0rr

5×937 Posts
Default

Quote:
Originally Posted by Magellan3s View Post
I don't get 525 usec/iter on my Radeon Pro VII using windows... Is ~525 overclocked? If so, what program are you using?
Firstly you need some way of measuring temperature; you don't want the GPU to go much over 95C. I recommend installing lm-sensors from apt and then running sensors.

Then there is a control program in bin of the ROCm installation directory under /opt. <path>/bin/rocm-smi --setfan 255 sets the fan to its maximum, and <path>/bin/rocm-smi --setsclk 4 will overclock it; the maximum level is 5. You can also tweak the voltages down. Search the site for radeon VII and voltages.

Keep an eye on temperatures and remember that overclocking the GPU uses more electricity.
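
For anyone scripting this, here is a small Python sketch that only builds the two rocm-smi command lines given above and prints them as a dry run. The helper name `rocm_smi_commands` is hypothetical, the lower bound of 0 for both settings is an assumption, and the `<path>/bin` placeholder is left as it is because the real install directory differs between systems.

```python
def rocm_smi_commands(rocm_bin, fan=255, sclk_level=4):
    """Build the two rocm-smi command lines mentioned in the post.

    rocm_bin is the bin directory of the ROCm installation; this sketch
    does not guess where that is on a given machine.
    """
    # Upper limits are the ones stated in the post; 0 as a floor is assumed.
    if not 0 <= fan <= 255:
        raise ValueError("fan must be in 0..255 (255 is the maximum)")
    if not 0 <= sclk_level <= 5:
        raise ValueError("sclk level must be in 0..5 (5 is the maximum)")
    tool = f"{rocm_bin}/rocm-smi"
    return [
        [tool, "--setfan", str(fan)],
        [tool, "--setsclk", str(sclk_level)],
    ]


# Dry run: print the commands instead of executing them.
for argv in rocm_smi_commands("<path>/bin"):
    print(" ".join(argv))
```

To apply the settings for real, each list could be handed to `subprocess.run`, probably with elevated privileges, while watching the temperature with sensors as described above.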

Last fiddled with by paulunderwood on 2023-01-01 at 07:44
Old 2023-01-01, 08:51   #43
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

7,823 Posts
Default

Quote:
Originally Posted by Magellan3s View Post
I don't get 525 usec/iter on my Radeon Pro VII using windows... Is ~525 overclocked? If so, what program are you using?
My bad (memory). Retested, on 3 Radeon VII (not pro) GPUs, ~630 (good) - 640 (failing fan) usec/iter, gpuowl-win V6.11-38x, if the GPU VRAM is Hynix not Samsung. Set to -20% power, & VRAM as fast as it will reliably run, which varies from one GPU to the next, with "AMD Radeon Software", launched from the Windows start menu, using its tuning tab. That software can be installed along with the driver during initial installation from the same download package, or during a driver update.
George has reported being able to run VRAM at 1200 MHz; my Radeon VII GPUs mostly range up to 1170 with Hynix, ~870 with Samsung (yes, considerably below nominal clock for Samsung, and it's still too error prone to run P-1 or LLDC).
Typically gpuowl V6.11-38x is several percent faster than v7.x on the same work, hardware and tune, on Radeon VII & Windows. I have no Linux data.
If reliable enough, use a large block size which reduces GEC checking overhead and lifts performance slightly.
Same everything, including fft length, ~74M runs at 580 usec/iter in LLDC or PRP.
For comparison, RX6900XT: 705 usec/iter at -10% power, +7% VRAM (allowed min and max respectively).

Last fiddled with by kriesel on 2023-01-01 at 09:39
Old 2023-01-02, 20:27   #44
moebius
 
 
Jul 2009
Germany

1302₈ Posts
Default

Contrary to expectations, the FP64/FP32 performance values are significantly worse than those of the previous generation. This could amount to AMD knowingly misleading customers.


Despite all this, the card is well suited for first-time PRP tests at the wavefront.

I have changed the table. Yellow values are estimated.

https://docs.google.com/spreadsheets...f=true&sd=true



All times are UTC.



This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
