mersenneforum.org  

2022-08-09, 22:38   #1
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

7²×139 Posts
Microjoules/iteration at 3.25M fft DC

Measuring power with GPU-Z for GPUs (board power) or HWMonitor for CPUs (package power), the best power efficiency I've seen for 3.25M fft PRP (just sampling some hardware here) is:

For CPUs: i7-1165G7 laptop CPU, 89,813 uJoule / iter in prime95 v30.x, with Windows 10 set for best performance (so its power efficiency might be improved a bit).

For GPUs, Radeon VII, Windows 10, with Radeon Software set to minimum power, clocking ~1600MHz GPU and ~1150MHz ram, 92,880 uJoule / iter in Gpuowl v6.11-380, at ~180 watts; iteration time ~516 usec / iter.
There is sometimes an anomalous power state with a 570 MHz GPU clock, after a GPU-CPU read failure. The RAM clock is unaffected. Iteration time goes up considerably, to 1284 usec / iter, and indicated card power drops to 49 watts, so energy per iteration goes down 32%, to 62,916 uJoule / iter.

If there is other hardware that beats these for power efficiency, what hardware is it, and what is the energy per iteration?

(Watts x usec / iter = uJoule / iter)
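To make the formula above concrete, here is a minimal Python sketch. The helper name `energy_uj_per_iter` is hypothetical, and the inputs are just the Radeon VII figures quoted in this post. Since 1 W = 1 J/s, watts times microseconds gives microjoules directly.

```python
def energy_uj_per_iter(watts: float, usec_per_iter: float) -> float:
    """Energy per iteration in microjoules: W * us = uJ, since 1 W = 1 J/s."""
    return watts * usec_per_iter


# Radeon VII, Gpuowl v6.11-380, 3.25M fft (figures from this post)
normal = energy_uj_per_iter(180, 516)     # ~1600 MHz GPU clock, ~180 W board power
anomalous = energy_uj_per_iter(49, 1284)  # 570 MHz state after a GPU-CPU read failure

drop = 1 - anomalous / normal  # fractional reduction in energy per iteration
print(f"normal:    {normal:,.0f} uJ/iter")
print(f"anomalous: {anomalous:,.0f} uJ/iter")
print(f"drop:      {drop:.0%}")
```

The low-clock state uses less energy per iteration but takes about 2.5 times as long per iteration, so it trades throughput for efficiency.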
2022-08-09, 23:44   #2
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

6811₁₀ Posts

Lowest efficiency I've seen so far:
GPUs:
RX 550: 10281 us/iter at 27.6 watts corresponds to a much higher 283,755.6 uJoule/iter
GTX 1080: 12700 us/iter (at a ~230M exponent) * (62M/230M)^1.1 * 122 watts = est. 366,348 uJoule/iter
Probably a Quadro 4000 would be considerably worse.

CPU:
Celeron G1840: 25.8 ms/iter at 24 watts, ~619,200 uJoule/iter (energy per iteration 6.67 times as high as the Radeon VII in normal operation, and 6.89 times as high as the i7-1165G7)
Probably Core 2 Duo would be much worse.
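The GTX 1080 estimate above scales a per-iteration time measured at a larger exponent down to the DC range, assuming time per iteration grows as p^1.1. A small Python sketch of that scaling; the function `scale_time` and its default power are illustrative assumptions, not part of any GIMPS program.

```python
def scale_time(usec_measured: float, p_measured: float, p_target: float,
               power: float = 1.1) -> float:
    """Scale a per-iteration time between exponents, assuming
    time/iter grows as p**power (1.1 as in the GTX 1080 estimate)."""
    return usec_measured * (p_target / p_measured) ** power


# GTX 1080: 12700 us/iter at a ~230M exponent, scaled to ~62M (DC range)
t_dc = scale_time(12700, 230e6, 62e6)
energy = t_dc * 122  # times 122 W board power -> uJ/iter
print(f"~{t_dc:,.0f} us/iter, est. {energy:,.0f} uJ/iter")
```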

Last fiddled with by kriesel on 2022-08-09 at 23:44
2022-08-10, 12:33   #3
ATH
Einyen

Dec 2003
Denmark

2³·3²·47 Posts

Google Colab:

Code:
Tesla V100-SXM2-16GB-0	Gpuowl v6.11-380-g79ea0cc	3.25M FFT LL (not PRP):  340 µs/iter
Tesla A100-SXM4-40GB-0	Gpuowl v6.11-380-g79ea0cc	3.25M FFT LL (not PRP):  205 µs/iter
I cannot measure power usage with nvidia-smi:
Failed to initialize NVML: Driver/library version mismatch

I do not want to reinstall the NVIDIA drivers and CUDA on a Colab instance, which as far as I can surmise would require a reboot. Is there another way to see GPU power usage in Linux?



Using the "Thermal design power" of the cards gives an upper bound:
300 W * 340 µs/iter = 102000 µJ / iter
250 W * 205 µs/iter = 51250 µJ / iter

Last fiddled with by ATH on 2022-08-10 at 12:34
2022-08-10, 16:11   #4
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

15233₈ Posts

From DuckDuckGo, re measuring GPU power on the Linux command line:
nvtop? (unless it also depends on NVML) See #6 at https://www.cyberciti.biz/open-sourc...stic-commands/ and https://www.maketecheasier.com/monit...dia-gpu-linux/

probably not gpustat, which appears to depend on NVML: https://github.com/wookayin/gpustat/...ster/README.md

not NVAPI, which is oriented to Windows: https://medium.com/devoops-and-unive...s-cd174bf89311

not powertop, which is oriented to battery usage: https://www.tecmint.com/powertop-mon...battery-usage/

The A100 power efficiency is impressive. Too bad it carries a 5-digit US$ price per unit. (The payback period relative to a number of Radeon VIIs of comparable total throughput would be decades.) Its 40GB of RAM would be helpful for large-exponent P-1.

The V100 is probably comparable to Radeon VII for efficiency.
Thanks for contributing.


On Colab free: gpuowl v6.11-380-g79ea0cc, M63715567 LL, 3.5M fft, 3718 us/iter on a T4 (70W TDP). Scaled to 3.25M fft: 70 * 3718 * 3.25/3.5 ≈ 241,670 uJoule / iteration, as an upper bound.
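A sketch of the T4 upper-bound estimate in Python, assuming, as above, that iteration time scales linearly with FFT length and that the card draws no more than its rated TDP. `tdp_bound_uj` is a hypothetical helper name.

```python
def tdp_bound_uj(tdp_watts: float, usec_per_iter: float,
                 fft_measured: float, fft_target: float) -> float:
    """Upper bound on energy/iter (uJ): rated TDP times iteration time,
    with iteration time scaled linearly by FFT length."""
    scaled_usec = usec_per_iter * fft_target / fft_measured
    return tdp_watts * scaled_usec


# Colab T4: 3718 us/iter at 3.5M fft, 70 W TDP, scaled to 3.25M fft
t4_bound = tdp_bound_uj(70, 3718, 3.5, 3.25)
print(f"T4: <= ~{t4_bound:,.0f} uJ/iter")
```

A TDP-based figure is only a bound if the card actually stays at or under its rated TDP, so a measured power reading is preferable when one is available.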

Last fiddled with by kriesel on 2022-08-10 at 17:08
2022-08-11, 00:52   #5
ATH
Einyen

Dec 2003
Denmark

2³×3²×47 Posts

I found the command:
export LD_LIBRARY_PATH="/usr/lib64-nvidia"
fixed nvidia-smi.
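For logging power alongside iteration times, here is a Python sketch that calls nvidia-smi with the same library-path fix. It assumes an nvidia-smi that accepts `--query-gpu=power.draw` with `--format=csv,noheader,nounits` (the query fields are listed by `nvidia-smi --help-query-gpu`). `read_power_draw` and `parse_power_draw` are hypothetical helper names, and only the parser can be exercised on a machine without a GPU.

```python
import os
import subprocess


def parse_power_draw(csv_text: str) -> list[float]:
    """Parse power.draw output (csv, noheader, nounits): one wattage per GPU
    per line. Non-numeric lines (e.g. "[N/A]") and blank lines are skipped."""
    watts = []
    for line in csv_text.splitlines():
        try:
            watts.append(float(line.strip()))
        except ValueError:
            continue
    return watts


def read_power_draw() -> list[float]:
    """Run nvidia-smi with LD_LIBRARY_PATH set as in the fix above."""
    env = dict(os.environ, LD_LIBRARY_PATH="/usr/lib64-nvidia")
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True, env=env,
    ).stdout
    return parse_power_draw(out)
```

On a GPU instance, calling `read_power_draw()` a few times while gpuowl runs gives a range of readings; multiplying a reading in W by the iteration time in µs gives µJ/iter.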

V100 is using 248W-253W and A100 is using 321W-323W (and TDP is actually 400W).

V100: 340 µs/iter * 253 W = 86,020 µJ/iter (41,850,732 iter / kWh)
A100: 205 µs/iter * 323 W = 66,215 µJ/iter (54,368,345 iter / kWh)
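The iterations-per-kWh figures follow from 1 kWh = 3.6×10⁶ J = 3.6×10¹² µJ. A minimal Python sketch (hypothetical helper `iters_per_kwh`) reproduces them to within rounding:

```python
UJ_PER_KWH = 3.6e12  # 1 kWh = 3.6e6 J = 3.6e12 uJ


def iters_per_kwh(watts: float, usec_per_iter: float) -> float:
    """Iterations completed per kWh at a given power draw and iteration time."""
    return UJ_PER_KWH / (watts * usec_per_iter)


# Peak measured power from nvidia-smi and Gpuowl LL timings at 3.25M fft
for name, watts, usec in [("V100", 253, 340), ("A100", 323, 205)]:
    print(f"{name}: {watts * usec:,} uJ/iter, {iters_per_kwh(watts, usec):,.0f} iter/kWh")
```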

Last fiddled with by ATH on 2022-08-11 at 00:57
2022-08-13, 13:23   #6
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

7²·139 Posts

Core 2 Duo E8200 (no on-chip power instrumentation), prime95 v30.7b9, Windows Vista:
40.7 msec/iteration at 3360K fft length, TDP 65W: 40700 * 3328K/3360K * 65 = 2,620,305 uJoule / iteration; ~30 days for a PRP DC; at $0.13/kWh, ~$5.87/DC.
A Radeon VII can do it for $0.21 (~3.5% of the power cost) in several hours.
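The cost comparison can be sketched the same way: energy per iteration times the number of iterations, converted to kWh and priced. This Python sketch assumes about 62M iterations for a DC exponent near 62M (an assumption, consistent with the ~30 day run time quoted above) and uses the $0.13/kWh rate; `cost_per_test` is a hypothetical name.

```python
UJ_PER_KWH = 3.6e12  # 1 kWh = 3.6e12 uJ


def cost_per_test(uj_per_iter: float, iterations: float, usd_per_kwh: float) -> float:
    """Electricity cost of one test: energy/iter * iterations, priced per kWh."""
    kwh = uj_per_iter * iterations / UJ_PER_KWH
    return kwh * usd_per_kwh


ITERS = 62e6   # assumption: a DC exponent near 62M needs ~62M iterations
PRICE = 0.13   # $/kWh, as above

core2 = cost_per_test(2_620_305, ITERS, PRICE)
radeon7 = cost_per_test(92_880, ITERS, PRICE)
print(f"Core 2 E8200: ${core2:.2f}  Radeon VII: ${radeon7:.2f}  ratio {radeon7 / core2:.1%}")
```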

The AMD GPU driver for RX5xx ... Radeon VII etc. is apparently incompatible with the Core 2 E8200, based on past attempts here. Quadro 2000 drivers can be run on the Core 2 E8200, but those old GPUs are also power-inefficient and won't run gpuowl. So that route means a less efficient GPU, plus the less efficient CUDALucas code with its algorithmic limits and less effective error checking.

RX 6900 XT: 267 W, 561 us/iter = 149,787 uJoule/iter at default settings;
or, tuned downward in power, 582 us/iter at 237 W = 137,934 uJoule/iter, 48.5% more energy/iter than the Radeon VII.
Still much more power efficient than the GTX 1080 or older GPUs, though.

Quadro 2000, CUDALucas, 3456K, 27.52msec/iter, 60W TDP
3.25Mi/3.375Mi * 25720 usec * 60 W = 1,590,044 uJoule/iter

Last fiddled with by kriesel on 2022-08-13 at 13:25
2022-08-14, 02:55   #7
Viliam Furik

"Viliam Furík"
Jul 2018
Martin, Slovakia

1413₈ Posts

Quote:
Originally Posted by kriesel
Quadro 2000, CUDALucas, 3456K, 27.52msec/iter, 60W TDP
3.25Mi/3.375Mi * 25720 usec * 60 W = 1,590,044 uJoule/iter
You switched two digits (25720 should be 27520), but the result is correct for the 27.52 msec value, so it's just a cosmetic issue.

All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
