#1 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1111010010000₂ Posts |
Measuring wattage with GPU-Z for GPUs (board power) or HWMonitor for CPUs (package power), the best energy efficiency I've seen for 3.25M fft PRP (just sampling some hardware here) is:

For CPUs: i7-1165G7 laptop CPU, 89,813 uJoule / iter in prime95 v30.x, Windows 10 set for best performance (so it might be possible to improve on that power efficiency a bit).

For GPUs: Radeon VII, Windows 10, with Radeon Software set to minimum power, clocking ~1600 MHz GPU and ~1150 MHz ram, 92,880 uJoule / iter in Gpuowl v6.11-380, at ~180 watts; iteration time ~516 usec / iter.

There is sometimes an anomalous power state, with a 570 MHz GPU clock, after a gpu-cpu read failure. Ram clock is unaffected. Iteration time goes up considerably to 1284 usec / iter, indicated card power drops to 49 watts, and energy per iteration goes down 32% to 62,916 uJoule / iter.

If there is other hardware that beats these for power efficiency, what hardware is it, and what is the energy per iteration? (Watts x usec / iter = uJoule / iter)
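The conversion (Watts x usec / iter = uJoule / iter) can be checked with a few lines of Python. This is only a sketch using the Radeon VII figures quoted in this post; the function name is made up for illustration.

```python
def energy_per_iter_uj(watts, usec_per_iter):
    """Energy per iteration in microjoules: W * us = uJ."""
    return watts * usec_per_iter

# Radeon VII figures quoted in the post (indicated board power).
normal = energy_per_iter_uj(180, 516)      # normal operation
anomalous = energy_per_iter_uj(49, 1284)   # anomalous 570 MHz state

# Fractional energy saving of the anomalous state relative to normal.
saving = 1 - anomalous / normal
print(f"{normal:,} uJ/iter normal, {anomalous:,} uJ/iter anomalous, saving {saving:.1%}")
```

The slower state wins on energy per iteration only because power falls faster than throughput; it still takes about 2.5 times as long per iteration.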
|
|
|
|
|
#2 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴×3×163 Posts |
Lowest efficiency I've seen so far:

GPUs:
rx550: 10281 us/iter at 27.6 watts corresponds to a much higher 283,755.6 uJoule/iter
gtx1080: 12700 us/iter * (62M/230M)^1.1 * 122 watts = est. 366,348 uJoule/iter
Probably a quadro 4000 would be considerably worse.

CPU:
celeron g1840: 25.8 ms/iter at 24 watts, ~619,200 uJoule/iter (energy per iteration 6.67 times as high as the Radeon VII in normal operation, 6.89 times as high as the i7-1165G7)
Probably a Core 2 Duo would be much worse.

Last fiddled with by kriesel on 2022-08-09 at 23:44
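The gtx1080 estimate rescales a timing taken at one size to another before multiplying by power. A rough sketch of that arithmetic follows. Reading 62M and 230M as exponent sizes and 1.1 as an empirical run-time scaling power is an assumption here, since the post gives only the formula; the function name is hypothetical.

```python
def scaled_energy_uj(usec_per_iter, size_measured, size_target, watts, power=1.1):
    """Rescale iteration time by (size_target / size_measured)**power, then multiply by watts."""
    scaled_usec = usec_per_iter * (size_target / size_measured) ** power
    return scaled_usec * watts

# gtx1080: timed at the larger size, rescaled to the smaller one (figures from the post).
gtx1080 = scaled_energy_uj(12700, 230e6, 62e6, 122)

# rx550: measured directly, no rescaling needed.
rx550 = 10281 * 27.6

print(f"gtx1080 est. {gtx1080:,.0f} uJ/iter; rx550 {rx550:,.1f} uJ/iter")
```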
|
|
|
|
|
#3 |
|
Einyen
Dec 2003
Denmark
2²·863 Posts |
Google Colab:
Code:
Tesla V100-SXM2-16GB-0 Gpuowl v6.11-380-g79ea0cc 3.25M FFT LL (not PRP): 340 µs/iter
Tesla A100-SXM4-40GB-0 Gpuowl v6.11-380-g79ea0cc 3.25M FFT LL (not PRP): 205 µs/iter

Failed to initialize NVML: Driver/library version mismatch

I do not want to reinstall the nvidia drivers and CUDA on a Colab instance, which would require a reboot as far as I can surmise. Is there another way to see GPU power usage in Linux?

Using the "Thermal design power" of the cards gives an upper bound:
300 W * 340 µs/iter = 102000 µJ / iter
250 W * 205 µs/iter = 51250 µJ / iter

Last fiddled with by ATH on 2022-08-10 at 12:34
|
|
|
|
|
#4 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴×3×163 Posts |
From DuckDuckGo re gpu power measure in Linux cmd line:

nvtop? (unless it also depends on nvml) #6 at https://www.cyberciti.biz/open-sourc...stic-commands/ https://www.maketecheasier.com/monit...dia-gpu-linux/
probably not gpustat, which appears to depend on nvml https://github.com/wookayin/gpustat/...ster/README.md
not nvapi, oriented to Windows https://medium.com/devoops-and-unive...s-cd174bf89311
not powertop, which is oriented to running on battery https://www.tecmint.com/powertop-mon...battery-usage/

The A100 power efficiency is impressive. Too bad it carries a 5-digit $US price per unit. (Payback period relative to a number of Radeon VIIs of comparable total throughput would be decades.) Its 40GB ram would be helpful for large exponent P-1. The V100 is probably comparable to Radeon VII for efficiency. Thanks for contributing.

On Colab free, gpuowl v6.11-380-g79ea0cc M63715567 LL, 3.5M fft, 3718 us/iter on T4, 70 W TDP; scaled to 3.25M fft, 70 * 3718 * 3.25/3.5 ~ 241,670 uJoule / iteration bound

Last fiddled with by kriesel on 2022-08-10 at 17:08
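The T4 figure combines a TDP-based bound with a linear rescale from the measured 3.5M fft to the thread's 3.25M reference. A minimal sketch of that calculation, assuming (as the post does) that iteration time scales roughly linearly with fft length; the function and parameter names are illustrative only.

```python
def tdp_bound_uj(tdp_watts, usec_per_iter, fft_measured, fft_reference=3.25):
    """TDP-based energy bound per iteration, linearly rescaled to the reference FFT length.

    FFT lengths are in the same units (here, M words); only their ratio matters.
    """
    return tdp_watts * usec_per_iter * fft_reference / fft_measured

# Tesla T4 on Colab free: 70 W TDP, 3718 us/iter at 3.5M fft (figures from the post).
t4 = tdp_bound_uj(70, 3718, 3.5)
print(f"T4 bound: {t4:,.0f} uJ/iter")
```

This is a bound rather than a measurement: it is only as good as the TDP figure assumed for the specific card variant.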
|
|
|
|
|
#5 |
|
Einyen
Dec 2003
Denmark
2²·863 Posts |
I found that the command:
export LD_LIBRARY_PATH="/usr/lib64-nvidia"
fixed nvidia-smi.

V100 is using 248W-253W and A100 is using 321W-323W (and its TDP is actually 400W).
V100: 340 µs/iter * 253 W = 86,020 µJ/iter (41,850,732 iter / kWh)
A100: 205 µs/iter * 323 W = 66,215 µJ/iter (54,368,345 iter / kWh)

Last fiddled with by ATH on 2022-08-11 at 00:57
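Once nvidia-smi works, the power reading and the iter / kWh conversion can be scripted. The sketch below is one possible way to do it, not what was run in this thread: it relies on nvidia-smi's `--query-gpu=power.draw` query, and the helper names are hypothetical. Some cards report `[N/A]` for power draw, which this sketch simply skips.

```python
import subprocess

UJ_PER_KWH = 3.6e12  # 1 kWh = 3.6e6 J = 3.6e12 uJ

def parse_power_watts(csv_text):
    """Parse one power value (watts) per GPU from nvidia-smi CSV output without header or units."""
    watts = []
    for line in csv_text.splitlines():
        line = line.strip()
        if line and not line.startswith("["):  # skip blanks and "[N/A]" entries
            watts.append(float(line))
    return watts

def read_power_watts():
    """Query the current board power of each GPU via nvidia-smi (requires a working NVML)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_power_watts(result.stdout)

def iters_per_kwh(watts, usec_per_iter):
    """Iterations completed per kWh at the given power and iteration time."""
    return UJ_PER_KWH / (watts * usec_per_iter)

# Figures from the post: V100 at 253 W, A100 at 323 W.
v100 = iters_per_kwh(253, 340)
a100 = iters_per_kwh(323, 205)
print(f"V100: {v100:,.0f} iter/kWh; A100: {a100:,.0f} iter/kWh")
```

Calling `read_power_watts()` several times during a run and averaging would give a steadier figure than a single sample.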
|
|
|
|
|
#6 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
17220₈ Posts |
Core 2 duo e8200 (no on-chip power instrumentation), prime95 v30.7b9, Windows Vista:
40.7 msec/iteration at 3.36M fft length, tdp 65W; 40700 * 3.328M/3.36M * 65 = 2,620,305 uJoule / iteration; ~30 days for a PRP DC; at $0.13/kwhr, ~$5.87/DC. A Radeon VII can do it for $0.21, ~3.5% of the power cost, in several hours.

The AMD GPU driver for RX5xx ... Radeon VII etc. is apparently incompatible with the Core 2 e8200, based on past attempts here. Quadro 2000 drivers can be run on the Core 2 e8200, but those old GPUs are also power-inefficient, and won't run gpuowl. So: a less efficient GPU, plus less efficient CUDALucas code, algorithmic limits, and less effective error checking.

RX 6900 XT: 267 W * 561 us/iter = 149,787 uJoule/iter at default settings; or, tuned downward in power, 582 us/iter at 237 W = 137,934 uJoule/iter, 48.5% more energy/iter than the Radeon VII. Much more power efficient than the gtx1080 or older GPUs, though.

Quadro 2000, CUDALucas, 3456K fft, 27.52 msec/iter, 60 W TDP: 3.25Mi/3.375Mi * 27520 usec * 60 W = 1,590,044 uJoule/iter

Last fiddled with by kriesel on 2022-08-13 at 13:25
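The electricity cost per test follows from energy per iteration, iteration count, and price per kWh. The sketch below reproduces the quoted dollar figures under one assumption not stated in the post: a test of about 62M iterations (a PRP test takes roughly one iteration per bit of the exponent, and 62M at 40.7 msec/iter is close to the ~30 days mentioned). The function name is illustrative.

```python
J_PER_KWH = 3.6e6

def cost_per_test_usd(uj_per_iter, iterations, usd_per_kwh):
    """Electricity cost of one test: total joules -> kWh -> dollars."""
    joules = uj_per_iter * 1e-6 * iterations
    return joules / J_PER_KWH * usd_per_kwh

ITERATIONS = 62e6  # ASSUMPTION: exponent near 62M; the post does not state it

core2 = cost_per_test_usd(2_620_305, ITERATIONS, 0.13)   # Core 2 duo e8200, TDP-based
radeon7 = cost_per_test_usd(92_880, ITERATIONS, 0.13)    # Radeon VII, from post #1

print(f"Core 2: ${core2:.2f}; Radeon VII: ${radeon7:.2f}; ratio {radeon7 / core2:.1%}")
```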
|
|
|
|
|
#7 |
|
"Viliam Furík"
Jul 2018
Martin, Slovakia
2×401 Posts |
|
|
|
|