2022-08-09, 22:38  #1 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7^{2}×139 Posts 
Microjoules/iteration at 3.25M fft DC
Measuring wattage with GPU-Z for GPUs (board power), or HWMonitor for CPUs (package power), the best power efficiency I've seen for 3.25M fft PRP (just sampling some hardware here) is:

For CPUs: i7-1165G7 laptop CPU, 89,813 uJoule/iter in prime95 v30.x, Windows 10 set for best performance (so might be able to improve on that power efficiency a bit).

For GPUs: Radeon VII, Windows 10, with Radeon Software set to minimum power, clocking ~1600 MHz GPU and ~1150 MHz ram: 92,880 uJoule/iter in gpuowl v6.11-380, at ~180 watts and ~516 usec/iter.

There is sometimes an anomalous power state, 570 MHz GPU clock, after a GPU-CPU read failure; ram clock is unaffected. Iteration time goes up considerably to 1284 usec/iter, indicated card power drops to 49 watts, and energy per iteration goes down 32% to 62,916 uJoule/iter.

If there is other hardware that beats these for power efficiency, what hardware is it, and what is the energy per iteration? (Watts x usec/iter = uJoule/iter)
2022-08-09, 23:44  #2 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
6811_{10} Posts 
Lowest efficiency I've seen so far:
GPUs:
rx550: 10281 us/iter at 27.6 watts corresponds to a much higher 283,755.6 uJoule/iter.
gtx1080: 12700 us/iter * (62M/230M)^1.1 * 122 watts = est. 366,348 uJoule/iter. Probably a Quadro 4000 would be considerably worse.

CPU:
Celeron G1840: 25.8 ms/iter at 24 watts, ~619,200 uJoule/iter (energy per iteration 6.67 times as high as the Radeon VII in normal operation, 6.89 times as high as the i7-1165G7). Probably a Core 2 Duo would be much worse.

Last fiddled with by kriesel on 2022-08-09 at 23:44
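The gtx1080 figure is an estimate rather than a measurement: the measured iteration time is rescaled by the exponent ratio raised to the power 1.1 (iteration time grows slightly faster than linearly with exponent/fft size). A sketch of that scaling, assuming the 12700 us/iter timing was taken near a 230M exponent and is being scaled down to the ~62M DC range:

```python
def scaled_energy_estimate_uj(usec_per_iter, from_exp, to_exp, watts, alpha=1.1):
    # Iteration time is assumed to scale as (exponent ratio)**alpha;
    # alpha = 1.1 is the scaling exponent used in the post above.
    return usec_per_iter * (to_exp / from_exp) ** alpha * watts

# gtx1080: 12700 us/iter (assumed near a 230M exponent), scaled to 62M, at 122 W
est = scaled_energy_estimate_uj(12700, 230e6, 62e6, 122)
print(round(est))  # ~366348 uJ/iter, matching the estimate above
```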
2022-08-10, 12:33  #3 
Einyen
Dec 2003
Denmark
2^{3}·3^{2}·47 Posts 
Google Colab:
Code:
Tesla V100-SXM2-16GB (device 0), gpuowl v6.11-380-g79ea0cc, 3.25M FFT LL (not PRP): 340 µs/iter
Tesla A100-SXM4-40GB (device 0), gpuowl v6.11-380-g79ea0cc, 3.25M FFT LL (not PRP): 205 µs/iter

nvidia-smi fails with "Failed to initialize NVML: Driver/library version mismatch". I do not want to reinstall the nvidia drivers and CUDA on a Colab instance, which would require a reboot as far as I can surmise; is there another way to see GPU power usage in Linux?

Using the "Thermal design power" of the cards gives an upper bound:
300 W * 340 µs/iter = 102,000 µJ/iter
250 W * 205 µs/iter = 51,250 µJ/iter

Last fiddled with by ATH on 2022-08-10 at 12:34
2022-08-10, 16:11  #4 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
15233_{8} Posts 
From DuckDuckGo re gpu power measure in Linux cmd line:
nvtop? (unless it also depends on nvml)
#6 at https://www.cyberciti.biz/opensourc...sticcommands/
https://www.maketecheasier.com/monit...diagpulinux/
probably not gpustat, which appears to depend on nvml: https://github.com/wookayin/gpustat/...ster/README.md
not nvapi, which is oriented to Windows: https://medium.com/devoopsandunive...scd174bf89311
not powertop, which is oriented to running on battery: https://www.tecmint.com/powertopmon...batteryusage/

The A100 power efficiency is impressive. Too bad buying it is a 5-digit $US price per unit. (Payback period relative to a number of Radeon VIIs of comparable total throughput would be decades.) Its 40GB ram would be helpful for large-exponent P-1. The V100 is probably comparable to the Radeon VII for efficiency. Thanks for contributing.

On Colab free, gpuowl v6.11-380-g79ea0cc, M63715567 LL, 3.5M fft, 3718 us/iter on a T4, 70 W TDP; scale to 70 * 3718 * 3.25/3.5 ~ 241,670 uJoule/iteration bound.

Last fiddled with by kriesel on 2022-08-10 at 17:08
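The T4 bound combines a TDP ceiling with a linear rescale of iteration time from the measured 3.5M fft down to the 3.25M fft used elsewhere in this thread; sketched:

```python
def tdp_bound_uj(tdp_watts, usec_per_iter, fft_measured_m, fft_target_m):
    # Upper bound on energy/iter: assume the card draws its full TDP, and
    # assume iteration time scales linearly with fft length.
    return tdp_watts * usec_per_iter * (fft_target_m / fft_measured_m)

# T4 on Colab free: 3718 us/iter at 3.5M fft, 70 W TDP, rescaled to 3.25M fft
print(round(tdp_bound_uj(70, 3718, 3.5, 3.25)))  # 241670 uJ/iter bound
```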
2022-08-11, 00:52  #5 
Einyen
Dec 2003
Denmark
2^{3}×3^{2}×47 Posts 
I found the command:
export LD_LIBRARY_PATH="/usr/lib64-nvidia"

which fixed nvidia-smi. The V100 is using 248-253 W and the A100 is using 321-323 W (and its TDP is actually 400 W).

V100: 340 µs/iter * 253 W = 86,020 µJ/iter (41,850,732 iter/kWh)
A100: 205 µs/iter * 323 W = 66,215 µJ/iter (54,368,345 iter/kWh)

Last fiddled with by ATH on 2022-08-11 at 00:57
2022-08-13, 13:23  #6 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7^{2}·139 Posts 
Core 2 Duo E8200 (no on-chip power instrumentation), prime95 v30.7b9, Windows Vista:
40.7 msec/iteration at 3.36M fft length, TDP 65 W; 40700 * 3.328M/3.36M * 65 = 2,620,305 uJoule/iteration. That's ~30 days for a PRP DC; at $0.13/kWh, ~$5.87/DC. A Radeon VII can do it for ~$0.21, ~3.5% of the power cost, in several hours.

The AMD GPU driver for RX 5xx ... Radeon VII etc. is apparently incompatible with the Core 2 E8200, based on past attempts here. Quadro 2000 drivers can be run on the Core 2 E8200, but those old GPUs are also inefficient in power use, and won't run gpuowl. So: a less efficient GPU, plus less efficient CUDALucas code, algorithmic limits, and less effective error checking.

RX 6900 XT: 267 W * 561 us/iter = 149,787 uJoule/iter at default settings; or, tuned downward in power, 582 us/iter at 237 W = 137,934 uJoule/iter, 48.5% more energy/iter than the Radeon VII. Much more power efficient than the gtx1080 or older GPUs, though.

Quadro 2000, CUDALucas, 3456K fft, 27.52 msec/iter, 60 W TDP: 3.25Mi/3.375Mi * 27520 usec * 60 W = 1,590,044 uJoule/iter.

Last fiddled with by kriesel on 2022-08-13 at 13:25
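The dollar figures can be reproduced from energy per iteration, test length, and electricity price. A sketch, assuming a ~62M exponent (roughly the DC range discussed in this thread, with about one squaring iteration per bit of the exponent):

```python
def cost_per_test_usd(uj_per_iter, iterations, usd_per_kwh):
    """Electricity cost of one primality test."""
    joules = uj_per_iter * 1e-6 * iterations   # total energy in joules
    kwh = joules / 3.6e6                       # 1 kWh = 3.6e6 J
    return kwh * usd_per_kwh

EXPONENT = 62_000_000  # assumed DC-wavefront exponent; not stated exactly above

print(round(cost_per_test_usd(2_620_305, EXPONENT, 0.13), 2))  # C2D E8200: ~5.87
print(round(cost_per_test_usd(92_880, EXPONENT, 0.13), 2))     # Radeon VII: ~0.21
```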
2022-08-14, 02:55  #7 
"Viliam Furík"
Jul 2018
Martin, Slovakia
1413_{8} Posts 
