mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing > GpuOwl

2022-06-24, 17:42   #2773
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
15310₈ Posts

Quote:
Originally Posted by Zhangrc
why does GPUowl do some iterations beyond the exponent?
GEC block size is kept constant through the run on an exponent, and is often highly composite. The exponent is necessarily prime, or there is no point to the PRP test. So GECblocksize > mod(exponent, GECblocksize) > 0. All gpuowl PRP iterations are guarded by GEC: additional iterations are computed to complete the last GEC block, running up to and past the exponent. (IIRC Preda has explained this before.)
For example, 77232917 / 1000 = 77232.917, so 77233 blocks of 1000 (77233000 iterations) would be used.
The overhead of GEC is small, but large enough that a larger-than-default block size, even though it computes a few more total iterations, can actually be more efficient if the reliability is high. (See end of this reference post.)
Code:
-block <value>     : PRP GEC block size, or LL iteration-block size. Must divide 10'000.
10,000 = 10⁴ = 2⁴ · 5⁴, implying legal block sizes of 2 4 5 8 10 16 20 25 40 50 80 100 125 200 250 400 500 625 1000 1250 2000 2500 5000, and perhaps 1 and 10,000. For 113032800 = 400 * 282582 > 113032481, blocksize > 113032800 - 113032481, so blocksize > 319; apparently you used block size 400.
Block size is determined at the start, stored in the save file, and used unchanged throughout the gpuowl run of the exponent. Saving 0.3% on the whole run by using block size 1000 instead of 400 more than pays for a possible additional ~1000 - 400 = 600 iterations at the end; 113M * 0.3% = 339,000 iterations' worth of time.
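
The block-size arithmetic in this post can be checked with a short Python sketch. It is only an illustration, not gpuowl's actual code: the function names are made up here, and it assumes the total iteration count is simply the exponent rounded up to the next whole multiple of the block size.

```python
def legal_block_sizes(limit=10_000):
    """Divisors of 10,000, per the -block help text ("Must divide 10'000")."""
    return [d for d in range(1, limit + 1) if limit % d == 0]


def total_iterations(exponent, block_size):
    """Iterations run when the last GEC block is completed past the exponent.

    Assumption: the count is the exponent rounded up to whole blocks.
    """
    blocks = -(-exponent // block_size)  # ceiling division
    return blocks * block_size


if __name__ == "__main__":
    # Example from the post: 77233 blocks of 1000.
    print(total_iterations(77232917, 1000))          # 77233000

    sizes = legal_block_sizes()
    print(len(sizes), sizes)                         # 25 divisors, 1 through 10000

    # Which legal block sizes end a run of M113032481 at iteration 113032800?
    print([b for b in sizes if total_iterations(113032481, b) == 113032800])  # [400]

    # Extra iterations past the exponent for two block sizes.
    for b in (400, 1000):
        print(b, total_iterations(113032481, b) - 113032481)  # 319, then 519
```

Under that assumption, 400 is the only legal block size that lands on 113032800, which agrees with the inference in the post.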

Mprime/prime95 does it differently, leaving unguarded the last few iterations past the last whole block up to the exponent. It also dynamically varies GEC block size based on error rate observed or iterations left IIRC.

Last fiddled with by kriesel on 2022-06-24 at 18:21

2022-07-30, 15:24   #2774
SyauqiA
Jul 2022
Indonesia
10₂ Posts
Is this the expected performance of my GPU?

Hello, I am a beginner user of gpuowl and a real beginner to the technical aspects of Prime95 and GPU computing. I've been using Prime95 for a few months with my CPU, but now I want to utilize my GPU for contributing, too. My laptop's GPU is AMD Radeon R7 M440. I've downloaded a build of version v7.2-93-ga5402c5-dirty from https://www.mersenneforum.org/showpo...4&postcount=30 . I tested it with M77936867 and it's performing at around 26800 us/iter :


Code:
20220730 21:36:28 GpuOwl VERSION v7.2-93-ga5402c5-dirty
20220730 21:36:28 GpuOwl VERSION v7.2-93-ga5402c5-dirty
20220730 21:36:28 config: -user SyauqiMA -cpu recomp-radeonr7-2-w2 -device 1 -log 20000 -save 5
20220730 21:36:28 config: -prp 77936867 -iters 60000 -proof 9 -use NO_ASM -log 20000
20220730 21:36:28 device 1, unique id ''
20220730 21:36:28 recomp-radeonr7-2-w2 77936867 FFT: 4M 1K:8:256 (18.58 bpw)
20220730 21:36:28 recomp-radeonr7-2-w2 77936867 OpenCL args "-DEXP=77936867u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=8u -DAMDGPU=1 -DMM_CHAIN=1u -DMM2_CHAIN=2u -DMAX_ACCURACY=1 -DWEIGHT_STEP=0.33644726404543274 -DIWEIGHT_STEP=-0.25174750481886216 -DIWEIGHTS={0,-0.44011820345520131,-0.37306474779553728,-0.29798072935699788,-0.21390437908665341,-0.11975874301407295,-0.014337887291734644,-0.44814572555075455,} -DFWEIGHTS={0,0.78609128957452257,0.5950610473469905,0.42446232150303748,0.2721098723818392,0.1360521812214803,0.014546452690911484,0.81207258201996746,} -DNO_ASM=1  -cl-std=CL2.0 -cl-finite-math-only "
20220730 21:36:31 recomp-radeonr7-2-w2 77936867 OpenCL compilation in 2.55 s
20220730 21:36:31 recomp-radeonr7-2-w2 77936867 maxAlloc: 0.0 GB
20220730 21:36:31 recomp-radeonr7-2-w2 77936867 You should use -maxAlloc if your GPU has more than 4GB memory. See help '-h'
20220730 21:36:31 recomp-radeonr7-2-w2 77936867 P1(0) 0 bits
20220730 21:36:31 recomp-radeonr7-2-w2 77936867 PRP starting from beginning
20220730 21:36:43 recomp-radeonr7-2-w2 77936867 OK         0 on-load: blockSize 400, 0000000000000003
20220730 21:36:43 recomp-radeonr7-2-w2 77936867 validating proof residues for power 9
20220730 21:36:43 recomp-radeonr7-2-w2 77936867 Proof using power 9
20220730 21:37:16 recomp-radeonr7-2-w2 77936867 OK       800   0.00% 1579c241dc63eca6 26859 us/it + check 10.83s + save 0.61s; ETA 24d 05:29
20220730 21:41:23 recomp-radeonr7-2-w2 77936867     10000 fc4f135f7cf4ad29 26843
20220730 21:46:03 recomp-radeonr7-2-w2 77936867 OK     20000   0.03% 3cd1bd9d5e09cbc5 26851 us/it + check 10.84s + save 0.63s; ETA 24d 05:09
20220730 21:50:32 recomp-radeonr7-2-w2 77936867     30000 c4e0ff35e3290d98 26844
20220730 21:52:08 recomp-radeonr7-2-w2 77936867 Stopping, please wait..
20220730 21:52:20 recomp-radeonr7-2-w2 77936867 OK     33600   0.04% dbfb036c7ae970f8 26854 us/it + check 10.84s + save 0.60s; ETA 24d 05:06
20220730 21:52:20 recomp-radeonr7-2-w2 Exiting because "stop requested"
20220730 21:52:20 recomp-radeonr7-2-w2 Bye
Is the us/iter really expected from my GPU? Or did I miss something in the setup, or is there something I did wrong? Because I think that speed is really slow for a GPU. Any help will be appreciated!
Oh, and another weird thing is that GPU-Z shows 0% GPU load, but the GPU and memory clocks are maxed out when doing the PRP.
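
As a sanity check on the log above, the ETA can be recomputed from the reported iteration time. This is a minimal Python sketch, not gpuowl's own ETA code: the regular expression is written against the "OK" progress lines shown in this post, the function names are illustrative, and it assumes the run ends when the iteration count reaches the exponent (ignoring check and save time).

```python
import re

# Matches "OK" progress lines of the form shown in the log:
#   <exponent> OK <iteration> <percent>% <res64> <us> us/it ...
PROGRESS = re.compile(
    r"(\d+) OK\s+(\d+)\s+[\d.]+%\s+[0-9a-f]{16}\s+(\d+) us/it"
)


def eta_seconds(log_line):
    """Remaining time, assuming the run ends at iteration == exponent."""
    m = PROGRESS.search(log_line)
    if m is None:
        raise ValueError("not a progress line")
    exponent, iteration, us_per_it = (int(g) for g in m.groups())
    return (exponent - iteration) * us_per_it / 1e6


def format_eta(seconds):
    """Format seconds as 'Nd HH:MM', truncating to whole minutes."""
    days, rest = divmod(int(seconds), 86_400)
    hours, rest = divmod(rest, 3_600)
    return f"{days}d {hours:02d}:{rest // 60:02d}"


if __name__ == "__main__":
    line = ("20220730 21:52:20 recomp-radeonr7-2-w2 77936867 OK     33600   "
            "0.04% dbfb036c7ae970f8 26854 us/it + check 10.84s + save 0.60s; "
            "ETA 24d 05:06")
    print(format_eta(eta_seconds(line)))  # 24d 05:06
```

So the roughly 24-day ETA in the log is just the iteration time multiplied by the remaining iterations; the log is internally consistent, and the open question is only whether ~26.85 ms per iteration is reasonable for this GPU.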


Thanks for your time and your help, and sorry if my English is a bit weird!
Attached image: Screenshot 2022-07-30 220150.png (22.4 KB)

2022-07-31, 11:16   #2775
LaurV
Romulan Interpreter
"name field"
Jun 2011
Thailand
2·5,059 Posts

Quote:
Originally Posted by tdulcet
That is not what I observed recently when benchmarking the Colab GPUs
Related to that cudaLucas benchmarking, the A100 fft files you distribute in the package are mistakenly called "Nvidia A100 blah blah....", which is not known by the cudaLucas guts, so as soon as Gugu has the benevolence to give the user the A100 (I say "the" because I think they have only one, which they distribute to all users in turns, considering how often we get it), then a few hours will be wasted to make the "A100 blah blah" files, unless the user realizes and renames the two files properly before starting the Colab Notebook. The newly made files are 99% similar to the ones you distribute (which is normal, timing will differ here and there for different runs, but the remaining FFTs should be the same).

So, please edit the file names in the package and remove the "Nvidia" part.

Last fiddled with by LaurV on 2022-07-31 at 11:20

2022-07-31, 14:01   #2776
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2³·857 Posts

Quote:
Originally Posted by SyauqiA
Hello, I am a beginner user of gpuowl and a real beginner to the technical aspects of Prime95 and GPU computing. I've been using Prime95 for a few months with my CPU, but now I want to utilize my GPU for contributing, too. My laptop's GPU is AMD Radeon R7 M440. I've downloaded a build of version v7.2-93-ga5402c5-dirty from https://www.mersenneforum.org/showpo...4&postcount=30 . I tested it with M77936867 and it's performing at around 26800 us/iter
...

Is the us/iter really expected from my GPU? Or did I miss something in the setup, or is there something I did wrong? Because I think that speed is really slow for a GPU. Any help will be appreciated!
Oh, and another weird thing is that GPU-Z shows 0% GPU load, but the GPU and memory clocks are maxed out when doing the PRP.

Thanks for your time and your help, and sorry if my English is a bit weird!
Welcome to the forum, and to GPU GIMPS computing. Your English seems to me quite good in the quoted post, and that is appreciated. You may find some of the reference info I've assembled over the past 4 years useful.
Congrats on getting gpuowl up and running.

Given that the DP/SP ratio for your GPU model is only 1/16, it might be more productively used in TF with mfakto than other computation types. https://www.techpowerup.com/gpu-spec...-r7-m440.c2851 shows it has about 1% of the DP performance of a Vega 20 Radeon VII GPU, and about 1.4% of the memory bandwidth. The datasheet indicates your GPU is an IGP (integrated graphics processor). There are slower GPUs than yours, and considerably faster. IGPs tend to have low power budgets and run slow. They also IIRC can have clock linkages to the CPU, & to system ram clock since they share system ram, & fan speed since they share a fan, being in the same chip package as the CPU.

To confirm gpuowl is running on your GPU, not some other OpenCL device in the system, pause the prime95 application, monitor CPU usage in Task Manager, and monitor any other GPUs in your system also with GPU-Z. If you're running Windows 10 or later, Task Manager may also show loading on the primary GPU. On an otherwise idle system, load increases you see after launching gpuowl and letting it load files and get going will show what system resources it's using.

Also run gpuowl-win -h >help.txt and review the part of its output that lists the OpenCL devices found on your system, checking that the device number for the R7 M440 is present and matches the one you specify to run gpuowl on the command line or in its config.txt. (That device listing by number is a feature I asked Mihai for, for scenarios like this.) Make sure you're running gpuowl on the OpenCL device intended, not a different GPU or IGP or CPU cores.
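
The device check described here could be scripted roughly as follows. This is a hedged Python sketch, not part of gpuowl: the function names are invented for illustration, and the line format it parses (a "-device <N>" header followed by "number : description" lines) is assumed from the device listing quoted later in this thread and may differ between gpuowl versions.

```python
import re

# One device entry, e.g. " 1  : Iceland-AMD Radeon R7 M440 AMD"
DEVICE_LINE = re.compile(r"^\s*(\d+)\s+:\s+(.+?)\s*$")


def list_devices(help_text):
    """Return {device_number: description} from the -device section."""
    devices = {}
    in_section = False
    for line in help_text.splitlines():
        if line.lstrip().startswith("-device"):
            in_section = True
            continue
        if in_section:
            m = DEVICE_LINE.match(line)
            if m is None:
                break  # first non-matching line ends the listing
            devices[int(m.group(1))] = m.group(2)
    return devices


def find_device(help_text, name_fragment):
    """Device numbers whose description contains name_fragment."""
    wanted = name_fragment.lower()
    return [n for n, desc in list_devices(help_text).items()
            if wanted in desc.lower()]


if __name__ == "__main__":
    sample = (
        "-device <N>        : select a specific device:\n"
        " 0  : Intel(R) UHD Graphics 620- not-AMD\n"
        " 1  : Iceland-AMD Radeon R7 M440 AMD\n"
    )
    print(find_device(sample, "R7 M440"))  # [1]
```

The number it reports is the one to compare against the -device value on the command line or in config.txt. Reading help.txt itself is left out on purpose, since the file's text encoding depends on which Windows shell did the redirect.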

There are benchmark tables for many GPU models online for primality testing (DP dependent) and TF (SP or INT dependent). I don't see your GPU model listed there. Please contribute benchmark data as directed at those pages, after verifying device mapping and use as above.

Re odd or missing sensor values, I've seen that combined use of Windows 7, AMD drivers, Windows remote desktop use, and GPU-Z result in some sensor readings being zeroed or blanked. I started reporting that issue to TechPowerUp beginning at v2.7.0, and subsequently also to AMD, but there's been no resolution despite numerous GPU-Z and driver updates. Perhaps you are experiencing something similar. I've attached a GPU-Z capture illustrating that with a discrete RX550 GPU.

Good luck and have fun!
Attached image: amd rx550 gpuowl 611-364 M658M PRP Win7.png (13.7 KB)

Last fiddled with by kriesel on 2022-07-31 at 14:03

2022-07-31, 16:19   #2777
SyauqiA
Jul 2022
Indonesia
2 Posts

Quote:
Originally Posted by kriesel
Welcome to the forum, and to GPU GIMPS computing. Your English seems to me quite good in the quoted post, and that is appreciated. You may find some of the reference info I've assembled over the past 4 years useful.
Congrats on getting gpuowl up and running.

Given that the DP/SP ratio for your GPU model is only 1/16, it might be more productively used in TF with mfakto than other computation types. https://www.techpowerup.com/gpu-spec...-r7-m440.c2851 shows it has about 1% of the DP performance of a Vega 20 Radeon VII GPU, and about 1.4% of the memory bandwidth. The datasheet indicates your GPU is an IGP (integrated graphics processor). There are slower GPUs than yours, and considerably faster. IGPs tend to have low power budgets and run slow. They also IIRC can have clock linkages to the CPU, & to system ram clock since they share system ram, & fan speed since they share a fan, being in the same chip package as the CPU.

To confirm gpuowl is running on your GPU, not some other OpenCL device in the system, pause the prime95 application, monitor CPU usage in Task Manager, and monitor any other GPUs in your system also with GPU-Z. If you're running Windows 10 or later, Task Manager may also show loading on the primary GPU. On an otherwise idle system, load increases you see after launching gpuowl and letting it load files and get going will show what system resources it's using.

Also run gpuowl-win -h >help.txt and review the part of its output that lists the OpenCL devices found on your system, checking that the device number for the R7 M440 is present and matches the one you specify to run gpuowl on the command line or in its config.txt. (That device listing by number is a feature I asked Mihai for, for scenarios like this.) Make sure you're running gpuowl on the OpenCL device intended, not a different GPU or IGP or CPU cores.

There are benchmark tables for many GPU models online for primality testing (DP dependent) and TF (SP or INT dependent). I don't see your GPU model listed there. Please contribute benchmark data as directed at those pages, after verifying device mapping and use as above.

Re odd or missing sensor values, I've seen that combined use of Windows 7, AMD drivers, Windows remote desktop use, and GPU-Z result in some sensor readings being zeroed or blanked. I started reporting that issue to TechPowerUp beginning at v2.7.0, and subsequently also to AMD, but there's been no resolution despite numerous GPU-Z and driver updates. Perhaps you are experiencing something similar. I've attached a GPU-Z capture illustrating that with a discrete RX550 GPU.

Good luck and have fun!

Thank you for your response!
I will definitely check out your reference info thread.


In my first post, I have already confirmed my gpuowl is running with the right GPU.
This is my device list from the help file :
Code:
-device <N>        : select a specific device:
 0  : Intel(R) UHD Graphics 620- not-AMD
 1  : Iceland-AMD Radeon R7 M440 AMD
and this is my config.txt file :
Code:
-user SyauqiMA -cpu recomp-radeonr7-2-w2 -device 1 -log 20000 -save 5
Just to clarify, I am using Windows 10.

But I forgot to include another important piece of information in my last post (I'm sorry!). In addition to the 0% GPU load in GPU-Z, the CPU and GPU usage for gpuowl in Task Manager is also mostly 0 (I've added screenshots of those), even though my fan is spinning hard. My overall CPU and GPU usage while running gpuowl (with no other demanding software running) stays at idle level, with just Firefox running, but GPU load shows up without problems when I open a game. Is this perhaps an issue with my system's settings?

Thank you for your time, and in the meantime maybe I will try out mfakto.
Attached images: Screenshot 2022-07-31 215116.png (25.1 KB), Screenshot 2022-07-31 230830.png (20.9 KB)

2022-07-31, 18:18   #2778
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2³×857 Posts

I conclude based on the following attachment from a Windows 10 system with an IGP as primary graphics, and 3 discrete GPUs each running full out per GPU-Z, that Win10 Task Manager does not show OpenCL or CUDA GIMPS GPU loads, only or primarily Windows display activity which in this case goes on the IGP primary GPU (even for remote desktop usage). Supporting GPUs in Task Manager is a recent addition and I think it is incomplete. You'll note there is no "OpenCL" or "CUDA" field in the GPU Task Manager display. Gpuowl uses OpenCL heavily, and mfaktc uses CUDA. (But Windows 7 Task Manager provides no info about GPU usage at all.)

On my example system, only the one GPU with Windows display duties normally shows loading in Task Manager:
0 IGP HD4600 Windows GUI display duties only, displays light loading in Task Manager continuously
1 Radeon VII gpuowl with normal timings & normal GPU-Z display of load, but usually 0 load shown in Task Manager (occasionally shows a little "Copy" activity)
2 RX550 gpuowl with normal timings & normal GPU-Z display of load, but 0 load shown in Task Manager
3 GTX1650 Super mfaktc with normal timings & normal GPU-Z display of load, but 0 load shown in Task Manager

Some proxies for discrete GPU compute loading are temperature rise, or fan speed % or RPM, or power draw, which may display in GPU-Z. But only if the GPU hardware and driver support the sensors, and GPU-Z supports that functionality for the GPU model, and maybe other requirements too.

Try some other utilities. For CUDA supporting GPUs, nvidia-smi is a useful command line utility that can display loading & power use. I don't know of an equivalent command line utility for AMD on Windows, although some driver installs include a graphical software utility that can be used to adjust memory clock, max GPU clock, and display some sensors too.
Look in the Windows Start menu for something like "AMD Software Adrenalin Edition". I think we may only get that utility by downloading and installing the AMD driver package from the AMD website, while Windows Device Manager updates may get only the driver.
CPUID HWMonitor also supports some GPU sensors on some GPU models, and a lot more.

Some earlier GPUs or systems did not offer variable fan speed control; just on/off, or always-on.

Gpuowl CPU use is common during startup, orderly exit, computing GCDs for P-1, generating proof files, validating proof residues on resumption, etc. During PRP iterations, except for certain housekeeping (console & log output, saving interim progress, doing the GEC) gpuowl CPU use should be zero on AMD GPUs. (In some Windows environments gpuowl on NVIDIA will occupy one CPU core continuously. -yield addressed that on Windows 7 but apparently does not on Windows 10. Not an issue for your AMD GPU.)

FYI, it's preferred to quote only the portions of a previous post to which we are responding with comments or questions, not the whole thing. It matters more when quoting longer posts like some of mine or LaurV's etc. Saves forum database space and readers' time.
Attached image: task manager gpu support seems incomplete.png (96.3 KB)

Last fiddled with by kriesel on 2022-07-31 at 18:37

2022-08-04, 00:48   #2779
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2³×857 Posts

Windows 11 Task Manager sees mfaktc activity on a GPU, but not gpuowl on the same GPU at the same time.
The attached image shows mfaktc running on a GTX 1050 Ti, with 3D at 100%. Both that and the overall GPU usage indicated drop to 90% after gpuowl v6.11-380 is also launched on the same GPU and its OpenCL compile has completed, so that PRP iterations start.
Attached image: win11-mfaktc-taskmgr.png (143.1 KB)

2022-08-09, 12:26   #2780
tdulcet
"Teal Dulcet"
Jun 2018
47₁₆ Posts

Quote:
Originally Posted by LaurV
Related to that cudaLucas benchmarking, the A100 fft files you distribute in the package are mistakenly called "Nvidia A100 blah blah....", which is not known by the cudaLucas guts, so as soon as Gugu has the benevolence to give the user the A100 (I say "the" because I think they have only one, which they distribute to all users in turns, considering how often we get it), then a few hours will be wasted to make the "A100 blah blah" files, unless the user realizes and renames the two files properly before starting the Colab Notebook. The newly made files are 99% similar to the ones you distribute (which is normal, timing will differ here and there for different runs, but the remaining FFTs should be the same).

So, please edit the file names in the package and remove the "Nvidia" part.
Sorry, I did not notice your post sooner, as I only occasionally check this thread. Next time please post in our dedicated thread, as this is unrelated to GpuOwl.

Interesting, thanks for letting us know! As you may have seen in this post, at the same time I did that benchmarking, I also regenerated all the optimization files using twice as many iterations and added them for the A100 GPU, where I used four times the default number of iterations. These optimization files also cover a much larger range of FFT lengths (1K to 32768K) compared to what our GPU notebook currently does by default when the files do not already exist (1024K to 8192K). I did this on Google Cloud, as we had some credits that were expiring. The other five GPUs had identical names on Google Cloud as on Colab, so it is weird that the A100 is different. Daniel and I have never been lucky enough to actually get an A100 GPU on Colab, so we have not been able to test this. Anyway, on Google Cloud it was called NVIDIA A100-SXM4-40GB, so I just pushed an update to rename those two files using just A100-SXM4-40GB. See here for the changes.

If you are interested, I attached a file with the timings on the A100 GPU for FFT lengths from 1M to 32M in GpuOwl for both the master and v6 branches, which could be compared to those respective CUDALucas optimization files.
Attached file: A100 bench.txt (18.9 KB)

2022-08-09, 14:36   #2781
LaurV
Romulan Interpreter
"name field"
Jun 2011
Thailand
2×5,059 Posts

Thanks. At the time I was posting that, I also figured out that those files had more work put into them, so I ended up renaming them by hand too (i.e. ignoring my files, and keeping yours, with the "Nvidia" part deleted from the name). Currently I am getting an A100 every 3 or 4 days (with 4 sessions at the same time, one of the four gets the A100, the rest either V100 or P100). But yeah, as I was telling Chris in the colab thread, this is a paid account, and, I believe, more important: this is the Singapore center, which seems not as "stressed" as the US ones. Gugu allocates resources "geographically", and as I am in Thailand, I mostly get the Singapore clocks... Not many people want to "colab" in the area.

This is what the A100 (currently running) looks like, compared with the V100.

Attached images: a100.PNG (89.4 KB), v100.PNG (38.4 KB)


(edit: of course, these posts can be moved to the colab threads, sorry for the offtopic; I am in a bit of a hurry now, I may move them myself later if nobody does it in the meantime)

Last fiddled with by LaurV on 2022-08-09 at 14:42

2022-09-07, 22:47   #2782
Xyzzy
Aug 2002
8511₁₀ Posts

6800 XT

(This is with the "quiet" BIOS enabled and the default fan curve.)
Code:
20220907 16:49:20 GpuOwl VERSION v7.2-93-ga5402c5-dirty
20220907 16:49:20 Note: not found 'config.txt'
20220907 16:49:20 config: -prp 77936867 -iters 1000000 
20220907 16:49:20 device 0, unique id ''
20220907 16:49:20 gfx1030-0 77936867 FFT: 4M 1K:8:256 (18.58 bpw)
20220907 16:49:20 gfx1030-0 77936867 OpenCL args "-DEXP=77936867u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=8u -DAMDGPU=1 -DMM_CHAIN=1u -DMM2_CHAIN=2u -DMAX_ACCURACY=1 -DWEIGHT_STEP=0.33644726404543274 -DIWEIGHT_STEP=-0.25174750481886216 -DIWEIGHTS={0,-0.44011820345520131,-0.37306474779553728,-0.29798072935699788,-0.21390437908665341,-0.11975874301407295,-0.014337887291734644,-0.44814572555075455,} -DFWEIGHTS={0,0.78609128957452257,0.5950610473469905,0.42446232150303748,0.2721098723818392,0.1360521812214803,0.014546452690911484,0.81207258201996746,}  -cl-std=CL2.0 -cl-finite-math-only "
20220907 16:49:22 gfx1030-0 77936867 OpenCL compilation in 1.74 s
20220907 16:49:22 gfx1030-0 77936867 maxAlloc: 0.0 GB
20220907 16:49:22 gfx1030-0 77936867 You should use -maxAlloc if your GPU has more than 4GB memory. See help '-h'
20220907 16:49:22 gfx1030-0 77936867 P1(0) 0 bits
20220907 16:49:22 gfx1030-0 77936867 PRP starting from beginning
20220907 16:49:22 gfx1030-0 77936867 OK         0 on-load: blockSize 400, 0000000000000003
20220907 16:49:22 gfx1030-0 77936867 validating proof residues for power 8
20220907 16:49:22 gfx1030-0 77936867 Proof using power 8
20220907 16:49:23 gfx1030-0 77936867 OK       800   0.00% 1579c241dc63eca6  657 us/it + check 0.29s + save 0.08s; ETA 14:13
20220907 16:49:29 gfx1030-0 77936867     10000 fc4f135f7cf4ad29  659
20220907 16:49:36 gfx1030-0 77936867     20000 3cd1bd9d5e09cbc5  660
20220907 16:49:43 gfx1030-0 77936867     30000 c4e0ff35e3290d98  661
20220907 16:49:49 gfx1030-0 77936867     40000 dffe1b1b0d748128  662
20220907 16:49:56 gfx1030-0 77936867     50000 52e286945371ed29  662
20220907 16:50:02 gfx1030-0 77936867     60000 0945da4dc08bdd95  662
20220907 16:50:09 gfx1030-0 77936867     70000 7131fa4eb77f4bb2  663
20220907 16:50:16 gfx1030-0 77936867     80000 8d76071d27ee4221  663
20220907 16:50:22 gfx1030-0 77936867     90000 0bacff453b2f470e  664
20220907 16:50:29 gfx1030-0 77936867    100000 6d7296b9e2830f50  664
20220907 16:50:36 gfx1030-0 77936867    110000 8cbfd4435622bda7  664
20220907 16:50:42 gfx1030-0 77936867    120000 79ae5dad855057ad  664
20220907 16:50:49 gfx1030-0 77936867    130000 50c97bcbf876231f  664
20220907 16:50:56 gfx1030-0 77936867    140000 e1db15f897271496  664
20220907 16:51:02 gfx1030-0 77936867    150000 127631386c6a9b17  664
20220907 16:51:09 gfx1030-0 77936867    160000 25b7b6206fc6f085  665
20220907 16:51:15 gfx1030-0 77936867    170000 416816b0d9f4bba8  664
20220907 16:51:22 gfx1030-0 77936867    180000 6bee5d054f770861  665
20220907 16:51:29 gfx1030-0 77936867    190000 f37f068f014b18a0  665
20220907 16:51:36 gfx1030-0 77936867 OK    200000   0.26% f0b04b45b0855bd2  665 us/it + check 0.29s + save 0.08s; ETA 14:21
20220907 16:51:42 gfx1030-0 77936867    210000 43eb2fc2424d8aac  664
20220907 16:51:49 gfx1030-0 77936867    220000 a1081c6dc6a7689f  665
20220907 16:51:56 gfx1030-0 77936867    230000 2387818d3d3d0d01  666
20220907 16:52:02 gfx1030-0 77936867    240000 a9deae45055e5216  665
20220907 16:52:09 gfx1030-0 77936867    250000 89fcab15218f7cac  665
20220907 16:52:16 gfx1030-0 77936867    260000 55da428da4cf928a  665
20220907 16:52:22 gfx1030-0 77936867    270000 dc349756c5f05abf  665
20220907 16:52:29 gfx1030-0 77936867    280000 3564af24488443f4  666
20220907 16:52:36 gfx1030-0 77936867    290000 63fb281a06f78198  665
20220907 16:52:42 gfx1030-0 77936867    300000 990aa099aad5bf9c  665
20220907 16:52:49 gfx1030-0 77936867    310000 61e14297a2cc0096  675
20220907 16:52:56 gfx1030-0 77936867    320000 37e630c5f956cf8a  665
20220907 16:53:02 gfx1030-0 77936867    330000 66ccde7e28ce2b33  665
20220907 16:53:09 gfx1030-0 77936867    340000 d4a7cff61adaa84e  665
20220907 16:53:16 gfx1030-0 77936867    350000 d57b659cc1ca2753  665
20220907 16:53:22 gfx1030-0 77936867    360000 992df79b843f90de  665
20220907 16:53:29 gfx1030-0 77936867    370000 10b0b99eba490a1e  665
20220907 16:53:36 gfx1030-0 77936867    380000 56b1e40cd2666109  665
20220907 16:53:42 gfx1030-0 77936867    390000 ecccd874a8a0d961  665
20220907 16:53:49 gfx1030-0 77936867 OK    400000   0.51% c03f94396a5aa29e  665 us/it + check 0.29s + save 0.08s; ETA 14:20
20220907 16:53:56 gfx1030-0 77936867    410000 bf44242560060429  665
20220907 16:54:03 gfx1030-0 77936867    420000 7f656173fb521927  665
20220907 16:54:09 gfx1030-0 77936867    430000 b6c7618e60b71bb7  665
20220907 16:54:16 gfx1030-0 77936867    440000 200cbc3b887fcfb2  665
20220907 16:54:23 gfx1030-0 77936867    450000 4811fce58ccc9cab  665
20220907 16:54:29 gfx1030-0 77936867    460000 15cb337858fe7eb1  665
20220907 16:54:36 gfx1030-0 77936867    470000 ad96b1f48c8bf011  665
20220907 16:54:43 gfx1030-0 77936867    480000 93e184ebad6d3cd4  665
20220907 16:54:49 gfx1030-0 77936867    490000 70160baec6378071  665
20220907 16:54:56 gfx1030-0 77936867    500000 591eecd8448042ad  666
20220907 16:55:02 gfx1030-0 77936867    510000 8afd187213816739  665
20220907 16:55:09 gfx1030-0 77936867    520000 74e993308f33ac5b  665
20220907 16:55:16 gfx1030-0 77936867    530000 57c0f9c504186096  665
20220907 16:55:22 gfx1030-0 77936867    540000 bcb42100a7c391ad  665
20220907 16:55:29 gfx1030-0 77936867    550000 ff6f9c39e0347941  665
20220907 16:55:36 gfx1030-0 77936867    560000 9a740c005539ec84  665
20220907 16:55:42 gfx1030-0 77936867    570000 a82132fa0e95b673  665
20220907 16:55:49 gfx1030-0 77936867    580000 200fc0c1347e2854  666
20220907 16:55:56 gfx1030-0 77936867    590000 48edfb50a88114d1  665
20220907 16:56:03 gfx1030-0 77936867 OK    600000   0.77% b9decd65ca71b629  665 us/it + check 0.29s + save 0.08s; ETA 14:18
20220907 16:56:10 gfx1030-0 77936867    610000 15a66f18e764db77  675
20220907 16:56:16 gfx1030-0 77936867    620000 e35adf8422264428  665
20220907 16:56:23 gfx1030-0 77936867    630000 fd5849d82defeb62  665
20220907 16:56:29 gfx1030-0 77936867    640000 45736f195df56074  665
20220907 16:56:36 gfx1030-0 77936867    650000 947e1ab8b4dc318d  665
20220907 16:56:43 gfx1030-0 77936867    660000 acec19d8201699ce  666
20220907 16:56:49 gfx1030-0 77936867    670000 a513b29e0a46fc34  665
20220907 16:56:56 gfx1030-0 77936867    680000 8cccb6612f0f52c0  666
20220907 16:57:03 gfx1030-0 77936867    690000 1518615a0cf6e034  665
20220907 16:57:09 gfx1030-0 77936867    700000 310e1a3255551a62  665
20220907 16:57:16 gfx1030-0 77936867    710000 591e3891de6d29a1  665
20220907 16:57:23 gfx1030-0 77936867    720000 f61357a1e2bf3240  665
20220907 16:57:29 gfx1030-0 77936867    730000 9b186483d53f0d88  665
20220907 16:57:36 gfx1030-0 77936867    740000 7a12be1526255968  665
20220907 16:57:43 gfx1030-0 77936867    750000 b4f2e6d7a6a6615c  665
20220907 16:57:49 gfx1030-0 77936867    760000 0a1ba085f801c4c5  666
20220907 16:57:56 gfx1030-0 77936867    770000 9d7bbfb9f93deb22  666
20220907 16:58:03 gfx1030-0 77936867    780000 8600d4ab710627c0  665
20220907 16:58:09 gfx1030-0 77936867    790000 9a71d05fe3b961ff  665
20220907 16:58:16 gfx1030-0 77936867 OK    800000   1.03% 21ebf3636148f663  665 us/it + check 0.29s + save 0.08s; ETA 14:15
20220907 16:58:23 gfx1030-0 77936867    810000 52386f2e497aa955  665
20220907 16:58:30 gfx1030-0 77936867    820000 c481c083465a7b07  665
20220907 16:58:36 gfx1030-0 77936867    830000 96aea2b137ce50af  665
20220907 16:58:43 gfx1030-0 77936867    840000 84be20a8d22048ad  665
20220907 16:58:50 gfx1030-0 77936867    850000 78f3bbdeabd2bc02  665
20220907 16:58:56 gfx1030-0 77936867    860000 10c80cdb7eebced2  665
20220907 16:59:03 gfx1030-0 77936867    870000 b4cbb9d09f0fd89f  665
20220907 16:59:10 gfx1030-0 77936867    880000 5a5645c14acb870e  665
20220907 16:59:16 gfx1030-0 77936867    890000 2fc993d10648bd75  665
20220907 16:59:23 gfx1030-0 77936867    900000 518b906b58c64c40  665
20220907 16:59:29 gfx1030-0 77936867    910000 431308ea83b6c0ce  665
20220907 16:59:36 gfx1030-0 77936867    920000 62f61b11720d5394  676
20220907 16:59:43 gfx1030-0 77936867    930000 febec5e60548f928  665
20220907 16:59:50 gfx1030-0 77936867    940000 45c594641c8b2e5f  665
20220907 16:59:56 gfx1030-0 77936867    950000 ec9dfef8e2deb9bf  665
20220907 17:00:03 gfx1030-0 77936867    960000 259e2b4b65b85c03  665
20220907 17:00:09 gfx1030-0 77936867    970000 abd30602b41f18ec  665
20220907 17:00:16 gfx1030-0 77936867    980000 16471b2f3972a667  665
20220907 17:00:23 gfx1030-0 77936867    990000 1a6a0fc93f328b02  666
20220907 17:00:29 gfx1030-0 77936867 Stopping, please wait..
20220907 17:00:30 gfx1030-0 77936867 OK   1000000   1.28% 9bf9d9e6bff4286e  665 us/it + check 0.29s + save 0.13s; ETA 14:13
20220907 17:00:30 gfx1030-0 Exiting because "stop requested"
20220907 17:00:30 gfx1030-0 Bye
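
For comparison with the R7 M440 log earlier in the thread, which ran the same exponent at the same 4M FFT, the two reported iteration times can be converted into throughput and a speed ratio. This is a small Python sketch with illustrative function names; the only inputs are the us/it figures printed in the two logs (26854 and 665), and the M440 run used -use NO_ASM, so the comparison is rough.

```python
def iterations_per_second(us_per_iter):
    """Throughput implied by a per-iteration time in microseconds."""
    return 1_000_000 / us_per_iter


def speed_ratio(slow_us_per_iter, fast_us_per_iter):
    """How many times faster the second device iterates than the first."""
    return slow_us_per_iter / fast_us_per_iter


if __name__ == "__main__":
    r7_m440 = 26854   # us/it from the R7 M440 log
    rx_6800xt = 665   # us/it from the 6800 XT log above
    print(f"R7 M440: {iterations_per_second(r7_m440):.1f} it/s")
    print(f"6800 XT: {iterations_per_second(rx_6800xt):.1f} it/s")
    print(f"ratio:   {speed_ratio(r7_m440, rx_6800xt):.1f}x")
```

On these figures the 6800 XT is about 40 times faster on this exponent.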
Attached images: a.gif (26.5 KB), b.gif (16.6 KB)

2022-09-22, 22:24   #2783
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2³·857 Posts
Anomalously low power operation of Radeon VII on Windows 10 in gpuowl

System contains a Celeron G1840 on an ASRock H81 Pro BTC R2.0 motherboard, maxed out at 16 GiB system RAM, and 5 Radeon VII GPUs. This is powered by a 1600 W gold-rated power supply. The intention is to over-spec the supply so that it operates above but near its maximum efficiency point. (In my experience, pushing nominal 15 A power cords to near their rated capacity is risky. Plugs may overheat, discolor, and warp.)

Two of the GPUs are connected via a 1-to-4-way PCIe expander card. It may be these two that occasionally have anomalous GPU -> Host read errors and subsequent anomalous clock episodes. Other GPUs in the system seem to be unaffected, or rarely affected. All are connected via 1x to 16x PCIe extenders.
(I've also seen the GPU -> Host error occur with a 5700XT or Radeon VII on different systems with no expander card.)

Radeon VII idle power is ~19W at 21 MHz.
After a repeated GPU -> Host read failure, GPU clock rate drops to 570 MHz. Memory clock is unaffected and may remain overclocked. GPU clock rate is no longer controllable via the AMD software (until manual intervention as below).

A single gpuowl session launched after a GPU -> Host read failure (either manually or by progression in a batch script) runs at 570 MHz GPU clock and 49 W on the affected GPU.
Iteration time at 3.25M FFT size in those conditions is ~1284 us/iter. That's ~62,916 uJoule/iter, ~68% of the normal-clock energy per iteration given below.
To restore normal clocks, disable and reenable the affected GPU via Device Manager. (Or in some cases, if the driver will not reload on the affected GPU, perform a controlled shutdown of GPU and CPU applications followed by a system reboot to reset behavior to normal.)
(The AMD software for monitoring or setting clocks etc. typically crashes at device reenable.)
Relaunch the AMD software for adjusting clock parameters etc. Custom settings support power level limits in the range +-20% of nominal for Radeon VII.
Resuming the same session on the same GPU, FFT size, and exponent then shows ~516 us/iter at ~180 W. That's 92,880 uJoule/iter.
A lower watts times us/iter product indicates higher power efficiency.
The 2.49 times slower iteration time at low power was 1.476 times more energy efficient per iteration (~32% energy savings per unit of computation).
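
For anyone wanting to repeat the comparison with their own readings, the arithmetic above can be reproduced with a short Python sketch. Only the four measured values (49 W, 1284 us/iter, 180 W, 516 us/iter) come from this post; the function and variable names are made up for illustration.

```python
def energy_per_iter_uj(watts, us_per_iter):
    """Energy per iteration in microjoules (watts * microseconds = uJ)."""
    return watts * us_per_iter

# Measured values from this post (Radeon VII, 3.25M FFT).
LOW_W, LOW_US = 49, 1284        # stuck at 570 MHz after the read failure
NORMAL_W, NORMAL_US = 180, 516  # after disabling and reenabling the device

low = energy_per_iter_uj(LOW_W, LOW_US)
normal = energy_per_iter_uj(NORMAL_W, NORMAL_US)

slowdown = LOW_US / NORMAL_US   # how much longer each iteration takes
efficiency_gain = normal / low  # energy ratio, normal clock vs low clock
savings = 1 - low / normal      # fraction of energy saved per iteration

print(f"low clock:    {low:,} uJ/iter")
print(f"normal clock: {normal:,} uJ/iter")
print(f"slowdown {slowdown:.2f}x, efficiency gain {efficiency_gain:.3f}x, "
      f"savings {savings:.0%}")
```

Substituting other wattage and us/iter readings gives the same comparison for a different card or clock setting.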

I've also seen ~799MHz GPU clock occur, less frequently, but don't have details for that.

These results hint at possible power efficiency improvements by lowering GPU clock limits for all GPUs in the system, and perhaps by running more GPUs for greater total throughput within the same total power limit, at the cost of increased latency for any given work item.
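
A rough sketch of that tradeoff, in Python. The 900 W GPU power budget (5 x 180 W) is a hypothetical figure chosen for illustration, and the sketch assumes every GPU behaves like the one measured above. It ignores host and PSU overhead, available PCIe slots, and hardware cost, so treat the result as an upper bound rather than a prediction.

```python
def fleet_throughput(budget_w, watts_per_gpu, us_per_iter):
    """Return (GPU count, total iterations/second) that fit in a power budget."""
    n_gpus = int(budget_w // watts_per_gpu)
    iters_per_s = n_gpus * 1e6 / us_per_iter
    return n_gpus, iters_per_s

BUDGET_W = 900  # hypothetical: five GPUs at the ~180 W normal operating point

n_normal, rate_normal = fleet_throughput(BUDGET_W, 180, 516)
n_low, rate_low = fleet_throughput(BUDGET_W, 49, 1284)
ratio = rate_low / rate_normal

print(f"normal clock: {n_normal} GPUs, {rate_normal:,.0f} iter/s total")
print(f"low clock:    {n_low} GPUs, {rate_low:,.0f} iter/s total")
print(f"throughput ratio (low/normal): {ratio:.2f}")
```

Each individual exponent would still take about 2.49 times longer at the low clock; only the aggregate rate improves.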
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
