mersenneforum.org  

Old 2021-01-10, 01:56   #111
Xyzzy
 
"Mike"
Aug 2002

1111100101110₂ Posts

Possibly interesting:

gpuowl = 172W
mfakto = 202W
Old 2021-01-10, 02:51   #112
tServo
 
"Marv"
May 2009
near the Tannhäuser Gate

262₁₆ Posts

Quote:
Originally Posted by Xyzzy
Possibly interesting:

gpuowl = 172W
mfakto = 202W

Since GPUOWL uses FP64 computations, many threads will often be paused waiting for this resource, consuming little or no power.

MFAKTO primarily uses INT computations, so its threads tend to run full blast, consuming more power.
Old 2021-01-10, 10:46   #113
Viliam Furik
 
"Viliam Furík"
Jul 2018
Martin, Slovakia

182₁₆ Posts

It seems that mfakto only uses half of the compute units, 1920 of the total 3840.

I've tried to find information on the INT32 performance ratio, but it is hard to come by.
Old 2021-01-10, 14:40   #114
tServo
 
"Marv"
May 2009
near the Tannhäuser Gate

2·5·61 Posts

Quote:
Originally Posted by Viliam Furik
It seems that mfakto only uses half of the compute units, 1920 of the total 3840.

I've tried to find information on the INT32 performance ratio, but it is hard to come by.

Are you using a performance monitor to measure compute units in use?

MFAKTO was written 9 years ago, and some of its kernel launch parameters could perhaps use re-tuning, since newer GPUs have a different architecture.

Also, using half the compute units could be due to any GPU's Achilles heel: memory access stalls.
Old 2021-01-10, 16:05   #115
Viliam Furik
 
"Viliam Furík"
Jul 2018
Martin, Slovakia

386₁₀ Posts

Quote:
Originally Posted by Xyzzy
Code:
...
  number of multiprocessors 30 (1920 compute elements)
  clock rate                1815 MHz
...

The RX 6800 has 60 CUs and a total of 3840 compute cores.
Old 2021-01-10, 17:27   #116
axn
 
Jun 2003

2³·607 Posts

It is what OpenCL reports, not what the program "uses". Power consumption of 200W is consistent with all the CUs being used since the TDP is 250W.
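The numbers quoted in this thread can be checked with a little arithmetic. The sketch below is only an illustration: it assumes the "compute elements" figure is simply the reported unit count multiplied by a fixed 64, which the thread does not confirm. One plausible explanation for the halved unit count, also not confirmed here, is that the OpenCL runtime counts paired compute units as a single unit on this architecture.

```python
# Figures quoted in this thread.
reported_units = 30        # "number of multiprocessors" in the mfakto output
reported_elements = 1920   # "compute elements" in the mfakto output
actual_cus = 60            # RX 6800 compute units
actual_cores = 3840        # RX 6800 compute cores

# Elements per unit, as reported and as specified.
elements_per_reported_unit = reported_elements // reported_units
cores_per_cu = actual_cores // actual_cus

# Both ratios are the same, so the only discrepancy is the unit count itself.
undercount_factor = actual_cus / reported_units

# Power check: 202 W measured for mfakto against the 250 W TDP mentioned above.
power_fraction = 202 / 250

print(elements_per_reported_unit, cores_per_cu, undercount_factor, round(power_fraction, 3))
```

Both ratios come out to 64 per unit, and the measured draw is about 81% of the quoted TDP, which fits the reading that the count is a reporting artifact rather than half the chip sitting idle.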
Old 2021-01-10, 22:27   #117
Viliam Furik
 
"Viliam Furík"
Jul 2018
Martin, Slovakia

2·193 Posts

Quote:
Originally Posted by axn
It is what OpenCL reports, not what the program "uses". Power consumption of 200W is consistent with all the CUs being used since the TDP is 250W.

Oh, ok then. Thanks.

In that case, why is that so low? I have read it could have a 1:1 ratio of FP32 and INT32 operations per second, and it should have 16 TFLOPS of FP32.
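For reference, the 16 TFLOPS figure can be related to the core count and clock quoted earlier in the thread. This is a rough sketch, assuming the usual convention that each core retires one fused multiply-add (counted as 2 FLOPs) per clock; the function names are made up for illustration.

```python
def peak_fp32_tflops(cores, clock_mhz, flops_per_core_per_clock=2):
    """Theoretical peak FP32 throughput in TFLOPS (assumes one FMA per core per clock)."""
    return cores * flops_per_core_per_clock * clock_mhz * 1e6 / 1e12


def clock_mhz_for(tflops, cores, flops_per_core_per_clock=2):
    """Clock in MHz needed to reach a given theoretical peak."""
    return tflops * 1e12 / (cores * flops_per_core_per_clock) / 1e6


# 3840 cores at the 1815 MHz reported in the mfakto output above.
at_reported_clock = peak_fp32_tflops(3840, 1815)

# Clock that a 16 TFLOPS peak would imply for 3840 cores.
clock_for_16 = clock_mhz_for(16, 3840)

print(f"{at_reported_clock:.2f} TFLOPS at 1815 MHz")
print(f"{clock_for_16:.0f} MHz needed for 16 TFLOPS")
```

Under these assumptions the reported 1815 MHz gives roughly 13.9 TFLOPS, and 16 TFLOPS would need a clock of about 2083 MHz. Both are theoretical ceilings, not what a given kernel actually achieves.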
Old 2021-01-12, 16:28   #118
Xyzzy
 
"Mike"
Aug 2002

2×13×307 Posts

Run alone on an open-air test bench, each card runs at default speed with no errors.

When they are put in a case, close to each other, the top card gets errors. We are experimenting with reducing the power draw to get everything stable.

Unfortunately, there is no memory temperature reading. The errors occur (we think) when the junction temp gets around 90°C. The junction is rated for 110°C so we assume the memory is the culprit.

Probably most people will never notice an occasional video error in games but it is obviously an issue for compute tasks.

We put this system together with what we had lying around, so our cost ended up being pretty low.

CPU: Intel Celeron G5900 3.4 GHz Dual-Core Processor
CPU Cooler: Noctua NH-U12S 55 CFM CPU Cooler
Motherboard: Asus ROG STRIX Z490-F GAMING ATX LGA1200 Motherboard
Memory: Corsair Vengeance LPX 16 GB (2 x 8 GB) DDR4-2400 CL16 Memory
Memory: Corsair Vengeance LPX 16 GB (2 x 8 GB) DDR4-2400 CL16 Memory
Storage: Seagate Barracuda Compute 256 GB M.2-2280 NVME Solid State Drive
Video Card: Gigabyte Radeon RX 6800 16 GB Video Card
Video Card: Gigabyte Radeon RX 6800 16 GB Video Card
Case: Fractal Design Meshify C ATX Mid Tower Case
Power Supply: SeaSonic FOCUS Plus Platinum 750 W 80+ Platinum Certified Fully Modular ATX Power Supply


Attached Thumbnails: system.jpg (621.6 KB), c.png (26.7 KB)
Old 2021-01-14, 21:30   #119
Xyzzy
 
"Mike"
Aug 2002

17456₈ Posts

Over the past few days we have been dealing with numerous issues with the 6800 cards. We have isolated the problem to the memory on both cards. We have tested each card individually and in different systems.

The memory runs at ~2,000 MT/s. There is no way to tell the memory to run slower. Once either card heats up into the high 70s/low 80s (°C), it is only a matter of time before it either starts to throw errors or completely borks the system. When we say bork the system, it borks it so hard it resets the system's BIOS to defaults. (!)

We can clock them down so they run in the 60s but what is the point? They run slow at that speed, and even then sometimes they will hang. It just takes several hours to a day. We explored modding the cards' BIOS to lower the memory clock speed, but the BIOS is digitally signed so it can't be modified.

If it were just one card we would suspect that it was a defective card and we would warranty it. Both cards acting weird in multiple systems means, we think, there is a driver or design problem.

If we give the cards many work units, like 10K iterations of an FFT then another 10K of a different FFT, etc. (using a batch file), the cards will "get confused" eventually and hang.
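The alternating workload described above can be pictured as a simple schedule of work units. The sketch below is purely illustrative: the function name and the FFT labels are hypothetical, and the print call stands in for launching the actual program, whose command line is not given in this thread.

```python
from itertools import cycle, islice


def build_schedule(fft_sizes, iterations_per_unit, total_units):
    """Return (fft_size, iterations) work units, cycling through fft_sizes in order."""
    return [(fft, iterations_per_unit) for fft in islice(cycle(fft_sizes), total_units)]


# Hypothetical FFT labels; 10,000 iterations per unit as described in the post.
schedule = build_schedule(["4M", "5M", "6M"], 10_000, 6)

for fft, iters in schedule:
    # A real harness would start the compute program here and check its exit status.
    print(f"run {iters} iterations at FFT {fft}")
```

Logging which unit was running when a hang occurs would help show whether failures follow a particular FFT size or simply accumulate over time.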

There are only two available drivers for them. We tried both. We used DDU to purge the systems of all video/audio drivers prior to installing them.

We never got to test them in Linux. We never got past the initial "easy pointy-and-clicky Windows" stage.

Putting them in a case is a disaster. They fail on a test bench but in a case they fail even faster.

We cannot recommend the 6800 (reference style) at this point. We don't know if the drivers are not right or if the cards are not suitable for compute work or what.

Note that in games and synthetic benchmarks, the cards work fine. They never reset or bork the system even though they are running very hot at "stock" speeds. Above 50% or so the fans on the cards stop providing additional cooling and just get louder.

Our time is worth about a dollar an hour so we decided to cut our losses and we got rid of them. Life is too short to spend that much time on unreliable hardware. We now have an RTX 3070 installed which "just works". It is "slow" for gpuowl but it is great for our games.

Old 2021-01-15, 00:43   #120
M344587487
 
"Composite as Heck"
Oct 2017

2·3·127 Posts

That's a shame. I really hope that poor Windows drivers are to blame and not the hardware. Gaming overclocks can run much higher than compute work can tolerate and still be considered "stable", so I wouldn't be surprised if RDNA2 needs to be dialed back a bit for stability now that they've optimised for gaming. gpuowl does hammer the memory in a way gaming doesn't, so there's a chance the memory cooling solution is not fit for purpose. Judging by this video, the reference cooling is not ideal, although the die and RAM contact looks fine: https://www.youtube.com/watch?v=0s7bOaa6X9E
Old 2021-01-15, 02:50   #121
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

5²×7×53 Posts

Quote:
Originally Posted by Xyzzy
Above 50% or so the fans on the cards stop providing additional cooling and just get louder.

This! You put the dot on the ı (Unicode: Latin small letter dotless i).
Is the card blowing air in both directions (i.e., is the hot air blown back into the case)?
Photo/link to picture on web/vendor?
Does mfakto behave differently from gpuowl? You said the cards work well in games even though they get hotter. This may mean that the card has issues with the memory, either supply or bus/impedance, and cannot sustain heavier memory use. In that case mfakto should behave "gaming style" and be more reliable, so maybe you can use the cards for TF. Is the memory cooled by the same metal block, or by a separate one? Did you open the card to see if the pads for the memory are thicker, or of a different material/type/color/dryness/etc.? It may be that the memory chips sit lower than the GPU chip and the pads are thicker; like M34 said, it can be a memory-cooling issue.
Are there any water blocks available for it?

Last fiddled with by LaurV on 2021-01-15 at 02:55

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.