#1
"University student"
May 2021
Beijing, China
269 Posts
However, not every user here has a Radeon VII.
Many people, like me, run Prime95 on laptops or home computers. For safety and environmental reasons, we cannot run Prime95 24/7, so it takes us longer to finish assignments: months for 108M exponents, and about 1.5 years for 332M.
Last fiddled with by axn on 2021-07-25 at 13:01
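The arithmetic behind those run times is simple; below is a minimal sketch, where the per-iteration timings and hours-per-day figure are assumptions chosen to land near the quoted estimates, not measurements:

```python
# Back-of-envelope PRP run time: a PRP test of M(p) takes ~p squarings,
# so wall time scales with the exponent and the hours per day the
# machine is allowed to run. The ms/iter values are hypothetical laptop
# timings, not measurements.
def days_to_finish(exponent, ms_per_iter, hours_per_day):
    compute_seconds = exponent * ms_per_iter / 1000
    return compute_seconds / 3600 / hours_per_day

print(days_to_finish(108_000_000, 20, 8))        # ~75 days: "months"
print(days_to_finish(332_000_000, 40, 8) / 365)  # ~1.3 years
```

The larger exponent is assigned a slower per-iteration time because it needs a larger fft.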
#2
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3²×11×79 Posts
Quote:
None of what I post should be misconstrued as disparagement of the small-throughput user or their hardware. It's all welcome, as long as it does not interfere with orderly progress, and it all adds up.
#3
"Tucker Kao"
Jan 2020
Head Base M168202123
2⁴×5×11 Posts
Buy an AMD Threadripper 5970X and an Nvidia GeForce 3080 Ti, and exponents in the M332M range should finish within at most 3 weeks.
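For context, here is a back-of-envelope check of what that pace would require of the hardware (it says nothing about whether any particular GPU achieves it):

```python
# A PRP test of M(p) needs ~p squarings, so finishing a 332M exponent
# in 3 weeks bounds the allowable time per iteration.
exponent = 332_000_000
seconds_in_3_weeks = 21 * 86_400
required_ms_per_iter = seconds_in_3_weeks / exponent * 1000
print(f"needs <= {required_ms_per_iter:.2f} ms/iter")  # ~5.46 ms/iter
```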
#4
"David Kirkby"
Jan 2021
Althorne, Essex, UK
2·229 Posts
I'm no expert on GPUs, but I thought that if you were going to buy a GPU to test exponents, the CPU does not need to be very powerful - the GPU is doing all the work. Obviously, if one is not constrained by money, heat, or power consumption, then buy the best of everything. But if one is trying to build a good-performance system without spending a fortune, then buying both a high-end CPU and a high-end GPU would be unnecessary.
Last fiddled with by drkirkby on 2021-07-24 at 21:20 |
#5
"Tucker Kao"
Jan 2020
Head Base M168202123
1101110000₂ Posts
Quote:
I'm waiting to hear from another user who already bought a GeForce 3080 Ti about the details of its heat output and GHz-days/day.

I use the CPU of my current old machine to run the P-1 factoring of all M168,***,*23 exponents with B1 = 1,000,000 and B2 = 40,000,000, which takes around 20 hours each. Running the GPU of the same old machine to trial factor those exponents up to 2^78, it seems to me that both can work at the same time without significant slowdowns.

When I get my new PC, which will likely be after Nov 21, 2021 (the Threadripper 5970X release date), I can perform 2 PRP tests at the same time, 1 on the CPU and 1 on the GPU.
Last fiddled with by tuckerkao on 2021-07-24 at 23:18
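As a rough sanity check on the ~20 hours per exponent, the P-1 workload implied by those bounds can be estimated with standard approximations: stage 1 is about 1.44·B1 squarings, since log₂(lcm(1..B1)) ≈ B1/ln 2, and a simple stage 2 costs roughly one multiplication per prime in (B1, B2]. The per-operation timing below is an assumed placeholder for the old CPU, not a measurement:

```python
# Hedged estimate of P-1 work for B1 = 1,000,000, B2 = 40,000,000.
import math

B1, B2 = 1_000_000, 40_000_000
stage1_squarings = B1 / math.log(2)                  # ~1.44e6 squarings
stage2_muls = B2 / math.log(B2) - B1 / math.log(B1)  # ~pi(B2) - pi(B1)
total_ops = stage1_squarings + stage2_muls           # ~3.7e6

ms_per_op = 20  # assumed per-operation timing on the old CPU
hours = total_ops * ms_per_op / 1000 / 3600
print(f"~{total_ops / 1e6:.1f}M ops, ~{hours:.0f} hours")  # ~20 hours
```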
#6
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3²×11×79 Posts
The RTX 3080 Ti is fast at ~4800 GHD/day in TF, but at ~93 GHD/day for PRP, LL, or P-1 it is comparable to a GTX 1080 Ti or RX 5500 XT, or ~30% of an RX 6900 XT or Radeon VII, per https://www.mersenne.ca/cudalucas.php
I have multiple Radeon VIIs on a system served by a Celeron G1840, so yes, it does not take much CPU to keep GPU apps going - except when doing GCDs in P-1. I recommend about as many physical CPU cores as GPUs, plus HT, so the GPUs are unlikely to wait for each other. Also >16GB of system RAM if doing a lot of GPU P-1 on multiple 16GB-VRAM GPUs simultaneously.
#7
"Tucker Kao"
Jan 2020
Head Base M168202123
2⁴·5·11 Posts
Quote:
How do I know exactly how many days and hours a PRP test of M168779323 needs on an AMD RX 6900 XT if no one else has run one first? Glad kriesel mentioned the difference between trial factoring and PRP on GPUs - the GeForce 3080 Ti cannot excel at both. Once I get the new machine, I won't ask for anyone's help; I'll just run it myself.
#8
"David Kirkby"
Jan 2021
Althorne, Essex, UK
2×229 Posts
Quote:
For PRP tests it is not clear that the GPU wins, but for trial factoring, CPUs are not good.
#9
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3²·11·79 Posts
On the CUDALucas and TF benchmark pages on mersenne.ca, pause your mouse cursor on any blue column heading, or on the downward arrow to the right of GHzDays/day, for popup descriptions.
The mersenne.ca CUDALucas benchmark page is useful, within its limitations, for relative comparisons between GPUs. The ~295 GHD/day values for the Radeon VII are old, from older, less efficient versions of Gpuowl or from CUDALucas, and considerably understate the maximum performance available with recent versions of Gpuowl. Tdulcet reported ~75% longer run times on Colab NVIDIA Tesla GPUs with CUDALucas than with recent Gpuowl.

I've extensively benchmarked a Radeon VII across a wide variety of Gpuowl versions and all fft lengths supported in them, from 3M to 192M, on Windows 10, under specified conditions. The resulting timings in ms/iter can be seen in the last attachment of https://www.mersenneforum.org/showpo...35&postcount=2. Taking the best version's timing at each fft length, those timings correspond to a range of 316 to 486 GHD/day. (It might be possible to find other fft formulations that perform better; I used the first / default for each size, and on occasion an alternate may perform better.)

Note that these measurements were made while the GPU was neither as aggressively clocked as I and others have been able to reliably use on Radeon VIIs with Hynix vram, nor operating at full GPU power, nor on the highest-performance OS/driver combo. Benchmarking was done at an 86% power limit for improved power efficiency. Also, reportedly ROCm on Linux provides the highest performance, with Woltman having reported 510 GHD/day with it on, IIRC, a 5M fft; compare to 447 at reduced power and clock on Windows at 5M. Finally, power consumption may be elevated by the more aggressive than standard GPU fan curve I'm using.

Note also that mprime/prime95 and Gpuowl each have some fft lengths for which running the next higher fft can be faster. I've found in benchmarking Gpuowl that the 13-smooth ffts (3.25M, 6.5M, etc.) tend to be slower than the next larger fft (3.5M, 7M, etc.), as does 15M. At the current wavefront ~105.1M, a 5.5M fft applies, and Gpuowl V6.11-380 benchmarked at 0.821 ms/iter, which corresponds to 0.9987 day/exponent/GPU, or 419 GHD/day/GPU - again at reduced GPU power, on Windows, with below-maximum reliable vram clocking.

I computed ~1.53 GHD/d/W for a multi-Radeon-VII system, with power measured at the AC power cord while running prime95 on its CPU. The GPU-only efficiency would be slightly higher. That AC input power accounts for all power used, including the system RAM, which drkirkby omitted from his list and which, at 384GiB ECC, is probably consuming considerable power in his system. Due to the high cost of a >1kW-output UPS, I am running my GPU rig with inline surge suppression but no UPS. Indicated GPU power ranges from 190 to 212W per GPU at the 86% setting. Total AC input power divided by the number of GPUs operating was less than the nominal max GPU TDP. I'm currently running these GPUs at 80% for better power efficiency. At 419 GHD/day/GPU and ~200W actual per GPU, that is ~2.1 GHD/d/W on the GPUs alone, omitting system overhead and conversion losses.

One Radeon VII so configured can match the throughput of the dual-26-core-8167M $5000 system under certain conditions, at better power efficiency, and the original cost of the entire open-frame system divided by the number of GPUs was ~$700. More power efficient, and much more capital efficient per unit throughput. And it would still be ~4x more cost effective today than the 8167M system if built at current GPU prices.
Last fiddled with by kriesel on 2021-07-25 at 15:45
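The ms/iter-to-days and efficiency conversions above are easy to reproduce; here is a minimal sketch using the figures from this post (the ~200 W is the approximate indicated per-GPU power, not a measured average):

```python
# Convert benchmarked ms/iter into days per PRP test and GPU-only
# efficiency, using the numbers quoted in the post above.
exponent = 105_100_000   # current wavefront; ~one squaring per bit
ms_per_iter = 0.821      # Gpuowl v6.11-380, 5.5M fft, reduced power

days_per_test = exponent * ms_per_iter / 1000 / 86_400
print(f"{days_per_test:.4f} day/exponent/GPU")        # ~0.9987

ghd_per_day = 419        # reported throughput at this timing
watts_per_gpu = 200      # approximate indicated power at the 86% limit
print(f"~{ghd_per_day / watts_per_gpu:.1f} GHD/d/W")  # ~2.1, GPU alone
```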
#10
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3²·11·79 Posts
#11
Jun 2003
12530₈ Posts
TDP doesn't mean the maximum power consumed by the CPU. A 165W-TDP processor could easily consume 200W or more running flat out. Not saying that's what your CPUs are doing, but it is possible.
Also, 12 sticks of RAM consume a fair bit of power.
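A quick illustrative tally (all wattages are assumed ballpark values, not measurements of any particular machine):

```python
# Illustrative only: TDP is a thermal rating, not a hard power cap,
# and a dozen DIMMs add up. All wattages are assumed ballpark values.
cpu_tdp = 165            # rated TDP
cpu_flat_out = 200       # sustained draw can exceed the TDP rating
dimm_count = 12
watts_per_dimm = 4       # loaded DDR4 RDIMM, very roughly 3-6 W each

print(f"CPU ~{cpu_flat_out - cpu_tdp} W over TDP; "
      f"RAM ~{dimm_count * watts_per_dimm} W")
```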