mersenneforum.org gpuOwL: an OpenCL program for Mersenne primality testing

2022-11-21, 05:17   #2861
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·13·283 Posts

Quote:
 Originally Posted by axn These beasts (Instincts, Teslas, Quadros, etc.) are only of theoretical interest to the project. They are f***ing expensive! You're better off building one or more multi-GPU PCs with that kind of money.
Google Colab free used to offer 4 different models of Tesla. Current state of the art gear will eventually be resold-used-for-a-fraction-of-list.

Quote:
 Originally Posted by frmky 32GB MI60's are available for under $1000. But they are passively cooled so you'd have to deal with rigging a cooler for it. Or buy a used rack-style server with a row of blowers included. And either way, deal with limited driver availability. (Few Linux distros, no Windows.)

2022-11-21, 13:45   #2862
axn

Jun 2003

2²·3²·151 Posts

Quote:
 Originally Posted by frmky 32GB MI60's are available for under $1000. But they are passively cooled so you'd have to deal with rigging a cooler for it.
How does the performance/$ compare to current-gen / next-gen consumer cards? I am not sure it will outperform a $1000 7900 XTX.

Quote:
 Originally Posted by kriesel Google Colab free used to offer 4 different models of Tesla. Current state of the art gear will eventually be resold-used-for-a-fraction-of-list.
With colab (free or paid), you're at the mercy of google for what card you get and how long you get it for. Not really worth getting excited about.

I am not sure how much cheaper these will be on the used market, but yes, if you can get some of these cheap enough, it _might_ be worth it. "Might", because by the time they become cheap enough, the latest-gen consumer card might still offer better performance/$.

2022-11-21, 15:44   #2863
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·13·283 Posts

Quote:
 Originally Posted by axn With colab (free or paid), you're at the mercy of google for what card you get and how long you get it for.
No purchase cost and no utility cost, but a higher end-user time cost, is a tradeoff that may be very palatable for some with limited funds or cooling capacity. Plus it offers a try-before-you-buy experience. I'm routinely running 7 Colab free tabs, getting usually a 3.5-hour GPU & CPU session & multiple CPU-only sessions per day on each tab, for a combined throughput of more than one T4 running 24/7. It's not for everyone.

2022-11-21, 20:18   #2864
frmky

Jul 2003
So Cal

2×1,297 Posts

Quote:
 Originally Posted by axn How does the performance/$ compare to current gen / next gen consumer cards? I am not sure it will outperform $1000 7900 XTX
Likely it will. While the RX 7900 XTX is overall a much faster card (the MI60 is based on the same architecture as the Radeon VII), the 1/16 FP64/FP32 ratio on the RX 7900 XTX slows it down for gpuOwl.

2022-11-22, 00:37   #2865
Magellan3s

Mar 2022
Earth

5·23 Posts

Quote:
 Originally Posted by frmky Likely it will. While the RX 7900 XTX is overall a much faster card (a MI60 is based on the same architecture as the Radeon VII) the 1/16 FP64/FP32 ratio on the RX 7900 XTX slows it down for gpuOwl.
Is the FP64/FP32 ratio really 1/16?!

2022-11-22, 02:40   #2866
yuki0831

"Yuki@karoushi"
Feb 2020
Japan, Chiba pref

2·3·5 Posts

https://www.techpowerup.com/gpu-spec...7900-xtx.c3941
3.848 TFLOPS (1:16). I think PRP would be much faster than on my RTX 4090.
2022-11-22, 21:40   #2867
Magellan3s

Mar 2022
Earth

163₈ Posts

Quote:
 Originally Posted by yuki0831 https://www.techpowerup.com/gpu-spec...7900-xtx.c3941 3.848 TFLOPS (1:16) i think PRP much faster than my RTX4090
The RTX 4090 is 1,290 GFLOPS at FP64 (1:64).
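Working back from the two quoted FP64 figures and their ratios makes the comparison concrete. This is only a sanity check using the numbers quoted in this thread, not spec-sheet values; the implied FP32 rates are back-computed:

```python
# FP64 figures and FP64:FP32 ratios as quoted in the thread.
xtx_fp64, xtx_ratio = 3.848, 16        # RX 7900 XTX: TFLOPS, 1:16
r4090_fp64, r4090_ratio = 1.290, 64    # RTX 4090: TFLOPS, 1:64

# Implied FP32 throughput (FP64 rate times the ratio denominator).
xtx_fp32 = xtx_fp64 * xtx_ratio        # ~61.6 TFLOPS
r4090_fp32 = r4090_fp64 * r4090_ratio  # ~82.6 TFLOPS

# The 4090 wins on FP32, but for FP64-bound work such as gpuowl PRP
# the 7900 XTX has roughly 3x the double-precision throughput.
print(round(xtx_fp64 / r4090_fp64, 2))
```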

2022-12-09, 14:46   #2868
Runtime Error

Sep 2017
USA

2·5·19 Posts

P-1 on both gpuOwl and prime95?

Hi, is it currently possible to run P-1 with B1 only in parallel with PRP on gpuOwl, and then copy the P-1 save file over to mprime/prime95 for the new and improved B2 with lots of RAM? Thank you.
2022-12-09, 16:40   #2869
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·13·283 Posts

Quote:
 Originally Posted by Runtime Error Hi, is it currently possible to run P-1 with B1 only parallel with PRP on gpuOwl, and then copy the P-1 save file over to mprime/prime95 for the new and improved B2 w/ lots of RAM? Thank you.
Not currently. File formats are incompatible between GIMPS apps in general. (And sometimes versions of the same app are also incompatible.)
There's a reference info post for that: Save file (in)compatibility https://www.mersenneforum.org/showpo...8&postcount=26
I looked at implementing CUDAPm1 -> MNEF -> prime95 some time ago, and found they used not merely different formats but different data structures, such that what prime95 expected in a resume file was not present in CUDAPm1's output.
IIRC Mihai has contemplated implementing split P-1 (perhaps in a future v8): gpuowl S1 -> mprime/prime95 S2, but I've not seen an announcement of the capability yet.

2022-12-09, 17:30   #2870
preda

"Mihai Preda"
Apr 2015

1441₁₀ Posts

Quote:
 Originally Posted by kriesel Not currently. File formats are incompatible between GIMPS apps in general.
It so happens that I just pushed some changes that export the result of P-1 first stage in mprime savefile format, so it can be continued by mprime. With these changes gpuowl also no longer does second stage, and no longer does the merged first stage + PRP. In exchange, the first stage is error-checked with a GEC equivalent, so that's bulletproof (bar implementation errors :)

All this is still work-in-progress to a large degree. It's not well documented, and it's not polished, etc. But the basics are there. I do use it myself this way:
- I run P-1 first-stage on the GPU (on Radeon VII), with B1=2M at the wavefront,
- I continue with second-stage on the CPU with mprime, with either 256GB or 128GB of RAM, with a B2 between 300M and 1000M. The second stage takes around 1h - 1.5h.

I plan to post more about how I run it.

Last fiddled with by preda on 2022-12-09 at 18:21

2022-12-09, 18:05   #2871
preda

"Mihai Preda"
Apr 2015

11·131 Posts

So we need to feed mprime with tasks originating from gpuowl.

Gpuowl runs a P-1 first-stage on an exponent, using an assignment that looks like:
Quote:
 PFactor=xxxxxxxxxxxxxxx,1,2,117142303,-1,77,1.3
Such an assignment is located in gpuowl's worktodo.txt. It was obtained either by using the script primenet.py, or by using the manual assignments page.
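For illustration, here is a minimal sketch (not part of gpuowl) of how such a line splits into fields. The field labels follow the usual GIMPS worktodo convention (AID, k, b, n, c, trial-factoring depth, tests saved); the `parse_pfactor` name and the dict labeling are mine, not gpuowl's:

```python
# Hypothetical parser for a "PFactor=" worktodo line; a sketch only.
def parse_pfactor(line):
    """Split a PFactor= assignment into its fields.

    Assumed field order (GIMPS worktodo convention):
    AID, k, b, n (the exponent), c, TF depth in bits, tests saved.
    """
    assert line.startswith("PFactor=")
    fields = line[len("PFactor="):].split(",")
    aid = fields[0]
    k, b, n, c = (int(x) for x in fields[1:5])
    return {
        "aid": aid, "k": k, "b": b, "exponent": n, "c": c,
        "tf_bits": int(fields[5]),
        "tests_saved": float(fields[6]),
    }

info = parse_pfactor("PFactor=xxxxxxxxxxxxxxx,1,2,117142303,-1,77,1.3")
print(info["exponent"])  # -> 117142303
```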

Once gpuowl is done with P-1 first-stage, it needs to forward the same assignment line to mprime. It does this by appending the assignment to a worktodo.add file in the folder that is set with the "-mprimeDir <dir>" argument to gpuowl.

(Note: normally the arguments to gpuowl are not passed on the command line, but are instead read from the config.txt file that is found in the folder passed with -dir <folder> to gpuowl)

Now there's a problem with frequent writes to mprime's worktodo.add while mprime is doing P-1 second stage: mprime, noticing the worktodo.add, will interrupt the second stage, integrate the worktodo.add, and resume, and this interruption is rather expensive (on the order of 5-10 minutes). This will likely be fixed in a future release of mprime (by not interrupting the second stage in the middle just for the sake of a worktodo.add). But in the meantime, the workaround I use is this:
1. I accumulate the assignments in a separate worktodo.add (that is in a different folder, not in mprime's folder). Thus mprime is not interrupted.
2. Using a cron job, every 24h I move the big worktodo.add into mprime's folder. This will cause an interruption, but only once in 24h which is ok.
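The daily hand-off in step 2 can be sketched as a small script run from cron. This is only a sketch: the `hand_off` name and both directory paths are hypothetical, and it appends rather than overwrites in case mprime hasn't yet consumed an earlier worktodo.add:

```python
# Sketch of the once-a-day move of accumulated assignments into
# mprime's folder (hypothetical helper; paths are placeholders).
import os

def hand_off(staging_dir, mprime_dir):
    """Move lines from a staging worktodo.add into mprime's worktodo.add.

    Returns the number of assignment lines handed off.
    """
    src = os.path.join(staging_dir, "worktodo.add")
    dst = os.path.join(mprime_dir, "worktodo.add")
    if not os.path.exists(src):
        return 0
    with open(src) as f:
        lines = [ln for ln in f if ln.strip()]
    # Append to any worktodo.add mprime hasn't picked up yet.
    with open(dst, "a") as f:
        for ln in lines:
            f.write(ln if ln.endswith("\n") else ln + "\n")
    os.remove(src)
    return len(lines)
```

Scheduling it every 24 hours (e.g. a crontab entry invoking this script once a day) reproduces the single daily interruption described above.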

(to be continued)

Last fiddled with by preda on 2022-12-09 at 18:08

