
mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing > GpuOwl

Old 2022-11-21, 05:17   #2861
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2×29×127 Posts

Quote:
Originally Posted by axn View Post
These beasts (Instincts, Teslas, Quadros, etc.) are only of theoretical interest to the project. They are f***ing expensive! You're better off building one or more multi-GPU PCs with that kind of money.
Google Colab's free tier used to offer 4 different models of Tesla. Current state-of-the-art gear will eventually be resold used for a fraction of list price.

Quote:
Originally Posted by frmky View Post
32GB MI60's are available for under $1000. But they are passively cooled so you'd have to deal with rigging a cooler for it.
Or buy a used rack-style server with a row of blowers included. And either way, deal with limited driver availability (few Linux distros, no Windows).
Old 2022-11-21, 13:45   #2862
axn
 
 
Jun 2003

2²·3²·151 Posts

Quote:
Originally Posted by frmky View Post
32GB MI60's are available for under $1000. But they are passively cooled so you'd have to deal with rigging a cooler for it.
How does the performance/$ compare to current-gen / next-gen consumer cards? I am not sure it will outperform a $1000 7900 XTX.

Quote:
Originally Posted by kriesel View Post
Google Colab free used to offer 4 different models of Tesla. Current state of the art gear will eventually be resold-used-for-a-fraction-of-list.
With colab (free or paid), you're at the mercy of google for what card you get and how long you get it for. Not really worth getting excited about.

I am not sure how much cheaper these will be in used market, but yes, if you can get some of these cheap enough, it _might_ be worth it. "Might", because, by the time they become cheap enough, the latest gen consumer card might still offer better performance/$
Old 2022-11-21, 15:44   #2863
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·29·127 Posts

Quote:
Originally Posted by axn View Post
With colab (free or paid), you're at the mercy of google for what card you get and how long you get it for.
No purchase cost and no utility cost, but a higher end-user time cost: a tradeoff that may be very palatable for some with limited funds or cooling capacity. Plus it offers a try-before-you-buy experience. I'm routinely running 7 Colab free tabs, usually getting a 3.5-hour GPU & CPU session plus multiple CPU-only sessions per day on each tab, for a combined throughput of more than one T4 running 24/7. It's not for everyone.
Old 2022-11-21, 20:18   #2864
frmky
 
 
Jul 2003
So Cal

A25₁₆ Posts

Quote:
Originally Posted by axn View Post
How does the performance/$ compare to current gen / next gen consumer cards? I am not sure it will outperform $1000 7900 XTX
Likely it will. While the RX 7900 XTX is overall a much faster card (the MI60 is based on the same architecture as the Radeon VII), the 1/16 FP64/FP32 ratio on the RX 7900 XTX slows it down for gpuOwl.
Old 2022-11-22, 00:37   #2865
Magellan3s
 
Mar 2022
Earth

163₈ Posts

Quote:
Originally Posted by frmky View Post
Likely it will. While the RX 7900 XTX is overall a much faster card (the MI60 is based on the same architecture as the Radeon VII), the 1/16 FP64/FP32 ratio on the RX 7900 XTX slows it down for gpuOwl.
Is the FP64/FP32 ratio really 1/16?!
Old 2022-11-22, 02:40   #2866
yuki0831
 
"Yuki@karoushi"
Feb 2020
Japan, Chiba pref

2×3×5 Posts

https://www.techpowerup.com/gpu-spec...7900-xtx.c3941

FP64: 3.848 TFLOPS (1:16)

I think PRP on it would run much faster than on my RTX 4090.
Old 2022-11-22, 21:40   #2867
Magellan3s
 
Mar 2022
Earth

5×23 Posts

Quote:
Originally Posted by yuki0831 View Post
https://www.techpowerup.com/gpu-spec...7900-xtx.c3941

FP64: 3.848 TFLOPS (1:16)

I think PRP on it would run much faster than on my RTX 4090.
The RTX 4090 is 1,290 GFLOPS FP64 (1:64).
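The FP64 figures being traded above follow directly from each card's FP32 rate and its FP64:FP32 ratio. A small sketch (the FP32 spec figures are approximations taken from public spec pages such as TechPowerUp, not measurements):

```python
# Approximate published FP32 rates (TFLOPS) and FP64:FP32 ratios.
# The spec numbers here are assumptions from public databases.
cards = {
    "MI60":        (14.75, 1 / 2),
    "RX 7900 XTX": (61.4,  1 / 16),
    "RTX 4090":    (82.6,  1 / 64),
}

for name, (fp32_tflops, ratio) in cards.items():
    fp64 = fp32_tflops * ratio
    print(f"{name}: ~{fp64:.2f} TFLOPS FP64")
```

Since gpuOwl's PRP throughput is bound by FP64, this is why the older MI60, at roughly 7.4 TFLOPS FP64, can beat both newer consumer cards.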
Old 2022-12-09, 14:46   #2868
Runtime Error
 
Sep 2017
USA

BE₁₆ Posts
P-1 on both gpuOwl and prime95?

Hi, is it currently possible to run P-1 with B1 only in parallel with PRP on gpuOwl, and then copy the P-1 save file over to mprime/prime95 for the new and improved B2 with lots of RAM? Thank you.
Old 2022-12-09, 16:40   #2869
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2×29×127 Posts

Quote:
Originally Posted by Runtime Error View Post
Hi, is it currently possible to run P-1 with B1 only parallel with PRP on gpuOwl, and then copy the P-1 save file over to mprime/prime95 for the new and improved B2 w/ lots of RAM? Thank you.
Not currently. File formats are incompatible between GIMPS apps in general. (And sometimes versions of the same app are also incompatible.)
There's a reference info post for that: Save file (in)compatibility https://www.mersenneforum.org/showpo...8&postcount=26
I looked at implementing CUDAPm1 -> MNEF -> prime95 some time ago, and found they used not merely different formats but different data structures, such that what prime95 expected in a resume file was not present from CUDAPm1.
IIRC Mihai has contemplated implementing split P-1 (perhaps in a future v8): gpuowl S1 -> mprime/prime95 S2, but I've not seen any announcement of the capability yet.
And see also the exponent limits reference info post.
Old 2022-12-09, 17:30   #2870
preda
 
 
"Mihai Preda"
Apr 2015

2²·19² Posts

Quote:
Originally Posted by kriesel View Post
Not currently. File formats are incompatible between GIMPS apps in general.
It so happens that I just pushed some changes that export the result of the P-1 first stage in mprime savefile format, so it can be continued by mprime. With these changes gpuowl also no longer does the second stage, and no longer does the merged first stage + PRP. In exchange, the first stage is error-checked with a GEC equivalent, so it's bulletproof (bar implementation errors :)

All this is still work-in-progress to a large degree. It's not very documented, and it's not polished, etc. But the basics are there. I do use it myself this way:
- I run P-1 first-stage on the GPU (on Radeon VII), with B1=2M at the wavefront,
- I continue with second-stage on the CPU with mprime, with either 256GB or 128GB of RAM, with a B2 between 300M and 1000M. The second stage takes around 1h - 1.5h.

I plan to post more about how I run it.

Last fiddled with by preda on 2022-12-09 at 18:21
Old 2022-12-09, 18:05   #2871
preda
 
 
"Mihai Preda"
Apr 2015

2²·19² Posts
Transferring tasks from gpuowl to mprime (worktodo.add)

So we need to feed mprime with tasks originating from gpuowl.

Gpuowl runs a P-1 first-stage on an exponent, using an assignment that looks like:
Quote:
PFactor=xxxxxxxxxxxxxxx,1,2,117142303,-1,77,1.3
Such an assignment is located in gpuowl's worktodo.txt. It was obtained either by using the primenet.py script or by using the manual assignments page.

Once gpuowl is done with P-1 first-stage, it needs to forward the same assignment line to mprime. It does this by appending the assignment to a worktodo.add file in the folder that is set with the "-mprimeDir <dir>" argument to gpuowl.

(Note: normally the arguments to gpuowl are not passed on the command line, but are instead read from the config.txt file that is found in the folder passed with -dir <folder> to gpuowl)
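For illustration, a hypothetical setup (all folder names invented) combining the two folder arguments mentioned above might look like this:

```
# command line: tell gpuowl where its working folder (and config.txt) lives
./gpuowl -dir /home/user/gpuowl-run

# contents of /home/user/gpuowl-run/config.txt:
-mprimeDir /home/user/mprime-staging
```

With this, each finished first stage appends its PFactor line to worktodo.add in /home/user/mprime-staging.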

Now there's a problem with frequent writes to mprime's worktodo.add while mprime is doing P-1 second stage: mprime, on seeing a worktodo.add, will interrupt the second stage, integrate the worktodo.add, and then continue, and this interruption is rather expensive (on the order of 5-10 minutes). This problem will likely be fixed in a future release of mprime (by not interrupting the second stage in the middle just for the sake of a worktodo.add). But in the meantime, the workaround I use is this:
1. I accumulate the assignments in a separate worktodo.add (that is in a different folder, not in mprime's folder). Thus mprime is not interrupted.
2. Using a cron job, every 24h I move the big worktodo.add into mprime's folder. This will cause an interruption, but only once in 24h which is ok.
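The two steps above can be sketched as a small shell script run daily from cron. The /tmp paths are stand-ins so the sketch is runnable; the PFactor line is the example assignment from earlier in this post.

```shell
#!/bin/sh
# Daily batch hand-off of accumulated assignments into mprime's folder.
STAGING=/tmp/gpuowl-staging    # folder given to gpuowl via -mprimeDir
MPRIME=/tmp/mprime             # mprime's working folder (hypothetical)
mkdir -p "$STAGING" "$MPRIME"

# Simulate gpuowl appending a finished first-stage assignment:
echo 'PFactor=xxxxxxxxxxxxxxx,1,2,117142303,-1,77,1.3' >> "$STAGING/worktodo.add"

# The cron job body: hand the whole batch to mprime in one interruption.
if [ -s "$STAGING/worktodo.add" ]; then
    cat "$STAGING/worktodo.add" >> "$MPRIME/worktodo.add"
    : > "$STAGING/worktodo.add"    # empty the staging file
fi
```

Appending to mprime's worktodo.add (rather than moving the file) preserves any entries mprime has not yet consumed.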

(to be continued)

Last fiddled with by preda on 2022-12-09 at 18:08