mersenneforum.org  

2021-03-21, 17:00   #12
R. Gerbicz
"Robert Gerbicz"
Oct 2005
Hungary

3×547 Posts

Quote:
Originally Posted by kriesel View Post
Also there are probably programming advantages to having b fixed during a run.
Note that p95 can change the block size; I don't see any advantage in fixing it.

Quote:
Originally Posted by kriesel View Post
B needs to be an integral multiple of b, and b constant during a B-long interval, don't they?
Yes, that is true.

I had overlooked that. As far as I remember, gpuowl is a little tricky in this area because it uses the (slow) CPU, but only for the check, so in an optimal setup superblock!=block^2 in general. Suppose that the CPU is v times slower than the GPU (in general v>1, but the formula also allows v<1). Do the base*res^(2^block)==res2 mod N check on the CPU; while it runs, the GPU continues the run, and falls back to the saved residue if the check fails on the CPU. The new running time is then:
Code:
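\\ e (per-iteration error rate) is a global that must already be defined; q = chance that a superblock of B iterations is error-free
G(p,v,B,block)={my(q=(1-e)^B);return(p/B*B/block/q+p*(1/q-1)+p/B*(1/q-1)*v*block)}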
where B=superblock. The first term is slightly modified, and in the third, entirely new term the factor v*block is the number of iterations lost (on the GPU) if you need to do a single rollback [we don't need to include the time spent on the CPU, because that runs in parallel]. To hide the checking time on the CPU you need v*block<B+block. What this formula still does not include is the time it takes to send data from the GPU to the CPU.
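For readability, the same expression in math notation (b = block, B = superblock; e, a global in the GP code, is presumably the per-iteration error probability, so q is the chance that a whole superblock is error-free):

```latex
q = (1-e)^{B}, \qquad
G(p,v,B,b) = \frac{p}{B}\cdot\frac{B}{b}\cdot\frac{1}{q}
  + p\left(\frac{1}{q}-1\right)
  + \frac{p}{B}\left(\frac{1}{q}-1\right) v\, b,
\qquad \text{with } v\, b < B + b \text{ to hide the CPU check.}
```

Setting e=0 gives q=1, so the last two terms vanish and G reduces to p/b.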

Say v=3, i.e. the CPU is 3 times slower than the GPU. Then for Kriesel's problem:
Code:
? G(p,3,50000,400)
%51 = 5696048.2009918175108932964623818885302
? G(p,3,20007,1053)
%52 = 2502938.5127396770013937653268849096083
This is a good saving; note that I restricted the search to B>20000 just to lower the amount of data sent from the GPU to the CPU.
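A minimal Python sketch of the same cost model, plus a brute-force search over (B, block) of the kind described above. The post does not say how the search was actually run, so this is an assumption: it only admits B as an integral multiple of block, B above a lower bound, and the hiding condition v*block<B+block. The exponent p and error rate e behind the %51/%52 outputs are not given in this post, so the inputs in the usage line are hypothetical, and the per-term comments are my reading of the formula.

```python
def G(p, v, B, block, e):
    """Python port of the GP function G(p,v,B,block); e is the
    per-iteration error probability (a global in the GP version)."""
    q = (1.0 - e) ** B  # chance that a superblock of B iterations has no error
    return (p / B * B / block / q                     # checksum MULs, repeated on retries (my reading)
            + p * (1.0 / q - 1.0)                     # iterations redone after failed checks (my reading)
            + p / B * (1.0 / q - 1.0) * v * block)    # GPU iterations lost per rollback (from the post)

def search(p, v, e, B_min, B_max, block_max):
    """Brute-force the (B, block) pair minimising G, subject to:
    B an integral multiple of block, B_min < B <= B_max,
    and v*block < B + block so the CPU check stays hidden."""
    best = None
    for block in range(1, block_max + 1):
        B = (B_min // block + 1) * block  # smallest multiple of block above B_min
        while B <= B_max:
            if v * block < B + block:
                cost = G(p, v, B, block, e)
                if best is None or cost < best[0]:
                    best = (cost, B, block)
            B += block
    return best  # (cost, B, block), or None if nothing qualifies

# Hypothetical inputs, not the values behind the outputs above:
print(search(p=10**8, v=3, e=1e-8, B_min=20000, B_max=60000, block_max=2000))
```

The search is exhaustive but cheap: for each block it only visits multiples of block, so the work grows roughly like (B_max-B_min)·ln(block_max).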
2021-03-21, 20:02   #13
preda
"Mihai Preda"
Apr 2015

2²×3×11² Posts

Quote:
Originally Posted by R. Gerbicz View Post
Checkout that p95 can change the blocksize, I don't see any advantage to fix it.
A small advantage of "b" being fixed is that, in the GEC verification, we have a multiplication by "3" (the PRP base). When "b" is variable, this multiplication must be changed to a general multiplication by some full-sized base. This does require one addiotional buffer on the GPU for this number, and a few implementation changes that I haven't gotten around to yet. Also the savefiles, which right now store a single residue, may need to be changed to store two residues to accommodate a variable "b".

I'm not against such changes, it's just that they're not high-priority enough ATM.

Last fiddled with by preda on 2021-03-21 at 20:04
2021-03-22, 03:26   #14
LaurV
Romulan Interpreter
"name field"
Jun 2011
Thailand

41·251 Posts

Quote:
Originally Posted by preda View Post
addiotional (sic!) buffer
Better not use that kind of buffer!
2021-03-22, 22:39   #15
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1111010010000₂ Posts

Quote:
Originally Posted by preda View Post
A small advantage of "b" being fixed is that, in the GEC verification, we have a multiplication by "3" (the PRP base). When "b" is variable, this multiplication must be changed to a general multiplication by some full-sized base. This does require one addiotional buffer on the GPU for this number, and a few implementation changes that I haven't gotten around to yet. Also the savefiles, which right now store a single residue, may need to be changed to store two residues to accommodate a variable "b".

I'm not against such changes, it's just that they're not high-priority enough ATM.
I'm wondering whether I'm misunderstanding you or the current gpuowl implementation. How does changing the individual block size (I used b earlier; Gerbicz' original post on PRP/error checking used L for it) from 1000 to 500 or 400 or other values, as gpuowl already supports, or down to 50 as prime95 does, affect the precision needed for that multiplication by the usual PRP base 3 that you mention? L and L2 will each fit in 32 bits, usually no more than 10 and 20 bits respectively, even for F33 and possibly other extreme cases. (Ernst, what do you plan to use in the attempts on F33?)
From gpuowl v6.11-380 help output:
Code:
-block <value>     : PRP GEC block size, or LL iteration-block size. Must divide 10'000.
-log <step>        : log every <step> iterations. Multiple of 10'000.
Here -block <value> is -block <L>, -log <step> is -log <L2>, and gpuowl allows L!=sqrt(L2). PRP base remains 3 for the PRP type 1 computation implemented since gpuowl v6.5-84, and in some earlier versions.
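The two help-text rules can be checked mechanically. A small Python sketch (the function names are made up for illustration; only the two rules come from the v6.11-380 help output quoted above). Since every allowed -block value divides 10'000 and every allowed -log value is a multiple of 10'000, any accepted pair automatically has L2 as an integral multiple of L, the B-is-a-multiple-of-b condition discussed earlier in the thread.

```python
def valid_block(L):
    """-block: must divide 10'000 (per the gpuowl v6.11-380 help text)."""
    return L > 0 and 10_000 % L == 0

def valid_log(L2):
    """-log: must be a multiple of 10'000 (per the same help text)."""
    return L2 > 0 and L2 % 10_000 == 0

def allowed_blocks():
    """All -block values the rule admits: the divisors of 10'000 = 2^4 * 5^4."""
    return [L for L in range(1, 10_001) if valid_block(L)]

blocks = allowed_blocks()
print(len(blocks), blocks[-1].bit_length())  # 25 14
```

So the largest allowed block size needs only 14 bits. For example, block=1053 from Gerbicz's G(...) call would not pass the first rule.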

Last fiddled with by kriesel on 2021-03-22 at 22:40
2021-03-29, 07:43   #16
preda
"Mihai Preda"
Apr 2015

2²×3×11² Posts

Quote:
Originally Posted by kriesel View Post
I'm wondering whether I'm misunderstanding you or the current gpuowl implementation. How does changing the individual block size (I used b earlier; Gerbicz' original post on PRP/error checking used L for it) from 1000 to 500 or 400 or other values, as gpuowl already supports, or down to 50 as prime95 does, affect the precision needed for that multiplication by the usual PRP base 3 that you mention? L and L2 will each fit in 32 bits, usually no more than 10 and 20 bits respectively, even for F33 and possibly other extreme cases. (Ernst, what do you plan to use in the attempts on F33?)
The need for a general MUL instead of MUL-3 only arises when the "L" step is changed dynamically during a test. This is something GpuOwl does not support (and thus it gets away with using MUL-3), but prime95 does support it (and I assume it uses a general MUL for that).

GpuOwl can start a new test with a different L value, and still apply the MUL-3 only, as long as the L is fixed during the test.

(Not that it's a big deal performance-wise: that MUL is only involved in the GEC verification, which is done very rarely.)
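To see why a fixed L lets the check get away with multiplying by 3, here is a toy Python sketch of the Gerbicz check in its textbook form (not GpuOwl's or prime95's actual code). Residues follow x_{i+1} = x_i^2 mod N, the checksum is d_k = x_0·x_L·…·x_{kL}, and the identity d_{k+1} = x_0·d_k^(2^L) holds. If the checksum starts at x_0 = 3 and L never changes, the multiplier is the PRP base 3. If the block size is switched mid-test and the checksum is restarted from the current residue x_s (assumed here as one way to handle the switch), the multiplier becomes x_s, a full-size number, i.e. the general MUL. N = 2^127-1 is only a small stand-in modulus.

```python
N = 2**127 - 1  # small Mersenne prime, stand-in for a real test's modulus
BASE = 3        # PRP base

def run_blocks(x0, L, k):
    """Do k blocks of L squarings from x0; return the final residue and the
    running checksums d_0 = x0, d_j = x0 * x_L * ... * x_{jL} (mod N)."""
    x, d = x0, x0
    ds = [d]
    for _ in range(k):
        x = pow(x, 2**L, N)  # L modular squarings in one call
        d = d * x % N        # fold the block-boundary residue into the checksum
        ds.append(d)
    return x, ds

def gec_ok(d_prev, d_next, L, x0):
    """Gerbicz check: d_next == x0 * d_prev^(2^L) (mod N)."""
    return d_next == x0 * pow(d_prev, 2**L, N) % N

# Fixed L from the start of the test: the multiplier is just the base 3.
x_s, ds = run_blocks(BASE, 100, 5)
print(all(gec_ok(ds[j], ds[j + 1], 100, BASE) for j in range(5)))  # True

# Switch to L=40 and restart the checksum at the current residue x_s:
_, ds2 = run_blocks(x_s, 40, 3)
print(all(gec_ok(ds2[j], ds2[j + 1], 40, x_s) for j in range(3)))  # True
print(gec_ok(ds2[0], ds2[1], 40, BASE))                             # False
```

The last line fails because, after the restart, the check only holds with x_s as the multiplier, and x_s is not 3; that is also why such a scheme has to keep x_s around, in the GPU buffer and in the savefile.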

Last fiddled with by preda on 2021-03-29 at 07:46
2021-03-29, 16:43   #17
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2⁴·3·163 Posts
update

I've put up a draft post at https://www.mersenneforum.org/showpost.php?p=574709 analyzing the logs from the problem GPU for the exponent it is currently running.
There I can update it without the 1-hour edit limit and without cluttering up this thread with lots of update posts.

It seems to respond favorably to GPU RAM clock rates below 900 MHz, but the error rate is still too high, and I'm slowly working on getting it lower.

All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
