mersenneforum.org  

Old 2019-04-09, 12:44   #111
M344587487
 
"Composite as Heck"
Oct 2017
2×5²×19 Posts

Quote:
Originally Posted by kriesel View Post
P-1 should be done to bounds maximizing total run time savings for TWO primality tests. (~2.04 for LL; 2 for PRP) Both LL and PRP are being subjected to verification by DC.

I plan to do P-1 at ~344M and am trying to work out how best to use an R7 with this calculator (https://www.mersenne.ca/prob.php). Take M344587487 as a representative example. If stage 2 taking ~1.3x as long as stage 1 is a reasonable rule of thumb, P-1 at B1=3465000 B2=86625000 takes ~13.5 hours (about 1/28 the time of a PRP test) with a ~3.6% chance of finding a factor. These are the bounds the calculator gives when TF has been set to 82 bits and "save 2 LL tests" is selected. That works out to ~1 PRP test's worth of time per exponent ruled out, without having to do any PRP: https://www.mersenne.ca/prob.php?exp...tton=Calculate

Doing the same calculation for "save 1 LL" gives lower B1 and B2, for a ~2.725% chance of finding a factor at 76.5GHzD. That is ~42% of the GHzD of the first case; assuming it therefore takes ~42% of the time, ~66.6 P-1 tests at this level fit in the time of one PRP test. That makes the numbers even better: the equivalent of ~0.55 PRP tests' worth of time to rule out an exponent. https://www.mersenne.ca/prob.php?exp...tton=Calculate

Going one step further, a target of 2% factor rate with TF done to 82 bits yields B1=801,591 B2=12,825,456 GHzD=33.3. By the same scaling, ~153 P-1 tests at this level fit in the time of one PRP test, finding a factor every ~0.327 PRP tests' worth of time: https://www.mersenne.ca/prob.php?exp...tton=Calculate

A target of 1% finds a factor in ~0.147 PRP-tests worth of time, etc:
Code:
[Factor Chance] [Test count per PRP] [Expected equivalent PRP time per factor found]
 3.6%              28                 0.99
 2.725%            66.6               0.55
 2%               153                 0.327
 1%               678                 0.147
 0.75%           1126                 0.118
If the calculator and the above figures are accurate it seems to make sense to do a pass over all the exponents sufficiently TF'd with some small P-1 before moving on to larger P-1 and PRP. Is that ignorant?
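
The last column of the table can be reproduced from the first two. A minimal sketch, assuming the cost per factor is simply the reciprocal of (factor chance × P-1 tests that fit in one PRP test); the function name is made up for illustration:

```python
# Rows from the table above: (factor chance, P-1 tests that fit in one PRP test).
rows = [
    (0.036, 28),
    (0.02725, 66.6),
    (0.02, 153),
    (0.01, 678),
    (0.0075, 1126),
]


def prp_time_per_factor(chance, tests_per_prp):
    """Expected PRP-test equivalents spent on P-1 per factor found."""
    factors_per_prp_time = chance * tests_per_prp
    return 1.0 / factors_per_prp_time


for chance, tests in rows:
    cost = prp_time_per_factor(chance, tests)
    print(f"{chance:.3%}  {tests:>7}  {cost:.3f}")
```

This only restates the arithmetic in the post; it says nothing about the value of the no-factor results, which shrinks as the bounds get smaller.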
Old 2019-04-09, 13:03   #112
preda
 
"Mihai Preda"
Apr 2015
2²·3·11² Posts

Quote:
Originally Posted by M344587487 View Post
[...] it seems to make sense to do a pass over all the exponents sufficiently TF'd with some small P-1 before moving on to larger P-1 and PRP. Is that ignorant?
It depends: if you do a first round of "small P-1", then a second round (from scratch) of "big P-1" may no longer be worth it, because the probability of finding a factor in the "big P-1" becomes much smaller once the "small P-1" has come back negative.

Note that exponents in 3xxM are TFed to lower than 82, e.g. I see many TF'ed to 72. For those, P-1 has much better odds, e.g. B1=1M, B2=20M gives 6.07%.
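
To put a rough number on the first point, here is a small sketch. It assumes that any factor the small run could find is also within reach of the big run, and it borrows the 1% and 3.6% figures and the 28-tests-per-PRP ratio from the previous post purely as an illustration:

```python
def chance_after_negative(p_small, p_big):
    """P(big run finds a factor | small run found none).

    Assumes every factor the small run could find is also within reach
    of the big run (its bounds are at least as large).
    """
    return (p_big - p_small) / (1.0 - p_small)


p_small, p_big = 0.01, 0.036  # illustrative figures from the previous post
tests_per_prp = 28            # big P-1 runs that fit in one PRP test

p_cond = chance_after_negative(p_small, p_big)
fresh_cost = 1 / (p_big * tests_per_prp)
second_cost = 1 / (p_cond * tests_per_prp)

print(f"big P-1 chance after a negative small P-1: {p_cond:.2%}")
print(f"PRP-equivalents per factor, big P-1 run fresh: {fresh_cost:.2f}")
print(f"PRP-equivalents per factor, big P-1 as second round: {second_cost:.2f}")
```

Under these assumptions the second round costs the same as a fresh big run but has a smaller chance of success, so its cost per factor goes up.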
Old 2019-04-09, 14:26   #113
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1111010010000₂ Posts

Quote:
Originally Posted by preda View Post
Yes I agree in general about the need to check and find errors, but requiring a full 2x double-check seems overkill for PRP.

if given the option of using the next 1y for either finding the next MP, or closing the DC gap, the choice is not so obvious anymore.
PRP DC addresses both certain types of software bugs and certain types of human factors. Anything less is not credible as mathematical proof of absence of an Mp. PRP DC is necessary for ascertaining whether the nth known Mp is the nth Mp in the series. Even with DC there can remain some question.

In prioritizing first-test and DC, there is more than just an either/or choice.
Currently (over the past two calendar years) the first-test wavefront is advancing at 6M/year while the DC wavefront is advancing at only 4M/year, so the gap is increasing. Since the DC and first-test wavefronts are currently at ~47M and ~84M, and primality test effort scales as ~p^2.1, a first test is currently ~3.4 times the effort of a DC. The combined throughput would be equivalent to about 7.2M/year of first tests if DC were suspended entirely. We could drop first-test progress to 3.6M/year and accelerate DC to 12.2M/year, closing the DC gap considerably in a single year while still making considerable first-test progress, at 60% of the recent rate. (We could thereby conceivably resolve the status of M57885161 in 2020 as definitely M48.) There's no guarantee of GIMPS finding a new Mp in any given calendar year.
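
The arithmetic in that paragraph can be checked with a short sketch. It assumes a fixed total throughput and the p^2.1 effort scaling stated above; the variable and function names are invented for illustration, and the results differ slightly from the rounded figures in the text:

```python
dc_front, first_front = 47e6, 84e6  # wavefront exponents from the post
first_rate, dc_rate = 6.0, 4.0      # current progress, M exponents/year
exponent = 2.1                      # primality test effort ~ p^2.1

# Effort of one first test relative to one double check.
effort_ratio = (first_front / dc_front) ** exponent

# Total throughput expressed as first tests, if DC were suspended entirely.
total_first_equiv = first_rate + dc_rate / effort_ratio


def dc_rate_for(first_rate_new):
    """DC progress (M/year) if first-test progress is cut to first_rate_new."""
    return (total_first_equiv - first_rate_new) * effort_ratio


print(f"effort ratio: {effort_ratio:.2f}")
print(f"first-test equivalent throughput: {total_first_equiv:.2f} M/year")
print(f"DC rate at 3.6 M/year first-test: {dc_rate_for(3.6):.1f} M/year")
```

Calling `dc_rate_for(6.0)` returns the current 4M/year, which is a quick sanity check on the model.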

I think we could have some fun with how any such DC push was implemented. Maybe pick a month in spring each year, and issue DC only for automatic primenet assignments for that month, no first-tests, sort of a prime spring clean up. One month a year would leave ~92% of current first-test progress rate = 5.5M/year (and boost DC rate to ~5.7M/year); two months, ~83%=5M/year (and boost DC rate to ~9.4M/year); spring and fall cleanup pushes. Over a period of about 5 years of 2 cleanup months per year the DC gap could be about halved. Catchup becomes slower as the gap gets reduced.

PRP does not represent much of the DC backlog, since it was introduced when the first-test wavefront was ~73M, and adoption was and is gradual. Only 13 of the 153 remaining first tests in progress between 83M and 84M are PRP, 8.5%. https://www.mersenne.org/assignments...xtf=1&exdchk=1

If the catchup months were automatic primenet assignments only, that would leave manual assignments unaffected, and therefore gpu primality testing by CUDALucas and gpuowl could continue to be first-test or DC as chosen by the owner / operator. Since gpu primality testing is a small fraction of the project's throughput, it would not affect the above numbers much.

If increased emphasis on DC motivates people to switch from LL to PRP for first-test, that's not a problem, it's a good thing. Assuming DC is retained for PRP, the occasional TC and QC of LL are saved due to PRP's lower error rate, in proportion to how much throughput is switched, and net throughput increases slightly.
Old 2019-04-10, 13:01   #114
SELROC
 

419₁₆ Posts

Quote:
Originally Posted by SELROC View Post
but amdcovc with ROCm doesn't give utilization percentile. Retrying with radeontop...

Radeontop values:


- Graphics pipe ~100%
- Texture Addresser ~97%
- Shader Interpolator ~100%
- all other values 0%
Old 2019-04-14, 08:20   #115
M344587487
 
"Composite as Heck"
Oct 2017
2×5²×19 Posts

Quote:
Originally Posted by preda View Post
It depends: if you do a first round of "small P-1", then a second round (from scratch) of "big P-1" may no longer be worth it, because the probability of finding a factor in the "big P-1" becomes much smaller once the "small P-1" has come back negative.

Note that exponents in 3xxM are TFed to lower than 82, e.g. I see many TF'ed to 72. For those, P-1 has much better odds, e.g. B1=1M, B2=20M gives 6.07%.
I have some data to test the theory. Using B1=18676, B2=12x, a test takes ~4m30s at --setsclk 4 mclk=1200, with a ~1% chance of finding a factor on a 344M exponent TF'd to 72 bits. That's ~320 tests per day with an expected yield of ~3.2 factors per day. This range was chosen to find the most factors per day. After ~3 days it's doing slightly better than expected at 11 factors from 934 tests, ~3.77 factors per day. Maximising factors found comes at the expense of the value of the NF (no factor) results, but for a quick test that seems par for the course. The factors found are 1x73bit, 3x74bit, 1x77bit, 1x78bit, 2x79bit, 1x83bit, 1x89bit, 1x90bit. In a vacuum it's probably better to do this than TF at some bit level between 70 and 80, as the card is weak at TF; whether it's a good use of the hardware in a distributed context is less clear.
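
For anyone wanting to redo the throughput numbers with their own timings, a minimal sketch follows. It treats each test as an independent 1% trial, which is a simplifying assumption and not something the calculator guarantees:

```python
import math

seconds_per_test = 4 * 60 + 30  # ~4m30s per P-1 test
chance = 0.01                   # ~1% chance of a factor per test

tests_per_day = 86400 / seconds_per_test
expected_per_day = tests_per_day * chance

found, done = 11, 934           # results after ~3 days
days = done / tests_per_day
observed_per_day = found / days

# Spread expected from independent 1% trials (binomial standard deviation).
expected_found = done * chance
std_dev = math.sqrt(done * chance * (1 - chance))

print(f"tests per day: {tests_per_day:.0f}")
print(f"expected factors per day: {expected_per_day:.2f}")
print(f"observed factors per day: {observed_per_day:.2f} over {days:.2f} days")
print(f"expected factors so far: {expected_found:.2f} +/- {std_dev:.2f}")
```

With these inputs the 11 factors found sit within one standard deviation of the expected count, so "slightly better than expected" is consistent with chance.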
Old 2019-04-29, 07:11   #116
M344587487
 
"Composite as Heck"
Oct 2017
950₁₀ Posts

£626 including delivery from Scan with a 3 year warranty. Tempting, but I have nothing to plug it into: https://www.scan.co.uk/products/sapp...0mhz-gpu-1750m
Old 2019-05-03, 08:41   #117
SELROC
 

2²·5·149 Posts

Quote:
Originally Posted by preda View Post
2x would be amazing. In practice I would be very happy if I see a 50% speedup.

About memory, it is my impression that the latency did not improve much, but the bandwidth doubled. But to take advantage of this, better occupancy would be required (double the number of memory operations in flight), and this is not easily achievable because of other limiting resources: LDS memory and nb. of registers (VGPRs) that remain unchanged I guess.

About compute, the parts that aren't DP (e.g. pointer arithmetic, other integer e.g. carry, logic) remain unchanged, and this will reduce the observed speedup.

IMO another limiting factor for GCN performance is still the compiler, after so many years: the compiler does a rather poor job at generating highly efficient code (not an easy task I agree).

OTOH the better cooling will help, and allow the card to be higher clocked without thermal throttling (which is a problem on Vega64 blower cooler)
Quote:
Originally Posted by SELROC View Post



The Radeon VII should be a Vega20 board. Fixes are coming...
https://www.phoronix.com/scan.php?pa...eSync-Hits-5.2

Exactly this set of patches: https://lists.freedesktop.org/archiv...ay/033664.html

Last fiddled with by SELROC on 2019-05-03 at 08:43
Old 2019-05-11, 09:48   #118
M344587487
 
"Composite as Heck"
Oct 2017
2·5²·19 Posts

£610 including delivery: https://www.overclockers.co.uk/power...gx-196-pc.html
Old 2019-06-21, 03:03   #119
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL
17·487 Posts

Linux -- what a disaster.

I was happily crunching with gpuowl. Lost power. On reboot, gpuowl no longer runs.
Code:
Exception 9gpu_error:  clGetPlatformIDs
rocminfo sees the device, clinfo does not.

Any ideas? My only idea is a complete reinstall of ubuntu 19.
Old 2019-06-21, 03:14   #120
SELROC
 

2³·3·383 Posts

Quote:
Originally Posted by Prime95 View Post
Linux -- what a disaster.

I was happily crunching with gpuowl. Lost power. On reboot, gpuowl no longer runs.
Code:
Exception 9gpu_error:  clGetPlatformIDs
rocminfo sees the device, clinfo does not.

Any ideas? My only idea is a complete reinstall of ubuntu 19.

try running gpuowl as root.
Old 2019-06-21, 04:56   #121
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL
17·487 Posts

Quote:
Originally Posted by SELROC View Post
try running gpuowl as root.
That worked, thanks!

What changes do I need to make to be able to run gpuowl as a normal user again?
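
One thing that may be worth checking first, since it works as root: whether the normal user can open the GPU device nodes. The sketch below is only a diagnostic under assumptions. It assumes ROCm's OpenCL runtime talks to /dev/kfd and the /dev/dri/renderD* nodes, which is the usual layout but not confirmed for this install, and the helper names are made up for illustration:

```python
import glob
import grp
import os


def device_access(paths):
    """Map each path to 'missing', 'ok' or 'no access' for the current user."""
    status = {}
    for path in paths:
        if not os.path.exists(path):
            status[path] = "missing"
        elif os.access(path, os.R_OK | os.W_OK):
            status[path] = "ok"
        else:
            status[path] = "no access"
    return status


def group_name(gid):
    """Group name for a gid, or the number itself if it has no name."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


if __name__ == "__main__":
    # Assumed device nodes; adjust if the install uses different ones.
    nodes = ["/dev/kfd"] + sorted(glob.glob("/dev/dri/renderD*"))
    for path, state in device_access(nodes).items():
        owner = ""
        if state != "missing":
            owner = f" (group: {group_name(os.stat(path).st_gid)})"
        print(f"{path}: {state}{owner}")
    mine = sorted(group_name(g) for g in os.getgroups())
    print("current user's groups:", ", ".join(mine))
```

If a node shows "no access" and is owned by a group the user is not in (often video or render), adding the user to that group and logging in again is the usual remedy. Files left owned by root in the gpuowl working directory after the root run could be a separate cause; both are guesses from the symptoms and not a confirmed diagnosis.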

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
