mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   Radeon VII (2nd gen consumer Vega GPU) (https://www.mersenneforum.org/showthread.php?t=23982)

M344587487 2019-04-09 12:44

[QUOTE=kriesel;513151]P-1 should be done to bounds maximizing total run time savings for TWO primality tests. (~2.04 for LL; 2 for PRP) Both LL and PRP are being subjected to verification by DC.[/QUOTE]


I plan to do P-1 at ~344M and am trying to figure out how best to use an R7 with this calculator ([URL]https://www.mersenne.ca/prob.php[/URL]). Taking M344587487 as a representative example, if stage 2 taking ~1.3x as long as stage 1 is a reasonable rule of thumb, it'll take ~13.5 hours to do P-1 at B1=3465000 B2=86625000 (about 1/28 the time it would take to do a PRP test) with a ~3.6% chance of finding a factor. These are the parameters the calculator gives when TF has been set to 82 bits and "save 2 LL tests". That's the equivalent of ~1 PRP's worth of time to rule out an exponent without having to do any PRP: [URL]https://www.mersenne.ca/prob.php?exponent=344587487&b1=&b2=&guess_saved_tests=2&factorbits=82&K=1&C=-1&submitbutton=Calculate[/URL]

Doing the same calculation for "save 1 LL" uses lower B1 and B2 for a ~2.725% chance of finding a factor at 76.5GHzD. Assuming ~42% of the GHzD of the first test means it takes ~42% of the time, that's ~66.6 P-1 tests at this level in the same time as one PRP test. That makes the numbers even better: the equivalent of ~0.55 PRP tests' worth of time to rule out an exponent. [URL]https://www.mersenne.ca/prob.php?exponent=344587487&b1=&b2=&guess_saved_tests=1&factorbits=82&K=1&C=-1&submitbutton=Calculate[/URL]

Going one step further, a target of a 2% factor rate with TF done to 82 bits yields B1=801,591 B2=12,825,456 GHzD=33.3. As before that's ~153 P-1 tests in the time of one PRP, finding a factor on average every ~0.327 PRP tests' worth of time: [URL]https://www.mersenne.ca/prob.php?exponent=344587487&prob=2&work=&factorbits=82&force_b1E=&force_b1=&submitbutton=Calculate[/URL]

A target of 1% finds a factor every ~0.147 PRP tests' worth of time, etc.:
[code]
Factor chance   Tests per PRP-time   Expected equivalent PRP time per factor
3.6%                  28             0.99
2.725%                66.6           0.55
2%                   153             0.327
1%                   678             0.147
0.75%               1126             0.118
[/code]
If the calculator and the above figures are accurate, it seems to make sense to do a pass over all the exponents sufficiently TF'd with some small P-1 before moving on to larger P-1 and PRP. Is that ignorant?
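The table's last column is just the reciprocal of (tests per PRP-time × per-test chance); a quick sketch to reproduce it, using the figures from the calculator results above:

```python
# Sanity check of the table above: expected PRP-equivalent time per
# factor found = 1 / (tests per PRP-time * per-test factor chance).
rows = [
    (0.036,   28),    # "save 2 LL" bounds
    (0.02725, 66.6),  # "save 1 LL" bounds
    (0.02,    153),   # 2% target
    (0.01,    678),   # 1% target
    (0.0075,  1126),  # 0.75% target
]
for chance, tests_per_prp in rows:
    cost = 1 / (tests_per_prp * chance)
    print(f"{chance:.4%}  {tests_per_prp:>6}  {cost:.3f} PRP-equivalents per factor")
```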

preda 2019-04-09 13:03

[QUOTE=M344587487;513239][...] it seems to make sense to do a pass over all the exponents sufficiently TF'd with some small P-1 before moving on to larger P-1 and PRP. Is that ignorant?[/QUOTE]

It depends: if you do a first round of "small P-1", the second round (from scratch) of "big P-1" may no longer be worth it, because the probability of finding factors in the "big P-1" becomes much smaller after the negative "small P-1".

Note that exponents in the 3xxM range are TF'ed to less than 82 bits; e.g. I see many TF'ed to 72. For such exponents P-1 has much better probabilities, e.g. B1=1M, B2=20M gives 6.07%.

kriesel 2019-04-09 14:26

[QUOTE=preda;513238]Yes I agree in general about the need to check and find errors, but requiring a full 2x double-check seems overkill for PRP.

if given the option of using the next 1y for either finding the next MP, or closing the DC gap, the choice is not so obvious anymore.[/QUOTE]
PRP DC addresses both certain types of software bugs and certain types of human factors. Anything less is not credible as mathematical proof of absence of an Mp. PRP DC is necessary for ascertaining whether the nth known Mp is the nth Mp in the series. Even with DC there can remain some question.

In prioritizing first-test and DC, there is more than just an either/or choice.
Currently (over the past two calendar years) the first-test wavefront is advancing at 6M/year while the DC wavefront is advancing at only 4M/year, so the gap is increasing. Since the DC and first-test wavefronts are currently ~47M and ~84M, and primality-test effort scales as ~p[SUP]2.1[/SUP], a first test is currently ~3.4 times the effort of a DC. The combination is equivalent to about 7.2M first tests/year if DC were suspended entirely. We could drop first-test progress to 3.6M/year and accelerate DC to 12.2M/year, closing the DC gap considerably in a single year while still making considerable first-test progress, 60% of the recent rate. (We could thereby conceivably resolve the status of M578885161 in 2020 as definitely M48.) There's no guarantee of GIMPS finding a new Mp in any given calendar year.
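The effort arithmetic above can be reproduced in a few lines (a sketch using the wavefront positions and rates quoted in the post):

```python
# Relative effort of a first test at ~84M vs a DC at ~47M,
# using the ~p^2.1 scaling for primality-test effort.
first_wave, dc_wave = 84, 47          # wavefront positions, in millions
ratio = (first_wave / dc_wave) ** 2.1
print(f"first test / DC effort: {ratio:.1f}x")        # ~3.4x

# Total throughput expressed in first-test-equivalents per year.
first_rate, dc_rate = 6, 4            # current rates, M/year
total = first_rate + dc_rate / ratio
print(f"equivalent first tests/year: {total:.1f}")    # ~7.2

# Holding first tests to 3.6M/year frees the remainder for DC.
dc_possible = (total - 3.6) * ratio
print(f"DC rate at 3.6M/year first tests: {dc_possible:.1f}M/year")  # ~12.1
```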

I think we could have some fun with how any such DC push was implemented. Maybe pick a month in spring each year, and issue DC only for automatic primenet assignments for that month, no first tests, a sort of prime spring cleanup. One month a year would leave ~92% of the current first-test progress rate = 5.5M/year (and boost the DC rate to ~5.7M/year); two months, ~83% = 5M/year (and boost the DC rate to ~9.4M/year); spring and fall cleanup pushes. Over a period of about 5 years of 2 cleanup months per year the DC gap could be about halved. Catch-up becomes slower as the gap shrinks.

PRP does not represent much of the DC backlog, since it was introduced when the first-test wavefront was ~73M, and adoption was and is gradual. Only 13 of the 153 remaining first tests in progress between 83M and 84M are PRP, 8.5%. [URL]https://www.mersenne.org/assignments/?exp_lo=83032877&exp_hi=84000000&execm=1&exp1=1&extf=1&exdchk=1[/URL]

If the catchup months were automatic primenet assignments only, that would leave manual assignments unaffected, and therefore gpu primality testing by CUDALucas and gpuowl could continue to be first-test or DC as chosen by the owner / operator. Since gpu primality testing is a small fraction of the project's throughput, it would not affect the above numbers much.

If increased emphasis on DC motivates people to switch from LL to PRP for first-test, that's not a problem, it's a good thing. Assuming DC is retained for PRP, the occasional TC and QC of LL are saved due to PRP's lower error rate, in proportion to how much throughput is switched, and net throughput increases slightly.

SELROC 2019-04-10 13:01

[QUOTE=SELROC;513215]but amdcovc with ROCm doesn't give a utilization percentage. Retrying with radeontop...[/QUOTE]


Radeontop values:


- Graphics pipe ~100%
- Texture Addresser ~97%
- Shader Interpolator ~100%
- all other values 0%

M344587487 2019-04-14 08:20

[QUOTE=preda;513240]It depends: if you do a first round of "small P-1", the second round (from scratch) of "big P-1" may no longer be worth it, because the probability of finding factors in the "big P-1" becomes much smaller after the negative "small P-1".

Note that exponents in the 3xxM range are TF'ed to less than 82 bits; e.g. I see many TF'ed to 72. For such exponents P-1 has much better probabilities, e.g. B1=1M, B2=20M gives 6.07%.[/QUOTE]
I have some data to test the theory. Using B1=18676, B2=12x, a test takes ~4m30s at --setsclk 4 mclk=1200, with a ~1% chance of finding a factor on a 344M exponent TF'd to 72 bits. That's ~320 tests per day with an expected outcome of ~3.2 factors per day. This range was chosen to find the most factors per day. After ~3 days it's doing slightly better than expected at 11/934, ~3.77 factors per day. Maximising factors found comes at the expense of the value of the NF tests, but as a quick test that seems par for the course. The factors found are 1x73bit, 3x74bit, 1x77bit, 1x78bit, 2x79bit, 1x83bit, 1x89bit, 1x90bit. In a vacuum it's probably better to do this than TF at some bit level between 70 and 80, as the card is weak at TF; whether it's a good use of the hardware in a distributed context is less clear.
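As a rough sanity check, 11 factors in 934 attempts is well within noise of the nominal rate (a sketch; treating each P-1 attempt as an independent ~1% trial is an assumption, since the per-exponent probabilities vary slightly):

```python
from math import comb

# Is 11 factors in 934 attempts consistent with a ~1% per-test chance?
n, k, p = 934, 11, 0.01
expected = n * p
print(f"expected factors: {expected:.2f}")  # 9.34

# Upper tail P(X >= 11) for X ~ Binomial(934, 0.01); terms for large i
# underflow to zero and contribute nothing.
tail = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
print(f"P(>= {k} factors): {tail:.2f}")  # roughly one chance in three
```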

M344587487 2019-04-29 07:11

£626 including delivery from Scan with a 3-year warranty. Tempting, but I have nothing to plug it into: [url]https://www.scan.co.uk/products/sapphire-radeon-vii-16gb-hbm2-vr-ready-graphics-card-7nm-2nd-gen-vega-3840-streams-1400mhz-gpu-1750m[/url]

SELROC 2019-05-03 08:41

[QUOTE=preda;506206]2x would be amazing. In practice I would be very happy if I see a 50% speedup.

About memory, it is my impression that the latency did not improve much, but the bandwidth doubled. But to take advantage of this, better occupancy would be required (double the number of memory operations in flight), and this is not easily achievable because of other limiting resources: LDS memory and nb. of registers (VGPRs) that remain unchanged I guess.

About compute, the parts that aren't DP (e.g. pointer arithmetic, other integer e.g. carry, logic) remain unchanged, and this will reduce the observed speedup.

IMO another limiting factor for GCN performance is still the compiler, after so many years: the compiler does a rather poor job at generating highly efficient code (not an easy task I agree).

OTOH the better cooling will help, and allow the card to be clocked higher without thermal throttling (which is a problem with the Vega64 blower cooler).[/QUOTE]

[QUOTE=SELROC;506338][URL]https://www.phoronix.com/scan.php?page=news_item&px=Linux-4.20-Increase-AMD-GPU-TDP[/URL][/QUOTE]

The Radeon VII should be a Vega20 board. Fixes are coming...


[URL]https://www.phoronix.com/scan.php?page=news_item&px=AMDGPU-FreeSync-Hits-5.2[/URL]

Exactly this set of patches: [url]https://lists.freedesktop.org/archives/amd-gfx/2019-May/033664.html[/url]

M344587487 2019-05-11 09:48

£610 including delivery: [url]https://www.overclockers.co.uk/powercolor-radeon-rx-vega-vii-16gb-hbm2-pci-express-graphics-card-gx-196-pc.html[/url]

Prime95 2019-06-21 03:03

Linux -- what a disaster.

I was happily crunching with gpuowl. Lost power. On reboot, gpuowl no longer runs.
[CODE]Exception 9gpu_error: clGetPlatformIDs[/CODE]

rocminfo sees the device, clinfo does not.

Any ideas? My only idea is a complete reinstall of Ubuntu 19.

SELROC 2019-06-21 03:14

[QUOTE=Prime95;519687]Linux -- what a disaster.

I was happily crunching with gpuowl. Lost power. On reboot, gpuowl no longer runs.
[CODE]Exception 9gpu_error: clGetPlatformIDs[/CODE]rocminfo sees the device, clinfo does not.

Any ideas? My only idea is a complete reinstall of ubuntu 19.[/QUOTE]


try running gpuowl as root.

Prime95 2019-06-21 04:56

[QUOTE=SELROC;519689]try running gpuowl as root.[/QUOTE]

That worked, thanks!

What changes do I need to make to be able to run gpuowl as a normal user again?
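For reference, on a standard ROCm install the usual cause is device-node permissions: /dev/kfd and /dev/dri/renderD* are accessible only to the video (and, on newer setups, render) groups. Adding the user to those groups typically restores non-root access (group names may vary by distro):

```shell
# Allow the current user to access the GPU compute nodes without root.
# (Assumes a standard ROCm/amdgpu udev setup.)
sudo usermod -aG video $USER
getent group render >/dev/null && sudo usermod -aG render $USER
# Log out and back in (or reboot) for the new group membership to apply.
```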


All times are UTC.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.