|
|
#265 | |
|
Undefined
"The unspeakable one"
Jun 2006
My evil lair
6,793 Posts |
Quote:
Look ma, I computed a full iteration of a 1000T test in 0.000001 seconds. Doesn't tell us anything. We can't extrapolate an expected runtime from that. |
|
|
|
|
|
|
#266 |
|
∂²ω=0
Sep 2002
República de California
10110111101100₂ Posts |
You said "A proper complete iteration of the full 100M value", which is just what I gave - my formula works for any shift count you like. So where's the cheat?
(Yes, I know what you *mean*, I just want you to *say* what you mean. :) |
|
|
|
|
|
#267 |
|
Undefined
"The unspeakable one"
Jun 2006
My evil lair
6793₁₀ Posts |
|
|
|
|
|
|
#268 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴·3·163 Posts |
Something I've thought about from time to time, and which comes up in questions from others, is beginning at some small iteration x>0 with a precomputed interim term. For LL, the seed is 4, iteration 1 gives 14, iteration 2 gives 194, and so on, and the term almost doubles in number of bits per iteration. So iteration 3, 37634, is 16 bits; it still fits in a single word and is far shorter than the modulus for any reasonably sized exponent. So begin with 37634 and call the next iteration 4. There's an equivalent in PRP3. Saving a few iterations is usually dismissed as not worth the bother at current exponents.
Yes, as a fraction of the work it's tiny, only ~3/100M saved, 30ppb. There are around 400,000 exponents to primality test within 10% of 100M, times two tests each: 800,000. It adds up to about 0.024 primality tests saved over that span. Step and repeat over additional exponent spans, and the tiny savings add up overall, theoretically, ignoring erasure of some by people quitting, and the savings being distributed over the many participants. Seems like a tiny but fast and straightforward tweak to implement. But it does not result in an additional primality test being completed somewhere, because the savings are divided up among too many systems and workers. (Maybe it results in additional factoring, but even that seems unlikely.) Results are reached a tiny bit sooner. Over a year's time, 30ppb is ~1 second. I just spent centuries' worth of the savings estimating and describing it. (If one were doing 64-bit unsigned int math, it could go to iteration 5. In a variable-length representation, the initial iterations are only a single word long, so the net effect is much, much less.)
This is also why the res64 values of the first 5 iterations are independent of the exponent, for any exponent > 64. In hex:
seed 4
1 E
2 C2
3 9302
4 546B 4C02
5 1BD6 96D9 F03D 3002
So, it seems, in the existing software, one could consider adding this test: after the first 5 iterations, for any sizable exponent and any shift, generate the res64 and test for a match. If it does not match, there was excessive round-off or some other error, perhaps an initialization error or a memory error. I wonder if any existing software does that, or if the code wizards have determined it would detect an error so rarely that it is a waste of time. It could perhaps save some novice user from a long wasted run with too small an fft length, but only if roundoff or other error can have a detectable effect this early. Probably if there were something to be gained, George and Ernst would be doing it already. I'd find an answer as to why it's not worthwhile interesting.
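The figures in this post can be rechecked with a few lines of Python. This is only a minimal sketch: the helper names `ll_terms` and `res64` are made up for illustration, and the two small exponents used to show exponent independence (89 and 127) are arbitrary choices, not values from the post.

```python
# Unreduced Lucas-Lehmer terms: s_0 = 4 (the seed), s_{i+1} = s_i^2 - 2.
def ll_terms(count):
    """Return [s_0, ..., s_count] computed with plain big integers."""
    terms = [4]
    for _ in range(count):
        terms.append(terms[-1] ** 2 - 2)
    return terms


def res64(value, p):
    """Low 64 bits of value reduced mod the Mersenne number 2^p - 1."""
    return (value % ((1 << p) - 1)) & 0xFFFFFFFFFFFFFFFF


TERMS = ll_terms(5)
for i, s in enumerate(TERMS):
    # Index 0 is the seed; index i is "iteration i" in the post's numbering.
    print(f"iteration {i}: {s:X} ({s.bit_length()} bits)")

# The iteration-5 term is below 2^64, so reducing it mod 2^p - 1 changes
# nothing for p > 64 and the res64 is the same for every such exponent.
assert res64(TERMS[5], 89) == res64(TERMS[5], 127) == TERMS[5]

# Savings estimate, using the post's round numbers.
fraction_saved = 3 / 100_000_000            # 3 iterations out of ~100M
tests_in_span = 400_000 * 2                 # two tests per exponent
tests_saved = fraction_saved * tests_in_span
seconds_saved_per_year = fraction_saved * 365.25 * 24 * 3600
print(f"{fraction_saved * 1e9:.0f} ppb, {tests_saved:.3f} tests, "
      f"{seconds_saved_per_year:.2f} s per year")
```

The bit lengths printed for iterations 3 and 5 show why 37634 fits in 16 bits and why 64-bit unsigned math reaches iteration 5 but not iteration 6.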
It might be useful in the gpu arena, such as CUDALucas, where some configurations with some gpu models produce wrong residues from the start. Last fiddled with by kriesel on 2019-02-07 at 14:53 |
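A rough sketch of what such an early res64 check could look like, in Python with exact big-integer arithmetic standing in for the FFT multiply. The function name `early_res64_check` is hypothetical, and the shift handling (the stored value is the true residue times 2^shift, with the shift doubling at each squaring) is an assumption about how shifted residues are usually described, not code taken from any of the programs mentioned.

```python
EXPECTED_RES64 = 0x1BD696D9F03D3002  # iteration-5 value quoted in the post


def early_res64_check(p, shift, iterations=5):
    """Run the first LL iterations mod 2^p - 1 on a shifted residue and
    report whether the unshifted low 64 bits match the known value."""
    assert p > 64 and 0 <= shift < p
    m = (1 << p) - 1
    x = (4 << shift) % m                  # seed 4, stored as 4 * 2^shift
    for _ in range(iterations):
        shift = (2 * shift) % p           # squaring doubles the shift; 2^p = 1 mod m
        x = (x * x - (2 << shift)) % m    # subtract 2, shifted to match
    unshifted = (x << (p - shift)) % m    # multiply by 2^(p - shift) = 2^(-shift) mod m
    return (unshifted & 0xFFFFFFFFFFFFFFFF) == EXPECTED_RES64


# Any exponent above 64 and any shift should pass when the arithmetic is right.
print(early_res64_check(89, 0), early_res64_check(86243, 12345))
```

In a real program the squaring would go through the floating-point FFT, so a mismatch here would point at round-off, initialization or memory trouble rather than at the integer logic shown above.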
|
|
|
|
|
#269 |
|
"Composite as Heck"
Oct 2017
1110110110₂ Posts |
Single worker quick (sclk=1800, mclk=1200, fan=jet_engine, ~240W rocm-smi power figure):
Code:
amdcube@amdcube:~/Documents/git/gpuowl3$ ./openowl -device 0
2019-03-25 18:05:40 gpuowl 6.2-3a95f98-mod
2019-03-25 18:05:40 -device 0
2019-03-25 18:05:40 332220523 FFT 18432K: Width 256x4, Height 256x4, Middle 9; 17.60 bits/word
2019-03-25 18:05:40 using short carry kernels
2019-03-25 18:05:42 OpenCL compilation in 1931 ms, with "-DEXP=332220523u -DWIDTH=1024u -DSMALL_HEIGHT=1024u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-03-25 18:05:44 332220523.owl not found, starting from the beginning.
2019-03-25 18:05:50 332220523 OK 800 0.00%; 3.39 ms/sq; ETA 13d 00:30; b950798999630b08 (check 1.74s)
2019-03-25 18:06:21 332220523 10000 0.00%; 3.39 ms/sq; ETA 13d 00:58; 503cd91d7b8e30e5
2019-03-25 18:06:55 332220523 20000 0.01%; 3.39 ms/sq; ETA 13d 00:43; f2d3ffbb3586c527
2019-03-25 18:07:29 332220523 30000 0.01%; 3.39 ms/sq; ETA 13d 00:40; e7846100baf7ce53
2019-03-25 18:08:03 332220523 40000 0.01%; 3.39 ms/sq; ETA 13d 00:42; e305c82567149969
2019-03-25 18:08:37 332220523 50000 0.02%; 3.39 ms/sq; ETA 13d 00:38; 72885d5ee0a11128
Code:
amdcube@amdcube:~/Documents/git/gpuowl3$ ./openowl -device 0
2019-03-25 17:59:57 gpuowl 6.2-3a95f98-mod
2019-03-25 17:59:57 -device 0
2019-03-25 17:59:57 332220523 FFT 18432K: Width 256x4, Height 256x4, Middle 9; 17.60 bits/word
2019-03-25 17:59:57 using short carry kernels
2019-03-25 17:59:59 OpenCL compilation in 1933 ms, with "-DEXP=332220523u -DWIDTH=1024u -DSMALL_HEIGHT=1024u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-03-25 18:00:01 332220523.owl not found, starting from the beginning.
2019-03-25 18:00:07 332220523 OK 800 0.00%; 3.66 ms/sq; ETA 14d 01:52; b950798999630b08 (check 1.85s)
2019-03-25 18:00:41 332220523 10000 0.00%; 3.67 ms/sq; ETA 14d 02:17; 503cd91d7b8e30e5
2019-03-25 18:01:18 332220523 20000 0.01%; 3.66 ms/sq; ETA 14d 02:03; f2d3ffbb3586c527
2019-03-25 18:01:54 332220523 30000 0.01%; 3.66 ms/sq; ETA 14d 02:10; e7846100baf7ce53
2019-03-25 18:02:31 332220523 40000 0.01%; 3.66 ms/sq; ETA 14d 01:56; e305c82567149969
2019-03-25 18:03:07 332220523 50000 0.02%; 3.66 ms/sq; ETA 14d 02:05; 72885d5ee0a11128
Code:
amdcube@amdcube:~/Documents/git/gpuowl3$ ./openowl -device 0
2019-03-25 18:11:13 gpuowl 6.2-3a95f98-mod
2019-03-25 18:11:13 -device 0
2019-03-25 18:11:13 332220523 FFT 18432K: Width 256x4, Height 256x4, Middle 9; 17.60 bits/word
2019-03-25 18:11:13 using short carry kernels
2019-03-25 18:11:15 OpenCL compilation in 2010 ms, with "-DEXP=332220523u -DWIDTH=1024u -DSMALL_HEIGHT=1024u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-03-25 18:11:16 332220523.owl not found, starting from the beginning.
2019-03-25 18:11:27 332220523 OK 800 0.00%; 7.06 ms/sq; ETA 27d 03:24; b950798999630b08 (check 2.85s)
2019-03-25 18:12:32 332220523 10000 0.00%; 7.07 ms/sq; ETA 27d 04:32; 503cd91d7b8e30e5
2019-03-25 18:13:43 332220523 20000 0.01%; 7.07 ms/sq; ETA 27d 04:29; f2d3ffbb3586c527
2019-03-25 18:14:53 332220523 30000 0.01%; 7.08 ms/sq; ETA 27d 05:14; e7846100baf7ce53
2019-03-25 18:16:04 332220523 40000 0.01%; 7.07 ms/sq; ETA 27d 04:44; e305c82567149969
2019-03-25 18:17:15 332220523 50000 0.02%; 7.06 ms/sq; ETA 27d 03:50; 72885d5ee0a11128
────────────────────────────────────────────────────────────────────────────────
amdcube@amdcube:~/Documents/git/gpuowl4$ ./openowl -device 0
2019-03-25 18:11:14 gpuowl 6.2-3a95f98-mod
2019-03-25 18:11:14 -device 0
2019-03-25 18:11:14 332220523 FFT 18432K: Width 256x4, Height 256x4, Middle 9; 17.60 bits/word
2019-03-25 18:11:14 using short carry kernels
2019-03-25 18:11:16 OpenCL compilation in 2018 ms, with "-DEXP=332220523u -DWIDTH=1024u -DSMALL_HEIGHT=1024u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-03-25 18:11:18 332220523.owl not found, starting from the beginning.
2019-03-25 18:11:30 332220523 OK 800 0.00%; 6.62 ms/sq; ETA 25d 11:09; b950798999630b08 (check 3.18s)
2019-03-25 18:12:35 332220523 10000 0.00%; 7.08 ms/sq; ETA 27d 05:06; 503cd91d7b8e30e5
2019-03-25 18:13:46 332220523 20000 0.01%; 7.08 ms/sq; ETA 27d 04:55; f2d3ffbb3586c527
2019-03-25 18:14:56 332220523 30000 0.01%; 7.08 ms/sq; ETA 27d 05:16; e7846100baf7ce53
2019-03-25 18:16:07 332220523 40000 0.01%; 7.07 ms/sq; ETA 27d 04:45; e305c82567149969
2019-03-25 18:17:18 332220523 50000 0.02%; 7.06 ms/sq; ETA 27d 03:53; 72885d5ee0a11128 |
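For reading these logs, the ETA column can be roughly reproduced from the ms/sq figure and the exponent. The sketch below is only that arithmetic; the function names are made up, and treating the last two logs as two instances sharing one card is an assumption based on their overlapping timestamps and the separate gpuowl3 and gpuowl4 directories.

```python
EXPONENT = 332_220_523  # exponent under test in the logs


def eta_days(ms_per_sq, iterations=EXPONENT):
    """Days needed for `iterations` squarings at a steady ms/sq rate."""
    return ms_per_sq * iterations / 1000 / 86_400


def format_eta(days):
    """Format fractional days as 'Dd HH:MM', similar to the log's ETA field."""
    total_minutes = round(days * 1440)
    d, rem = divmod(total_minutes, 1440)
    h, m = divmod(rem, 60)
    return f"{d}d {h:02d}:{m:02d}"


def effective_ms_per_sq(ms_per_sq, instances):
    """Combined card throughput when several instances run side by side."""
    return ms_per_sq / instances


for rate in (3.39, 3.66, 7.07):
    print(rate, "ms/sq ->", format_eta(eta_days(rate)))

# Two instances at ~7.07 ms/sq each, under the assumption stated above.
print("combined:", effective_ms_per_sq(7.07, 2), "ms per squaring")
```

The logged ETAs wander by some minutes around these values because ms/sq is rounded to two decimals in the log.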
|
|
|
|
|
#270 | |
|
Aug 2010
Republic of Belarus
2628 Posts |
Quote:
Top card with a reasonable price.
|
|
|
|
|
|
|
#271 | |
|
Jun 2019
Boston, MA
478 Posts |
Quote:
I assume the benchmark numbers posted on mersenne.ca/cudalucas.php for the Radeon VII are from you, and it clearly blows all of the competition out of the water, especially for its price (JVR2 is almost 4x the next best GPU). I think you've made a massive achievement, and I'd like to learn more!
|
|
|
|
|
|
|
#272 | |
|
"Composite as Heck"
Oct 2017
1666₈ Posts |
Quote:
preda and I, at least, contributed to the mersenne.ca result, which is at stock settings. This thread contains me fumbling through testing the card to make it more efficient: https://www.mersenneforum.org/showth...461#post511461 The specs of my machine are not really relevant. Technically, every million iterations there is a GEC check that is done on the CPU, but it takes a very small share of the total test time and is not part of the benchmarks. Testing was done on a Ryzen 1700, and I am in the process of tearing my hair out trying to get an Intel Celeron dual-core ex-mining motherboard setup running. |
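For readers who have not met the GEC (Gerbicz error check), here is a toy Python sketch of the general idea on a PRP squaring chain. It is not gpuowl's code: the function name `gerbicz_prp`, the block size, and the way a fault is injected are all illustrative assumptions. The chain is x_i = base^(2^i) mod n, and d is the running product of x at every multiple of `block`; the check relies on the identity d_new = base * d_old^(2^block) mod n.

```python
def gerbicz_prp(n, iterations, block, base=3, corrupt_at=None):
    """Square `base` `iterations` times mod n and run one Gerbicz check at the end.

    Returns (x, ok). `corrupt_at` optionally injects a fake hardware error
    at that iteration so the check can be seen failing.
    """
    assert iterations >= block and iterations % block == 0
    x = base % n
    d = x            # product of x_0, x_block, x_2*block, ...
    d_prev = d       # the same product, one block behind
    for i in range(1, iterations + 1):
        x = x * x % n
        if i == corrupt_at:
            x = (x + 1) % n          # simulated error, illustration only
        if i % block == 0:
            d_prev, d = d, d * x % n
    # Verification costs `block` extra squarings:
    # d must equal base * d_prev^(2^block) mod n.
    check = d_prev
    for _ in range(block):
        check = check * check % n
    return x, d == base * check % n


N = (1 << 89) - 1                    # small Mersenne prime, just for the demo
print(gerbicz_prp(N, 88, 8)[1])                  # clean run
print(gerbicz_prp(N, 88, 8, corrupt_at=20)[1])   # run with an injected error
```

Because the verification only costs `block` squarings plus a few multiplications, running it rarely (every million iterations in the setup described above) keeps its share of the total run time very small.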
|
|
|
|
|
|
#273 |
|
Apr 2019
5×41 Posts |
The craziest thing about it is that it's basically the same core as their server-grade Instinct MI50/MI60, which are capable of 2x that double-precision performance; it's only nerfed on the VII so they aren't undercutting themselves (although you could easily buy 2 or more VIIs for the cost of those).
It's been mentioned that AMD changed this in the BIOS, so there is speculation that some future hack could enable ALL the compute! Although I've also read that their BIOS images are digitally signed, so it may never happen, depending on how strong that check is. I wonder what sort of signing is done, or how quickly a VII could theoretically crack its own protection
|
|
|
|
|
|
#274 | |
|
"Sam Laur"
Dec 2018
Turku, Finland
317 Posts |
Quote:
For other compute loads, of course. |
|
|
|
|
|
|
#275 | ||
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴×3×163 Posts |
Quote:
Quote:
|
||
|
|
|