[QUOTE=preda;466649]- all LL error protections have been removed: loop detection, Jacobi-check, rounding-error, etc. (as the new check is stronger).
[/QUOTE] Hopefully you've put in a "zero check". Without it, the whole thing becomes worthless. EDIT:- Also, rounding error still needs to be detected, as it could mean you need to go to higher FFT |
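The posts don't spell out how the new check works. Assuming it is a Gerbicz-style product check (which fits the later remark that the mul-by-3 is only used in the verification), a minimal Python sketch shows why a zero check is needed: an all-zero checksum satisfies the product identity trivially. The function names and the "verify after every block" schedule are illustrative assumptions, not gpuOwl's code.

```python
# Minimal sketch (hypothetical names, not gpuOwl's code): PRP-3 on M = 2^p - 1
# with a Gerbicz-style product check, verified after every block of L squarings.

def product_check(d_prev, d_new, L, m):
    # The checksum must satisfy d_new == d_prev^(2^L) * 3 (mod m);
    # the mul-by-3 appears only here, never in the residue itself.
    return d_new == pow(d_prev, 1 << L, m) * 3 % m

def check_ok(d_prev, d_new, L, m):
    # Zero check: an all-zero state passes the product check trivially,
    # since 0 == 0^(2^L) * 3, so it must be rejected explicitly.
    return d_new != 0 and product_check(d_prev, d_new, L, m)

def prp3_with_check(p, L, blocks):
    """Return the residues 3^(2^(k*L)) mod M at each block boundary."""
    m = (1 << p) - 1
    x = 3              # residue at iteration 0
    d = 3              # checksum starts as the iteration-0 residue
    boundaries = [x]
    for _ in range(blocks):
        d_prev = d
        for _ in range(L):
            x = x * x % m          # squarings only
        d = d * x % m              # fold the block-boundary residue into the checksum
        if not check_ok(d_prev, d, L, m):
            raise RuntimeError("check failed: roll back to the last good state")
        boundaries.append(x)
    return boundaries

# The degenerate case a zero check guards against:
print(product_check(0, 0, 10, (1 << 7) - 1))  # True, even though the state is garbage
print(check_ok(0, 0, 10, (1 << 7) - 1))       # False
```

Each verification costs L extra squarings, so a real implementation would typically verify much less often than every block; checking every block here only keeps the sketch short.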
[QUOTE=axn;466653]Hopefully you've put in a "zero check". Without it, the whole thing becomes worthless.
EDIT:- Also, rounding error still needs to be detected, as it could mean you need to go to higher FFT[/QUOTE] Yes, the zero check is in (it is required by the new check). Rounding errors should be caught by the normal check, though, so why should there be dedicated rounding detection? It's true that rounding errors strongly suggest a larger FFT, but the same conclusion can be reached from a simple bits/word computation. |
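The bits/word computation mentioned above can be sketched like this: each FFT word carries on average p / N bits of a 2^p - 1 residue, and you pick the smallest FFT length whose load stays under some limit. The limit used below is a made-up placeholder; the real threshold depends on the FFT implementation and its precision.

```python
def bits_per_word(p, fft_len):
    # Average number of bits of the p-bit residue stored in each FFT word.
    return p / fft_len

def smallest_fft(p, fft_lengths, max_bits):
    # max_bits is a hypothetical per-word limit, not a gpuOwl constant.
    for n in sorted(fft_lengths):
        if bits_per_word(p, n) <= max_bits:
            return n
    raise ValueError("exponent too large for the available FFT lengths")

MEGA = 1 << 20  # 1M words
print(bits_per_word(77_000_000, 4 * MEGA))                          # about 18.36
print(smallest_fft(77_000_000, [2 * MEGA, 4 * MEGA, 8 * MEGA], 18.5))  # 4194304
```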
Windows binaries from the latest commit (55d094a)... not tested, sorry!
|
[QUOTE=kracker;467125]Windows binaries from the latest commit (55d094a)... not tested, sorry![/QUOTE]
Thanks Kracker!
A summary of the changes in 1.1:
- savefile name change: for exponent NNNN they're now called:
NNNN.ll
NNNN-prev.ll
NNNN-temp.ll
NNNN.<iteration>.ll
i.e. all the savefiles for an exponent start with that exponent. To see details about a savefile, just print the last line of the file, like this:
tail -n1 file.ll
(the last line is human-readable text; everything before it is binary).
- savefile format change: the savefile signature is now "LL4". An LL3 savefile can be brought over to LL4 format if it is edited with care: append a "0" to the last line and update the signature to LL4 (the added "0" is the number of error rollbacks).
And not news: it does PRP-3. It writes a JSON-formatted result to results.txt. Soon it will be possible to submit this result format. |
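A small Python sketch of the naming scheme and of the `tail -n1` trick described above. It only builds file names and reads the trailing text line; it does not parse the binary part, whose layout isn't described here, and the helper names are hypothetical.

```python
def savefile_names(exponent, iteration=None):
    # All savefiles for an exponent start with that exponent.
    names = [f"{exponent}.ll", f"{exponent}-prev.ll", f"{exponent}-temp.ll"]
    if iteration is not None:
        names.append(f"{exponent}.{iteration}.ll")
    return names

def savefile_summary(path):
    # Same idea as `tail -n1 file.ll`: everything before the last line is binary,
    # so read raw bytes and decode only the final (human-readable) line.
    with open(path, "rb") as f:
        data = f.read()
    if data.endswith(b"\n"):
        data = data[:-1]               # ignore the terminating newline
    last = data.rsplit(b"\n", 1)[-1]   # text after the last remaining newline
    return last.decode("utf-8", errors="replace")

print(savefile_names(77000000, 20000000))
# ['77000000.ll', '77000000-prev.ll', '77000000-temp.ll', '77000000.20000000.ll']
```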
[QUOTE=airsquirrels;465777]Updated timings from the Windows driver (Which I'm told is still using the previous closed compiler, while Linux has transitioned to LLVM/rocm - apparently regrettably)
RX64 Air: ms/iter: 1.639
This is with stock clocks. If I use the "stable" 1000Mhz memory clocks that pass self test I get 1.600
Both kernels work as expected in Windows, so likely an llvm regression.[/QUOTE] The recent head version now works correctly on Vega on Linux with amdgpu-pro 17.30, in both modes (-legacy or not). It appears the "workaround" that fixed the behavior was the removal of the "max-error" computation from the amalgamation kernel (a kernel used only in non-legacy mode). Other small improvements also bring the speed to 1.54 ms/it on a stock Vega Air (though it generates quite a lot of heat). |
[QUOTE=preda;467130]Thanks Kracker!
A summary of the changes in 1.1:
- savefile name change: for exponent NNNN they're now called:
NNNN.ll
NNNN-prev.ll
NNNN-temp.ll
NNNN.<iteration>.ll
i.e. all the savefiles for an exponent start with that exponent. To see details about a savefile, just print the last line of the file, like this:
tail -n1 file.ll
(the last line is human-readable text; everything before it is binary).
- savefile format change: the savefile signature is now "LL4". An LL3 savefile can be brought over to LL4 format if it is edited with care: append a "0" to the last line and update the signature to LL4 (the added "0" is the number of error rollbacks).
And not news: it does PRP-3. It writes a JSON-formatted result to results.txt. Soon it will be possible to submit this result format.[/QUOTE] Why the file extension .ll if you're doing a PRP-3 computation, not Lucas-Lehmer? |
[QUOTE=kriesel;467562]Why the file extension .ll if you're doing a PRP-3 computation, not Lucas-Lehmer?[/QUOTE]
Legacy? -- it simply remained unchanged from LL times. I can change it if desired. |
[QUOTE=preda;467585]Legacy? -- it simply remained unchanged from LL times. I can change it if desired.[/QUOTE]
It does create confusion. |
[QUOTE=GP2;467591]It does create confusion.[/QUOTE]
OK. I plan to change the file extension to "owl". |
I'm trying to make sure gpuOwl interim residues match prime95 interim residues. As the person that implemented completely non-standard iteration numbers (off by 2) for LL testing, let me check on the standard for PRP tests.
I view a squaring and optional mul-by-3 as one iteration. Thus, when PRPing a Mersenne number, I think the interim residue for iteration 1 is (3^2)*3 = 27. There are N-1 iterations to PRP 2^N-1. |
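To make this convention concrete, here is a small sketch in which each iteration is a squaring followed by an optional mul-by-3. It amounts to left-to-right binary exponentiation of 3^(M-1) mod M: M - 1 = 2^N - 2 is N-1 one bits followed by a zero, which gives N-1 iterations after the starting value 3, and 27 at iteration 1. This illustrates the convention only; it is not Prime95's code.

```python
def squaring_plus_mul3_residues(n):
    """Interim residues of 3^(M-1) mod M, M = 2^n - 1, where one iteration is
    a squaring plus an optional mul-by-3 (left-to-right binary exponentiation)."""
    m = (1 << n) - 1
    bits = bin(m - 1)[2:]        # 2^n - 2 in binary: n-1 ones then a zero
    x = 3                        # leading 1 bit: iteration 0 is 3
    residues = [x]
    for b in bits[1:]:           # n-1 remaining bits -> n-1 iterations
        x = x * x % m
        if b == "1":
            x = x * 3 % m
        residues.append(x)
    return residues

r = squaring_plus_mul3_residues(7)   # M = 127, a Mersenne prime
print(r[1], len(r) - 1, r[-1])       # 27 6 1
```

A final residue of 1 is the ordinary Fermat PRP condition 3^(M-1) = 1 (mod M).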
[QUOTE=Prime95;468213]I'm trying to make sure gpuOwl interim residues match prime95 interim residues. As the person that implemented completely non-standard iteration numbers (off by 2) for LL testing, let me check on the standard for PRP tests.
I view a squaring and optional mul-by-3 as one iteration. Thus, when PRPing a Mersenne number, I think the interim residue for iteration 1 is (3^2)*3 = 27. There are N-1 iterations to PRP 2^N-1.[/QUOTE] A brief description of my implementation is here: [url]http://www.mersenneforum.org/showthread.php?p=466655#post466655[/url]
In the residue computation there is no mul-by-3 at any point. The mul-by-3 is involved only in the verification, so it does not affect the residue. My values are:
iteration 0: residue is 3
iteration 1: residue is 3^2
iteration 2: residue is 3^4
i.e. at iteration k the residue is 3^(2^k). For M = 2^p - 1, the final residue (at iteration p-1) is 3^(2^(p-1)), which is -3 (mod M) for a PRP. Note that computing 3^(2^(p-1)) does not require any mul-by-3, only squarings.
See also [url]http://www.mersenneforum.org/showpost.php?p=466054&postcount=138[/url], where I proposed adding 3 to the final residue so that the final value for a PRP is 0. I abandoned that idea in the end, because it would make the final residue inconsistent with the residues at other iterations. |
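For comparison, a sketch of the convention described in this reply: squarings only, so the residue at iteration k is 3^(2^k) mod M, and the final residue (iteration p-1) is -3 (mod M) for a PRP. The function names are illustrative, not gpuOwl's.

```python
def squaring_only_residues(p):
    """Residue at iteration k is 3^(2^k) mod M, M = 2^p - 1; no mul-by-3."""
    m = (1 << p) - 1
    x = 3                       # iteration 0
    residues = [x]
    for _ in range(p - 1):      # p-1 squarings reach 3^(2^(p-1))
        x = x * x % m
        residues.append(x)
    return residues

def is_prp3(p):
    # A PRP ends with residue -3, i.e. M - 3; the abandoned idea of adding 3
    # would turn this into a final value of 0.
    m = (1 << p) - 1
    return squaring_only_residues(p)[-1] == m - 3

print(squaring_only_residues(7)[:3])   # [3, 9, 81]
print(is_prp3(7), is_prp3(11))         # True False  (2^11 - 1 = 23 * 89)
```

Under the squaring-plus-mul-by-3 convention, the residue at iteration k (for k < p-1) is 3^(2^(k+1)-1), which is this convention's residue at iteration k+1 divided by 3; for example 27 * 3 = 81.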