v19 pre-release discussion
Update on v19 code development: At long last I have the basic Mersenne-mod PRP Gerbicz-checking working in Mlucas. The output below is from my gdb session stepping through the latest code, which I fiddled so as to update the Gerbicz checkproduct (mod-product of PRP-test residues at fixed user-set iteration intervals) every 1000 squarings, and to perform the final up-squaring step and comparison of the 2 resulting mod-products at the end of the run. The two displayed Res64 values show just the bottom limb of the 2 residue-length quantities being compared ... the mi64_cmp_eq() call compares the full vectors:
[code]
Gerbicz check: B[] Res64: 7681E95DC8B1C8A6

Breakpoint 10, ernstMain (mod_type=1, test_type=2, exponent=173431, fft_length=8, radix_set=2, maxFFT=8, iterations=10000, sh0=0x7fff5fbff910, sh1=0x7fff5fbff908, sh2=0x7fff5fbff900, scrnFlag=0, runtime=0x7fff5fbff480) at ../Mlucas.c:1871
1871    for(i = 1; i < PRP_BASE; i++) {
(gdb) n
1872    if(mi64_cmpult(c_uint64_ptr,d_uint64_ptr,j)) break;
(gdb)
1876    ASSERT(HERE, mi64_cmpult(c_uint64_ptr,d_uint64_ptr,j), "Gerbicz checkproduct reduction (mod 2^p-1) failed!");
(gdb)
1877    fprintf(stderr,"Gerbicz check: D[] Res64: %016llX\n",c_uint64_ptr[0]);
(gdb)
Gerbicz check: D[] Res64: 7681E95DC8B1C8A6
1878    if(mi64_cmp_eq((uint64*)arrtmp,c_uint64_ptr,j))
(gdb)
1879    fprintf(stderr,"Gerbicz check passed!\n");
(gdb)
Gerbicz check passed!
[/code]
The defaults I intend to implement for the code release are, as above, to update the Gerbicz checkproduct every 1000 squarings (and save it along with the main PRP-test residue to savefile every 10000 or 100000 squarings), and to perform the final up-squaring step and comparison of the 2 resulting mod-products every 10^6 squarings. That gives a near-optimal extra-work expense equivalent to 1000 2-input modmuls (for the every-1000-iter updates) plus 1000 modsquares (for the final up-squaring step). Counting each 2-input modmul as about 2 modsquares, that is roughly equivalent to 3000 modsquares total every 10^6 PRP iterations, or 0.3% overhead to implement the added check.

Should said check detect a residue error during the PRP test, there will of course be extra work involved in the rollback to the last 1M-iter savefile. But the percentage of runs which suffer such a detected error and rollback should be about the same as the current level of bad-LL-test residues, so assuming the latter suffer fewer than 1 G-check rollback every million iterations, it's a win.

Still remaining to be done prior to code release:
1. New 2-operand modmul code in the real/complex-FFT-wrapper-and-dyadic-square step occurring between the forward and inverse FFTs needs to be SIMD-ized, which means sse2, avx, avx-512 and ARM asimd assembly-code macros need to be written and debugged;
2. Gerbicz-checkproduct write/read and integrity-checking needs to be added to the savefile write/read code;
3. The rollback-on-error handling mechanism needs to be coded up;
4. Said savefile-code needs to play nice with the v18-added premature-checkpoint-on-signal-and-exit handling ... when such a signal is caught we need to update the PRP residue in the savefile but not the Gerbicz checkproduct;
5. The Gerbicz-check computations need to be fiddled to play nice in the context of circularly-shifted PRP residues - all my code-dev so far has been for the easier shift = 0 case. |
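For readers following along, here is a minimal sketch of the check described above. It is not the Mlucas code. It assumes a toy modulus M31 = 2^31 - 1 so that every product fits in a 64-bit integer, the unshifted (shift = 0) case, and hypothetical names (TOY_M, mulmod, gerbicz_check_demo, corrupt_at) that do not appear in Mlucas. The checkproduct d starts as the initial residue and is multiplied by the current residue every `interval` squarings. At the end, the previous checkproduct is up-squared `interval` times and multiplied by the initial residue, which should reproduce d if no squaring went wrong.

```c
#include <stdint.h>

/* Toy modulus M31 = 2^31 - 1 (an assumption for illustration only; the real
   code works with multiword residues mod 2^p - 1 for much larger p). */
static const uint64_t TOY_M = ((uint64_t)1 << 31) - 1;

/* Modular multiply; both inputs are < 2^31, so the product fits in 64 bits. */
static uint64_t mulmod(uint64_t a, uint64_t b)
{
    return (a * b) % TOY_M;
}

/*
 * Hypothetical demo routine: do n_iters modular squarings starting from x0,
 * updating the Gerbicz checkproduct every `interval` squarings, then run the
 * final comparison.
 *
 * corrupt_at: if nonzero, flip the low bit of the residue right after that
 *             squaring, to simulate a hardware glitch.
 *
 * Returns 1 if the check passes, 0 if it fails, -1 on unusable arguments.
 */
int gerbicz_check_demo(uint64_t x0, unsigned n_iters, unsigned interval,
                       unsigned corrupt_at)
{
    if (interval == 0 || n_iters < interval)
        return -1;

    uint64_t x = x0 % TOY_M;  /* current PRP residue                        */
    uint64_t d = x;           /* checkproduct, seeded with initial residue  */
    uint64_t d_prev = x;      /* checkproduct as of the previous update     */
    unsigned i;

    for (i = 1; i <= n_iters; i++) {
        x = mulmod(x, x);                 /* one PRP squaring */
        if (i == corrupt_at)
            x ^= 1;                       /* simulated residue error */
        if (i % interval == 0) {          /* checkproduct update: 1 modmul */
            d_prev = d;
            d = mulmod(d, x);
        }
    }

    /* Final step: up-square the previous checkproduct `interval` times and
       multiply by the initial residue; an error-free run gives d again. */
    uint64_t c = d_prev;
    for (i = 0; i < interval; i++)
        c = mulmod(c, c);
    c = mulmod(c, x0 % TOY_M);

    return c == d;
}
```

With base 3, 10000 squarings and an update interval of 1000 (the same iteration counts as in the gdb session, though on a much smaller modulus), gerbicz_check_demo(3, 10000, 1000, 0) reports a pass, while a nonzero corrupt_at such as 4321 makes the comparison fail.
|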
(.... crickets, for days....)
6. Update documentation, including file format
Any guess when v19 release may occur? With or without primenet integration? |
[QUOTE=kriesel;524786](.... crickets, for days....)
6. Update documentation, including file format
Any guess when v19 release may occur? With or without primenet integration?[/QUOTE]
I should've provided some kind of estimated timeline in my post above - it's gonna be at least a month before a beta release, and that's only absent the kinds of time-costing hurdles that nearly always crop up along the way.
The primenet.py script that's been shipping since v17 makes tight primenet integration unnecessary, IMO. The one added feature that would be nice to have in said script would be the ability to do regular assignment progress updates ... I actually have the code needed for that in place and debugged, but the current server set-up is such that it needed Aaron (madpoo) to do some manual intervention at the server end (in effect simulating the result of a v5 API "update computer info" transaction) for me to be able to test and use that new feature. |
A month should coincide with the end of my first N2 wavefront LL test. I look forward to using the PRP beta version.
|
[QUOTE=paulunderwood;524817]A month should coincide with the end of my first N2 wavefront LL test. I look forward to using the PRP beta version.[/QUOTE]
So do I, if it allows for PRP on Mersenne composites :smile: |
[QUOTE=ET_;524847]So do I, if it allows for PRP on Mersenne composites :smile:[/QUOTE]
PRP-C (composite cofactor PRP testing) will have to wait until v20 - based on past experience and this being a 1-man coding show, I try to limit myself to one major new feature per release. Sorry! |
[QUOTE=ewmayer;524866]PRP-C (composite cofactor PRP testing) will have to wait until v20 - based on past experience and this being a 1-man coding show, I try to limit myself to one major new feature per release. Sorry![/QUOTE]
I will wait. Though I guess Raspy users would have chosen PRP-C over plain vanilla PRP. Shorter running times, you know... :smile: |
[QUOTE=ewmayer;524811]I should've provided some kind of estimated timeline in my above - it's gonna be at least a month before a beta release, and that's only absent the kinds of time-costing hurdles that nearly always crop up along the way.[/QUOTE]
How goes it? Presumably this is still taking priority over the SP LL experiment [URL]https://www.mersenneforum.org/showthread.php?t=23926&page=4[/URL]
|
:bump: My first LLR test on the N2 is nearly done. Is PRP-3 mlucas imminent?
|
[QUOTE=paulunderwood;527254]:bump: My first LLR test on the N2 is nearly done. Is PRP-3 mlucas imminent?[/QUOTE]
Alas, no - I have all the new core-math-code infrastructure (needed to support generic 2-input FFT-modmul ... my code was 100% geared toward 1-input FFT-autosquare up till now) in place, including 6 custom versions of the key optimized code macros (scalar-double, ARMv8 SIMD, sse2, avx, avx2/fma, avx-512), but wrestling with all the control logic needed for the new execution path is taking longer than I had hoped. So please bear with me and just queue up more LL-test work as your current jobs finish. |
I am doing final shakedown tests of the v19 beta release on the hardware available to me. Since the fellow who physically hosted the GIMPS KNL workstation has gone AWOL, I could use remote access to a Skylake-X system running Linux in order to test the new PRP+Gerbicz code under avx-512.
|
[QUOTE=ewmayer;530782]I am doing final shakedown tests of the v19 beta release on the hardware available to me. Since the fellow who physically hosted the GIMPS KNL workstation has gone AWOL, I could use remote access to a Skylake-X system running Linux in order to test the new PRP+Gerbicz code under avx-512.[/QUOTE]
Sorry, can't help you with that, since I don't own any avx512-capable hardware yet, unless I occasionally get access to one via Colab. But. [URL]https://www.mersenneforum.org/mayer/README.html#news[/URL] says [URL="https://www.mersenneforum.org/mayer/README.html#news"]Recent News: v19 released[/URL] but contains no contents or links relevant to V19. Don't tease us like that! |
[QUOTE=ewmayer;530782]I am doing final shakedown tests of the v19 beta release on the hardware available to me. Since the fellow who physically hosted the GIMPS KNL workstation has gone AWOL, I could use remote access to a Skylake-X system running Linux in order to test the new PRP+Gerbicz code under avx-512.[/QUOTE]
Could Amazon's cloud cpus be an option? |
[QUOTE=kriesel;531080]unless I occasionally get access to one via Colab.[/QUOTE]
I get one within less than 5 tries 90% of the time. Of course, I'm just using CPU-only notebook. |
[QUOTE=kriesel;531080]Sorry, can't help you with that, since I don't own any avx512-capable hardware yet, unless I occasionally get access to one via Colab.
But. [URL]https://www.mersenneforum.org/mayer/README.html#news[/URL] says [URL="https://www.mersenneforum.org/mayer/README.html#news"]Recent News: v19 released[/URL] but contains no contents or links relevant to V19. Don't tease us like that![/QUOTE]
Ha - one of my early v19-oriented edits in my local version of the README.html sneaked into an edit I uploaded with material on the Android-phone battery-blowup problems I had. The actual section pointed to by that 'v19' link still says v18, so will leave as is, since it's not like I have millions of confused users burning up the interwebs due to the typo.
[QUOTE=henryzz;531144]Could Amazon's cloud cpus be an option?[/QUOTE]
Fellow forumite Laurent Desnogues has access to an avx-512 machine via work and has kindly been helping out. Alas, no remote access, so the debug is proceeding slowly. But I still have at least a week of further shakedown testing on the ARM, sse2 and avx2 builds of the current v19 code I need to do, as well as lots of edits needed to the above-mentioned readme page, so no breakneck speed needed on the avx-512 front, since the debug issue Laurent's build turned up has been localized to a small section of new code, where I figured any such SIMD-version-related bugs would occur. |
[QUOTE=ewmayer;531179]Ha - one of my early v19-oriented edits in my local version of the README.html sneaked into an edit I uploaded with material on the Android-phone battery-blowup problems I had. The actual section pointed to by that 'v19' link still says v18, so will leave as is, since it's not like I have millions of confused users burning up the interwebs due to the typo.
Fellow forumite Laurent Desnogues has access to an avx-512 machine via work and has kindly been helping out. Alas, no remote access, so the debug is proceeding slowly. But I still have at least a week of further shakedown testing on the ARM, sse2 and avx2 builds of the current v19 code I need to do, as well as lots of edits needed to the above-mentioned readme page, so no breakneck speed needed on the avx-512 front, since the debug issue Laurent's build turned up has been localized to a small section of new code, where I figured any such SIMD-version-related bugs would occur.[/QUOTE]
Please, keep us informed about what the ARM version can (and can't) do... :smile: |
Luigi, the ARM version will have the same PRP support as the rest. I currently have one PRP-DC running on my Intel Haswell box and another on one of my Samsung S7 Android phones as part of my pre-release testing. The ARM run will need ~3 months, but I will not hold up the release waiting for it to finish. Rather, I will consider a successful Haswell PRP-DC run sufficient as a correctness test, since I have done enough short-length tests on my various ARM and Intel hardware to assure myself that the various builds are producing matching results. I know you are keen to see PRP-C support ... I am hopeful that with the basic PRP stuff in place, adding PRP-C functionality will be less of an effort, but that remains to be seen.
In the meantime, I just finished analyzing and working around a subtle bit of regexp weirdness that was causing first-time-PRP-assignment-fetch to fail in my v19 primenet.py script, while PRP-DC-fetch was working fine. Laurent sent me some avx-512 debug data yesterday which will hopefully allow me to pin down the bug in the new 2-input-FFT-mul code there. Also, my check of the PRP-DC on my Haswell this morning showed that said run got hit by a data-corruption glitch (my Haswell is prone to these, roughly once per week under full all-cores load), and that caused the retry-on-error logic in the code to go into an infinite loop, so there is another bug to track down. Back to the salt mines! |