[QUOTE=kriesel;551107]Results.txt contains zero bytes.[/QUOTE]
-verify in gpuowl is ATM more of a debugging tool: it lets you check the correctness of a proof, but that's not how proof verification is implemented by primenet. That's why it does not produce a result (in results.txt).

In brief, this is how the primenet proof verification will work:
1. The server will do a bit of pre-processing of the proof file (run all the verification steps *excluding* the bulk of the iterations, the part that took most of the time at the end of your verification), and produce a pair of residues that must satisfy A^(2^n)==B.
2. The server will apply "random", a form of, let's say, encrypting both A and B, such that A'=A^random, B'=B^random, and A'^(2^n)==B' still holds.
3. The new work-type "CERT" consists of the client downloading A' and n; the result consists of hash(A'^(2^n)), which is sent back to the server.

So:
- CERT is not yet implemented in GpuOwl (but shouldn't be hard)
- the proof file isn't needed
- OTOH a download of A' (one residue) from primenet is needed for CERT.

A question is whether the CERT worktype needs to be implemented at all in GpuOwl. As this work-type is rather tiny and limited in supply (i.e. limited by the number of PRPs completed), probably a small number of participants running mprime can exhaust all the CERTs satisfactorily.

Concerning your proof: keep it around a bit longer, then upload it to primenet as soon as the uploader becomes available. After the upload, it will be turned into a CERT that will most likely be run by somebody else (but you could run it too; the fact that you're the original author and hold onto the full proof does not weaken the verification, because of the "random" trick above). |
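The blinding in steps 2 and 3 above can be illustrated with a toy example. This is only a minimal sketch, not PrimeNet's or GpuOwl's code: the tiny modulus, the residue values, the helper names `blind` and `cert_result`, and the choice of SHA-256 as the hash are all assumptions made for illustration.

```python
import hashlib
import secrets

def blind(a, b, modulus):
    """Server side (hypothetical helper): raise A and B to the same secret power."""
    r = secrets.randbelow(2**64 - 2) + 2      # "random", never sent to the client
    return pow(a, r, modulus), pow(b, r, modulus)

def cert_result(a_blinded, n, modulus):
    """Client side (hypothetical helper): n squarings of A', then a hash.

    SHA-256 is an assumption; the post does not say which hash is used.
    """
    x = pow(a_blinded, 1 << n, modulus)       # A'^(2^n) mod modulus
    return hashlib.sha256(str(x).encode()).hexdigest()

# Toy parameters: M = 2^31 - 1 stands in for the Mersenne number under test.
M = 2**31 - 1
n = 1000
A = 3
B = pow(A, 1 << n, M)                         # pair satisfying A^(2^n) == B

A_blind, B_blind = blind(A, B, M)
expected = hashlib.sha256(str(B_blind).encode()).hexdigest()

# The client only ever sees A_blind and n, yet its answer matches the hash of B'.
print(cert_result(A_blind, n, M) == expected)
```

The check holds because (A^random)^(2^n) = (A^(2^n))^random = B^random, so a client that knows the original A and B still cannot produce the expected hash without doing the squarings on A'.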
[QUOTE=preda;549109]Yes, the GCD is done on the CPU using GNU-MP. It's a convenient solution from the coding POV. The GCD is infrequent, and one GCD takes on the order of 1min on one core of the CPU, no big deal.
Porting the fancy GCD algo to GPU would be a lot of work. Worth it if somebody was doing mainly GCDs, but that's not the case for gpuowl ATM.[/QUOTE] I am running small P-1's for [B]James Heinrich[/B]. The use of the CPU is considerable in what I would call "Stage 1." I can tell by the temperature. I have a widget which sits in the upper-right corner of the screen. "Stage 2" is the same, but with more GPU involvement. The CPU stays around 70°C. I do not see this as a problem. Overall, I have been quite satisfied with it. |
[QUOTE=storm5510;551138]I am running small P-1's for [B]James Heinrich[/B]. The use of the CPU is considerable in what I would call "Stage 1." I can tell by the temperature. I have a widget which sits in the upper-right corner of the screen. "Stage 2" is the same, but with more GPU involvement. The CPU stays around 70°C. I do not see this as a problem.[/QUOTE]
I would think major CPU involvement would be similar for stages 1 and 2, occurring only at end of each stage, when the GCD is run. |
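For readers wondering where the GCD sits in P-1, a toy stage 1 looks roughly like the following. This is a minimal sketch under stated assumptions: it uses Python's `math.gcd` instead of GNU-MP, a tiny exponent, base 3, and hypothetical function names. It is not gpuowl's implementation.

```python
from math import gcd

def primes_up_to(limit):
    """Trial-division prime list; fine for toy bounds."""
    return [q for q in range(2, limit + 1)
            if all(q % d for d in range(2, int(q**0.5) + 1))]

def pm1_stage1(p, b1):
    """Toy P-1 stage 1 on M_p = 2^p - 1 with bound b1 (hypothetical helper)."""
    n = (1 << p) - 1
    # Factors of M_p have the form 2*k*p + 1, so 2*p goes into the exponent for free.
    x = pow(3, 2 * p, n)
    # The long part: many modular exponentiations (the GPU's job in gpuowl).
    for q in primes_up_to(b1):
        qk = q
        while qk * q <= b1:       # largest power of q not exceeding b1
            qk *= q
        x = pow(x, qk, n)
    # The short part: a single GCD at the very end of the stage.
    return gcd(x - 1, n)

# M_29 = 233 * 1103 * 2089 is a small composite Mersenne number.
print(pm1_stage1(29, 10))
```

The loop is where nearly all of the time goes; the GCD is one call at the end, which is why its cost is only felt once per stage.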
[QUOTE=ewmayer;551234]I would think major CPU involvement would be similar for stages 1 and 2, occurring only at end of each stage, when the GCD is run.[/QUOTE]Unless it's the cpu-core-saturated issue that has variously shown up in Windows and Linux. [URL]https://www.mersenneforum.org/showpost.php?p=539077&postcount=11[/URL]
[URL]https://www.mersenneforum.org/showpost.php?p=537171&postcount=1829[/URL] [URL]https://www.mersenneforum.org/showpost.php?p=534675&postcount=1730[/URL] [URL]https://www.mersenneforum.org/showpost.php?p=532699&postcount=1587[/URL] etc. Or maybe unusual options, such as -log 10000. |
I'm trying to optimize the speed of the newest version. From Readme.md:
[QUOTE]-use NEW_FFT8,OLD_FFT5,NEW_FFT10[/QUOTE] What are FFT5, FFT8, and FFT10? Is it using all 3, so that I have to choose OLD, NEW or NEWEST for each of them? Or is it using 1 of the 9 possible combinations? Btw, FFT10 is not mentioned in "gpuowl.cl" along with FFT5 and FFT8. |
Could someone elaborate on exactly what CERT is?
I must not have [I]gpuOwl[/I] configured properly. For each exponent tested, a checkpoint folder is created using the exponent as the folder name. When any particular exponent test is finished, the folder is left behind. I do not see a reason. About "-use." I changed the default and have it set to use NEW_FFT8 only! An afterthought is did this help or is it a hindrance? Perhaps I need to restore the default and see if there is any difference. |
Neither Mihai nor I have an nVidia card. We tend to leave around some code that may or may not be useful in the case of nVidia.
The FFT5 and FFT10 would only be used in FFT lengths divisible by 5. The wavefront just passed into 5.5M and 6M territory. The FFT8 one won't be used for quite a while (pass1 or pass2 is 512). Gpuowl prefers to use 256 and 1024 for its passes (fewer registers used). So, not much to be gained messing with any of the above.

The best hope, I think, is playing with memory layouts. These are the IN and OUT settings; not all combinations work. From the code (numbers are Radeon VII timings):
// OUT_WG=256, OUT_SIZEX=4, OUT_SPACING=1 (old WorkingOut4) : 154 + 252 = 406 (but may be best on nVidia)
// OUT_WG=256, OUT_SIZEX=8, OUT_SPACING=1 (old WorkingOut3): 124 + 260 = 384
// OUT_WG=256, OUT_SIZEX=32, OUT_SPACING=1 (old WorkingOut5): 105 + 281 = 386
// OUT_WG=256, OUT_SIZEX=8, OUT_SPACING=2: 122 + 249 = 371
// OUT_WG=256, OUT_SIZEX=32, OUT_SPACING=4: 108 + 257 = 365 <- best
// IN_WG=256, IN_SIZEX=4, IN_SPACING=1 (old WorkingIn4) : 177 + 164 (but may be best on nVidia)
// IN_WG=256, IN_SIZEX=8, IN_SPACING=1 (old WorkingIn3): 129 + 166 = 295
// IN_WG=256, IN_SIZEX=32, IN_SPACING=1 (old WorkingIn5): 107 + 171 = 278 <- best
// IN_WG=256, IN_SIZEX=8, IN_SPACING=2: 139 + 166 = 305
// IN_WG=256, IN_SIZEX=32, IN_SPACING=4: 121 + 161 = 282

To compare them on your own card, use the -time command line argument. |
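The comparison in the timing list above can be reproduced mechanically. The sketch below is an illustration only: the numbers are copied from the post (Radeon VII), the two addends are taken as given since the post does not say what each one measures, and the helper names `best` and `use_string` are hypothetical.

```python
# (WG, SIZEX, SPACING) -> the two timings quoted for that layout
OUT_TIMINGS = {
    (256, 4, 1): (154, 252),
    (256, 8, 1): (124, 260),
    (256, 32, 1): (105, 281),
    (256, 8, 2): (122, 249),
    (256, 32, 4): (108, 257),
}
IN_TIMINGS = {
    (256, 4, 1): (177, 164),
    (256, 8, 1): (129, 166),
    (256, 32, 1): (107, 171),
    (256, 8, 2): (139, 166),
    (256, 32, 4): (121, 161),
}

def best(timings):
    """Return the layout whose two timings have the smallest sum."""
    return min(timings, key=lambda cfg: sum(timings[cfg]))

def use_string(out_cfg, in_cfg):
    """Format one OUT and one IN layout as a -use argument."""
    names = ("WG", "SIZEX", "SPACING")
    parts = [f"OUT_{name}={value}" for name, value in zip(names, out_cfg)]
    parts += [f"IN_{name}={value}" for name, value in zip(names, in_cfg)]
    return "-use " + ",".join(parts)

print(use_string(best(OUT_TIMINGS), best(IN_TIMINGS)))
# -use OUT_WG=256,OUT_SIZEX=32,OUT_SPACING=4,IN_WG=256,IN_SIZEX=32,IN_SPACING=1
```

The result matches the two lines marked "best" in the list. On other hardware the table would have to be refilled from your own -time runs, and since not all combinations work, each candidate string needs to be tried before trusting it.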
[QUOTE=storm5510;552553]Could someone elaborate on exactly what CERT is?
I must not have [I]gpuOwl[/I] configured properly. For each exponent tested, a checkpoint folder is created using the exponent as the folder name. When any particular exponent test is finished, the folder is left behind. I do not see a reason. About "-use." I changed the default and have it set to use NEW_FFT8 only! An afterthought is did this help or is it a hindrance? Perhaps I need to restore the default and see if there is any difference.[/QUOTE]
Some of this will depend on what software version you are running.

CERT is a PrimeNet work type, to verify a PRP proof file. It's not available as a manual assignment, so not applicable to gpuowl.

From Gpuowl help, put -proof in your command line or config.txt, to generate proof files in future PRP test runs. It must be there from the start of an exponent's PRP test, for a Gpuowl version that supports it. From Gpuowl's help output,[CODE]-proof [<power>] : enable PRP proof generation. Default <power> is 8. Use 8 - 10.[/CODE]
Then after the primality test is finished, the proof file is generated, and the proof file must be uploaded to the server, either through gpuowl's primenet.py or through the uploader program George provided, before a CERT verification can be run on it to validate the primality test.

PrimeNet and prime95 automate that upload, and also download of a file to verify, as the input to the CERT assignment, and upload of the verification result, as the output of the CERT assignment. For example, [URL]https://www.mersenne.org/report_exponent/?exp_lo=97829899&full=1[/URL] was PRP tested by Mihai and then the PrimeNet-connected prime95 on my laptop called falcon got assigned and completed the CERT assignment.

The prime95 cert does more than gpuowl's verify, [CODE]-verify <file>|<exponent> : verify PRP-proof contained in <file> or in the folder <exponent>/[/CODE]since prime95 sends verification results back to the server which confirms it's valid. |
[QUOTE=Prime95]The FFT5 and FFT10 would only be used in FFT lengths divisible by 5. The wavefront just passed into 5.5M and 6M territory.[/QUOTE]
I ran a wavefront P-1 to test the effect of inserting OLD_FFT5 ahead of NEW_FFT8 in my "-use" line. The difference was considerable. In previous tests with NEW_FFT8 alone, the runtime for a similar test was 150 minutes, give or take. The inclusion of OLD_FFT5 reduced this to 105 minutes. I should have left well-enough alone.
[QUOTE=kriesel]CERT is a PrimeNet work type, to verify a PRP proof file. For example, [URL="https://www.mersenne.org/report_exponent/?exp_lo=97829899&full=1"]https://www.mersenne.org/report_expo...7829899&full=1[/URL] was PRP tested by Mihai and then the PrimeNet-connected prime95 on my laptop called falcon got assigned and completed the CERT assignment. The prime95 cert does more than gpuowl's verify,[/QUOTE]
Thank you for the reply. I understand the process. A person runs the PRP test, another runs the DC test, and still another will be assigned the CERT verification test. I believe I may see where this is headed: the elimination of LL and LL-DC. Even though LL and PRP are not my cup-of-tea because of the time required, I still like to keep a grasp on the processes involved. :smile: |
[QUOTE=storm5510;552575]I understand the process. A person runs the PRP test, another runs the DC test, and still another will be assigned the CERT verification test. [/QUOTE]
Not quite. There is no DC test. You run the PRP test, upload the proof file, someone runs the CERT test. Done. |
On Google Colab Pro [B]Tesla P100-PCIE-16GB-0[/B] the best I could get with the old [B]v6.11-238-g62a3025[/B] was 809 us/iteration on a 91.6M exponent (5M FFT):
[B]-use ORIG_X2,ORIG_SLOWTRIG,UNROLL_ALL,NO_T2_SHUFFLE,CARRY32,OUT_WG=64,OUT_SIZEX=8,OUT_SPACING=4,IN_WG=64,IN_SIZEX=8,IN_SPACING=2[/B]
In the new version [B]v6.11-366-gf887d6e[/B] the best I can get after a lot of testing on the same exponent is 840-841 us/iteration, which is close enough considering we now save a DC. The settings are almost the same, minus the settings that no longer exist:
[B]-use CARRY32,OUT_WG=64,OUT_SIZEX=8,OUT_SPACING=4,IN_WG=64,IN_SIZEX=8,IN_SPACING=4[/B]
Any of the 4 combinations of these settings gives 840-841 us/iteration:
OUT_WG=64,OUT_SIZEX=8,OUT_SPACING=4
OUT_WG=16,OUT_SIZEX=8,OUT_SPACING=8
IN_WG=64,IN_SIZEX=8,IN_SPACING=4
IN_WG=128,IN_SIZEX=16,IN_SPACING=1 |