#1

Oct 2009
2 Posts
Perhaps I'm being really numb here, and forgive me if I am, but the readme page gives instructions for submitting results, yet the link/button to press isn't there, so to speak.
Anyone know what page/link I use to submit results obtained by running Mlucas? Thanks in advance.
|
|
|
|
#2

Account Deleted
"Tim Sorbera"
Aug 2006
San Antonio, TX USA
17·251 Posts
http://www.mersenne.org/manual_result/ accepts Mlucas results.
(Admittedly, it's not at all obvious that it accepts anything other than Prime95/Mprime lines, but it does list "Mlucas lines found" after you give it results.) Remember to log in to the site first so the credit goes to you and not to ANONYMOUS.
|
|
|
|
#3

Oct 2009
2 Posts |
Ah, thanks for that. For whatever reason, as I looked around the site, I just didn't see that one.
Cheers.
|
|
|
|
#4

Dec 2014
37 Posts
I have just finished my first LL test with Mlucas, but the PrimeNet manual check-in form doesn't seem to recognize the result.
Does anyone know why? Here is my results.txt: Quote:
Last fiddled with by ewmayer on 2015-05-12 at 05:16 Reason: Xed out last few digits of res64 and res36
|
|
|
|
|
#5

∂²ω=0
Sep 2002
República de California
103·113 Posts
Quote:
Also, please PM me your p67773569.stat file: I want to look into that fatal-error retry around midway through, to see whether the code handled it properly (this is a recently added feature). The exponent is close to the default breakover point between 3584K and 3840K, so the first dangerous [0.46875] roundoff error is not surprising, but the 0.500 one on the ensuing interval retry needs looking into. (Your result should still be fine if the code switched to the larger 3840K FFT length at that point.) What kind of hardware (and how many cores) did you run this on?
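Why the FFT length matters here: the "bits per digit" figure printed in the stat-file excerpts later in this thread is the exponent divided by the FFT length in 8-byte doubles (1K = 1024). A minimal sketch, assuming that definition (it reproduces the printed values for M67773569 at 3584K and M44207087 at 2304K); the function names are hypothetical. More bits packed into each double means larger roundoff errors, which is why an exponent near a breakover point is the one that produces borderline ROEs.

```python
def fft_words(fft_k: int) -> int:
    """FFT length given in K (1K = 1024 8-byte doubles) -> number of doubles."""
    return fft_k * 1024


def bits_per_digit(p: int, fft_k: int) -> float:
    """Average number of bits of the exponent p carried by each FFT word."""
    return p / fft_words(fft_k)


p = 67773569  # the exponent discussed in this thread
for k in (3584, 3840):  # the two FFT lengths on either side of the breakover
    print(f"{k}K = {fft_words(k)} doubles -> {bits_per_digit(p, k):.6f} bits per digit")
```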
|
|
|
|
|
#6

Dec 2014
37 Posts
Thanks to the author.
Yes, the quoted text is what I submitted. The .stat file is attached. I can't send it by PM because PMs don't support attachments and the file is too large to paste inline.
Code:
$ uname -a
Linux debian 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt9-3~deb8u1 (2015-04-24) x86_64 GNU/Linux
Code:
$ cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 16
model : 4
model name : AMD Phenom(tm) II X4 945 Processor
stepping : 3
microcode : 0x10000c8
cpu MHz : 3012.706
cache size : 512 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr hw_pstate npt lbrv svm_lock nrip_save vmmcall
bogomips : 6025.41
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor : 1
vendor_id : AuthenticAMD
cpu family : 16
model : 4
model name : AMD Phenom(tm) II X4 945 Processor
stepping : 3
microcode : 0x10000c8
cpu MHz : 3012.706
cache size : 512 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 4
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr hw_pstate npt lbrv svm_lock nrip_save vmmcall
bogomips : 6025.41
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor : 2
vendor_id : AuthenticAMD
cpu family : 16
model : 4
model name : AMD Phenom(tm) II X4 945 Processor
stepping : 3
microcode : 0x10000c8
cpu MHz : 3012.706
cache size : 512 KB
physical id : 0
siblings : 4
core id : 2
cpu cores : 4
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr hw_pstate npt lbrv svm_lock nrip_save vmmcall
bogomips : 6025.41
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor : 3
vendor_id : AuthenticAMD
cpu family : 16
model : 4
model name : AMD Phenom(tm) II X4 945 Processor
stepping : 3
microcode : 0x10000c8
cpu MHz : 3012.706
cache size : 512 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr hw_pstate npt lbrv svm_lock nrip_save vmmcall
bogomips : 6025.41
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate
|
|
|
|
#7

"James Heinrich"
May 2004
ex-Northern Ontario
2⁴×3×71 Posts
Quote:
http://www.mersenne.org/M67773569
|
|
|
|
|
#8

∂²ω=0
Sep 2002
República de California
103·113 Posts
Alex, thanks. I am rerunning your exponent on my Haswell quad; on 4 cores it needs ~9 ms/iter and is currently around iter 6M, with no roundoff warnings yet. Note, though, that the FMA-using build I'm running has slightly lower roundoff errors than your non-FMA one. (I can't tell at a glance from your CPU diagnostics whether this model supports AMD's FMA4, but in any event I have no intention of coding for that: my code supports only the Intel-introduced FMA3.)
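The per-iteration timing above converts to wall-clock time with simple arithmetic: an LL test of M(p) takes p - 2 squarings, so hours = iterations × seconds-per-iteration / 3600. A rough sketch using the ~9 ms/iter and ~6M figures from this post; the function names are hypothetical, and real throughput drifts with system load, so the output is only an estimate.

```python
def ll_iterations(p: int) -> int:
    """A Lucas-Lehmer test of M(p) = 2^p - 1 performs p - 2 squarings."""
    return p - 2


def hours_needed(iters: int, sec_per_iter: float) -> float:
    """Wall-clock hours for `iters` iterations at a fixed per-iteration cost."""
    return iters * sec_per_iter / 3600.0


p = 67773569
sec_per_iter = 0.009  # ~9 ms/iter on the 4-core Haswell, as quoted in the post

full_run = hours_needed(ll_iterations(p), sec_per_iter)
to_checkpoint = hours_needed(28_480_000 - 6_000_000, sec_per_iter)  # iter ~6M -> 28.48M
print(f"full LL test: {full_run:.1f} h ({full_run / 24:.1f} days)")
print(f"iter 6M -> 28.48M: {to_checkpoint:.1f} h")
```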
Now, regarding the history captured in your stat file:
o I see lots of restarts (not in itself unusual, depending on system usage and stability), but also some intervals of one restart after another, like this: Quote:
o The fatal-error occurrence about midway through is different from what I expected. Instead of happening in the midst of some ongoing run (e.g. due to some kind of system/power/cosmic-ray glitch), in your case it happens almost immediately upon restart from the checkpoint file: Quote:
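Picking the restarts and roundoff warnings out of a stat file, as done above by eye, is easy to automate. A minimal Python sketch, assuming the line formats exactly as they appear in the stat-file excerpts quoted later in this thread (other Mlucas versions may word them differently); the function name `scan_stat` is hypothetical and the sample is a trimmed copy of those excerpts. It lists events in file order and counts restarts that follow one another with nothing logged in between, the "one restart after another" pattern mentioned above.

```python
import re

# Line formats copied from the stat-file excerpts in this thread (assumed stable).
RESTART_RE = re.compile(r"Restarting M(\d+) at iteration = (\d+)\. Res64: ([0-9A-F]{16})")
ROE_RE = re.compile(r"M(\d+) Roundoff warning on iteration (\d+), maxerr = ([0-9.]+)")


def scan_stat(text: str) -> list:
    """Return ('restart', iter, res64) and ('roe', iter, maxerr) tuples in file order."""
    events = []
    for line in text.splitlines():
        if m := RESTART_RE.search(line):
            events.append(("restart", int(m.group(2)), m.group(3)))
        elif m := ROE_RE.search(line):
            events.append(("roe", int(m.group(2)), float(m.group(3))))
    return events


sample = """\
Restarting M67773569 at iteration = 28470000. Res64: 5FA073737FB6FF79
M67773569 Roundoff warning on iteration 28470196, maxerr = 0.468750000000
Retrying iteration interval to see if roundoff error is reproducible.
Restarting M67773569 at iteration = 28470000. Res64: 5FA073737FB6FF79
Restarting M67773569 at iteration = 28470000. Res64: 5FA073737FB6FF79
M67773569 Roundoff warning on iteration 28470008, maxerr = 0.500000000000
"""

events = scan_stat(sample)
restarts = [e for e in events if e[0] == "restart"]
roes = [e for e in events if e[0] == "roe"]
# Consecutive restart events with no logged progress between them.
back_to_back = sum(1 for a, b in zip(events, events[1:]) if a[0] == b[0] == "restart")
print(f"{len(restarts)} restarts, {back_to_back} back-to-back, ROEs: {roes}")
```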
I shall investigate; we'll know (based on my DC run) in around 48 hours whether your run got past the above glitch OK. BTW, the code should have deposited persistent savefiles with custom suffixes every 10 Miters, so if my DC indicates the above fatal error somehow hosed your run, you can still rerun from the 20-Miter checkpoint file. But first let's wait for my DC to reach iter 28480000.
James, thanks for the server-side update. Note that Mlucas version numbers will henceforth have a major index reflecting the calendar year, i.e. 14.*** means a 2014 release. (14.0 was the first 2014 release; 14.1 was the second, last December.)
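The iteration bookkeeping in the savefile paragraph above can be sketched as follows: with persistent savefiles every 10 Miters, the newest one written before a glitch at ~28.47M is the 20M file, and the first regular checkpoint after the glitch, where two runs' residues can next be compared, is 28480000. This assumes regular checkpoints every 10000 iterations, which is inferred from the restart iterations in the stat-file excerpts rather than stated in the post; the function names are hypothetical and savefile naming is not modeled.

```python
PERSIST_INTERVAL = 10_000_000  # persistent savefiles every 10 Miters, per the post
CHECKPOINT_INTERVAL = 10_000   # assumption: inferred from restart iterations in the logs


def last_persistent_before(bad_iter: int, interval: int = PERSIST_INTERVAL) -> int:
    """Iteration of the newest persistent savefile written strictly before bad_iter."""
    return ((bad_iter - 1) // interval) * interval


def first_checkpoint_after(bad_iter: int, interval: int = CHECKPOINT_INTERVAL) -> int:
    """First regular checkpoint iteration strictly after bad_iter."""
    return (bad_iter // interval + 1) * interval


glitch = 28_470_196  # iteration of the first fatal ROE in the stat file
print(last_persistent_before(glitch), first_checkpoint_after(glitch))
```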
|
|
|
|
|
#9

Dec 2014
37 Posts
Thanks for the PrimeNet submission!
Yes, some of the things you mentioned did happen. Back then I had some data corruption (or maybe not, since ext4 has journaling) because of a power failure. I remember that during fsck several lines of 'clearing orphan inodes blablabla' were printed, but nothing bad happened afterwards. My computer was overclocked during the winter, from 3000 MHz to 3300 MHz. I switched it back to 3000 MHz when summer arrived, because summer in HK is hot and humid. I don't think that is the cause of the roundoff errors, since I double-checked some of my mprime results and they appear to be correct. (The only bad result was when I overclocked my CPU to 3600 MHz.) One day my computer failed to boot, and I found out it was due to a RAM slot failure. I moved the RAM from slots 1 and 3 to slots 2 and 4, and my computer booted again and passed memtest.
Last fiddled with by alexvong1995 on 2015-05-13 at 11:24
|
|
|
|
#10

∂²ω=0
Sep 2002
República de California
11639₁₀ Posts
Alex, our runs match through iter 19M. We're still a day from seeing whether your run made it intact through the fatal-error-and-interval-retry at 28.47M, but the match does mean the first three 0.4375-level roundoff errors you hit were all benign. (Based on a fair bit of statistics I allow 0.4375 as the largest acceptable maxerr, but I still treat it as "threat level yellow" in terms of keeping an eye on such runs.)
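Comparing two runs "through iter 19M" works because the LL sequence s(0) = 4, s(k+1) = s(k)^2 - 2 mod M(p) is deterministic: two healthy runs of the same exponent produce identical residues at every iteration, so the low 64 bits (the Res64) at matching checkpoints can be compared long before either run finishes, and the first mismatch brackets where one run went wrong. A minimal Python sketch using plain big-integer arithmetic on a small exponent (real programs such as Mlucas use FFT-based squaring instead); the function names are hypothetical, and the iteration numbering here (iteration 1 produces s(1)) may not match Mlucas's internal count.

```python
def ll_res64_trace(p: int, every: int):
    """Lucas-Lehmer test of M(p) = 2^p - 1, recording Res64 every `every` iterations.

    Returns (is_prime, {iteration: res64 as 16 uppercase hex digits}).
    """
    m = (1 << p) - 1
    s = 4
    trace = {}
    for i in range(1, p - 1):  # p - 2 iterations in total
        s = (s * s - 2) % m
        if i % every == 0:
            trace[i] = format(s & 0xFFFFFFFFFFFFFFFF, "016X")  # low 64 bits
    return s == 0, trace


def first_mismatch(a: dict, b: dict):
    """Smallest checkpoint iteration present in both runs where Res64 differs, else None."""
    for it in sorted(a.keys() & b.keys()):
        if a[it] != b[it]:
            return it
    return None


# Two independent runs of the same exponent should agree at every checkpoint.
is_prime, run_a = ll_res64_trace(127, every=25)
_, run_b = ll_res64_trace(127, every=25)
print(is_prime, first_mismatch(run_a, run_b))
```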
My run has encountered no warning-level roundoffs so far, i.e. all ROEs have been < 0.40. Quote:
Haven't yet had time to set up a debug simulation of that scenario; I've been working on a couple of must-fix bugs in my newly parallelized TF code. Found/fixed the last of those a few hours ago, so I will focus on the fatal-ROE/retry stuff tomorrow.
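The roundoff-error policy described in this post boils down to three bands: below 0.40 no warning is issued, 0.40 up to and including 0.4375 is accepted but flagged ("threat level yellow"), and anything above 0.4375 is treated as fatal (as stated later in the thread), which triggers a retry of the iteration interval. A minimal sketch; the names are hypothetical, and the exact handling at the boundary values is an assumption read off the posts, not taken from Mlucas source.

```python
from enum import Enum


class RoeLevel(Enum):
    OK = "ok"            # below warning level
    WARNING = "warning"  # accepted, but the run is worth keeping an eye on
    FATAL = "fatal"      # triggers a retry of the iteration interval


WARN_THRESHOLD = 0.40   # "warning-level" starts here
MAX_ACCEPTABLE = 0.4375  # largest acceptable maxerr


def classify_roe(maxerr: float) -> RoeLevel:
    """Map a per-interval maximum roundoff error to a handling level."""
    if maxerr < WARN_THRESHOLD:
        return RoeLevel.OK
    if maxerr <= MAX_ACCEPTABLE:
        return RoeLevel.WARNING
    return RoeLevel.FATAL


for e in (0.375, 0.4375, 0.46875, 0.5):
    print(e, classify_roe(e).value)
```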
|
|
|
|
|
#11

∂²ω=0
Sep 2002
República de California
26567₈ Posts
Alex, I inserted some debug code into the mers_mod_square.c function (which has the key roundoff-error handling logic) to make it easy to simulate the issue I thought you had encountered in your run-restart at iter = 28.47M. Here are the highlights. First I did 10000 iterations (with no debug code yet inserted) of the exponent used for the 2304K self-test (unimportant; I just picked one that was visible in my build/test xterm):

M44207087: using FFT length 2304K = 2359296 8-byte floats.
this gives an average 18.737405988905167 bits per digit
Using complex FFT radices 288 16 16 16
[May 14 11:47:28] M44207087 Iter# = 10000 clocks = 00:01:09.224 [ 0.0069 sec/iter] Res64: C2FA6D57AA3DF3F1. AvgMaxErr = 0.276980321. MaxErr = 0.375000000

I killed the run at this point and added some debug code which made the control logic 'think' a ROE = 0.46 was hit on [restart iter + 78], then simulated a different ROE on the ensuing retry of the iteration interval. Then I restarted under gdb (allowing me to step through the relevant code). Everything behaved as designed; here is the stat file output:

Restarting M44207087 at iteration = 10000. Res64: C2FA6D57AA3DF3F1
M44207087: using FFT length 2304K = 2359296 8-byte floats.
this gives an average 18.737405988905167 bits per digit
Using complex FFT radices 288 16 16 16
M44207087 Roundoff warning on iteration 10078, maxerr = 0.460000000000
Retrying iteration interval to see if roundoff error is reproducible.
Restarting M44207087 at iteration = 10000. Res64: C2FA6D57AA3DF3F1
M44207087: using FFT length 2304K = 2359296 8-byte floats.
this gives an average 18.737405988905167 bits per digit
M44207087 Roundoff warning on iteration 10026, maxerr = 0.500000000000

Now, the only (minor) bug I found was that the final informational message to the user, printed after the 2nd roundoff error, only went to stdout. I fixed that in my local code; here is the message that should have been printed at this point (followed by exit):

The error is not reproducible, but encountered a different ROE in the retry of the interval ... as this is an indicator of likely data corruption, quitting. Please restart the program at your earliest convenience.

Notice one thing about the above diagnostics: the "Using complex FFT radices" line gets printed just once, because the retries in question all use the original FFT length. If the ROE encountered is fatal (> 0.4375) but "normal", i.e. simply a result of the FFT length and the data in question leading to a too-high error level, then you would see the same error (both magnitude and iteration#) on the retry attempt, followed by an added diagnostic indicating the program has switched to the next-higher FFT length and restarted from the checkpoint in question. In your data we don't see any next-larger-FFT-length stuff:

Restarting M67773569 at iteration = 28470000. Res64: 5FA073737FB6FF79
M67773569: using FFT length 3584K = 3670016 8-byte floats.
this gives an average 18.466832024710520 bits per digit
Using complex FFT radices 224 16 16 32
M67773569 Roundoff warning on iteration 28470196, maxerr = 0.468750000000
Retrying iteration interval to see if roundoff error is reproducible.
Restarting M67773569 at iteration = 28470000. Res64: 5FA073737FB6FF79
M67773569: using FFT length 3584K = 3670016 8-byte floats.
this gives an average 18.466832024710520 bits per digit
Restarting M67773569 at iteration = 28470000. Res64: 5FA073737FB6FF79
M67773569: using FFT length 3584K = 3670016 8-byte floats.
this gives an average 18.466832024710520 bits per digit

I now see that I was parsing this wrong: what I thought was duplicate printing of the last 6 lines above (the same 3 lines repeated twice) should instead most probably (I can't be 100% sure since I wasn't there) be read as follows:

M67773569 Roundoff warning on iteration 28470196, maxerr = 0.468750000000
Retrying iteration interval to see if roundoff error is reproducible.
Restarting M67773569 at iteration = 28470000. Res64: 5FA073737FB6FF79
M67773569: using FFT length 3584K = 3670016 8-byte floats.
this gives an average 18.466832024710520 bits per digit
[*** above run got killed ***]
[*** start of a new run: ***]
Restarting M67773569 at iteration = 28470000. Res64: 5FA073737FB6FF79
M67773569: using FFT length 3584K = 3670016 8-byte floats.
this gives an average 18.466832024710520 bits per digit
Using complex FFT radices 224 16 16 32
M67773569 Roundoff warning on iteration 28470008, maxerr = 0.500000000000
Retrying iteration interval to see if roundoff error is reproducible.

In other words, you hit almost-instant fatal errors, but not of the reproducible kind, on 2 successive restart attempts. Perhaps you were playing with OCing and had the clock speed jacked up higher than was safe here. We then see what appears to be the standard "retry to see if reproducible" messaging, but again with 2x duplication of a 2-line sequence, and I don't have a ready explanation for that 2x-ing. Do you pipe stdout to the stat file, perhaps?

Restarting M67773569 at iteration = 28470000. Res64: 5FA073737FB6FF79
M67773569: using FFT length 3584K = 3670016 8-byte floats.
this gives an average 18.466832024710520 bits per digit
M67773569: using FFT length 3584K = 3670016 8-byte floats.
this gives an average 18.466832024710520 bits per digit

The next few lines don't clear up my puzzlement:

Using complex FFT radices 224 16 16 32
M67773569: using FFT length 3584K = 3670016 8-byte floats.
this gives an average 18.466832024710520 bits per digit
Using complex FFT radices 224 16 16 32

The 2 "Using complex FFT radices" lines make it look like the code got restarted while the above interval retry was still in progress. Anyhoo, my run just hit iter 28480000 and the result matches yours, so we can rest easy about the above funny business possibly corrupting your run data. "The integrity of our customers' run data is our #1 priority." :)
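The retry policy described in this post can be sketched as a decision on the (iteration, maxerr) pair seen on the first pass and on the retry of the same interval: a reproducible error means the FFT length is simply too small for the data, so move up one length; a different error points to data corruption, so quit. A minimal sketch under stated assumptions: the names are hypothetical, this is not the actual mers_mod_square.c logic, and the case where the retry runs clean (no fatal ROE on the second pass) is assumed to simply continue.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Roe:
    iteration: int
    maxerr: float


def after_retry(first: Roe, retry: Optional[Roe]) -> str:
    """Decide what to do after re-running an interval that hit a fatal ROE.

    first: the fatal ROE seen on the original pass.
    retry: the fatal ROE seen on the retry, or None if the retry ran clean.
    """
    if retry is None:
        return "continue"              # assumption: treat as a transient glitch
    if retry == first:
        return "increase_fft"          # same magnitude and iteration: FFT length too small
    return "quit_data_corruption"      # different ROE: likely data corruption


# The simulated debug run: 0.46 at 10078, then 0.50 at 10026 on the retry.
print(after_retry(Roe(10078, 0.46), Roe(10026, 0.50)))
```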

Similar Threads

| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| Automatic submit results + fetch assignments for mfaktc? | DuskFalls | GPU Computing | 5 | 2017-12-02 00:34 |
| How do we prevent miraculously true DC results from manual submit? | leonardyan96 | PrimeNet | 77 | 2017-06-01 16:18 |
| how to submit manually a job with exponent 100M which is done by mfaktc? | fairsky | Information & Answers | 17 | 2013-09-16 19:49 |
| Only submit part of ECM results? | dabaichi | PrimeNet | 5 | 2011-12-07 19:27 |
| Unable to submit / retrieve new work | Unregistered | Information & Answers | 12 | 2011-11-12 20:07 |