Yeah! So I have stepped through that fatal error and retried safely. I really need to learn gdb so that I can debug my code the way you did. Pointers and segfaults are the most confusing things for newbie C programmers. [URL="http://c2.com/cgi/wiki?ThreeStarProgrammer"]Three Star Programmer[/URL] is a good old joke. Regarding the duplicated lines, I think I should not be piping stdout to the stat file. Actually, I run mlucas with this command in tty1[CODE]$ nice ./Mlucas &> /dev/null[/CODE]My .bashrc is set up as follows: if the shell is on tty1, run mlucas with the above command. The ttys are set up to log in automatically. Will redirecting to /dev/null affect anything?
|
Building Mlucas for i386 on amd64
1 Attachment(s)
[QUOTE=ewmayer]
Again, I would be interested in seeing the specific error message(s) you get - this sounds like it might be the ran-out-of-registers issue I mentioned, since unoptimized builds need more registers for the C code surrounding the inline-asm.[/QUOTE]
(The message became too long, so I have to post it here.)

Yes, I think you are right: optimization makes better use of registers, thus eliminating the ran-out-of-registers error.

About the i386 build: I have managed to build after applying a patch to the source code (found by trial and error) and compiling with [CODE]$ gcc -m32 -Di386_build -o mlucas *.c -lm[/CODE]I built without -DUSE_SSE2 and -DUSE_THREADS because that is the configuration where mlucas comes closest to building (90/95 source files compile, vs. 5x/95 in the attempts with -DUSE_SSE2, -DUSE_THREADS, or both).

I found a problem in radix44_ditN_cy_dif1.c and radix176_ditN_cy_dif1.c. The compiler complains about undeclared a0, a1 ... a9, b0, b1 ... b9 in the expansion of radix44_main_carry_loop.h and radix176_main_carry_loop.h respectively. So what I did was copy the declarations from inside the #ifdef USE_SSE2 ... #endif to just before the #include of radix44_main_carry_loop.h, and that seems to work magically. The problem and the fix are identical for radix44_ditN_cy_dif1.c and radix176_ditN_cy_dif1.c.

Besides that, there is also a problem in get_fft_radices.c. When I try to compile it, there are two case 7 labels.[CODE]
	case 7 :
		numrad = 6; rvec[0] = 16; rvec[1] = 8; rvec[2] = 8; rvec[3] = 8; rvec[4] = 8; rvec[5] = 16; break;
#ifndef USE_ONLY_LARGE_LEAD_RADICES
	case 7 :
		numrad = 5; rvec[0] = 8; rvec[1] = 16; rvec[2] = 16; rvec[3] = 32; rvec[4] = 16; break;
[/CODE]So when USE_ONLY_LARGE_LEAD_RADICES is not defined, both case 7 labels are compiled, causing a compiler error. I tried adding something like #ifdef USE_ONLY_LARGE_LEAD_RADICES. That does compile, but mlucas segfaults after it finishes self-testing via [CODE]$ ./mlucas -s m[/CODE] So instead I tried adding -DUSE_SSE2 when compiling this particular source file, and it works just fine. But then I saw that -DUSE_SSE2 does nothing but #define USE_ONLY_LARGE_LEAD_RADICES. So I added #if ... || defined(i386_build) #define USE_ONLY_LARGE_LEAD_RADICES #endif instead to simplify things.

Finally, when I run the self-test mentioned above, mlucas exits with a fatal error when using 144 as one of its radices. The error message comes from [CODE]
	default :
		sprintf(cbuf,"FATAL: radix %d not available for ditN_cy_dif1. Halting...\n",radix_vec0);
		fprintf(stderr,"%s", cbuf);
		ASSERT(HERE, 0,cbuf);
[/CODE] in mers_mod_square.c. The fix I attempted is to add a case 144 by copying case 288 below it. This prevents mlucas from exiting when that fatal error occurs. Instead, a fatal error caused by an insane ROE occurs, and mlucas halts testing with 144 and tries another radix set.

This is what I found out last night. The patch is included in the attachment. I defined the new feature-test macro i386_build just to keep the new changes from breaking existing code. Maybe I will post the self-test results once I am done with it. |
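For anyone following along, the duplicate-label problem in get_fft_radices.c can be reproduced in miniature. The sketch below is not Mlucas code: the function name `pick_radices_sketch` and its return convention are made up for illustration, only the `i386_build` part of the `#if` is shown because the rest of the condition is elided in the post above, and `i386_build` is defined inline (instead of via `-Di386_build`) just so the snippet compiles on its own.

```c
/* Stand-in for passing -Di386_build on the gcc command line
   (assumption: defined here only so the sketch compiles by itself). */
#define i386_build 1

/* Only the i386_build term of the condition is shown; the original
   post elides the other terms and they are not reproduced here. */
#if defined(i386_build)
#define USE_ONLY_LARGE_LEAD_RADICES
#endif

/* Hypothetical helper, not the real Mlucas function: fills rvec with the
   radix set for table entry `index` and returns how many radices were
   written, or 0 if the entry does not exist. */
int pick_radices_sketch(int index, int rvec[6])
{
    int numrad = 0;

    switch (index) {
    case 7:
        numrad = 6;
        rvec[0] = 16; rvec[1] = 8; rvec[2] = 8;
        rvec[3] = 8;  rvec[4] = 8; rvec[5] = 16;
        break;
#ifndef USE_ONLY_LARGE_LEAD_RADICES
    /* Compiled only when the macro is absent. In that case this second
       "case 7" collides with the one above and the compiler rejects the
       switch because of the duplicate case value. */
    case 7:
        numrad = 5;
        rvec[0] = 8;  rvec[1] = 16; rvec[2] = 16;
        rvec[3] = 32; rvec[4] = 16;
        break;
#endif
    default:
        break;
    }
    return numrad;
}
```

With the macro defined, the preprocessor drops the second label and `pick_radices_sketch(7, rvec)` yields the six-radix set. Removing the `#define i386_build 1` line brings the duplicate label back and the compile fails, which matches the behaviour described above.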
1 Attachment(s)
Alex, I reran the first ~30M iterations of your test on my Haswell quad and the rest on my new Broadwell NUC, final result matches yours. Max roundoff error I encountered using an AVX2-mode (that is an FMA-using) build was 0.375 (3 such during the run). Zipped .stat file attached.
|
[QUOTE=ewmayer;403315]Alex, I reran the first ~30M iterations of your test on my Haswell quad and the rest on my new Broadwell NUC, final result matches yours. Max roundoff error I encountered using an AVX2-mode (that is an FMA-using) build was 0.375 (3 such during the run). Zipped .stat file attached.[/QUOTE]
I'm doing a full double-check of M67773569 on Prime95 so we should see how that does in ~58 hours. |
[QUOTE=Madpoo;403735]I'm doing a full double-check of M67773569 on Prime95 so we should see how that does in ~58 hours.[/QUOTE]
Done, it matched. [URL="http://www.mersenne.org/M67773569"]M67773569[/URL] |
Thanks for the verify - I had no doubts after my DC, but because my code isn't officially allowed to do both the 1st and 2nd runs (no power-of-2 residue shift is used), you just made it official.
Does your logfile indicate what the max ROE for your run was? |
[QUOTE=ewmayer;403940]Thanks for the verify - I had no doubts after my DC, but because my code isn't officially allowed to do both the 1st and 2nd runs (no power-of-2 residue shift is used), you just made it official.
Does your logfile indicate what the max ROE for your run was?[/QUOTE] I'm just running Prime95 so only if it went over 0.4 at any point. I don't have that saved but it doesn't happen often and I tend to remember when it does. Well, not a specific exponent, but if I've seen one at all in the past few days. And I don't think I saw any the day I checked that one in. I *think* it used the 3584K FFT size. Again, I kind of remember paying attention to that since it came up, and I made a mental note of that. |
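Since the max ROE keeps coming up: roundoff error here means how far each floating-point output of the FFT-based multiply lands from the nearest integer, so 0.5 is the worst possible value. A minimal sketch of that measurement, assuming a plain array of doubles. The function names are hypothetical and this is not how Mlucas or Prime95 actually implement it; the 0.4 used in the note afterwards is just the Prime95 reporting level mentioned above.

```c
#include <stddef.h>

/* Round half away from zero without needing libm.
   Assumption: |x| is small enough to fit in a long long. */
static double nearest_int_sketch(double x)
{
    return (double)(long long)(x >= 0.0 ? x + 0.5 : x - 0.5);
}

/* Hypothetical helper: largest distance of any element of x[0..n-1]
   from its nearest integer. */
double max_roundoff_error(const double *x, size_t n)
{
    double max_err = 0.0;

    for (size_t i = 0; i < n; i++) {
        double err = x[i] - nearest_int_sketch(x[i]);
        if (err < 0.0)
            err = -err;          /* absolute value */
        if (err > max_err)
            max_err = err;
    }
    return max_err;
}

/* Returns 1 when the measured error is above the reporting threshold. */
int roe_exceeds(double err, double threshold)
{
    return err > threshold;
}
```

An output such as 7.375 gives an error of 0.375, which stays under a 0.4 threshold and so would not be reported by a check like `roe_exceeds(err, 0.4)`.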