Thread: mlucas on sun
2004-01-02, 18:03 #4 ewmayer ∂²ω=0 Sep 2002 República de California 2²·5·11·53 Posts

You've got 2 choices here. The first is to build the latest version yourself (anon-ftp to hogranch.com, cd pub/mayer/src/C, mget *), assuming you have access to the SunPro C compiler, and use the automated self-test feature (type Mlucas -h to see the options here) to help you find the best set of FFT radices for the runlengths of interest. These go into the mlucas.cfg file, whose format and purpose are described here.

Your second choice is to try a gzipped version of the sparc binary I use (built for me by Bill Rea - our Sparcs at work only have gcc) here: ftp://hogranch.com/pub/mayer/bin/SPA...as2.8_sparc.gz

That version of the code has pretty much the same performance as the latest code, but lacks the automated self-test feature. To use it to build your mlucas.cfg file, go to the above .../src/C ftp archive and get only the Mlucas.c file. Scroll to the bottom portion of the source file, where you'll see a table of exponents and 64-bit hex residues, with entries that look like

Code:
    /* Array of distinct test cases for self-tests.
       Add one extra slot to vector for user-specified self-test exponents: */
    struct testCase testVec[numTest+1] =
    {
    /* FFT  #radices  p  100-iter Res64      #bits per digit  FFT radices  AvgMaxErr */
    /* Small:                                                              x86 alfa */
    { 128, 3,  2550001,"CB6030D5790E2460"},/* testVec[ 0] 19.455 16,16,16,16 .1034 .1334 */
    { 144, 2,  2920013,"7CC1B41482BCB7C0"},/* testVec[ 1] 19.803 9,16,16,32 .1508 .2113 */
    { 160, 6,  3265007,"B912804D7FE4A9E5"},/* testVec[ 2] 19.928 10,16,16,32 .2020 .2656 */
    { 176, 3,  3550007,"5059094E256FB886"},/* testVec[ 3] 19.698 11,16,16,32 .1686 .2403 */
    { 192, 6,  3900067,"4744CB8E5287DA60"},/* testVec[ 4] 19.837 12,16,16,32 .1885 .2523 */
    { 224, 6,  4540007,"1DA37E1FAC27BC68"},/* testVec[ 5] 19.793 14,16,16,32 .2097 .2929 */
    { 256, 7,  5190001,"15216788A374E144"},/* testVec[ 6] 19.798 16,16,16,32 .2563 .3086 */
    { 288, 2,  5780087,"ADB1333A531F6EED"},/* testVec[ 7] 19.599 9,16,32,32 .1774 .2384 */
    { 320, 3,  6400013,"6B2DF2F4FD779CBC"},/* testVec[ 8] 19.531 10,16,32,32 .1846 .2392 */
    { 352, 2,  7010011,"4FC7B9144100998F"},/* testVec[ 9] 19.448 11,16,32,32 .1756 .2585 */
    { 384, 3,  7600013,"2AFA7C90899B583E"},/* testVec[10] 19.328 12,16,32,32 .1383 .1872 */
    { 416, 2,  8330009,"74AB1D925A0E7DB7"},/* testVec[11] 19.555 13,16,32,32 .2488 .3152 */
    { 448, 3,  8950001,"7D9DD642E10F2525"},/* testVec[12] 19.509 14,16,32,32 .2041 .2906 */
    { 480, 2,  9490001,"01A4E738255C522B"},/* testVec[13] 19.307 15,16,32,32 .1642 .2186 */
    { 512, 3, 10110007,"24AAC84A6CD400BE"},/* testVec[14] 19.283 16,16,32,32 .1884 .2260 */
    /* Medium: */
    { 576, 2, 11350013,"7087EA4B45F416A6"},/* testVec[15] 19.243 9,32,32,32 .1657 .2181 */
    { 640, 2, 12590009,"93E43FC168EAF6BF"},/* testVec[16] 19.211 10,32,32,32 .1885 .2382 */
    { 704, 1, 13799939,"7A8B6F72D5F3A862"},/* testVec[17] 19.143 11,32,32,32 .1747 .2542 */
    { 768, 2, 15099979,"D731A6D76D99F3F5"},/* testVec[18] 19.201 12,32,32,32 .1692 .2304 */
    { 832, 1, 16299979,"39AB362A15AF832C"},/* testVec[19] 19.132 13,32,32,32 .2154 .2632 */
    { 896, 2, 17599997,"EDF99B1D21DE8835"},/* testVec[20] 19.182 14,32,32,32 .2041 .2773 */
    { 960, 1, 18899999,"AF0F81144A3372A4"},/* testVec[21] 19.226 15,32,32,32 .2186 .2915 */
    {1024, 6, 20099983,"119B2956917D0CC1"},/* testVec[22] 19.169 16,32,32,32 .2457 .2934 */
    {1152, 5, 22500011,"3D81D5C9CC3D1C65"},/* testVec[23] 19.073 9,16,16,16,16 .1845 .2582 */
    {1280, 2, 25000009,"B4A3AF6909228279"},/* testVec[24] 19.073 10,16,16,16,16 .2534 .3083 */
    ...

Find the table rows containing FFT lengths around the current GIMPS wavefront (as of the start of 2004, you'll want 1152K and 1280K). Then do 100-iteration timing tests of the corresponding exponents, using a variety of FFT radix sets. For instance, for 1152K, a single 100-iteration self-test with radix set 0 results from pasting the following (sans my <=== comments) into your command window:

    time Mlucas
    22500011 <=== exponent for LL test
    1152 <=== FFT length (in K) for LL test
    1 <=== 0 for a full LL test, 1 for a shorter timing test
    100 <=== if previous line was a 1, how many iterations for the timing test
    0 <=== This is the radix set index
    1 <=== 0 for error checking off, 1 for EC on.

Start with radix set 0 and increase by one each run until you start getting "radix set XYZ not available - using defaults" warnings. All radix sets should give Res64 = 3D81D5C9CC3D1C65, as per the Mlucas.c table entry. Of the radix sets you tried, pick the one that yielded the smallest runtime and add the corresponding entry to your mlucas.cfg file, e.g. if RS 3 gave the best time @1152K, your mlucas.cfg file would look like

    #
    # mlucas.cfg optimized for UltraSparc blah blah...
    #
    200000
    #
    1152 3

The format of the .cfg file is important - you must begin with precisely 3 #-prefixed lines, where you may enter comments to the right of the # as desired.
The fourth line tells the program how many initial iterations to do with per-iteration error checking (EC) turned on - in the above example, if it gets through the first 200000 iterations on a given exponent with no roundoff errors greater than roughly 0.4, it turns off EC for the rest of the run. You can see whether EC slows the code down appreciably by rerunning the self-tests, but entering a 0 instead of a 1 on the last line of input. If EC-on is no more than 1 or 2% slower than EC-off, I recommend putting a large signed 32-bit integer (say 1000000000) on line 4 of the .cfg file, to force EC to be always on.

Once you've set up your mlucas.cfg file, create a worktodo.ini file in the same dir as your executable and your .cfg file, enter an exponent in it, and invoke the program sans any flags, e.g. with "nice Mlucas &".
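For anyone who would rather not do the bookkeeping by hand, the procedure in this post can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of Mlucas: the function names (timing_input, best_radix_set, render_cfg) are hypothetical, the timings are assumed to be copied by hand from the "time" output of each run, and writing more than one "FFT-length radix-set" entry after the fifth line of the .cfg file is an assumption, since the post shows only a single entry.

```python
"""Helper sketch for the manual radix-set tuning procedure (not part of Mlucas)."""


def timing_input(exponent, fft_k, iters, radix_set, error_check=True):
    """Return the six answers pasted after `time Mlucas`, one per line."""
    answers = [
        exponent,                 # exponent for the LL test
        fft_k,                    # FFT length in K
        1,                        # 1 = short timing test rather than a full LL test
        iters,                    # number of timing iterations
        radix_set,                # radix set index
        1 if error_check else 0,  # 1 = error checking on, 0 = off
    ]
    return "".join(f"{a}\n" for a in answers)


def best_radix_set(results, expected_res64):
    """Pick the fastest radix set whose Res64 matches the Mlucas.c table entry.

    `results` maps radix set index -> (seconds, res64), recorded by hand.
    """
    valid = {
        rs: seconds
        for rs, (seconds, res64) in results.items()
        if res64.upper() == expected_res64.upper()
    }
    if not valid:
        raise ValueError("no radix set reproduced the expected Res64")
    return min(valid, key=valid.get)


def render_cfg(comment, ec_iters, entries):
    """Render mlucas.cfg text in the layout shown in the post.

    `entries` maps FFT length (K) -> radix set index. Emitting several
    entries, one per line, is an assumption; the post shows only one.
    """
    lines = ["#", f"# {comment}", "#", str(ec_iters), "#"]
    lines += [f"{fft_k} {rs}" for fft_k, rs in sorted(entries.items())]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Input for the 1152K example from the post, radix set 0, EC on.
    print(timing_input(22500011, 1152, 100, 0), end="")

    # Made-up timings for illustration only; these are NOT measurements.
    results = {
        0: (41.0, "3D81D5C9CC3D1C65"),
        1: (39.2, "3D81D5C9CC3D1C65"),
        3: (37.5, "3D81D5C9CC3D1C65"),
    }
    best = best_radix_set(results, "3D81D5C9CC3D1C65")
    print(render_cfg("mlucas.cfg optimized for UltraSparc", 200000, {1152: best}), end="")
```

The Res64 check comes before the speed comparison on purpose: a radix set that runs fast but does not reproduce the table residue should never end up in the .cfg file. Whether the binary accepts these answers from a redirected file as well as from pasted input is something to verify on your own machine.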