mersenneforum.org  

2021-02-12, 00:00   #1
ewmayer
2ω=0

Sep 2002
República de California

3·5³·31 Posts
Mlucas v19.1 (latest) available

Mlucas v19.1 has gone live. Use this thread to report bugs, build issues, and for any other related discussion.

The only "must have" reason to upgrade for folks already using v19 is if you also want to play with builds using the Clang/LLVM compiler on Armv8-SIMD-supporting CPUs, including the new Apple Silicon ones, for which Clang is the native compiler.

Special thanks to forumite ldesnogues for reporting the "v19 won't build using Clang on Armv8" issue and helping to diagnose its root cause. There is also a nifty auto-install-and-tune-for-your-hardware script, which its author tdulcet has been tweaking based on my feedback and lots of back-and-forth; thanks also to tdulcet and dan2 for the bigly-enhanced version of the primenet.py work-management script, which builds on Loïc Le Loarer's Primenet-API-interfacing work of last year. Details, as always, are on the README page.

I have tried to make the README, which is quite sprawling due to the need to support the do-it-yourselfers who are legion in Linuxworld, more easily navigable; I hope the added "jump to" arrowed in-page links make it somewhat easier to move between, e.g., the release description and the how-to-build sections.

I expect to spend the coming week or so uploading patches (hopefully more to the README than to the v19.1 sources) and doing a big honking file merge of the v19.1 code changes upward into the v20 development branch, which has lain dormant for the past 2 months. All the v19.1 effort would have been needed in v20 anyway, so the time has not been wasted; I'm simply looking forward to getting back to feature-add work in the form of p-1 factoring support.

Please subscribe to this thread if you want to be notified of patch uploads.
2021-02-12, 20:29   #2
ewmayer
2ω=0

Sep 2002
República de California

3·5³·31 Posts

README updated with a corrected wget for tdulcet's mlucas.sh install script: the original version would download and run it immediately, whereas now users can first inspect it, comment out the primenet.py-invocation block if they want to "try before you buy", etc.

I'm also looking for an owner of a hybrid BIG/little-CPU system such as the Odroid N2 to test tdulcet's Mlucas v19.1 install/autotune script on it. With my guidance he's made several changes in an effort to support such systems, which typically need a separate run directory for each CPU type, each with an mlucas.cfg file containing FFT params properly tuned for the CPU in question.
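A minimal bash sketch of that per-CPU-type layout, assuming a hypothetical tune_clusters helper and placeholder core ranges (the real big/little core numbering of a given board has to be checked, e.g. with lscpu): each cluster gets its own run directory, and the Mlucas medium self-test is run there restricted to that cluster's cores via -cpu lo:hi, so the mlucas.cfg written in each directory reflects only that core type.

```bash
# Hypothetical helper: run the Mlucas medium self-test once per core cluster
# of a BIG/little CPU, each in its own run directory.
# Usage: tune_clusters PATH_TO_MLUCAS name:lo:hi [name:lo:hi ...]
tune_clusters() {
  local dir mlucas_bin spec name lo hi
  # Make the binary path absolute, since we cd into each run directory below.
  dir=$(cd "$(dirname "$1")" && pwd) || return 1
  mlucas_bin="$dir/$(basename "$1")"
  shift
  for spec in "$@"; do
    IFS=: read -r name lo hi <<< "$spec"
    mkdir -p "run_${name}" || return 1
    # The self-test writes mlucas.cfg into the current directory, so each
    # cluster ends up with FFT params tuned for its own core type.
    ( cd "run_${name}" && "$mlucas_bin" -s m -cpu "${lo}:${hi}" ) || return 1
  done
}

# Example with placeholder core ranges (not verified for any particular board):
# tune_clusters ./Mlucas little:0:1 big:2:5
```

The binary path is made absolute before the cd, since a relative path such as ./Mlucas would otherwise be looked up inside the new run directory.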
2021-02-15, 20:35   #3
Dylan14

"Dylan"
Mar 2017

3×191 Posts

I've updated the PKGBUILD for Arch Linux to v19.1, which follows the procedure as described in the readme document.

The fp-link patch is no longer needed; however, the sysctl-missing patch is still needed.
2021-02-15, 21:58   #4
ewmayer
2ω=0

Sep 2002
República de California

3×5³×31 Posts

Quote:
Originally Posted by Dylan14
I've updated the PKGBUILD for Arch Linux to v19.1, which follows the procedure as described in the readme document.

The fp-link patch is no longer needed; however, the sysctl-missing patch is still needed.
Thanks. Is the latter about the sysctl-deprecated warnings? Handling those is on my v20 to-do list. It's not as simple as blanket-removing the include from platform.h, because I always try to support older platforms within reason, so it needs proper preprocessor #ifdef wrapping to retain the include on older Linux distros and MacOS versions where that header is needed.

You're getting warnings, or your version of GCC is treating those "deprecated"s as errors?
2021-02-15, 23:02   #5
Dylan14

"Dylan"
Mar 2017

23D₁₆ Posts

When I try to build on Arch Linux (presently on kernel version 5.10.16), the error I get if I keep the #include <sys/sysctl.h> is:

Code:
platform.h:1307:12: fatal error: sys/sysctl.h: No such file or directory
compilation terminated.
This is using gcc 10.2.0.
This patch would not be needed if I were using the linux-lts kernel, which is on version 5.4 and has the sysctl.h file, so applying my patch unconditionally is a bit risky; I should only apply it if the kernel version is at least 5.5.
2021-02-15, 23:40   #6
ewmayer
2ω=0

Sep 2002
República de California

10110101101001₂ Posts

Quote:
Originally Posted by Dylan14
When I try to build on Arch Linux (presently on kernel version 5.10.16), the error I get if I keep the #include <sys/sysctl.h> is:

Code:
platform.h:1307:12: fatal error: sys/sysctl.h: No such file or directory
compilation terminated.
This is using gcc 10.2.0.
This patch would not be needed if I were using the linux-lts kernel, which is on version 5.4 and has the sysctl.h file, so applying my patch unconditionally is a bit risky; I should only apply it if the kernel version is at least 5.5.
What is needed is some way of conditionally including the file only on OS/kernel combinations which support it. I dumped all the compiler predefines for one of my Ubuntu v19 systems; 'uname -a' indicates it's running kernel 5.3:

Linux ewmayer-NUC8i3CYS 5.3.0-59-generic #53-Ubuntu SMP Wed Jun 3 15:52:15 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
gcc version 9.2.1 20191008 (Ubuntu 9.2.1-9ubuntu2)

...but I don't see any specific-Linux-version info in the GCC predefines.

Do me a favor - for your Arch Linux distro, cd to mlucas_v19.1/src and run the following command there:

gcc -dM -E align.h < /dev/null > predefs.txt

(The align.h header is just so both the system and Mlucas predefs get dumped). Attach the resulting predefs.txt file to a post.
2021-02-15, 23:50   #7
Dylan14

"Dylan"
Mar 2017

3×191 Posts

Quote:
Originally Posted by ewmayer
Do me a favor - for your Arch Linux distro, cd to mlucas_v19.1/src and run the following command there:

gcc -dM -E align.h < /dev/null > predefs.txt

(The align.h header is just so both the system and Mlucas predefs get dumped). Attach the resulting predefs.txt file to a post.
See attached file.
Attached file: predefs.txt (29.9 KB)
2021-02-16, 00:24   #8
Dylan14

"Dylan"
Mar 2017

23D₁₆ Posts

Okay, I have figured out a way to decide within the PKGBUILD when to apply the patch, as seen here in the prepare() function:

Code:
prepare() {
  cd "${srcdir}/${pkgname}_v${pkgver}"
  # Only patch if the running kernel version is at least 5.5.0
  kermajver=$(uname -r | cut -d. -f1)
  kerminver=$(uname -r | cut -d. -f2)
  if [ "$kermajver" -gt 5 ] || { [ "$kermajver" -eq 5 ] && [ "$kerminver" -ge 5 ]; }; then
    patch -p1 < "../../sysctl-missing.patch"
  fi
}
Basically, if the kernel major version is greater than 5, apply the patch; if the major version is 5 and the minor version is at least 5, also apply it. Otherwise do nothing.
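The same check can be written as a general dotted-version comparison, which copes with suffixed release strings such as 5.10.16-arch1-1 and with any number of fields. A bash sketch; version_ge is a hypothetical helper, not part of makepkg or the Mlucas sources:

```bash
# version_ge A B: exit status 0 if dotted version A >= B.
# Fields are compared numerically from left to right; any non-numeric tail
# in a field (e.g. "16-arch1-1") is ignored, and missing fields count as 0.
version_ge() {
  local IFS=.
  local -a a b
  read -r -a a <<< "$1"
  read -r -a b <<< "$2"
  local n=${#a[@]} i x y
  (( ${#b[@]} > n )) && n=${#b[@]}
  for (( i = 0; i < n; i++ )); do
    x=${a[i]:-0}; x=${x%%[!0-9]*}; x=$(( 10#${x:-0} ))
    y=${b[i]:-0}; y=${y%%[!0-9]*}; y=$(( 10#${y:-0} ))
    (( x > y )) && return 0
    (( x < y )) && return 1
  done
  return 0
}

# Possible use in prepare(), replacing the major/minor test:
# if version_ge "$(uname -r)" 5.5; then patch -p1 < "../../sysctl-missing.patch"; fi
```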

Last fiddled with by Dylan14 on 2021-02-16 at 00:25
2021-02-16, 20:02   #9
ewmayer
2ω=0

Sep 2002
República de California

3·5³·31 Posts

Thanks for the predefs. Looking through those triggered a recollection that I'd previously made a note-to-self for the v20 release re. this, namely that the sysctl.h deprecation is tied not to a specific Linux kernel version but rather to the GLIBC version. That's useful because the GCC predefines don't have info re. the former, but do have the GLIBC version info.
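Since the GLIBC version is visible to the preprocessor, it can be read back out of a predefines dump like the one Dylan attached. A bash sketch, assuming a hypothetical glibc_from_predefs helper that reads gcc -dM -E output on stdin and prints MAJOR.MINOR, or nothing if the macros are absent (e.g. on MacOS or musl). Note that glibc defines __GLIBC__ and __GLIBC_MINOR__ in its <features.h>, so they only show up once some libc header has been included, which is why the dump needs a header such as align.h (or <stdio.h> in the example below) rather than an empty input.

```bash
# glibc_from_predefs: read `gcc -dM -E` output on stdin and print the glibc
# version as MAJOR.MINOR; prints nothing if __GLIBC__ is not defined.
glibc_from_predefs() {
  awk '
    $1 == "#define" && $2 == "__GLIBC__"       { major = $3 }
    $1 == "#define" && $2 == "__GLIBC_MINOR__" { minor = $3 }
    END { if (major != "") print major "." (minor == "" ? "0" : minor) }
  '
}

# Example: dump the predefines after including a libc header.
# echo '#include <stdio.h>' | gcc -dM -E -x c - | glibc_from_predefs
```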

Per a GitHub discussion, "sysctl() is deprecated and may break build with glibc >= 2.30", so we wrap the #include like so:
Code:
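/* <sys/sysctl.h> is deprecated as of glibc 2.30: include it only for non-glibc builds or glibc < 2.30. */
#if !defined(__GLIBC__) || (__GLIBC__ < 2) || ((__GLIBC__ == 2) && (__GLIBC_MINOR__ < 30))
	#warning GLIBC either not defined or version < 2.30 ... including <sys/sysctl.h> header.
	#include <sys/sysctl.h>
#endif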
If you make that mod in your platform.h file, does that fix the problem without having to apply your patch?
2021-02-16, 20:13   #10
Dylan14

"Dylan"
Mar 2017

573₁₀ Posts

Quote:
Originally Posted by ewmayer
...
If you make that mod in your platform.h file, does that fix the problem without having to apply your patch?
That fixed the issue.
2021-02-17, 20:31   #11
Lorenzo

Aug 2010
Republic of Belarus

2·89 Posts

Hello! I just want to share my experience with the Apple M1 CPU.

It compiled smoothly without issues (I included the -DUSE_ARM_V8_SIMD flag, per the README page).

Code:
CPU Family = ARM Embedded ABI, OS = OS X, 64-bit Version, compiled with Gnu-C-compatible [llvm/clang], Version 12.0.0 (clang-1200.0.32.29).
INFO: Build uses ARMv8 advanced-SIMD instruction set.
CPU extensions:

Code:
m1@599160f8-fb7f-41df-adc2-2b7f4da1aac7 src % sysctl hw.optional
hw.optional.floatingpoint: 1
hw.optional.watchpoint: 4
hw.optional.breakpoint: 6
hw.optional.neon: 1
hw.optional.neon_hpfp: 1
hw.optional.neon_fp16: 1
hw.optional.armv8_1_atomics: 1
hw.optional.armv8_crc32: 1
hw.optional.armv8_2_fhm: 1
hw.optional.armv8_2_sha512: 1
hw.optional.armv8_2_sha3: 1
hw.optional.amx_version: 2
hw.optional.ucnormal_mem: 1
hw.optional.arm64: 1
Results of ./Mlucas -s m -cpu 0:7:
Code:
19.1
      2048  msec/iter =    3.32  ROE[avg,max] = [0.215347133, 0.312500000]  radices =  32 32 32 32  0  0  0  0  0  0
      2304  msec/iter =    4.00  ROE[avg,max] = [0.193772149, 0.250000000]  radices = 144 32 16 16  0  0  0  0  0  0
      2560  msec/iter =    4.28  ROE[avg,max] = [0.178074945, 0.234375000]  radices = 160 32 16 16  0  0  0  0  0  0
      2816  msec/iter =    4.98  ROE[avg,max] = [0.194841334, 0.281250000]  radices = 176 32 16 16  0  0  0  0  0  0
      3072  msec/iter =    5.27  ROE[avg,max] = [0.208759866, 0.312500000]  radices =  48 32 32 32  0  0  0  0  0  0
      3328  msec/iter =    5.94  ROE[avg,max] = [0.324307345, 0.406250000]  radices = 208 32 16 16  0  0  0  0  0  0
      3584  msec/iter =    6.01  ROE[avg,max] = [0.198822084, 0.250000000]  radices =  56 32 32 32  0  0  0  0  0  0
      3840  msec/iter =    6.54  ROE[avg,max] = [0.187369624, 0.250000000]  radices =  60 32 32 32  0  0  0  0  0  0
      4096  msec/iter =    6.88  ROE[avg,max] = [0.176231022, 0.218750000]  radices =  64 32 32 32  0  0  0  0  0  0
      4608  msec/iter =    7.91  ROE[avg,max] = [0.206297821, 0.281250000]  radices = 288 32 16 16  0  0  0  0  0  0
      5120  msec/iter =    8.42  ROE[avg,max] = [0.193601628, 0.250000000]  radices = 160 16 32 32  0  0  0  0  0  0
      5632  msec/iter =    9.70  ROE[avg,max] = [0.221504510, 0.281250000]  radices = 352 32 16 16  0  0  0  0  0  0
      6144  msec/iter =   10.67  ROE[avg,max] = [0.183728153, 0.250000000]  radices = 192 16 32 32  0  0  0  0  0  0
      6656  msec/iter =   11.75  ROE[avg,max] = [0.176554163, 0.218750000]  radices = 208 16 32 32  0  0  0  0  0  0
      7168  msec/iter =   11.84  ROE[avg,max] = [0.213558111, 0.312500000]  radices = 224 16 32 32  0  0  0  0  0  0
      7680  msec/iter =   13.14  ROE[avg,max] = [0.211455481, 0.281250000]  radices = 240 16 32 32  0  0  0  0  0  0
      8192  msec/iter =   13.50  ROE[avg,max] = [0.243920143, 0.312500000]  radices = 256 16 32 32  0  0  0  0  0  0
      9216  msec/iter =   15.44  ROE[avg,max] = [0.256431218, 0.343750000]  radices = 288 16 32 32  0  0  0  0  0  0
     10240  msec/iter =   16.75  ROE[avg,max] = [0.293991624, 0.375000000]  radices = 160 32 32 32  0  0  0  0  0  0
     11264  msec/iter =   18.68  ROE[avg,max] = [0.222417407, 0.281250000]  radices = 352 16 32 32  0  0  0  0  0  0
     12288  msec/iter =   21.39  ROE[avg,max] = [0.219849010, 0.281250000]  radices = 192 32 32 32  0  0  0  0  0  0
     13312  msec/iter =   23.73  ROE[avg,max] = [0.258116543, 0.312500000]  radices = 208 32 32 32  0  0  0  0  0  0
     14336  msec/iter =   23.98  ROE[avg,max] = [0.231325382, 0.281250000]  radices = 224 32 32 32  0  0  0  0  0  0
     15360  msec/iter =   26.76  ROE[avg,max] = [0.235138002, 0.281250000]  radices = 240 32 32 32  0  0  0  0  0  0
     16384  msec/iter =   26.98  ROE[avg,max] = [0.230396011, 0.312500000]  radices = 256 32 32 32  0  0  0  0  0  0
     18432  msec/iter =   31.07  ROE[avg,max] = [0.276530284, 0.375000000]  radices = 288 32 32 32  0  0  0  0  0  0
     20480  msec/iter =   35.91  ROE[avg,max] = [0.229381947, 0.312500000]  radices = 320 32 32 32  0  0  0  0  0  0
     22528  msec/iter =   37.85  ROE[avg,max] = [0.235262715, 0.296875000]  radices = 352 32 32 32  0  0  0  0  0  0
     24576  msec/iter =   42.70  ROE[avg,max] = [0.238062530, 0.375000000]  radices = 768 16 32 32  0  0  0  0  0  0
     26624  msec/iter =   60.50  ROE[avg,max] = [0.254043170, 0.312500000]  radices = 208 16 16 16 16  0  0  0  0  0
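Results of ./Mlucas -s m -cpu 0:3:
With only 4 cores specified, the timings are just 15-25% slower than with -cpu 0:7, so it looks like threads under heavy load are automatically assigned to the faster cores.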
Code:
19.1
      2048  msec/iter =    3.88  ROE[avg,max] = [0.215133698, 0.312500000]  radices =  32 32 32 32  0  0  0  0  0  0
      2304  msec/iter =    4.84  ROE[avg,max] = [0.194502305, 0.281250000]  radices = 144 32 16 16  0  0  0  0  0  0
      2560  msec/iter =    5.00  ROE[avg,max] = [0.184244498, 0.250000000]  radices =  40 32 32 32  0  0  0  0  0  0
      2816  msec/iter =    6.03  ROE[avg,max] = [0.193770639, 0.250000000]  radices = 176 32 16 16  0  0  0  0  0  0
      3072  msec/iter =    6.17  ROE[avg,max] = [0.209568299, 0.281250000]  radices =  48 32 32 32  0  0  0  0  0  0
      3328  msec/iter =    7.15  ROE[avg,max] = [0.221850838, 0.281250000]  radices =  52 32 32 32  0  0  0  0  0  0
      3584  msec/iter =    7.12  ROE[avg,max] = [0.199199621, 0.281250000]  radices =  56 32 32 32  0  0  0  0  0  0
      3840  msec/iter =    7.90  ROE[avg,max] = [0.187449630, 0.250000000]  radices =  60 32 32 32  0  0  0  0  0  0
      4096  msec/iter =    8.21  ROE[avg,max] = [0.174905238, 0.218750000]  radices =  64 32 32 32  0  0  0  0  0  0
      4608  msec/iter =    9.57  ROE[avg,max] = [0.205330823, 0.281250000]  radices = 288 32 16 16  0  0  0  0  0  0
      5120  msec/iter =   10.01  ROE[avg,max] = [0.193377434, 0.250000000]  radices = 160 16 32 32  0  0  0  0  0  0
      5632  msec/iter =   11.74  ROE[avg,max] = [0.221915271, 0.281250000]  radices = 352 32 16 16  0  0  0  0  0  0
      6144  msec/iter =   12.89  ROE[avg,max] = [0.183260259, 0.250000000]  radices = 192 16 32 32  0  0  0  0  0  0
      6656  msec/iter =   14.32  ROE[avg,max] = [0.176914974, 0.250000000]  radices = 208 16 32 32  0  0  0  0  0  0
      7168  msec/iter =   14.40  ROE[avg,max] = [0.213720200, 0.281250000]  radices = 224 16 32 32  0  0  0  0  0  0
      7680  msec/iter =   16.16  ROE[avg,max] = [0.211763551, 0.281250000]  radices = 240 16 32 32  0  0  0  0  0  0
Performance looks awesome for a mobile CPU. For comparison, here are timings with AVX2 on an i3-8100 (4 cores); the M1 is much faster.

AVX2 on i3-8100:
Code:
19.1
      2048  msec/iter =    4.75  ROE[avg,max] = [0.167383863, 0.218750000]  radices = 128 16 16 32  0  0  0  0  0  0
      2304  msec/iter =    5.44  ROE[avg,max] = [0.182823637, 0.218750000]  radices = 144 16 16 32  0  0  0  0  0  0
      2560  msec/iter =    6.29  ROE[avg,max] = [0.224905364, 0.281250000]  radices = 160 16 16 32  0  0  0  0  0  0
      2816  msec/iter =    6.63  ROE[avg,max] = [0.183906382, 0.230468750]  radices = 176 16 16 32  0  0  0  0  0  0
      3072  msec/iter =    7.42  ROE[avg,max] = [0.252202803, 0.312500000]  radices = 192 16 16 32  0  0  0  0  0  0
      3328  msec/iter =    7.52  ROE[avg,max] = [0.225825548, 0.281250000]  radices = 208 16 16 32  0  0  0  0  0  0
      3584  msec/iter =    8.12  ROE[avg,max] = [0.260567010, 0.375000000]  radices = 224 16 16 32  0  0  0  0  0  0
      3840  msec/iter =    9.15  ROE[avg,max] = [0.200714048, 0.281250000]  radices = 240 16 16 32  0  0  0  0  0  0
      4096  msec/iter =   10.92  ROE[avg,max] = [0.165220469, 0.218750000]  radices =  64 32 32 32  0  0  0  0  0  0
      4608  msec/iter =   11.15  ROE[avg,max] = [0.192892739, 0.250000000]  radices = 288 16 16 32  0  0  0  0  0  0
      5120  msec/iter =   12.18  ROE[avg,max] = [0.229244523, 0.312500000]  radices = 160 32 32 16  0  0  0  0  0  0
      5632  msec/iter =   13.47  ROE[avg,max] = [0.187610146, 0.250000000]  radices = 352 16 16 32  0  0  0  0  0  0
      6144  msec/iter =   16.09  ROE[avg,max] = [0.209471649, 0.281250000]  radices = 192 32 32 16  0  0  0  0  0  0
      6656  msec/iter =   16.86  ROE[avg,max] = [0.196862667, 0.250000000]  radices = 208 16 32 32  0  0  0  0  0  0
      7168  msec/iter =   17.38  ROE[avg,max] = [0.196444104, 0.250000000]  radices = 224 32 32 16  0  0  0  0  0  0
      7680  msec/iter =   23.23  ROE[avg,max] = [0.239954494, 0.343750000]  radices = 240 32 32 16  0  0  0  0  0  0
      8192  msec/iter =   19.79  ROE[avg,max] = [0.272732764, 0.375000000]  radices = 256 32 32 16  0  0  0  0  0  0
      9216  msec/iter =   23.01  ROE[avg,max] = [0.242732915, 0.281250000]  radices = 288 32 32 16  0  0  0  0  0  0
     10240  msec/iter =   27.24  ROE[avg,max] = [0.271287049, 0.375000000]  radices = 320 32 32 16  0  0  0  0  0  0
     11264  msec/iter =   28.87  ROE[avg,max] = [0.271818621, 0.375000000]  radices = 352 32 32 16  0  0  0  0  0  0
     12288  msec/iter =   32.04  ROE[avg,max] = [0.259570478, 0.312500000]  radices = 768 16 16 32  0  0  0  0  0  0
     13312  msec/iter =   37.85  ROE[avg,max] = [0.254703482, 0.312500000]  radices = 208 32 32 32  0  0  0  0  0  0
     14336  msec/iter =   40.34  ROE[avg,max] = [0.234003331, 0.296875000]  radices = 224 32 32 32  0  0  0  0  0  0
     15360  msec/iter =   43.84  ROE[avg,max] = [0.245504855, 0.312500000]  radices = 960 16 16 32  0  0  0  0  0  0
     16384  msec/iter =   45.62  ROE[avg,max] = [0.272600878, 0.375000000]  radices = 256 32 32 32  0  0  0  0  0  0
     18432  msec/iter =   53.16  ROE[avg,max] = [0.236424995, 0.281250000]  radices = 288 32 32 32  0  0  0  0  0  0
     20480  msec/iter =   62.92  ROE[avg,max] = [0.237479031, 0.312500000]  radices = 320 32 32 32  0  0  0  0  0  0
     22528  msec/iter =   66.03  ROE[avg,max] = [0.228240432, 0.312500000]  radices = 352 32 32 32  0  0  0  0  0  0
     24576  msec/iter =   69.49  ROE[avg,max] = [0.261424145, 0.343750000]  radices = 768 16 32 32  0  0  0  0  0  0
Looking forward to their desktop M1X. Good job.
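To put a number on "much faster", the self-test tables can be joined by FFT length and their msec/iter values divided. A rough bash/awk sketch, assuming each table is saved to its own text file in the format shown above (a version line such as 19.1, then one line per FFT length with msec/iter in the fourth whitespace-separated field); cfg_speedup and the file names are hypothetical. For example, at 4096K the i3-8100 takes 10.92 msec/iter against 6.88 for the M1 with -cpu 0:7 and 8.21 with -cpu 0:3, ratios of about 1.59 and 1.33.

```bash
# cfg_speedup SLOW FAST: for every FFT length present in both files, print
# the length and the ratio SLOW msec/iter / FAST msec/iter (>1 means FAST wins).
cfg_speedup() {
  awk '
    NR == FNR { if ($2 == "msec/iter") slow[$1] = $4; next }
    $2 == "msec/iter" && ($1 in slow) { printf "%s %.2f\n", $1, slow[$1] / $4 }
  ' "$1" "$2"
}

# Example: cfg_speedup i3-8100.txt m1.txt
```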

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.