-   -   Mlucas v19.1 available (https://www.mersenneforum.org/showthread.php?t=26483)

ewmayer 2021-02-12 00:00

Mlucas v19.1 available
 
[url=http://www.mersenneforum.org/mayer/README.html]Mlucas v19.1 has gone live[/url]. Use this thread to report bugs, build issues, and for any other related discussion.

The only "must have" reason for folks already using v19 is if you also want to play with builds using the Clang/LLVM compiler on Armv8-SIMD-supporting CPUs, including the new Apple Silicon ones, for which Clang is the native compiler.

Special thanks to forumite ldesnogues for reporting the "v19 won't build using Clang on Armv8" issue and helping to diagnose its root cause. There is also a nifty auto-install-and-tune-for-your-hardware script, which author tdulcet has been tweaking based on my feedback and lots of back-and-forth; thanks also to tdulcet and dan2 for the bigly-enhanced version of the primenet.py work-management script, which builds on Loïc Le Loarer's Primenet-API-interfacing work of last year. Details, as always, at the above README page.

I have tried to make the README, which is quite sprawling due to the need to support the do-it-yourselfers who are legion in Linuxworld, more easily navigable - I hope the added "jump to"-arrowed in-page-links make it somewhat easier to navigate between e.g. the release description and how-to-build sections.

I expect to be spending the coming week or so uploading patches - hopefully more to the README than the v19.1 sources - and doing a big honking file merge of the v19.1 code changes upward into the v20 development branch, which has lain dormant for the past 2 months. All the v19.1 effort would have been needed in v20 anyway, so the time has not been wasted; I'm simply looking forward to getting back to feature-add work in the form of p-1 factoring support.

Please subscribe to this thread if you want to be notified of patch uploads.

ewmayer 2021-02-12 20:29

README updated with a corrected wget command for tdulcet's mlucas.sh install script. The original version would download and run the script immediately; now users can first read through it, comment out the primenet.py invocation block if they want to "try before you buy", etc.

I'm also looking for an owner of a hybrid BIG/little-CPU system like Odroid N2 to test tdulcet's Mlucas v19.1 install/autotune script on such. With my guidance he's made several changes in an effort to support such systems, which typically need separate run directories for each CPU, each with an mlucas.cfg file containing FFT params properly tuned for the CPU in question.

Dylan14 2021-02-15 20:35

I've updated the PKGBUILD for Arch Linux to v19.1, which follows the procedure as described in the readme document.

The fp-link patch is no longer needed; however, the sysctl-missing patch is still needed.

ewmayer 2021-02-15 21:58

[QUOTE=Dylan14;571681]I've updated the PKGBUILD for Arch Linux to v19.1, which follows the procedure as described in the readme document.

The fp-link patch is no longer needed, however, the sysctl-missing patch is still needed.[/QUOTE]

Thanks - the latter is the sysctl-deprecated warnings? Handling those is on my v20 to-do list - it's not as simple as blanket-removing the includes from platform.h, because I always try to support older platforms within reason, so that needs proper preprocessor #ifdef wrapping to retain the include on older distros of Linux and MacOS where that header is needed.

You're getting warnings, or your version of GCC is treating those "deprecated"s as errors?

Dylan14 2021-02-15 23:02

When I try to build on Arch Linux (which is presently on kernel version 5.10.16) the error I would get if I kept the include <sys/sysctl.h> is:

[code]platform.h:1307:12: fatal error: sys/sysctl.h: No such file or directory
compilation terminated.
[/code]This is using gcc 10.2.0.
The patch would not be needed if I were using the linux-lts kernel, which is on version 5.4 and has the sysctl.h file, so applying my patch unconditionally is a bit risky - I should only apply it if the kernel version is at least 5.5.

ewmayer 2021-02-15 23:40

[QUOTE=Dylan14;571694]When I try to build on Arch Linux (which is presently on kernel version 5.10.16) the error I would get if I kept the include <sys/sysctl.h> is:

[code]platform.h:1307:12: fatal error: sys/sysctl.h: No such file or directory
compilation terminated.
[/code]This is using gcc 10.2.0.
This would not be needed, if I was using the linux-lts kernel which is on version 5.4 and has the sysctl.h file - so doing my blanket patch is a bit risky - I should only run the patch if the kernel version is at least 5.5.[/QUOTE]

What is needed is some way of conditionally including the file only on OS/kernel combinations which support it. I dumped all the compiler predefines for one of my Ubuntu v19 systems, 'uname -a ' indicates it's kernel 5.3:
[i]
Linux ewmayer-NUC8i3CYS 5.3.0-59-generic #53-Ubuntu SMP Wed Jun 3 15:52:15 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
gcc version 9.2.1 20191008 (Ubuntu 9.2.1-9ubuntu2)
[/i]
...but I don't see any specific-Linux-version info in the GCC predefines.

Do me a favor - for your Arch Linux distro, cd to mlucas_v19.1/src and run the following command there:
[i]
gcc -dM -E align.h < /dev/null > predefs.txt
[/i]
(The align.h header is just so both the system and Mlucas predefs get dumped). Attach the resulting predefs.txt file to a post.

Dylan14 2021-02-15 23:50

1 Attachment(s)
[QUOTE=ewmayer;571696]
Do me a favor - for your Arch Linux distro, cd to mlucas_v19.1/src and run the following command there:
[i]
gcc -dM -E align.h < /dev/null > predefs.txt
[/i]
(The align.h header is just so both the system and Mlucas predefs get dumped). Attach the resulting predefs.txt file to a post.[/QUOTE]

See attached file.

Dylan14 2021-02-16 00:24

Okay, I have figured out a way to determine when to patch within the PKGBUILD, as seen here in the prepare function:

[CODE]prepare() {
  cd "${srcdir}/${pkgname}_v${pkgver}"
  # Only patch if the kernel version is at least 5.5.0
  kermajver=$(uname -r | cut -d. -f1)   # major version, e.g. 5
  kerminver=$(uname -r | cut -d. -f2)   # minor version, e.g. 10
  if [ "$kermajver" -gt 5 ]; then
    patch -p1 < "../../sysctl-missing.patch"
  elif [ "$kermajver" -eq 5 ] && [ "$kerminver" -ge 5 ]; then
    patch -p1 < "../../sysctl-missing.patch"
  fi
}[/CODE]Basically, if the kernel major version is greater than 5, run the patch; if the major version is 5 and the minor version is at least 5, also run the patch. Otherwise do nothing.
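
For illustration only, the same major/minor comparison can be sketched in C. The function name kernel_at_least is hypothetical and is not part of the PKGBUILD or of Mlucas. The point of parsing the two fields as integers is that a plain string comparison would wrongly sort "5.10" before "5.5".

```c
#include <stdio.h>

/* Hypothetical helper (not part of Mlucas or the PKGBUILD):
   returns 1 if a kernel release string such as "5.10.16-arch1-1" denotes a
   version >= want_major.want_minor, 0 if it is older, and -1 if the string
   does not start with "<int>.<int>". */
int kernel_at_least(const char *release, int want_major, int want_minor)
{
    int major = 0, minor = 0;

    /* Parse only the leading "major.minor"; anything after it is ignored. */
    if (sscanf(release, "%d.%d", &major, &minor) != 2)
        return -1;

    /* A differing major version decides the comparison on its own. */
    if (major != want_major)
        return major > want_major;

    return minor >= want_minor;
}
```

On POSIX systems the release string could be taken from the release field that uname() fills in (struct utsname, declared in sys/utsname.h), which is the same string that uname -r prints.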

ewmayer 2021-02-16 20:02

Thanks for the predefs - looking through those triggered a recollection, that I'd previously made a note-to-self for the v20 release re. this, namely that the sysctl.h deprecation was tied not to a specific Linux kernel version, but rather to the GLIBC version. That's useful because GCC predefines don't have info re. the former, but do have the GLIBC version info.

Per this [url=https://github.com/open5gs/open5gs/issues/600]GitHub discussion[/url], "sysctl() is deprecated and may break build with glibc >= 2.30", so we wrap the #include like so:
[code]#if (__GLIBC__ < 2) || (__GLIBC_MINOR__ < 30)
#warning GLIBC either not defined or version < 2.30 ... including <sys/sysctl.h> header.
#include <sys/sysctl.h>
#endif[/code]
If you make that mod in your platform.h file, does that fix the problem without having to apply your patch?
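
One caveat with the two-clause test above: it would also be true for a major version above 2 whose minor number is below 30 (a hypothetical glibc 3.0, say). A slightly stricter variant, as a minimal sketch: VERSION_LT and NEED_SYS_SYSCTL_H are hypothetical names chosen for illustration, not ones used in platform.h, and the sketch assumes some libc header has been included first so that the glibc version macros are visible.

```c
#include <limits.h>  /* any libc header; on glibc this makes __GLIBC__ and
                        __GLIBC_MINOR__ visible */

/* Hypothetical helper: true when version maj.min precedes ref_maj.ref_min. */
#define VERSION_LT(maj, min, ref_maj, ref_min) \
    (((maj) < (ref_maj)) || (((maj) == (ref_maj)) && ((min) < (ref_min))))

/* NEED_SYS_SYSCTL_H is a hypothetical flag name, used here only so the
   result of the check can be inspected. */
#if !defined(__GLIBC__) || VERSION_LT(__GLIBC__, __GLIBC_MINOR__, 2, 30)
  #define NEED_SYS_SYSCTL_H 1  /* not glibc, or glibc older than 2.30 */
#else
  #define NEED_SYS_SYSCTL_H 0  /* glibc 2.30 or later: skip the header */
#endif
```

In a real header the two branches would wrap the #include <sys/sysctl.h> line itself rather than set a flag; the flag form just keeps the sketch compilable on systems where the header no longer exists.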

Dylan14 2021-02-16 20:13

[QUOTE=ewmayer;571770]...
If you make that mod in your platform.h file, does that fix the problem without having to apply your patch?[/QUOTE]

That fixed the issue.

Lorenzo 2021-02-17 20:31

Hello! I just want to share my experience with the Apple M1 CPU.

Compiled smoothly without issues (I included the -DUSE_ARM_V8_SIMD flag, as per the README page).

[CODE]CPU Family = ARM Embedded ABI, OS = OS X, 64-bit Version, compiled with Gnu-C-compatible [llvm/clang], Version 12.0.0 (clang-1200.0.32.29).
INFO: Build uses ARMv8 advanced-SIMD instruction set.
[/CODE]CPU extensions:

[CODE]m1@599160f8-fb7f-41df-adc2-2b7f4da1aac7 src % sysctl hw.optional
hw.optional.floatingpoint: 1
hw.optional.watchpoint: 4
hw.optional.breakpoint: 6
hw.optional.neon: 1
hw.optional.neon_hpfp: 1
hw.optional.neon_fp16: 1
hw.optional.armv8_1_atomics: 1
hw.optional.armv8_crc32: 1
hw.optional.armv8_2_fhm: 1
hw.optional.armv8_2_sha512: 1
hw.optional.armv8_2_sha3: 1
hw.optional.amx_version: 2
hw.optional.ucnormal_mem: 1
hw.optional.arm64: 1
[/CODE]./Mlucas -s m -cpu 0:7
[CODE]19.1
2048 msec/iter = 3.32 ROE[avg,max] = [0.215347133, 0.312500000] radices = 32 32 32 32 0 0 0 0 0 0
2304 msec/iter = 4.00 ROE[avg,max] = [0.193772149, 0.250000000] radices = 144 32 16 16 0 0 0 0 0 0
2560 msec/iter = 4.28 ROE[avg,max] = [0.178074945, 0.234375000] radices = 160 32 16 16 0 0 0 0 0 0
2816 msec/iter = 4.98 ROE[avg,max] = [0.194841334, 0.281250000] radices = 176 32 16 16 0 0 0 0 0 0
3072 msec/iter = 5.27 ROE[avg,max] = [0.208759866, 0.312500000] radices = 48 32 32 32 0 0 0 0 0 0
3328 msec/iter = 5.94 ROE[avg,max] = [0.324307345, 0.406250000] radices = 208 32 16 16 0 0 0 0 0 0
3584 msec/iter = 6.01 ROE[avg,max] = [0.198822084, 0.250000000] radices = 56 32 32 32 0 0 0 0 0 0
3840 msec/iter = 6.54 ROE[avg,max] = [0.187369624, 0.250000000] radices = 60 32 32 32 0 0 0 0 0 0
4096 msec/iter = 6.88 ROE[avg,max] = [0.176231022, 0.218750000] radices = 64 32 32 32 0 0 0 0 0 0
4608 msec/iter = 7.91 ROE[avg,max] = [0.206297821, 0.281250000] radices = 288 32 16 16 0 0 0 0 0 0
5120 msec/iter = 8.42 ROE[avg,max] = [0.193601628, 0.250000000] radices = 160 16 32 32 0 0 0 0 0 0
5632 msec/iter = 9.70 ROE[avg,max] = [0.221504510, 0.281250000] radices = 352 32 16 16 0 0 0 0 0 0
6144 msec/iter = 10.67 ROE[avg,max] = [0.183728153, 0.250000000] radices = 192 16 32 32 0 0 0 0 0 0
6656 msec/iter = 11.75 ROE[avg,max] = [0.176554163, 0.218750000] radices = 208 16 32 32 0 0 0 0 0 0
7168 msec/iter = 11.84 ROE[avg,max] = [0.213558111, 0.312500000] radices = 224 16 32 32 0 0 0 0 0 0
7680 msec/iter = 13.14 ROE[avg,max] = [0.211455481, 0.281250000] radices = 240 16 32 32 0 0 0 0 0 0
8192 msec/iter = 13.50 ROE[avg,max] = [0.243920143, 0.312500000] radices = 256 16 32 32 0 0 0 0 0 0
9216 msec/iter = 15.44 ROE[avg,max] = [0.256431218, 0.343750000] radices = 288 16 32 32 0 0 0 0 0 0
10240 msec/iter = 16.75 ROE[avg,max] = [0.293991624, 0.375000000] radices = 160 32 32 32 0 0 0 0 0 0
11264 msec/iter = 18.68 ROE[avg,max] = [0.222417407, 0.281250000] radices = 352 16 32 32 0 0 0 0 0 0
12288 msec/iter = 21.39 ROE[avg,max] = [0.219849010, 0.281250000] radices = 192 32 32 32 0 0 0 0 0 0
13312 msec/iter = 23.73 ROE[avg,max] = [0.258116543, 0.312500000] radices = 208 32 32 32 0 0 0 0 0 0
14336 msec/iter = 23.98 ROE[avg,max] = [0.231325382, 0.281250000] radices = 224 32 32 32 0 0 0 0 0 0
15360 msec/iter = 26.76 ROE[avg,max] = [0.235138002, 0.281250000] radices = 240 32 32 32 0 0 0 0 0 0
16384 msec/iter = 26.98 ROE[avg,max] = [0.230396011, 0.312500000] radices = 256 32 32 32 0 0 0 0 0 0
18432 msec/iter = 31.07 ROE[avg,max] = [0.276530284, 0.375000000] radices = 288 32 32 32 0 0 0 0 0 0
20480 msec/iter = 35.91 ROE[avg,max] = [0.229381947, 0.312500000] radices = 320 32 32 32 0 0 0 0 0 0
22528 msec/iter = 37.85 ROE[avg,max] = [0.235262715, 0.296875000] radices = 352 32 32 32 0 0 0 0 0 0
24576 msec/iter = 42.70 ROE[avg,max] = [0.238062530, 0.375000000] radices = 768 16 32 32 0 0 0 0 0 0
26624 msec/iter = 60.50 ROE[avg,max] = [0.254043170, 0.312500000] radices = 208 16 16 16 16 0 0 0 0 0
[/CODE]./Mlucas -s m -cpu 0:3
It looks like the heavily loaded threads are assigned automatically to the faster cores.
[CODE]19.1
2048 msec/iter = 3.88 ROE[avg,max] = [0.215133698, 0.312500000] radices = 32 32 32 32 0 0 0 0 0 0
2304 msec/iter = 4.84 ROE[avg,max] = [0.194502305, 0.281250000] radices = 144 32 16 16 0 0 0 0 0 0
2560 msec/iter = 5.00 ROE[avg,max] = [0.184244498, 0.250000000] radices = 40 32 32 32 0 0 0 0 0 0
2816 msec/iter = 6.03 ROE[avg,max] = [0.193770639, 0.250000000] radices = 176 32 16 16 0 0 0 0 0 0
3072 msec/iter = 6.17 ROE[avg,max] = [0.209568299, 0.281250000] radices = 48 32 32 32 0 0 0 0 0 0
3328 msec/iter = 7.15 ROE[avg,max] = [0.221850838, 0.281250000] radices = 52 32 32 32 0 0 0 0 0 0
3584 msec/iter = 7.12 ROE[avg,max] = [0.199199621, 0.281250000] radices = 56 32 32 32 0 0 0 0 0 0
3840 msec/iter = 7.90 ROE[avg,max] = [0.187449630, 0.250000000] radices = 60 32 32 32 0 0 0 0 0 0
4096 msec/iter = 8.21 ROE[avg,max] = [0.174905238, 0.218750000] radices = 64 32 32 32 0 0 0 0 0 0
4608 msec/iter = 9.57 ROE[avg,max] = [0.205330823, 0.281250000] radices = 288 32 16 16 0 0 0 0 0 0
5120 msec/iter = 10.01 ROE[avg,max] = [0.193377434, 0.250000000] radices = 160 16 32 32 0 0 0 0 0 0
5632 msec/iter = 11.74 ROE[avg,max] = [0.221915271, 0.281250000] radices = 352 32 16 16 0 0 0 0 0 0
6144 msec/iter = 12.89 ROE[avg,max] = [0.183260259, 0.250000000] radices = 192 16 32 32 0 0 0 0 0 0
6656 msec/iter = 14.32 ROE[avg,max] = [0.176914974, 0.250000000] radices = 208 16 32 32 0 0 0 0 0 0
7168 msec/iter = 14.40 ROE[avg,max] = [0.213720200, 0.281250000] radices = 224 16 32 32 0 0 0 0 0 0
7680 msec/iter = 16.16 ROE[avg,max] = [0.211763551, 0.281250000] radices = 240 16 32 32 0 0 0 0 0 0[/CODE]
Performance looks awesome for a mobile CPU. For comparison, here are timings with AVX2 on an i3-8100 (4 cores); the M1 is much faster (e.g. at FFT length 2048K, 3.32 msec/iter on M1 cores 0:7 vs 4.75 on the i3).

AVX2 on i3-8100:
[CODE]19.1
2048 msec/iter = 4.75 ROE[avg,max] = [0.167383863, 0.218750000] radices = 128 16 16 32 0 0 0 0 0 0
2304 msec/iter = 5.44 ROE[avg,max] = [0.182823637, 0.218750000] radices = 144 16 16 32 0 0 0 0 0 0
2560 msec/iter = 6.29 ROE[avg,max] = [0.224905364, 0.281250000] radices = 160 16 16 32 0 0 0 0 0 0
2816 msec/iter = 6.63 ROE[avg,max] = [0.183906382, 0.230468750] radices = 176 16 16 32 0 0 0 0 0 0
3072 msec/iter = 7.42 ROE[avg,max] = [0.252202803, 0.312500000] radices = 192 16 16 32 0 0 0 0 0 0
3328 msec/iter = 7.52 ROE[avg,max] = [0.225825548, 0.281250000] radices = 208 16 16 32 0 0 0 0 0 0
3584 msec/iter = 8.12 ROE[avg,max] = [0.260567010, 0.375000000] radices = 224 16 16 32 0 0 0 0 0 0
3840 msec/iter = 9.15 ROE[avg,max] = [0.200714048, 0.281250000] radices = 240 16 16 32 0 0 0 0 0 0
4096 msec/iter = 10.92 ROE[avg,max] = [0.165220469, 0.218750000] radices = 64 32 32 32 0 0 0 0 0 0
4608 msec/iter = 11.15 ROE[avg,max] = [0.192892739, 0.250000000] radices = 288 16 16 32 0 0 0 0 0 0
5120 msec/iter = 12.18 ROE[avg,max] = [0.229244523, 0.312500000] radices = 160 32 32 16 0 0 0 0 0 0
5632 msec/iter = 13.47 ROE[avg,max] = [0.187610146, 0.250000000] radices = 352 16 16 32 0 0 0 0 0 0
6144 msec/iter = 16.09 ROE[avg,max] = [0.209471649, 0.281250000] radices = 192 32 32 16 0 0 0 0 0 0
6656 msec/iter = 16.86 ROE[avg,max] = [0.196862667, 0.250000000] radices = 208 16 32 32 0 0 0 0 0 0
7168 msec/iter = 17.38 ROE[avg,max] = [0.196444104, 0.250000000] radices = 224 32 32 16 0 0 0 0 0 0
7680 msec/iter = 23.23 ROE[avg,max] = [0.239954494, 0.343750000] radices = 240 32 32 16 0 0 0 0 0 0
8192 msec/iter = 19.79 ROE[avg,max] = [0.272732764, 0.375000000] radices = 256 32 32 16 0 0 0 0 0 0
9216 msec/iter = 23.01 ROE[avg,max] = [0.242732915, 0.281250000] radices = 288 32 32 16 0 0 0 0 0 0
10240 msec/iter = 27.24 ROE[avg,max] = [0.271287049, 0.375000000] radices = 320 32 32 16 0 0 0 0 0 0
11264 msec/iter = 28.87 ROE[avg,max] = [0.271818621, 0.375000000] radices = 352 32 32 16 0 0 0 0 0 0
12288 msec/iter = 32.04 ROE[avg,max] = [0.259570478, 0.312500000] radices = 768 16 16 32 0 0 0 0 0 0
13312 msec/iter = 37.85 ROE[avg,max] = [0.254703482, 0.312500000] radices = 208 32 32 32 0 0 0 0 0 0
14336 msec/iter = 40.34 ROE[avg,max] = [0.234003331, 0.296875000] radices = 224 32 32 32 0 0 0 0 0 0
15360 msec/iter = 43.84 ROE[avg,max] = [0.245504855, 0.312500000] radices = 960 16 16 32 0 0 0 0 0 0
16384 msec/iter = 45.62 ROE[avg,max] = [0.272600878, 0.375000000] radices = 256 32 32 32 0 0 0 0 0 0
18432 msec/iter = 53.16 ROE[avg,max] = [0.236424995, 0.281250000] radices = 288 32 32 32 0 0 0 0 0 0
20480 msec/iter = 62.92 ROE[avg,max] = [0.237479031, 0.312500000] radices = 320 32 32 32 0 0 0 0 0 0
22528 msec/iter = 66.03 ROE[avg,max] = [0.228240432, 0.312500000] radices = 352 32 32 32 0 0 0 0 0 0
24576 msec/iter = 69.49 ROE[avg,max] = [0.261424145, 0.343750000] radices = 768 16 32 32 0 0 0 0 0 0
[/CODE]
Looking forward to their desktop M1X. Good job. :tu:
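
To put numbers on comparisons like the one above, the timing lines can be parsed and divided. This is a minimal sketch that assumes each line has the layout shown in the tables (FFT length in K, then "msec/iter = <time>"); parse_timing_line and slowdown are hypothetical names, not functions from Mlucas.

```c
#include <stdio.h>

/* Hypothetical helper: extracts the FFT length (first column) and the
   per-iteration time in milliseconds from one timing line of the form
   "2048 msec/iter = 3.32 ROE[avg,max] = ...".
   Returns 1 on success, 0 if the line does not match that layout. */
int parse_timing_line(const char *line, int *fft_k, double *msec_per_iter)
{
    /* Whitespace in the format matches any run of blanks in the input. */
    return sscanf(line, " %d msec/iter = %lf", fft_k, msec_per_iter) == 2;
}

/* Ratio of two per-iteration times; a value above 1 means the first
   timing is slower than the second. */
double slowdown(double msec_a, double msec_b)
{
    return msec_a / msec_b;
}
```

For the 2048K rows above, slowdown(4.75, 3.32) comes out at about 1.43, i.e. the 4-core i3-8100 takes roughly 43% longer per iteration than the M1 run on cores 0:7.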


All times are UTC.
