ARM builds and SIMD-assembler prospects
Forumite and ARM Odroid user David Willmore and I hacked together a small amount of predefine-code in the Mlucas platform.h file to enable him to get a build of the code on that platform (he is using [url=http://www.hardkernel.com/main/products/prdt_info.php]this Odroid hardware implementation[/url]). As the late baseball great Yogi Berra famously quipped, "Predictions are hard, especially about the future", but over the weekend I had a gander at the ARM architecture, especially its SIMD support. Here are some e-mail ruminations that it spawned:
Me: [quote]Had a look at the [url=https://en.wikipedia.org/wiki/ARM_architecture]Wikipedia article on ARM[/url] - looks like the Neon has pretty nice SIMD support, though the page is a tad confusing re. the single-vs-double-precision-float aspect. First we see this - underlines mine: [i] [b]Advanced SIMD (NEON)[/b] The Advanced SIMD extension (aka NEON or "MPE" Media Processing Engine) is a combined 64- and 128-bit SIMD instruction set that provides standardized acceleration for media and signal processing applications. NEON is included in all Cortex-A8 devices but is optional in Cortex-A9 devices.[83] NEON can execute MP3 audio decoding on CPUs running at 10 MHz and can run the GSM adaptive multi-rate (AMR) speech codec at no more than 13 MHz. It features a comprehensive instruction set, separate register files and independent execution hardware.[84] [u]NEON supports 8-, 16-, 32- and 64-bit integer and single-precision (32-bit) floating-point data and SIMD operations[/u] for handling audio and video processing as well as graphics and gaming processing. In NEON, the SIMD supports up to 16 operations at the same time. The NEON hardware shares the same floating-point registers as used in VFP. Devices such as the ARM Cortex-A8 and Cortex-A9 support 128-bit vectors but will execute with 64 bits at a time,[80] whereas newer Cortex-A15 devices can execute 128 bits at a time. ProjectNe10 is ARM's first open source project (from its inception). The Ne10 library is a set of common, useful functions written in both NEON and C (for compatibility). The library was created to allow developers to use NEON optimisations without learning NEON but it also serves as a set of highly optimised NEON intrinsic and assembly code examples for common DSP, arithmetic and image processing routines. The code is available on GitHub. [/i] Later on we have this: [i] [b]AArch64 features[/b] New instruction set, A64 Has 31 general-purpose 64-bit registers. Has dedicated SP or zero register. 
The program counter (PC) is no longer directly accessible as a register. Instructions are still 32 bits long and mostly the same as A32 (with LDM/STM instructions and most conditional execution dropped). Has paired loads/stores (in place of LDM/STM). No predication for most instructions (except branches). Most instructions can take 32-bit or 64-bit arguments. Addresses assumed to be 64-bit. Advanced SIMD (NEON) enhanced: has 32× 128-bit registers (up from 16), also accessible via VFPv4. [u]Supports double-precision floating point[/u]. [/i]

Intel only added 32 SIMD registers with their AVX-512, i.e. that won't come to the PC space until late this year. I'm doing a lot of 32-register-using code streamlining in my KNL dev work ... using the eventual inline-asm macros which will result from that as the basis for an ARM asm-translation seems quite doable, especially since access to the Intel KNL has allowed me to start the AVX-512 code upgrades a full year in advance of that arch hitting consumer PCs. That means that once I have a decent first-cut at AVX-512 - looking like summer based on work so far - I would have time for other things.

If we had good DP support, we could probably expect a 3-5x gain over a generic-C build from a SIMD-using enhancement. (ARM SIMD is 128-bit-wide, i.e. pairs-of-doubles, so one expects 2x at the very least, plus more on top of that from hand-tuned register and FMA usage.) If that opened up a realistic prospect of gaining some reasonable fraction of current GIMPS throughput - say 10% - from thousands of ARM users, it would be worth doing. David, any sense of what kinds of ARM-using devices would be available for the kind of 24/7 crunching needed here?

Just got [Mlucas self-test-produced] cfg-file timings from David's 1-core self-tests ... those indicate ~1/20th the per-cycle throughput of a single Intel Haswell core running 256-bit vector-inline-assembly ... that's not bad at all for a generic-C build.[/quote]

David: [quote]Yes, the Cortex-A53 does one 128 bit NEON instruction/clock--which could be two doubles. The higher end models can dual issue NEON, but I don't know the details. I know a few people who might. There are hundreds of millions of these devices, but they tend to be in power and memory limited applications. The one use of them that would have no power limit and little memory limit would be in set top boxes as they are mains powered. Other than that, they tend to be in phones and other battery powered devices--or in things so small that thermal issues would become significant. Even those devices that could run this code effectively tend to be 'black boxes' that come as set from the manufacturer and leave the user very little ability to add programs. I think any effort you put into this should bear in mind that very little direct benefit will accrue to GIMPS. I would suggest that second order benefits would be higher--people learn about GIMPS because of the usefulness of this code to them in a different context. Because of that knowledge, they choose to support GIMPS with other hardware of theirs. Because of that, I would suggest that any work you do to optimize for AARCH64 be as generic as you can make it--use the Ne10 project code if/when possible. I don't want to sidetrack you from your valuable work on GIMPS with my silly little side project.[/quote] |
[QUOTE=ewmayer;451934]Forumite and ARM Odroid user David Willmore and I hacked together a small amount of predefine-code in the Mlucas platform.h file to enable him to get a build of the code on that platform.
David:[/QUOTE] Following the thread... :popcorn: |
Yesterday I ordered this device [url]https://www.pine64.org/?product=pine-a64-board-1gb[/url] for only $19 plus shipping!
It is based on a 1.2 GHz quad-core 64-bit ARM Cortex-A53 processor. It will be very interesting to test Mlucas on it once I have the device in my hands. Unfortunately, shipping to my country might take up to two months. |
[QUOTE=Lorenzo;452155]Yesterday I ordered this device [url]https://www.pine64.org/?product=pine-a64-board-1gb[/url] for only $19 plus shipping!
It is based on a 1.2 GHz quad-core 64-bit ARM Cortex-A53 processor. It will be very interesting to test Mlucas on it once I have the device in my hands. Unfortunately, shipping to my country might take up to two months.[/QUOTE] [commercial mode on] Hi Lorenzo. While you're at it, you could easily get an Odroid-C2 clocked at 1.5GHz, 64-bit ready with Ubuntu, with its own heatsink and a lot of room for overclocking, 2GB of memory and nearly no memory bandwidth limitations... for $40 [commercial mode off] If I get it right, you are still in the process of building a microfarm... |
[QUOTE=ET_;452164][commercial mode on]
Hi Lorenzo. While you're at it, you could easily get an Odroid-C2 clocked at 1.5GHz, 64-bit ready with Ubuntu, with its own heatsink and a lot of room for overclocking, 2GB of memory and nearly no memory bandwidth limitations... for $40 [commercial mode off] If I get it right, you are still in the process of building a microfarm...[/QUOTE] I ordered it because my country limits duty-free imports. I can receive goods worth up to 22 USD per month free of any customs duties. Above that, the process is very complicated, and I must pay at least the customs duty (30%) plus some extra taxes. So that is why I bought the PINE device, and only with 1 GB of memory :smile: |
This thread has gotten me much more intrigued with these devices.
ET: I see why you recommend Odroid. That's an amazing package. I would need a PSU, the storage module, etc. As I remember, Android has a larger memory footprint, so Ubuntu would seem to be the obvious choice. Thinking more about it, anyway. EDIT: Lorenzo: That is really restrictive! But who knows? The US might be going that way, too. Tariffs on Mexican goods? Picking fights with AUSTRALIA?!? I note that He Who Shall Not Be Named does not use the word "tariff", but something like "border tax". Trade Wars, Anyone? |
Update.
I bought one of these PicoCubes with five Odroid-C2 (20 nodes) and am ready to test/benchmark Mlucas on it as soon as I receive the package and get it ready to work. Please refer to this thread whenever you have news or hints. Luigi |
[QUOTE=ET_;454592]Update.
I bought one of these PicoCubes with five Odroid-C2 (20 nodes) and am ready to test/benchmark Mlucas on it as soon as I receive the package and get it ready to work. Please refer to this thread whenever you have news or hints. Luigi[/QUOTE] You're at least a couple months ahead of me - that's how long it'll take me to finish a first-cut AVX-512 upgrade to all the Mlucas code, at which point I plan to get a low-cost Neon dev-board to play with. What software (I really just care about gcc/gdb and the associated libraries) did you install, or what came preinstalled on your system? And is that a Cortex-A15, i.e. a true 128-bit NEON? |
[QUOTE=ewmayer;454594]You're at least a couple months ahead of me - that's how long it'll take me to finish a first-cut AVX-512 upgrade to all the Mlucas code, at which point I plan to get a low-cost Neon dev-board to play with. What software (I really just care about gcc/gdb and the associated libraries) did you install, or what came preinstalled on your system? And is that a Cortex-A15, i.e. a true 128-bit NEON?[/QUOTE]
Don't feel pressed, I will need some time as well for the delivery and to get acquainted with the management software (and my own projects). You are doing wonderful work with AVX-512, and nobody here would like a slowdown on that front :smile: Unfortunately (?) the processor is a Cortex-A58 (ARMv8), a 64-bit processor like the one used on the Raspberry Pi, but fully supported by a 64-bit OS with 2GB of memory and clocked at 1.5GHz. It is the same one you said David Willmore is using ([url]http://www.hardkernel.com/main/products/prdt_info.php[/url] ). gcc is the version that runs on Ubuntu Mate (I will have more info as soon as I get the package delivered). The specifications say: [quote] You can use this cluster to run almost any kind of distributed or parallel software. Run your own LAMP cluster, Docker, Kubernetes, Hadoop, ElasticSearch, Cassandra and many others. Also learn languages like Javascript, Java, Python, R, and so on. Use for Development, QA, DevOps, or Education. The PicoCluster Application Image Set is a basic cluster setup designed to get the PicoCluster user up and running quickly. It is the Image Set that will be pre-configured with any PicoCluster Cube or Kit that is ordered with memory cards. All other Application or Cluster Images Sets are based upon this one. You can either use PicoCluster as a desktop cluster by plugging in a mouse, keyboard, and monitor, or use it as a network cluster by connecting via SSH. [/quote] Just let me know if I can be of any help. Luigi |
Hi, Luigi:
Cortex-A58 ... so 128-bit vector instructions OK, but they actually get executed 64-bits at a time? Thanks for the kind offer of help - a remote-access account would be great, but no biggie since the fewer-core dev-boards are cheap. If you could LMK which precise dev-board I should get to get true 128-bit exec capability, that would be helpful. Post a pic of your rig once it's set up! |
I think you both mean ARM Cortex A53 (or A57). A58 doesn't exist (yet) ;).
List of ARM Cortex A: [url]http://www.arm.com/products/processors/cortex-a[/url] |
[QUOTE=VictordeHolland;454658]I think you both mean ARM Cortex A53 (or A57). A58 doesn't exist (yet) ;).
List of ARM Cortex A: [url]http://www.arm.com/products/processors/cortex-a[/url][/QUOTE] A53, that's correct :redface: |
[QUOTE=ewmayer;454652]Hi, Luigi:
Cortex-A58 ... so 128-bit vector instructions OK, but they actually get executed 64-bits at a time? Thanks for the kind offer of help - a remote-access account would be great, but no biggie since the fewer-core dev-boards are cheap. If you could LMK which precise dev-board I should get to get true 128-bit exec capability, that would be helpful. Post a pic of your rig once it's set up![/QUOTE] Yesterday I downloaded the hardware guide and programming manuals for both the A53 and the NEON/NE[sub]10[/sub] system; I will let you know how the A53 (or other processors) performs on 128 bits. IIRC, LaurV is the guru on such processors. As for remote access, I don't have a public IP right now, but I can request one if needed. |
Hi!
Finally I received my PINE64 device. Looks great!!!
[CODE]ubuntu@pine64:~/Solaris/mlucas-14.1$ lscpu
Architecture:          aarch64
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             4
CPU max MHz:           1344.0000
CPU min MHz:           480.0000
ubuntu@pine64:~/Solaris/mlucas-14.1$ cat /proc/cpuinfo
Processor       : AArch64 Processor rev 4 (aarch64)
processor       : 0
processor       : 1
processor       : 2
processor       : 3
Features        : fp asimd aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: AArch64
CPU variant     : 0x0
CPU part        : 0xd03
CPU revision    : 4
Hardware        : sun50iw1p1
[/CODE]But I can't compile mlucas.
[CODE]ubuntu@pine64:~/Solaris/mlucas-14.1$ sudo make
make all-am
make[1]: Entering directory '/home/ubuntu/Solaris/mlucas-14.1'
CC $NORMAL_O $THREADS_O
Makefile:2984: recipe for target 'NORMAL_O-THREADS_O.stamp' failed
make[1]: *** [NORMAL_O-THREADS_O.stamp] Error 1
make[1]: Leaving directory '/home/ubuntu/Solaris/mlucas-14.1'
Makefile:2084: recipe for target 'all' failed
make: *** [all] Error 2
[/CODE]Is there something special needed for ARM? Or can't mlucas be compiled on ARM in general? |
[QUOTE=Lorenzo;454690]Hi!
Finally i received my PINE64 device. Looks great!!! [...] But I can't compile mlucas. [...] Is here something special for ARM? Or in generally i can't compile mlucas on arm?[/QUOTE] Did you try [code]./configure[/code] before running make? |
Yes. Sure.
[CODE]ubuntu@pine64:~/Solaris/mlucas-14.1$
ubuntu@pine64:~/Solaris/mlucas-14.1$ sudo ./configure
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether make supports nested variables... (cached) yes
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking whether gcc understands -c and -o together... yes
checking for style of include used by make... GNU
checking dependency style of gcc... none
checking for library containing ceil, log, pow, sqrt, sincos, floor, lrint, atan... -lm
checking how to run the C preprocessor... gcc -E
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking fenv.h usability... yes
checking fenv.h presence... yes
checking for fenv.h... yes
checking limits.h usability... yes
checking limits.h presence... yes
checking for limits.h... yes
checking mach/mach.h usability... no
checking mach/mach.h presence... no
checking for mach/mach.h... no
checking stddef.h usability... yes
checking stddef.h presence... yes
checking for stddef.h... yes
checking for stdlib.h... (cached) yes
checking for string.h... (cached) yes
checking sys/time.h usability... yes
checking sys/time.h presence... yes
checking for sys/time.h... yes
checking for unistd.h... (cached) yes
checking for stdbool.h that conforms to C99... yes
checking for _Bool... yes
checking for inline... inline
checking for pid_t... yes
checking for size_t... yes
checking for uint64_t... yes
checking for stdlib.h... (cached) yes
checking for GNU libc compatible malloc... yes
checking for stdlib.h... (cached) yes
checking for GNU libc compatible realloc... yes
checking for clock_gettime... yes
checking for gethrtime... no
checking for gettimeofday... yes
checking for memset... yes
checking for pow... yes
checking for sqrt... yes
checking for strerror... yes
checking for strstr... yes
checking for strtoul... yes
checking whether _LARGEFILE_SOURCE is declared... no
checking build system type... aarch64-unknown-linux-gnu
checking host system type... aarch64-unknown-linux-gnu
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating Makefile
config.status: creating config.h
config.status: executing depfiles commands
ubuntu@pine64:~/Solaris/mlucas-14.1$ sudo make
make all-am
make[1]: Entering directory '/home/ubuntu/Solaris/mlucas-14.1'
CC $NORMAL_O $THREADS_O
Makefile:2984: recipe for target 'NORMAL_O-THREADS_O.stamp' failed
make[1]: *** [NORMAL_O-THREADS_O.stamp] Error 1
make[1]: Leaving directory '/home/ubuntu/Solaris/mlucas-14.1'
Makefile:2084: recipe for target 'all' failed
make: *** [all] Error 2
[/CODE] |
Did you download the .tar from the mlucas main page or the debian package from Vang?
|
[QUOTE=VictordeHolland;454693]Did you download the .tar from the mlucas main page or the debian package from Vang?[/QUOTE]
I downloaded it from here: [url]http://www.mersenneforum.org/mayer/README.html[/url] |
[QUOTE=Lorenzo;454694]I downloaded it from here: [url]http://www.mersenneforum.org/mayer/README.html[/url][/QUOTE]
Since you're using the auto-make version (which first appeared in v14), that's the correct one. I've contacted my ARM-build guru - same fellow who put together the auto-make stuff - about your make failure, will let you know as soon as we get a clue. I do know we needed to fiddle the platform.h file for some ARM systems, but that appears unrelated to your issue. If you do end up needing a not-yet-released version of said .h file, I'll post it here. |
David Willmore - another ARM user who I cc'ed on my mail to Alex Vong - says:
[i] Have him start over and this time don't build as root. Then we might get a better idea.[/i]
I suggest you try that and post the output here. Assuming that also fails, you can try a manual build: cd to the src dir of the install where all the .h and .c files reside, 'mkdir MY_OBJ' and cd into that dir, then first try just a single-file compile:
gcc -c -Os -m64 -DUSE_THREADS ../Mlucas.c
If that succeeds, try all the sourcefiles, here with output piped to a logfile:
gcc -c -Os -m64 -DUSE_THREADS ../*.c >& build.log
You can 'grep -i error build.log' to check for compile errors - if there are any, post the logfile here. If no compile errors, try linking:
gcc -o Mlucas *.o -lm -lpthread -lrt |
I've tried the first part of the manual instructions, but without success.
[CODE]ubuntu@pine64:~/Solaris/mlucas-14.1/src/MY_OBJ$ gcc -c -Os -m64 -DUSE_THREADS ../Mlucas.c
gcc: error: unrecognized command line option ‘-m64’
ubuntu@pine64:~/Solaris/mlucas-14.1/src/MY_OBJ$ gcc -c -Os -DUSE_THREADS ../Mlucas.c
In file included from ../types.h:30:0,
                 from ../align.h:29,
                 from ../Mlucas.h:29,
                 from ../Mlucas.c:26:
../platform.h:1072:4: error: #error Multithreading currently only supported for Linux/GCC builds!
   #error Multithreading currently only supported for Linux/GCC builds!
    ^
../Mlucas.c: In function ‘ernstMain’:
../Mlucas.c:1170:88: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
   "WARN: Mlucas.c: a[] = 0x%08X not aligned on 128-byte boundary!\n", (uint32)a);
   ^
../Mlucas.c:459:3: warning: ignoring return value of ‘fgets’, declared with attribute warn_unused_result [-Wunused-result]
   fgets(in_line, STR_MAX_LEN, fp);
   ^
[/CODE]
So it looks like the -m64 option isn't allowed here. I made a list of the allowed options (please have a look at the attached file). |
[QUOTE=Lorenzo;454734]I've tried first part of manual instruction but without success.
[...] So looks like -m64 option don't allow here. I did a list of allowed options (please have a look at attached file).[/QUOTE] try [code] uname -a[/code] Raspberry PI 3 has a 64-bit enabled CPU, but works with 32-bit raspbian and ARMv7 subsystem. I'm mostly sure Pine64 comes configured with a 64-bit OS, but that would clear every doubt about it :smile: Also check the march= and mtune= values. And mcpu=native |
[QUOTE=Lorenzo;454692]Yes. Sure.
[...][/QUOTE] Hello Lorenzo, Could you try to run [CODE]$ ./configure --enable-verbose-compiler && make[/CODE] as well? (I really should have made that option default!) Also, as David has pointed out, in general, you shouldn't be building software with root privilege. You only need root privilege when you install software. |
It looks like this device is running in 64-bit mode. So that's OK.
[CODE]ubuntu@pine64:~/Solaris/mlucas-14.1/src/MY_OBJ$ uname -a Linux pine64 3.10.104-2-pine64-longsleep #113 SMP PREEMPT Thu Dec 15 21:46:07 CET 2016 aarch64 aarch64 aarch64 GNU/Linux[/CODE] But I don't understand: [QUOTE]Also check the march= and mtune= values. And mcpu=native [/QUOTE] Where do I find these? |
I compiled as Alex suggested. It looks like the problem is in multithreading. I received a lot of errors like this:
[CODE]./src/twopmodq80.c:2957:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q3,lo3,prod192); RSHIFT192(prod192,78,prod192); lo3.d0 = pr ^ In file included from ./src/types.h:30:0, from ./src/util.h:30, from ./src/factor.h:29, from ./src/twopmodq96.c:23: ./src/platform.h:1072:4: error: #error Multithreading currently only supported for Linux/GCC builds! #error Multithreading currently only supported for Linux/GCC builds! ^ In file included from ./src/types.h:30:0, from ./src/util.h:30, from ./src/factor.h:29, from ./src/twopmodq.c:23: ./src/platform.h:1072:4: error: #error Multithreading currently only supported for Linux/GCC builds! #error Multithreading currently only supported for Linux/GCC builds! ^ [/CODE] Log from terminal: [CODE] ^ ./src/twopmodq80.c:342:2: note: in expansion of macro ‘RSHIFT192’ RSHIFT192(prod192,78,prod192); ^ ./src/imul_macro1.h:539:54: warning: left shift count is negative [-Wshift-count-negative] __y.d1 = ((uint64)__x.d1 >> __n) + ((uint64)__x.d2 << (64-__n));\ ^ ./src/twopmodq80.c:342:2: note: in expansion of macro ‘RSHIFT192’ RSHIFT192(prod192,78,prod192); ^ ./src/imul_macro1.h:540:28: warning: right shift count >= width of type [-Wshift-count-overflow] __y.d2 = ((uint64)__x.d2 >> __n);\ ^ ./src/twopmodq80.c:342:2: note: in expansion of macro ‘RSHIFT192’ RSHIFT192(prod192,78,prod192); ^ ./src/imul_macro1.h:556:28: warning: right shift count is negative [-Wshift-count-negative] __y.d0 = ((uint64)__x.d2 >> (__n-128));\ ^ ./src/twopmodq80.c:342:2: note: in expansion of macro ‘RSHIFT192’ RSHIFT192(prod192,78,prod192); ^ ./src/twopmodq80.c: In function ‘twopmodq78_3WORD_DOUBLE_q2’: ./src/imul_macro1.h:538:28: warning: right shift count >= width of type [-Wshift-count-overflow] __y.d0 = ((uint64)__x.d0 >> __n) + ((uint64)__x.d1 << (64-__n));\ ^ ./src/twopmodq80.c:811:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q0,lo0,prod192); RSHIFT192(prod192,78,prod192); lo0.d0 = pr ^ ./src/imul_macro1.h:538:54: 
warning: left shift count is negative [-Wshift-count-negative] __y.d0 = ((uint64)__x.d0 >> __n) + ((uint64)__x.d1 << (64-__n));\ ^ ./src/twopmodq80.c:811:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q0,lo0,prod192); RSHIFT192(prod192,78,prod192); lo0.d0 = pr ^ ./src/imul_macro1.h:539:28: warning: right shift count >= width of type [-Wshift-count-overflow] __y.d1 = ((uint64)__x.d1 >> __n) + ((uint64)__x.d2 << (64-__n));\ ^ ./src/twopmodq80.c:811:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q0,lo0,prod192); RSHIFT192(prod192,78,prod192); lo0.d0 = pr ^ ./src/imul_macro1.h:539:54: warning: left shift count is negative [-Wshift-count-negative] __y.d1 = ((uint64)__x.d1 >> __n) + ((uint64)__x.d2 << (64-__n));\ ^ ./src/twopmodq80.c:811:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q0,lo0,prod192); RSHIFT192(prod192,78,prod192); lo0.d0 = pr ^ ./src/imul_macro1.h:540:28: warning: right shift count >= width of type [-Wshift-count-overflow] __y.d2 = ((uint64)__x.d2 >> __n);\ ^ ./src/twopmodq80.c:811:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q0,lo0,prod192); RSHIFT192(prod192,78,prod192); lo0.d0 = pr ^ ./src/imul_macro1.h:556:28: warning: right shift count is negative [-Wshift-count-negative] __y.d0 = ((uint64)__x.d2 >> (__n-128));\ ^ ./src/twopmodq80.c:811:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q0,lo0,prod192); RSHIFT192(prod192,78,prod192); lo0.d0 = pr ^ ./src/imul_macro1.h:538:28: warning: right shift count >= width of type [-Wshift-count-overflow] __y.d0 = ((uint64)__x.d0 >> __n) + ((uint64)__x.d1 << (64-__n));\ ^ ./src/twopmodq80.c:812:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q1,lo1,prod192); RSHIFT192(prod192,78,prod192); lo1.d0 = pr ^ ./src/imul_macro1.h:538:54: warning: left shift count is negative [-Wshift-count-negative] __y.d0 = ((uint64)__x.d0 >> __n) + ((uint64)__x.d1 << (64-__n));\ ^ ./src/twopmodq80.c:812:38: note: in expansion of macro 
‘RSHIFT192’ MUL_LOHI96_PROD192(q1,lo1,prod192); RSHIFT192(prod192,78,prod192); lo1.d0 = pr ^ ./src/imul_macro1.h:539:28: warning: right shift count >= width of type [-Wshift-count-overflow] __y.d1 = ((uint64)__x.d1 >> __n) + ((uint64)__x.d2 << (64-__n));\ ^ ./src/twopmodq80.c:812:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q1,lo1,prod192); RSHIFT192(prod192,78,prod192); lo1.d0 = pr ^ ./src/imul_macro1.h:539:54: warning: left shift count is negative [-Wshift-count-negative] __y.d1 = ((uint64)__x.d1 >> __n) + ((uint64)__x.d2 << (64-__n));\ ^ ./src/twopmodq80.c:812:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q1,lo1,prod192); RSHIFT192(prod192,78,prod192); lo1.d0 = pr ^ ./src/imul_macro1.h:540:28: warning: right shift count >= width of type [-Wshift-count-overflow] __y.d2 = ((uint64)__x.d2 >> __n);\ ^ ./src/twopmodq80.c:812:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q1,lo1,prod192); RSHIFT192(prod192,78,prod192); lo1.d0 = pr ^ ./src/imul_macro1.h:556:28: warning: right shift count is negative [-Wshift-count-negative] __y.d0 = ((uint64)__x.d2 >> (__n-128));\ ^ ./src/twopmodq80.c:812:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q1,lo1,prod192); RSHIFT192(prod192,78,prod192); lo1.d0 = pr ^ ./src/twopmodq80.c: In function ‘twopmodq78_3WORD_DOUBLE_q4’: ./src/imul_macro1.h:538:28: warning: right shift count >= width of type [-Wshift-count-overflow] __y.d0 = ((uint64)__x.d0 >> __n) + ((uint64)__x.d1 << (64-__n));\ ^ ./src/twopmodq80.c:2210:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q0,lo0,prod192); RSHIFT192(prod192,78,prod192); lo0.d0 = pr ^ ./src/imul_macro1.h:538:54: warning: left shift count is negative [-Wshift-count-negative] __y.d0 = ((uint64)__x.d0 >> __n) + ((uint64)__x.d1 << (64-__n));\ ^ ./src/twopmodq80.c:2210:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q0,lo0,prod192); RSHIFT192(prod192,78,prod192); lo0.d0 = pr ^ ./src/imul_macro1.h:539:28: warning: 
right shift count >= width of type [-Wshift-count-overflow] __y.d1 = ((uint64)__x.d1 >> __n) + ((uint64)__x.d2 << (64-__n));\ ^ ./src/twopmodq80.c:2210:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q0,lo0,prod192); RSHIFT192(prod192,78,prod192); lo0.d0 = pr ^ ./src/imul_macro1.h:539:54: warning: left shift count is negative [-Wshift-count-negative] __y.d1 = ((uint64)__x.d1 >> __n) + ((uint64)__x.d2 << (64-__n));\ ^ ./src/twopmodq80.c:2210:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q0,lo0,prod192); RSHIFT192(prod192,78,prod192); lo0.d0 = pr ^ ./src/imul_macro1.h:540:28: warning: right shift count >= width of type [-Wshift-count-overflow] __y.d2 = ((uint64)__x.d2 >> __n);\ ^ ./src/twopmodq80.c:2210:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q0,lo0,prod192); RSHIFT192(prod192,78,prod192); lo0.d0 = pr ^ ./src/imul_macro1.h:556:28: warning: right shift count is negative [-Wshift-count-negative] __y.d0 = ((uint64)__x.d2 >> (__n-128));\ ^ ./src/twopmodq80.c:2210:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q0,lo0,prod192); RSHIFT192(prod192,78,prod192); lo0.d0 = pr ^ ./src/imul_macro1.h:538:28: warning: right shift count >= width of type [-Wshift-count-overflow] __y.d0 = ((uint64)__x.d0 >> __n) + ((uint64)__x.d1 << (64-__n));\ ^ ./src/twopmodq80.c:2211:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q1,lo1,prod192); RSHIFT192(prod192,78,prod192); lo1.d0 = pr ^ ./src/imul_macro1.h:538:54: warning: left shift count is negative [-Wshift-count-negative] __y.d0 = ((uint64)__x.d0 >> __n) + ((uint64)__x.d1 << (64-__n));\ ^ ./src/twopmodq80.c:2211:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q1,lo1,prod192); RSHIFT192(prod192,78,prod192); lo1.d0 = pr ^ ./src/imul_macro1.h:539:28: warning: right shift count >= width of type [-Wshift-count-overflow] __y.d1 = ((uint64)__x.d1 >> __n) + ((uint64)__x.d2 << (64-__n));\ ^ ./src/twopmodq80.c:2211:38: note: in expansion of macro 
‘RSHIFT192’ MUL_LOHI96_PROD192(q1,lo1,prod192); RSHIFT192(prod192,78,prod192); lo1.d0 = pr ^ ./src/imul_macro1.h:539:54: warning: left shift count is negative [-Wshift-count-negative] __y.d1 = ((uint64)__x.d1 >> __n) + ((uint64)__x.d2 << (64-__n));\ ^ ./src/twopmodq80.c:2211:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q1,lo1,prod192); RSHIFT192(prod192,78,prod192); lo1.d0 = pr ^ ./src/imul_macro1.h:540:28: warning: right shift count >= width of type [-Wshift-count-overflow] __y.d2 = ((uint64)__x.d2 >> __n);\ ^ ./src/twopmodq80.c:2211:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q1,lo1,prod192); RSHIFT192(prod192,78,prod192); lo1.d0 = pr ^ ./src/imul_macro1.h:556:28: warning: right shift count is negative [-Wshift-count-negative] __y.d0 = ((uint64)__x.d2 >> (__n-128));\ ^ ./src/twopmodq80.c:2211:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q1,lo1,prod192); RSHIFT192(prod192,78,prod192); lo1.d0 = pr ^ ./src/imul_macro1.h:538:28: warning: right shift count >= width of type [-Wshift-count-overflow] __y.d0 = ((uint64)__x.d0 >> __n) + ((uint64)__x.d1 << (64-__n));\ ^ ./src/twopmodq80.c:2212:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q2,lo2,prod192); RSHIFT192(prod192,78,prod192); lo2.d0 = pr ^ ./src/imul_macro1.h:538:54: warning: left shift count is negative [-Wshift-count-negative] __y.d0 = ((uint64)__x.d0 >> __n) + ((uint64)__x.d1 << (64-__n));\ ^ ./src/twopmodq80.c:2212:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q2,lo2,prod192); RSHIFT192(prod192,78,prod192); lo2.d0 = pr ^ ./src/imul_macro1.h:539:28: warning: right shift count >= width of type [-Wshift-count-overflow] __y.d1 = ((uint64)__x.d1 >> __n) + ((uint64)__x.d2 << (64-__n));\ ^ ./src/twopmodq80.c:2212:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q2,lo2,prod192); RSHIFT192(prod192,78,prod192); lo2.d0 = pr ^ ./src/imul_macro1.h:539:54: warning: left shift count is negative [-Wshift-count-negative] 
__y.d1 = ((uint64)__x.d1 >> __n) + ((uint64)__x.d2 << (64-__n));\ ^ ./src/twopmodq80.c:2212:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q2,lo2,prod192); RSHIFT192(prod192,78,prod192); lo2.d0 = pr ^ ./src/imul_macro1.h:540:28: warning: right shift count >= width of type [-Wshift-count-overflow] __y.d2 = ((uint64)__x.d2 >> __n);\ ^ ./src/twopmodq80.c:2212:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q2,lo2,prod192); RSHIFT192(prod192,78,prod192); lo2.d0 = pr ^ ./src/imul_macro1.h:556:28: warning: right shift count is negative [-Wshift-count-negative] __y.d0 = ((uint64)__x.d2 >> (__n-128));\ ^ ./src/twopmodq80.c:2212:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q2,lo2,prod192); RSHIFT192(prod192,78,prod192); lo2.d0 = pr ^ ./src/imul_macro1.h:538:28: warning: right shift count >= width of type [-Wshift-count-overflow] __y.d0 = ((uint64)__x.d0 >> __n) + ((uint64)__x.d1 << (64-__n));\ ^ ./src/twopmodq80.c:2213:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q3,lo3,prod192); RSHIFT192(prod192,78,prod192); lo3.d0 = pr ^ ./src/imul_macro1.h:538:54: warning: left shift count is negative [-Wshift-count-negative] __y.d0 = ((uint64)__x.d0 >> __n) + ((uint64)__x.d1 << (64-__n));\ ^ ./src/twopmodq80.c:2213:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q3,lo3,prod192); RSHIFT192(prod192,78,prod192); lo3.d0 = pr ^ ./src/imul_macro1.h:539:28: warning: right shift count >= width of type [-Wshift-count-overflow] __y.d1 = ((uint64)__x.d1 >> __n) + ((uint64)__x.d2 << (64-__n));\ ^ ./src/twopmodq80.c:2213:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q3,lo3,prod192); RSHIFT192(prod192,78,prod192); lo3.d0 = pr ^ ./src/imul_macro1.h:539:54: warning: left shift count is negative [-Wshift-count-negative] __y.d1 = ((uint64)__x.d1 >> __n) + ((uint64)__x.d2 << (64-__n));\ ^ ./src/twopmodq80.c:2213:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q3,lo3,prod192); 
RSHIFT192(prod192,78,prod192); lo3.d0 = pr ^ ./src/imul_macro1.h:540:28: warning: right shift count >= width of type [-Wshift-count-overflow] __y.d2 = ((uint64)__x.d2 >> __n);\ ^ ./src/twopmodq80.c:2213:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q3,lo3,prod192); RSHIFT192(prod192,78,prod192); lo3.d0 = pr ^ ./src/imul_macro1.h:556:28: warning: right shift count is negative [-Wshift-count-negative] __y.d0 = ((uint64)__x.d2 >> (__n-128));\ ^ ./src/twopmodq80.c:2213:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q3,lo3,prod192); RSHIFT192(prod192,78,prod192); lo3.d0 = pr ^ ./src/twopmodq80.c: In function ‘twopmodq78_3WORD_DOUBLE_q4_REF’: ./src/imul_macro1.h:538:28: warning: right shift count >= width of type [-Wshift-count-overflow] __y.d0 = ((uint64)__x.d0 >> __n) + ((uint64)__x.d1 << (64-__n));\ ^ ./src/twopmodq80.c:2954:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q0,lo0,prod192); RSHIFT192(prod192,78,prod192); lo0.d0 = pr ^ ./src/imul_macro1.h:538:54: warning: left shift count is negative [-Wshift-count-negative] __y.d0 = ((uint64)__x.d0 >> __n) + ((uint64)__x.d1 << (64-__n));\ ^ ./src/twopmodq80.c:2954:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q0,lo0,prod192); RSHIFT192(prod192,78,prod192); lo0.d0 = pr ^ ./src/imul_macro1.h:539:28: warning: right shift count >= width of type [-Wshift-count-overflow] __y.d1 = ((uint64)__x.d1 >> __n) + ((uint64)__x.d2 << (64-__n));\ ^ ./src/twopmodq80.c:2954:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q0,lo0,prod192); RSHIFT192(prod192,78,prod192); lo0.d0 = pr ^ ./src/imul_macro1.h:539:54: warning: left shift count is negative [-Wshift-count-negative] __y.d1 = ((uint64)__x.d1 >> __n) + ((uint64)__x.d2 << (64-__n));\ ^ ./src/twopmodq80.c:2954:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q0,lo0,prod192); RSHIFT192(prod192,78,prod192); lo0.d0 = pr ^ ./src/imul_macro1.h:540:28: warning: right shift count >= width of type 
[-Wshift-count-overflow] __y.d2 = ((uint64)__x.d2 >> __n);\ ^ ./src/twopmodq80.c:2954:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q0,lo0,prod192); RSHIFT192(prod192,78,prod192); lo0.d0 = pr ^ ./src/imul_macro1.h:556:28: warning: right shift count is negative [-Wshift-count-negative] __y.d0 = ((uint64)__x.d2 >> (__n-128));\ ^ ./src/twopmodq80.c:2954:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q0,lo0,prod192); RSHIFT192(prod192,78,prod192); lo0.d0 = pr ^ ./src/imul_macro1.h:538:28: warning: right shift count >= width of type [-Wshift-count-overflow] __y.d0 = ((uint64)__x.d0 >> __n) + ((uint64)__x.d1 << (64-__n));\ ^ ./src/twopmodq80.c:2955:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q1,lo1,prod192); RSHIFT192(prod192,78,prod192); lo1.d0 = pr ^ ./src/imul_macro1.h:538:54: warning: left shift count is negative [-Wshift-count-negative] __y.d0 = ((uint64)__x.d0 >> __n) + ((uint64)__x.d1 << (64-__n));\ ^ ./src/twopmodq80.c:2955:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q1,lo1,prod192); RSHIFT192(prod192,78,prod192); lo1.d0 = pr ^ ./src/imul_macro1.h:539:28: warning: right shift count >= width of type [-Wshift-count-overflow] __y.d1 = ((uint64)__x.d1 >> __n) + ((uint64)__x.d2 << (64-__n));\ ^ ./src/twopmodq80.c:2955:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q1,lo1,prod192); RSHIFT192(prod192,78,prod192); lo1.d0 = pr ^ ./src/imul_macro1.h:539:54: warning: left shift count is negative [-Wshift-count-negative] __y.d1 = ((uint64)__x.d1 >> __n) + ((uint64)__x.d2 << (64-__n));\ ^ ./src/twopmodq80.c:2955:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q1,lo1,prod192); RSHIFT192(prod192,78,prod192); lo1.d0 = pr ^ ./src/imul_macro1.h:540:28: warning: right shift count >= width of type [-Wshift-count-overflow] __y.d2 = ((uint64)__x.d2 >> __n);\ ^ ./src/twopmodq80.c:2955:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q1,lo1,prod192); 
RSHIFT192(prod192,78,prod192); lo1.d0 = pr ^ ./src/imul_macro1.h:556:28: warning: right shift count is negative [-Wshift-count-negative] __y.d0 = ((uint64)__x.d2 >> (__n-128));\ ^ ./src/twopmodq80.c:2955:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q1,lo1,prod192); RSHIFT192(prod192,78,prod192); lo1.d0 = pr ^ ./src/imul_macro1.h:538:28: warning: right shift count >= width of type [-Wshift-count-overflow] __y.d0 = ((uint64)__x.d0 >> __n) + ((uint64)__x.d1 << (64-__n));\ ^ ./src/twopmodq80.c:2956:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q2,lo2,prod192); RSHIFT192(prod192,78,prod192); lo2.d0 = pr ^ ./src/imul_macro1.h:538:54: warning: left shift count is negative [-Wshift-count-negative] __y.d0 = ((uint64)__x.d0 >> __n) + ((uint64)__x.d1 << (64-__n));\ ^ ./src/twopmodq80.c:2956:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q2,lo2,prod192); RSHIFT192(prod192,78,prod192); lo2.d0 = pr ^ ./src/imul_macro1.h:539:28: warning: right shift count >= width of type [-Wshift-count-overflow] __y.d1 = ((uint64)__x.d1 >> __n) + ((uint64)__x.d2 << (64-__n));\ ^ ./src/twopmodq80.c:2956:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q2,lo2,prod192); RSHIFT192(prod192,78,prod192); lo2.d0 = pr ^ ./src/imul_macro1.h:539:54: warning: left shift count is negative [-Wshift-count-negative] __y.d1 = ((uint64)__x.d1 >> __n) + ((uint64)__x.d2 << (64-__n));\ ^ ./src/twopmodq80.c:2956:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q2,lo2,prod192); RSHIFT192(prod192,78,prod192); lo2.d0 = pr ^ ./src/imul_macro1.h:540:28: warning: right shift count >= width of type [-Wshift-count-overflow] __y.d2 = ((uint64)__x.d2 >> __n);\ ^ ./src/twopmodq80.c:2956:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q2,lo2,prod192); RSHIFT192(prod192,78,prod192); lo2.d0 = pr ^ ./src/imul_macro1.h:556:28: warning: right shift count is negative [-Wshift-count-negative] __y.d0 = ((uint64)__x.d2 >> (__n-128));\ ^ 
./src/twopmodq80.c:2956:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q2,lo2,prod192); RSHIFT192(prod192,78,prod192); lo2.d0 = pr ^ ./src/imul_macro1.h:538:28: warning: right shift count >= width of type [-Wshift-count-overflow] __y.d0 = ((uint64)__x.d0 >> __n) + ((uint64)__x.d1 << (64-__n));\ ^ ./src/twopmodq80.c:2957:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q3,lo3,prod192); RSHIFT192(prod192,78,prod192); lo3.d0 = pr ^ ./src/imul_macro1.h:538:54: warning: left shift count is negative [-Wshift-count-negative] __y.d0 = ((uint64)__x.d0 >> __n) + ((uint64)__x.d1 << (64-__n));\ ^ ./src/twopmodq80.c:2957:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q3,lo3,prod192); RSHIFT192(prod192,78,prod192); lo3.d0 = pr ^ ./src/imul_macro1.h:539:28: warning: right shift count >= width of type [-Wshift-count-overflow] __y.d1 = ((uint64)__x.d1 >> __n) + ((uint64)__x.d2 << (64-__n));\ ^ ./src/twopmodq80.c:2957:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q3,lo3,prod192); RSHIFT192(prod192,78,prod192); lo3.d0 = pr ^ ./src/imul_macro1.h:539:54: warning: left shift count is negative [-Wshift-count-negative] __y.d1 = ((uint64)__x.d1 >> __n) + ((uint64)__x.d2 << (64-__n));\ ^ ./src/twopmodq80.c:2957:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q3,lo3,prod192); RSHIFT192(prod192,78,prod192); lo3.d0 = pr ^ ./src/imul_macro1.h:540:28: warning: right shift count >= width of type [-Wshift-count-overflow] __y.d2 = ((uint64)__x.d2 >> __n);\ ^ ./src/twopmodq80.c:2957:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q3,lo3,prod192); RSHIFT192(prod192,78,prod192); lo3.d0 = pr ^ ./src/imul_macro1.h:556:28: warning: right shift count is negative [-Wshift-count-negative] __y.d0 = ((uint64)__x.d2 >> (__n-128));\ ^ ./src/twopmodq80.c:2957:38: note: in expansion of macro ‘RSHIFT192’ MUL_LOHI96_PROD192(q3,lo3,prod192); RSHIFT192(prod192,78,prod192); lo3.d0 = pr ^ In file included from 
./src/types.h:30:0, from ./src/util.h:30, from ./src/factor.h:29, from ./src/twopmodq96.c:23: ./src/platform.h:1072:4: error: #error Multithreading currently only supported for Linux/GCC builds! #error Multithreading currently only supported for Linux/GCC builds! ^ In file included from ./src/types.h:30:0, from ./src/util.h:30, from ./src/factor.h:29, from ./src/twopmodq.c:23: ./src/platform.h:1072:4: error: #error Multithreading currently only supported for Linux/GCC builds! #error Multithreading currently only supported for Linux/GCC builds! ^ In file included from ./src/types.h:30:0, from ./src/types.c:23: ./src/platform.h:1072:4: error: #error Multithreading currently only supported for Linux/GCC builds! #error Multithreading currently only supported for Linux/GCC builds! ^ In file included from ./src/types.h:30:0, from ./src/threadpool.h:69, from ./src/threadpool.c:65: ./src/platform.h:1072:4: error: #error Multithreading currently only supported for Linux/GCC builds! #error Multithreading currently only supported for Linux/GCC builds! 
^ ./src/threadpool.c:265:1: warning: ‘force_align_arg_pointer’ attribute directive ignored [-Wattributes] { ^ ./src/threadpool.c: In function ‘worker_thr_routine’: ./src/threadpool.c:312:20: warning: implicit declaration of function ‘syscall’ [-Wimplicit-function-declaration] pid_t thread_id = syscall (__NR_gettid); ^ ./src/threadpool.c:312:29: error: ‘__NR_gettid’ undeclared (first use in this function) pid_t thread_id = syscall (__NR_gettid); ^ ./src/threadpool.c:312:29: note: each undeclared identifier is reported only once for each function it appears in ./src/threadpool.c:316:2: warning: implicit declaration of function ‘CPU_ZERO’ [-Wimplicit-function-declaration] CPU_ZERO (&cpu_set); ^ ./src/threadpool.c:318:2: warning: implicit declaration of function ‘CPU_SET’ [-Wimplicit-function-declaration] CPU_SET(i, &cpu_set); ^ ./src/threadpool.c:319:12: warning: implicit declaration of function ‘sched_setaffinity’ [-Wimplicit-function-declaration] errcode = sched_setaffinity(thread_id, sizeof(cpu_set), &cpu_set); ^ ./src/threadpool.c:320:90: warning: implicit declaration of function ‘CPU_ISSET’ [-Wimplicit-function-declaration] %u, setaffinity[%d] = %d, ISSET[%d] = %d\n", thread_id,i,errcode,i,CPU_ISSET(i ^ make[1]: *** [NORMAL_O-THREADS_O.stamp] Error 1 make: *** [all] Error 2 ubuntu@pine64:~/Solaris2/mlucas-14.1$ [/CODE] |
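A note on the shift-count warnings earlier in this log. Judging only from the quoted macro lines (not from the full macro source), they seem to come from branches of `RSHIFT192` that a constant shift count of 78 would not take: `>> __n` and `<< (64-__n)` on a 64-bit word are out of range for n = 78, and `>> (__n-128)` is negative. If that reading is right, the warnings are noise rather than a bug. The sketch below is one way to write a 192-bit right shift so that no 64-bit word is ever shifted by more than 63 bits. The `u192` type and `rshift192` function are hypothetical names for illustration, not the Mlucas definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 3-word 192-bit integer, d0 = least-significant word. */
typedef struct { uint64_t d0, d1, d2; } u192;

/* Right-shift x by n bits, for 0 <= n < 192.
   The shift is split into a whole-word part (n / 64) and a sub-word part
   (n % 64), so every shift applied to a 64-bit word is in the range 1..63. */
static u192 rshift192(u192 x, unsigned n)
{
    /* Pad with zero words so that indexing past d2 reads zeros. */
    uint64_t w[6] = { x.d0, x.d1, x.d2, 0, 0, 0 };
    unsigned words = n / 64;   /* 0, 1 or 2 whole words */
    unsigned bits  = n % 64;   /* 0..63 remaining bits  */
    const uint64_t *p = w + words;
    u192 y;

    assert(n < 192);

    if (bits == 0) {
        /* Pure word move; avoids the undefined "<< 64" case. */
        y.d0 = p[0];
        y.d1 = p[1];
        y.d2 = p[2];
    } else {
        y.d0 = (p[0] >> bits) | (p[1] << (64 - bits));
        y.d1 = (p[1] >> bits) | (p[2] << (64 - bits));
        y.d2 = (p[2] >> bits) | (p[3] << (64 - bits));
    }
    return y;
}
```

With n = 78 this takes one whole word plus 14 bits, which is the case the quoted `RSHIFT192(prod192,78,prod192)` calls need. A function like this trades the compile-time branch selection of a macro for a runtime one, which may or may not matter in a hot loop.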
Problems related to AArch64, GCC, autoconf and threads have been noticed around the web :smile: and corrected.
[url]http://people.linaro.org/~rikuvoipio/aarch64-talk/#/11[/url] I am sure the one you are running into will be ironed out in a short while. That's the beauty of code porting... |
I guess the problem is that aarch64 (aka arm64) is not in platform.h yet. Maybe we should add it and see if it works.
|
[QUOTE=Lorenzo;454737]Looks like this device is working in 64-bit mode. So it's OK.
[CODE]ubuntu@pine64:~/Solaris/mlucas-14.1/src/MY_OBJ$ uname -a
Linux pine64 3.10.104-2-pine64-longsleep #113 SMP PREEMPT Thu Dec 15 21:46:07 CET 2016 aarch64 aarch64 aarch64 GNU/Linux[/CODE] But I don't understand: where is it?[/QUOTE] They are GCC switches. |
1 Attachment(s)
Good news, I managed to reproduce it on QEMU (aarch64).
|
1 Attachment(s)
Lorenzo, based on your single-file compile attempt, it appears your issue is due to the code in platform.h not having the magic predefines for your arch. That means the associated GCC predefs are not invoked, which in turn leads to your multithreading error message.
Please try the following 2 things:
[1] Run GCC with the flags that cause the preprocessor to dump all of its predefines: 'gcc -dM -E - < /dev/null', and post the output here.
[2] My (and Dave and Alex's) current working copies of platform.h do have added code for aarch64 recognition. Download the attached version of the file and do a 'grep -i aarch64' to see that, then repeat your single-file compile try (not the command in [1], the actual build attempt) with that version of the .h placed in your src-dir.
I'm not familiar enough with ARM to understand why -m64 is unsupported in GCC, but correctly handling aarch64 in platform.h will cause the build to be in 64-bit mode. (I had assumed -m64 was needed to trigger the aarch64-related predefs, but your output from [1] will settle that.) |
[QUOTE=ewmayer;454769]Lorenzo, based on your single-file compile attempt it appears your issue is due to the code in platform.h not having the magic predefines for your arch, which then means the associated GCC-predefs are not invoked, which leads to your multithread-related error message.
Please try the following 2 things: [1] Do a quick compile-single-file with the flags set which cause the preprocessor to dump all of its predefines: 'gcc -dM -E - < /dev/null', and post the output here. [2] My (and Dave and Alex's) current working copies of platform.h do have added code for aarch64 recognition - download the attached version of the file and do a 'grep -i aarch64' to see that, then repeat your single-file compile try (not the version in [1], the actual build attempt) with that version of the .h placed in your src-dir. I'm not familiar enough with ARM to understand why -m64 is unsupported in GCC, but correctly handling aarch64 in platform.h will cause the build to be in 64-bit mode. (I had assumed -m64 was needed to trigger the aarch64-related predefs, but your output from [1] will settle that.)[/QUOTE] Ok! I have done! [CODE]ubuntu@pine64:~/Solaris2/mlucas-14.1$ gcc -dM -E - < /dev/null #define __SSP_STRONG__ 3 #define __DBL_MIN_EXP__ (-1021) #define __UINT_LEAST16_MAX__ 0xffff #define __ARM_SIZEOF_WCHAR_T 4 #define __ATOMIC_ACQUIRE 2 #define __FLT_MIN__ 1.17549435082228750796873653722224568e-38F #define __GCC_IEC_559_COMPLEX 2 #define __UINT_LEAST8_TYPE__ unsigned char #define __INTMAX_C(c) c ## L #define __CHAR_BIT__ 8 #define __UINT8_MAX__ 0xff #define __WINT_MAX__ 0xffffffffU #define __ORDER_LITTLE_ENDIAN__ 1234 #define __SIZE_MAX__ 0xffffffffffffffffUL #define __WCHAR_MAX__ 0xffffffffU #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1 1 #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2 1 #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1 #define __DBL_DENORM_MIN__ ((double)4.94065645841246544176568792868221372e-324L) #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 1 #define __GCC_ATOMIC_CHAR_LOCK_FREE 2 #define __GCC_IEC_559 2 #define __FLT_EVAL_METHOD__ 0 #define __unix__ 1 #define __GCC_ATOMIC_CHAR32_T_LOCK_FREE 2 #define __UINT_FAST64_MAX__ 0xffffffffffffffffUL #define __SIG_ATOMIC_TYPE__ int #define __DBL_MIN_10_EXP__ (-307) #define __FINITE_MATH_ONLY__ 0 #define 
__ARM_FEATURE_UNALIGNED 1 #define __GNUC_PATCHLEVEL__ 0 #define __UINT_FAST8_MAX__ 0xff #define __has_include(STR) __has_include__(STR) #define __DEC64_MAX_EXP__ 385 #define __INT8_C(c) c #define __UINT_LEAST64_MAX__ 0xffffffffffffffffUL #define __SHRT_MAX__ 0x7fff #define __LDBL_MAX__ 1.18973149535723176508575932662800702e+4932L #define __ARM_FEATURE_IDIV 1 #define __ARM_FP 14 #define __UINT_LEAST8_MAX__ 0xff #define __GCC_ATOMIC_BOOL_LOCK_FREE 2 #define __UINTMAX_TYPE__ long unsigned int #define __linux 1 #define __DEC32_EPSILON__ 1E-6DF #define __CHAR_UNSIGNED__ 1 #define __UINT32_MAX__ 0xffffffffU #define __AARCH64_CMODEL_SMALL__ 1 #define __LDBL_MAX_EXP__ 16384 #define __WINT_MIN__ 0U #define __linux__ 1 #define __SCHAR_MAX__ 0x7f #define __WCHAR_MIN__ 0U #define __INT64_C(c) c ## L #define __DBL_DIG__ 15 #define __GCC_ATOMIC_POINTER_LOCK_FREE 2 #define __SIZEOF_INT__ 4 #define __SIZEOF_POINTER__ 8 #define __USER_LABEL_PREFIX__ #define __STDC_HOSTED__ 1 #define __LDBL_HAS_INFINITY__ 1 #define __ARM_ALIGN_MAX_STACK_PWR 16 #define __FLT_EPSILON__ 1.19209289550781250000000000000000000e-7F #define __LDBL_MIN__ 3.36210314311209350626267781732175260e-4932L #define __STDC_UTF_16__ 1 #define __DEC32_MAX__ 9.999999E96DF #define __ARM_SIZEOF_MINIMAL_ENUM 4 #define __INT32_MAX__ 0x7fffffff #define __SIZEOF_LONG__ 8 #define __STDC_IEC_559__ 1 #define __STDC_ISO_10646__ 201505L #define __UINT16_C(c) c #define __DECIMAL_DIG__ 36 #define __gnu_linux__ 1 #define __has_include_next(STR) __has_include_next__(STR) #define __LDBL_HAS_QUIET_NAN__ 1 #define __GNUC__ 5 #define __FLT_HAS_DENORM__ 1 #define __SIZEOF_LONG_DOUBLE__ 16 #define __BIGGEST_ALIGNMENT__ 16 #define __DBL_MAX__ ((double)1.79769313486231570814527423731704357e+308L) #define __INT_FAST32_MAX__ 0x7fffffffffffffffL #define __DBL_HAS_INFINITY__ 1 #define __DEC32_MIN_EXP__ (-94) #define __INT_FAST16_TYPE__ long int #define __LDBL_HAS_DENORM__ 1 #define __DEC128_MAX__ 9.999999999999999999999999999999999E6144DL #define 
__INT_LEAST32_MAX__ 0x7fffffff #define __DEC32_MIN__ 1E-95DF #define __DBL_MAX_EXP__ 1024 #define __DEC128_EPSILON__ 1E-33DL #define __PTRDIFF_MAX__ 0x7fffffffffffffffL #define __STDC_NO_THREADS__ 1 #define __LONG_LONG_MAX__ 0x7fffffffffffffffLL #define __SIZEOF_SIZE_T__ 8 #define __ARM_ALIGN_MAX_PWR 28 #define __SIZEOF_WINT_T__ 4 #define __ARM_FP16_FORMAT_IEEE 1 #define __GXX_ABI_VERSION 1009 #define __FLT_MIN_EXP__ (-125) #define __INT_FAST64_TYPE__ long int #define __FP_FAST_FMAF 1 #define __DBL_MIN__ ((double)2.22507385850720138309023271733240406e-308L) #define __LP64__ 1 #define __aarch64__ 1 #define __ARM_FP16_ARGS 1 #define __DEC128_MIN__ 1E-6143DL #define __REGISTER_PREFIX__ #define __UINT16_MAX__ 0xffff #define __DBL_HAS_DENORM__ 1 #define __UINT8_TYPE__ unsigned char #define __NO_INLINE__ 1 #define __FLT_MANT_DIG__ 24 #define __VERSION__ "5.4.0 20160609" #define __UINT64_C(c) c ## UL #define _STDC_PREDEF_H 1 #define __ARM_FEATURE_FMA 1 #define __GCC_ATOMIC_INT_LOCK_FREE 2 #define __FLOAT_WORD_ORDER__ __ORDER_LITTLE_ENDIAN__ #define __STDC_IEC_559_COMPLEX__ 1 #define __INT32_C(c) c #define __DEC64_EPSILON__ 1E-15DD #define __ORDER_PDP_ENDIAN__ 3412 #define __DEC128_MIN_EXP__ (-6142) #define __ARM_64BIT_STATE 1 #define __INT_FAST32_TYPE__ long int #define __UINT_LEAST16_TYPE__ short unsigned int #define unix 1 #define __INT16_MAX__ 0x7fff #define __SIZE_TYPE__ long unsigned int #define __UINT64_MAX__ 0xffffffffffffffffUL #define __INT8_TYPE__ signed char #define __ELF__ 1 #define __FLT_RADIX__ 2 #define __INT_LEAST16_TYPE__ short int #define __ARM_ARCH_PROFILE 65 #define __LDBL_EPSILON__ 1.92592994438723585305597794258492732e-34L #define __UINTMAX_C(c) c ## UL #define __ARM_PCS_AAPCS64 1 #define __SIG_ATOMIC_MAX__ 0x7fffffff #define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2 #define __SIZEOF_PTRDIFF_T__ 8 #define __AARCH64EL__ 1 #define __DEC32_SUBNORMAL_MIN__ 0.000001E-95DF #define __INT_FAST16_MAX__ 0x7fffffffffffffffL #define __UINT_FAST32_MAX__ 
0xffffffffffffffffUL #define __UINT_LEAST64_TYPE__ long unsigned int #define __FLT_HAS_QUIET_NAN__ 1 #define __FLT_MAX_10_EXP__ 38 #define __LONG_MAX__ 0x7fffffffffffffffL #define __DEC128_SUBNORMAL_MIN__ 0.000000000000000000000000000000001E-6143DL #define __FLT_HAS_INFINITY__ 1 #define __unix 1 #define __UINT_FAST16_TYPE__ long unsigned int #define __DEC64_MAX__ 9.999999999999999E384DD #define __CHAR16_TYPE__ short unsigned int #define __PRAGMA_REDEFINE_EXTNAME 1 #define __INT_LEAST16_MAX__ 0x7fff #define __DEC64_MANT_DIG__ 16 #define __INT64_MAX__ 0x7fffffffffffffffL #define __UINT_LEAST32_MAX__ 0xffffffffU #define __GCC_ATOMIC_LONG_LOCK_FREE 2 #define __INT_LEAST64_TYPE__ long int #define __ARM_FEATURE_CLZ 1 #define __INT16_TYPE__ short int #define __INT_LEAST8_TYPE__ signed char #define __STDC_VERSION__ 201112L #define __DEC32_MAX_EXP__ 97 #define __INT_FAST8_MAX__ 0x7f #define __ARM_ARCH 8 #define __INTPTR_MAX__ 0x7fffffffffffffffL #define linux 1 #define __LDBL_MANT_DIG__ 113 #define __DBL_HAS_QUIET_NAN__ 1 #define __SIG_ATOMIC_MIN__ (-__SIG_ATOMIC_MAX__ - 1) #define __INTPTR_TYPE__ long int #define __UINT16_TYPE__ short unsigned int #define __WCHAR_TYPE__ unsigned int #define __SIZEOF_FLOAT__ 4 #define __UINTPTR_MAX__ 0xffffffffffffffffUL #define __ARM_ARCH_8A 1 #define __DEC64_MIN_EXP__ (-382) #define __INT_FAST64_MAX__ 0x7fffffffffffffffL #define __GCC_ATOMIC_TEST_AND_SET_TRUEVAL 1 #define __FLT_DIG__ 6 #define __UINT_FAST64_TYPE__ long unsigned int #define __INT_MAX__ 0x7fffffff #define __INT64_TYPE__ long int #define __FLT_MAX_EXP__ 128 #define __ORDER_BIG_ENDIAN__ 4321 #define __DBL_MANT_DIG__ 53 #define __INT_LEAST64_MAX__ 0x7fffffffffffffffL #define __GCC_ATOMIC_CHAR16_T_LOCK_FREE 2 #define __DEC64_MIN__ 1E-383DD #define __WINT_TYPE__ unsigned int #define __UINT_LEAST32_TYPE__ unsigned int #define __SIZEOF_SHORT__ 2 #define __LDBL_MIN_EXP__ (-16381) #define __INT_LEAST8_MAX__ 0x7f #define __SIZEOF_INT128__ 16 #define __LDBL_MAX_10_EXP__ 4932 #define 
__ATOMIC_RELAXED 0 #define __DBL_EPSILON__ ((double)2.22044604925031308084726333618164062e-16L) #define _LP64 1 #define __UINT8_C(c) c #define __INT_LEAST32_TYPE__ int #define __SIZEOF_WCHAR_T__ 4 #define __UINT64_TYPE__ long unsigned int #define __ARM_NEON 1 #define __INT_FAST8_TYPE__ signed char #define __GNUC_STDC_INLINE__ 1 #define __DBL_DECIMAL_DIG__ 17 #define __STDC_UTF_32__ 1 #define __DEC_EVAL_METHOD__ 2 #define __UINT32_C(c) c ## U #define __INTMAX_MAX__ 0x7fffffffffffffffL #define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__ #define __FLT_DENORM_MIN__ 1.40129846432481707092372958328991613e-45F #define __INT8_MAX__ 0x7f #define __UINT_FAST32_TYPE__ long unsigned int #define __CHAR32_TYPE__ unsigned int #define __FLT_MAX__ 3.40282346638528859811704183484516925e+38F #define __FP_FAST_FMA 1 #define __ARM_FEATURE_NUMERIC_MAXMIN 1 #define __INT32_TYPE__ int #define __SIZEOF_DOUBLE__ 8 #define __FLT_MIN_10_EXP__ (-37) #define __INTMAX_TYPE__ long int #define __DEC128_MAX_EXP__ 6145 #define __ATOMIC_CONSUME 1 #define __GNUC_MINOR__ 4 #define __UINTMAX_MAX__ 0xffffffffffffffffUL #define __DEC32_MANT_DIG__ 7 #define __DBL_MAX_10_EXP__ 308 #define __LDBL_DENORM_MIN__ 6.47517511943802511092443895822764655e-4966L #define __INT16_C(c) c #define __ARM_ARCH_ISA_A64 1 #define __STDC__ 1 #define __PTRDIFF_TYPE__ long int #define __ATOMIC_SEQ_CST 5 #define __UINT32_TYPE__ unsigned int #define __UINTPTR_TYPE__ long unsigned int #define __DEC64_SUBNORMAL_MIN__ 0.000000000000001E-383DD #define __DEC128_MANT_DIG__ 34 #define __LDBL_MIN_10_EXP__ (-4931) #define __SIZEOF_LONG_LONG__ 8 #define __GCC_ATOMIC_LLONG_LOCK_FREE 2 #define __LDBL_DIG__ 33 #define __FLT_DECIMAL_DIG__ 9 #define __UINT_FAST16_MAX__ 0xffffffffffffffffUL #define __GCC_ATOMIC_SHORT_LOCK_FREE 2 #define __UINT_FAST8_TYPE__ unsigned char #define __ATOMIC_ACQ_REL 4 #define __ATOMIC_RELEASE 3 [/CODE] |
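The dump above shows `__aarch64__`, `__linux__`, `__GNUC__` and `__ARM_NEON` all defined without any `-m64` switch, so a header can key off them directly. The following is a minimal sketch of that kind of check. All `MY_*` names and the helper functions are hypothetical, chosen for illustration; they are not the macros that Mlucas's platform.h uses.

```c
#include <stdio.h>

/* CPU detection from compiler predefines (hypothetical MY_* names). */
#if defined(__aarch64__)
  #define MY_CPU_NAME "aarch64"
#elif defined(__x86_64__)
  #define MY_CPU_NAME "x86_64"
#else
  #define MY_CPU_NAME "unknown"
#endif

/* Mirror the rule stated by the #error in the log: threads only on Linux + GCC. */
#if defined(__linux__) && defined(__GNUC__)
  #define MY_THREADS_SUPPORTED 1
#else
  #define MY_THREADS_SUPPORTED 0
#endif

/* __ARM_NEON is predefined by the compiler when Advanced SIMD is available. */
#if defined(__ARM_NEON)
  #define MY_HAVE_NEON 1
#else
  #define MY_HAVE_NEON 0
#endif

/* Return the CPU name selected at compile time. */
static const char *my_cpu_name(void)
{
    return MY_CPU_NAME;
}

/* Print a one-line summary of what the preprocessor detected. */
static void my_print_platform(void)
{
    printf("cpu=%s threads=%d neon=%d\n",
           my_cpu_name(), MY_THREADS_SUPPORTED, MY_HAVE_NEON);
}
```

Calling `my_print_platform()` from a small test program on the target board is a quick way to confirm which branch the compiler actually takes before attempting a full build.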
When I tried to compile with the new version of platform.h, I got a huge number of errors like the following:
[CODE] ./src/platform.h:89:1: error: stray ‘\203’ in program ./src/platform.h:89:1: error: stray ‘\313’ in program ./src/platform.h:89:1: error: stray ‘\254’ in program ./src/platform.h:89:1: error: stray ‘\336’ in program ./src/platform.h:89:1: error: stray ‘\306’ in program ./src/platform.h:89:1: error: stray ‘\327’ in program ./src/platform.h:89:1: error: stray ‘\263’ in program ./src/platform.h:89:115: error: stray ‘@’ in program ./src/platform.h:89:1: error: stray ‘\340’ in program ./src/platform.h:89:1: error: stray ‘\270’ in program ./src/platform.h:89:1: error: stray ‘\253’ in program ./src/platform.h:89:1: error: stray ‘\305’ in program ./src/platform.h:89:1: error: stray ‘\201’ in program ./src/platform.h:89:1: error: stray ‘\253’ in program ./src/platform.h:89:1: error: stray ‘\346’ in program ./src/platform.h:89:1: error: stray ‘\2’ in program ./src/platform.h:89:1: error: stray ‘\31’ in program ./src/platform.h:89:133: error: stray ‘@’ in program ./src/platform.h:89:1: error: stray ‘\5’ in program ./src/platform.h:89:1: error: stray ‘\20’ in program ./src/platform.h:89:1: error: stray ‘\260’ in program ./src/platform.h:89:1: error: stray ‘\375’ in program ./src/platform.h:89:1: error: stray ‘\250’ in program ./src/platform.h:89:1: error: stray ‘\225’ in program ./src/platform.h:89:1: error: stray ‘\6’ in program ./src/platform.h:89:1: error: stray ‘\243’ in program ./src/platform.h:89:1: error: stray ‘\26’ in program ./src/platform.h:89:1: error: stray ‘\20’ in program ./src/platform.h:89:1: error: stray ‘\262’ in program ./src/platform.h:89:1: error: stray ‘\22’ in program ./src/platform.h:89:1: error: stray ‘\20’ in program ./src/platform.h:89:1: error: stray ‘\16’ in program ./src/platform.h:89:1: error: stray ‘\30’ in program ./src/platform.h:89:1: error: stray ‘\222’ in program ./src/platform.h:89:1: error: stray ‘\252’ in program ./src/platform.h:89:1: error: stray ‘\231’ in program ./src/platform.h:89:1: error: stray ‘\347’ in program 
./src/platform.h:89:1: error: stray ‘\24’ in program ./src/platform.h:89:1: error: stray ‘\355’ in program ./src/platform.h:89:1: error: stray ‘\245’ in program ./src/platform.h:89:1: error: stray ‘\31’ in program ./src/platform.h:89:1: error: stray ‘\236’ in program ./src/platform.h:89:1: error: stray ‘\203’ in program ./src/platform.h:89:1: error: stray ‘\231’ in program ./src/platform.h:89:1: error: stray ‘\324’ in program ./src/platform.h:89:1: error: stray ‘\345’ in program ./src/platform.h:89:1: error: stray ‘\236’ in program ./src/platform.h:89:1: error: stray ‘\7’ in program ./src/platform.h:89:1: error: stray ‘\361’ in program ./src/platform.h:89:1: error: stray ‘\224’ in program ./src/platform.h:89:1: error: stray ‘\24’ in program ./src/platform.h:89:1: error: stray ‘\7’ in program ./src/platform.h:89:1: error: stray ‘\247’ in program ./src/platform.h:89:1: error: stray ‘\356’ in program ./src/platform.h:89:1: error: stray ‘\355’ in program ./src/platform.h:89:1: error: stray ‘\331’ in program ./src/platform.h:89:1: error: stray ‘\313’ in program ./src/platform.h:89:1: error: stray ‘\325’ in program ./src/platform.h:89:1: error: stray ‘\253’ in program ./src/platform.h:89:1: error: stray ‘\10’ in program ./src/platform.h:89:1: error: stray ‘\246’ in program ./src/platform.h:89:1: error: stray ‘\233’ in program ./src/platform.h:89:1: error: stray ‘\342’ in program ./src/platform.h:89:1: error: stray ‘\366’ in program ./src/platform.h:89:1: error: stray ‘\242’ in program ./src/platform.h:89:1: error: stray ‘\321’ in program ./src/platform.h:89:1: error: stray ‘\323’ in program ./src/platform.h:89:1: error: stray ‘\264’ in program ./src/platform.h:89:1: error: stray ‘\240’ in program ./src/platform.h:89:1: error: stray ‘\224’ in program ./src/platform.h:89:1: error: stray ‘\366’ in program ./src/platform.h:89:1: error: stray ‘\30’ in program ./src/platform.h:89:1: error: stray ‘\345’ in program ./src/platform.h:89:1: error: stray ‘\305’ in program 
./src/platform.h:89:1: error: stray ‘\263’ in program ./src/platform.h:89:1: error: stray ‘\251’ in program ./src/platform.h:89:1: error: stray ‘\212’ in program ./src/platform.h:89:1: error: stray ‘\347’ in program ./src/platform.h:89:1: error: stray ‘\22’ in program ./src/platform.h:89:1: error: stray ‘\23’ in program ./src/platform.h:89:1: error: stray ‘\32’ in program ./src/platform.h:89:235: error: expected identifier or ‘(’ before ‘)’ token ./src/platform.h:89:235: error: stray ‘\221’ in program ./src/platform.h:89:240: warning: missing terminating " character ./src/platform.h:89:235: error: missing terminating " character ./src/platform.h:90:1: error: stray ‘\264’ in program ./src/platform.h:90:1: error: stray ‘\334’ in program ./src/platform.h:90:1: error: stray ‘\334’ in program ./src/platform.h:90:1: error: stray ‘\322’ in program ./src/platform.h:90:1: error: stray ‘\362’ in program ./src/platform.h:90:1: error: stray ‘\271’ in program ./src/platform.h:90:1: error: stray ‘\277’ in program ./src/platform.h:90:1: error: stray ‘\335’ in program ./src/platform.h:90:1: error: stray ‘\326’ in program ./src/platform.h:90:1: error: stray ‘\262’ in program ./src/platform.h:90:1: error: stray ‘\374’ in program ./src/platform.h:90:1: error: stray ‘\355’ in program ./src/platform.h:90:1: error: stray ‘\266’ in program ./src/platform.h:90:1: error: stray ‘\227’ in program ./src/platform.h:90:1: error: stray ‘\265’ in program ./src/platform.h:90:1: error: stray ‘\351’ in program ./src/platform.h:90:1: error: stray ‘\267’ in program ./src/platform.h:90:1: error: stray ‘\263’ in program ./src/platform.h:90:1: error: stray ‘\254’ in program ./src/platform.h:90:1: error: stray ‘\335’ in program ./src/platform.h:90:1: error: stray ‘\253’ in program ./src/platform.h:90:1: error: stray ‘\314’ in program ./src/platform.h:90:1: error: stray ‘\23’ in program ./src/platform.h:90:1: error: stray ‘\255’ in program ./src/platform.h:90:1: error: stray ‘\303’ in program 
./src/platform.h:90:1: error: stray ‘\301’ in program ./src/platform.h:90:1: error: stray ‘\360’ in program ./src/platform.h:90:1: error: stray ‘\231’ in program ./src/platform.h:90:1: error: stray ‘\316’ in program ./src/platform.h:90:1: error: stray ‘\7’ in program ./src/platform.h:90:1: error: stray ‘\256’ in program ./src/platform.h:90:1: error: stray ‘\321’ in program ./src/platform.h:90:1: error: stray ‘\256’ in program ./src/platform.h:90:1: error: stray ‘\243’ in program ./src/platform.h:90:1: error: stray ‘\240’ in program ./src/platform.h:90:1: error: stray ‘\311’ in program ./src/platform.h:90:1: error: stray ‘\211’ in program ./src/platform.h:90:1: error: stray ‘\220’ in program ./src/platform.h:90:52: warning: missing terminating " character ./src/platform.h:90:1: error: missing terminating " character ./src/platform.h:91:1: error: stray ‘\305’ in program ./src/platform.h:91:1: error: stray ‘\260’ in program ./src/platform.h:91:1: error: stray ‘\220’ in program ./src/platform.h:91:1: error: stray ‘\262’ in program ./src/platform.h:91:1: error: stray ‘\262’ in program ./src/platform.h:91:1: error: stray ‘\274’ in program ./src/platform.h:91:10: warning: null character(s) ignored ./src/platform.h:91:1: error: stray ‘\266’ in program ./src/platform.h:91:1: error: stray ‘\367’ in program ./src/platform.h:91:1: error: stray ‘\327’ in program ./src/platform.h:91:1: error: stray ‘\217’ in program ./src/platform.h:91:1: error: stray ‘\313’ in program ./src/platform.h:91:1: error: stray ‘\200’ in program ./src/platform.h:91:19: warning: null character(s) ignored ./src/platform.h:91:1: error: stray ‘\264’ in program ./src/platform.h:91:1: error: stray ‘\207’ in program ./src/platform.h:91:1: error: stray ‘\313’ in program ./src/platform.h:91:1: error: stray ‘\252’ in program ./src/platform.h:91:1: error: stray ‘\321’ in program ./src/platform.h:91:1: error: stray ‘\201’ in program ./src/platform.h:91:29: error: invalid suffix "w" on integer constant 
./src/platform.h:91:1: error: stray ‘\377’ in program ./src/platform.h:91:1: error: stray ‘\26’ in program ./src/platform.h:91:1: error: stray ‘\373’ in program ./src/platform.h:91:1: error: stray ‘\321’ in program ./src/platform.h:91:1: error: stray ‘\10’ in program ./src/platform.h:91:1: error: stray ‘\213’ in program ./src/platform.h:91:1: error: stray ‘\201’ in program ./src/platform.h:91:1: error: stray ‘\210’ in program ./src/platform.h:91:1: error: stray ‘\340’ in program ./src/platform.h:91:1: error: stray ‘\236’ in program ./src/platform.h:91:43: error: stray ‘@’ in program ./src/platform.h:91:1: error: stray ‘\254’ in program ./src/platform.h:91:1: error: stray ‘\241’ in program ./src/platform.h:91:1: error: stray ‘\3’ in program ./src/platform.h:91:1: error: stray ‘\214’ in program ./src/platform.h:91:1: error: stray ‘\315’ in program ./src/platform.h:91:1: error: stray ‘\342’ in program ./src/platform.h:91:1: error: stray ‘\340’ in program ./src/platform.h:91:1: error: stray ‘\4’ in program ./src/platform.h:91:1: error: stray ‘\220’ in program ./src/platform.h:91:1: error: stray ‘\330’ in program ./src/platform.h:91:1: error: stray ‘\260’ in program ./src/platform.h:91:1: error: stray ‘\356’ in program ./src/platform.h:91:1: error: stray ‘\343’ in program ./src/platform.h:91:1: error: stray ‘\366’ in program ./src/platform.h:91:1: error: stray ‘\26’ in program ./src/platform.h:91:1: error: stray ‘\6’ in program ./src/platform.h:91:1: error: stray ‘\21’ in program ./src/platform.h:91:1: error: stray ‘\363’ in program ./src/platform.h:91:1: error: stray ‘\375’ in program ./src/platform.h:91:1: error: stray ‘\234’ in program ./src/platform.h:91:1: error: stray ‘\241’ in program ./src/platform.h:91:1: error: stray ‘\246’ in program ./src/platform.h:91:1: error: stray ‘\367’ in program ./src/platform.h:91:1: error: stray ‘\361’ in program ./src/platform.h:91:1: error: stray ‘\335’ in program ./src/platform.h:91:1: error: stray ‘\330’ in program 
./src/platform.h:91:1: error: stray ‘\212’ in program ./src/platform.h:91:1: error: stray ‘\177’ in program ./src/platform.h:91:1: error: stray ‘\330’ in program ./src/platform.h:91:1: error: stray ‘\367’ in program ./src/platform.h:91:1: error: stray ‘\37’ in program ./src/platform.h:91:1: error: stray ‘\233’ in program ./src/platform.h:91:1: error: stray ‘\341’ in program ./src/platform.h:91:1: error: stray ‘\243’ in program ./src/platform.h:91:1: error: stray ‘\303’ in program ./src/platform.h:91:1: error: stray ‘\275’ in program ./src/platform.h:91:1: error: stray ‘\3’ in program ./src/platform.h:91:91: error: stray ‘@’ in program ./src/platform.h:91:1: error: stray ‘\201’ in program ./src/platform.h:91:1: error: stray ‘\262’ in program ./src/platform.h:91:1: error: stray ‘\327’ in program ./src/platform.h:91:1: error: stray ‘\1’ in program ./src/platform.h:91:1: error: stray ‘\263’ in program ./src/platform.h:91:1: error: stray ‘\335’ in program ./src/platform.h:91:1: error: stray ‘\5’ in program ./src/platform.h:91:1: error: stray ‘\227’ in program ./src/platform.h:91:1: error: stray ‘\2’ in program ./src/platform.h:91:103: warning: missing terminating ' character ./src/platform.h:91:1: error: missing terminating ' character ./src/platform.h:92:1: error: stray ‘\244’ in program ./src/platform.h:92:1: error: stray ‘\6’ in program ./src/platform.h:92:1: error: stray ‘\223’ in program ./src/platform.h:92:1: error: stray ‘`’ in program ./src/platform.h:92:1: error: stray ‘\221’ in program ./src/platform.h:93:1: error: stray ‘\22’ in program ./src/platform.h:93:1: error: stray ‘\370’ in program ./src/platform.h:93:1: error: stray ‘\324’ in program ./src/platform.h:93:1: error: stray ‘\311’ in program ./src/platform.h:93:1: error: stray ‘\270’ in program ./src/platform.h:93:1: error: stray ‘\221’ in program ./src/platform.h:93:1: error: stray ‘\24’ in program ./src/platform.h:93:1: error: stray ‘\376’ in program ./src/platform.h:93:1: error: stray ‘\211’ in program 
./src/platform.h:93:15: error: invalid suffix "de" on integer constant ./src/platform.h:93:1: error: stray ‘\’ in program ./src/platform.h:93:1: error: stray ‘\221’ in program ./src/platform.h:93:1: error: stray ‘\331’ in program ./src/platform.h:93:1: error: stray ‘\203’ in program ./src/platform.h:93:1: error: stray ‘\230’ in program ./src/platform.h:93:1: error: stray ‘\30’ in program ./src/platform.h:93:1: error: stray ‘\23’ in program ./src/platform.h:93:1: error: stray ‘\333’ in program ./src/platform.h:93:1: error: stray ‘\251’ in program ./src/platform.h:93:1: error: stray ‘\26’ in program ./src/platform.h:93:1: error: stray ‘\225’ in program ./src/platform.h:93:1: error: stray ‘\300’ in program ./src/platform.h:93:1: error: stray ‘\251’ in program ./src/platform.h:93:1: error: stray ‘\333’ in program ./src/platform.h:93:1: error: stray ‘\262’ in program ./src/platform.h:93:1: error: stray ‘\1’ in program ./src/platform.h:93:1: error: stray ‘\237’ in program ./src/platform.h:93:1: error: stray ‘\204’ in program ./src/platform.h:93:1: error: stray ‘\255’ in program ./src/platform.h:93:1: error: stray ‘\214’ in program ./src/platform.h:93:1: error: stray ‘\27’ in program ./src/platform.h:93:1: error: stray ‘\247’ in program ./src/platform.h:93:1: error: stray ‘\343’ in program ./src/platform.h:93:1: error: stray ‘\177’ in program ./src/platform.h:93:1: error: stray ‘\210’ in program ./src/platform.h:93:1: error: stray ‘\346’ in program ./src/platform.h:93:1: error: stray ‘\362’ in program ./src/platform.h:93:1: error: stray ‘\6’ in program ./src/platform.h:93:1: error: stray ‘\254’ in program ./src/platform.h:93:1: error: stray ‘\233’ in program ./src/platform.h:93:1: error: stray ‘\204’ in program ./src/platform.h:93:1: error: stray ‘\243’ in program ./src/platform.h:93:1: error: stray ‘\210’ in program ./src/platform.h:93:1: error: stray ‘\352’ in program ./src/platform.h:93:1: error: stray ‘\346’ in program ./src/platform.h:93:1: error: stray ‘\223’ in 
program ./src/platform.h:93:1: error: stray ‘\330’ in program ./src/platform.h:93:1: error: stray ‘\20’ in program ./src/platform.h:93:1: error: stray ‘\205’ in program ./src/platform.h:93:1: error: stray ‘\367’ in program ./src/platform.h:93:1: error: stray ‘\271’ in program ./src/platform.h:93:1: error: stray ‘\305’ in program ./src/platform.h:93:1: error: stray ‘\207’ in program ./src/platform.h:93:1: error: stray ‘\35’ in program ./src/platform.h:93:1: error: stray ‘\6’ in program ./src/platform.h:93:100: error: stray ‘#’ in program ./src/platform.h:93:1: error: stray ‘\361’ in program ./src/platform.h:93:1: error: stray ‘\346’ in program ./src/platform.h:93:1: error: stray ‘\236’ in program ./src/platform.h:93:1: error: stray ‘\177’ in program ./src/platform.h:93:1: error: stray ‘\370’ in program ./src/platform.h:93:1: error: stray ‘\230’ in program ./src/platform.h:94:1: error: stray ‘\303’ in program ./src/platform.h:94:1: error: stray ‘\376’ in program ./src/platform.h:94:1: error: stray ‘\264’ in program ./src/platform.h:94:1: error: stray ‘\26’ in program ./src/platform.h:94:1: error: stray ‘\366’ in program ./src/platform.h:94:1: error: stray ‘\213’ in program ./src/platform.h:94:1: error: stray ‘\205’ in program ./src/platform.h:94:1: error: stray ‘\366’ in program ./src/platform.h:94:1: error: stray ‘\354’ in program ./src/platform.h:94:1: error: stray ‘\352’ in program ./src/platform.h:94:1: error: stray ‘\347’ in program ./src/platform.h:94:1: error: stray ‘\260’ in program ./src/platform.h:94:1: error: stray ‘\205’ in program ./src/platform.h:94:1: error: stray ‘\6’ in program ./src/platform.h:94:1: error: stray ‘\347’ in program ./src/platform.h:94:1: error: stray ‘\244’ in program ./src/platform.h:94:1: error: stray ‘\317’ in program ./src/platform.h:94:22: error: stray ‘#’ in program ./src/platform.h:94:1: error: stray ‘\331’ in program ./src/platform.h:94:1: error: stray ‘\276’ in program ./src/platform.h:94:1: error: stray ‘\322’ in program 
./src/platform.h:94:1: error: stray ‘\264’ in program ./src/platform.h:94:1: error: stray ‘\24’ in program ./src/platform.h:94:1: error: stray ‘\7’ in program ./src/platform.h:94:1: error: stray ‘\1’ in program ./src/platform.h:94:1: error: stray ‘\252’ in program ./src/platform.h:94:1: error: stray ‘\371’ in program ./src/platform.h:95:1: error: stray ‘\200’ in program ./src/platform.h:95:1: error: stray ‘\335’ in program ./src/platform.h:95:1: error: stray ‘\303’ in program ./src/platform.h:95:6: error: stray ‘@’ in program ./src/platform.h:95:1: error: stray ‘\360’ in program ./src/platform.h:95:1: error: stray ‘\347’ in program ./src/platform.h:95:1: error: stray ‘\25’ in program ./src/platform.h:95:1: error: stray ‘\261’ in program ./src/platform.h:95:1: error: stray ‘\265’ in program ./src/platform.h:95:1: error: stray ‘\261’ in program ./src/platform.h:95:1: error: stray ‘\271’ in program ./src/platform.h:95:1: error: stray ‘\340’ in program ./src/platform.h:95:1: error: stray ‘\217’ in program ./src/platform.h:95:1: error: stray ‘\235’ in program ./src/platform.h:95:1: error: stray ‘\212’ in program ./src/platform.h:95:1: error: stray ‘\26’ in program ./src/platform.h:95:1: error: stray ‘\272’ in program ./src/platform.h:95:1: error: stray ‘\353’ in program ./src/platform.h:95:1: error: stray ‘\260’ in program ./src/platform.h:95:1: error: stray ‘\203’ in program ./src/platform.h:96:1: error: stray ‘\352’ in program ./src/platform.h:96:1: error: stray ‘\303’ in program ./src/platform.h:96:1: error: stray ‘\373’ in program ./src/platform.h:96:1: error: stray ‘\376’ in program ./src/platform.h:96:1: error: stray ‘\360’ in program ./src/platform.h:96:1: error: stray ‘\347’ in program ./src/platform.h:96:1: error: stray ‘\32’ in program ./src/platform.h:96:1: error: stray ‘\324’ in program ./src/platform.h:96:1: error: stray ‘\340’ in program ./src/platform.h:96:1: error: stray ‘`’ in program ./src/platform.h:96:1: error: stray ‘\243’ in program 
./src/platform.h:96:1: error: stray ‘\262’ in program ./src/platform.h:96:1: error: stray ‘\347’ in program ./src/platform.h:96:1: error: stray ‘\352’ in program ./src/platform.h:96:1: error: stray ‘\217’ in program ./src/platform.h:96:1: error: stray ‘\21’ in program ./src/platform.h:97:1: error: stray ‘\305’ in program ./src/platform.h:97:1: error: stray ‘\255’ in program ./src/platform.h:97:1: error: stray ‘\312’ in program ./src/platform.h:97:9: error: stray ‘#’ in program ./src/platform.h:97:1: error: stray ‘\372’ in program ./src/platform.h:97:1: error: stray ‘\375’ in program ./src/platform.h:97:1: error: stray ‘\202’ in program ./src/platform.h:97:1: error: stray ‘\24’ in program ./src/platform.h:97:1: error: stray ‘\323’ in program ./src/platform.h:97:1: error: stray ‘\260’ in program ./src/platform.h:97:19: warning: null character(s) ignored ./src/platform.h:97:1: error: stray ‘\232’ in program ./src/platform.h:97:1: error: stray ‘\35’ in program ./src/platform.h:97:1: error: stray ‘\274’ in program ./src/platform.h:97:1: error: stray ‘\352’ in program ./src/platform.h:97:1: error: stray ‘\204’ in program ./src/platform.h:97:1: error: stray ‘\364’ in program ./src/platform.h:97:1: error: stray ‘\261’ in program ./src/platform.h:97:1: error: stray ‘\346’ in program ./src/platform.h:97:1: error: stray ‘\320’ in program ./src/platform.h:97:1: error: stray ‘\277’ in program ./src/platform.h:97:37: warning: missing terminating " character ./src/platform.h:97:1: error: missing terminating " character ./src/platform.h:98:1: error: stray ‘\353’ in program ./src/platform.h:98:1: error: stray ‘\320’ in program ./src/platform.h:98:3: error: invalid suffix "F" on integer constant ./src/platform.h:98:1: error: stray ‘\254’ in program ./src/platform.h:98:1: error: stray ‘\341’ in program ./src/platform.h:98:1: error: stray ‘\337’ in program ./src/platform.h:98:1: error: stray ‘\201’ in program ./src/platform.h:98:1: error: stray ‘\250’ in program ./src/platform.h:98:1: 
error: stray ‘\16’ in program ./src/platform.h:98:1: error: stray ‘\206’ in program ./src/platform.h:98:1: error: stray ‘\360’ in program ./src/platform.h:98:1: error: stray ‘\351’ in program ./src/platform.h:98:1: error: stray ‘\252’ in program ./src/platform.h:98:1: error: stray ‘\203’ in program ./src/platform.h:98:1: error: stray ‘\356’ in program ./src/platform.h:98:1: error: stray ‘\200’ in program ./src/platform.h:98:1: error: stray ‘\377’ in program ./src/platform.h:98:1: error: stray ‘\216’ in program ./src/platform.h:98:1: error: stray ‘\233’ in program ./src/platform.h:98:1: error: stray ‘\177’ in program ./src/platform.h:98:1: error: stray ‘\201’ in program ./src/platform.h:98:1: error: stray ‘\3’ in program ./src/platform.h:98:1: error: stray ‘\275’ in program ./src/platform.h:98:1: error: stray ‘\236’ in program ./src/platform.h:98:1: error: stray ‘\4’ in program ./src/platform.h:98:1: error: stray ‘\303’ in program ./src/platform.h:98:1: error: stray ‘\360’ in program ./src/platform.h:98:1: error: stray ‘\232’ in program ./src/platform.h:98:1: error: stray ‘\374’ in program ./src/platform.h:98:1: error: stray ‘\2’ in program ./src/platform.h:98:39: warning: character constant too long for its type ./src/platform.h:98:1: error: stray ‘\250’ in program ./src/platform.h:98:1: error: stray ‘\3’ in program ./src/platform.h:98:1: error: stray ‘\330’ in program ./src/platform.h:98:1: error: stray ‘\225’ in program ./src/platform.h:98:1: error: stray ‘\36’ in program ./src/platform.h:98:1: error: stray ‘\257’ in program ./src/platform.h:98:1: error: stray ‘\31’ in program ./src/platform.h:98:1: error: stray ‘\344’ in program ./src/platform.h:98:1: error: stray ‘\20’ in program ./src/platform.h:98:1: error: stray ‘\345’ in program ./src/platform.h:98:1: error: stray ‘\10’ in program ./src/platform.h:98:1: error: stray ‘\216’ in program ./src/platform.h:98:1: error: stray ‘\355’ in program ./src/platform.h:98:1: error: stray ‘\323’ in program 
./src/platform.h:98:1: error: stray ‘\304’ in program ./src/platform.h:98:128: warning: missing terminating ' character ./src/platform.h:98:1: error: missing terminating ' character ./src/platform.h:99:1: warning: missing terminating ' character ./src/platform.h:99:1: error: missing terminating ' character ./src/platform.h:100:1: error: stray ‘\271’ in program ./src/platform.h:100:1: error: stray ‘\351’ in program ./src/platform.h:100:1: error: stray ‘\376’ in program ./src/platform.h:100:1: error: stray ‘\277’ in program ./src/platform.h:100:1: error: stray ‘\31’ in program ./src/platform.h:100:1: error: stray ‘\335’ in program ./src/platform.h:100:1: error: stray ‘\177’ in program ./src/platform.h:100:1: error: stray ‘\275’ in program ./src/platform.h:100:1: error: stray ‘\202’ in program ./src/platform.h:100:1: error: stray ‘\32’ in program ./src/platform.h:100:1: error: stray ‘\26’ in program ./src/platform.h:100:1: error: stray ‘\331’ in program ./src/platform.h:100:1: error: stray ‘\276’ in program ./src/platform.h:100:1: error: stray ‘\226’ in program ./src/platform.h:100:1: error: stray ‘\346’ in program ./src/platform.h:100:1: error: stray ‘\342’ in program ./src/platform.h:100:1: error: stray ‘\355’ in program ./src/platform.h:100:1: error: stray ‘\247’ in program ./src/platform.h:100:1: error: stray ‘\273’ in program ./src/platform.h:100:1: error: stray ‘\304’ in program ./src/platform.h:100:1: error: stray ‘\243’ in program ./src/platform.h:100:1: error: stray ‘\345’ in program ./src/platform.h:100:1: error: stray ‘\37’ in program ./src/platform.h:100:1: error: stray ‘\213’ in program ./src/platform.h:100:1: error: stray ‘\227’ in program ./src/platform.h:100:1: error: stray ‘\357’ in program ./src/platform.h:100:1: error: stray ‘\21’ in program ./src/platform.h:100:1: error: stray ‘\210’ in program ./src/platform.h:100:1: error: stray ‘\30’ in program ./src/platform.h:100:1: error: stray ‘\253’ in program ./src/platform.h:100:1: error: stray ‘\345’ in 
program ./src/platform.h:100:1: error: stray ‘\350’ in program ./src/platform.h:100:1: error: stray ‘\246’ in program ./src/platform.h:100:1: error: stray ‘\260’ in program ./src/platform.h:100:1: error: stray ‘\337’ in program ./src/platform.h:100:1: error: stray ‘\350’ in program ./src/platform.h:100:1: error: stray ‘\373’ in program ./src/platform.h:100:1: error: stray ‘\374’ in program ./src/platform.h:100:1: error: stray ‘\325’ in program ./src/platform.h:100:1: error: stray ‘\355’ in program ./src/platform.h:100:1: error: stray ‘\235’ in program ./src/platform.h:100:1: error: stray ‘\272’ in program ./src/platform.h:100:1: error: stray ‘\353’ in program ./src/platform.h:100:1: error: stray ‘\343’ in program ./src/platform.h:100:1: error: stray ‘\270’ in program ./src/platform.h:100:1: error: stray ‘\334’ in program ./src/platform.h:100:1: error: stray ‘\251’ in program ./src/platform.h:100:1: error: stray ‘\233’ in program ./src/platform.h:100:79: warning: null character(s) ignored ./src/platform.h:100:1: error: stray ‘\201’ in program ./src/platform.h:100:1: error: stray ‘\265’ in program ./src/platform.h:100:1: error: stray ‘\377’ in program ./src/platform.h:100:1: error: stray ‘\22’ in program ./src/platform.h:100:1: error: stray ‘\316’ in program ./src/platform.h:100:1: error: stray ‘\343’ in program ./src/platform.h:100:1: error: stray ‘\305’ in program ./src/platform.h:100:1: error: stray ‘\353’ in program ./src/platform.h:100:1: error: stray ‘\256’ in program ./src/platform.h:100:1: error: stray ‘\334’ in program ./src/platform.h:100:1: error: stray ‘\216’ in program ./src/platform.h:100:1: error: stray ‘\277’ in program ./src/platform.h:100:1: error: stray ‘\313’ in program ./src/platform.h:100:1: error: stray ‘\302’ in program ./src/platform.h:100:1: error: stray ‘\204’ in program ./src/platform.h:100:1: error: stray ‘\372’ in program ./src/platform.h:100:1: error: stray ‘\241’ in program ./src/platform.h:100:1: error: stray ‘\247’ in program 
./src/platform.h:100:1: error: stray ‘\274’ in program ./src/platform.h:100:1: error: stray ‘\206’ in program ./src/platform.h:100:1: error: stray ‘\267’ in program ./src/platform.h:100:1: error: stray ‘\300’ in program ./src/platform.h:100:1: error: stray ‘\35’ in program ./src/platform.h:100:1: error: stray ‘\357’ in program ./src/platform.h:100:1: error: stray ‘\301’ in program ./src/platform.h:100:1: error: stray ‘\262’ in program ./src/platform.h:100:1: error: stray ‘`’ in program ./src/platform.h:100:1: error: stray ‘\363’ in program ./src/platform.h:100:1: error: stray ‘\343’ in program ./src/platform.h:100:1: error: stray ‘\353’ in program ./src/platform.h:100:1: error: stray ‘\226’ in program ./src/platform.h:100:1: error: stray ‘\372’ in program ./src/platform.h:100:1: error: stray ‘\267’ in program ./src/platform.h:100:1: error: stray ‘\341’ in program ./src/platform.h:100:1: error: stray ‘`’ in program ./src/platform.h:100:1: error: stray ‘\214’ in program ./src/platform.h:100:1: error: stray ‘\272’ in program ./src/platform.h:100:1: error: stray ‘\320’ in program ./src/platform.h:100:1: error: stray ‘\230’ in program ./src/platform.h:100:1: error: stray ‘\263’ in program ./src/platform.h:100:1: error: stray ‘\370’ in program ./src/platform.h:100:1: error: stray ‘\377’ in program ./src/platform.h:100:1: error: stray ‘\27’ in program ./src/platform.h:100:144: error: stray ‘@’ in program ./src/platform.h:100:1: error: stray ‘\266’ in program ./src/platform.h:100:146: warning: null character(s) ignored Makefile:2984: recipe for target 'NORMAL_O-THREADS_O.stamp' failed make[1]: *** [NORMAL_O-THREADS_O.stamp] Error 1 make[1]: Leaving directory '/home/ubuntu/Solaris2/mlucas-14.1' Makefile:2084: recipe for target 'all' failed make: *** [all] Error 2 [/CODE] |
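A note for anyone reading this log: the "stray ‘\203’ in program" and "null character(s) ignored" diagnostics mean the compiler is hitting raw non-text bytes in platform.h from line 89 onward, so the copy of the file on disk looks damaged rather than the code itself being wrong. Below is a minimal Python sketch for finding where a source file stops being plain text. The function name and the example path are my own illustration, not part of Mlucas, and the check assumes the header is meant to be pure ASCII, so a legitimate UTF-8 character in a comment would also be flagged.

```python
def first_binary_line(data: bytes):
    """Return the 1-based number of the first line containing a byte that is
    not printable ASCII, a tab, or a carriage return; None if there is none."""
    allowed = set(range(0x20, 0x7F)) | {0x09, 0x0D}
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        if any(byte not in allowed for byte in line):
            return lineno
    return None


if __name__ == "__main__":
    import sys

    # Example invocation (the path is only an illustration):
    #   python3 check_text.py src/platform.h
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            hit = first_binary_line(fh.read())
        if hit is None:
            print(f"{path}: plain ASCII text")
        else:
            print(f"{path}: non-text bytes start at line {hit}")
```

If the reported line matches the first line named in the compiler errors, unpacking a fresh copy of the source archive and comparing the two files is probably the quickest next step.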
And here is the log for a single compilation:
[CODE]../platform.h:89:1: error: stray ‘\34’ in program ../platform.h:89:1: error: stray ‘\203’ in program ../platform.h:89:1: error: stray ‘\265’ in program ../platform.h:89:1: error: stray ‘\254’ in program ../platform.h:89:1: error: stray ‘\221’ in program ../platform.h:89:102: error: invalid suffix "W" on integer constant ../platform.h:89:1: error: stray ‘\367’ in program ../platform.h:89:1: error: stray ‘\203’ in program ../platform.h:89:1: error: stray ‘\313’ in program ../platform.h:89:1: error: stray ‘\254’ in program ../platform.h:89:1: error: stray ‘\336’ in program ../platform.h:89:1: error: stray ‘\306’ in program ../platform.h:89:1: error: stray ‘\327’ in program ../platform.h:89:1: error: stray ‘\263’ in program ../platform.h:89:115: error: stray ‘@’ in program ../platform.h:89:1: error: stray ‘\340’ in program ../platform.h:89:1: error: stray ‘\270’ in program ../platform.h:89:1: error: stray ‘\253’ in program ../platform.h:89:1: error: stray ‘\305’ in program ../platform.h:89:1: error: stray ‘\201’ in program ../platform.h:89:1: error: stray ‘\253’ in program ../platform.h:89:1: error: stray ‘\346’ in program ../platform.h:89:1: error: stray ‘\2’ in program ../platform.h:89:1: error: stray ‘\31’ in program ../platform.h:89:133: error: stray ‘@’ in program ../platform.h:89:1: error: stray ‘\5’ in program ../platform.h:89:1: error: stray ‘\20’ in program ../platform.h:89:1: error: stray ‘\260’ in program ../platform.h:89:1: error: stray ‘\375’ in program ../platform.h:89:1: error: stray ‘\250’ in program ../platform.h:89:1: error: stray ‘\225’ in program ../platform.h:89:1: error: stray ‘\6’ in program ../platform.h:89:1: error: stray ‘\243’ in program ../platform.h:89:1: error: stray ‘\26’ in program ../platform.h:89:1: error: stray ‘\20’ in program ../platform.h:89:1: error: stray ‘\262’ in program ../platform.h:89:1: error: stray ‘\22’ in program ../platform.h:89:1: error: stray ‘\20’ in program ../platform.h:89:1: error: stray ‘\16’ in program 
../platform.h:89:1: error: stray ‘\30’ in program ../platform.h:89:1: error: stray ‘\222’ in program ../platform.h:89:1: error: stray ‘\252’ in program ../platform.h:89:1: error: stray ‘\231’ in program ../platform.h:89:1: error: stray ‘\347’ in program ../platform.h:89:1: error: stray ‘\24’ in program ../platform.h:89:1: error: stray ‘\355’ in program ../platform.h:89:1: error: stray ‘\245’ in program ../platform.h:89:1: error: stray ‘\31’ in program ../platform.h:89:1: error: stray ‘\236’ in program ../platform.h:89:1: error: stray ‘\203’ in program ../platform.h:89:1: error: stray ‘\231’ in program ../platform.h:89:1: error: stray ‘\324’ in program ../platform.h:89:1: error: stray ‘\345’ in program ../platform.h:89:1: error: stray ‘\236’ in program ../platform.h:89:1: error: stray ‘\7’ in program ../platform.h:89:1: error: stray ‘\361’ in program ../platform.h:89:1: error: stray ‘\224’ in program ../platform.h:89:1: error: stray ‘\24’ in program ../platform.h:89:1: error: stray ‘\7’ in program ../platform.h:89:1: error: stray ‘\247’ in program ../platform.h:89:1: error: stray ‘\356’ in program ../platform.h:89:1: error: stray ‘\355’ in program ../platform.h:89:1: error: stray ‘\331’ in program ../platform.h:89:1: error: stray ‘\313’ in program ../platform.h:89:1: error: stray ‘\325’ in program ../platform.h:89:1: error: stray ‘\253’ in program ../platform.h:89:1: error: stray ‘\10’ in program ../platform.h:89:1: error: stray ‘\246’ in program ../platform.h:89:1: error: stray ‘\233’ in program ../platform.h:89:1: error: stray ‘\342’ in program ../platform.h:89:1: error: stray ‘\366’ in program ../platform.h:89:1: error: stray ‘\242’ in program ../platform.h:89:1: error: stray ‘\321’ in program ../platform.h:89:1: error: stray ‘\323’ in program ../platform.h:89:1: error: stray ‘\264’ in program ../platform.h:89:1: error: stray ‘\240’ in program ../platform.h:89:1: error: stray ‘\224’ in program ../platform.h:89:1: error: stray ‘\366’ in program ../platform.h:89:1: 
error: stray ‘\30’ in program ../platform.h:89:1: error: stray ‘\345’ in program ../platform.h:89:1: error: stray ‘\305’ in program ../platform.h:89:1: error: stray ‘\263’ in program ../platform.h:89:1: error: stray ‘\251’ in program ../platform.h:89:1: error: stray ‘\212’ in program ../platform.h:89:1: error: stray ‘\347’ in program ../platform.h:89:1: error: stray ‘\22’ in program ../platform.h:89:1: error: stray ‘\23’ in program ../platform.h:89:1: error: stray ‘\32’ in program ../platform.h:89:235: error: expected identifier or ‘(’ before ‘)’ token ../platform.h:89:235: error: stray ‘\221’ in program ../platform.h:89:240: warning: missing terminating " character ../platform.h:89:235: error: missing terminating " character ../platform.h:90:1: error: stray ‘\264’ in program ../platform.h:90:1: error: stray ‘\334’ in program ../platform.h:90:1: error: stray ‘\334’ in program ../platform.h:90:1: error: stray ‘\322’ in program ../platform.h:90:1: error: stray ‘\362’ in program ../platform.h:90:1: error: stray ‘\271’ in program ../platform.h:90:1: error: stray ‘\277’ in program ../platform.h:90:1: error: stray ‘\335’ in program ../platform.h:90:1: error: stray ‘\326’ in program ../platform.h:90:1: error: stray ‘\262’ in program ../platform.h:90:1: error: stray ‘\374’ in program ../platform.h:90:1: error: stray ‘\355’ in program ../platform.h:90:1: error: stray ‘\266’ in program ../platform.h:90:1: error: stray ‘\227’ in program ../platform.h:90:1: error: stray ‘\265’ in program ../platform.h:90:1: error: stray ‘\351’ in program ../platform.h:90:1: error: stray ‘\267’ in program ../platform.h:90:1: error: stray ‘\263’ in program ../platform.h:90:1: error: stray ‘\254’ in program ../platform.h:90:1: error: stray ‘\335’ in program ../platform.h:90:1: error: stray ‘\253’ in program ../platform.h:90:1: error: stray ‘\314’ in program ../platform.h:90:1: error: stray ‘\23’ in program ../platform.h:90:1: error: stray ‘\255’ in program ../platform.h:90:1: error: stray ‘\303’ 
in program ../platform.h:90:1: error: stray ‘\301’ in program ../platform.h:90:1: error: stray ‘\360’ in program ../platform.h:90:1: error: stray ‘\231’ in program ../platform.h:90:1: error: stray ‘\316’ in program ../platform.h:90:1: error: stray ‘\7’ in program ../platform.h:90:1: error: stray ‘\256’ in program ../platform.h:90:1: error: stray ‘\321’ in program ../platform.h:90:1: error: stray ‘\256’ in program ../platform.h:90:1: error: stray ‘\243’ in program ../platform.h:90:1: error: stray ‘\240’ in program ../platform.h:90:1: error: stray ‘\311’ in program ../platform.h:90:1: error: stray ‘\211’ in program ../platform.h:90:1: error: stray ‘\220’ in program ../platform.h:90:52: warning: missing terminating " character ../platform.h:90:1: error: missing terminating " character ../platform.h:91:1: error: stray ‘\305’ in program ../platform.h:91:1: error: stray ‘\260’ in program ../platform.h:91:1: error: stray ‘\220’ in program ../platform.h:91:1: error: stray ‘\262’ in program ../platform.h:91:1: error: stray ‘\262’ in program ../platform.h:91:1: error: stray ‘\274’ in program ../platform.h:91:10: warning: null character(s) ignored ../platform.h:91:1: error: stray ‘\266’ in program ../platform.h:91:1: error: stray ‘\367’ in program ../platform.h:91:1: error: stray ‘\327’ in program ../platform.h:91:1: error: stray ‘\217’ in program ../platform.h:91:1: error: stray ‘\313’ in program ../platform.h:91:1: error: stray ‘\200’ in program ../platform.h:91:19: warning: null character(s) ignored ../platform.h:91:1: error: stray ‘\264’ in program ../platform.h:91:1: error: stray ‘\207’ in program ../platform.h:91:1: error: stray ‘\313’ in program ../platform.h:91:1: error: stray ‘\252’ in program ../platform.h:91:1: error: stray ‘\321’ in program ../platform.h:91:1: error: stray ‘\201’ in program ../platform.h:91:29: error: invalid suffix "w" on integer constant ../platform.h:91:1: error: stray ‘\377’ in program ../platform.h:91:1: error: stray ‘\26’ in program 
../platform.h:91:1: error: stray ‘\373’ in program ../platform.h:91:1: error: stray ‘\321’ in program ../platform.h:91:1: error: stray ‘\10’ in program ../platform.h:91:1: error: stray ‘\213’ in program ../platform.h:91:1: error: stray ‘\201’ in program ../platform.h:91:1: error: stray ‘\210’ in program ../platform.h:91:1: error: stray ‘\340’ in program ../platform.h:91:1: error: stray ‘\236’ in program ../platform.h:91:43: error: stray ‘@’ in program ../platform.h:91:1: error: stray ‘\254’ in program ../platform.h:91:1: error: stray ‘\241’ in program ../platform.h:91:1: error: stray ‘\3’ in program ../platform.h:91:1: error: stray ‘\214’ in program ../platform.h:91:1: error: stray ‘\315’ in program ../platform.h:91:1: error: stray ‘\342’ in program ../platform.h:91:1: error: stray ‘\340’ in program ../platform.h:91:1: error: stray ‘\4’ in program ../platform.h:91:1: error: stray ‘\220’ in program ../platform.h:91:1: error: stray ‘\330’ in program ../platform.h:91:1: error: stray ‘\260’ in program ../platform.h:91:1: error: stray ‘\356’ in program ../platform.h:91:1: error: stray ‘\343’ in program ../platform.h:91:1: error: stray ‘\366’ in program ../platform.h:91:1: error: stray ‘\26’ in program ../platform.h:91:1: error: stray ‘\6’ in program ../platform.h:91:1: error: stray ‘\21’ in program ../platform.h:91:1: error: stray ‘\363’ in program ../platform.h:91:1: error: stray ‘\375’ in program ../platform.h:91:1: error: stray ‘\234’ in program ../platform.h:91:1: error: stray ‘\241’ in program ../platform.h:91:1: error: stray ‘\246’ in program ../platform.h:91:1: error: stray ‘\367’ in program ../platform.h:91:1: error: stray ‘\361’ in program ../platform.h:91:1: error: stray ‘\335’ in program ../platform.h:91:1: error: stray ‘\330’ in program ../platform.h:91:1: error: stray ‘\212’ in program ../platform.h:91:1: error: stray ‘\177’ in program ../platform.h:91:1: error: stray ‘\330’ in program ../platform.h:91:1: error: stray ‘\367’ in program ../platform.h:91:1: 
error: stray ‘\37’ in program ../platform.h:91:1: error: stray ‘\233’ in program ../platform.h:91:1: error: stray ‘\341’ in program ../platform.h:91:1: error: stray ‘\243’ in program ../platform.h:91:1: error: stray ‘\303’ in program ../platform.h:91:1: error: stray ‘\275’ in program ../platform.h:91:1: error: stray ‘\3’ in program ../platform.h:91:91: error: stray ‘@’ in program ../platform.h:91:1: error: stray ‘\201’ in program ../platform.h:91:1: error: stray ‘\262’ in program ../platform.h:91:1: error: stray ‘\327’ in program ../platform.h:91:1: error: stray ‘\1’ in program ../platform.h:91:1: error: stray ‘\263’ in program ../platform.h:91:1: error: stray ‘\335’ in program ../platform.h:91:1: error: stray ‘\5’ in program ../platform.h:91:1: error: stray ‘\227’ in program ../platform.h:91:1: error: stray ‘\2’ in program ../platform.h:91:103: warning: missing terminating ' character ../platform.h:91:1: error: missing terminating ' character ../platform.h:92:1: error: stray ‘\244’ in program ../platform.h:92:1: error: stray ‘\6’ in program ../platform.h:92:1: error: stray ‘\223’ in program ../platform.h:92:1: error: stray ‘`’ in program ../platform.h:92:1: error: stray ‘\221’ in program ../platform.h:93:1: error: stray ‘\22’ in program ../platform.h:93:1: error: stray ‘\370’ in program ../platform.h:93:1: error: stray ‘\324’ in program ../platform.h:93:1: error: stray ‘\311’ in program ../platform.h:93:1: error: stray ‘\270’ in program ../platform.h:93:1: error: stray ‘\221’ in program ../platform.h:93:1: error: stray ‘\24’ in program ../platform.h:93:1: error: stray ‘\376’ in program ../platform.h:93:1: error: stray ‘\211’ in program ../platform.h:93:15: error: invalid suffix "de" on integer constant ../platform.h:93:1: error: stray ‘\’ in program ../platform.h:93:1: error: stray ‘\221’ in program ../platform.h:93:1: error: stray ‘\331’ in program ../platform.h:93:1: error: stray ‘\203’ in program ../platform.h:93:1: error: stray ‘\230’ in program 
../platform.h:93:1: error: stray ‘\30’ in program ../platform.h:93:1: error: stray ‘\23’ in program ../platform.h:93:1: error: stray ‘\333’ in program ../platform.h:93:1: error: stray ‘\251’ in program ../platform.h:93:1: error: stray ‘\26’ in program ../platform.h:93:1: error: stray ‘\225’ in program ../platform.h:93:1: error: stray ‘\300’ in program ../platform.h:93:1: error: stray ‘\251’ in program ../platform.h:93:1: error: stray ‘\333’ in program ../platform.h:93:1: error: stray ‘\262’ in program ../platform.h:93:1: error: stray ‘\1’ in program ../platform.h:93:1: error: stray ‘\237’ in program ../platform.h:93:1: error: stray ‘\204’ in program ../platform.h:93:1: error: stray ‘\255’ in program ../platform.h:93:1: error: stray ‘\214’ in program ../platform.h:93:1: error: stray ‘\27’ in program ../platform.h:93:1: error: stray ‘\247’ in program ../platform.h:93:1: error: stray ‘\343’ in program ../platform.h:93:1: error: stray ‘\177’ in program ../platform.h:93:1: error: stray ‘\210’ in program ../platform.h:93:1: error: stray ‘\346’ in program ../platform.h:93:1: error: stray ‘\362’ in program ../platform.h:93:1: error: stray ‘\6’ in program ../platform.h:93:1: error: stray ‘\254’ in program ../platform.h:93:1: error: stray ‘\233’ in program ../platform.h:93:1: error: stray ‘\204’ in program ../platform.h:93:1: error: stray ‘\243’ in program ../platform.h:93:1: error: stray ‘\210’ in program ../platform.h:93:1: error: stray ‘\352’ in program ../platform.h:93:1: error: stray ‘\346’ in program ../platform.h:93:1: error: stray ‘\223’ in program ../platform.h:93:1: error: stray ‘\330’ in program ../platform.h:93:1: error: stray ‘\20’ in program ../platform.h:93:1: error: stray ‘\205’ in program ../platform.h:93:1: error: stray ‘\367’ in program ../platform.h:93:1: error: stray ‘\271’ in program ../platform.h:93:1: error: stray ‘\305’ in program ../platform.h:93:1: error: stray ‘\207’ in program ../platform.h:93:1: error: stray ‘\35’ in program ../platform.h:93:1: 
error: stray ‘\6’ in program ../platform.h:93:100: error: stray ‘#’ in program ../platform.h:93:1: error: stray ‘\361’ in program ../platform.h:93:1: error: stray ‘\346’ in program ../platform.h:93:1: error: stray ‘\236’ in program ../platform.h:93:1: error: stray ‘\177’ in program ../platform.h:93:1: error: stray ‘\370’ in program ../platform.h:93:1: error: stray ‘\230’ in program ../platform.h:94:1: error: stray ‘\303’ in program ../platform.h:94:1: error: stray ‘\376’ in program ../platform.h:94:1: error: stray ‘\264’ in program ../platform.h:94:1: error: stray ‘\26’ in program ../platform.h:94:1: error: stray ‘\366’ in program ../platform.h:94:1: error: stray ‘\213’ in program ../platform.h:94:1: error: stray ‘\205’ in program ../platform.h:94:1: error: stray ‘\366’ in program ../platform.h:94:1: error: stray ‘\354’ in program ../platform.h:94:1: error: stray ‘\352’ in program ../platform.h:94:1: error: stray ‘\347’ in program ../platform.h:94:1: error: stray ‘\260’ in program ../platform.h:94:1: error: stray ‘\205’ in program ../platform.h:94:1: error: stray ‘\6’ in program ../platform.h:94:1: error: stray ‘\347’ in program ../platform.h:94:1: error: stray ‘\244’ in program ../platform.h:94:1: error: stray ‘\317’ in program ../platform.h:94:22: error: stray ‘#’ in program ../platform.h:94:1: error: stray ‘\331’ in program ../platform.h:94:1: error: stray ‘\276’ in program ../platform.h:94:1: error: stray ‘\322’ in program ../platform.h:94:1: error: stray ‘\264’ in program ../platform.h:94:1: error: stray ‘\24’ in program ../platform.h:94:1: error: stray ‘\7’ in program ../platform.h:94:1: error: stray ‘\1’ in program ../platform.h:94:1: error: stray ‘\252’ in program ../platform.h:94:1: error: stray ‘\371’ in program ../platform.h:95:1: error: stray ‘\200’ in program ../platform.h:95:1: error: stray ‘\335’ in program ../platform.h:95:1: error: stray ‘\303’ in program ../platform.h:95:6: error: stray ‘@’ in program ../platform.h:95:1: error: stray ‘\360’ in 
program ../platform.h:95:1: error: stray ‘\347’ in program ../platform.h:95:1: error: stray ‘\25’ in program ../platform.h:95:1: error: stray ‘\261’ in program ../platform.h:95:1: error: stray ‘\265’ in program ../platform.h:95:1: error: stray ‘\261’ in program ../platform.h:95:1: error: stray ‘\271’ in program ../platform.h:95:1: error: stray ‘\340’ in program ../platform.h:95:1: error: stray ‘\217’ in program ../platform.h:95:1: error: stray ‘\235’ in program ../platform.h:95:1: error: stray ‘\212’ in program ../platform.h:95:1: error: stray ‘\26’ in program ../platform.h:95:1: error: stray ‘\272’ in program ../platform.h:95:1: error: stray ‘\353’ in program ../platform.h:95:1: error: stray ‘\260’ in program ../platform.h:95:1: error: stray ‘\203’ in program ../platform.h:96:1: error: stray ‘\352’ in program ../platform.h:96:1: error: stray ‘\303’ in program ../platform.h:96:1: error: stray ‘\373’ in program ../platform.h:96:1: error: stray ‘\376’ in program ../platform.h:96:1: error: stray ‘\360’ in program ../platform.h:96:1: error: stray ‘\347’ in program ../platform.h:96:1: error: stray ‘\32’ in program ../platform.h:96:1: error: stray ‘\324’ in program ../platform.h:96:1: error: stray ‘\340’ in program ../platform.h:96:1: error: stray ‘`’ in program ../platform.h:96:1: error: stray ‘\243’ in program ../platform.h:96:1: error: stray ‘\262’ in program ../platform.h:96:1: error: stray ‘\347’ in program ../platform.h:96:1: error: stray ‘\352’ in program ../platform.h:96:1: error: stray ‘\217’ in program ../platform.h:96:1: error: stray ‘\21’ in program ../platform.h:97:1: error: stray ‘\305’ in program ../platform.h:97:1: error: stray ‘\255’ in program ../platform.h:97:1: error: stray ‘\312’ in program ../platform.h:97:9: error: stray ‘#’ in program ../platform.h:97:1: error: stray ‘\372’ in program ../platform.h:97:1: error: stray ‘\375’ in program ../platform.h:97:1: error: stray ‘\202’ in program ../platform.h:97:1: error: stray ‘\24’ in program 
../platform.h:97:1: error: stray ‘\323’ in program ../platform.h:97:1: error: stray ‘\260’ in program ../platform.h:97:19: warning: null character(s) ignored ../platform.h:97:1: error: stray ‘\232’ in program ../platform.h:97:1: error: stray ‘\35’ in program ../platform.h:97:1: error: stray ‘\274’ in program ../platform.h:97:1: error: stray ‘\352’ in program ../platform.h:97:1: error: stray ‘\204’ in program ../platform.h:97:1: error: stray ‘\364’ in program ../platform.h:97:1: error: stray ‘\261’ in program ../platform.h:97:1: error: stray ‘\346’ in program ../platform.h:97:1: error: stray ‘\320’ in program ../platform.h:97:1: error: stray ‘\277’ in program ../platform.h:97:37: warning: missing terminating " character ../platform.h:97:1: error: missing terminating " character ../platform.h:98:1: error: stray ‘\353’ in program ../platform.h:98:1: error: stray ‘\320’ in program ../platform.h:98:3: error: invalid suffix "F" on integer constant ../platform.h:98:1: error: stray ‘\254’ in program ../platform.h:98:1: error: stray ‘\341’ in program ../platform.h:98:1: error: stray ‘\337’ in program ../platform.h:98:1: error: stray ‘\201’ in program ../platform.h:98:1: error: stray ‘\250’ in program ../platform.h:98:1: error: stray ‘\16’ in program ../platform.h:98:1: error: stray ‘\206’ in program ../platform.h:98:1: error: stray ‘\360’ in program ../platform.h:98:1: error: stray ‘\351’ in program ../platform.h:98:1: error: stray ‘\252’ in program ../platform.h:98:1: error: stray ‘\203’ in program ../platform.h:98:1: error: stray ‘\356’ in program ../platform.h:98:1: error: stray ‘\200’ in program ../platform.h:98:1: error: stray ‘\377’ in program ../platform.h:98:1: error: stray ‘\216’ in program ../platform.h:98:1: error: stray ‘\233’ in program ../platform.h:98:1: error: stray ‘\177’ in program ../platform.h:98:1: error: stray ‘\201’ in program ../platform.h:98:1: error: stray ‘\3’ in program ../platform.h:98:1: error: stray ‘\275’ in program ../platform.h:98:1: error: 
stray ‘\236’ in program ../platform.h:98:1: error: stray ‘\4’ in program ../platform.h:98:1: error: stray ‘\303’ in program ../platform.h:98:1: error: stray ‘\360’ in program ../platform.h:98:1: error: stray ‘\232’ in program ../platform.h:98:1: error: stray ‘\374’ in program ../platform.h:98:1: error: stray ‘\2’ in program ../platform.h:98:39: warning: character constant too long for its type ../platform.h:98:1: error: stray ‘\250’ in program ../platform.h:98:1: error: stray ‘\3’ in program ../platform.h:98:1: error: stray ‘\330’ in program ../platform.h:98:1: error: stray ‘\225’ in program ../platform.h:98:1: error: stray ‘\36’ in program ../platform.h:98:1: error: stray ‘\257’ in program ../platform.h:98:1: error: stray ‘\31’ in program ../platform.h:98:1: error: stray ‘\344’ in program ../platform.h:98:1: error: stray ‘\20’ in program ../platform.h:98:1: error: stray ‘\345’ in program ../platform.h:98:1: error: stray ‘\10’ in program ../platform.h:98:1: error: stray ‘\216’ in program ../platform.h:98:1: error: stray ‘\355’ in program ../platform.h:98:1: error: stray ‘\323’ in program ../platform.h:98:1: error: stray ‘\304’ in program ../platform.h:98:128: warning: missing terminating ' character ../platform.h:98:1: error: missing terminating ' character ../platform.h:99:1: warning: missing terminating ' character ../platform.h:99:1: error: missing terminating ' character ../platform.h:100:1: error: stray ‘\271’ in program ../platform.h:100:1: error: stray ‘\351’ in program ../platform.h:100:1: error: stray ‘\376’ in program ../platform.h:100:1: error: stray ‘\277’ in program ../platform.h:100:1: error: stray ‘\31’ in program ../platform.h:100:1: error: stray ‘\335’ in program ../platform.h:100:1: error: stray ‘\177’ in program ../platform.h:100:1: error: stray ‘\275’ in program ../platform.h:100:1: error: stray ‘\202’ in program ../platform.h:100:1: error: stray ‘\32’ in program ../platform.h:100:1: error: stray ‘\26’ in program ../platform.h:100:1: error: 
stray ‘\331’ in program ../platform.h:100:1: error: stray ‘\276’ in program ../platform.h:100:1: error: stray ‘\226’ in program ../platform.h:100:1: error: stray ‘\346’ in program ../platform.h:100:1: error: stray ‘\342’ in program ../platform.h:100:1: error: stray ‘\355’ in program ../platform.h:100:1: error: stray ‘\247’ in program ../platform.h:100:1: error: stray ‘\273’ in program ../platform.h:100:1: error: stray ‘\304’ in program ../platform.h:100:1: error: stray ‘\243’ in program ../platform.h:100:1: error: stray ‘\345’ in program ../platform.h:100:1: error: stray ‘\37’ in program ../platform.h:100:1: error: stray ‘\213’ in program ../platform.h:100:1: error: stray ‘\227’ in program ../platform.h:100:1: error: stray ‘\357’ in program ../platform.h:100:1: error: stray ‘\21’ in program ../platform.h:100:1: error: stray ‘\210’ in program ../platform.h:100:1: error: stray ‘\30’ in program ../platform.h:100:1: error: stray ‘\253’ in program ../platform.h:100:1: error: stray ‘\345’ in program ../platform.h:100:1: error: stray ‘\350’ in program ../platform.h:100:1: error: stray ‘\246’ in program ../platform.h:100:1: error: stray ‘\260’ in program ../platform.h:100:1: error: stray ‘\337’ in program ../platform.h:100:1: error: stray ‘\350’ in program ../platform.h:100:1: error: stray ‘\373’ in program ../platform.h:100:1: error: stray ‘\374’ in program ../platform.h:100:1: error: stray ‘\325’ in program ../platform.h:100:1: error: stray ‘\355’ in program ../platform.h:100:1: error: stray ‘\235’ in program ../platform.h:100:1: error: stray ‘\272’ in program ../platform.h:100:1: error: stray ‘\353’ in program ../platform.h:100:1: error: stray ‘\343’ in program ../platform.h:100:1: error: stray ‘\270’ in program ../platform.h:100:1: error: stray ‘\334’ in program ../platform.h:100:1: error: stray ‘\251’ in program ../platform.h:100:1: error: stray ‘\233’ in program ../platform.h:100:79: warning: null character(s) ignored ../platform.h:100:1: error: stray ‘\201’ in 
program ../platform.h:100:1: error: stray ‘\265’ in program ../platform.h:100:1: error: stray ‘\377’ in program ../platform.h:100:1: error: stray ‘\22’ in program ../platform.h:100:1: error: stray ‘\316’ in program ../platform.h:100:1: error: stray ‘\343’ in program ../platform.h:100:1: error: stray ‘\305’ in program ../platform.h:100:1: error: stray ‘\353’ in program ../platform.h:100:1: error: stray ‘\256’ in program ../platform.h:100:1: error: stray ‘\334’ in program ../platform.h:100:1: error: stray ‘\216’ in program ../platform.h:100:1: error: stray ‘\277’ in program ../platform.h:100:1: error: stray ‘\313’ in program ../platform.h:100:1: error: stray ‘\302’ in program ../platform.h:100:1: error: stray ‘\204’ in program ../platform.h:100:1: error: stray ‘\372’ in program ../platform.h:100:1: error: stray ‘\241’ in program ../platform.h:100:1: error: stray ‘\247’ in program ../platform.h:100:1: error: stray ‘\274’ in program ../platform.h:100:1: error: stray ‘\206’ in program ../platform.h:100:1: error: stray ‘\267’ in program ../platform.h:100:1: error: stray ‘\300’ in program ../platform.h:100:1: error: stray ‘\35’ in program ../platform.h:100:1: error: stray ‘\357’ in program ../platform.h:100:1: error: stray ‘\301’ in program ../platform.h:100:1: error: stray ‘\262’ in program ../platform.h:100:1: error: stray ‘`’ in program ../platform.h:100:1: error: stray ‘\363’ in program ../platform.h:100:1: error: stray ‘\343’ in program ../platform.h:100:1: error: stray ‘\353’ in program ../platform.h:100:1: error: stray ‘\226’ in program ../platform.h:100:1: error: stray ‘\372’ in program ../platform.h:100:1: error: stray ‘\267’ in program ../platform.h:100:1: error: stray ‘\341’ in program ../platform.h:100:1: error: stray ‘`’ in program ../platform.h:100:1: error: stray ‘\214’ in program ../platform.h:100:1: error: stray ‘\272’ in program ../platform.h:100:1: error: stray ‘\320’ in program ../platform.h:100:1: error: stray ‘\230’ in program ../platform.h:100:1: 
error: stray ‘\263’ in program ../platform.h:100:1: error: stray ‘\370’ in program ../platform.h:100:1: error: stray ‘\377’ in program ../platform.h:100:1: error: stray ‘\27’ in program ../platform.h:100:144: error: stray ‘@’ in program ../platform.h:100:1: error: stray ‘\266’ in program ../platform.h:100:146: warning: null character(s) ignored ubuntu@pine64:~/Solaris2/mlucas-14.1/src/MY_OBJ$ [/CODE] |
[QUOTE=ewmayer;454769]I'm not familiar enough with ARM to understand why -m64 is unsupported in GCC, but correctly handling aarch64 in platform.h will cause the build to be in 64-bit mode. (I had assumed -m64 was needed to trigger the aarch64-related predefs, but your output from [1] will settle that.)[/QUOTE]
GCC for ARM comes in two flavors: one targets 64-bit code and the other targets 32-bit code, so there is no need for -m64 or -m32. |
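Since the toolchain itself fixes the target word size, a build can check what it got from the compiler's predefines instead of relying on -m64 or -m32. Below is a minimal sketch, assuming a GCC-compatible compiler. The macro SKETCH_OS_BITS and the helper functions are hypothetical names made up for illustration and are not taken from platform.h.

```c
#include <stdio.h>

/* Hypothetical macro (not from Mlucas' platform.h): derive the word size
   from predefines that GCC-compatible compilers set for 64-bit targets. */
#if defined(__aarch64__) || defined(__x86_64__) || defined(__LP64__) || defined(_WIN64)
	#define SKETCH_OS_BITS 64
#else
	#define SKETCH_OS_BITS 32
#endif

/* Word size according to the predefines above. */
int sketch_os_bits(void)
{
	return SKETCH_OS_BITS;
}

/* Word size according to the pointer width the compiler actually uses. */
int sketch_pointer_bits(void)
{
	return (int)(sizeof(void *) * 8);
}

/* Print both values; on a consistently configured toolchain they agree. */
void sketch_report(void)
{
	printf("predefines say %d-bit, pointers are %d-bit\n",
	       sketch_os_bits(), sketch_pointer_bits());
}
```

Piping the output of gcc -dM -E - < /dev/null through grep -i aarch64 on the target shows whether the installed compiler sets the __aarch64__ predefine.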
[QUOTE=ewmayer;454710]gcc -c -Os -m64 -DUSE_THREADS ../Mlucas.c[/QUOTE]
For that to succeed, you need this:
[code]$ diff platform.h~ platform.h
714a715,728
> #elif defined(__AARCH64EL__)
> #ifndef OS_BITS
> #define OS_BITS 32
> #endif
> #define CPU_TYPE
> #define CPU_IS_ARM_EABI
> #if(defined(__GNUC__) || defined(__GNUG__))
> #define COMPILER_TYPE
> #define COMPILER_TYPE_GCC
> #else
> #define COMPILER_TYPE
> #define COMPILER_TYPE_UNKNOWN
> #endif
>
[/code]
And it compiles:
[code]$ aarch64-none-linux-gnu-gcc -Os -DUSE_THREADS -c *.c
$ aarch64-none-linux-gnu-gcc -Os -DUSE_THREADS *.o -o mlucas64 -lm -lpthread
$ file mlucas64
mlucas64: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, for GNU/Linux 3.7.0, not stripped
[/code]
Tested with QEMU, it starts but I have no clue how I should launch the binary to do something sensible that doesn't take forever :) |
[QUOTE=Lorenzo;454809]Ok! I have done!
[CODE]ubuntu@pine64:~/Solaris2/mlucas-14.1$ gcc -dM -E - < /dev/null [snip][/QUOTE]
Thanks! The key predefine there is __aarch64__, which is also the trigger in the .h file I posted, so the latter should allow you to build. So I don't understand the raft of 'stray character' errors you get with that one. Here are lines 88-90 of that header:
[code]#elif(defined(_AIX))
#define OS_TYPE
#define OS_TYPE_AIX[/code]
Can you open both the original and new .h in an editor, and compare the file encodings? If those are the same, can you diff your local copies of those 2 file versions? Maybe that will reveal something relevant to the stray-octal errors you are getting.
[QUOTE=ldesnogu;454816]For that to succeed, you need this:
[code]$ diff platform.h~ platform.h
714a715,728
> #elif defined(__AARCH64EL__)
> #ifndef OS_BITS
> #define OS_BITS 32
> #endif
> #define CPU_TYPE
> #define CPU_IS_ARM_EABI
> #if(defined(__GNUC__) || defined(__GNUG__))
> #define COMPILER_TYPE
> #define COMPILER_TYPE_GCC
> #else
> #define COMPILER_TYPE
> #define COMPILER_TYPE_UNKNOWN
> #endif
>
[/code]
And it compiles:
[code]$ aarch64-none-linux-gnu-gcc -Os -DUSE_THREADS -c *.c
$ aarch64-none-linux-gnu-gcc -Os -DUSE_THREADS *.o -o mlucas64 -lm -lpthread
$ file mlucas64
mlucas64: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, for GNU/Linux 3.7.0, not stripped
[/code]
Tested with QEMU, it starts but I have no clue how I should launch the binary to do something sensible that doesn't take forever :)[/QUOTE]
That sets the wrong value of OS_BITS (32 rather than 64). For basic C-code Mlucas builds that won't matter much, except for various utility functions which make heavy use of 64-bit-int math (e.g. the quad-float library used for high-precision inits of double constants), but for future asm-code builds we need the right bitness to be set.
The predef section beginning at line 792 in the .h I posted should work just fine for Lorenzo, and you as well - did you try building with that, or did you just make your mod above and use it? Please try the unmodified .h file - the one with the __aarch64__ predef stuff at line 792 and let me know if you get the same unrecognized-char errors as Lorenzo. You can quick-test the binary by trying some timing runs at a specific FFT length, say ./Mlucas -fftlen 1024 -nthread 1 will try all radix combos available @1024K and write the best-timing one to the mlucas.cfg file. You can also play with the threadcount - note the default there is to try to use all available cores. |
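To illustrate why the bitness setting matters for utility code built on 64-bit integer math, here is a minimal portable sketch of a 64x64 -> 128-bit multiply assembled from 32-bit halves. A build that believes it is 32-bit tends to need this kind of decomposition, whereas a genuine 64-bit target can usually get the same result from a hardware multiply-high. The function name sketch_mul_lohi64 is hypothetical, and this is not the Mlucas MUL_LOHI64 macro, whose actual definition is not shown in this thread.

```c
#include <stdint.h>

/* Hypothetical helper (not Mlucas code): full 128-bit product of two
   unsigned 64-bit integers, returned as low and high 64-bit halves. */
void sketch_mul_lohi64(uint64_t a, uint64_t b, uint64_t *lo, uint64_t *hi)
{
	/* Split each operand into 32-bit halves. */
	uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
	uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;

	/* Four 32x32 -> 64-bit partial products; none of them can overflow. */
	uint64_t p0 = a_lo * b_lo;
	uint64_t p1 = a_lo * b_hi;
	uint64_t p2 = a_hi * b_lo;
	uint64_t p3 = a_hi * b_hi;

	/* Sum the terms that land in bits 32..63, keeping the carry in mid. */
	uint64_t mid = (p0 >> 32) + (p1 & 0xFFFFFFFFu) + (p2 & 0xFFFFFFFFu);

	*lo = (mid << 32) | (p0 & 0xFFFFFFFFu);
	*hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}
```

The middle sum is at most three values below 2^32, so it fits comfortably in 64 bits and its upper half is exactly the carry into the high word.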
[QUOTE=ewmayer;454822]That sets the wrong value of OS_BITS - for basic C-code Mlucas builds that won't matter much except for various utility functions which make heavy use of 64-bit-int math (e.g. the quad-float library used for high-precision inits of double constants), but for future asm-code builds we need the right bitness to be set. The predef section beginning at line 792 in the .h I posted should work just fine for Lorenzo, and you as well - did you try building with that, or did you just make your mod above and use it? Please try the unmodified .h file - the one with the __aarch64__ predef stuff at line 792 and let me know if you get the same unrecognized-char errors as Lorenzo.[/QUOTE]Silly me, I had missed your attachment. It compiles fine with it. So Lorenzo's error comes from somewhere else.
[quote]You can quick-test the binary by trying some timing runs at a specific FFT length, say ./Mlucas -fftlen 1024 -nthread 1 will try all radix combos available @1024K and write the best-timing one to the mlucas.cfg file. You can also play with the threadcount - note the default there is to try to use all available cores.[/quote][code]/work/qemu/qemu/aarch64-linux-user/qemu-aarch64 -L /work/Cross/fsf-6.169/aarch64-none-linux-gnu/libc ./mlucas64 -fftlen 1024 -nthread 1 -iters 1 Mlucas 14.1 http://hogranch.com/mayer/README.html INFO: testing qfloat routines... CPU Family = ARM Embedded ABI, OS = Linux, 64-bit Version, compiled with Gnu C [or other compatible], Version 6.3.1 20170118. INFO: Using inline-macro form of MUL_LOHI64. INFO: MLUCAS_PATH is set to "" INFO: using 53-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation. INFO: testing IMUL routines... INFO: System has 4 available processor cores. INFO: testing FFT radix tables...[/code]All MaxErr are at 0. |
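The startup log above mentions a "53-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation". The Mlucas source for this is not shown in the thread, so the following is only a sketch of the well-known add-and-subtract rounding trick that such a message usually refers to. It assumes IEEE 754 doubles, the default round-to-nearest mode, and inputs of magnitude below 2^51. The names SKETCH_RND and sketch_dnint are hypothetical.

```c
/* 1.5 * 2^52. Adding it to a double of magnitude below 2^51 yields a sum
   in [2^52, 2^53), where adjacent doubles are exactly 1 apart, so the
   fractional part is rounded away by the addition itself. */
static const double SKETCH_RND = 6755399441055744.0;

/* Hypothetical helper (not Mlucas code): round to nearest integer,
   ties to even, without calling a library rounding function. */
double sketch_dnint(double x)
{
	/* volatile forces the sum to be stored as a 53-bit double and keeps
	   the compiler from folding (x + c) - c back into x. */
	volatile double t = x + SKETCH_RND;
	return t - SKETCH_RND;
}
```

The appeal of this form is that it needs only one add and one subtract, which also map directly onto SIMD lanes.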
Thanks, Laurent - so I suspect a file-encoding issue, with Lorenzo's .h file downloaded from my post, or perhaps his unzip utility inserted a bunch of garbage chars.
|
[QUOTE=ewmayer;454827]Thanks, Laurent - so I suspect a file-encoding issue, with Lorenzo's .h file downloaded from my post, or perhaps his unzip utility inserted a bunch of garbage chars.[/QUOTE]
Right! Sorry, found the issue. It's working nicely :) So without SIMD optimization it looks like:
[CODE]ubuntu@pine64:~/Solaris2/mlucas-14.1$ cat mlucas.cfg
14.1
1024 msec/iter = 114.57 ROE[avg,max] = [0.250000000, 0.250000000] radices = 32 32 16 32 0 0 0 0 0 0
1152 msec/iter = 109.04 ROE[avg,max] = [0.206808036, 0.250000000] radices = 288 8 16 16 0 0 0 0 0 0
1280 msec/iter = 133.03 ROE[avg,max] = [0.236600167, 0.281250000] radices = 160 16 16 16 0 0 0 0 0 0
1408 msec/iter = 140.47 ROE[avg,max] = [0.273688616, 0.343750000] radices = 176 16 16 16 0 0 0 0 0 0
1536 msec/iter = 161.30 ROE[avg,max] = [0.223493304, 0.281250000] radices = 192 16 16 16 0 0 0 0 0 0
1664 msec/iter = 166.09 ROE[avg,max] = [0.246149554, 0.312500000] radices = 208 16 16 16 0 0 0 0 0 0
1792 msec/iter = 180.60 ROE[avg,max] = [0.220703125, 0.281250000] radices = 224 16 16 16 0 0 0 0 0 0
1920 msec/iter = 198.81 ROE[avg,max] = [0.222460938, 0.250000000] radices = 240 16 16 16 0 0 0 0 0 0
2048 msec/iter = 206.38 ROE[avg,max] = [0.278125000, 0.281250000] radices = 256 16 16 16 0 0 0 0 0 0
2304 msec/iter = 242.52 ROE[avg,max] = [0.208269392, 0.250000000] radices = 288 16 16 16 0 0 0 0 0 0
2560 msec/iter = 308.94 ROE[avg,max] = [0.243164062, 0.281250000] radices = 160 16 16 32 0 0 0 0 0 0
2816 msec/iter = 329.54 ROE[avg,max] = [0.272896903, 0.343750000] radices = 176 16 16 32 0 0 0 0 0 0
3072 msec/iter = 371.71 ROE[avg,max] = [0.225892857, 0.281250000] radices = 192 16 16 32 0 0 0 0 0 0
3328 msec/iter = 388.66 ROE[avg,max] = [0.241322545, 0.281250000] radices = 208 16 16 32 0 0 0 0 0 0
3584 msec/iter = 414.33 ROE[avg,max] = [0.220870536, 0.250000000] radices = 224 16 16 32 0 0 0 0 0 0
3840 msec/iter = 453.97 ROE[avg,max] = [0.213636998, 0.265625000] radices = 240 16 16 32 0 0 0 0 0 0
4096 msec/iter = 472.52 ROE[avg,max] = [0.247321429, 0.250000000] radices = 256 16 16 32 0 0 0 0 0 0
4608 msec/iter = 544.08 ROE[avg,max] = [0.201870292, 0.222656250] radices = 288 16 16 32 0 0 0 0 0 0
5120 msec/iter = 673.79 ROE[avg,max] = [0.239508929, 0.312500000] radices = 160 16 32 32 0 0 0 0 0 0
5632 msec/iter = 693.38 ROE[avg,max] = [0.278264509, 0.343750000] radices = 176 16 32 32 0 0 0 0 0 0
6144 msec/iter = 776.30 ROE[avg,max] = [0.213504464, 0.250000000] radices = 192 16 32 32 0 0 0 0 0 0
6656 msec/iter = 814.97 ROE[avg,max] = [0.242299107, 0.281250000] radices = 208 16 32 32 0 0 0 0 0 0
7168 msec/iter = 870.94 ROE[avg,max] = [0.219768415, 0.312500000] radices = 224 16 32 32 0 0 0 0 0 0
7680 msec/iter = 955.79 ROE[avg,max] = [0.222209821, 0.250000000] radices = 240 16 32 32 0 0 0 0 0 0
[/CODE] |
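For anyone who wants to compare such listings programmatically, the timing entries above follow a regular "FFT length, msec/iter, ROE, radices" layout. Below is a minimal sketch of pulling the FFT length and per-iteration time out of one line. It assumes only the layout visible in this listing. The struct and function names are hypothetical, and the sketch is not part of Mlucas.

```c
#include <stdio.h>

/* Hypothetical record for one timing entry of a mlucas.cfg file. */
struct sketch_cfg_entry {
	int    fftlen_k;   /* FFT length in K, e.g. 1024 */
	double msec_iter;  /* per-iteration time in milliseconds */
};

/* Returns 1 if the line has the "<fftlen> msec/iter = <time>" form and
   fills *out; returns 0 otherwise (e.g. for the "14.1" version line). */
int sketch_parse_cfg_line(const char *line, struct sketch_cfg_entry *out)
{
	return sscanf(line, "%d msec/iter = %lf",
	              &out->fftlen_k, &out->msec_iter) == 2;
}
```

The ROE and radices fields are deliberately ignored here, and lines that report sec/iter rather than msec/iter are rejected rather than converted.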
[QUOTE=Lorenzo;454841]Right! Sorry, found the issue. It's working nicely :) So without SIMD optimization it looks like:
[CODE]ubuntu@pine64:~/Solaris2/mlucas-14.1$ cat mlucas.cfg
14.1
1024 msec/iter = 114.57 ROE[avg,max] = [0.250000000, 0.250000000] radices = 32 32 16 32 0 0 0 0 0 0
1152 msec/iter = 109.04 ROE[avg,max] = [0.206808036, 0.250000000] radices = 288 8 16 16 0 0 0 0 0 0
1280 msec/iter = 133.03 ROE[avg,max] = [0.236600167, 0.281250000] radices = 160 16 16 16 0 0 0 0 0 0
[snip][/CODE][/QUOTE]
Glad to hear it - what was the issue with the updated .h file? I'd like to know in case another user hits something similar in future. The only timing that really pops out is the anomalously low one @1152K ... but SIMD timings will be the ones of real interest. How many threads did you run your self-test with? (Your screen output will indicate that, e.g. NTHREADS = {some value >= 1}.) |
The issue was that I had unzipped the file incorrectly. So in general it's OK.
I ran ./mlucas -s m. So it looks like Mlucas used 4 cores (threads) correctly. I haven't played with the thread count yet. So in general, very slow :mike: |
[QUOTE=Lorenzo;454845]I ran ./mlucas -s m. So it looks like Mlucas used 4 cores (threads) correctly. I haven't played with the thread count yet.
So in general, very slow :mike:[/QUOTE] Yes - even with a 2-3x speedup from use of SIMD, the ARM will be more about performance per watt (and per hardware $) than speed-per-core. |
[QUOTE=ewmayer;454847]Yes - even with a 2-3x speedup from use of SIMD, the ARM will be more about performance per watt (and per hardware $) than speed-per-core.[/QUOTE]
The following mlucas.cfg file was generated on a 2.8 GHz AMD Opteron running RedHat 64-bit linux. [code] 2048 sec/iter = 0.134 ROE[min,max] = [0.281250000, 0.343750000] radices = 32 32 32 32 0 0 0 0 0 0 [Any text offset from the list-ending 0 by whitespace is ignored] 2304 sec/iter = 0.148 ROE[min,max] = [0.242187500, 0.281250000] radices = 36 8 16 16 16 0 0 0 0 0 2560 sec/iter = 0.166 ROE[min,max] = [0.281250000, 0.312500000] radices = 40 8 16 16 16 0 0 0 0 0 2816 sec/iter = 0.188 ROE[min,max] = [0.328125000, 0.343750000] radices = 44 8 16 16 16 0 0 0 0 0 3072 sec/iter = 0.222 ROE[min,max] = [0.250000000, 0.250000000] radices = 24 16 16 16 16 0 0 0 0 0 3584 sec/iter = 0.264 ROE[min,max] = [0.281250000, 0.281250000] radices = 28 16 16 16 16 0 0 0 0 0 4096 sec/iter = 0.300 ROE[min,max] = [0.250000000, 0.312500000] radices = 16 16 16 16 32 0 0 0 0 0 [/code] The following mlucas.cfg file was generated on a 1.4 GHz ARM running 64-bit linux. [code] 2048 msec/iter = 206.38 ROE[avg,max] = [0.278125000, 0.281250000] radices = 256 16 16 16 0 0 0 0 0 0 2304 msec/iter = 242.52 ROE[avg,max] = [0.208269392, 0.250000000] radices = 288 16 16 16 0 0 0 0 0 0 2560 msec/iter = 308.94 ROE[avg,max] = [0.243164062, 0.281250000] radices = 160 16 16 32 0 0 0 0 0 0 2816 msec/iter = 329.54 ROE[avg,max] = [0.272896903, 0.343750000] radices = 176 16 16 32 0 0 0 0 0 0 3072 msec/iter = 371.71 ROE[avg,max] = [0.225892857, 0.281250000] radices = 192 16 16 32 0 0 0 0 0 0 3328 msec/iter = 388.66 ROE[avg,max] = [0.241322545, 0.281250000] radices = 208 16 16 32 0 0 0 0 0 0 3584 msec/iter = 414.33 ROE[avg,max] = [0.220870536, 0.250000000] radices = 224 16 16 32 0 0 0 0 0 0 3840 msec/iter = 453.97 ROE[avg,max] = [0.213636998, 0.265625000] radices = 240 16 16 32 0 0 0 0 0 0 4096 msec/iter = 472.52 ROE[avg,max] = [0.247321429, 0.250000000] radices = 256 16 16 32 0 0 0 0 0 0 [/code] In other words, a 4 threaded ARM is about 1.5x slower than one core of a 2.8 GHz Opteron. 
With a 3x SIMD speedup, its efficiency would be about 0.5x on a per-core comparison and roughly 1:1 on a per-core-and-GHz comparison with the Opteron. That is to say, a 20-core ARM mini-cluster would be 20x faster than the single Opteron core on a per-GHz measurement and 10x faster on a per-core measurement. It would also be as cheap as the single Opteron system. Not to mention the energy savings... |
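The ratios quoted above can be checked with a few lines of arithmetic. The sketch below uses only the 2048K figures from the two listings (0.134 sec/iter on one 2.8 GHz Opteron core, 206.38 msec/iter on four 1.4 GHz ARM threads). The 3x SIMD speedup is the poster's assumption and not a measurement, and all function names are hypothetical.

```c
/* Figures at the 2048K FFT length, taken from the two cfg listings. */
static const double SKETCH_OPTERON_MSEC = 134.0;   /* 0.134 sec/iter, one core, 2.8 GHz */
static const double SKETCH_ARM_MSEC     = 206.38;  /* four threads, 1.4 GHz */

/* How many times slower the 4-thread ARM run is than one Opteron core. */
double sketch_arm_slowdown(void)
{
	return SKETCH_ARM_MSEC / SKETCH_OPTERON_MSEC;
}

/* Throughput of one ARM core relative to one Opteron core, for an
   assumed SIMD speedup applied uniformly to the ARM timing. */
double sketch_per_core_ratio(double simd_speedup, int arm_cores)
{
	double arm_msec = SKETCH_ARM_MSEC / simd_speedup;
	return (SKETCH_OPTERON_MSEC / arm_msec) / arm_cores;
}

/* The same ratio, additionally normalized by clock frequency. */
double sketch_per_core_ghz_ratio(double simd_speedup, int arm_cores,
                                 double opteron_ghz, double arm_ghz)
{
	return sketch_per_core_ratio(simd_speedup, arm_cores)
	       * (opteron_ghz / arm_ghz);
}
```

With these inputs the slowdown comes out near 1.54, the per-core ratio at an assumed 3x speedup near 0.49, and the per-core-and-GHz ratio near 0.97, which matches the rounded 1.5x, 0.5x and 1:1 figures in the post.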
You got it working, nice!
That is a Pine64 with 4x ARM Cortex A53 cores (@1.4GHz), right? I'm a little bit surprised it is about as fast as my Odroid-U2 (4x ARM Cortex A9 cores @1.7GHz), which is only 32-bit and a much older architecture. [URL]http://mersenneforum.org/showpost.php?p=426575&postcount=94[/URL]
[code]
1024 msec/iter = 121.70 ROE[avg,max] = [0.298214286, 0.312500000] radices = 128 16 16 16 0 0 0 0 0 0
1152 msec/iter = 142.69 ROE[avg,max] = [0.225310407, 0.250000000] radices = 144 16 16 16 0 0 0 0 0 0
1280 msec/iter = 161.44 ROE[avg,max] = [0.251618304, 0.312500000] radices = 160 16 16 16 0 0 0 0 0 0
1408 msec/iter = 185.52 ROE[avg,max] = [0.297056362, 0.375000000] radices = 176 16 16 16 0 0 0 0 0 0
1536 msec/iter = 195.56 ROE[avg,max] = [0.234742955, 0.312500000] radices = 192 16 16 16 0 0 0 0 0 0
1664 msec/iter = 208.36 ROE[avg,max] = [0.254631696, 0.312500000] radices = 208 16 16 16 0 0 0 0 0 0
1792 msec/iter = 222.32 ROE[avg,max] = [0.234012277, 0.250000000] radices = 224 16 16 16 0 0 0 0 0 0
1920 msec/iter = 243.65 ROE[avg,max] = [0.235016741, 0.281250000] radices = 240 16 16 16 0 0 0 0 0 0
2048 msec/iter = 255.25 ROE[avg,max] = [0.310714286, 0.312500000] radices = 256 16 16 16 0 0 0 0 0 0
2304 msec/iter = 297.26 ROE[avg,max] = [0.228341239, 0.281250000] radices = 288 16 16 16 0 0 0 0 0 0
2560 msec/iter = 339.70 ROE[avg,max] = [0.256682478, 0.312500000] radices = 160 16 16 32 0 0 0 0 0 0
2816 msec/iter = 384.56 ROE[avg,max] = [0.296219308, 0.375000000] radices = 176 16 16 32 0 0 0 0 0 0
3072 msec/iter = 413.85 ROE[avg,max] = [0.239704241, 0.281250000] radices = 192 16 16 32 0 0 0 0 0 0
3584 msec/iter = 370.28 ROE[avg,max] = [0.231487165, 0.281250000] radices = 224 16 16 32 0 0 0 0 0 0
4096 msec/iter = 455.10 ROE[avg,max] = [0.282142857, 0.312500000] radices = 128 16 32 32 0 0 0 0 0 0
[/code]
In that post I also made the comparison with an Intel Core2Duo E7400 @2.8GHz, running Mprime 28.7. Looking back at it, that comparison might not have been entirely fair (Mlucas vs.
Mprime) . So I dusted off the machine and also ran Mlucas: Intel Core2Duo E7400 @2.8GHz NTHREADS = 1 [code] 14.1 1024 msec/iter = 33.76 ROE[avg,max] = [0.264564732, 0.265625000] radices = 32 32 16 32 0 0 0 0 0 0 1152 msec/iter = 40.30 ROE[avg,max] = [0.237220982, 0.273437500] radices = 36 16 32 32 0 0 0 0 0 0 1280 msec/iter = 45.42 ROE[avg,max] = [0.251841518, 0.296875000] radices = 40 16 32 32 0 0 0 0 0 0 1408 msec/iter = 52.31 ROE[avg,max] = [0.285110910, 0.375000000] radices = 44 16 32 32 0 0 0 0 0 0 1536 msec/iter = 53.31 ROE[avg,max] = [0.239299665, 0.281250000] radices = 24 32 32 32 0 0 0 0 0 0 1664 msec/iter = 61.81 ROE[avg,max] = [0.261802455, 0.312500000] radices = 52 16 32 32 0 0 0 0 0 0 1792 msec/iter = 65.81 ROE[avg,max] = [0.267229353, 0.312500000] radices = 28 32 32 32 0 0 0 0 0 0 1920 msec/iter = 70.98 ROE[avg,max] = [0.243638393, 0.281250000] radices = 60 16 32 32 0 0 0 0 0 0 2048 msec/iter = 71.88 ROE[avg,max] = [0.257366071, 0.257812500] radices = 32 32 32 32 0 0 0 0 0 0 2304 msec/iter = 81.60 ROE[avg,max] = [0.236948940, 0.281250000] radices = 36 32 32 32 0 0 0 0 0 0 2560 msec/iter = 90.96 ROE[avg,max] = [0.255691964, 0.312500000] radices = 40 32 32 32 0 0 0 0 0 0 2816 msec/iter = 102.69 ROE[avg,max] = [0.283956473, 0.343750000] radices = 44 32 32 32 0 0 0 0 0 0 3072 msec/iter = 112.85 ROE[avg,max] = [0.233879743, 0.265625000] radices = 48 32 32 32 0 0 0 0 0 0 3328 msec/iter = 123.71 ROE[avg,max] = [0.267947824, 0.312500000] radices = 52 32 32 32 0 0 0 0 0 0 3584 msec/iter = 135.08 ROE[avg,max] = [0.267689732, 0.301757812] radices = 56 32 32 32 0 0 0 0 0 0 3840 msec/iter = 144.52 ROE[avg,max] = [0.242107282, 0.281250000] radices = 60 32 32 32 0 0 0 0 0 0 4096 msec/iter = 154.69 ROE[avg,max] = [0.263169643, 0.281250000] radices = 64 32 32 32 0 0 0 0 0 0 4608 msec/iter = 177.26 ROE[avg,max] = [0.236798968, 0.281250000] radices = 36 16 16 16 16 0 0 0 0 0 5120 msec/iter = 201.17 ROE[avg,max] = [0.257240513, 0.312500000] radices = 40 16 16 16 16 0 0 0 
0 0 5632 msec/iter = 224.76 ROE[avg,max] = [0.291057478, 0.375000000] radices = 44 16 16 16 16 0 0 0 0 0 6144 msec/iter = 244.47 ROE[avg,max] = [0.233741978, 0.265625000] radices = 48 16 16 16 16 0 0 0 0 0 6656 msec/iter = 271.08 ROE[avg,max] = [0.264965820, 0.312500000] radices = 52 16 16 16 16 0 0 0 0 0 7168 msec/iter = 292.72 ROE[avg,max] = [0.274094936, 0.312500000] radices = 56 16 16 16 16 0 0 0 0 0 7680 msec/iter = 312.74 ROE[avg,max] = [0.249065290, 0.290039062] radices = 60 16 16 16 16 0 0 0 0 0 [/code]NTHREADS = 2 [code] 14.1 1024 msec/iter = 21.01 ROE[avg,max] = [0.273214286, 0.281250000] radices = 32 16 32 32 0 0 0 0 0 0 1152 msec/iter = 25.43 ROE[avg,max] = [0.237220982, 0.273437500] radices = 36 16 32 32 0 0 0 0 0 0 1280 msec/iter = 28.85 ROE[avg,max] = [0.259319196, 0.312500000] radices = 20 32 32 32 0 0 0 0 0 0 1408 msec/iter = 35.14 ROE[avg,max] = [0.280566406, 0.343750000] radices = 176 16 16 16 0 0 0 0 0 0 1536 msec/iter = 33.98 ROE[avg,max] = [0.239299665, 0.281250000] radices = 24 32 32 32 0 0 0 0 0 0 1664 msec/iter = 38.98 ROE[avg,max] = [0.261802455, 0.312500000] radices = 52 16 32 32 0 0 0 0 0 0 1792 msec/iter = 40.84 ROE[avg,max] = [0.267229353, 0.312500000] radices = 28 32 32 32 0 0 0 0 0 0 1920 msec/iter = 45.63 ROE[avg,max] = [0.243638393, 0.281250000] radices = 60 16 32 32 0 0 0 0 0 0 2048 msec/iter = 45.92 ROE[avg,max] = [0.257366071, 0.257812500] radices = 32 32 32 32 0 0 0 0 0 0 2304 msec/iter = 54.36 ROE[avg,max] = [0.236948940, 0.281250000] radices = 36 32 32 32 0 0 0 0 0 0 2560 msec/iter = 54.64 ROE[avg,max] = [0.255691964, 0.312500000] radices = 40 32 32 32 0 0 0 0 0 0 2816 msec/iter = 63.06 ROE[avg,max] = [0.283956473, 0.343750000] radices = 44 32 32 32 0 0 0 0 0 0 3072 msec/iter = 67.77 ROE[avg,max] = [0.233879743, 0.265625000] radices = 48 32 32 32 0 0 0 0 0 0 3328 msec/iter = 74.36 ROE[avg,max] = [0.267947824, 0.312500000] radices = 52 32 32 32 0 0 0 0 0 0 3584 msec/iter = 79.71 ROE[avg,max] = [0.267689732, 0.301757812] 
radices = 56 32 32 32 0 0 0 0 0 0 3840 msec/iter = 87.04 ROE[avg,max] = [0.242107282, 0.281250000] radices = 60 32 32 32 0 0 0 0 0 0 4096 msec/iter = 92.87 ROE[avg,max] = [0.263169643, 0.281250000] radices = 64 32 32 32 0 0 0 0 0 0 4608 msec/iter = 106.31 ROE[avg,max] = [0.238187081, 0.281250000] radices = 288 16 16 32 0 0 0 0 0 0 5120 msec/iter = 116.95 ROE[avg,max] = [0.241458566, 0.312500000] radices = 160 16 32 32 0 0 0 0 0 0 5632 msec/iter = 147.80 ROE[avg,max] = [0.278641183, 0.312500000] radices = 176 16 32 32 0 0 0 0 0 0 6144 msec/iter = 150.32 ROE[avg,max] = [0.247349330, 0.281250000] radices = 192 16 32 32 0 0 0 0 0 0 6656 msec/iter = 164.51 ROE[avg,max] = [0.250781250, 0.289062500] radices = 208 16 32 32 0 0 0 0 0 0 7168 msec/iter = 172.77 ROE[avg,max] = [0.277169364, 0.343750000] radices = 224 16 32 32 0 0 0 0 0 0 7680 msec/iter = 191.50 ROE[avg,max] = [0.253627232, 0.281250000] radices = 240 16 32 32 0 0 0 0 0 0 [/code]I also reran the Mprime 28.7 benchmark: [code] [Tue Mar 14 19:28:48 2017] Compare your results to other computers at http://www.mersenne.org/report_benchmarks Intel(R) Core(TM)2 Duo CPU E7400 @ 2.80GHz CPU speed: 2800.02 MHz, 2 cores CPU features: Prefetch, SSE, SSE2, SSE4 L1 cache size: 32 KB L2 cache size: 3 MB L1 cache line size: 64 bytes L2 cache line size: 64 bytes TLBS: 256 Prime95 64-bit version 28.7, RdtscTiming=1 Best time for 1024K FFT length: 16.199 ms., avg: 16.704 ms. Best time for 1280K FFT length: 20.961 ms., avg: 21.575 ms. Best time for 1536K FFT length: 26.163 ms., avg: 27.718 ms. Best time for 1792K FFT length: 30.755 ms., avg: 32.141 ms. Best time for 2048K FFT length: 34.946 ms., avg: 38.731 ms. Best time for 2560K FFT length: 43.191 ms., avg: 46.909 ms. Best time for 3072K FFT length: 53.965 ms., avg: 59.120 ms. Best time for 3584K FFT length: 69.864 ms., avg: 83.959 ms. Best time for 4096K FFT length: 71.973 ms., avg: 72.495 ms. Best time for 5120K FFT length: 87.800 ms., avg: 88.870 ms. 
Best time for 6144K FFT length: 110.473 ms., avg: 111.362 ms. Best time for 7168K FFT length: 131.831 ms., avg: 132.743 ms. Best time for 8192K FFT length: 146.812 ms., avg: 147.631 ms. Timing FFTs using 2 threads. Best time for 1024K FFT length: 15.401 ms., avg: 15.644 ms. Best time for 1280K FFT length: 18.143 ms., avg: 19.026 ms. Best time for 1536K FFT length: 21.927 ms., avg: 22.995 ms. Best time for 1792K FFT length: 26.605 ms., avg: 27.481 ms. Best time for 2048K FFT length: 30.460 ms., avg: 31.351 ms. Best time for 2560K FFT length: 38.699 ms., avg: 39.689 ms. Best time for 3072K FFT length: 47.988 ms., avg: 49.353 ms. Best time for 3584K FFT length: 85.181 ms., avg: 85.865 ms. Best time for 4096K FFT length: 62.209 ms., avg: 66.705 ms. Best time for 5120K FFT length: 79.554 ms., avg: 80.260 ms. Best time for 6144K FFT length: 92.489 ms., avg: 94.000 ms. Best time for 7168K FFT length: 116.309 ms., avg: 119.709 ms. Best time for 8192K FFT length: 125.236 ms., avg: 128.261 ms. Timings for 1024K FFT length (1 cpu, 1 worker): 16.37 ms. Throughput: 61.08 iter/sec. Timings for 1024K FFT length (2 cpus, 2 workers): 30.59, 31.69 ms. Throughput: 64.25 iter/sec. Timings for 1280K FFT length (1 cpu, 1 worker): 21.24 ms. Throughput: 47.07 iter/sec. Timings for 1280K FFT length (2 cpus, 2 workers): 37.86, 39.14 ms. Throughput: 51.96 iter/sec. Timings for 1536K FFT length (1 cpu, 1 worker): 26.08 ms. Throughput: 38.34 iter/sec. Timings for 1536K FFT length (2 cpus, 2 workers): 45.43, 47.68 ms. Throughput: 42.99 iter/sec. Timings for 1792K FFT length (1 cpu, 1 worker): 31.05 ms. Throughput: 32.21 iter/sec. Timings for 1792K FFT length (2 cpus, 2 workers): 52.50, 53.32 ms. Throughput: 37.81 iter/sec. Timings for 2048K FFT length (1 cpu, 1 worker): 35.05 ms. Throughput: 28.53 iter/sec. Timings for 2048K FFT length (2 cpus, 2 workers): 61.40, 63.17 ms. Throughput: 32.12 iter/sec. Timings for 2560K FFT length (1 cpu, 1 worker): 43.36 ms. Throughput: 23.06 iter/sec. 
Timings for 2560K FFT length (2 cpus, 2 workers): 77.50, 79.16 ms. Throughput: 25.54 iter/sec. Timings for 3072K FFT length (1 cpu, 1 worker): 53.71 ms. Throughput: 18.62 iter/sec. Timings for 3072K FFT length (2 cpus, 2 workers): 96.11, 97.25 ms. Throughput: 20.69 iter/sec. Timings for 3584K FFT length (1 cpu, 1 worker): 67.86 ms. Throughput: 14.74 iter/sec. Timings for 3584K FFT length (2 cpus, 2 workers): 164.50, 169.02 ms. Throughput: 12.00 iter/sec. Timings for 4096K FFT length (1 cpu, 1 worker): 71.87 ms. Throughput: 13.91 iter/sec. [Tue Mar 14 19:33:59 2017] Timings for 4096K FFT length (2 cpus, 2 workers): 127.57, 128.14 ms. Throughput: 15.64 iter/sec. Timings for 5120K FFT length (1 cpu, 1 worker): 87.87 ms. Throughput: 11.38 iter/sec. Timings for 5120K FFT length (2 cpus, 2 workers): 153.62, 158.10 ms. Throughput: 12.83 iter/sec. Timings for 6144K FFT length (1 cpu, 1 worker): 110.52 ms. Throughput: 9.05 iter/sec. Timings for 6144K FFT length (2 cpus, 2 workers): 187.40, 186.73 ms. Throughput: 10.69 iter/sec. Timings for 7168K FFT length (1 cpu, 1 worker): 132.18 ms. Throughput: 7.57 iter/sec. Timings for 7168K FFT length (2 cpus, 2 workers): 236.89, 243.20 ms. Throughput: 8.33 iter/sec. Timings for 8192K FFT length (1 cpu, 1 worker): 151.83 ms. Throughput: 6.59 iter/sec. Timings for 8192K FFT length (2 cpus, 2 workers): 263.17, 260.16 ms. Throughput: 7.64 iter/sec. [/code] BTW: Is it possible to compile and run Mlucas on Windows 7/10? If so, I could try to run benchmarks on my i5 2500k and/or i7 3770k. |
[QUOTE=VictordeHolland;454872]You got it working, nice!
That is a Pine64 with 4x ARM Cortex A53 cores (@1.4GHz) right? I'm a little bit surprised it is about as fast as my Odroid-U2 (4x ARM Cortex A9 cores @1.7Ghz) which is only 32bit and an much older architecture. [URL]http://mersenneforum.org/showpost.php?p=426575&postcount=94[/URL] [/QUOTE]
Right, this is a PINE64 board. The PINE64 results are a bit better even though your device has the higher clock (+300MHz per core), so the PINE64 should be considerably faster than your device at the same frequency :)
I also ran a benchmark with 1-3 threads at a more-or-less current FFT size of 2048K:
./mlucas -fftlen 2048 -nthread N -iters 10
[CODE]
2048 msec/iter = 707.58 ROE[avg,max] = [0.000000000, 0.000091553] radices = 256 16 16 16 0 0 0 0 0 0
2048 msec/iter = 371.82 ROE[avg,max] = [0.000000000, 0.000091553] radices = 256 16 16 16 0 0 0 0 0 0
2048 msec/iter = 241.66 ROE[avg,max] = [0.000000000, 0.000091553] radices = 256 16 16 16 0 0 0 0 0 0
[/CODE] |
[QUOTE=VictordeHolland;454872]You got it working, nice!
That is a Pine64 with 4x ARM Cortex A53 cores (@1.4GHz) right? I'm a little bit surprised it is about as fast as my Odroid-U2 (4x ARM Cortex A9 cores @1.7Ghz) which is only 32bit and an much older architecture. ... BTW: Is it possible to compile run Mlucas on Windows 7/10? If so, I could try to run benchmarks on my i5 2500k and/or i7 3770k[/QUOTE] Thanks for the timings! 32 vs 64-bit speed for LL testing is overwhelmingly a matter of the float-double capability - how do those 2 versions of the ARM compare in that regard? I used to have Win-buildability in the 32-bit days for the x86, but MSFT delayed supporting 64-bit inline asm by at least 4-5 years (w.r.t. when x86_64 started shipping), so I dropped Win support years ago. To build/run under Win you'll need a Linux emulator. |
Under Windows in the Ubuntu shell with an i7-6700 @ 3.4Ghz, using:
[CODE]./mlucas -fftlen 2048 -nthread N -iters 10 [/CODE] with N=1 to 8 (4 core machine with hyperthreading)
[CODE]
2048 msec/iter = 21.03 ROE[avg,max] = [0.000000000, 0.000091553] radices = 32 32 32 32 0 0 0 0 0 0
2048 msec/iter = 13.90 ROE[avg,max] = [0.000000000, 0.000091553] radices = 32 8 16 16 16 0 0 0 0 0
2048 msec/iter = 11.43 ROE[avg,max] = [0.000000000, 0.000091553] radices = 64 16 32 32 0 0 0 0 0 0
2048 msec/iter = 10.52 ROE[avg,max] = [0.000000000, 0.000091553] radices = 32 8 16 16 16 0 0 0 0 0
2048 msec/iter = 10.79 ROE[avg,max] = [0.000000000, 0.000091553] radices = 32 8 16 16 16 0 0 0 0 0
2048 msec/iter = 11.39 ROE[avg,max] = [0.000000000, 0.000091553] radices = 128 16 16 32 0 0 0 0 0 0
2048 msec/iter = 11.06 ROE[avg,max] = [0.000000000, 0.000091553] radices = 256 16 16 16 0 0 0 0 0 0
2048 msec/iter = 11.43 ROE[avg,max] = [0.000000000, 0.000091553] radices = 256 16 16 16 0 0 0 0 0 0
[/CODE]
With 100 iterations:
[CODE]
2048 msec/iter = 18.29 ROE[avg,max] = [0.247767857, 0.250000000] radices = 32 32 32 32 0 0 0 0 0 0 100-iteration Res mod 2^64, 2^35-1, 2^36-1 = 6179CD26EC3B3274, 8060072069, 29249383388
2048 msec/iter = 11.17 ROE[avg,max] = [0.341964286, 0.375000000] radices = 256 16 16 16 0 0 0 0 0 0 100-iteration Res mod 2^64, 2^35-1, 2^36-1 = 6179CD26EC3B3274, 8060072069, 29249383388
2048 msec/iter = 8.36 ROE[avg,max] = [0.312165179, 0.375000000] radices = 128 16 16 32 0 0 0 0 0 0 100-iteration Res mod 2^64, 2^35-1, 2^36-1 = 6179CD26EC3B3274, 8060072069, 29249383388
2048 msec/iter = 7.84 ROE[avg,max] = [0.341964286, 0.375000000] radices = 256 16 16 16 0 0 0 0 0 0 100-iteration Res mod 2^64, 2^35-1, 2^36-1 = 6179CD26EC3B3274, 8060072069, 29249383388
2048 msec/iter = 7.67 ROE[avg,max] = [0.312165179, 0.375000000] radices = 128 16 16 32 0 0 0 0 0 0 100-iteration Res mod 2^64, 2^35-1, 2^36-1 = 6179CD26EC3B3274, 8060072069, 29249383388
2048 msec/iter = 7.64 ROE[avg,max] = [0.341964286, 0.375000000] radices = 256 16 16 16 0 0 0 0 0 0 100-iteration Res mod 2^64, 2^35-1, 2^36-1 = 6179CD26EC3B3274, 8060072069, 29249383388
2048 msec/iter = 7.70 ROE[avg,max] = [0.341964286, 0.375000000] radices = 256 16 16 16 0 0 0 0 0 0 100-iteration Res mod 2^64, 2^35-1, 2^36-1 = 6179CD26EC3B3274, 8060072069, 29249383388
2048 msec/iter = 7.69 ROE[avg,max] = [0.341964286, 0.375000000] radices = 256 16 16 16 0 0 0 0 0 0 100-iteration Res mod 2^64, 2^35-1, 2^36-1 = 6179CD26EC3B3274, 8060072069, 29249383388
[/CODE] |
@wombatman: Suggest you use 1000-iter for your multithread-scaling tests, to minimize init-overhead effects. (More precisely, one would do 1000*(t_1000-t_100)/900.)
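A minimal sketch of that correction, assuming t_100 and t_1000 are total wall-clock times in seconds for the 100- and 1000-iteration runs. The function name and the sample numbers are made up for illustration and are not Mlucas output:

```python
def steady_state_ms_per_iter(t_short_s, t_long_s, n_short=100, n_long=1000):
    """Per-iteration time in msec with the one-time init cost cancelled out.

    Assumes both runs pay the same fixed startup overhead, so that overhead
    drops out of the difference t_long_s - t_short_s.
    """
    if n_long <= n_short:
        raise ValueError("n_long must exceed n_short")
    return 1000.0 * (t_long_s - t_short_s) / (n_long - n_short)


# Hypothetical totals: 2.5 s for 100 iterations, 20.5 s for 1000 iterations.
corrected = steady_state_ms_per_iter(2.5, 20.5)
naive = 1000.0 * 20.5 / 1000  # simply dividing the long run by its iteration count
print(f"naive {naive:.2f} ms/iter, corrected {corrected:.2f} ms/iter")
```

The gap between the naive and corrected figures is the init overhead spread over the run; if the two agree closely, the shorter run was already good enough.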
|
No problem. I can do that tomorrow when I'm back at work :smile:
|
2 Attachment(s)
[QUOTE=ewmayer;454878]Thanks for the timings! 32 vs 64-bit speed for LL testing is overwhelmingly a matter of the float-double capability - how do those 2 versions of the ARM compare in that regard?
[/QUOTE]
ARM Cortex A9 was announced in October 2007, the Cortex A53 in October 2012. But 5 years newer doesn't tell the whole story. The design choices were different.

[B]A9[/B]
The ARM Cortex A9 was designed as a 'performance' core (with a power budget) and is dual-issue, out-of-order. In other words, it can decode/send two instructions per clock to the execution units and reorder them if necessary to extract extra performance. But a Vector Floating Point (VFP) execution unit is [U]not[/U] mandatory in the A9. Most A9s have the (optional) Vector Floating Point v3 (VFPv3) unit for handling FP though. It has 32 64-bit registers with NEON capability. NEON is ARM's name for its SIMD extension. If I understand it all correctly the A9 is limited to [U]1 DP Float per clock.[/U]

[B]A53[/B]
ARM designed the A53 with (high) power efficiency in mind, as it is supposed to fill the role of the 'little' cores in their big.LITTLE philosophy. So in many high-end devices (mostly phones) they are coupled with more powerful A57 or A72 cores. When maximum responsiveness is needed (loading websites/apps, games, etc) the A57/A72 cores are used, while the A53s handle background tasks with their greater efficiency in order to extend battery life. Anandtech tested them in the Samsung Exynos 7420 and 5433, taking into account overhead and different frequencies, and concluded a Cortex A53 core consumes ~200mW/core @1.4GHz (see attached graph). The A53 is a dual-issue in-order design with VFPv4 + (advanced) NEON [U]mandatory[/U]. The VFPv4 has 32 128-bit registers, theoretically allowing it to process [U]2 DP Floats per clock.[/U]

[B]Other differences which could impact performance of the ODROID-U2 vs. PINE64[/B]
ODROID-U2 has 1MB L2 cache (shared amongst the cores), PINE64 512KB L2 (shared amongst the cores).
Fab: 32nm (U2) vs. 40nm (PINE64), which might explain/allow the U2 to clock slightly higher (1.7GHz vs 1.4GHz).

If anybody has a board with A57s or A72s, please share your benchmarks, we're curious how they perform :).
I've also attached a graph with a comparison of Dhrystone benchmark performance (DMIPS/MHz) of different ARM architectures. Keep in mind Dhrystone is an old integer benchmark, but it gives a rough idea.
PINE64 Allwinner A64: [URL]http://linux-sunxi.org/A64[/URL]
ODROID-U2: [URL]http://www.hardkernel.com/main/products/prdt_info.php?g_code=G135341370451&tab_idx=2[/URL]
Useful pages for comparison between cores:
[URL]https://en.wikipedia.org/wiki/Comparison_of_ARMv7-A_cores[/URL]
[URL]https://en.wikipedia.org/wiki/Comparison_of_ARMv8-A_cores[/URL] |
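To put rough numbers on the per-clock figures in the post above, here is a back-of-envelope sketch of peak double-precision throughput as cores x clock x DP results per clock. The per-clock values are the ones quoted in the post (1 for the A9, 2 for the A53), not measurements, and the function name is made up. Real LL-test throughput also depends on cache, memory bandwidth and issue restrictions, so treat the result as an upper bound at best.

```python
def peak_dp_gops(cores, clock_ghz, dp_per_clock):
    """Theoretical peak, in billions of DP results per second."""
    return cores * clock_ghz * dp_per_clock


# Figures as quoted in the post above (assumed, not measured).
boards = {
    "ODROID-U2 (4x Cortex-A9 @ 1.7GHz)": peak_dp_gops(4, 1.7, 1),
    "PINE64 (4x Cortex-A53 @ 1.4GHz)": peak_dp_gops(4, 1.4, 2),
}
for name, gops in boards.items():
    print(f"{name}: {gops:.1f} G DP results/s peak")
```

On paper that gives the PINE64 well over 1.5x the U2's peak, which is why timings that are only "a bit better" look surprising.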
[QUOTE=ewmayer;454884]@wombatman: Suggest you use 1000-iter for your multithread-scaling tests, to minimize init-overhead effects. (More precisely, one would do 1000*(t_1000-t_100)/900.)[/QUOTE]
As requested, the 1000 iteration tests for nthread = 1-4:
[CODE]
2048 msec/iter = 18.94 ROE[avg,max] = [0.370465528, 0.375000000] radices = 128 16 16 32 0 0 0 0 0 0 1000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 81AEAC0C7E6089BB, 25132671466, 41950605021
2048 msec/iter = 11.20 ROE[avg,max] = [0.370465528, 0.375000000] radices = 128 16 16 32 0 0 0 0 0 0 1000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 81AEAC0C7E6089BB, 25132671466, 41950605021
2048 msec/iter = 8.48 ROE[avg,max] = [0.372615979, 0.375000000] radices = 256 16 16 16 0 0 0 0 0 0 1000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 81AEAC0C7E6089BB, 25132671466, 41950605021
2048 msec/iter = 8.01 ROE[avg,max] = [0.372615979, 0.375000000] radices = 256 16 16 16 0 0 0 0 0 0 1000-iteration Res mod 2^64, 2^35-1, 2^36-1 = 81AEAC0C7E6089BB, 25132671466, 41950605021
[/CODE]
For anyone else wanting to run mlucas on Windows 10, the Ubuntu shell works well, and mlucas compiles straight away. |
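For reference, speedup and parallel efficiency computed from the four 1000-iteration timings above. The helper name is just for illustration, and "efficiency" here simply means speedup divided by thread count:

```python
# msec/iter at 2048K for nthread = 1..4, copied from the post above.
timings_ms = {1: 18.94, 2: 11.20, 3: 8.48, 4: 8.01}


def scaling(timings):
    """Map thread count -> (speedup vs 1 thread, speedup per thread)."""
    base = timings[1]
    return {n: (base / t, base / t / n) for n, t in sorted(timings.items())}


for n, (speedup, eff) in scaling(timings_ms).items():
    print(f"{n} threads: speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

On these numbers the fourth thread adds little: roughly 2.2x at 3 threads versus roughly 2.4x at 4.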
IIRC Cortex-A9 can only issue one DP mul every other cycle. But that was so long ago, that I might be wrong...
|
Many thanks for the details, VdH - one key point, though, needing clarification - In accordance with my earlier post re. the number of 128-bit registers, I believe your "VFPv4 has 32 such" is 2x too large. From Wikipedia (underlines mine):
[i] [b]VFPv4 or VFPv4-D32[/b] Implemented on the Cortex-A12 and A15 ARMv7 processors, Cortex-A7 optionally has VFPv4-D32 in the case of an FPU with NEON.[81] [u]VFPv4 has 32 64-bit FPU registers as standard[/u], adds both half-precision support as a storage format and fused multiply-accumulate instructions to the features of VFPv3. [/i] The same wikipage says AArch64 has 31 64-bit GPRs - just confirming, those are distinct from the FPRs, yes? Wombatman, thanks for the timings - so no appreciable difference vs your simple 100-iter ones here. (This varies a lot by CPU, thus always better safe than sorry.) |
AArch64 has 32 128-bit SIMD/FP regs on top of 31 64-bit int regs.
|
AArch64 has 32 integer registers (but X31 reads as zero and throws away anything written to it, so basically that's 31 registers), and also 32 128-bit-wide "SIMD and floating-point" registers.
Code looks like FADD V3.2D, V5.2D, V7.2D (which adds the doubles in V5[127:64] and V7[127:64] and puts the result in V3[127:64], and also adds the doubles in V5[63:0] and V7[63:0] and puts the result in V3[63:0]) or FADD S3, S7, S2 (which adds the bottom floats of V7 and V2, puts the result in the bottom float of S3, and sets the other three floats of V3 to zero) It has fused FMA support, but in the form Vd = Vd + Vm*Vn because there isn't space to pass four five-bit register names in a 32-bit opcode (there is also an FMLS instruction that does Vd = Vd - Vm*Vn form). |
[QUOTE=fivemack;454956]It has fused FMA support, but in the form Vd = Vd + Vm*Vn because there isn't space to pass four five-bit register names in a 32-bit opcode[/QUOTE]
To clarify, only the vector variant uses 3 registers. The scalar one has 4 registers. |
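To make the lane semantics in the two posts above concrete, here is a toy model in plain Python (not NEON code). A 128-bit V register is represented as a list of lanes, with index 0 standing for the low bits. The function names are invented for this sketch, and the multiply-add helpers round twice, whereas the hardware instructions are fused and round once.

```python
import struct


def to_f32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fadd_2d(vn, vm):
    """FADD Vd.2D, Vn.2D, Vm.2D: add both 64-bit lanes independently."""
    return [vn[0] + vm[0], vn[1] + vm[1]]


def fmla_2d(vd, vn, vm):
    """FMLA Vd.2D, Vn.2D, Vm.2D: Vd = Vd + Vn*Vm per lane (Vd is overwritten)."""
    return [vd[i] + vn[i] * vm[i] for i in range(2)]


def fmls_2d(vd, vn, vm):
    """FMLS Vd.2D, Vn.2D, Vm.2D: Vd = Vd - Vn*Vm per lane."""
    return [vd[i] - vn[i] * vm[i] for i in range(2)]


def fadd_s(vn, vm):
    """FADD Sd, Sn, Sm: add the low floats; the upper three lanes become zero."""
    return [to_f32(vn[0] + vm[0]), 0.0, 0.0, 0.0]


def fmadd_d(dn, dm, da):
    """Scalar FMADD Dd, Dn, Dm, Da: four register operands, Dd = Da + Dn*Dm."""
    return da + dn * dm


print(fadd_2d([1.0, 2.0], [10.0, 20.0]))
print(fmla_2d([1.0, 1.0], [2.0, 3.0], [4.0, 5.0]))
```

The point of the model is the operand shapes: the vector multiply-accumulate forms reuse the destination as the addend, while the scalar form names a separate accumulator register.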
[QUOTE=fivemack;454956]AArch64 has 32 integer registers (but X31 reads as zero and throws away anything written to it, so basically that's 31 registers), and also 32 128-bit-wide "SIMD and floating-point" registers.
Code looks like FADD V3.2D, V5.2D, V7.2D (which adds the doubles in V5[127:64] and V7[127:64] and puts the result in V3[127:64], and also adds the doubles in V5[63:0] and V7[63:0] and puts the result in V3[63:0]) or FADD S3, S7, S2 (which adds the bottom floats of V7 and V2, puts the result in the bottom float of S3, and sets the other three floats of V3 to zero) It has fused FMA support, but in the form Vd = Vd + Vm*Vn because there isn't space to pass four five-bit register names in a 32-bit opcode (there is also an FMLS instruction that does Vd = Vd - Vm*Vn form).[/QUOTE] Thanks - sounds like Wikipedia article has some bugs. FMA3 is fine, as all my x86 vector-asm (starting with AVX2, obviously) is based on that. I will likely target a 16-vector-regs architecture for my initial implementation - that is more or less like my x86 AVX2/FMA3 code - question, will that appreciably broaden the base of ARM CPUs which can run said code? Still at least a month until my first AVX-512 implementation is done, but then it will be time to get a suitable Odroid board and start coding! |
[QUOTE=ewmayer;455000]Thanks - sounds like Wikipedia article has some bugs.
FMA3 is fine, as all my x86 vector-asm (starting with AVX2, obviously) is based on that. I will likely target a 16-vector-regs architecture for my initial implementation - that is more or less like my x86 AVX2/FMA3 code - question, will that appreciably broaden the base of ARM CPUs which can run said code? Still at least a month until my first AVX-512 implementation is done, but then it will be time to get a suitable Odroid board and start coding![/QUOTE] If you are using 64-bit (and you have to, in order to get DP SIMD), you can use all 32 registers, as that is the number mandated by the architecture. |
The final stages of my initial AVX-512 port of Mlucas are proceeding more quickly than time-budgeted-for, so I'd like to go ahead and order an Odroid [url=http://www.hardkernel.com/main/]dev-board[/url] for 128-bit SIMD development under Linux. Any recommendations as to which of the options on offer I should choose? Also, what do I need in addition to the basic board such as PSU, cabling, WiFi? I only plan to use this for code development, so am fine simply using an Ethernet cable to connect and transfer data between my Macbook and the ARM system, but am open to "you really want the WiFi because..." pitches.
Also, the Odroid-C2 description notes [i] An additional MicroSD card or an eMMC module is required to install the OS. We recommend the eMMC module as it has much higher performance than standard MicroSD cards. [/i] So boot-from-OS-image-on-USB is not an option? Lastly, are these boards strictly standalone - in which event I should probably invest in one of the protective housings - or can they be hosted in (say) an ATX-cased PC system? |
[QUOTE=ewmayer;455439]Any recommendations as to which of the options on offer I should choose? Also, what do I need in addition to the basic board such as PSU, cabling, WiFi?
[...] Lastly, are these boards strictly standalone - in which event I should probably invest in one of the protective housings - or can they be hosted in (say) an ATX-cased PC system?[/QUOTE] I know next to nothing about hardware, but from a cursory look: * The C2 option is the only one of the three that implements the 64/32-bit ARMv8-A architecture and AArch64 instruction set, and isn't that what you would be targeting? The other two only implement the 32-bit ARMv7-A architecture and AArch32 instruction set, and that would likely be pretty obsolete by the time (if and when) number crunching on ARM becomes practical or widespread. * The [URL="http://www.hardkernel.com/main/products/prdt_info.php?g_code=G145457216438&tab_idx=3"]FAQ[/URL] (linked from the FAQs tab) mentions the peripherals that can be used. It doesn't seem to mention anything other than standalone, but mentions a discussion forum for questions not covered in the FAQ: [url]http://forum.odroid.com[/url] |
My ODROID-U2 boots only from eMMC or microSD by design. It can NOT boot from USB.
It might be possible to 'hack' it to boot from the eMMC or SD and continue to load the OS from USB afterwards, but it is unsupported and inadvisable. eMMC is quite a bit faster than microSD, especially in I/O operations, but also more expensive (a small eMMC card costs almost the same as the board itself). So I use a class 10 microSD card that was otherwise gathering dust. |
Will probably get a MicroSD, as (like Victor) I don't want the cost of the I/O device to double the system cost. So I would download the OS boot image from the Hardkernel site onto my Mac - as it happens I already have an IOGear USB-based MicroSD reader/writer, sans a MicroSD card (the IOGear was a found item which I figured would come in handy some day.)
So presumably I can get a MicroSD card wherever it's cheapest, and only need to get the C2 board and housing from Hardkernel - sounds like a plan.
[b]Edit:[/b] order placed - I decided to just shell out the modest $8 for the MicroSD preflashed with Linux, no point in trying to save a few $ to get a cheaper-per-GB card elsewhere and end up spending an hour or more downloading the OS image and working thru the procedure to unzip and properly transfer it to the SD card myself. Here is what I ordered:
o ODROID-C2 Item# G145457216438 $46.00 USD
o 8GB MicroSD UHS-1 C2 Linux Item# G145586100692 $8.00 USD
o 5V/2A Power Supply US Plug Item# G143652633329 $5.00 USD
o ODROID-C2/C1+ Case Clear Item# G143805171261 $4.50 USD
Plus $16 shipping, total $79.50. Vendor says up to 20 days lead time. |
Just installed Raspberry 32 bit on my PI 3 - ARMv7 OS.
mlucas compiled like a charm, now I'm passing make check. I hope tomorrow to install a new SD card with OpenSUSE 64bit. I might have some heating issues. Would you recommend me a specific mlucas benchmark test I can run on both OSes to show how 64 bits is better than 32 bits? Thank you. And next step will be the Odroid Pico5 Cluster... |
1 Attachment(s)
[QUOTE=ET_;457120]Just installed Raspberry 32 bit on my PI 3 - ARMv7 OS.
mlucas compiled like a charm, now I'm passing make check. I hope tomorrow to install a new SD card with OpenSUSE 64bit. I might have some heating issues. Would you recommend me a specific mlucas benchmark test I can run on both OSes to show how 64 bits is better than 32 bits? Thank you. And next step will be the Odroid Pico5 Cluster...[/QUOTE] Make check didn't pass (the log file is attached). I suppose it's because the PI has only 1 GB of free RAM and I was running the GUI on the desktop. can I safely "make install" the program? :smile: |
[QUOTE=ET_;457124]Make check didn't pass (the log file is attached).
I suppose it's because the PI has only 1 GB of free RAM and I was running the GUI on the desktop. can I safely "make install" the program? :smile:[/QUOTE] Sure, go ahead and try 'make install'. If you're gonna do any LL-testing or DCing, though, you'll want to finish the self-tests that were interrupted by the out-of-memory-ness. (1 GB should be more than enough, BTW, but if your OS is not doing a decent job of recovering memory as each self-test sub-task completes and frees it, you get the sort of error you saw.) Did the installer script's self-test create a (partial) mlucas.cfg file somewhere in the install directory tree? If you have a partial .cfg file and want to add the entries missed by the aborted self-test, you have to run each missing FFT length manually. For example, your self-test barfed in middle of the 2048K FFT length test-all-radices step. To rerun that length (substitute whatever your binary is named, probably lowercase 'mlucas'), in the same dir as the partial mlucas.cfg is located (or copy the latter to a run directory of your own choosing and do things there): Mlucas -fftlen 2048 -iters 100 Then do same for lengths 2304,2560,2816,3072,3328,3584,3840 and 4096. --------------------------- Aside: My Odroid arrived last week. I've opened the box to have a cursory glance, but still have some AVX-512 coding work to finish, and don't want to get distracted from that. Sorry about the delay. |
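The manual rerun of the missing FFT lengths described above can be scripted. A minimal sketch, assuming the binary is called ./mlucas and sits in the current directory (adjust to your build); it uses only the -fftlen and -iters flags given in the post, and the function names are illustrative:

```python
import subprocess

# FFT lengths (in K) that the aborted self-test did not finish, per the post above.
MISSING_FFT_LENGTHS_K = [2048, 2304, 2560, 2816, 3072, 3328, 3584, 3840, 4096]


def rerun_commands(binary="./mlucas", iters=100, lengths=MISSING_FFT_LENGTHS_K):
    """Build one argv list per FFT length still missing from mlucas.cfg."""
    return [[binary, "-fftlen", str(k), "-iters", str(iters)] for k in lengths]


def run_all(commands):
    """Run the self-tests one after another, stopping at the first failure."""
    for argv in commands:
        print("running:", " ".join(argv))
        subprocess.run(argv, check=True)


# Dry run: just show what would be executed.
for argv in rerun_commands():
    print(" ".join(argv))
```

To actually execute the tests, call run_all(rerun_commands()) from the directory holding the partial mlucas.cfg.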
1 Attachment(s)
[QUOTE=ewmayer;457174]Sure, go ahead and try 'make install'. If you're gonna do any LL-testing or DCing, though, you'll want to finish the self-tests that were interrupted by the out-of-memory-ness. (1 GB should be more than enough, BTW, but if your OS is not doing a decent job of recovering memory as each self-test sub-task completes and frees it, you get the sort of error you saw.) Did the installer script's self-test create a (partial) mlucas.cfg file somewhere in the install directory tree?
If you have a partial .cfg file and want to add the entries missed by the aborted self-test, you have to run each missing FFT length manually. For example, your self-test barfed in middle of the 2048K FFT length test-all-radices step. To rerun that length (substitute whatever your binary is named, probably lowercase 'mlucas'), in the same dir as the partial mlucas.cfg is located (or copy the latter to a run directory of your own choosing and do things there): Mlucas -fftlen 2048 -iters 100 Then do same for lengths 2304,2560,2816,3072,3328,3584,3840 and 4096. --------------------------- Aside: My Odroid arrived last week. I've opened the box to have a cursory glance, but still have some AVX-512 coding work to finish, and don't want to get distracted from that. Sorry about the delay.[/QUOTE] Don't be sorry... I received my cluster 45 days ago, and it's still inside its box. Testing single FFT lengths worked fine, but I found the same allocation error while running the -s tiny test. I'm not worried, as I'm just testing the environment before upgrading to Odroid C2 64 bits. I also found some inconsistencies in the help got from the -h command switch, but again it's not important. The GMP v6.1.2 compiled and worked immediately on 32 bit ARMv7. |
OK, gonna try to set up my Odroid in the next couple of days ... first issue is that I think I may need a proper display adapter. I use a 15-pin (standard display pinout) to mini-Display-port adapter on my Intel NUC, but don't see a mini-Display port on the Odroid.
The other adapter I have on hand is 15-pin to what looks like 2 parallel, thin, roughly microSD-width slivers, each with what looks like 8 tiny pin contacts (hard to make out even under magnifying glass), the two 8-pin dealies separated by ~1-1.5 mm. Again don't see anything like that on the Odroid. Should I just order a 15-pin to HDMI adapter cable? |
[QUOTE=ewmayer;461529]OK, gonna try to set up my Odroid in the next couple of days ... first issue is that I think I may need a proper display adapter. I use a 15-pin (standard display pinout) to mini-Display-port adapter on my Intel NUC, but don't see a mini-Display port on the Odroid.
The other adapter I have on hand is 15-pin to what looks like 2 parallel, thin, roughly microSD-width slivers, each with what looks like 8 tiny pin contacts (hard to make out even under magnifying glass), the two 8-pin dealies separated by ~1-1.5 mm. Again don't see anything like that on the Odroid. Should I just order a 15-pin to HDMI adapter cable?[/QUOTE] My monitor has 2 HDMI inputs (and 1 15-pin VGA), so I ordered the HDMI cable and plugged it to the monitor as a secondary input. |
[QUOTE=ET_;461530]My monitor has 2 HDMI inputs (and 1 15-pin VGA), so I ordered the HDMI cable and plugged it to the monitor as a secondary input.[/QUOTE]
I'll probably get a VGA-to-HDMI adapter - that way I can use the existing VGA cable dangling down from the display and simply switch it between the NUC's mini-display adapter dongle and the adapter for the Odroid as needed. (The NUC only very rarely needs to be hooked to the display anymore, pretty much only when I need to restart after a power outage.) But, do I need a male or female HDMI plug on the adapter? Not having mucked about with HDMI and having only one end of the equation to eyeball - the Odroid C2 HDMI plug - it's not clear to me whether the latter is male or female. (I.e. I've never "sexed an HDMI" before, and have only one exemplar of the species to inspect. :) |
[QUOTE=ewmayer;461598]I'll probably get a VGA-to-HDMI adapter - that way I can use the existing VGA cable dangling down from the display and simply switch it between the NUC's mini-display adapter dongle and the adapter for the Odroid as needed. (The NUC only very rarely needs to be hooked to the display anymore, pretty much only when I need to restart after a power outage.)
But, do I need a male or female HDMI plug on the adapter? Not having mucked about with HDMI and having only one end of the equation to eyeball - the Odroid C2 HDMI plug - it's not clear to me whether the latter is male or female. (I.e. I've never "sexed an HDMI" before, and have only one exemplar of the species to inspect. :)[/QUOTE] Generally a socket is female and the side with prongs is male (guess where that came from). The Odroid will have a female socket. I expect you will use a male to male hdmi cable to connect to the adapter. You will also want a male to male vga cable from the adapter to your screen. The adapter will need to have both sockets female. You will need to make sure that it is an HDMI-to-VGA adapter rather than a VGA-to-HDMI adapter. As a general rule most cables with the same end on both sides are male. |
[QUOTE=ewmayer;461598]I'll probably get a VGA-to-HDMI adapter - that way I can use the existing VGA cable dangling down from the display and simply switch it between the NUC's mini-display adapter dongle and the adapter for the Odroid as needed.[/QUOTE]
Isn't it an HDMI-to-VGA adapter that you need since your device is outputting through HDMI? I'm on a coffee diet so sorry if I completely misunderstood, my brain is compromised by lack of caffeine :redface: |
[QUOTE=henryzz;461614]The Odroid will have a female socket. I expect you will use a male to male hdmi cable to connect to the adapter. You will also want a male to male vga cable from the adapter to your screen.[/QUOTE]
Yeah, looking at my NUC VGA-to-mini-display adapter dongle, VGA side is F (to take the M VGA plug from the monitor VGA-to-VGA cable), device side mini-connector is M. So I figured that F-to-M dongle scheme was standard and went ahead and ordered [url=https://www.amazon.com/VicTsing-Converter-Adapter-Projectors-devices/dp/B00G9UWP94/ref=sr_1_3?s=electronics&ie=UTF8&qid=1497917539&sr=1-3&keywords=vga+to+hdmi+adapter]this $8 special[/url]. Laurent, does that answer your fog-of-decaffeination question? |
Definitely :smile:
|
[QUOTE=ldesnogu;461660]Definitely :smile:[/QUOTE]
:coffee::tu: |
2 Attachment(s)
VGA-to-HDMI adapter dongle arrived today, a day late - slipped MicroSD with preloaded boot image (bought from Hardkernel as an accessory - I have a USB-flash-format MicroSD reader, so now will stick MicroSD in that, turning it into an 8GB thumb drive, 4x larger than any dedicated such I had on hand previously) into slot on underside of board, hooked up to monitor, plugged in 5V DC power mini-plug, within a minute was looking at an Ubuntu login prompt. Default user-login was with ID 'odroid' and password the same ... once in I tried to su root but 'root' is not the default pwd for that - do any Odroiders know what it is?
@Luigi: the 'gcc -dM -E - < /dev/null' trick worked fine for me in terms of dumping the predefines (copied to attached bzip2'ed arm.predefs text file) - you sure you typed it properly? Trial build of a small source file quickly turned up a bug in the imul_macro0.h file, key 64-bit integer-mul macros were getting left undefined ... I traced that to preprocessor/macro stuff I added in the last few years to implement such macros in Nvidia CUDA PTX code. Fixed-up version of that file also attached - ARM Neon probably has decent 64-bit-int hardware-MUL support, but I've yet to dig deeply into the instruction set, and not crucial for the LL-testing stuff, in any event. With those patches, multithreaded scalar-double (i.e. generic-C) build succeeded, self-tests running now, albeit quite slow, between the generic-C-ness and just 1-threaded for first set of such self-tests. E.g. 230 ms/iter @1024K, 25-30x slower than a single AMD Ryzen core running AVX2 code. But if I can get a 2.5-3x speedup from vector-128-bit-SIMD assembler on the Neon, 1/10th the per-core performance of Ryzen at only $10/core (sans any volume discounts) would be not too bad. Will post self-test timings @1,2,4-threads tomorrow. |
[QUOTE=ewmayer;461834]...Ubuntu login prompt. Default user-login was with ID 'odroid' and password the same ... once in I tried to su root but 'root' is not the default pwd for that - do any Odroiders know what it is?
[/QUOTE] Prefix each root command with [c]sudo[/c] and enter your user password. My guess is you can do [c]sudo passwd root[/c] if you want to set up a root password. Or run [c]sudo su[/c] to run root commands. :smile: |
[QUOTE=ewmayer;461834]
@Luigi: the 'gcc -dM -E - < /dev/null' trick worked fine for me in terms of dumping the predefines (copied to attached bzip2'ed arm.predefs text file) - you sure you typed it properly? [/QUOTE] I'm afraid I forgot the '-' after the E :redface: The code compiles and runs happily :smile: I am running the selftest with 1 thread (./Mlucas -s m > selftest.log) and will share the log as I have it (I'm at 1408K right now). Let me know if you still need my prefs.arm file as well. [COLOR="Black"]The [M|m]lucas.cfg file is not written.[/COLOR] The code detects its lack, but when it tries to write (r+) it does not succeed. To do a selftest with more threads, should I try [code] ./Mlucas -s m -nthread [2|4] [/code] or [code] ./Mlucas -s m -cpu 0:3 [/code]? |
[QUOTE=ET_;461838]I'm afraid I forgot the '-' after the E :redface:
The code compiles and runs happily :smile: I am running the selftest with 1 thread (./Mlucas -s m > selftest.log) and will share the log as I have it (I'm at 1408K right now). Let me know if you still need my prefs.arm file as well.[/QUOTE] You just dropped the patched imul_macro0.h file into the mlucas_v17 src-dir, yes? Don't really need the log, just post the resulting mlucas.cfg file. [QUOTE][COLOR="Black"]The [M|m]lucas.cfg file is not written.[/COLOR] The code detects its lack, but when it tries to write (r+) it does not succeed.[/QUOTE] I had no such issues on my odroid. Do you have write permissions to the dir in which you are doing self-tests? [If that is different than the one you built in] What does 'touch mlucas.cfg' in that dir give? [QUOTE]To do a selftest with more threads, should I try [code] ./Mlucas -s m -nthread [2|4] [/code] or [code] ./Mlucas -s m -cpu 0:3 [/code]?[/QUOTE] On the ARM there is just one logical core per physical core, so './Mlucas -s m -nthread 2' is the same as './Mlucas -s m -cpu 0:1' and './Mlucas -s m -nthread 4' is the same as './Mlucas -s m -cpu 0:3' |
2 Attachment(s)
[QUOTE=ewmayer;461908]You just dropped the patched imul_macro0.h file into the mlucas_v17 src-dir, yes?[/QUOTE]
Yes! [QUOTE=ewmayer;461908]Don't really need the log, just post the resulting mlucas.cfg file. [/QUOTE] Here they are. The mlucas.cfg is correctly written after the test is completed, while I erroneously thought it was written on the fly. Luigi P.S. Guess you may need the 2 and 3 threads files as well... 3 Looks like a good choice. If so, I will compute them tomorrow. |
[QUOTE=paulunderwood;461837]Prefix each root command with [c]sudo[/c] and enter your user password.
My guess is you can do [c]sudo passwd root[/c] if you want to set up a root password. Or run [c]sudo su[/c] to run root commands. :smile:[/QUOTE] Thanks, that works (e.g. 'sudo shutdown -h now' ... have not tried setting up root pwd yet). Also found that the system needs the boot-MicroSD to remain installed - I had assumed that after the initial boot-up the OS and various boot-loader files would get copied to the onboard memory and the MicroSD no longer needed. [QUOTE=ET_;461913]The mlucas.cfg is correctly written after the test is completed, while I erroneously thought it was written on the fly.[/QUOTE] Thanks - sure, go ahead and do 2,3-threaded self-tests when you get the chance. Based on your 1 and 4-thread data, your Odroid must be using an older rev of the ARM core, or maybe slower memory, than mine - here is what I get @1,2,4-threads. Notice the 1-thread runtimes are 25-30% faster overall, and the scaling to 4-threads ('||-eff.' is short for 'parallel efficiency') is much better - my 4-thread timings are just over half of yours. When did you purchase your Odroid, and which precise model is it?
[code]
        1-thread:         2-thread:         3-thread:         4-thread:
FFTlen   ms/iter ||-eff%   ms/iter ||-eff%   ms/iter ||-eff%   ms/iter ||-eff%
  1024    223.93     100    112.33    99.7     95.18    78.4     61.24    91.4
  1152    272.53     100    137.15    99.4    117.28    77.5     74.66    91.3
  1280    317.98     100    160.73    98.9    138.97    76.3     88.37    90.0
  1408    351.05     100    177.97    98.6    152.51    76.7     95.73    91.7
  1536    387.47     100    197.59    98.0    169.60    76.2    108.45    89.3
  1664    411.91     100    209.30    98.4    179.49    76.5    112.95    91.2
  1792    419.99     100    213.92    98.2    182.06    76.9    116.43    90.2
  1920    485.82     100    246.78    98.4    212.18    76.3    134.59    90.2
  2048    476.06     100    241.00    98.8    205.41    77.3    131.84    90.3
  2304    570.37     100    290.40    98.2    250.33    75.9    158.11    90.2
  2560    654.57     100    345.22    94.8    300.91    72.5    196.02    83.5
  2816    725.62     100    383.84    94.5    334.08    72.4    217.17    83.5
  3072    793.89     100    418.03    95.0    357.01    74.1    237.74    83.5
  3328    849.32     100    448.77    94.6    390.24    72.5    255.72    83.0
  3584    859.99     100    456.01    94.3    393.88    72.8    262.40    81.9
  3840    990.53     100    525.26    94.3    457.94    72.1    298.67    82.9
  4096    974.11     100    512.90    95.0    445.75    72.8    297.59    81.8
  4608   1213.42     100    615.35    98.6    537.29    75.3    353.29    85.9
  5120   1460.96     100    775.31    94.2    669.04    72.8    447.42    81.6
  5632   1617.00     100    857.64    94.3    742.06    72.6    495.67    81.6
  6144   1764.45     100    937.44    94.1    780.16    75.4    546.03    80.8
  6656   1897.60     100   1009.80    94.0    870.94    72.6    586.85    80.8
  7168   1945.88     100   1035.60    93.9    887.20    73.1    609.23    79.8
  7680   2231.54     100   1179.50    94.6   1021.03    72.9    691.53    80.7
[/code]
Notes:
- 3-thread scaling is by far the worst, unsurprisingly because Mlucas is optimized for power-of-2 thread counts.
- The || scaling for 1,2,4-threads is quite impressive, especially given that we typically expect one-single-thread-job-per-core mode to run at no better than 80-90% efficiency due to overall system memory contention among the jobs (i.e. 1-worker/4-thread ~= 4-worker/1-thread in total-throughput terms).
- Only significant timing anomaly is for FFT lengths of form 15*2^n, which are slower than the power-of-2 lengths just above them.
The medium self-test (-s m) currently does not do 8192K, but by extension those timings would likely be slightly faster than the 7680K ones which form the bottom row of the above table. |
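For reference, the ||-eff% columns in the table above follow from the ms/iter columns as 100 * t1 / (n * tn), where t1 is the 1-thread time and tn the n-thread time. A minimal C sketch of that arithmetic (the function name is hypothetical, not something taken from the Mlucas sources):

```c
#include <assert.h>

/* Parallel efficiency in percent: 100 * t1 / (n * tn).
 * t1_ms = 1-thread ms/iter, tn_ms = n-thread ms/iter, nthreads = n.
 * Hypothetical helper for illustration only. */
double parallel_eff_pct(double t1_ms, double tn_ms, int nthreads)
{
    assert(t1_ms > 0.0 && tn_ms > 0.0 && nthreads > 0);
    return 100.0 * t1_ms / (nthreads * tn_ms);
}
```

Feeding in the 1024K row (223.93 ms at 1 thread, 112.33 ms at 2 threads, 61.24 ms at 4 threads) gives roughly 99.7% and 91.4%, which agrees with the tabulated values.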
[QUOTE=ewmayer;461934]Based on your 1 and 4-thread data, your Odroid must be using an older rev of the ARM core, or maybe slower memory, than mine - here is what I get @1,2,4-threads. Notice the 1-thread runtimes are 25-30% faster overall, and the scaling to 4-threads ('||-eff.' is short for 'parallel efficiency') is much better - my 4-thread timings are just over half of yours. When did you purchase your Odroid, and which precise model is it?[/QUOTE]
I bought them (5 boards) at PicoCluster ([url]www.picocluster.com[/url]) last March, and was running the test from node0, having a tail -f, 3 ssh and the whole GUI system running on it. Once I figure out how to access each board via ssh from my computer (the boards are preconfigured with a 10.0.x.x IP address while I am on a 192.168.x.x network), I suppose the timings should lower. I will run the next 2-3threads test on a different node just to see how it works. |
[QUOTE=ET_;461946]I bought them (5 boards) at PicoCluster ([url]www.picocluster.com[/url]) last March, and was running the test from node0, having a tail -f, 3 ssh and the whole GUI system running on it. Once I figure out how to access each board via ssh from my computer (the boards are preconfigured with a 10.0.x.x IP address while I am on a 192.168.x.x network), I suppose the timings should lower. I will run the next 2-3threads test on a different node just to see how it works.[/QUOTE]
Also - I ran my tests with the board open to the room air, and a fan blowing from across the room - no idea if these guys have a preinstalled temperature-monitoring system, but I figured since they are fanless, better safe than sorry. On to the vector-asm coding effort! |
[QUOTE=ewmayer;461988]Also - I ran my tests with the board open to the room air, and a fan blowing from across the room - no idea if these guys have a preinstalled temperature-monitoring system, but I figured since they are fanless, better safe than sorry.[/QUOTE]
In fact, I suppose that 5 boards and a switch closed inside a plexiglass cube may become hot and tend to throttle... Oh, well, time to design a cooling system will come. |
With some helpful advice from fellow forumite and ARM employee Tom Womack (a.k.a. fivemack), got my first nontrivial asm-macros put together and timing-tested in the last few days.
Copied in the code-box below is the ARMv8 version - at least the first go - of a complex 4-DFT with 3 complex twiddles. I tested this side-by-side with the SSE2 version of the same macro, which has no FMAs, obviously. My initial timings were surprising, and had the ARM code running faster - not only in cycles but further in terms of wall-clock time - on 1 CPU of my 1.5GHz Odroid-C2 than the SSE2 version of the same macro running on 1 CPU of my 2.0GHz Core2Duo macbook. But based on the relative theoretical instruction throughputs of the 2 respective CPUs that's simply wildly implausible. One more odd thing from the same side-by-side testing ... on the Core2/SSE2 once the GCC opt-level hit -O1 the macro timings bottomed out, not surprising since the compiler treats the ASM loop body as a black box and can only optimize the loop logic. However on the ARM going from -O1 to -O3 gave slightly better than a 2-fold speedup. Further digging into possible timing-loop overheads quickly revealed the cause of the above oddities - the loop was running the macro in in-place mode, reading 16 doubles (8 complex-double, thus 4 vector-complex-double, hence "4-DFT") from a block of quasirandom (i.e. 'repeatably random') inited local memory, then writing back to the same memory. This requires one to re-init the macro inputs on each loop pass via memcpy. For one reason or another, GCC (v4.2 on my Core2 and 5.something on my Odroid) is doing a really bad job of this init step on the Core2 even at -O3, whereas in going from -O1 to -O3 on the Odroid the compiler is slashing the cost of the init. The obvious answer was to switch to running the macro in out-of-place mode, which allows the inputs to be inited just once and the loop body now consists just of the 4-DFT macro. With that, here are the opcounts and cycle counts for the 2 respective 128-bit SIMD implementations: x86_64 SSE2: 41 MEM (19 load[1 via mem-op in addpd], 14 store, 8 reg-copy), 22 ADDPD, 16 MULPD: 46 cycles.
(Note I can get this down to 36 cycles, but only by using > 8 vector registers, which is inconsistent with being able to do two such 4-DFTs side-by-side, one in vector registers 0-7, the other in 8-15.) ARM v8 Neon: 11 MEM (7 load-pair, 4 store-pair, i.e. 22 total vector-load/stores via 11 instructions), 16 FADD, 12 FMUL/FMA: 93 cycles. (Here I use 12 vector registers to save some spill/fills and arithmetic, because I have 32 such registers to work with.) Thus almost exactly double the cycle count on the ARM vs the Core2. Here is the ARM inline-asm macro - note that q- and v- are different name prefixes for the same set of 128-bit vector registers, the former naming a given register as a single 128-bit quadword, the latter (with a lane-arrangement suffix) as a vector of lanes. The LDP and STP (load-pair and store-pair) instructions move the raw 128 bits regardless of the underlying data type but formally require the q-form of the register name. Thus we e.g. use 'LDP q4,q5' to load 32 bytes of contiguous data from a memory location, then use the same resulting register data under the names v4 and v5 to do vector floating-point arithmetic on them.
The '.2d' v-register suffixes mean 'treat register as pair of floating doubles':
[code]
__asm__ volatile (\
	"ldr	x0,%[__add0]		\n\t"\
	"ldr	w1,%[__p1]			\n\t"\
	"ldr	w2,%[__p2]			\n\t"\
	"ldr	w3,%[__p3]			\n\t"\
	"ldr	x4,%[__cc0]			\n\t"\
	"ldr	x5,%[__r0]			\n\t"\
	"add	x1, x0,x1,lsl #3	\n\t"\
	"add	x2, x0,x2,lsl #3	\n\t"\
	"add	x3, x0,x3,lsl #3	\n\t"\
	/* SSE2_RADIX_04_DIF_3TWIDDLE(r0,c0): */\
	/* Do the p0,p2 combo: */\
	"ldp	q4,q5,[x2]			\n\t"\
	"ldp	q8,q9,[x4]			\n\t"/* cc0 */\
	"ldp	q0,q1,[x0]			\n\t"\
	"fmul	v6.2d,v4.2d,v8.2d	\n\t"/* twiddle-mul: */\
	"fmul	v7.2d,v5.2d,v8.2d	\n\t"\
	"fmls	v6.2d,v5.2d,v9.2d	\n\t"\
	"fmla	v7.2d,v4.2d,v9.2d	\n\t"\
	"fsub	v2.2d ,v0.2d,v6.2d	\n\t"/* 2 x 2 complex butterfly: */\
	"fsub	v3.2d ,v1.2d,v7.2d	\n\t"\
	"fadd	v10.2d,v0.2d,v6.2d	\n\t"\
	"fadd	v11.2d,v1.2d,v7.2d	\n\t"\
	/* Do the p1,3 combo: */\
	"ldp	q8,q9,[x4,#0x40]	\n\t"/* cc0+4 */\
	"ldp	q6,q7,[x3]			\n\t"\
	"fmul	v0.2d,v6.2d,v8.2d	\n\t"/* twiddle-mul: */\
	"fmul	v1.2d,v7.2d,v8.2d	\n\t"\
	"fmls	v0.2d,v7.2d,v9.2d	\n\t"\
	"fmla	v1.2d,v6.2d,v9.2d	\n\t"\
	"ldp	q8,q9,[x4,#0x20]	\n\t"/* cc0+2 */\
	"ldp	q6,q7,[x1]			\n\t"\
	"fmul	v4.2d,v6.2d,v8.2d	\n\t"/* twiddle-mul: */\
	"fmul	v5.2d,v7.2d,v8.2d	\n\t"\
	"fmls	v4.2d,v7.2d,v9.2d	\n\t"\
	"fmla	v5.2d,v6.2d,v9.2d	\n\t"\
	"fadd	v6.2d,v4.2d,v0.2d	\n\t"/* 2 x 2 complex butterfly: */\
	"fadd	v7.2d,v5.2d,v1.2d	\n\t"\
	"fsub	v4.2d,v4.2d,v0.2d	\n\t"\
	"fsub	v5.2d,v5.2d,v1.2d	\n\t"\
	/* Finish radix-4 butterfly and store results: */\
	"fsub	v8.2d,v10.2d,v6.2d	\n\t"\
	"fsub	v9.2d,v11.2d,v7.2d	\n\t"\
	"fsub	v1.2d,v3.2d,v4.2d	\n\t"\
	"fsub	v0.2d,v2.2d,v5.2d	\n\t"\
	"fadd	v6.2d,v6.2d,v10.2d	\n\t"\
	"fadd	v7.2d,v7.2d,v11.2d	\n\t"\
	"fadd	v4.2d,v4.2d,v3.2d	\n\t"\
	"fadd	v5.2d,v5.2d,v2.2d	\n\t"\
	"stp	q6,q7,[x5      ]	\n\t"/* out 0 */\
	"stp	q0,q4,[x5,#0x20]	\n\t"/* out 1 */\
	"stp	q8,q9,[x5,#0x40]	\n\t"/* out 2 */\
	"stp	q5,q1,[x5,#0x60]	\n\t"/* out 3 */\
	:					/* outputs: none */\
	: [__add0] "m" (r0)	/* All inputs from memory addresses here */\
	 ,[__p1] "m" (p1)\
	 ,[__p2] "m" (p2)\
	 ,[__p3] "m" (p3)\
	 ,[__two] "m" (two)\
	 ,[__cc0] "m" (cc0)\
	 ,[__r0] "m" (r0)\
	: "cc","memory","x0","x1","x2","x3","x4","x5","v0","v1","v2","v3","v4","v5","v6","v7","v8","v9","v10","v11"	/* Clobbered registers */\
);[/code]
In using this basic kind of small-DFT macro to build up a larger one (say a radix-16 DFT), I typically implement 2 columns of such code operating side-by-side on independent data, which helps hide latency. Here are the respective 1-column and 2-column cycle counts:
x86_64 SSE2: 1-col = 46 cycles, 2-col = 77 cycles.
ARM Neon: 1-col = 93 cycles, 2-col = 165 cycles.
Thus a decent per-cycle throughput gain on both, but comparatively more for the SSE2 code. To the ARM experts hereabouts, do those timings seem reasonable? |
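As a cross-check on the posted macro, here is a scalar C model of what one SIMD lane of the radix-4 DIF with 3 twiddles appears to compute, read off the instruction sequence above. This is only a sketch: the type and function names are hypothetical, and the twiddle ordering is simplified so that tw[0], tw[1], tw[2] multiply inputs 1, 2, 3 respectively (the asm reads the twiddle for the p2 input at cc0, for p1 at cc0+0x20 and for p3 at cc0+0x40).

```c
#include <assert.h>

/* One complex value; the asm handles two such lanes per 128-bit register. */
typedef struct { double re, im; } cplx;

static cplx cx_add(cplx a, cplx b) { cplx r = { a.re + b.re, a.im + b.im }; return r; }
static cplx cx_sub(cplx a, cplx b) { cplx r = { a.re - b.re, a.im - b.im }; return r; }

/* Complex multiply, same form as the fmul/fmls/fmla twiddle-mul groups:
 * re = a.re*b.re - a.im*b.im, im = a.im*b.re + a.re*b.im */
static cplx cx_mul(cplx a, cplx b)
{
    cplx r = { a.re * b.re - a.im * b.im, a.im * b.re + a.re * b.im };
    return r;
}

/* Multiply by +i: (re, im) -> (-im, re) */
static cplx cx_mul_i(cplx a) { cplx r = { -a.im, a.re }; return r; }

/* Hypothetical scalar reference, out-of-place (in and out must differ). */
void radix4_dif_3twiddle_ref(const cplx in[4], const cplx tw[3], cplx out[4])
{
    assert(in != out);
    cplx a = in[0];                 /* p0 input has no twiddle */
    cplx b = cx_mul(in[1], tw[0]);  /* p1 */
    cplx c = cx_mul(in[2], tw[1]);  /* p2 */
    cplx d = cx_mul(in[3], tw[2]);  /* p3 */

    cplx apc = cx_add(a, c), amc = cx_sub(a, c);  /* p0,p2 butterfly */
    cplx bpd = cx_add(b, d), bmd = cx_sub(b, d);  /* p1,p3 butterfly */

    out[0] = cx_add(apc, bpd);            /* (a+c) + (b+d)   */
    out[1] = cx_add(amc, cx_mul_i(bmd));  /* (a-c) + i*(b-d) */
    out[2] = cx_sub(apc, bpd);            /* (a+c) - (b+d)   */
    out[3] = cx_sub(amc, cx_mul_i(bmd));  /* (a-c) - i*(b-d) */
}
```

Comparing the asm outputs against such a model on the same quasirandom inputs is one way to validate a new macro before timing it.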
I am only a few weeks away from releasing a beta version of Mlucas with ARMv8 SIMD-assembly support - many thanks to fellow forumite and ARM engineer Tom Womack (a.k.a. fivemack) for much useful assistance in my early steep-part-of-the-learning-curve coding efforts.
But let me damp down expectations right off the bat: The performance of the SIMD code is less than I'd hoped in my wide-eyed initial guesstimates - looks like all 4 cores of my Odroid C2 are roughly equivalent to 1 core of my vintage-2009 Core2 Duo macbook running an SSE2 build of Mlucas, and equivalent to perhaps 1/8th of both cores of my ham-sandwich-sized Intel Broadwell NUC running an AVX2 build of the code. I'm not sure how that stacks up on a per-watt basis, probably decently enough, but overall we're talking on the order of half a year or more to do a single exponent at the current GIMPS wavefront, and that number needs to come down in order to spur any appreciable user adoption. So I'm hoping readers/future-users of the ARMv8 code can tell me some good news about better performance for higher-end ARMv8 systems than my humble A53, and e.g. low-cost multi-socket ARMv8 systems which contain multiple copies of such 4-core CPUs. Heck, for this kind of work we'd really like just a simple board with multiple CPUs and wouldn't even need any memory subsystem support for interprocessor communication, but I'm guessing that's a no-go from a marketing perspective for general-purpose compute hardware. Some interesting performance trends are already visible in the current powers-of-2-only binary (adding non-power-of-2 support is pretty quick at this point since it shares ~90% of the power-of-2 FFT-code infrastructure, merely requiring implementation of odd-radix DFT macros for radices 3,5,7,9,11,13,15, only the two composite ones of which require separate DIF and DIT versions of said macros). A key performance-related parameter relates to the leading radix, used for the initial fFFT pass and final iFFT pass, let's call said radix R. Say I'm doing a length-N FFT.
Once I do that initial radix-R pass which accesses stride-N/R-separated sets of data, the subsequent passes of the fFFT, the dyadic-mul step and the iFFT passes all the way up to the final radix-R iFFT one, all those operate on R disjoint chunks of N/R data each, which naturally are assigned to separate threads in a multithreaded run. The size of these disjoint chunks is thus key in terms of getting good cache performance - we want each such chunk to fit into L2, typically with some room to spare. Thus at any given FFT length we typically see a "sweet spot" leading radix R - make R smaller and the resulting larger data chunks start to spill out of L2, make R too large and the overhead of handling many small chunks begins to dominate the runtime. A typical example on the ARM is provided by various radix combos at 4096K FFT, where the sweet spot is at R = 256, which yields an N/R-chunksize of ~32MB/256 = 128 kB. Let's compare 100 iterations using leading radices 128 and 256 here, for 1 and 4-threads:
R     1-thr    4-thr
128   70 sec   24 sec
256   64 sec   23 sec
i.e. R = 256 gives a 10% speedup over R = 128 in single-thread mode, but the advantage drops to just 4% when running 4-threaded. (The eagle-eyed follower of this thread may compare these timings to those I gave for my initial scalar-double C-code build in [url=http://mersenneforum.org/showpost.php?p=461934&postcount=80]post 80[/url] and note that e.g. the above 64/23 sec for 1/4-thread represent only ~1.5x speedup over the non-SIMD build - like I said, underwhelming.) Two possibilities immediately come to mind to explain the 4-thread behavior: memory bandwidth (i.e. the RAM can't keep all 4 cores fed) and thermal throttling (the C2 has no cooling fan, just a small heatsink). Is there a way to see if the latter is occurring, and if so, to what degree? I know there are temp-monitoring packages for various linux distros, but those require the OS to play nice with the underlying CPU and motherboard hardware.
In [url=http://mersenneforum.org/showpost.php?p=461934&postcount=50]post 50 of this thread[/url] VistordeHolland mentions power draw of 200mW per core for the A53 implementation of ARMv8 (the one in the Odroid C2), but the little passive heatsink on my C2 gets sufficiently hot under load that I'm skeptical of the total die power draw being under 1W ... might the L2 cache's power draw be excluded from the 200mW figure? The cheap little plastic housing I bought as an accessory for my C2 is poorly ventilated and surely doesn't help, but is necessary to keep on at the moment since I'm sneakernetting code updates via thumb drive and need to protect the board during all that plugging-and-playing. The whole thermal-throttling thing may prove to be a bogus notion, but I won't know until I know, right? Lastly for now, can any of our resident ARM coders tell me whether ARM has some analog of x86's CPUID functionality? It would be nice to be able to support the same kind of "on program startup, check the CPU's SIMD support against that targeted by the build. If build target instructions not supported by CPU quit with error; if CPU supports SIMD but build does not target same, print info-message to that effect" functionality I use for x86 SIMD-capable CPUs/builds. |
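The chunk-size arithmetic behind the leading-radix sweet spot discussed in this post can be sketched in C as follows. It is just an illustration under the post's own assumptions (FFT length counted in units of 1024 doubles, 8 bytes per double); the function name is made up for the example and the cache size to compare against is left to the reader.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes in each of the R disjoint data chunks for an FFT of
 * fft_len_k * 1024 doubles with leading radix `radix`.
 * Hypothetical helper, not part of Mlucas. */
size_t chunk_bytes(size_t fft_len_k, size_t radix)
{
    assert(radix > 0);
    size_t total_bytes = fft_len_k * 1024 * sizeof(double);
    return total_bytes / radix;
}
```

For the 4096K example, chunk_bytes(4096, 256) comes to 131072 bytes (128 kB) and chunk_bytes(4096, 128) to 262144 bytes (256 kB), i.e. halving R doubles the per-thread working set.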
[QUOTE=ewmayer;470475]Lastly for now, can any of our resident ARM coders tell me whether ARM has some analog of x86's CPUID functionality? It would be nice to be able to support the same kind of "on program startup, check the CPU's SIMD support against that targeted by the build. If build target instructions not supported by CPU quit with error; if CPU supports SIMD but build does not target same, print info-message to that effect" functionality I use for x86 SIMD-capable CPUs/builds.[/QUOTE]Yes, of course. They are implementation dependent but ID registers are usually named ID_AA64* & ID_ISAR* and are accessed with the MRS instruction. You'll have to read your specific CPU manual to see how they are defined, but the generic ARMv8 manual lists the names and bit positions for the generic case.
|
There's a persistent rumor that Apple, which already designs its own ARM-based chips for mobile devices, is considering moving the Mac to ARM too.
They have already successfully navigated two architecture switches in past decades, from Motorola 68xxx to PowerPC to Intel, so it's surely feasible. Intel architecture has stagnated for some time now, and seems to have its inherent limitations. And the consumer demand for faster chips on the desktop is modest at best. Meanwhile, all the mobile devices are doing face recognition and augmented reality and whatnot while coping with limited battery life, so there's a neverending powerful industrywide incentive to keep making ARM run faster and use less power. So I think ARM will at some point become very relevant to our interests, and it's good to get ahead of the curve. I personally wouldn't care if each individual exponent took half a year, as long as I could run a whole bunch of them in parallel at the lowest possible buck for the bang. |
[QUOTE=retina;470479]Yes, of course. They are implementation dependent but ID registers are usually named ID_AA64* & ID_ISAR* and are accessed with the MRS instruction. You'll have to read your specific CPU manual to see how they are defined, but the generic ARMv8 manual lists the names and bit positions for the generic case.[/QUOTE]
Alas most of these registers can't be read from user space. There exist two possibilities: [LIST=1][*]parse the output of /proc/cpuinfo[*]play with getauxval and HWCAP (<sys/auxv.h> and <asm/hwcap.h>)[/LIST]I never tried any of these so this might not be exactly what Ernst needs. |
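The first option can be sketched in C roughly as below. This assumes a Linux system whose /proc/cpuinfo carries a line starting with "Features" followed by a colon and space-separated flag names, which is how 64-bit ARM kernels report it; both function names are hypothetical, and the sketch matches whole tokens only (so "simd" does not match "asimd").

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `flag` appears as a whole whitespace-delimited token after the
 * ':' in a cpuinfo-style line such as "Features : fp asimd evtstrm". */
int cpuinfo_line_has_flag(const char *line, const char *flag)
{
    assert(line != NULL && flag != NULL);
    const char *p = strchr(line, ':');
    size_t n = strlen(flag);

    if (p == NULL || n == 0)
        return 0;
    p++;                                   /* step past the ':' */
    while (*p != '\0') {
        p += strspn(p, " \t\n");           /* skip separators */
        size_t len = strcspn(p, " \t\n");  /* length of the next token */
        if (len == n && strncmp(p, flag, n) == 0)
            return 1;
        p += len;
    }
    return 0;
}

/* Scan /proc/cpuinfo for a "Features" line containing `flag`.
 * Returns 1 if found, 0 if not found, -1 if the file cannot be opened. */
int cpuinfo_has_feature(const char *flag)
{
    char line[1024];
    int found = 0;
    FILE *fp = fopen("/proc/cpuinfo", "r");

    if (fp == NULL)
        return -1;
    while (!found && fgets(line, sizeof line, fp) != NULL) {
        if (strncmp(line, "Features", 8) == 0)
            found = cpuinfo_line_has_flag(line, flag);
    }
    fclose(fp);
    return found;
}
```

A startup check would then call something like cpuinfo_has_feature("asimd") and treat -1 as "unknown" rather than "unsupported".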
I actually find it easier to not bother with all the myriad of ID values and registers. Instead I just set an invalid instruction trap and start executing something from the set I want to use. If it traps then I drop back one level and try again. This covers both bases where either the OS or the CPU doesn't support the instructions.
|
[QUOTE=retina;470512]I actually find it easier to not bother with all the myriad of ID values and registers. Instead I just set an invalid instruction trap and start executing something from the set I want to use. If it traps then I drop back one level and try again. This covers both bases where either the OS or the CPU doesn't support the instructions.[/QUOTE]
A similar thought had occurred to me when I couldn't find any CPUID-style mention in the ARM instruction manual, but time to get concrete about this. Is that something easily doable in C - e.g. using the functionality of signal.h to catch SIGILL - or does it need C++'s try/catch exception handling? If you could post some sample code with your implementation, that would be great. |
[QUOTE=ewmayer;470525]A similar thought had occurred to me when I couldn't find any CPUID-style mention in the ARM instruction manual, but time to get concrete about this. Is that something easily doable in C - e.g. using the functionality of signal.h to catch SIGILL - or does it need C++'s try/catch exception handling?
If you could post some sample code with your implementation, that would be great.[/QUOTE]I don't use C/C++, or any HLL. It's all assembly. And I don't use Linux, or any commercial OS. So I'm not sure how much any code I have could help you. But the try/catch thing (or its equivalent) would appear to be the easiest thing to use here. Pick a representative instruction and put it in the try block. Build up a pool of true/false flags. |
I don't think the exception-handling semantics in C++ are capable of dealing with unix style signals, which by their nature involve interrupt handling and a context switch to the OS. I suppose you could ignore the illegal instruction signal and then look to see if the instruction you are testing has had the intended effect on sample data, i.e. do a SIMD vector add and check that vector(1)+vector(1) == vector(2).
More mundane: what about parsing /proc/cpuinfo in linux? On x86 this parses the cpuid bits for you. Is there an ARM equivalent for that? |
[QUOTE=jasonp;470667]I don't think the exception-handling semantics in C++ are capable of dealing with unix style signals, which by their nature involve interrupt handling and a context switch to the OS. I suppose you could ignore the illegal instruction signal and then look to see if the instruction you are testing has had the intended effect on sample data, i.e. do a SIMD vector add and check that vector(1)+vector(1) == vector(2).[/QUOTE]I'm not sure how you could just ignore the illegal instruction trap. You would have to rewrite EIP/RIP properly to skip the faulting instruction, else you keep returning to the same instruction.
For my code I just lay it out like this (pseudo code):[code]
//set all flags to zero
supports_FPU = 0
supports_CMOV = 0
//...
supports_AVX512F = 0

//test each instruction set
try {
    finit //FPU instruction
    supports_FPU = 1
} catch {
    //nothing to do here
}
try {
    cmoveq eax,eax //CMOV instruction
    supports_CMOV = 1
} catch {
    //nothing to do here
}
//...
try {
    vpabsq zmm1 {k1}, zmm2, zmm3 //AVX512 Foundation instruction
    supports_AVX512F = 1
} catch {
    //nothing to do here
}
[/code] |
Ignoring SIGILL presupposes that the processor triggers a synchronous exception that starts an exception handler, and when the user-defined portion of that handler does nothing then execution restarts at the instruction after the faulting one, whose address was saved by hardware and is accessible somewhere. At least that's how the embedded CPUs I'm familiar with would work.
I see there are some projects on github that use unix internals to trap things like segfaults and convert them into C++ exceptions; pretty slick. |
Ideally the [b]try[/b] block will set the trap address to the start of the following [b]catch[/b] block. So when it traps EIP/RIP is updated to point to the catch block and the flag never gets set to 1. The EIP/RIP rewrite has to be handled in the exception code. And for code like I have above it won't work if the code always returns to the following instruction because then the flag is always set regardless of whether or not the instruction was valid.
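In C on a POSIX system, one way to approximate this try/catch behavior is sigsetjmp/siglongjmp: the sigsetjmp call marks the "catch" point and the SIGILL handler jumps back to it, so the code after the probe (which would set the flag) is skipped when the instruction traps. The following is a minimal sketch under those assumptions; all function names are hypothetical, the raise(SIGILL) probe merely stands in for an unsupported instruction so the idea can be tried on any machine, and the AArch64 probe is only compiled when targeting that architecture.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for sigaction, sigsetjmp in strict ISO modes */
#endif
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf probe_env;  /* the "catch" point */

static void on_sigill(int signum)
{
    (void)signum;
    siglongjmp(probe_env, 1);  /* abandon the probe, resume at sigsetjmp */
}

/* Returns 1 if probe() ran to completion, 0 if it raised SIGILL,
 * -1 if the handler could not be installed. */
int probe_runs(void (*probe)(void))
{
    struct sigaction act, old;
    volatile int ok = 0;

    assert(probe != NULL);
    memset(&act, 0, sizeof act);
    act.sa_handler = on_sigill;
    sigemptyset(&act.sa_mask);
    if (sigaction(SIGILL, &act, &old) < 0)
        return -1;

    if (sigsetjmp(probe_env, 1) == 0) {  /* "try" */
        probe();
        ok = 1;                          /* reached only if nothing trapped */
    }                                    /* "catch": fall through with ok == 0 */

    sigaction(SIGILL, &old, NULL);       /* restore the previous handler */
    return ok;
}

void probe_nothing(void) {}
void probe_raise_sigill(void) { raise(SIGILL); }  /* stand-in for a bad opcode */

#if defined(__aarch64__)
/* Example real probe: one AdvSIMD double-precision vector add. */
void probe_asimd_fadd(void)
{
    __asm__ volatile ("fadd v0.2d, v0.2d, v0.2d" ::: "v0");
}
#endif
```

Each supports_X flag in the pseudo code would then be set from the return value of probe_runs() with a matching probe function. Note this only catches instructions that actually trap; an encoding that silently executes as something else would go undetected.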
|
Here is the output of cat /proc/cpuinfo on a 64-bit machine:
[code]$ cat /proc/cpuinfo | grep Features | sort -u
Features : fp asimd evtstrm[/code]And here is code to check undefined instructions in C: [code]#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <signal.h>
#include <ucontext.h>

/* Written by the handler and read by main(): must be volatile sig_atomic_t,
   otherwise the compiler may assume it is still 0 after the probe */
static volatile sig_atomic_t got_sigill;

static void illegal_handler(int signum, siginfo_t *info, void *context)
{
    ucontext_t *uc = (ucontext_t *)context;

    (void)signum;
    (void)info;
    /* no printf here: it is not async-signal-safe */
    got_sigill = 1;
    uc->uc_mcontext.pc += 4; /* AArch64 instructions are always 4 bytes long: skip the faulting one */
}

static int setup_illegal(void)
{
    struct sigaction act;

    memset(&act, 0, sizeof(act));
    act.sa_sigaction = illegal_handler; /* 3-argument handler goes in sa_sigaction when SA_SIGINFO is set */
    act.sa_flags = SA_SIGINFO;

    errno = 0;
    if (sigaction(SIGILL, &act, NULL) < 0) {
        perror("sigaction");
        return 1;
    }
    return 0;
}

int main(void)
{
    int err;

    /* first setup the signal handler */
    err = setup_illegal();
    if (err) {
        return 1;
    }

    /* now check instruction */
    got_sigill = 0;
    __asm__ volatile (".inst 0x0"); /* all-zero encoding is an undefined instruction on AArch64 */
    if (got_sigill) {
        printf("Instruction is undefined.\n");
    } else {
        printf("Instruction is not undefined.\n");
    }
    return 0;
}[/code] |
And here is how to use HW capabilities:
[code]#include <stdio.h>
#include <sys/auxv.h>
#include <asm/hwcap.h>

static int has_asimd(void)
{
    unsigned long hwcaps = getauxval(AT_HWCAP);

    if (hwcaps & HWCAP_ASIMD) {
        return 1;
    }
    return 0;
}

int main(void)
{
    int asimd;

    asimd = has_asimd();
    if (asimd) {
        printf("AdvSIMD is supported.\n");
    } else {
        printf("AdvSIMD is NOT supported.\n");
    }
    return 0;
}[/code] |