mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   Running mprime/Prime95 on armhf arch (https://www.mersenneforum.org/showthread.php?t=20614)

RedPhantom 2015-11-02 19:38

Running mprime/Prime95 on armhf arch
 
Hello.
I've recently bought a microcomputer by the name of Raspberry Pi. Its processor architecture is [I]armhf[/I],
and it seems like mprime won't run on the Pi. I tried the linux32 and linux64 builds, and building from source code is not an option because the PrimeNet servers won't be supported.

Any solutions?
Thanks.

Mark Rose 2015-11-02 21:18

It won't work. mprime is written for x86 and x86-64 processors, not arm processors.
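
One way to see the mismatch for yourself: every Linux executable begins with an ELF header, and its e_machine field names the CPU family the binary was built for. Below is a minimal Python sketch that reads that field. The helper names are made up for illustration and are not part of mprime or any other tool mentioned in this thread.

```python
import struct

# e_machine values defined by the ELF specification
ELF_MACHINES = {3: "x86", 62: "x86-64", 40: "ARM (32-bit)", 183: "AArch64"}


def elf_machine(header: bytes) -> str:
    """Name the target CPU from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # Byte 5 (EI_DATA) gives the byte order: 1 = little-endian, 2 = big-endian
    endian = "<" if header[5] == 1 else ">"
    # e_machine is the 16-bit field at offset 18, right after e_ident and e_type
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown ({machine})")


def read_elf_machine(path: str) -> str:
    """Convenience wrapper (hypothetical helper): inspect a file on disk."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20))
```

Run against the downloaded linux32 or linux64 mprime binary, read_elf_machine should report an x86 variant, while binaries that came with the Pi's own distribution should report ARM.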

paulunderwood 2015-11-02 21:39

[QUOTE=RedPhantom;414684]Hello.
I've recently bought a microcomputer by the name of Raspberry Pi. Its processor architecture is [I]armhf[/I],
and it seems like mprime won't run on the Pi. I tried the linux32 and linux64 builds, and building from source code is not an option because the PrimeNet servers won't be supported.

Any solutions?
Thanks.[/QUOTE]

You can run [URL="http://hogranch.com/mayer/README.html"]Mlucas[/URL], I think. Ernst?

Testing for a big Mersenne prime on a Raspberry Pi is going to take a long time :rolleyes:
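
For anyone wondering where the time goes, here is a rough Python sketch of the Lucas-Lehmer test, the primality test for Mersenne numbers. It is a plain textbook version using Python's built-in big integers, not the code used by mprime or Mlucas; as far as I know those programs do the squaring step with FFT-based multiplication, which is why floating-point speed matters so much.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if 2**p - 1 is prime, where p itself is a prime exponent."""
    if p == 2:
        return True  # 2**2 - 1 = 3 is prime; the loop below needs p > 2
    m = (1 << p) - 1  # the Mersenne number being tested
    s = 4
    # p - 2 rounds of "square, subtract 2, reduce modulo 2**p - 1"
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0
```

For an exponent p the loop runs p - 2 times, and each round squares a number of about p bits, so an exponent in the tens of millions means tens of millions of very large multiplications.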

Mark Rose 2015-11-02 22:32

[QUOTE=paulunderwood;414704]You can run [URL="http://hogranch.com/mayer/README.html"]Mlucas[/URL], I think. Ernst?

Testing for a big Mersenne prime on a Raspberry Pi is going to take a long time :rolleyes:[/QUOTE]

Probably years. They have the performance of a 15 year old laptop.

henryzz 2015-11-02 22:47

The raspberry pi 2 does some things 20x slower than my 8 year old Q6600.

ewmayer 2015-11-02 23:17

[QUOTE=paulunderwood;414704]You can run [URL="http://hogranch.com/mayer/README.html"]Mlucas[/URL], I think. Ernst?

Testing for a big Mersenne prime on a Raspberry Pi is going to take a long time :rolleyes:[/QUOTE]

Should work - I know Alex Vong (the fellow I worked with over the summer to get Mlucas submitted as a Debian package) tweaked several platform-related headers and the auto-build scripts for several variants of ARM, but there is probably still work to be done on that front. Any would-be ARM/Linux builders should try the latest Mlucas release with the auto-build stuff, and PM me if they encounter issues.

As others have noted, things are going to be very slow on ARM, but having more people do builds for various flavors of ARM/Linux, and perhaps do some DC (double-check) work if they manage to get a working binary, is always a useful exercise.

pepi37 2016-02-08 12:55

[QUOTE=henryzz;414710]The raspberry pi 2 does some things 20x slower than my 8 year old Q6600.[/QUOTE]
Is it possible to compile sr1sieve and sr2sieve on a Raspberry Pi 2, and how "fast" is the Raspberry Pi at sieving (if someone has real results)?

VictordeHolland 2016-02-08 22:57

Compiling binaries for ARM might not be very useful now. But it could be handy in the future [U][B]if[/B][/U] ARM gets a slice of the server market. And it could be done just for fun :P.
Don't underestimate the supremacy of Intel/x86-64 in the current server market though. IBM's POWER architecture is only surviving in mainframes/ERP systems, and AMD has a severe process node disadvantage (28nm vs. 14nm).

I had to look up [B]armhf[/B]:
[quote]
Higher-end ARM processors come bundled with additional capability that enables hardware execution of floating point operations. The difference between these two architectures gave rise to two separate Embedded Application Binary Interfaces or EABIs for ARM: [I]soft float[/I] and [I]VFP (Vector Floating Point)[/I]. Although a big step up in performance, the VFP EABI utilizes less-than-optimal argument passing when a floating point operation takes place. In this scenario, floating point arguments must first be passed through integer registers prior to executing in the floating point unit.
In the Linux community, releases built upon both these EABIs are referred to as [I]armel[/I] based distributions.

A new EABI, referred to as [I]armhf[/I] optimizes the calling convention for floating point operations by passing arguments directly into floating point registers. The end result is applications compiled with the [I]armhf[/I] standard should demonstrate modest performance improvement in some cases, and significant improvement for floating point intensive applications.[/quote]source: [URL]https://blogs.oracle.com/jtc/entry/is_it_armhf_or_armel[/URL]
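
A quick way to tell which of the two ABIs a given ARM binary was built for is the e_flags field of its ELF header, the same value that readelf -h prints as "Flags". The sketch below is only an illustration: the function name is made up, the two bit masks are the EF_ARM_ABI_FLOAT_* values from elf.h, and older toolchains may leave both bits unset.

```python
# Float-ABI bits in e_flags for 32-bit ARM (EABI version 5), as named in elf.h
EF_ARM_ABI_FLOAT_SOFT = 0x200
EF_ARM_ABI_FLOAT_HARD = 0x400


def arm_float_abi(e_flags: int) -> str:
    """Classify a 32-bit ARM binary from the e_flags value in its ELF header."""
    if e_flags & EF_ARM_ABI_FLOAT_HARD:
        return "hard-float calling convention (armhf)"
    if e_flags & EF_ARM_ABI_FLOAT_SOFT:
        return "soft-float calling convention (armel)"
    return "float ABI not recorded"
```

For example, a flags value of 0x5000400 would be classified as armhf and 0x5000200 as armel.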

Then there is the maze of: VFPv3, VFPv4, NEON, ARMv7-A, AArch32, AArch64, ARMv8-A

and the different architectures:
ARM11

Cortex-A5
Cortex-A7
Cortex-A8
Cortex-A9
Cortex-A12
Cortex-A15
Cortex-A17

Cortex-A35
Cortex-A53
Cortex-A57
Cortex-A72
and half a dozen custom cores from Apple and Qualcomm.

Getting the software to work on ARM would be a huge effort in itself. Getting it to work efficiently on the different architectures is another thing.

Are the compilers mature enough? There is the compiler from ARM (you can get a 30 day free trial).
GCC apparently also has ARM support:
[URL]https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html[/URL]

Some possibly useful compile options in GCC:
[code]
-mfloat-abi=name
Specifies which floating-point ABI to use. Permissible values are: ‘soft’, ‘softfp’ and ‘hard’. Specifying ‘soft’ causes GCC to generate output containing library calls for floating-point operations. ‘softfp’ allows the generation of code using hardware floating-point instructions, but still uses the soft-float calling conventions. ‘hard’ allows generation of floating-point instructions and uses FPU-specific calling conventions.
The default depends on the specific target configuration. Note that the hard-float and soft-float ABIs are not link-compatible; you must compile your entire program with the same ABI, and link with a compatible set of libraries.

-mlittle-endian
Generate code for a processor running in little-endian mode. This is the default for all standard configurations.
-mbig-endian
Generate code for a processor running in big-endian mode; the default is to compile code for a little-endian processor.

-march=name
This specifies the name of the target ARM architecture. GCC uses this name to determine what kind of instructions it can emit when generating assembly code. This option can be used in conjunction with or instead of the -mcpu= option. Permissible names are: ‘armv2’, ‘armv2a’, ‘armv3’, ‘armv3m’, ‘armv4’, ‘armv4t’, ‘armv5’, ‘armv5t’, ‘armv5e’, ‘armv5te’, ‘armv6’, ‘armv6j’, ‘armv6t2’, ‘armv6z’, ‘armv6kz’, ‘armv6-m’, ‘armv7’, ‘armv7-a’, ‘armv7-r’, ‘armv7-m’, ‘armv7e-m’, ‘armv7ve’, ‘armv8-a’, ‘armv8-a+crc’, ‘armv8.1-a’, ‘armv8.1-a+crc’, ‘iwmmxt’, ‘iwmmxt2’, ‘ep9312’.
-march=armv7ve is the armv7-a architecture with virtualization extensions.
-march=armv8-a+crc enables code generation for the ARMv8-A architecture together with the optional CRC32 extensions.
-march=native causes the compiler to auto-detect the architecture of the build computer. At present, this feature is only supported on GNU/Linux, and not all architectures are recognized. If the auto-detect is unsuccessful the option has no effect.

-mtune=name
This option specifies the name of the target ARM processor for which GCC should tune the performance of the code. For some ARM implementations better performance can be obtained by using this option. Permissible names are: ‘arm2’, ‘arm250’, ‘arm3’, ‘arm6’, ‘arm60’, ‘arm600’, ‘arm610’, ‘arm620’, ‘arm7’, ‘arm7m’, ‘arm7d’, ‘arm7dm’, ‘arm7di’, ‘arm7dmi’, ‘arm70’, ‘arm700’, ‘arm700i’, ‘arm710’, ‘arm710c’, ‘arm7100’, ‘arm720’, ‘arm7500’, ‘arm7500fe’, ‘arm7tdmi’, ‘arm7tdmi-s’, ‘arm710t’, ‘arm720t’, ‘arm740t’, ‘strongarm’, ‘strongarm110’, ‘strongarm1100’, ‘strongarm1110’, ‘arm8’, ‘arm810’, ‘arm9’, ‘arm9e’, ‘arm920’, ‘arm920t’, ‘arm922t’, ‘arm946e-s’, ‘arm966e-s’, ‘arm968e-s’, ‘arm926ej-s’, ‘arm940t’, ‘arm9tdmi’, ‘arm10tdmi’, ‘arm1020t’, ‘arm1026ej-s’, ‘arm10e’, ‘arm1020e’, ‘arm1022e’, ‘arm1136j-s’, ‘arm1136jf-s’, ‘mpcore’, ‘mpcorenovfp’, ‘arm1156t2-s’, ‘arm1156t2f-s’, ‘arm1176jz-s’, ‘arm1176jzf-s’, ‘generic-armv7-a’, ‘cortex-a5’, ‘cortex-a7’, ‘cortex-a8’, ‘cortex-a9’, ‘cortex-a12’, ‘cortex-a15’, ‘cortex-a17’, ‘cortex-a35’, ‘cortex-a53’, ‘cortex-a57’, ‘cortex-a72’, ‘cortex-r4’, ‘cortex-r4f’, ‘cortex-r5’, ‘cortex-r7’, ‘cortex-m7’, ‘cortex-m4’, ‘cortex-m3’, ‘cortex-m1’, ‘cortex-m0’, ‘cortex-m0plus’, ‘cortex-m1.small-multiply’, ‘cortex-m0.small-multiply’, ‘cortex-m0plus.small-multiply’, ‘exynos-m1’, ‘qdf24xx’, ‘marvell-pj4’, ‘xscale’, ‘iwmmxt’, ‘iwmmxt2’, ‘ep9312’, ‘fa526’, ‘fa626’, ‘fa606te’, ‘fa626te’, ‘fmp626’, ‘fa726te’, ‘xgene1’.
Additionally, this option can specify that GCC should tune the performance of the code for a big.LITTLE system. Permissible names are: ‘cortex-a15.cortex-a7’, ‘cortex-a17.cortex-a7’, ‘cortex-a57.cortex-a53’, ‘cortex-a72.cortex-a53’.
-mtune=generic-arch specifies that GCC should tune the performance for a blend of processors within architecture arch. The aim is to generate code that run well on the current most popular processors, balancing between optimizations that benefit some CPUs in the range, and avoiding performance pitfalls of other CPUs. The effects of this option may change in future GCC versions as CPU models come and go.
-mtune=native causes the compiler to auto-detect the CPU of the build computer. At present, this feature is only supported on GNU/Linux, and not all architectures are recognized. If the auto-detect is unsuccessful the option has no effect.

-mcpu=name
This specifies the name of the target ARM processor. GCC uses this name to derive the name of the target ARM architecture (as if specified by -march) and the ARM processor type for which to tune for performance (as if specified by -mtune). Where this option is used in conjunction with -march or -mtune, those options take precedence over the appropriate part of this option.
Permissible names for this option are the same as those for -mtune.
-mcpu=generic-arch is also permissible, and is equivalent to -march=arch -mtune=generic-arch. See -mtune for more information.
-mcpu=native causes the compiler to auto-detect the CPU of the build computer. At present, this feature is only supported on GNU/Linux, and not all architectures are recognized. If the auto-detect is unsuccessful the option has no effect.

-mfpu=name
This specifies what floating-point hardware (or hardware emulation) is available on the target. Permissible names are: ‘vfp’, ‘vfpv3’, ‘vfpv3-fp16’, ‘vfpv3-d16’, ‘vfpv3-d16-fp16’, ‘vfpv3xd’, ‘vfpv3xd-fp16’, ‘neon’, ‘neon-fp16’, ‘vfpv4’, ‘vfpv4-d16’, ‘fpv4-sp-d16’, ‘neon-vfpv4’, ‘fpv5-d16’, ‘fpv5-sp-d16’, ‘fp-armv8’, ‘neon-fp-armv8’ and ‘crypto-neon-fp-armv8’.
If -msoft-float is specified this specifies the format of floating-point values.
If the selected floating-point hardware includes the NEON extension (e.g. -mfpu=‘neon’), note that floating-point operations are not generated by GCC's auto-vectorization pass unless -funsafe-math-optimizations is also specified. This is because NEON hardware does not fully implement the IEEE 754 standard for floating-point arithmetic (in particular denormal values are treated as zero), so the use of NEON instructions may lead to a loss of precision.
You can also set the fpu name at function level by using the target("fpu=") function attributes (see ARM Function Attributes) or pragmas (see Function Specific Option Pragmas).

-mneon-for-64bits
Enables using Neon to handle scalar 64-bits operations. This is disabled by default since the cost of moving data from core registers to Neon is high.

[/code]
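
To make that concrete, here is a small Python sketch that assembles a GCC command line for a Raspberry Pi 2 from the options listed above. It assumes the Pi 2's CPU is a Cortex-A7 with NEON and VFPv4, and it assumes a hard-float (armhf) system; prog.c and prog are placeholder file names, and the helper function is purely illustrative.

```python
def arm_gcc_flags(cpu: str, fpu: str, float_abi: str = "hard") -> list:
    """Build the ARM-specific GCC options for one target CPU."""
    # The three values GCC documents for -mfloat-abi
    if float_abi not in ("soft", "softfp", "hard"):
        raise ValueError("unknown float ABI: " + float_abi)
    return ["-mcpu=" + cpu, "-mfpu=" + fpu, "-mfloat-abi=" + float_abi]


# Assumed target: Raspberry Pi 2 (Cortex-A7, NEON + VFPv4) on an armhf distribution
flags = arm_gcc_flags("cortex-a7", "neon-vfpv4")
command = ["gcc", "-O2", *flags, "-o", "prog", "prog.c"]
print(" ".join(command))
```

Using -mcpu rather than separate -march and -mtune options keeps it short, since per the description above -mcpu derives both from the CPU name.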

ldesnogu 2016-02-09 18:57

Some points that I hope will make things clearer :smile:

[QUOTE=VictordeHolland;425684]
Then there is the maze of: VFPv3, VFPv4, NEON, ARMv7-A, AArch32, AArch64, ARMv8-A[/QUOTE]
VFPv3 and VFPv4 are versions of the floating-point (FP) instruction set.
NEON is the SIMD instruction set.
ARMv7-A is version 7 of the ARM architecture.
ARMv8-A is version 8 of the ARM architecture; it consists of the 32-bit AArch32 (mostly compatible with old software) and the 64-bit AArch64.

[quote]and the different architectures:
ARM11

Cortex-A5
Cortex-A7
Cortex-A8
Cortex-A9
Cortex-A12
Cortex-A15
Cortex-A17

Cortex-A35
Cortex-A53
Cortex-A57
Cortex-A72
and half a dozen custom cores from Apple and Qualcomm.[/quote]These are not architectures; they are CPUs implementing variants of the ARM architecture (in the same way, Haswell is not called an architecture).

A5-A17 implement ARMv7-A architecture.
A35-A72 implement ARMv8-A architecture (both AArch32 and AArch64).
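
Put as a lookup, the grouping above looks like this in Python. It is only a sketch: the function name is made up, the core lists are the ones from the quoted post, and the returned strings are the matching -march names from the GCC option list.

```python
# Cores grouped by the architecture version they implement (per the lists above)
ARMV7_A_CORES = ("cortex-a5", "cortex-a7", "cortex-a8", "cortex-a9",
                 "cortex-a12", "cortex-a15", "cortex-a17")
ARMV8_A_CORES = ("cortex-a35", "cortex-a53", "cortex-a57", "cortex-a72")


def march_for(core: str) -> str:
    """Return the GCC -march name for a Cortex-A core listed in this thread."""
    name = core.lower()
    if name in ARMV7_A_CORES:
        return "armv7-a"
    if name in ARMV8_A_CORES:
        return "armv8-a"
    raise KeyError("core not covered by this sketch: " + core)
```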

[quote]Getting the software to work on ARM would be a huge effort in itself.[/quote]Portable software (no asm, no intrinsics) should be easy to port.

[quote]Getting it to work efficiently on the different architectures is another thing.[/quote]I guess you meant optimizing for the various CPUs, right? Yes, that's hard, but it's no harder than optimizing for the various Intel CPUs and instruction variants, ask Ernst about that ;-)

[quote]Are the compilers mature enough? There is the compiler from ARM (you can get a 30 day free trial).
GCC apparently also has ARM support:
[URL]https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html[/URL][/quote]gcc support is very good.

bgbeuning 2016-02-09 20:03

[QUOTE=henryzz;414710]The raspberry pi 2 does some things 20x slower than my 8 year old Q6600.[/QUOTE]

The Raspberry Pi uses a slow ARM chip.
How does a current high end ARM compare to his 8 year old Q6600?

ldesnogu 2016-02-09 20:27

[QUOTE=bgbeuning;425747]The Raspberry Pi uses a slow ARM chip.
How does a current high end ARM compare to his 8 year old Q6600?[/QUOTE]
FWIW:
Q6600 @ 4.6GHz vs Cortex-A72 @ 2.x GHz
[url]http://browser.primatelabs.com/geekbench3/compare/3284163?baseline=4630896[/url]

