Running mprime/Prime95 on armhf arch
Hello.
I've recently bought a microcomputer by the name of Raspberry Pi. Its processor architecture is [I]armhf[/I], and it seems like mprime won't run on the Pi. I tried the Linux 32- and 64-bit builds, and building from source is not an option because PrimeNet wouldn't be supported. Any solutions? Thanks.
It won't work. mprime is written for x86 and x86-64 processors, not ARM processors.
[QUOTE=RedPhantom;414684]Hello.
I've recently bought a microcomputer by the name of Raspberry Pi. Its processor architecture is [I]armhf[/I], and it seems like mprime won't run on the Pi. I tried the Linux 32- and 64-bit builds, and building from source is not an option because PrimeNet wouldn't be supported. Any solutions? Thanks.[/QUOTE] You can run [URL="http://hogranch.com/mayer/README.html"]Mlucas[/URL], I think. Ernst?

Testing for a big Mersenne prime on an R-Pi is going to take a long time :rolleyes:
[QUOTE=paulunderwood;414704]You can run [URL="http://hogranch.com/mayer/README.html"]Mlucas[/URL], I think. Ernst?
Testing for a big mersenne prime on a r-pi is going to take a long time to do :rolleyes:[/QUOTE] Probably years. They have the performance of a 15-year-old laptop.
The Raspberry Pi 2 does some things 20x slower than my 8-year-old Q6600.
[QUOTE=paulunderwood;414704]You can run [URL="http://hogranch.com/mayer/README.html"]Mlucas[/URL], I think. Ernst?
Testing for a big mersenne prime on a r-pi is going to take a long time to do :rolleyes:[/QUOTE] Should work - I know Alex Vong (the fellow I worked with over the summer to get Mlucas submitted as a Debian package) tweaked several platform-related headers and the auto-build scripts for several variants of ARM, but there is probably still work to be done on that front. Any would-be ARM/Linux builders should try the latest Mlucas release with the auto-build support, and PM me if they encounter issues. As others have noted, things are going to be very slow on ARM, but having more people do builds for the various flavors of ARM/Linux - and perhaps some DC work if they manage to get a working binary - is always a useful exercise.
[QUOTE=henryzz;414710]The raspberry pi 2 does some things 20x slower than my 8 year old Q6600.[/QUOTE]
Is it possible to compile sr1sieve and sr2sieve on a Raspberry Pi 2, and how "fast" is the Raspberry Pi at sieving (if someone has real results)?
Compiling binaries for ARM might not be very useful now, but it could be handy in the future [U][B]if[/B][/U] ARM gets a slice of the server market. And it could be done just for fun :P

Don't underestimate the supremacy of Intel/x86-64 in the current server market though. IBM's POWER architecture is only surviving in mainframes/ERP systems, and AMD has a severe process node disadvantage (28nm vs. 14nm).

I had to look up [B]armhf[/B]:
[quote]Higher-end Arm processors come bundled with additional capability that enables hardware execution of floating point operations. The difference between these two architectures gave rise to two separate Embedded Application Binary Interfaces or EABIs for ARM: [I]soft float[/I] and [I]VFP (Vector Floating Point)[/I]. Although a big step up in performance, the VFP EABI utilizes less-than-optimal argument passing when floating point operations take place. In this scenario, floating point arguments must first be passed through integer registers prior to executing in the floating point unit. In the Linux community, releases built upon both these EABIs are referred to as [I]armel[/I] based distributions. A new EABI, referred to as [I]armhf[/I], optimizes the calling convention for floating point operations by passing arguments directly into floating point registers. The end result is that applications compiled with the [I]armhf[/I] standard should demonstrate modest performance improvement in some cases, and significant improvement for floating point intensive applications.[/quote]source: [URL]https://blogs.oracle.com/jtc/entry/is_it_armhf_or_armel[/URL]

Then there is the maze of: VFPv3, VFPv4, NEON, ARMv7-A, AArch32, AArch64, ARMv8-A

and the different architectures:
ARM11
Cortex-A5
Cortex-A7
Cortex-A8
Cortex-A9
Cortex-A12
Cortex-A15
Cortex-A17
Cortex-A35
Cortex-A53
Cortex-A57
Cortex-A72
and half a dozen custom cores from Apple and Qualcomm.

Getting the software to work on ARM would be a huge effort in itself. Getting it to work efficiently on the different architectures is another thing. Are the compilers mature enough? There is the compiler from ARM (you can get a 30 day free trial).
GCC apparently also has ARM support: [URL]https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html[/URL]

Some possibly useful compile options in GCC:
[code]
-mfloat-abi=name
  Specifies which floating-point ABI to use. Permissible values are: ‘soft’, ‘softfp’ and
  ‘hard’. Specifying ‘soft’ causes GCC to generate output containing library calls for
  floating-point operations. ‘softfp’ allows the generation of code using hardware
  floating-point instructions, but still uses the soft-float calling conventions. ‘hard’
  allows generation of floating-point instructions and uses FPU-specific calling conventions.
  The default depends on the specific target configuration. Note that the hard-float and
  soft-float ABIs are not link-compatible; you must compile your entire program with the same
  ABI, and link with a compatible set of libraries.

-mlittle-endian
  Generate code for a processor running in little-endian mode. This is the default for all
  standard configurations.

-mbig-endian
  Generate code for a processor running in big-endian mode; the default is to compile code
  for a little-endian processor.

-march=name
  This specifies the name of the target ARM architecture. GCC uses this name to determine
  what kind of instructions it can emit when generating assembly code. This option can be
  used in conjunction with or instead of the -mcpu= option. Permissible names are: ‘armv2’,
  ‘armv2a’, ‘armv3’, ‘armv3m’, ‘armv4’, ‘armv4t’, ‘armv5’, ‘armv5t’, ‘armv5e’, ‘armv5te’,
  ‘armv6’, ‘armv6j’, ‘armv6t2’, ‘armv6z’, ‘armv6kz’, ‘armv6-m’, ‘armv7’, ‘armv7-a’,
  ‘armv7-r’, ‘armv7-m’, ‘armv7e-m’, ‘armv7ve’, ‘armv8-a’, ‘armv8-a+crc’, ‘armv8.1-a’,
  ‘armv8.1-a+crc’, ‘iwmmxt’, ‘iwmmxt2’, ‘ep9312’.
  -march=armv7ve is the armv7-a architecture with virtualization extensions.
  -march=armv8-a+crc enables code generation for the ARMv8-A architecture together with the
  optional CRC32 extensions.
  -march=native causes the compiler to auto-detect the architecture of the build computer.
  At present, this feature is only supported on GNU/Linux, and not all architectures are
  recognized. If the auto-detect is unsuccessful the option has no effect.

-mtune=name
  This option specifies the name of the target ARM processor for which GCC should tune the
  performance of the code. For some ARM implementations better performance can be obtained by
  using this option. Permissible names are: ‘arm2’, ‘arm250’, ‘arm3’, ‘arm6’, ‘arm60’,
  ‘arm600’, ‘arm610’, ‘arm620’, ‘arm7’, ‘arm7m’, ‘arm7d’, ‘arm7dm’, ‘arm7di’, ‘arm7dmi’,
  ‘arm70’, ‘arm700’, ‘arm700i’, ‘arm710’, ‘arm710c’, ‘arm7100’, ‘arm720’, ‘arm7500’,
  ‘arm7500fe’, ‘arm7tdmi’, ‘arm7tdmi-s’, ‘arm710t’, ‘arm720t’, ‘arm740t’, ‘strongarm’,
  ‘strongarm110’, ‘strongarm1100’, ‘strongarm1110’, ‘arm8’, ‘arm810’, ‘arm9’, ‘arm9e’,
  ‘arm920’, ‘arm920t’, ‘arm922t’, ‘arm946e-s’, ‘arm966e-s’, ‘arm968e-s’, ‘arm926ej-s’,
  ‘arm940t’, ‘arm9tdmi’, ‘arm10tdmi’, ‘arm1020t’, ‘arm1026ej-s’, ‘arm10e’, ‘arm1020e’,
  ‘arm1022e’, ‘arm1136j-s’, ‘arm1136jf-s’, ‘mpcore’, ‘mpcorenovfp’, ‘arm1156t2-s’,
  ‘arm1156t2f-s’, ‘arm1176jz-s’, ‘arm1176jzf-s’, ‘generic-armv7-a’, ‘cortex-a5’, ‘cortex-a7’,
  ‘cortex-a8’, ‘cortex-a9’, ‘cortex-a12’, ‘cortex-a15’, ‘cortex-a17’, ‘cortex-a35’,
  ‘cortex-a53’, ‘cortex-a57’, ‘cortex-a72’, ‘cortex-r4’, ‘cortex-r4f’, ‘cortex-r5’,
  ‘cortex-r7’, ‘cortex-m7’, ‘cortex-m4’, ‘cortex-m3’, ‘cortex-m1’, ‘cortex-m0’,
  ‘cortex-m0plus’, ‘cortex-m1.small-multiply’, ‘cortex-m0.small-multiply’,
  ‘cortex-m0plus.small-multiply’, ‘exynos-m1’, ‘qdf24xx’, ‘marvell-pj4’, ‘xscale’, ‘iwmmxt’,
  ‘iwmmxt2’, ‘ep9312’, ‘fa526’, ‘fa626’, ‘fa606te’, ‘fa626te’, ‘fmp626’, ‘fa726te’,
  ‘xgene1’.
  Additionally, this option can specify that GCC should tune the performance of the code for
  a big.LITTLE system. Permissible names are: ‘cortex-a15.cortex-a7’, ‘cortex-a17.cortex-a7’,
  ‘cortex-a57.cortex-a53’, ‘cortex-a72.cortex-a53’.
  -mtune=generic-arch specifies that GCC should tune the performance for a blend of
  processors within architecture arch. The aim is to generate code that run well on the
  current most popular processors, balancing between optimizations that benefit some CPUs in
  the range, and avoiding performance pitfalls of other CPUs. The effects of this option may
  change in future GCC versions as CPU models come and go.
  -mtune=native causes the compiler to auto-detect the CPU of the build computer. At present,
  this feature is only supported on GNU/Linux, and not all architectures are recognized. If
  the auto-detect is unsuccessful the option has no effect.

-mcpu=name
  This specifies the name of the target ARM processor. GCC uses this name to derive the name
  of the target ARM architecture (as if specified by -march) and the ARM processor type for
  which to tune for performance (as if specified by -mtune). Where this option is used in
  conjunction with -march or -mtune, those options take precedence over the appropriate part
  of this option. Permissible names for this option are the same as those for -mtune.
  -mcpu=generic-arch is also permissible, and is equivalent to -march=arch
  -mtune=generic-arch. See -mtune for more information.
  -mcpu=native causes the compiler to auto-detect the CPU of the build computer. At present,
  this feature is only supported on GNU/Linux, and not all architectures are recognized. If
  the auto-detect is unsuccessful the option has no effect.

-mfpu=name
  This specifies what floating-point hardware (or hardware emulation) is available on the
  target. Permissible names are: ‘vfp’, ‘vfpv3’, ‘vfpv3-fp16’, ‘vfpv3-d16’, ‘vfpv3-d16-fp16’,
  ‘vfpv3xd’, ‘vfpv3xd-fp16’, ‘neon’, ‘neon-fp16’, ‘vfpv4’, ‘vfpv4-d16’, ‘fpv4-sp-d16’,
  ‘neon-vfpv4’, ‘fpv5-d16’, ‘fpv5-sp-d16’, ‘fp-armv8’, ‘neon-fp-armv8’ and
  ‘crypto-neon-fp-armv8’. If -msoft-float is specified this specifies the format of
  floating-point values.
  If the selected floating-point hardware includes the NEON extension (e.g. -mfpu=‘neon’),
  note that floating-point operations are not generated by GCC's auto-vectorization pass
  unless -funsafe-math-optimizations is also specified. This is because NEON hardware does
  not fully implement the IEEE 754 standard for floating-point arithmetic (in particular
  denormal values are treated as zero), so the use of NEON instructions may lead to a loss of
  precision.
  You can also set the fpu name at function level by using the target("fpu=") function
  attributes (see ARM Function Attributes) or pragmas (see Function Specific Option Pragmas).

-mneon-for-64bits
  Enables using Neon to handle scalar 64-bits operations. This is disabled by default since
  the cost of moving data from core registers to Neon is high.
[/code]
Some points that I hope will make things clearer :smile:
[QUOTE=VictordeHolland;425684] Then there is the maze of : VFPv3, VFPv4, NEON, ARMv7-A, AArch32, AArch64, ARMv8-A[/QUOTE]VFPv[34] are FP instruction-set variants.
NEON is the SIMD instruction set.
ARMv7-A is version 7 of the ARM architecture.
ARMv8-A is version 8 of the ARM architecture; it's made up of the 32-bit architecture AArch32 (mostly compatible with old software) and the 64-bit AArch64.

[quote]and the different architectures: ARM11 Cortex-A5 Cortex-A7 Cortex-A8 Cortex-A9 Cortex-A12 Cortex-A15 Cortex-A17 Cortex-A35 Cortex-A53 Cortex-A57 Cortex-A72 and half a dozen custom cores from Apple and Qualcomm.[/quote]These are not architectures; these are CPUs implementing variants of the ARM architecture (in the same way, Haswell is not called an architecture). A5-A17 implement the ARMv7-A architecture. A35-A72 implement the ARMv8-A architecture (both AArch32 and AArch64).

[quote]Getting the software to work on ARM would be a huge effort in itself.[/quote]Portable software (no asm, no intrinsics) should be easily portable.

[quote]Getting it to work efficiently on the different architectures is another thing.[/quote]I guess you meant optimizing for the various CPUs, right? Yes, that's hard, but it's no harder than optimizing for the various Intel CPUs and instruction variants - ask Ernst about that ;-)

[quote]Are the compilers mature enough? There is the compiler from ARM (you can get a 30 day free trial). GCC apparently also has ARM support: [URL]https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html[/URL][/quote]gcc support is very good.
[QUOTE=henryzz;414710]The raspberry pi 2 does some things 20x slower than my 8 year old Q6600.[/QUOTE]
The Raspberry Pi uses a slow ARM chip. How does a current high-end ARM compare to his 8-year-old Q6600?
[QUOTE=bgbeuning;425747]The Raspberry Pi uses a slow ARM chip.
How does a current high end ARM compare to his 8 year old Q6600?[/QUOTE] FWIW: Q6600 @ 4.6GHz vs Cortex-A72 @ 2.x GHz
[url]http://browser.primatelabs.com/geekbench3/compare/3284163?baseline=4630896[/url]
We have been the "masters of the Cortex-M" for a few years now, since we started playing with STM32 dies from ST for our employer's interests. We are very fond of those little grains. Technically, a 20-50 cent Cortex-M3 MCU will run from 1.65V and consume about 140 microamps per MHz, and it can run as fast as 72 MHz. They have an internal [U]one-clock[/U] (single-cycle - they are ARM, don't forget) 32-bit full multiplier (result on 64 bits), and you can fill all the registers with operands (well, half of them, because you need the other half for the results) and multiply them all at the same time. Therefore about 12 of them could do (for at most 6 dollars) the same (factoring) work as a single core of an i7 running at 3GHz, for a fraction of the power consumption. I said factoring because LL would imply a lot of transfers between them, but factoring is self-contained. So, take 960 of them, put them on a PCB, and give each a prime p and a class for a k; they could do the sieving and exponentiation by themselves for that particular class, rivaling a GTX 580 or an HD 7970 in price and efficiency. Each will do a class in the same time your GPU does ALL classes, but you have 960 of them and they run ALL classes in parallel (there are 960 "active" classes, i.e. needing to be "done", in the 4620-class split used by your mfaktX).

Of course, you could select those with more features, like an external RAM interface (they have internal static RAM up to 80 KB and internal fast NOR flash up to 1 MB, and already cost 2-3 dollars per chip or so) and build an "LL tester" which could finish a 78M exponent in about 800-900 days (a very coarse estimate, obtained by dividing 4GHz by 72MHz and multiplying by ~15 days). Such a "system" will cost you less than 8 dollars, including the external static RAM able to hold the huge modular stuff. So, you can have about 65 of them for about $500, which will consume about the same as your CPU and finish 65 LL tests in 800 days. This is better than your (single-core) CPU can do. And if you add in the costs of the other things you put in your rig, you can easily rival a dual core, or even a quad core if you switch to the Cortex-M4. And this is only with little grains of silicon that you can buy for 50 cents, which work at 72 MHz and only deliver about 0.95 DMIPS/MHz. Don't forget that the [URL="http://www.silabs.com/Support%20Documents/TechnicalDocs/Which-ARM-Cortex-Core-Is-Right-for-Your-Application.pdf"]Cortex-R[/URL] can work at up to 600 MHz and deliver almost 2.5 DMIPS/MHz, and "big brothers" like the R7/M7 go [U]over[/U] 1 GHz and are as efficient as 4 or even 5 DMIPS/MHz per core (yes, they can have many cores, and they even extend the bus width to 64 bits as opposed to 32). Some also have NEON units (roughly the equivalent of a CUDA card with 16 to 256 cores), and some also implement single-cycle float or double multiplications. Of course, with all these toys, we are talking about a different price range too.

tl;dr: Don't underestimate the ARM chips. They are coming from behind very fast, and very heavily ARMed.
[QUOTE=LaurV;425812]
tl;dr: Don't underestimate the ARM chips. They are coming from the back very fast and very heavily ARMed.[/QUOTE] It's time to reconsider a small cluster of those little monsters...
Where do you find a 20-50 cent Cortex-M3 MCU?
[url]http://www.mouser.co.uk/Search/Refine.aspx?Keyword=STM32F103V8[/url] is £2450 for a reel of a thousand. [url]http://www.mouser.co.uk/search/ProductDetail.aspx?R=0virtualkey0virtualkeySTM32F745VGT6[/url] is £3730 for a reel of 540.
[QUOTE=fivemack;425846]Where do you find a 20-50 cent Cortex-M3 MCU?
[URL]http://www.mouser.co.uk/Search/Refine.aspx?Keyword=STM32F103V8[/URL] is £2450 for a reel of a thousand. [URL]http://www.mouser.co.uk/search/ProductDetail.aspx?R=0virtualkey0virtualkeySTM32F745VGT6[/URL] is £3730 for a reel of 540[/QUOTE] You pay triple for the "V" package (100 pins) and for the "8" (respectively "G") - the amount of flash. Also, Mouser is not the cheapest place to buy ICs. Trust me on this. :smile: