mersenneforum.org


fivemack 2013-01-27 22:09

Small mlucas issue on non-x86
 
I thought I might as well see just how slowly mlucas runs on last year's ARM.

The current downloadable tarfile of mlucas (Mlucas_10.09.2011; I appreciate this is old, is there a newer place to look?) doesn't build unless USE_SSE2 is defined, because the section around lines 1441 to 1458 of radix16_ditN_cy_dif1.c (only) uses *bjmodn0, which is incorrect if !USE_SSE2.

I did

#if !defined(USE_SSE2)
#define BJSTAR
#else
#define BJSTAR *
#endif

then replaced *bjmodn0 with BJSTAR bjmodn0

but I appreciate that makes the code a bit ugly.

It's really not terribly fast:

[code]
M2614999: using FFT length 128K = 131072 8-byte floats.
this gives an average 19.950859069824219 bits per digit
Using complex FFT radices 8 16 32 16
1000 iterations of M2614999 with FFT length 131072 = 128 K
Res64: 1A184504D2DE2D3C. AvgMaxErr = 0.000000000. MaxErr = 0.000000000. Program: E3.0x
Res mod 2^36 = 20717645116
Res mod 2^35 - 1 = 5934292942
Res mod 2^36 - 1 = 4090378120
Clocks = 00:00:45.939

M42643801: using FFT length 2304K = 2359296 8-byte floats.
this gives an average 18.074799007839626 bits per digit
Using complex FFT radices 9 8 8 8 16 16
10 iterations of M42643801 with FFT length 2359296 = 2304 K
Res64: 9BDB491DF4C00002. AvgMaxErr N/A. MaxErr = 0.000000000. Program: E3.0x
Res mod 2^36 = 59940798466
Res mod 2^35 - 1 = 11033316518
Res mod 2^36 - 1 = 15286304084
Clocks = 00:00:10.410

[/code]

(For comparison, a 3.4GHz Sandy Bridge machine gave 0.045s/i for 42643801 and 0.0021s/i for 2614999, so it is about 22 to 23 times faster than the ARM's roughly 1.04s/i and 0.046s/i.)

I'm trying different compiler options; I tried enabling multi-threading but got a message saying that the sensitivity list for radix44 needed updating. Have you got a newer version of that?

ldesnogu 2013-01-28 14:01

What is a last year ARM? :)

Also, which compiler flags did you try, and which gcc version are you using?

fivemack 2013-01-28 14:32

ODROID-X, Exynos 4412 @ 1.4GHz (apparently, though /proc/cpuinfo says 2000 bogomips). Running Ubuntu 12.04.

So it's a Cortex-A9; you might reasonably argue that that is an October 2007 CPU, but the Exynos 4412 was only announced in April 2012, and I bought the board on 14 September 2012. I think I should get stuff working nicely on this board before contemplating an A15-based replacement.

I compiled with gcc-4.6.2 -march=armv7-a -mcpu=cortex-a9.

Looking at the disassembly, it is using vfp instructions.

It is slightly embarrassing given my current workplace, but even with an ARM ARM in front of me I can't work out whether this architecture has instructions that treat a 128-bit register as two doubles ...

ldesnogu 2013-01-28 14:39

[QUOTE=fivemack;326337]ODROID-X, Exynos 4412 @ 1.4GHz (apparently, though /proc/cpuinfo says 2000 bogomips). Running Ubuntu 12.04.[/QUOTE]
I guess this means the board booted at 1 GHz.

[QUOTE]So it's a Cortex-A9; I compiled with gcc-4.6.2 -march=armv7-a -mcpu=cortex-a9.

Looking at the disassembly, it is using vfp instructions.[/QUOTE]
Can you check whether it is passing function parameters in FP registers? I guess it should, since IIRC Ubuntu is using the hard FP ABI.

[QUOTE]It is slightly embarrassing given my current workplace, but even with an ARM ARM in front of me I can't work out whether this architecture has instructions that treat a 128-bit register as two doubles ...[/QUOTE]
Heh, the ARM ARM is not easy to read :) ARMv7 doesn't have SIMD with doubles. ARMv8 does, but you'll have to wait for silicon to arrive...

ewmayer 2013-01-28 21:03

[QUOTE=fivemack;326212]I thought I might as well see just how slowly mlucas runs on a last-year's ARM.

The current downloadable tarfile of mlucas (Mlucas_10.09.2011; I appreciate this is old, is there a newer place to look?) doesn't build unless USE_SSE2 is defined, because the section around lines 1441 to 1458 of radix16_ditN_cy_dif1.c (only) uses *bjmodn0 which is incorrect if !USE_SSE2[/QUOTE]
Hi, Tom:

Send me your e-mail address and I'll be happy to provide you with the recent tarball being used by myself and the new-prime verifiers.

It's high time for me to update the code at my ftp page, I suppose - would really like to get AVX support finished before spending time on release packaging, though.

