mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Mlucas (https://www.mersenneforum.org/forumdisplay.php?f=118)
-   -   Mlucas version 17.1 (https://www.mersenneforum.org/showthread.php?t=2977)

ewmayer 2009-11-12 17:41

[QUOTE=smoky;195554]Congratulations on this milestone!

May I ask about the roadmap for the RISC versions of Mlucas? It is fully understandable why they wouldn't be a priority, but one can still hope, right? A feature like PrimeNet integration would be an awesome advance!

-smoky[/QUOTE]
The code should build fine without modification on most RISC platforms - no SSE2 support for those, obviously - users may simply have to find the best set of compiler options for their individual platforms.

Regarding PrimeNet support, my plan is to first get it working for x86-style platforms; then, if the resulting code can be ported to a wider variety of platforms without terrible difficulty, I will proceed with that. I will likely ask for the open-source community's help with the latter, to cover as broad a variety of platforms as possible without requiring me to work on that aspect full-time.

[QUOTE=lfm;195603]While trying Mlucas 3.0x (binary download for Linux 64)

./Mlucas_AMD64 -s a

...

seems like a problem with the radix 28?[/QUOTE]
More likely it's a shared-library issue. Could you try building the source locally (just copy and paste the one-line compile sequence on the README page) and retry the self-test? I may have to post a static binary instead.

Thanks,
-Ernst

lfm 2009-11-13 10:24

[QUOTE=ewmayer;195631]
More likely it's a shared-library issue. Could you try building the source locally (just copy and paste the one-line compile sequence on the README page) and retry the self-test? I may have to post a static binary instead.
[/QUOTE]

Seems like that was it. After a local build it runs OK (so far).

ewmayer 2009-11-13 17:04

[QUOTE=lfm;195690]Seems like that was it. After a local build it runs OK (so far).[/QUOTE]
I just replaced the Mlucas_AMD64.gz zipped binary with a new statically-linked one ... if you get the chance, please try it out and let me know if that solves the self-test issues you saw with the shared-lib build.

Thanks,
-Ernst
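
A quick way to confirm whether a given download is the shared-lib build or the statically-linked one is to look for a PT_INTERP program header in the ELF file: dynamically linked executables normally carry one, fully static ones do not. The sketch below is a minimal check under that assumption. It handles only 64-bit little-endian ELF files, and the function name is made up for illustration; it is not part of Mlucas.

```python
import struct

PT_INTERP = 3  # program-header type naming the dynamic loader


def is_dynamically_linked(data):
    """Return True if the ELF image in `data` has a PT_INTERP header.

    Assumption: only 64-bit little-endian ELF is handled (e.g. x86_64 Linux).
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("only 64-bit little-endian ELF is supported here")

    # ELF64 header fields: e_phoff at byte 32, e_phentsize/e_phnum at byte 54.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)

    # p_type is the first 32-bit field of every program header.
    for i in range(e_phnum):
        (p_type,) = struct.unpack_from("<I", data, e_phoff + i * e_phentsize)
        if p_type == PT_INTERP:
            return True
    return False
```

Reading the unpacked binary with `open("Mlucas_AMD64", "rb").read()` and passing the bytes in should be enough; from the shell, `file` or `ldd` give the same information.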

lfm 2009-11-15 12:27

[QUOTE=ewmayer;195734]I just replaced the Mlucas_AMD64.gz zipped binary with a new statically-linked one ... if you get the chance, please try it out and let me know if that solves the self-test issues you saw with the shared-lib build.
[/QUOTE]

Very strange. Today, when I ran a few more tests of the old(er) dynamically linked version, it no longer failed for me. I'm not sure exactly why, but I think Ubuntu sent out a libc/libm patch, and now it doesn't fail (just a theory). For the sake of smaller downloads, as far as I am concerned, you can go back to dynamic linking.

pegaso56 2009-11-15 18:29

Hi, below are the results for AMD 6000
AMD Athlon(tm) 64 X2 Dual Core Processor 6000+
CPU speed: 1800.45 MHz, 2 cores
CPU features: RDTSC, CMOV, Prefetch, 3DNow!, MMX, SSE, SSE2
L1 cache size: 64 KB
L2 cache size: 1 MB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
L1 TLBS: 32
L2 TLBS: 512

[COLOR="DarkRed"]Running openSUSE 11.2 Linux athlon 2.6.32-rc5-git3-1-desktop #1 SMP PREEMPT 2009-11-03 15:41:35 +0100 x86_64 x86_64 x86_64 GNU/Linux[/COLOR]


3.0x
1024 sec/iter = 0.057 ROE[min,max] = [0.250000000, 0.312500000] radices = 32 16 32 32 0 0 0 0 0 0
1152 sec/iter = 0.067 ROE[min,max] = [0.250000000, 0.250000000] radices = 36 32 32 16 0 0 0 0 0 0
1280 sec/iter = 0.072 ROE[min,max] = [0.250000000, 0.343750000] radices = 40 16 32 32 0 0 0 0 0 0
1408 sec/iter = 0.081 ROE[min,max] = [0.312500000, 0.312500000] radices = 44 16 32 32 0 0 0 0 0 0
1536 sec/iter = 0.091 ROE[min,max] = [0.265625000, 0.269042969] radices = 24 8 16 16 16 0 0 0 0 0
1792 sec/iter = 0.111 ROE[min,max] = [0.312500000, 0.312500000] radices = 28 8 16 16 16 0 0 0 0 0
2048 sec/iter = 0.128 ROE[min,max] = [0.281250000, 0.343750000] radices = 32 32 32 32 0 0 0 0 0 0
2304 sec/iter = 0.142 ROE[min,max] = [0.242187500, 0.281250000] radices = 36 8 16 16 16 0 0 0 0 0
2560 sec/iter = 0.160 ROE[min,max] = [0.281250000, 0.312500000] radices = 40 8 16 16 16 0 0 0 0 0
2816 sec/iter = 0.181 ROE[min,max] = [0.328125000, 0.343750000] radices = 44 32 32 32 0 0 0 0 0 0
3072 sec/iter = 0.208 ROE[min,max] = [0.250000000, 0.250000000] radices = 24 16 16 16 16 0 0 0 0 0
3584 sec/iter = 0.248 ROE[min,max] = [0.281250000, 0.281250000] radices = 28 16 16 16 16 0 0 0 0 0
1024 sec/iter = 0.057 ROE[min,max] = [0.250000000, 0.312500000] radices = 32 16 32 32 0 0 0 0 0 0
1152 sec/iter = 0.068 ROE[min,max] = [0.250000000, 0.250000000] radices = 36 32 32 16 0 0 0 0 0 0
1280 sec/iter = 0.072 ROE[min,max] = [0.250000000, 0.343750000] radices = 40 16 32 32 0 0 0 0 0 0
1408 sec/iter = 0.082 ROE[min,max] = [0.312500000, 0.312500000] radices = 44 16 32 32 0 0 0 0 0 0
1536 sec/iter = 0.092 ROE[min,max] = [0.265625000, 0.269042969] radices = 24 8 16 16 16 0 0 0 0 0
1792 sec/iter = 0.110 ROE[min,max] = [0.312500000, 0.312500000] radices = 28 8 16 16 16 0 0 0 0 0
2048 sec/iter = 0.128 ROE[min,max] = [0.281250000, 0.343750000] radices = 32 32 32 32 0 0 0 0 0 0
2304 sec/iter = 0.142 ROE[min,max] = [0.242187500, 0.281250000] radices = 36 8 16 16 16 0 0 0 0 0
2560 sec/iter = 0.160 ROE[min,max] = [0.281250000, 0.312500000] radices = 40 8 16 16 16 0 0 0 0 0
2816 sec/iter = 0.182 ROE[min,max] = [0.328125000, 0.343750000] radices = 44 8 16 16 16 0 0 0 0 0
3072 sec/iter = 0.209 ROE[min,max] = [0.250000000, 0.250000000] radices = 24 16 16 16 16 0 0 0 0 0
3584 sec/iter = 0.249 ROE[min,max] = [0.281250000, 0.281250000] radices = 28 16 16 16 16 0 0 0 0 0
128 sec/iter = 0.006 ROE[min,max] = [0.312500000, 0.312500000] radices = 16 16 16 16 0 0 0 0 0 0
144 sec/iter = 0.007 ROE[min,max] = [0.273437500, 0.273437500] radices = 36 8 16 16 0 0 0 0 0 0
160 sec/iter = 0.008 ROE[min,max] = [0.265625000, 0.265625000] radices = 20 16 16 16 0 0 0 0 0 0
192 sec/iter = 0.009 ROE[min,max] = [0.250000000, 0.250000000] radices = 24 16 16 16 0 0 0 0 0 0
224 sec/iter = 0.011 ROE[min,max] = [0.312500000, 0.312500000] radices = 28 16 16 16 0 0 0 0 0 0
256 sec/iter = 0.012 ROE[min,max] = [0.257812500, 0.296875000] radices = 16 16 32 16 0 0 0 0 0 0
288 sec/iter = 0.015 ROE[min,max] = [0.312500000, 0.312500000] radices = 36 16 16 16 0 0 0 0 0 0
320 sec/iter = 0.016 ROE[min,max] = [0.250000000, 0.312500000] radices = 20 16 32 16 0 0 0 0 0 0
384 sec/iter = 0.020 ROE[min,max] = [0.234375000, 0.250000000] radices = 24 16 16 32 0 0 0 0 0 0
448 sec/iter = 0.024 ROE[min,max] = [0.281250000, 0.312500000] radices = 28 16 32 16 0 0 0 0 0 0
512 sec/iter = 0.026 ROE[min,max] = [0.281250000, 0.312500000] radices = 16 16 32 32 0 0 0 0 0 0
576 sec/iter = 0.030 ROE[min,max] = [0.250000000, 0.281250000] radices = 36 16 32 16 0 0 0 0 0 0
640 sec/iter = 0.035 ROE[min,max] = [0.281250000, 0.343750000] radices = 40 16 16 32 0 0 0 0 0 0
704 sec/iter = 0.040 ROE[min,max] = [0.312500000, 0.312500000] radices = 44 16 16 32 0 0 0 0 0 0
768 sec/iter = 0.043 ROE[min,max] = [0.250000000, 0.250000000] radices = 24 32 32 16 0 0 0 0 0 0
896 sec/iter = 0.053 ROE[min,max] = [0.312500000, 0.312500000] radices = 28 32 32 16 0 0 0 0 0 0
1024 sec/iter = 0.057 ROE[min,max] = [0.250000000, 0.312500000] radices = 32 16 32 32 0 0 0 0 0 0
1152 sec/iter = 0.068 ROE[min,max] = [0.250000000, 0.250000000] radices = 36 32 32 16 0 0 0 0 0 0
1280 sec/iter = 0.072 ROE[min,max] = [0.250000000, 0.343750000] radices = 40 16 32 32 0 0 0 0 0 0
1408 sec/iter = 0.082 ROE[min,max] = [0.312500000, 0.312500000] radices = 44 16 32 32 0 0 0 0 0 0
1536 sec/iter = 0.091 ROE[min,max] = [0.265625000, 0.269042969] radices = 24 32 32 32 0 0 0 0 0 0
1792 sec/iter = 0.109 ROE[min,max] = [0.312500000, 0.312500000] radices = 28 8 16 16 16 0 0 0 0 0
2048 sec/iter = 0.126 ROE[min,max] = [0.281250000, 0.343750000] radices = 32 32 32 32 0 0 0 0 0 0
2304 sec/iter = 0.140 ROE[min,max] = [0.242187500, 0.281250000] radices = 36 8 16 16 16 0 0 0 0 0
2560 sec/iter = 0.158 ROE[min,max] = [0.281250000, 0.312500000] radices = 40 8 16 16 16 0 0 0 0 0
2816 sec/iter = 0.179 ROE[min,max] = [0.328125000, 0.343750000] radices = 44 8 16 16 16 0 0 0 0 0
3072 sec/iter = 0.207 ROE[min,max] = [0.250000000, 0.250000000] radices = 24 16 16 16 16 0 0 0 0 0
3584 sec/iter = 0.246 ROE[min,max] = [0.281250000, 0.281250000] radices = 28 16 16 16 16 0 0 0 0 0
4096 sec/iter = 0.281 ROE[min,max] = [0.250000000, 0.312500000] radices = 16 16 16 16 32 0 0 0 0 0
4608 sec/iter = 0.314 ROE[min,max] = [0.257812500, 0.257812500] radices = 36 16 16 16 16 0 0 0 0 0

Best regards, Carlos
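
Since several FFT lengths show up more than once in the listing above, with slightly different timings and sometimes different radix sets, it can help to reduce it to the fastest entry per length. The following is a minimal sketch, assuming the first column is the FFT length and that trailing zero radices are padding; the function names are illustrative and not part of Mlucas.

```python
import re

# Matches lines such as:
# 1024 sec/iter = 0.057 ROE[min,max] = [0.250000000, 0.312500000] radices = 32 16 32 32 0 0 0 0 0 0
LINE_RE = re.compile(
    r"(\d+)\s+sec/iter\s*=\s*([0-9.]+)\s+"
    r"ROE\[min,max\]\s*=\s*\[([0-9.]+),\s*([0-9.]+)\]\s+"
    r"radices\s*=\s*([\d ]+)"
)


def parse_line(line):
    """Return (fft_len, sec_per_iter, roe_max, radices) or None if no match."""
    m = LINE_RE.fullmatch(line.strip())
    if m is None:
        return None
    # Assumption: zero entries in the radix list are unused padding slots.
    radices = [int(r) for r in m.group(5).split() if int(r) != 0]
    return int(m.group(1)), float(m.group(2)), float(m.group(4)), radices


def best_per_length(lines):
    """Map each FFT length to its fastest (sec_per_iter, radices) entry."""
    best = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip headers, blank lines, anything unrecognised
        fft_len, sec, _roe_max, radices = parsed
        if fft_len not in best or sec < best[fft_len][0]:
            best[fft_len] = (sec, radices)
    return best
```

Feeding it the lines of the post (for example via `best_per_length(text.splitlines())`) gives one entry per FFT length, which makes it easier to compare machines or builds.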

moebius 2009-11-20 09:07

[quote=ewmayer;195734]I just replaced the Mlucas_AMD64.gz zipped binary with a new statically-linked one ... if you get the chance, please try it out and let me know if that solves the self-test issues you saw with the shared-lib build.

Thanks,
-Ernst[/quote]




[SIZE=5]I wanted to try your software on a Windows XP 32-bit system, but the FTP server does not seem to be up.[/SIZE]

smh 2009-11-20 21:05

No need to shout!

ewmayer 2009-11-20 23:55

[QUOTE=moebius;196487]I wanted to try your software on a Windows XP 32-bit system, but the FTP server does not seem to be up.[/QUOTE]

It seems ftp service is down – I can view http pages, but not upload/download anything via ftp. I just sent e-mail to John Pierce (owner of the Hogranch) about the problem.

This also made me realize that there is an inconsistency in my README - some files are linked via http, others (including the source tarball you are trying to get) via ftp. I made the needed changes so all files use http, but I can't upload the new file, since that needs ftp! :(

As a workaround (while we wait for ftp to be revived), you can manually change over from ftp to http for any file you need by copying the URL and changing the leading

[url]ftp://hogranch.com/pub/mayer...[/url]

to

[url]http://hogranch.com/mayer...[/url]

For example to get the source tarball via http, use

[url]http://hogranch.com/mayer/src/C/Mlucas_11.06.2009.zip[/url]

To get the .vcproj file needed for Win32/Visual Studio builds, use

[url]http://hogranch.com/mayer/bin/Mlucas.vcproj[/url]
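
For anyone with a list of README links to convert, the prefix swap described above can be scripted. This is only a sketch of that workaround: the two prefixes come from the post, while the function name and the assumption that everything after the prefix carries over unchanged are mine.

```python
FTP_PREFIX = "ftp://hogranch.com/pub/mayer"
HTTP_PREFIX = "http://hogranch.com/mayer"


def ftp_to_http(url):
    """Rewrite a hogranch ftp URL to its http form; leave other URLs alone."""
    if url.startswith(FTP_PREFIX):
        # Keep the rest of the path exactly as it was after the prefix.
        return HTTP_PREFIX + url[len(FTP_PREFIX):]
    return url
```

URLs that are already http, or that point at another host, are returned unchanged, so the function can be applied to every link in the README.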

emily 2012-03-01 05:12

compile error (linux64)
 
I get these compilation errors... how do I compile it?

$ gcc -m64 -o Mlucas *.o -lm
fermat_mod_square.o: In function `fermat_mod_square':
fermat_mod_square.c:(.text+0x1c8a): undefined reference to `radix32_ditN_cy_dif1'
fermat_mod_square.c:(.text+0x2072): undefined reference to `radix16_ditN_cy_dif1'
fermat_mod_square.c:(.text+0x4ab5): undefined reference to `radix16_dif_pass1'
fermat_mod_square.c:(.text+0x4b96): undefined reference to `radix32_dif_pass1'
fermat_mod_square.c:(.text+0x4e0a): undefined reference to `radix32_dit_pass1'
fermat_mod_square.c:(.text+0x4ed2): undefined reference to `radix16_dit_pass1'
mers_mod_square.o: In function `mers_mod_square':
mers_mod_square.c:(.text+0x173f): undefined reference to `radix32_dit_pass1'
mers_mod_square.c:(.text+0x1807): undefined reference to `radix16_dit_pass1'
mers_mod_square.c:(.text+0x19a2): undefined reference to `radix32_dif_pass1'
mers_mod_square.c:(.text+0x1a6a): undefined reference to `radix16_dif_pass1'
mers_mod_square.c:(.text+0x1dab): undefined reference to `radix32_ditN_cy_dif1'
mers_mod_square.c:(.text+0x2199): undefined reference to `radix16_ditN_cy_dif1'
secure5.o: In function `make_v5_client_key':
secure5.c:(.text+0xe): undefined reference to `md5_raw_output'
secure5.c:(.text+0x18e): undefined reference to `md5_raw_input'
secure5.c:(.text+0x198): undefined reference to `strupper'
secure5.o: In function `secure_v5_url':
secure5.c:(.text+0x210): undefined reference to `md5'
secure5.c:(.text+0x21a): undefined reference to `strupper'
collect2: ld returned 1 exit status
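
With this many messages it is easier to work from the list of distinct missing symbols. The sketch below just extracts them from the linker output; it does not diagnose the cause. The radix* names look like they correspond to source files of the same name (radix16_ditN_cy_dif1.c appears elsewhere in this thread), so one guess, and it is only a guess, is that some .c files were never compiled to .o files before the link step. The function name is illustrative.

```python
import re

# GNU ld of this era quotes the symbol as `name' (backtick, then apostrophe).
UNDEF_RE = re.compile(r"undefined reference to `([^']+)'")


def undefined_symbols(linker_output):
    """Return the sorted, de-duplicated symbols the linker could not resolve."""
    return sorted({m.group(1) for m in UNDEF_RE.finditer(linker_output)})
```

Each returned name can then be checked against the object files in the build directory, for instance with `nm`, to see which ones are missing a definition.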

sanaris 2013-08-03 11:18

1 Attachment(s)
Hello!

I get an error when executing line 687 of carry_gcc64.h,
which causes a SIGILL at radix16_ditN_cy_dif1.c:2156.

[CODE]
Program received signal SIGILL, Illegal instruction.
0x000000000047c953 in radix16_ditN_cy_dif1 (a=a@entry=0x7ffff61de080, n=n@entry=1048576, nwt=1024, nwt_bits=10, wt0=0x1, wt1=<optimized out>, si=0x9e1340, rn0=rn0@entry=0x0, rn1=rn1@entry=0x0,
base=base@entry=0x9c11e0 <base.6704>, baseinv=baseinv@entry=0x9c11f0 <baseinv.6705>, iter=iter@entry=1, fracmax=fracmax@entry=0x7fffffffbc48, p=p@entry=20000047) at radix16_ditN_cy_dif1.c:2156
[/CODE][CODE]
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│0x47c8f4 <radix16_ditN_cy_dif1+12540> add %rax,%rbx │
│0x47c8f7 <radix16_ditN_cy_dif1+12543> add %rax,%rdx │
│0x47c8fa <radix16_ditN_cy_dif1+12546> add %rax,%rcx │
│0x47c8fd <radix16_ditN_cy_dif1+12549> mulpd 0x100(%rax),%xmm2 │
│0x47c905 <radix16_ditN_cy_dif1+12557> mulpd 0x100(%rax),%xmm6 │
│0x47c90d <radix16_ditN_cy_dif1+12565> mulpd 0x110(%rax),%xmm3 │
│0x47c915 <radix16_ditN_cy_dif1+12573> mulpd 0x110(%rax),%xmm7 │
│0x47c91d <radix16_ditN_cy_dif1+12581> mulpd (%rdi),%xmm2 │
│0x47c921 <radix16_ditN_cy_dif1+12585> mulpd (%rbx),%xmm6 │
│0x47c925 <radix16_ditN_cy_dif1+12589> mulpd 0x40(%rdx),%xmm3 │
│0x47c92a <radix16_ditN_cy_dif1+12594> mulpd 0x40(%rcx),%xmm7 │
│0x47c92f <radix16_ditN_cy_dif1+12599> mov 0x545332(%rip),%rcx # 0x9c1c68 <cy_r01.6782> │
│0x47c936 <radix16_ditN_cy_dif1+12606> mov 0x54533b(%rip),%rdx # 0x9c1c78 <cy_r23.6783> │
│0x47c93d <radix16_ditN_cy_dif1+12613> mulpd %xmm3,%xmm1 │
│0x47c941 <radix16_ditN_cy_dif1+12617> mulpd %xmm7,%xmm5 │
│0x47c945 <radix16_ditN_cy_dif1+12621> addpd (%rcx),%xmm1 │
│0x47c949 <radix16_ditN_cy_dif1+12625> addpd (%rdx),%xmm5 │
│0x47c94d <radix16_ditN_cy_dif1+12629> movaps %xmm1,%xmm3 │
│0x47c950 <radix16_ditN_cy_dif1+12632> movaps %xmm5,%xmm7 │
>│0x47c953 <radix16_ditN_cy_dif1+12635> roundpd $0x0,%xmm3,%xmm3 │
│0x47c959 <radix16_ditN_cy_dif1+12641> roundpd $0x0,%xmm7,%xmm7 │
│0x47c95f <radix16_ditN_cy_dif1+12647> mov 0x54549a(%rip),%rbx # 0x9c1e00 <sign_mask.6724> │
│0x47c966 <radix16_ditN_cy_dif1+12654> subpd %xmm3,%xmm1 │
│0x47c96a <radix16_ditN_cy_dif1+12658> subpd %xmm7,%xmm5 │
│0x47c96e <radix16_ditN_cy_dif1+12662> andpd (%rbx),%xmm1 │
│0x47c972 <radix16_ditN_cy_dif1+12666> andpd (%rbx),%xmm5 │
│0x47c976 <radix16_ditN_cy_dif1+12670> maxpd %xmm5,%xmm1 │
│0x47c97a <radix16_ditN_cy_dif1+12674> maxpd -0x20(%rax),%xmm1 │
│0x47c97f <radix16_ditN_cy_dif1+12679> movaps %xmm1,-0x20(%rax) │
│0x47c983 <radix16_ditN_cy_dif1+12683> mov %rsi,%rdi │
│0x47c986 <radix16_ditN_cy_dif1+12686> mov %rsi,%rbx │
│0x47c989 <radix16_ditN_cy_dif1+12689> shr $0x14,%rdi │
│0x47c98d <radix16_ditN_cy_dif1+12693> shr $0x16,%rbx │
│0x47c991 <radix16_ditN_cy_dif1+12697> and $0x30,%rdi │
│0x47c995 <radix16_ditN_cy_dif1+12701> and $0x30,%rbx │
│0x47c999 <radix16_ditN_cy_dif1+12705> add %rax,%rdi │
│0x47c99c <radix16_ditN_cy_dif1+12708> add %rax,%rbx │
│0x47c99f <radix16_ditN_cy_dif1+12711> movaps %xmm3,%xmm1 │
│0x47c9a2 <radix16_ditN_cy_dif1+12714> movaps %xmm7,%xmm5 │
│0x47c9a5 <radix16_ditN_cy_dif1+12717> mulpd 0xc0(%rdi),%xmm3 │
│0x47c9ad <radix16_ditN_cy_dif1+12725> mulpd 0xc0(%rbx),%xmm7 │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
child process 24789 In: radix16_ditN_cy_dif1 Line: 2156 PC: 0x47c953
[/CODE]Output: see attachment.
Machine (CPU flags): sse sse2 sse4a

ldesnogu 2013-08-03 16:12

It looks like roundpd is an SSE4.1 instruction, which your Opteron 6124 doesn't seem to support (it's not part of SSE4a; see [URL="http://en.wikipedia.org/wiki/SSE4"]Wikipedia[/URL]). I guess Ernst will have to explain why he claims that Mlucas is an SSE2 program :smile:
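
To check a machine before running such a build, the CPU flag list can be inspected directly. Here is a minimal sketch, assuming Linux-style /proc/cpuinfo text in which SSE4.1 is reported as the flag `sse4_1`; the function names are illustrative.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no flags line found


def supports_roundpd(flags):
    """roundpd belongs to SSE4.1; SSE4a is a different, AMD-only extension."""
    return "sse4_1" in flags
```

On Linux the input would typically be the contents of /proc/cpuinfo. A flag set like the one reported above (sse, sse2, sse4a) yields `False`, which is consistent with the SIGILL at the roundpd instruction.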


All times are UTC.