mersenneforum.org

A multiple k/c sieve for Sierpinski/Riesel problems (https://www.mersenneforum.org/showthread.php?t=5785)
Forum: Sierpinski/Riesel Base 5 (https://www.mersenneforum.org/forumdisplay.php?f=54)

Cruelty 2007-05-14 06:06

[QUOTE=geoff;106009]It also updates the Intel cache detection code. Cruelty: can you check whether this version works on your C2D?[/QUOTE]It works fine now :tu:

Cruelty 2007-05-14 08:29

[B]P3-M @ 1GHz[/B]: the AMD binary is slower.
Overall, version 1.5.2 is ~3% faster than 1.4.2 :tu: [code]sr2sieve-intel -vv
sr2sieve 1.5.2 -- A sieve for multiple sequences k*b^n+/-1.
L1 data cache 16Kb (detected), L2 cache 512Kb (detected).
Read 493896 terms for 9 sequences from ABCD format file `sr2data.txt'.
Split 9 base 2 sequences into 649 base 2^180 subsequences.
Loaded Legendre symbol lookup tables for 9 sequences from `sr2cache.bin'.
Using 16 Kb for the baby-steps giant-steps hashtable, maximum density 0.23.
Best time for baby step method gen/2: 62836.
Best time for baby step method gen/4: 62120.
Best time for baby step method gen/8: 57971.
Best time for baby step method gen/1: 73323.
Best time for giant step method gen/2: 39837.
Best time for giant step method gen/4: 39845.
Best time for giant step method gen/8: 37217.
Best time for giant step method gen/1: 44362.
Best time for ladder method gen/2: 5197.
Best time for ladder method gen/4: 4614.
Best time for ladder method gen/8: 4866.
Best time for ladder method gen/1: 8042.
Best time for ladder method add/1: 12124.
Using baby step method gen/8, giant step method gen/8, ladder method gen/4.
Resuming from checkpoint pmin=3955028652461 in `checkpoint.txt'.
Using 256Kb for the Sieve of Eratosthenes bitmap.
Expecting to find factors for about 4896.56 terms in this range.
sr2sieve started: 1000000 <= n <= 1999997, 3955028652461 <= p <= 4000000000000
p=3955047920347, 319607 p/sec, 4474 factors, 95.50% done, 667 sec/factor[/code]
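In the log above, every "Using ... method" choice matches the lowest "Best time" reported for that step (gen/8, gen/8 and gen/4). Below is a minimal Python sketch of that selection rule. It assumes that lower numbers are better and that the program simply takes the minimum per step. The log does not state the time unit or how ties are broken, and `pick_fastest` is a hypothetical helper, not part of sr2sieve.

```python
import re

# Matches benchmark lines such as "Best time for baby step method gen/8: 57971."
BENCH_LINE = re.compile(r"Best time for (.+?) method (\S+): (\d+)\.")

def pick_fastest(log_text):
    """Return {step: method}, keeping the lowest reported time for each step."""
    best = {}  # step -> (time, method)
    for step, method, time in BENCH_LINE.findall(log_text):
        t = int(time)
        if step not in best or t < best[step][0]:
            best[step] = (t, method)
    return {step: method for step, (t, method) in best.items()}

# The benchmark lines from the P3-M log above.
SR2_BENCH = """\
Best time for baby step method gen/2: 62836.
Best time for baby step method gen/4: 62120.
Best time for baby step method gen/8: 57971.
Best time for baby step method gen/1: 73323.
Best time for giant step method gen/2: 39837.
Best time for giant step method gen/4: 39845.
Best time for giant step method gen/8: 37217.
Best time for giant step method gen/1: 44362.
Best time for ladder method gen/2: 5197.
Best time for ladder method gen/4: 4614.
Best time for ladder method gen/8: 4866.
Best time for ladder method gen/1: 8042.
Best time for ladder method add/1: 12124.
"""

print(pick_fastest(SR2_BENCH))
# {'baby step': 'gen/8', 'giant step': 'gen/8', 'ladder': 'gen/4'}
```

This result agrees with the "Using baby step method gen/8, giant step method gen/8, ladder method gen/4" line in the log.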

Cruelty 2007-05-14 20:10

I have upgraded linux64.sr1sieve on one of my machines from 1.0.23 to 1.1.0 and I get an error when trying to run it. Any ideas what's going on? It's a C2D E4300 CPU @ 2.4GHz.[code]sr1sieve 1.1.0 -- A sieve for one sequence k*b^n+/-1.
L1 data cache 32Kb (detected), L2 cache 2048Kb (detected).
Read 89136 terms for 4*3^n-1 from NewPGen file `k=4_b=3.txt'.
Split 1 base 3 sequence into 32 base 3^90 subsequences.
Using 0 Kb for Legendre symbol tables.
Using 8 Kb for the baby-steps giant-steps hashtable, maximum density 0.20.
Best time for baby step method gen/2: 20322.
Best time for baby step method gen/4: 17064.
Best time for baby step method gen/1: 23553.
Best time for giant step method gen/2: 12087.
Best time for giant step method gen/4: 13131.
Best time for giant step method gen/1: 16704.
Using baby step method gen/4, giant step method gen/2.
Using 1024Kb for the Sieve of Eratosthenes bitmap.
Expecting to find factors for about 1461.46 terms.
sr1sieve started: 200013 <= n <= 1999957, 6121454326643 <= p <= 10000000000000
./linux.bat: line 1: 23295 Segmentation fault (core dumped) ./sr1sieve -i k=4_b=3.txt -o ready.txt -f factors.txt --pmax 10e12 -vv --save 15[/code]
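Both logs report splitting a sequence into subsequences with a larger base, for example "Split 1 base 3 sequence into 32 base 3^90 subsequences". The identity that makes this possible is k*b^n = (k*b^r)*(b^Q)^m when n = Q*m + r. The following Python sketch groups the remaining exponents by r = n mod Q. Its assumption is that one subsequence is kept for each residue class that still contains terms. The thread does not explain how sr1sieve/sr2sieve choose Q or which classes are kept. `split_subsequences` is a hypothetical name, and the sketch uses exact integers for clarity, whereas a sieve would work with these values mod p.

```python
from collections import defaultdict

def split_subsequences(k, b, ns, q):
    """Group exponents n of k*b^n+/-1 by residue r = n mod q.

    Each group becomes (k*b^r) * (b^q)^m +/- 1 with m = (n - r) // q.
    Returns {r: (new_k, new_base, [m, ...])}.
    """
    groups = defaultdict(list)
    for n in sorted(ns):
        groups[n % q].append(n)
    subseqs = {}
    for r, members in sorted(groups.items()):
        subseqs[r] = (k * b**r, b**q, [(n - r) // q for n in members])
    return subseqs

# Hypothetical toy input: a few surviving exponents for 4*3^n-1, split with q = 3.
subs = split_subsequences(4, 3, [1, 2, 5, 7, 10, 11], 3)
for r, (k2, base, ms) in subs.items():
    print(r, k2, base, ms)
# 1 12 27 [0, 2, 3]
# 2 36 27 [0, 1, 3]
```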

geoff 2007-05-16 22:38

[QUOTE=Cruelty;106062]I have upgraded linux64.sr1sieve on one of my machines from 1.0.23 to 1.1.0 and I get an error when trying to run it. Any ideas what's going on? It's a C2D E4300 CPU @ 2.4GHz.[code]./linux.bat: line 1: 23295 Segmentation fault (core dumped) ./sr1sieve -i k=4_b=3.txt -o ready.txt -f factors.txt --pmax 10e12 -vv --save 15[/code][/QUOTE]

Sorry about this. I will try to find out what is happening. Can you email me (g_w_reynolds at yahoo.co.nz) a zipped copy of the core file?

Also, could you try running it with the command line switches -Bgen/1 -Ggen/1 and see whether it still segfaults?

Cruelty 2007-05-17 21:09

Manually setting -Bgen/x to anything other than "1" causes a segfault. When using -Bgen/1, I can use any value for -Ggen/x without any error.

As my Linux knowledge is somewhat limited, I don't understand what you mean by "core file" :unsure:

geoff 2007-05-17 22:54

sr1sieve 1.1.1, sr2sieve 1.5.3
 
[QUOTE=Cruelty;106370]Manually setting -Bgen/x to anything other than "1" causes a segfault. When using -Bgen/1, I can use any value for -Ggen/x without any error.[/QUOTE]

These versions should fix this segfault in the x86-64 build. The bug didn't affect the x86 or ppc64 builds.

Also, in these versions the benchmarks are run twice and the times are taken from the second run. This should help ensure that everything is in cache when the times are taken.
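Here is a minimal Python sketch of the "run twice, time the second run" idea, combined with picking the fastest candidate. It is an illustration only and not sr1sieve's benchmarking code. The names `timed_second_run` and `choose_method` and the two summing functions are hypothetical stand-ins for alternative implementations of one step.

```python
import time

def timed_second_run(fn, *args):
    """Run fn twice and return the elapsed seconds of the second run only."""
    fn(*args)  # first run: warms caches, timing discarded
    start = time.perf_counter()
    fn(*args)  # second run: this is the one that is timed
    return time.perf_counter() - start

def choose_method(candidates, *args):
    """candidates: {name: callable}. Returns (fastest name, {name: seconds})."""
    times = {name: timed_second_run(fn, *args) for name, fn in candidates.items()}
    return min(times, key=times.get), times

# Hypothetical stand-ins for two implementations of the same step.
def sum_loop(n):
    total = 0
    for i in range(n):
        total += i
    return total

def sum_builtin(n):
    return sum(range(n))

best, times = choose_method({"loop": sum_loop, "builtin": sum_builtin}, 200_000)
print(best, times)
```

Timings vary from run to run, so the printed values are not fixed. The point is only that the discarded first run absorbs cold-cache effects.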

[QUOTE]As my Linux knowledge is somewhat limited, I don't understand what you mean by "core file" :unsure:[/QUOTE]

When you get the message `core dumped' after a fatal error, it means that Linux has created a file called `core' in the program's working directory, which can be used to examine the state the program was in when the error occurred.
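Whether a core file is actually written, and what it is called, depends on the system. On Linux, the soft RLIMIT_CORE limit caps its size (0 means none is written), and /proc/sys/kernel/core_pattern names the file; a leading "|" means the core is piped to a helper program instead. The following Python sketch reports those settings, giving the same information as `ulimit -c` and `cat /proc/sys/kernel/core_pattern`. It is Linux-specific, it is not from the thread, and `core_dump_settings` is a hypothetical helper. A core file that was written can then be loaded with `gdb ./sr1sieve core` and inspected with `bt`.

```python
import resource

def core_dump_settings(pattern_path="/proc/sys/kernel/core_pattern"):
    """Report whether this process may write core files, and the naming pattern."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    try:
        with open(pattern_path) as f:
            pattern = f.read().strip()
    except OSError:
        pattern = None  # not Linux, or /proc unavailable
    return {
        "enabled": soft != 0,  # a soft limit of 0 means no core file is written
        "soft_limit": None if soft == resource.RLIM_INFINITY else soft,
        "pattern": pattern,
        # A pattern starting with "|" pipes the core to a program instead of a file.
        "piped": bool(pattern) and pattern.startswith("|"),
    }

print(core_dump_settings())
```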

I don't need the core file now, unless you get another segfault.

Cruelty 2007-05-18 13:19

C2D E4300 @ 2.4GHz using sr1sieve.linux.x86-64 v1.1.1: a speed increase of ~40% (6.1M vs 8.6M) :shock:

Flatlander 2007-05-18 16:04

sr1sieve

C2D E4300 @ a very hot 3013MHz, Windows.

1% speed increase (over the version from a couple of weeks ago).
It correctly chose sse2/16 for baby steps and sse2/8 for giant steps.
Would it be possible to add sse2/32, sse2/64, etc. to tune it further? (Baby steps in this case.)

Cruelty 2007-05-18 22:13

BTW: under x86-64 Linux there are no SSE2 methods to choose from for the C2D CPU. Are they only available on 32-bit systems?

geoff 2007-05-22 01:48

[QUOTE=Cruelty]BTW: under x86-64 Linux there are no SSE2 methods to choose from for the C2D CPU. Are they only available on 32-bit systems?[/QUOTE]

The 64-bit code is quite different to the 32-bit code.

Currently the 64-bit code uses SSE2 for the floating-point operations and general registers for the integer operations (because the SSE2 instruction set lacks any SIMD equivalent of the imulq instruction). I plan to add routines that use the FPU instead of SSE2; these will be slower but will allow sieving beyond p=2^52 as an option.

The 32-bit code uses the FPU for floating point operations (not ideal, but the 32-bit SSE2 instruction set lacks the vital cvtsi2sdq and cvtsd2siq instructions) and SSE2 for integer operations, where available.
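The posts above describe mulmod code that mixes floating-point and integer operations. A common way to compute a*b mod p in that style is to estimate the quotient q = floor(a*b/p) in floating point, convert it to an integer, and subtract q*p from a*b using only the low 64 bits of the integer products (what a 64-bit integer multiply such as imulq gives). Because the true remainder is small, those low 64 bits are enough. With IEEE doubles (53-bit significand) the quotient estimate is only a step or so off when p < 2^52, which is consistent with the 2^52 limit mentioned above; the x87 FPU's 64-bit significand is what would allow larger p. The following is a Python sketch of this standard technique under those assumptions. It is not geoff's actual routine, and `mulmod_fp` and `to_int64` are hypothetical names.

```python
MASK64 = (1 << 64) - 1

def to_int64(x):
    """Reinterpret the low 64 bits of x as a signed two's-complement integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

def mulmod_fp(a, b, p, inv_p):
    """a*b mod p via a double-precision quotient estimate.

    Assumes 0 <= a, b < p < 2**52 and inv_p == 1.0 / p (precomputed per prime).
    """
    # Floating-point estimate of floor(a*b/p), then float -> integer conversion.
    q = int(float(a) * float(b) * inv_p)
    # Low 64 bits of each integer product, as 64-bit wrap-around arithmetic gives.
    lo_ab = (a * b) & MASK64
    lo_qp = (q * p) & MASK64
    # The true remainder is small, so the wrapped difference recovers it exactly.
    r = to_int64(lo_ab - lo_qp)
    # Correct the small error in the quotient estimate.
    while r < 0:
        r += p
    while r >= p:
        r -= p
    return r

p = 1000000007
print(mulmod_fp(123456789, 987654321, p, 1.0 / p) == 123456789 * 987654321 % p)  # True
```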

[QUOTE=Flatlander]1% speed increase (over the version from a couple of weeks ago).
It correctly chose sse2/16 for baby steps and sse2/8 for giant steps.
Would it be possible to add sse2/32, sse2/64, etc. to tune it further? (Baby steps in this case.)[/QUOTE]
I might try this, but I don't think extending the number of mulmods per loop beyond 16 will have much effect. The main reason for unrolling further is to increase the distance between the FPU writes and the SSE2 reads, but once the ideal distance has been reached, increasing it further only makes it slower. On some machines there is also a benefit from reading a larger amount of data each loop cycle, but the SSE2/16 routine already reads 128 bytes per cycle and I don't think any machine would benefit from more than that.

On the other hand, there is a small benefit simply from unrolling the loops on machines with a large L1 code cache, but there are other ideas I will try first. In particular, it should be possible to interleave the hashtable insert/lookup code with the mulmod code.
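The hashtable mentioned here belongs to the baby-steps giant-steps (BSGS) method named in the logs. For a prime p that divides neither k nor b, p divides k*b^n+c (c = +1 or -1) exactly when b^n ≡ -c*k^(-1) (mod p). BSGS finds every such n in a range by inserting the baby steps b^j into a hashtable and then looking up giant steps. Below is a plain Python sketch of that standard method, using a dict as the hashtable. It leaves out what the real sieve adds on top, such as subsequence splitting, Legendre-symbol filtering, fixed-size tables and hand-tuned mulmod loops, and `bsgs_exponents` is a hypothetical name.

```python
from math import isqrt

def bsgs_exponents(k, b, c, p, nmin, nmax):
    """Return every n in [nmin, nmax] with k*b^n + c ≡ 0 (mod p), in order.

    c is +1 or -1. Assumes p is an odd prime dividing neither k nor b.
    """
    # k*b^n ≡ -c (mod p)  <=>  b^n ≡ -c * k^(-1) (mod p)
    target = (-c * pow(k, -1, p)) % p
    m = isqrt(nmax - nmin + 1) + 1  # m*m exceeds the range length

    # Baby steps: hashtable maps b^j mod p -> list of j, for 0 <= j < m.
    table = {}
    x = 1
    for j in range(m):
        table.setdefault(x, []).append(j)
        x = x * b % p

    # Giant steps: n = nmin + i*m + j satisfies b^n = target
    # exactly when b^j = target * b^-(nmin + i*m).
    step = pow(b, -m, p)
    y = target * pow(b, -nmin, p) % p
    found = []
    for i in range(m):
        for j in table.get(y, ()):
            n = nmin + i * m + j
            if n <= nmax:
                found.append(n)
        y = y * step % p
    return found

print(bsgs_exponents(4, 3, -1, 7, 0, 20))  # [2, 8, 14, 20]
```

For example, n = 2 gives 4*3^2-1 = 35, which is divisible by 7. Because 3 has order 6 mod 7, the same holds again every 6 steps.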

The 64-bit code doesn't yet make proper use of the packed-data SSE2 instructions: it uses two mulsd instructions instead of one mulpd, so I will also try to improve that.

geoff 2007-05-22 02:00

[QUOTE=Cruelty;106438]C2D E4300 @ 2.4GHz using sr1sieve.linux.x86-64 v1.1.1: a speed increase of ~40% (6.1M vs 8.6M) :shock:[/QUOTE]

Could you send me the results of running with the -vv option on this machine? I have only implemented the gen/2 and gen/4 methods; more improvements may be possible, but without a machine to test on it involves a lot of guesswork. (Or a much better understanding of the processor architecture than I have :-)


All times are UTC.