mersenneforum.org  

mersenneforum.org > Prime Search Projects > Sierpinski/Riesel Base 5
Old 2007-05-14, 06:06   #309
Cruelty
 
May 2005

2³·7·29 Posts

Quote:
Originally Posted by geoff View Post
It also updates the Intel cache detection code. Cruelty: can you check whether this version works on your C2D?
It works fine now
Old 2007-05-14, 08:29   #310
Cruelty

May 2005

2³·7·29 Posts

P3-M @ 1GHz: the amd binary is slower on this machine.
Generally, version 1.5.2 is ~3% faster than 1.4.2.
Code:
sr2sieve-intel -vv
sr2sieve 1.5.2 -- A sieve for multiple sequences k*b^n+/-1.
L1 data cache 16Kb (detected), L2 cache 512Kb (detected).
Read 493896 terms for 9 sequences from ABCD format file `sr2data.txt'.
Split 9 base 2 sequences into 649 base 2^180 subsequences.
Loaded Legendre symbol lookup tables for 9 sequences from `sr2cache.bin'.
Using 16 Kb for the baby-steps giant-steps hashtable, maximum density 0.23.
Best time for baby step method gen/2: 62836.
Best time for baby step method gen/4: 62120.
Best time for baby step method gen/8: 57971.
Best time for baby step method gen/1: 73323.
Best time for giant step method gen/2: 39837.
Best time for giant step method gen/4: 39845.
Best time for giant step method gen/8: 37217.
Best time for giant step method gen/1: 44362.
Best time for ladder method gen/2: 5197.
Best time for ladder method gen/4: 4614.
Best time for ladder method gen/8: 4866.
Best time for ladder method gen/1: 8042.
Best time for ladder method add/1: 12124.
Using baby step method gen/8, giant step method gen/8, ladder method gen/4.
Resuming from checkpoint pmin=3955028652461 in `checkpoint.txt'.
Using 256Kb for the Sieve of Eratosthenes bitmap.
Expecting to find factors for about 4896.56 terms in this range.
sr2sieve started: 1000000 <= n <= 1999997, 3955028652461 <= p <= 4000000000000
p=3955047920347, 319607 p/sec, 4474 factors, 95.50% done, 667 sec/factor

Last fiddled with by Cruelty on 2007-05-14 at 08:31
Old 2007-05-14, 20:10   #311
Cruelty

May 2005

2³×7×29 Posts

I have upgraded linux64.sr1sieve on one of my machines from 1.0.23 to 1.1.0 and I get an error when trying to run it. Any ideas what's going on? It's a C2D E4300 CPU @ 2.4GHz.
Code:
sr1sieve 1.1.0 -- A sieve for one sequence k*b^n+/-1.
L1 data cache 32Kb (detected), L2 cache 2048Kb (detected).
Read 89136 terms for 4*3^n-1 from NewPGen file `k=4_b=3.txt'.
Split 1 base 3 sequence into 32 base 3^90 subsequences.
Using 0 Kb for Legendre symbol tables.  
Using 8 Kb for the baby-steps giant-steps hashtable, maximum density 0.20.
Best time for baby step method gen/2: 20322.
Best time for baby step method gen/4: 17064.
Best time for baby step method gen/1: 23553.
Best time for giant step method gen/2: 12087.
Best time for giant step method gen/4: 13131.
Best time for giant step method gen/1: 16704.
Using baby step method gen/4, giant step method gen/2.
Using 1024Kb for the Sieve of Eratosthenes bitmap.
Expecting to find factors for about 1461.46 terms.
sr1sieve started: 200013 <= n <= 1999957, 6121454326643 <= p <= 10000000000000
./linux.bat: line 1: 23295 Segmentation fault      (core dumped) ./sr1sieve -i k=4_b=3.txt -o ready.txt -f factors.txt --pmax 10e12 -vv --save 15

Last fiddled with by Cruelty on 2007-05-14 at 20:12
Old 2007-05-16, 22:38   #312
geoff

Mar 2003
New Zealand

2205₈ Posts

Quote:
Originally Posted by Cruelty View Post
I have upgraded linux64.sr1sieve on one of my machines from 1.0.23 to 1.1.0 and I get an error when trying to run it. Any ideas what's going on? It's a C2D E4300 CPU @ 2.4GHz.
Code:
./linux.bat: line 1: 23295 Segmentation fault      (core dumped) ./sr1sieve -i k=4_b=3.txt -o ready.txt -f factors.txt --pmax 10e12 -vv --save 15
Sorry about this. I will try to find out what is happening. Can you email me (g_w_reynolds at yahoo.co.nz) a zipped copy of the core file?

Also, could you try running it with the command line switches -Bgen/1 -Ggen/1 and see whether it still segfaults?
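
One way to narrow this down is to run the sieve once per method combination and note which runs are killed by SIGSEGV. The following is only a rough sketch: it reuses the command line from the report above, assumes that the gen/1, gen/2 and gen/4 names printed by -vv are all accepted by the -B and -G switches, and the helper names (`build_command`, `died_of_segfault`, `sweep`) are made up for illustration.

```python
import signal
import subprocess

# Base command copied from the segfault report; adjust paths as needed.
BASE_CMD = ["./sr1sieve", "-i", "k=4_b=3.txt", "-o", "ready.txt",
            "-f", "factors.txt", "--pmax", "10e12", "-vv", "--save", "15"]

# Method names as they appear in the -vv benchmark output (assumed to be
# valid values for the -B and -G switches).
METHODS = ["gen/1", "gen/2", "gen/4"]


def build_command(baby, giant):
    """Return the argument list forcing the given baby/giant step methods."""
    return BASE_CMD + ["-B" + baby, "-G" + giant]


def died_of_segfault(returncode):
    """On POSIX, subprocess reports death by signal N as returncode -N."""
    return returncode == -signal.SIGSEGV


def sweep(timeout_sec=60):
    """Run every baby/giant combination briefly and record which ones segfault."""
    results = {}
    for baby in METHODS:
        for giant in METHODS:
            try:
                proc = subprocess.run(build_command(baby, giant),
                                      timeout=timeout_sec)
                results[(baby, giant)] = died_of_segfault(proc.returncode)
            except subprocess.TimeoutExpired:
                # Still sieving when the timeout hit: no crash for this pair.
                results[(baby, giant)] = False
    return results
```

Since each run is killed at the timeout and may touch the output and checkpoint files, it is probably safest to call `sweep()` in a scratch copy of the working directory.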
Old 2007-05-17, 21:09   #313
Cruelty

May 2005

2³·7·29 Posts

Manually setting -Bgen/x to anything other than "1" causes a segfault. When using -Bgen/1, I can use any value for -Ggen/x without any error.

As my Linux knowledge is somewhat limited, I don't understand what you mean by "core file".
Old 2007-05-17, 22:54   #314
geoff

Mar 2003
New Zealand

485₁₆ Posts
sr1sieve 1.1.1, sr2sieve 1.5.3

Quote:
Originally Posted by Cruelty View Post
Manually setting -Bgen/x to anything other than "1" causes a segfault. When using -Bgen/1, I can use any value for -Ggen/x without any error.
These versions should fix this segfault in the x86-64 build. The bug didn't affect the x86 or ppc64 builds.

Also in these versions, benchmarks are run twice and the times taken from the second run. This should help ensure that everything is in cache when the times are taken.
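
The idea of timing only the second run can be illustrated with a small sketch. This is not the sieve's benchmark code, just a generic Python illustration; `time_second_run` and `pick_fastest` are hypothetical names.

```python
import time


def time_second_run(func, *args):
    """Call func twice and return the elapsed nanoseconds of the second call only."""
    func(*args)                       # first run: warms the caches, timing discarded
    start = time.perf_counter_ns()
    func(*args)                       # second run: this is the one that is timed
    return time.perf_counter_ns() - start


def pick_fastest(candidates, *args):
    """Time each named candidate on its second run; return (best_name, timings)."""
    timings = {name: time_second_run(func, *args)
               for name, func in candidates.items()}
    best = min(timings, key=timings.get)
    return best, timings
```

Given a dict mapping method names to functions, `pick_fastest` returns the name with the lowest second-run time, much as the -vv output lists a best time per method and then reports which ones it is using.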

Quote:
As my Linux knowledge is somewhat limited, I don't understand what you mean by "core file".
When you get the message `core dumped' after a fatal error, it means that Linux has created a file called `core' in the program's working directory which can be used to examine the state the program was in when the error occurred.

I don't need the core file now, unless you get another segfault.
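
Whether a core file actually gets written depends on the process's core size limit (in a shell this is usually raised with `ulimit -c unlimited`), and the file can then be opened with a debugger, e.g. `gdb ./sr1sieve core`. As a rough sketch, the check below uses Python's standard `resource` module on Linux; the function names are made up, and the `core.<pid>` naming is an assumption that depends on system configuration.

```python
import os
import resource


def core_dumps_enabled():
    """True if the soft RLIMIT_CORE limit allows a non-empty core file."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    return soft != 0          # zero means core files are suppressed


def find_core_files(directory="."):
    """List files in `directory` named `core` or `core.<pid>` (naming varies by system)."""
    return sorted(
        name for name in os.listdir(directory)
        if name == "core" or name.startswith("core.")
    )
```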
Old 2007-05-18, 13:19   #315
Cruelty

May 2005

2³×7×29 Posts

C2D E4300 @ 2.4GHz using sr1sieve.linux.x86-64 v.1.1.1 speed increase of ~40% (6.1M vs 8.6M)
Old 2007-05-18, 16:04   #316
Flatlander
I quite division it

"Chris"
Feb 2005
England

31·67 Posts

sr1sieve

C2D E4300 @ a very hot 3013 MHz, Windows.

1% speed increase (over the one from a couple of weeks ago.)
It correctly chose sse2/16 for baby steps, sse2/8 for giant steps.
Is it possible to have sse2/32, 64 etc. to further tweak it? (Baby steps in this case.)

Last fiddled with by Flatlander on 2007-05-18 at 16:07 Reason: more details
Old 2007-05-18, 22:13   #317
Cruelty

May 2005

2³×7×29 Posts

BTW: under x86-64 linux there are no sse2 methods to choose for C2D CPU - is it only available under 32-bit systems?
Old 2007-05-22, 01:48   #318
geoff

Mar 2003
New Zealand

485₁₆ Posts

Quote:
Originally Posted by Cruelty
BTW: under x86-64 linux there are no sse2 methods to choose for C2D CPU - is it only available under 32-bit systems?
The 64-bit code is quite different to the 32-bit code.

Currently the 64-bit code uses SSE2 for the floating point operations and general registers for the integer operations (because the SSE2 instruction set lacks any SIMD equivalent of the imulq instruction). I plan to add routines that use the FPU instead of SSE2; this will be slower but will allow sieving beyond p=2^52 as an option.

The 32-bit code uses the FPU for floating point operations (not ideal, but the 32-bit SSE2 instruction set lacks the vital cvtsi2sdq and cvtsd2siq instructions) and SSE2 for integer operations, where available.
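
To show how a floating point quotient and integer registers can combine into a modular multiply, and why a limit near p=2^52 falls out of double precision, here is a minimal Python sketch. It is a textbook-style illustration under the assumption 0 <= a, b < p < 2^52, not the sieve's actual assembly; `mulmod_float` and `MASK64` are hypothetical names.

```python
MASK64 = (1 << 64) - 1   # emulate wraparound of 64-bit general registers


def mulmod_float(a, b, p):
    """Return a*b mod p for 0 <= a, b < p < 2**52 via a floating point quotient estimate."""
    assert 0 <= a < p and 0 <= b < p and p < 2**52
    b_over_p = b / p                    # double precision; reusable while b is fixed
    q = int(a * b_over_p)               # estimated quotient, off by at most one
    r = (a * b - q * p) & MASK64        # low 64 bits, as integer registers would hold
    if r >= 1 << 63:                    # reinterpret as a signed 64-bit value
        r -= 1 << 64
    if r < 0:                           # estimate was one too high
        r += p
    elif r >= p:                        # estimate was one too low
        r -= p
    return r
```

A double has a 53-bit significand, so once the quotient a*b/p can exceed roughly 2^52 the estimate is no longer guaranteed to be within one of the true value, and a single correction step stops being enough.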

Quote:
Originally Posted by Flatlander
1% speed increase (over the one from a couple of weeks ago.)
It correctly chose sse2/16 for baby steps, sse2/8 for giant steps.
Is it possible to have sse2/32, 64 etc. to further tweak it? (Baby steps in this case.)
I might try this, but I don't think extending the number of mulmods per loop beyond 16 will have much effect. The main reason for doing it is to increase the distance between the FPU writes and the SSE2 reads, but once the ideal distance has been achieved any more will make it slower. There is also a benefit on some machines from reading a larger amount of data each loop cycle, but the SSE2/16 routine reads 128 bytes per cycle and I don't think any machine would benefit from more than that.

On the other hand there is a small benefit simply from unrolling the loops on machines with a large L1 code cache, but there are other ideas I will try first. In particular it should be possible to interleave the hashtable insert/lookup code with the mulmod code.
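
For readers wondering where the mulmods and the hashtable operations sit relative to each other, a textbook baby-step giant-step search looks roughly like the sketch below. It solves b^n = target (mod p) for a single target and is only a generic illustration, not the sieve's implementation; the function name `bsgs` and its parameters are assumptions.

```python
from math import isqrt


def bsgs(b, target, p, n_max):
    """Return the smallest n in [0, n_max) with b**n == target (mod p), or None."""
    m = isqrt(n_max) + 1              # m*m > n_max, so every n < n_max is i*m + j
    table = {}
    value = 1
    for j in range(m):                # baby steps: one hashtable insert + one mulmod
        table.setdefault(value, j)    # keep the smallest j for each residue
        value = value * b % p
    step = pow(value, -1, p)          # b**(-m) mod p (needs gcd(b, p) == 1, Python 3.8+)
    gamma = target % p
    for i in range(m):                # giant steps: one hashtable lookup + one mulmod
        j = table.get(gamma)
        if j is not None:
            n = i * m + j
            return n if n < n_max else None
        gamma = gamma * step % p
    return None
```

In this form every loop iteration pairs one table operation with one modular multiply, which is the pairing that interleaving would try to overlap.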

The 64-bit code doesn't yet make proper use of the packed data SSE2 instructions; it uses two mulsd instructions instead of one mulpd, so I will also try improving that.
Old 2007-05-22, 02:00   #319
geoff

Mar 2003
New Zealand

13×89 Posts

Quote:
Originally Posted by Cruelty View Post
C2D E4300 @ 2.4GHz using sr1sieve.linux.x86-64 v.1.1.1 speed increase of ~40% (6.1M vs 8.6M)
Could you send me the results of running with the -vv option on this machine? I have only implemented gen/2 and gen/4 methods so far; there may be more improvements possible, but without a machine to test on it involves a lot of guesswork (or a much better understanding of the processor architecture than I have :-)

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.