mersenneforum.org  

Prime Search Projects > Sierpinski/Riesel Base 5

2007-05-12, 01:32 #298
geoff
Mar 2003
New Zealand

10010000101₂ Posts

Quote:
Originally Posted by ltd
Did a new test with the extra parameters for mulmod.
Results are closer together now, but the tendency stays that mulmod runs
faster if a second process (eulernet) runs in parallel. (SSE2 results)
This is possible. The only sure way to get accurate timings on an HT system is to run two copies of the same program; otherwise one will get a bigger share of the CPU. The operating system can't take this into account when it reports CPU time: it always assumes they both get 50%.
2007-05-12, 02:13 #299
geoff
Mar 2003
New Zealand

13·89 Posts
sr5sieve 1.5.0

This version attempts to find the best mulmod method to use by running some benchmarks before sieving starts. It is x86 only at this stage; I will get the x86_64 version going soon.

Run it with the -v switch to see which method it is using. The methods are named x86/N or sse2/N where N is the number of mulmods that are interleaved.
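
As a rough illustration of what interleaving means, here is a minimal C sketch. It is not sr5sieve's code: it assumes moduli p < 2^32 so that a plain 64-bit product cannot overflow, and the function names are hypothetical. The second loop performs four independent mulmods per iteration, in the spirit of x86/4, so their latencies can overlap instead of running back to back.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration only: plain C mulmod assuming p < 2^32,
   so the 64-bit product a*b cannot overflow. This is not the x86/N
   or sse2/N code used by sr5sieve. */
uint64_t mulmod(uint64_t a, uint64_t b, uint64_t p)
{
    return (a * b) % p;
}

/* Like x86/1: one mulmod per loop iteration. */
void mulmod_array_1(uint64_t *x, const uint64_t *y, size_t n, uint64_t p)
{
    for (size_t i = 0; i < n; i++)
        x[i] = mulmod(x[i], y[i], p);
}

/* Like x86/4: four independent mulmods per iteration. None depends on
   another's result, so the CPU can overlap their latencies.
   For brevity, n is assumed to be a multiple of 4. */
void mulmod_array_4(uint64_t *x, const uint64_t *y, size_t n, uint64_t p)
{
    for (size_t i = 0; i < n; i += 4) {
        uint64_t r0 = mulmod(x[i], y[i], p);
        uint64_t r1 = mulmod(x[i + 1], y[i + 1], p);
        uint64_t r2 = mulmod(x[i + 2], y[i + 2], p);
        uint64_t r3 = mulmod(x[i + 3], y[i + 3], p);
        x[i] = r0;
        x[i + 1] = r1;
        x[i + 2] = r2;
        x[i + 3] = r3;
    }
}
```

Both functions compute identical results; only the instruction scheduling differs, which is why the best N can vary from one CPU to another.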

Run with the -vv option to get details of the relative speed of each method as found by the benchmarks.
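
The -vv report consists of lines of the form "Best time for ... method ...: N.", where a lower number is better. A minimal C sketch that picks the fastest method for each step from such lines; the struct and function names are hypothetical, and it assumes exactly the line format shown in the logs in this thread:

```c
#include <stdio.h>
#include <string.h>

#define MAX_NAME 32

/* One "Best time" entry (hypothetical names). */
struct best {
    char step[MAX_NAME];   /* "baby", "giant" or "ladder" */
    char method[MAX_NAME]; /* e.g. "sse2/8" or "x86/4" */
    long time;             /* benchmark score, lower is faster */
};

/* Parse "Best time for baby step method sse2/8: 22032." or
   "Best time for ladder method sse2/8: 2367.". Returns 1 on a match. */
int parse_best_time(const char *line, struct best *out)
{
    char step[MAX_NAME], method[MAX_NAME];
    long t;

    /* The ladder lines have no "step" word, so try both forms. */
    if (sscanf(line, "Best time for %31s step method %31[^:]: %ld",
               step, method, &t) != 3
        && sscanf(line, "Best time for %31s method %31[^:]: %ld",
                  step, method, &t) != 3)
        return 0;
    strcpy(out->step, step);
    strcpy(out->method, method);
    out->time = t;
    return 1;
}

/* Keep the lowest time seen for each step in table[0..*count-1].
   Returns 0 if a new step would not fit in `cap` entries. */
int keep_fastest(struct best *table, int *count, int cap, const struct best *b)
{
    for (int i = 0; i < *count; i++) {
        if (strcmp(table[i].step, b->step) == 0) {
            if (b->time < table[i].time)
                table[i] = *b;
            return 1;
        }
    }
    if (*count >= cap)
        return 0;
    table[(*count)++] = *b;   /* first time this step is seen */
    return 1;
}
```

Fed the complete -vv output from the C2D E6600 run posted in this thread, this picks sse2/16, sse2/16 and sse2/8, the same choices as the program's own "Using baby step method ..." line.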

If you want to experiment, you can use the -B and -G switches to select a particular method. For example, `-B sse2/4' uses 4 interleaved sse2 mulmods for the baby steps routine.

The code corresponds to previous versions roughly as follows: sse2/2 as in 1.4.34; sse2/4 as in 1.4.37; sse2/8 as in 1.4.40; x86/8 as in 1.4.42.

There is some overhead created by the need to select different methods at runtime, and I am not sure how this will affect different machines. The main cost is that some functions are now called through a function pointer instead of being coded inline. Any benchmarks comparing version 1.5.0 with whichever 1.4.x version was fastest for your machine would be useful, especially if 1.5.0 is slower.
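
A minimal C sketch of the general pattern described above: benchmark each candidate, then call the winner through a pointer. The kernels are trivial stand-ins (squaring mod p, assuming p < 2^32), all names are hypothetical, and keeping the best of several timed runs is an assumption suggested by the "Best time" wording of the -vv output, not a description of sr5sieve's internals.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Signature shared by all candidate kernels (hypothetical). */
typedef void (*step_fn)(uint64_t *x, size_t n, uint64_t p);

struct method {
    const char *name;
    step_fn fn;
};

/* Stand-in kernel: square every element mod p, one at a time. */
void square_1(uint64_t *x, size_t n, uint64_t p)
{
    for (size_t i = 0; i < n; i++)
        x[i] = (x[i] * x[i]) % p;
}

/* Stand-in kernel: two independent squarings per iteration (n even). */
void square_2(uint64_t *x, size_t n, uint64_t p)
{
    for (size_t i = 0; i < n; i += 2) {
        uint64_t a = (x[i] * x[i]) % p;
        uint64_t b = (x[i + 1] * x[i + 1]) % p;
        x[i] = a;
        x[i + 1] = b;
    }
}

/* Time each method `reps` times, keep its best run, and return the
   index of the method with the lowest best time. */
size_t pick_fastest(const struct method *m, size_t count,
                    uint64_t *buf, size_t n, uint64_t p, int reps)
{
    size_t best = 0;
    double best_time = -1.0;
    for (size_t k = 0; k < count; k++) {
        double t_min = -1.0;
        for (int r = 0; r < reps; r++) {
            clock_t start = clock();
            m[k].fn(buf, n, p);
            double t = (double)(clock() - start);
            if (t_min < 0 || t < t_min)
                t_min = t;
        }
        if (best_time < 0 || t_min < best_time) {
            best_time = t_min;
            best = k;
        }
    }
    return best;
}

/* The chosen kernel is then called through this pointer rather than
   inlined at each call site; that indirection is the runtime cost. */
step_fn selected_step;
```

Note that clock() measures processor time at a coarse resolution, so the buffer has to be large enough for each run to take measurable time; otherwise all candidates tie and the first one wins.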
2007-05-12, 02:43 #300
geoff
Mar 2003
New Zealand

13×89 Posts

Quote:
Originally Posted by geoff
This is possible, the only sure way to get accurate timings on a HT system is to run two copies of the same program, otherwise one will get a bigger share of the cpu. The operating system can't take this into account when it reports CPU time, it always assumes they both get 50%.
Sorry, I was confused. Getting a time faster than when idle is not normal, even on an HT machine.
2007-05-12, 06:55 #301
Cruelty
May 2005

2³·7·29 Posts

C2D E6600 @ 3GHz: it seems that sr2sieve-amd is faster for the giant-step method.
BTW: neither executable detects the L2 cache size; both assume the default value of 256kB.
Code:
sr2sieve-intel -l 32 -L 2048 -vv
sr2sieve 1.5.0 -- A sieve for multiple sequences k*b^n+/-1.
Using SSE2 code path, L1 data cache 32Kb (supplied), L2 cache 2048Kb (supplied).

Read 493896 terms for 9 sequences from ABCD format file `sr2data.txt'.
Split 9 base 2 sequences into 649 base 2^180 subsequences.
Loaded Legendre symbol lookup tables for 9 sequences from `sr2cache.bin'.
Using 16 Kb for the baby-steps giant-steps hashtable, maximum density 0.23.
Best time for baby step method sse2/2: 37494.
Best time for baby step method sse2/4: 25164.
Best time for baby step method sse2/8: 22032.
Best time for baby step method sse2/16: 21798.
Best time for baby step method x86/1: 43992.
Best time for baby step method x86/2: 33750.
Best time for baby step method x86/4: 28314.
Best time for baby step method x86/8: 26793.
Best time for giant step method sse2/2: 27333.
Best time for giant step method sse2/4: 20862.
Best time for giant step method sse2/8: 19116.
Best time for giant step method sse2/16: 19107.
Best time for giant step method x86/1: 27171.
Best time for giant step method x86/2: 25272.
Best time for giant step method x86/4: 24228.
Best time for giant step method x86/8: 24300.
Best time for ladder method sse2/2: 5076.
Best time for ladder method sse2/4: 2880.
Best time for ladder method sse2/8: 2367.
Best time for ladder method sse2/16: 2655.
Best time for ladder method x86/1: 8028.
Best time for ladder method x86/2: 4194.
Best time for ladder method x86/4: 3249.
Best time for ladder method x86/8: 3276.
Best time for ladder method add/1: 10539.
Using baby step method sse2/16, giant step method sse2/16, ladder method sse2/8.

Resuming from checkpoint pmin=5141372131943 in `checkpoint.txt'.
Using 1024Kb for the Sieve of Eratosthenes bitmap.
Expecting to find factors for about 1705.31 terms in this range.
sr2sieve started: 1000000 <= n <= 1999997, 5141372131943 <= p <= 5200000000000
p=5141376065113, 1603412 p/sec, 1420 factors, 88.28% done, 193 sec/factor
Code:
sr2sieve-amd -l 32 -L 2048 -vv
sr2sieve 1.5.0 -- A sieve for multiple sequences k*b^n+/-1.
Using SSE2 code path, L1 data cache 32Kb (supplied), L2 cache 2048Kb (supplied).

Read 493896 terms for 9 sequences from ABCD format file `sr2data.txt'.
Split 9 base 2 sequences into 649 base 2^180 subsequences.
Loaded Legendre symbol lookup tables for 9 sequences from `sr2cache.bin'.
Using 16 Kb for the baby-steps giant-steps hashtable, maximum density 0.23.
Best time for baby step method sse2/2: 38205.
Best time for baby step method sse2/4: 25704.
Best time for baby step method sse2/8: 22968.
Best time for baby step method sse2/16: 22527.
Best time for baby step method x86/1: 45027.
Best time for baby step method x86/2: 33993.
Best time for baby step method x86/4: 29457.
Best time for baby step method x86/8: 27027.
Best time for giant step method sse2/2: 27756.
Best time for giant step method sse2/4: 20709.
Best time for giant step method sse2/8: 18189.
Best time for giant step method sse2/16: 19080.
Best time for giant step method x86/1: 26883.
Best time for giant step method x86/2: 25164.
Best time for giant step method x86/4: 24129.
Best time for giant step method x86/8: 24300.
Best time for ladder method sse2/2: 5085.
Best time for ladder method sse2/4: 2889.
Best time for ladder method sse2/8: 2367.
Best time for ladder method sse2/16: 2655.
Best time for ladder method x86/1: 7884.
Best time for ladder method x86/2: 4167.
Best time for ladder method x86/4: 3267.
Best time for ladder method x86/8: 3303.
Best time for ladder method add/1: 10494.
Using baby step method sse2/16, giant step method sse2/8, ladder method sse2/8.
Resuming from checkpoint pmin=5141013121169 in `checkpoint.txt'.
Using 1024Kb for the Sieve of Eratosthenes bitmap.
sr2sieve started: 1000000 <= n <= 1999997, 5141013121169 <= p <= 5200000000000
p=5141297681683, 1587250 p/sec, 1419 factors, 88.26% done, 195 sec/factor
2007-05-12, 15:15 #302
Cruelty
May 2005

2³×7×29 Posts

Celeron M @ 1.5GHz: there are also some minor differences between the Intel and AMD binaries, with the AMD executable faster for some methods.
Code:
sr2sieve-intel -vv
sr2sieve 1.5.0 -- A sieve for multiple sequences k*b^n+/-1.
Using SSE2 code path, L1 data cache 32Kb (detected), L2 cache 1024Kb (detected).

Read 493896 terms for 9 sequences from ABCD format file `sr2data.txt'.
Split 9 base 2 sequences into 649 base 2^180 subsequences.
Loaded Legendre symbol lookup tables for 9 sequences from `sr2cache.bin'.
Using 16 Kb for the baby-steps giant-steps hashtable, maximum density 0.23.
Best time for baby step method sse2/2: 46722.
Best time for baby step method sse2/4: 45557.
Best time for baby step method sse2/8: 45008.
Best time for baby step method sse2/16: 44633.
Best time for baby step method x86/1: 49756.
Best time for baby step method x86/2: 39744.
Best time for baby step method x86/4: 37841.
Best time for baby step method x86/8: 36608.
Best time for giant step method sse2/2: 39338.
Best time for giant step method sse2/4: 38509.
Best time for giant step method sse2/8: 37383.
Best time for giant step method sse2/16: 38928.
Best time for giant step method x86/1: 32374.
Best time for giant step method x86/2: 35539.
Best time for giant step method x86/4: 29850.
Best time for giant step method x86/8: 29863.
Best time for ladder method sse2/2: 6440.
Best time for ladder method sse2/4: 6115.
Best time for ladder method sse2/8: 6151.
Best time for ladder method sse2/16: 6416.
Best time for ladder method x86/1: 8563.
Best time for ladder method x86/2: 4855.
Best time for ladder method x86/4: 4336.
Best time for ladder method x86/8: 4473.
Best time for ladder method add/1: 11120.
Using baby step method x86/8, giant step method x86/4, ladder method x86/4.
Resuming from checkpoint pmin=5141810132443 in `checkpoint.txt'.
Using 512Kb for the Sieve of Eratosthenes bitmap.
Expecting to find factors for about 1705.31 terms in this range.
sr2sieve started: 1000000 <= n <= 1999997, 5141810132443 <= p <= 5200000000000
p=5141843557339, 558384 p/sec, 1421 factors, 88.37% done, 556 sec/factor
Code:
sr2sieve-amd -vv
sr2sieve 1.5.0 -- A sieve for multiple sequences k*b^n+/-1.
Using SSE2 code path, L1 data cache 32Kb (detected), L2 cache 1024Kb (detected).

Read 493896 terms for 9 sequences from ABCD format file `sr2data.txt'.
Split 9 base 2 sequences into 649 base 2^180 subsequences.
Loaded Legendre symbol lookup tables for 9 sequences from `sr2cache.bin'.
Using 16 Kb for the baby-steps giant-steps hashtable, maximum density 0.23.
Best time for baby step method sse2/2: 47496.
Best time for baby step method sse2/4: 45853.
Best time for baby step method sse2/8: 46046.
Best time for baby step method sse2/16: 46919.
Best time for baby step method x86/1: 52386.
Best time for baby step method x86/2: 40618.
Best time for baby step method x86/4: 38537.
Best time for baby step method x86/8: 37051.
Best time for giant step method sse2/2: 39347.
Best time for giant step method sse2/4: 38099.
Best time for giant step method sse2/8: 36258.
Best time for giant step method sse2/16: 39071.
Best time for giant step method x86/1: 32979.
Best time for giant step method x86/2: 30831.
Best time for giant step method x86/4: 29744.
Best time for giant step method x86/8: 30012.
Best time for ladder method sse2/2: 6481.
Best time for ladder method sse2/4: 6102.
Best time for ladder method sse2/8: 6108.
Best time for ladder method sse2/16: 6469.
Best time for ladder method x86/1: 8564.
Best time for ladder method x86/2: 4859.
Best time for ladder method x86/4: 4323.
Best time for ladder method x86/8: 4471.
Best time for ladder method add/1: 11139.
Using baby step method x86/8, giant step method x86/4, ladder method x86/4.
Resuming from checkpoint pmin=5141861039863 in `checkpoint.txt'.
Using 512Kb for the Sieve of Eratosthenes bitmap.
Expecting to find factors for about 1705.31 terms in this range.
sr2sieve started: 1000000 <= n <= 1999997, 5141861039863 <= p <= 5200000000000
p=5141878341563, 556718 p/sec, 1422 factors, 88.38% done, 558 sec/factor

Last fiddled with by Cruelty on 2007-05-12 at 16:12
2007-05-12, 16:15 #303
Cruelty
May 2005

2³·7·29 Posts

A64 3400+ @ 2.4GHz: the Intel binary is slower on the A64, without exception.
Code:
sr2sieve-amd -vv
sr2sieve 1.5.0 -- A sieve for multiple sequences k*b^n+/-1.
Using SSE2 code path, L1 data cache 64Kb (detected), L2 cache 512Kb (detected).
Read 493896 terms for 9 sequences from ABCD format file `sr2data.txt'.
Split 9 base 2 sequences into 649 base 2^180 subsequences.
Using 32 Kb for the baby-steps giant-steps hashtable, maximum density 0.11.
Best time for baby step method sse2/2: 42030.
Best time for baby step method sse2/4: 33084.
Best time for baby step method sse2/8: 32145.
Best time for baby step method sse2/16: 33230.
Best time for baby step method x86/1: 46833.
Best time for baby step method x86/2: 43472.
Best time for baby step method x86/4: 35132.
Best time for baby step method x86/8: 28691.
Best time for giant step method sse2/2: 34313.
Best time for giant step method sse2/4: 31106.
Best time for giant step method sse2/8: 30402.
Best time for giant step method sse2/16: 31219.
Best time for giant step method x86/1: 37890.
Best time for giant step method x86/2: 29052.
Best time for giant step method x86/4: 26605.
Best time for giant step method x86/8: 26959.
Best time for ladder method sse2/2: 5855.
Best time for ladder method sse2/4: 4119.
Best time for ladder method sse2/8: 4082.
Best time for ladder method sse2/16: 4444.
Best time for ladder method x86/1: 8348.
Best time for ladder method x86/2: 5103.
Best time for ladder method x86/4: 3681.
Best time for ladder method x86/8: 3350.
Best time for ladder method add/1: 11125.
Using baby step method x86/8, giant step method x86/4, ladder method x86/8.
Resuming from checkpoint pmin=5140056163097 in `checkpoint.txt'.
Using 256Kb for the Sieve of Eratosthenes bitmap.
Expecting to find factors for about 1705.31 terms in this range.
sr2sieve started: 1000000 <= n <= 1999997, 5140056163097 <= p <= 5200000000000
p=5141150746079, 1041479 p/sec, 1419 factors, 88.23% done, 298 sec/factor
2007-05-13, 00:23 #304
geoff
Mar 2003
New Zealand

13×89 Posts

Quote:
Originally Posted by Cruelty
C2D E6600 @ 3GHz - it seems that sr2sieve-amd is faster in case of giant-step method.
BTW: both executables do not detect L2 cache size, assuming default value of 256kB.
Thanks, I'll look into the cache detection for this machine. It is an unfortunate feature of the Intel cpuid scheme that when new models come out it is necessary to update the source code to detect the cache size properly.
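
For background: on Intel CPUs, CPUID leaf 2 returns one-byte cache descriptors that have to be looked up in a table, which is likely what makes source updates necessary for new models. Leaf 4 ("deterministic cache parameters"), where supported, reports the cache geometry directly. A minimal C sketch of decoding it, assuming GCC or Clang on x86 for <cpuid.h>; the function names are hypothetical and this is not the detection code in sr5sieve. Leaf 4 is Intel-specific; AMD reports L2 size through extended leaf 0x80000006 instead.

```c
#include <stdint.h>

struct cache_info {
    unsigned level;       /* 1, 2, 3, ... */
    unsigned type;        /* 0 = no more caches, 1 = data, 2 = instruction, 3 = unified */
    uint64_t size_bytes;
};

/* Decode one sub-leaf of CPUID leaf 4 from its EAX, EBX and ECX values. */
struct cache_info decode_leaf4(uint32_t eax, uint32_t ebx, uint32_t ecx)
{
    struct cache_info c;
    c.type = eax & 0x1f;                               /* EAX[4:0] */
    c.level = (eax >> 5) & 0x7;                        /* EAX[7:5] */
    uint64_t ways       = ((ebx >> 22) & 0x3ff) + 1;   /* EBX[31:22] */
    uint64_t partitions = ((ebx >> 12) & 0x3ff) + 1;   /* EBX[21:12] */
    uint64_t line_size  = (ebx & 0xfff) + 1;           /* EBX[11:0] */
    uint64_t sets       = (uint64_t)ecx + 1;           /* ECX */
    c.size_bytes = ways * partitions * line_size * sets;
    return c;
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
/* Return the L2 size in KB via leaf 4, or 0 if it is not reported. */
unsigned l2_size_kb(void)
{
    if (__get_cpuid_max(0, 0) < 4)
        return 0;
    for (unsigned i = 0; i < 16; i++) {
        unsigned eax, ebx, ecx, edx;
        __cpuid_count(4, i, eax, ebx, ecx, edx);
        (void)edx;   /* not needed for the size */
        struct cache_info c = decode_leaf4(eax, ebx, ecx);
        if (c.type == 0)
            break;   /* no more cache levels */
        if (c.level == 2)
            return (unsigned)(c.size_bytes / 1024);
    }
    return 0;
}
#endif
```

For example, an 8-way unified L2 with 64-byte lines, one partition and 4096 sets decodes to 8 × 64 × 4096 bytes = 2048 KB.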

Last fiddled with by geoff on 2007-05-13 at 00:31 Reason: quote
2007-05-13, 00:30 #305
geoff
Mar 2003
New Zealand

13×89 Posts
sr5sieve 1.5.1

This version brings the x86-64 build up to date with the changes in 1.5.0. There is new mulmod code which hasn't been tested yet.
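
Since the new mulmod code hasn't been tested yet, one common way to check a routine like this is to compare it against a slow but obviously correct reference on many pseudo-random inputs. A minimal C sketch, assuming a 64-bit GCC or Clang target (unsigned __int128 is a compiler extension) and p < 2^63; both mulmod functions are hypothetical stand-ins, not the code in sr5sieve:

```c
#include <stdint.h>

/* Candidate under test: 64x64 -> 128-bit product, then reduce. */
uint64_t mulmod_fast(uint64_t a, uint64_t b, uint64_t p)
{
    return (uint64_t)(((unsigned __int128)a * b) % p);
}

/* Reference: double-and-add. With p < 2^63 no intermediate value
   reaches 2p, so nothing overflows. */
uint64_t mulmod_ref(uint64_t a, uint64_t b, uint64_t p)
{
    uint64_t r = 0;
    a %= p;
    b %= p;
    while (b) {
        if (b & 1) {
            r += a;
            if (r >= p) r -= p;
        }
        a += a;
        if (a >= p) a -= p;
        b >>= 1;
    }
    return r;
}

/* Marsaglia xorshift64 for reproducible inputs; *s must be nonzero. */
static uint64_t next_rand(uint64_t *s)
{
    uint64_t x = *s;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *s = x;
}

/* Return the number of mismatches over `trials` random (a, b) pairs. */
unsigned check_mulmod(uint64_t p, unsigned trials, uint64_t seed)
{
    unsigned bad = 0;
    for (unsigned i = 0; i < trials; i++) {
        uint64_t a = next_rand(&seed) % p;
        uint64_t b = next_rand(&seed) % p;
        if (mulmod_fast(a, b, p) != mulmod_ref(a, b, p))
            bad++;
    }
    return bad;
}
```

Running the check for a few moduli near the top of the sieve range (for example p around 5.2e12, as in the logs in this thread) exercises the large-operand cases where a faulty reduction is most likely to show up.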
2007-05-13, 21:51 #306
Cruelty
May 2005

2³·7·29 Posts

What about sr1sieve?

Last fiddled with by Cruelty on 2007-05-13 at 21:53
2007-05-14, 01:55 #307
geoff
Mar 2003
New Zealand

13·89 Posts
sr5sieve 1.5.2

This version fixes a segfault that can occur if the -B switch is used without the -G switch.

It also updates the Intel cache detection code. Cruelty: can you check whether this version works on your C2D?
2007-05-14, 02:02 #308
geoff
Mar 2003
New Zealand

13·89 Posts
sr1sieve 1.1.0

This version uses the benchmarking code to test which mulmod routines to use before sieving. Use -v or -vv to see the details, as with sr5sieve 1.5.0.

Because the number of subsequences is usually much smaller with sr1sieve, the effect of this code is much more variable, especially with very lightweight sequences. It may take some experimenting with the -G switch to get the best results.

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.