mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Sierpinski/Riesel Base 5 (https://www.mersenneforum.org/forumdisplay.php?f=54)
-   -   Sr2sieve on PPC/Linux (https://www.mersenneforum.org/showthread.php?t=6669)

geoff 2007-02-17 02:26

[QUOTE=Greenbank;98702]And the program is not optimised for Riesel at all. I wanted Sierpinski sieving to be as fast as possible. Sorting it out properly for Riesel is on my big list of stuff to do.[/QUOTE]

I think part of the reason that sr2sieve does well with riesel.dat (I hear that it is even faster at riesel.dat than JJsieve on x86) is not so much the large number of k as the narrower range of n. sr2sieve probably puts less effort into reducing the work done in BSGS, so as the range of n widens and BSGS becomes more expensive, sr2sieve becomes slower than proth_sieve.

These are some timings for sr2sieve 1.4.x vs proth_sieve 0.42 done on two of my machines, both running Debian Linux.

I tested at p=100e12 (100T) because the proth_sieve speed starts to drop when p becomes too much larger than this, and I think this may be a problem with the code rather than a true indication of performance. (The speed per p should increase as p increases, as there are fewer primes to test).

Times are in kp/s (thousands of p covered per CPU second), to 3 s.f. where known. The hyperthreaded times were taken by running two instances of the program and adding the kp/s figures for both.


Pentium 3 @ 600MHz (Coppermine EB, 16Kb L1, 256Kb L2), p=100e12
[code]
                          8k SoB.dat  19k SoB.dat  69k riesel.dat
                          ----------  -----------  --------------
proth_sieve_cmov 0.42            151           86              31
sr2sieve-i686 1.4.18             122         75.9            45.6
sr2sieve-i686 1.4.21             138         81.5            47.0
sr2sieve-i686 1.4.23             145         85.4            48.9
[/code]

Pentium 4 @ 2.9GHz (Northwood C, 8Kb L1, 512Kb L2), p=100e12
[code]
Single thread             8k SoB.dat  19k SoB.dat  69k riesel.dat
-------------             ----------  -----------  --------------
proth_sieve_sse2 0.42            342          201              82
sr2sieve-pentium4 1.4.18         279          177             107
sr2sieve-pentium4 1.4.21         318          189             113
sr2sieve-pentium4 1.4.23         328          197             116

Two hyperthreads          8k SoB.dat  19k SoB.dat  69k riesel.dat
----------------          ----------  -----------  --------------
proth_sieve_sse2 0.42            554          330             130
sr2sieve-pentium4 1.4.18         413          262             157
sr2sieve-pentium4 1.4.21         469          279             162
sr2sieve-pentium4 1.4.23         488          288             167
[/code]

Greenbank 2007-02-20 12:00

That looks great Geoff.

I hope my message didn't come over as "my sieve is faster than yours", it certainly wasn't meant that way.

If we work together and share code/results we can make each other's code even faster!

geoff 2007-02-21 03:58

[QUOTE=Greenbank;99000]I hope my message didn't come over as "my sieve is faster than yours", it certainly wasn't meant that way.
[/QUOTE]
Not at all :-) I just found it interesting that the SoB.dat times could be so much faster than the riesel.dat times, when for sr2sieve it is the other way around.

I don't know how much of that is due to the effort to make proth_sieve run fast for SoB.dat without regard to riesel.dat speed, and how much is because of differences between the proth_sieve and sr2sieve algorithms.

I suspect that sr2sieve does a lot less work in trying to eliminate candidates before running BSGS, and that may be a better approach when the range of n is small. The 20 million range of riesel.dat vs the 50 million range of SoB.dat could be the important factor, rather than the number of k in the sieve.

geoff 2007-03-09 00:07

Does anyone know how to detect the size of the L1 and L2 data cache on ppc64? Is 32Kb L1, 512Kb L2 a reasonable default if it can't be detected?

BlisteringSheep 2007-03-09 03:27

Geoff,
That's what it has been for every ppc64 that I've encountered. I don't know an easy way to do it for Linux; there are external tools but they can't be depended upon. For example, on my home PowerMac, /proc/cpuinfo shows the 512K unified L2 cache, but doesn't mention the L1. lshw says that the same machine has 128 terabytes of L1 and 2 petabytes of L2. However, lshw on the IBM blades shows the correct L1 & L2.

rogue 2007-03-09 13:21

[QUOTE=geoff;100292]Does anyone know how to detect the size of the L1 and L2 data cache on ppc64? Is 32Kb L1, 512Kb L2 a reasonable default if it can't be detected?[/QUOTE]

For 64-bit PowerPC CPUs, 512Kb is the minimum L2 cache size. Some have 1Mb. I don't know if there is an easy way to determine the L2 cache size.

Greenbank 2007-03-09 15:19

MacOS X command line:-

sysctl hw.l1icachesize
sysctl hw.l1dcachesize
sysctl hw.l2cachesize

So I'm guessing there'll be something for this in the sysctl() function call... indeed, in /usr/include/sys/sysctl.h:

#define HW_L1ICACHESIZE 17 /* int: L1 I Cache Size in Bytes */
#define HW_L1DCACHESIZE 18 /* int: L1 D Cache Size in Bytes */
#define HW_L2SETTINGS 19 /* int: L2 Cache Settings */
#define HW_L2CACHESIZE 20 /* int: L2 Cache Size in Bytes */
#define HW_L3SETTINGS 21 /* int: L3 Cache Settings */
#define HW_L3CACHESIZE 22 /* int: L3 Cache Size in Bytes */

Don't have any time right now to knock up an example program but the stuff on the sysctl() man page (on MacOS X) should help.

[EDIT] For my Quad G5 (2.5GHz PPC) I've got 64KB L1 instruction cache, 32KB L1 data cache and 1MB L2 cache (per CPU).

Greenbank 2007-03-09 15:37

Must be compiled with -m64

Only tested on MacOS X on 64-bit PPC, not Linux (not sure if the sysctl interface is the same).
[code]
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/sysctl.h>

int main(void)
{
    int64_t i = 0;
    size_t len;

    /* len is an in/out parameter, so reset it before every call. */
    len = sizeof(i);
    if (sysctlbyname("hw.l1icachesize", &i, &len, NULL, 0) == -1) {
        perror("sysctlbyname");
    } else {
        /* PRId64 matches int64_t; plain %d is wrong for a 64-bit value. */
        printf("l1icachesize=%" PRId64 "\n", i);
    }

    len = sizeof(i);
    if (sysctlbyname("hw.l1dcachesize", &i, &len, NULL, 0) == -1) {
        perror("sysctlbyname");
    } else {
        printf("l1dcachesize=%" PRId64 "\n", i);
    }

    len = sizeof(i);
    if (sysctlbyname("hw.l2cachesize", &i, &len, NULL, 0) == -1) {
        perror("sysctlbyname");
    } else {
        printf("l2cachesize=%" PRId64 "\n", i);
    }
    return 0;
}
[/code]

l1icachesize=65536
l1dcachesize=32768
l2cachesize=1048576

which matches the output of the sysctl command line tool.

geoff 2007-03-09 22:05

[QUOTE=Greenbank;100347]Only tested on MacOS X on 64-bit PPC, not Linux (not sure if the sysctl interface is the same).
[/QUOTE]
Thanks, I'll use this in the next version, with a 32Kb/512Kb default if sysctl fails or the detected value doesn't make sense.

BlisteringSheep 2007-03-10 04:47

[QUOTE=Greenbank;100347]Only tested on MacOS X on 64-bit PPC, not Linux (not sure if the sysctl interface is the same).[/QUOTE]

Unfortunately, the interfaces are very different, and the Linux version provides completely different information.

[QUOTE=geoff;100395]Thanks, I'll use this in the next version, with a 32Kb/512Kb default if sysctl fails or the detected value doesn't make sense.[/QUOTE]

As an aside, I did figure out what's wrong with lshw: it reads the sizes into an unsigned long, but the stored values are only as long as an int.

Here's an ugly, probably non-portable hack:
[code]
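#include <stdio.h>

int main(void)
{
    FILE *fp;
    unsigned data = 0;

    /* Each property is a raw 32-bit big-endian cell, which matches the
       native byte order on big-endian ppc64, so it is read as-is. */
    fp = fopen("/proc/device-tree/cpus/PowerPC,970@0/d-cache-size","r");
    if (fp == NULL) { perror("d-cache-size"); return 1; }
    fread(&data, sizeof(data), 1, fp);
    fclose(fp);
    printf("d-cache-size: %u\n", data);

    fp = fopen("/proc/device-tree/cpus/PowerPC,970@0/i-cache-size","r");
    if (fp == NULL) { perror("i-cache-size"); return 1; }
    fread(&data, sizeof(data), 1, fp);
    fclose(fp);
    printf("i-cache-size: %u\n", data);

    fp = fopen("/proc/device-tree/cpus/PowerPC,970@0/l2-cache/d-cache-size","r");
    if (fp == NULL) { perror("l2-cache/d-cache-size"); return 1; }
    fread(&data, sizeof(data), 1, fp);
    fclose(fp);
    printf("l2-cache/d-cache-size: %u\n", data);

    return 0;
}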
[/code]

geoff 2007-03-13 03:29

[QUOTE=BlisteringSheep;100420]
fp = fopen("/proc/device-tree/cpus/PowerPC,970@0/d-cache-size","r");
fread(&data, sizeof(data), 1, fp);
[/QUOTE]

Are the sizes in bytes or kilobytes?

Do you know which compiler symbols I should test to decide whether this code should be included? I assume __linux__ and __powerpc64__ and one other for the CPU type.


All times are UTC.