#133
Oct 2006
On a Suzuki Boulevard C90
2·3·41 Posts
Quote:
#134
Mar 2003
New Zealand
485₁₆ Posts
Quote:
Quote:
Code:
vec4_mulmod64_sse2(uint64_t *X, uint64_t *Y, int count);
Last fiddled with by geoff on 2007-06-22 at 03:47
#135
"Mark"
Apr 2003
Between here and the
11×577 Posts
Quote:
Quote:
#136
Mar 2003
New Zealand
1157₁₀ Posts
Attached is my attempt at doing two mulmods in parallel. This code can be tested by replacing the VEC2_* definitions in asm-ppc64.h with those in the attached file.
I didn't know how many bits a condition register like cr6 has, so the declarations of c0 and c1 may need to be changed (or it might not matter). I wasn't sure what the adde and addze instructions do, so I have allowed for the possibility that addze might depend on adde. Do any instructions on the ppc64 have side effects, i.e. effects other than on their operands? Anyway, if it works then I can probably make a version that does 4 in parallel using the same technique.
Last fiddled with by geoff on 2007-06-23 at 02:42 Reason: Windows mangled the text file
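For reference on the adde/addze question, here is a portable C model of the carry behaviour (an illustration only, not sr2sieve code; the function names simply mirror the mnemonics). Per the PowerPC architecture, the carrying adds read and/or write the carry bit CA in the XER register, which is not one of their operands: addc computes rA + rB and sets CA, adde computes rA + rB + CA and sets CA, and addze computes rA + CA and sets CA. So addze does depend on the CA left behind by a preceding addc or adde, and "record" forms such as `add.` also set condition register field cr0. (The condition register itself is 32 bits, split into eight 4-bit fields cr0 to cr7.)

```c
#include <assert.h>
#include <stdint.h>

/* Models XER[CA], the carry bit that PowerPC carrying instructions
   read and write implicitly (it is not one of their operands). */
typedef struct { unsigned ca; } xer_t;

/* addc rD,rA,rB: rD = rA + rB, CA = carry out */
static uint64_t addc(xer_t *x, uint64_t a, uint64_t b)
{
  uint64_t d = a + b;
  x->ca = d < a;               /* wrapped around => carry out */
  return d;
}

/* adde rD,rA,rB: rD = rA + rB + CA, CA = carry out */
static uint64_t adde(xer_t *x, uint64_t a, uint64_t b)
{
  assert(x->ca <= 1);
  uint64_t s = a + b;
  unsigned c1 = s < a;         /* carry from a + b */
  uint64_t d = s + x->ca;
  unsigned c2 = d < s;         /* carry from adding CA */
  x->ca = c1 | c2;             /* at most one of the two can be set */
  return d;
}

/* addze rD,rA: rD = rA + CA, CA = carry out */
static uint64_t addze(xer_t *x, uint64_t a)
{
  uint64_t d = a + x->ca;
  x->ca = d < a;
  return d;
}
```

In this model a 128-bit addition (hi1,lo1) + (hi2,lo2) is `lo = addc(&x, lo1, lo2)` followed by `hi = adde(&x, hi1, hi2)`, while adding a single carry into the next word is addc followed by addze.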
#137
"Mark"
Apr 2003
Between here and the
1100011001011₂ Posts
Quote:
It should be possible to calculate b * pMagic up front and then multiply that product by a, but you will have to hold all 128 bits of that product. That will eliminate the 4-cycle stall after "mulhdu %8, %18, %12". It will also allow the function to reuse two registers. I have an idea that might allow you to avoid most of that assembler code. I hope to play around with it later.
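A portable C sketch of that reordering, useful as a reference to check an assembler version against. It uses the GCC/Clang extension `unsigned __int128` in place of mulhdu and the GCC/Clang builtin `__builtin_clzll`. The way pMagic and magicShift are chosen here (k = floor(log2 p), pMagic = floor(2^(64+k) / p), magicShift = k) is an assumption made for this sketch, not sr2sieve's definition; with it, for odd p below 2^63 and a, b < p, the quotient estimate is at most one too small, so a single correction step suffices.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension, stands in for mulhdu */

/* Hypothetical magic constants: k = floor(log2 p), magicShift = k,
   pMagic = floor(2^(64+k) / p). Requires p odd, 3 <= p < 2^63, so that
   pMagic fits in 64 bits and 0 < magicShift < 64. */
static void make_magic(uint64_t p, uint64_t *pMagic, int *magicShift)
{
  assert(p >= 3 && (p & 1) && p < (UINT64_C(1) << 63));
  int k = 63 - __builtin_clzll(p);
  *magicShift = k;
  *pMagic = (uint64_t)(((u128)1 << (64 + k)) / p);
}

/* a * b mod p for a, b < p, with b * pMagic computed first. */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t p,
                       uint64_t pMagic, int magicShift)
{
  /* 128-bit product b * pMagic */
  u128 bm = (u128)b * pMagic;
  uint64_t bLO = (uint64_t)bm, bHI = (uint64_t)(bm >> 64);

  /* Multiply (bHI,bLO) by a */
  uint64_t c64a = (uint64_t)(((u128)bLO * a) >> 64);  /* mulhdu(bLO, a) */
  u128 hiProd = (u128)bHI * a;
  uint64_t c64b = (uint64_t)hiProd;                   /* bHI * a, low half */
  uint64_t c128 = (uint64_t)(hiProd >> 64);           /* mulhdu(bHI, a) */

  /* Words 1 and 2 of the 192-bit product: add with carry (addc + addze) */
  uint64_t c64 = c64a + c64b;
  c128 += (c64 < c64a);

  /* Quotient estimate: bits 64+magicShift .. 127+magicShift of the product */
  uint64_t quot = (c64 >> magicShift) | (c128 << (64 - magicShift));

  /* The true value a*b - quot*p lies in [0, 2p) and 2p < 2^64, so the
     wrapping 64-bit arithmetic is exact; correct once if quot was one short */
  uint64_t rem = a * b - quot * p;
  if (rem >= p)
    rem -= p;
  return rem;
}
```

In this sketch the product b * pMagic does not depend on a, so a caller multiplying many values by the same b could compute it once and reuse it.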
#138
"Mark"
Apr 2003
Between here and the
11×577 Posts
Here is some code you can incorporate. I have not tried to compile this or test this. If you put it into 1.5.8, send me the code and the means to test and I will test it for you. Typically I would use ppc_intrinsics.h, but it doesn't have these instructions, which is why they are here.
These inlines remove the need for the mulmod-ppc64.S file:
Code:
#include <stdint.h>

/* mulhdu: high 64 bits of the unsigned 128-bit product a * c */
static inline uint64_t __mulhdu (uint64_t a, uint64_t c) __attribute__((always_inline));
static inline uint64_t __mulhdu (uint64_t a, uint64_t c)
{
  uint64_t result;
  __asm__ ("mulhdu %0, %1, %2"
           /* outputs: */ : "=r" (result)
           /* inputs: */ : "r" (a), "r" (c));
  return result;
}

/* (*hi, *lo) += lo2: addc sets the carry bit XER[CA], addze adds it into *hi.
   CA is not an asm operand: a lone adde would read whatever CA happened to be,
   and the compiler could schedule a CA-changing instruction between two
   separate asm statements, so both instructions go in one statement that
   clobbers xer. */
static inline void __addc_addze (uint64_t *lo, uint64_t *hi, uint64_t lo2) __attribute__((always_inline));
static inline void __addc_addze (uint64_t *lo, uint64_t *hi, uint64_t lo2)
{
  __asm__ ("addc %0, %0, %2\n\taddze %1, %1"
           /* outputs: */ : "+r" (*lo), "+r" (*hi)
           /* inputs: */ : "r" (lo2)
           /* clobbers: */ : "xer");
}
I re-arranged the code (calculating b*pMagic first). You should be able to unroll from the calculation of c64a to the end.
Code:
// Assumes 0 < magicShift < 64 so that both shifts below are defined.
static inline uint64_t mulmod(uint64_t a, uint64_t b, uint64_t p, uint64_t pMagic, int magicShift)
{
  uint64_t bLO, bHI, c64, c64a, c64b, c128, rem, quot;

  // Get the shifts
  int rightShift = magicShift;
  int leftShift = 64 - magicShift;

  // Calculate the 128-bit product b * pMagic
  bLO = b * pMagic;
  bHI = __mulhdu(b, pMagic);

  // Multiply (bHI,bLO) by a
  c64a = __mulhdu(bLO, a);
  c64b = bHI * a;
  c128 = __mulhdu(bHI, a);

  // Get the upper 128 bits of the 192-bit product: c64 = c64a + c64b, carry into c128
  c64 = c64a;
  __addc_addze(&c64, &c128, c64b);

  // Get the quotient (a * b) / p
  quot = (c64 >> rightShift) | (c128 << leftShift);

  // Calculate the remainder (a * b) - (quot * p). Depending on how pMagic is
  // rounded, quot may be slightly off, so rem may need a final correction by p.
  rem = (a * b) - (quot * p);
  return rem;
}
#139
"Mark"
Apr 2003
Between here and the
11·577 Posts
malloc.h does not exist on OS X, so you should wrap the include of malloc.h in bsgs.c and util.c in an #ifdef.
HAVE_MEMALIGN should be set to 0 for OS X.
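A minimal sketch of that conditional include, assuming `__APPLE__` is used as the OS X test. The helper `alloc_aligned()` is hypothetical and stands in for however bsgs.c and util.c actually call memalign(); the posix_memalign() fallback is also an assumption (it exists on OS X 10.6 and later, so older systems would need a different fallback).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#ifndef HAVE_MEMALIGN
# ifdef __APPLE__
#  define HAVE_MEMALIGN 0   /* OS X has neither <malloc.h> nor memalign() */
# else
#  define HAVE_MEMALIGN 1
# endif
#endif

#if HAVE_MEMALIGN
# include <malloc.h>        /* declares memalign() on glibc systems */
#endif

/* Hypothetical helper: allocate size bytes aligned to align, which must be
   a power of two (and, for posix_memalign, a multiple of sizeof(void *)).
   Returns NULL on failure; release the block with free(). */
static void *alloc_aligned(size_t align, size_t size)
{
  void *ptr;

  assert(align != 0 && (align & (align - 1)) == 0);
#if HAVE_MEMALIGN
  ptr = memalign(align, size);
#else
  if (posix_memalign(&ptr, align, size) != 0)
    ptr = NULL;
#endif
  assert(ptr == NULL || (uintptr_t)ptr % align == 0);
  return ptr;
}
```

Guarding the definition with `#ifndef HAVE_MEMALIGN` lets a Makefile still override the default with `-DHAVE_MEMALIGN=...`.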
#140
Oct 2006
On a Suzuki Boulevard C90
246₁₀ Posts
geoff, I just found a minor bug in function read_argc_argv() in file files.c. When it builds up strv, it needs to populate strv[0] with argv[0] and then start reading the contents of the command line file into strv beginning at strv[1]. Here's a little patch that works (or at least works for me :-) ), though I'm not claiming it's the best fix:
Code:
diff -u sr2sieve-1.5.8.orig/files.c sr2sieve-1.5.8.mod/files.c
--- sr2sieve-1.5.8.orig/files.c 2007-03-19 06:09:45.000000000 -0400
+++ sr2sieve-1.5.8.mod/files.c 2007-06-24 02:34:09.000000000 -0400
@@ -352,7 +352,10 @@
len = 16;
strv = xmalloc(len*sizeof(char *));
- if ((strv[0] = strtok(line," \n")) == NULL)
+ strv[0] = xmalloc(strlen(argv[0][0])+1);
+ strcpy(strv[0], argv[0][0]);
+
+ if ((strv[1] = strtok(line," \n")) == NULL)
{
free(strv);
free(line);
@@ -360,7 +363,7 @@
return;
}
- for (i = 1; ((strv[i] = strtok(NULL," \n")) != NULL); i++)
+ for (i = 2; ((strv[i] = strtok(NULL," \n")) != NULL); i++)
if (len <= i+1)
{
len += 16;
#141
Oct 2006
On a Suzuki Boulevard C90
2×3×41 Posts
Just a clarification: what is happening with the current code is that the first option from sr2sieve-command-line.txt is put into argv[0], so it is never examined back in main().
#142
Mar 2003
New Zealand
13×89 Posts
Quote:
In version 1.5.9 I have tried to use the idea from this code to modify the existing code. Set EXPERIMENTAL to 1 in asm-ppc64.h to enable it. The definition for CONDITION_REGISTER_T might need to be changed. I have also fixed the malloc.h issues for OS X. How many condition registers does the PPC64 have? A brief test can be done by downloading the archive http://www.geocities.com/g_w_reynold...r5check.tar.gz and running the command line `sr5sieve -i sr5check.txt -p 100e6 -P 150e6', which should result in 10533 factors being found.
#143
Mar 2003
New Zealand
13·89 Posts
Actually that is not a bug :-). sr5sieve-command-line.txt should contain the full command line, including the command itself, not just the switches: `sr5sieve -Z -v ...' not just `-Z -v ...'
Last fiddled with by geoff on 2007-06-26 at 02:27 Reason: fixed quote
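To see why the program name has to be in the file, here is a minimal sketch. It assumes the line from the command-line file is split on spaces and newlines with strtok() (as read_argc_argv() does in the patch above) and that option processing skips index 0, the way main() treats argv. The helper `count_options()` is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: split line in place into strv (as read_argc_argv()
   does with strtok) and return how many tokens an argv-style parser would
   examine, i.e. all of them except strv[0], which is taken as the program name. */
static int count_options(char *line, char *strv[], int max)
{
  int argc = 0;
  char *tok;

  assert(max > 0);
  tok = strtok(line, " \n");
  while (tok != NULL && argc < max) {
    strv[argc++] = tok;
    tok = strtok(NULL, " \n");
  }
  return argc > 1 ? argc - 1 : 0;   /* index 0 is never examined as an option */
}
```

With the full command line `sr5sieve -Z -v`, both switches are examined; with only `-Z -v`, the `-Z` lands in strv[0] and is skipped, which is the behaviour described in #141.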