![]() |
|
|
#56 | |
|
"Mark"
Apr 2003
Between here and the
11100101011012 Posts |
Quote:
Note that the code does not have any sr1sieve or sr2sieve enhancements. I have a ways to go to pull in the sr2sieve bsgs logic. |
|
|
|
|
|
|
#57 |
|
Apr 2013
Durham, UK
10001002 Posts |
I think EXP_WORK relates to the additional mulmods involved in pre-calculating b^d (mod p) for all the appropriate remainder values 'd' where 0 <= d < Q.
E.g. We are rewriting the terms of a sequence k * b^n -1 as k * b^d * b^(Qm) -1 where n=Qm+d. The idea behind Q is to identify which values 'd' are unnecessary, thus ending up with less than Q subsequences over a range 1/Q times the size of the original range, hopefully giving a performance improvement. There could be multiple terms with the same remainder 'd' so we can precalculate b^d (mod p) for those instances. My understanding from looking at the source code of sr1sieve (bsgs.c) is that it is this mulmod that we are accounting for in EXP_WORK. |
|
|
|
|
|
#58 | |
|
Jun 2003
22·11·37 Posts |
Quote:
For multiple k (sr2sieve) the program should intelligently decide which b^d should be computed. In Srsieve2 the following function would need to be corrected to make it faster for single k. Code:
void GenericWorker::BuildTables(uint64_t *baseToUse, uint64_t *primeList, double *invp, uint64_t *bm64){
...}
|
|
|
|
|
|
|
#59 |
|
Jun 2003
110010111002 Posts |
I tried to replace mulmod by powmod to avoid unnecessary multiplication. I get an error during run time. Any idea why the calculations are not being done correctly?
Code:
void GenericWorker::BuildTables(uint64_t *baseToUse, uint64_t *primeList, double *invp, uint64_t *bm64)
{
uint64_t inv_b[4];
uint32_t qIdx, seqIdx, ssIdx;
uint64_t umod[4], inv[4];
fpu_push_1divp(primeList[3]);
fpu_push_1divp(primeList[2]);
fpu_push_1divp(primeList[1]);
fpu_push_1divp(primeList[0]);
// Precompute 1/b^d (mod p) for 0 <= d <= Q.
bd64[0][0] = bd64[0][1] = bd64[0][2] = bd64[0][3] = 1;
bm64[0] = inv_b[0] = invmod64(baseToUse[0], primeList[0]);
bm64[1] = inv_b[1] = invmod64(baseToUse[1], primeList[1]);
bm64[2] = inv_b[2] = invmod64(baseToUse[2], primeList[2]);
bm64[3] = inv_b[3] = invmod64(baseToUse[3], primeList[3]);
/* for (qIdx=1; qIdx<ii_BestQ; qIdx++)
{
bd64[qIdx][0] = bm64[0];
bd64[qIdx][1] = bm64[1];
bd64[qIdx][2] = bm64[2];
bd64[qIdx][3] = bm64[3];
fpu_mulmod_4a_4b_4p(bm64, inv_b, primeList);
}
*/
for (seqIdx=0; seqIdx<ii_SequenceCount; seqIdx++)
{
ck64[seqIdx][0] = smod64(-ip_Sequences[seqIdx].c, primeList[0]);
ck64[seqIdx][1] = smod64(-ip_Sequences[seqIdx].c, primeList[1]);
ck64[seqIdx][2] = smod64(-ip_Sequences[seqIdx].c, primeList[2]);
ck64[seqIdx][3] = smod64(-ip_Sequences[seqIdx].c, primeList[3]);
umod[0] = umod64(ip_Sequences[seqIdx].k, primeList[0]);
umod[1] = umod64(ip_Sequences[seqIdx].k, primeList[1]);
umod[2] = umod64(ip_Sequences[seqIdx].k, primeList[2]);
umod[3] = umod64(ip_Sequences[seqIdx].k, primeList[3]);
inv[0] = invmod64(umod[0], primeList[0]);
inv[1] = invmod64(umod[1], primeList[1]);
inv[2] = invmod64(umod[2], primeList[2]);
inv[3] = invmod64(umod[3], primeList[3]);
fpu_mulmod_4a_4b_4p(ck64[seqIdx], inv, primeList);
}
// Compute -c/(k*b^d) (mod p) for each subsequence.
for (ssIdx=0; ssIdx<ii_SubsequenceCount; ssIdx++)
{
bdck64[ssIdx][0] = ck64[ip_Subsequences[ssIdx].seqIdx][0];
bdck64[ssIdx][1] = ck64[ip_Subsequences[ssIdx].seqIdx][1];
bdck64[ssIdx][2] = ck64[ip_Subsequences[ssIdx].seqIdx][2];
bdck64[ssIdx][3] = ck64[ip_Subsequences[ssIdx].seqIdx][3];
fpu_powmod_4b_1n_4p(bm64, ip_Subsequences[ssIdx].d, primeList);
//fpu_mulmod_4a_4b_4p(bdck64[ssIdx], bd64[ip_Subsequences[ssIdx].d], primeList);
fpu_mulmod_4a_4b_4p(bdck64[ssIdx], bm64, primeList);
bm64[0] = inv_b[0];
bm64[1] = inv_b[1];
bm64[2] = inv_b[2];
bm64[3] = inv_b[3];
}
fpu_powmod_4b_1n_4p(bm64, ii_BestQ, primeList);
fpu_pop();
fpu_pop();
fpu_pop();
fpu_pop();
}
|
|
|
|
|
|
#60 |
|
"Mark"
Apr 2003
Between here and the
3×2,447 Posts |
fpu_powmod_4b_1n_4p() assumes that there is nothing on the FPU stack, but fpu_mulmod_4a_4b_4p() requires the first four entries on the stack to be preset (per fpu_push_1divp). You would have to create a version of fpu_powmod_4b_1n_4p() that supports precomputed values on the stack.
|
|
|
|
|
|
#61 | |
|
"Alexander"
Nov 2008
The Alamo City
991 Posts |
Quote:
). Is there a reason why fpu_mulmod_4a_4b_4p() preloads the 1/p values and fpu_powmod_4b_1n_4p() doesn't? Should I keep the current semantics with the ARM version?I discovered when I implemented fpu_powmod() that the x87 version does in fact use right-to-left exponentiation, as I noted in a previous post, so you were right on that concern earlier. Last fiddled with by Happy5214 on 2020-09-26 at 12:16 |
|
|
|
|
|
|
#62 | |
|
"Mark"
Apr 2003
Between here and the
3×2,447 Posts |
Quote:
|
|
|
|
|
|
|
#63 |
|
"Mark"
Apr 2003
Between here and the
3·2,447 Posts |
I have posted mtsieve 2.1.0 over at sourceforge. There are many changes with this release, but users of gcwsievecl, mfsieve, and mfsievecl will get the most benefit. The changes are:
Code:
2.1.0 - December 27, 2020
framework:
On Windows, switched to using gcc from msys2 instead of gcc from mingw64. -O3
gives a nice performance bump to some of the non-ASM code.
Exit after showing help when using -h swtich instead of giving fatal error.
Add logic to support validation of factors passed with -I, but not all sieves are
coded to do this validation.
On GPU builds, default -W to 0 and -G to 1. A "special" CPU worker will be created
as necessary to test ranges of p that are to small for the GPU code.
For GPU executables, add -H to show some GPU details when each kernel is created.
Improve factor rate calculation when using GPU workers.
Created FUTURE.txt for items I would like to get working.
cksieve: version 1.3
Added logic to validate factors passed with -I.
gcwsieve, gcwsievecl: version 1.4
Added logic to validate factors passed with -I.
Add -f option to gcwsieve so that it can generate ABC output that is supported by LLR.
Added -M option and fixed -S for gcwsievecl.
Improved speed of GPU code of gcwsievecl by about 25%.
The GPU code is more than 20x faster than the CPU for the same amount of work.
mfsieve, mfsievecl: version 1.7
Added logic to validate factors passed with -I.
Replaced all ASM routines with non-ASM routines based upon Montgomery mulitplication. This
provides a speed bump of at least 30% over the previous release.
Fixed the GPU code and replaced magic number logic with Montgomery multiplcation as it is
faster. The new GPU code is more then 25% faster than the old GPU code.
The GPU code is more than 20x faster than the CPU for the same amount of work.
sgsieve: version 1.2
Added logic to validate factors passed with -I.
twinsieve: version 1.3
Added logic to validate factors passed with -I.
xyyxsieve, xyyxsievecl: version 1.8
Added logic to validate factors passed with -I.
Code:
2.0.7 - December 22, 2020
framework:
Replace AMD SDK on Windows with open source SDK for OpenCL.
Add more output for -H for GPU executables.
afsieve, afsievecl:
Replace logic with Montgomery multiplcation.
Add factor validation when using -I.
dmdsieve:
Add factor validation when using -I.
fbncsieve:
Add factor validation when using -I.
fkbnsieve:
Add factor validation when using -I.
gfndsieve:
Add factor validation when using -I.
kbbsieve:
Add factor validation when using -I.
psieve:
Add factor validation when using -I.
pixsieve:
srsieve2:
Implement sr1sieve and sr2sieve logic.
Add factor validation when using -I.
|
|
|
|
|
|
#64 |
|
Dec 2011
After 1.58M nines:)
1,699 Posts |
If you can rise upper limit of srsieve2.
|
|
|
|
|
|
#65 |
|
"Mark"
Apr 2003
Between here and the
3×2,447 Posts |
|
|
|
|
|
|
#66 |
|
Dec 2011
After 1.58M nines:)
1,699 Posts |
|
|
|
|
![]() |
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| mtsieve | rogue | Software | 1343 | 2023-07-06 16:41 |
| srsieve/sr2sieve enhancements | rogue | Software | 304 | 2021-11-06 13:51 |
| LLRnet enhancements | kar_bon | No Prime Left Behind | 10 | 2008-03-28 11:21 |
| TODO list and suggestions/comments/enhancements | Greenbank | Octoproth Search | 2 | 2006-12-03 17:28 |
| Suggestions for future enhancements | Reboot It | Software | 16 | 2003-10-17 01:31 |