mersenneforum.org mtsieve enhancements
 User Name Remember Me? Password
 Register FAQ Search Today's Posts Mark Forums Read

2020-09-21, 19:23   #56
rogue

"Mark"
Apr 2003
Between here and the

134608 Posts

Quote:
 Originally Posted by Citrix I am looking at BestQ code Code: uint32_t GenericSubsequenceHelper::RateQ(uint32_t Q, uint32_t s) { uint32_t baby, giant, work; ChooseSteps(&baby, &giant, Q, s); work = baby*BABY_WORK + s*(giant-1)*GIANT_WORK + Q*EXP_WORK + s*SUBSEQ_WORK; return work; } I understand the BABY_WORK, GIANT_WORK and SUBSEQ_WORK steps. Can you help me understand what is the Q*EXP_WORK -- I do not see anything corresponding to this in the discrete log function? For a large Q if only one subsequence is left - it should be significantly faster than using a lower Q but the "Q*EXP_WORK" prevents use of large Q.
This was borrowed from srsieve. The original code was written by Geoff Reynolds.

Note that the code does not have any sr1sieve or sr2sieve enhancements. I have a ways to go to pull in the sr2sieve bsgs logic.

 2020-09-22, 14:27 #57 rob147147   Apr 2013 Durham, UK 778 Posts I think EXP_WORK relates to the additional mulmods involved in pre-calculating b^d (mod p) for all the appropriate remainder values 'd' where 0 <= d < Q. E.g. We are rewriting the terms of a sequence k * b^n -1 as k * b^d * b^(Qm) -1 where n=Qm+d. The idea behind Q is to identify which values 'd' are unnecessary, thus ending up with less than Q subsequences over a range 1/Q times the size of the original range, hopefully giving a performance improvement. There could be multiple terms with the same remainder 'd' so we can precalculate b^d (mod p) for those instances. My understanding from looking at the source code of sr1sieve (bsgs.c) is that it is this mulmod that we are accounting for in EXP_WORK.
2020-09-24, 03:45   #58
Citrix

Jun 2003

157410 Posts

Quote:
 Originally Posted by rob147147 I think EXP_WORK relates to the additional mulmods involved in pre-calculating b^d (mod p) for all the appropriate remainder values 'd' where 0 <= d < Q. E.g. We are rewriting the terms of a sequence k * b^n -1 as k * b^d * b^(Qm) -1 where n=Qm+d. The idea behind Q is to identify which values 'd' are unnecessary, thus ending up with less than Q subsequences over a range 1/Q times the size of the original range, hopefully giving a performance improvement. There could be multiple terms with the same remainder 'd' so we can precalculate b^d (mod p) for those instances. My understanding from looking at the source code of sr1sieve (bsgs.c) is that it is this mulmod that we are accounting for in EXP_WORK.
Robert, I think you are correct. I came to the same conclusion when reviewing the code. Though for sr1sieve there would not be any need to calculate b^d (mod p) for particular d values as there is no candidates left for certain d. That is the whole point of choosing a Q value.

For multiple k (sr2sieve) the program should intelligently decide which b^d should be computed.

In Srsieve2 the following function would need to be corrected to make it faster for single k.

Code:
void   GenericWorker::BuildTables(uint64_t *baseToUse, uint64_t *primeList, double *invp, uint64_t *bm64){
...}

 2020-09-24, 05:31 #59 Citrix     Jun 2003 30468 Posts I tried to replace mulmod by powmod to avoid unnecessary multiplication. I get an error during run time. Any idea why the calculations are not being done correctly? Code: `void GenericWorker::BuildTables(uint64_t *baseToUse, uint64_t *primeList, double *invp, uint64_t *bm64) { uint64_t inv_b[4]; uint32_t qIdx, seqIdx, ssIdx; uint64_t umod[4], inv[4]; fpu_push_1divp(primeList[3]); fpu_push_1divp(primeList[2]); fpu_push_1divp(primeList[1]); fpu_push_1divp(primeList[0]); // Precompute 1/b^d (mod p) for 0 <= d <= Q. bd64[0][0] = bd64[0][1] = bd64[0][2] = bd64[0][3] = 1; bm64[0] = inv_b[0] = invmod64(baseToUse[0], primeList[0]); bm64[1] = inv_b[1] = invmod64(baseToUse[1], primeList[1]); bm64[2] = inv_b[2] = invmod64(baseToUse[2], primeList[2]); bm64[3] = inv_b[3] = invmod64(baseToUse[3], primeList[3]); /* for (qIdx=1; qIdx
 2020-09-24, 12:44 #60 rogue     "Mark" Apr 2003 Between here and the 24×7×53 Posts fpu_powmod_4b_1n_4p() assumes that there is nothing on the FPU stack, but fpu_mulmod_4a_4b_4p() requires the first four entries on the stack to be preset (per fpu_push_1divp). You would have to create a version of fpu_powmod_4b_1n_4p() that supports precomputed values on the stack.
2020-09-26, 12:15   #61
Happy5214

"Alexander"
Nov 2008
The Alamo City

38610 Posts

Quote:
 Originally Posted by rogue fpu_powmod_4b_1n_4p() assumes that there is nothing on the FPU stack, but fpu_mulmod_4a_4b_4p() requires the first four entries on the stack to be preset (per fpu_push_1divp). You would have to create a version of fpu_powmod_4b_1n_4p() that supports precomputed values on the stack.
On that subject, I'm about to implement the ARM version of fpu_powmod_4b_1n_4p() (it'll be the only one I'll have to spill registers for ). Is there a reason why fpu_mulmod_4a_4b_4p() preloads the 1/p values and fpu_powmod_4b_1n_4p() doesn't? Should I keep the current semantics with the ARM version?

I discovered when I implemented fpu_powmod() that the x87 version does in fact use right-to-left exponentiation, as I noted in a previous post, so you were right on that concern earlier.

Last fiddled with by Happy5214 on 2020-09-26 at 12:16

2020-09-26, 13:06   #62
rogue

"Mark"
Apr 2003
Between here and the

10111001100002 Posts

Quote:
 Originally Posted by Happy5214 On that subject, I'm about to implement the ARM version of fpu_powmod_4b_1n_4p() (it'll be the only one I'll have to spill registers for ). Is there a reason why fpu_mulmod_4a_4b_4p() preloads the 1/p values and fpu_powmod_4b_1n_4p() doesn't? Should I keep the current semantics with the ARM version? I discovered when I implemented fpu_powmod() that the x87 version does in fact use right-to-left exponentiation, as I noted in a previous post, so you were right on that concern earlier.
In most cases fpu_powmod_4b_1n_4p() is called once to set up a loop and fpu_mulmod_4a_4b_4p() is called within a loop. There is a time speed bump to setting up the stack prior to the call to fpu_powmod_4b_1n_4p(), but it wouldn't benefit all sieves and most that would benefit would probably see a negligible benefit. The long term goal would then be to preset the stack (or registers for ARM) for all fpu functions.

 Thread Tools

 Similar Threads Thread Thread Starter Forum Replies Last Post rogue Software 281 2020-09-29 16:36 rogue Software 447 2020-09-19 05:50 kar_bon No Prime Left Behind 10 2008-03-28 11:21 Greenbank Octoproth Search 2 2006-12-03 17:28 Reboot It Software 16 2003-10-17 01:31

All times are UTC. The time now is 15:04.

Tue Oct 20 15:04:14 UTC 2020 up 40 days, 12:15, 1 user, load averages: 3.07, 2.97, 2.83

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.