mersenneforum.org > Great Internet Mersenne Prime Search > Software
Old 2020-09-21, 19:23   #56
rogue
Quote:
Originally Posted by Citrix View Post
I am looking at the BestQ code

Code:
uint32_t    GenericSubsequenceHelper::RateQ(uint32_t Q, uint32_t s)
{
   uint32_t baby, giant, work;

   ChooseSteps(&baby, &giant, Q, s);

   work = baby*BABY_WORK + s*(giant-1)*GIANT_WORK + Q*EXP_WORK + s*SUBSEQ_WORK;

   return work;
}
I understand the BABY_WORK, GIANT_WORK and SUBSEQ_WORK steps.
Can you help me understand what the Q*EXP_WORK term is? I do not see anything corresponding to it in the discrete log function.

For a large Q, if only one subsequence is left, it should be significantly faster than using a lower Q, but the Q*EXP_WORK term prevents the use of a large Q.
This was borrowed from srsieve. The original code was written by Geoff Reynolds.

Note that the code does not have any sr1sieve or sr2sieve enhancements. I have a ways to go to pull in the sr2sieve bsgs logic.
Old 2020-09-22, 14:27   #57
rob147147

I think EXP_WORK relates to the additional mulmods involved in pre-calculating b^d (mod p) for all the appropriate remainder values 'd' where 0 <= d < Q.

E.g. we are rewriting the terms of a sequence k * b^n - 1 as k * b^d * b^(Qm) - 1, where n = Qm + d.

The idea behind Q is to identify which values 'd' are unnecessary, thus ending up with fewer than Q subsequences over a range 1/Q times the size of the original range, hopefully giving a performance improvement. There can be multiple terms with the same remainder 'd', so we can precalculate b^d (mod p) once for those instances. My understanding from looking at the source code of sr1sieve (bsgs.c) is that it is this mulmod that we are accounting for in EXP_WORK.
Old 2020-09-24, 03:45   #58
Citrix

Quote:
Originally Posted by rob147147 View Post
I think EXP_WORK relates to the additional mulmods involved in pre-calculating b^d (mod p) for all the appropriate remainder values 'd' where 0 <= d < Q. [...]
Robert, I think you are correct. I came to the same conclusion when reviewing the code. Though for sr1sieve there would not be any need to calculate b^d (mod p) for particular d values, as there are no candidates left for certain d. That is the whole point of choosing a Q value.

For multiple k (sr2sieve) the program should intelligently decide which b^d should be computed.

In srsieve2, the following function would need to be changed to make it faster for the single-k case.

Code:
void   GenericWorker::BuildTables(uint64_t *baseToUse, uint64_t *primeList, double *invp, uint64_t *bm64){
...}
Old 2020-09-24, 05:31   #59
Citrix

I tried to replace mulmod with powmod to avoid unnecessary multiplications, but I get an error at run time. Any idea why the calculations are not being done correctly?

Code:
void   GenericWorker::BuildTables(uint64_t *baseToUse, uint64_t *primeList, double *invp, uint64_t *bm64)
{
   uint64_t inv_b[4];
   uint32_t qIdx, seqIdx, ssIdx;
   uint64_t umod[4], inv[4];

   fpu_push_1divp(primeList[3]);
   fpu_push_1divp(primeList[2]);
   fpu_push_1divp(primeList[1]);
   fpu_push_1divp(primeList[0]);

   // Precompute 1/b^d (mod p) for 0 <= d <= Q.
   bd64[0][0] = bd64[0][1] = bd64[0][2] = bd64[0][3] = 1;

   bm64[0] = inv_b[0] = invmod64(baseToUse[0], primeList[0]);
   bm64[1] = inv_b[1] = invmod64(baseToUse[1], primeList[1]);
   bm64[2] = inv_b[2] = invmod64(baseToUse[2], primeList[2]);
   bm64[3] = inv_b[3] = invmod64(baseToUse[3], primeList[3]);

   /*
   for (qIdx=1; qIdx<ii_BestQ; qIdx++)
   {
      bd64[qIdx][0] = bm64[0];
      bd64[qIdx][1] = bm64[1];
      bd64[qIdx][2] = bm64[2];
      bd64[qIdx][3] = bm64[3];

      fpu_mulmod_4a_4b_4p(bm64, inv_b, primeList);
   }
   */

   for (seqIdx=0; seqIdx<ii_SequenceCount; seqIdx++)
   {
      ck64[seqIdx][0] = smod64(-ip_Sequences[seqIdx].c, primeList[0]);
      ck64[seqIdx][1] = smod64(-ip_Sequences[seqIdx].c, primeList[1]);
      ck64[seqIdx][2] = smod64(-ip_Sequences[seqIdx].c, primeList[2]);
      ck64[seqIdx][3] = smod64(-ip_Sequences[seqIdx].c, primeList[3]);

      umod[0] = umod64(ip_Sequences[seqIdx].k, primeList[0]);
      umod[1] = umod64(ip_Sequences[seqIdx].k, primeList[1]);
      umod[2] = umod64(ip_Sequences[seqIdx].k, primeList[2]);
      umod[3] = umod64(ip_Sequences[seqIdx].k, primeList[3]);

      inv[0] = invmod64(umod[0], primeList[0]);
      inv[1] = invmod64(umod[1], primeList[1]);
      inv[2] = invmod64(umod[2], primeList[2]);
      inv[3] = invmod64(umod[3], primeList[3]);

      fpu_mulmod_4a_4b_4p(ck64[seqIdx], inv, primeList);
   }

   // Compute -c/(k*b^d) (mod p) for each subsequence.
   for (ssIdx=0; ssIdx<ii_SubsequenceCount; ssIdx++)
   {
      bdck64[ssIdx][0] = ck64[ip_Subsequences[ssIdx].seqIdx][0];
      bdck64[ssIdx][1] = ck64[ip_Subsequences[ssIdx].seqIdx][1];
      bdck64[ssIdx][2] = ck64[ip_Subsequences[ssIdx].seqIdx][2];
      bdck64[ssIdx][3] = ck64[ip_Subsequences[ssIdx].seqIdx][3];

      fpu_powmod_4b_1n_4p(bm64, ip_Subsequences[ssIdx].d, primeList);

      //fpu_mulmod_4a_4b_4p(bdck64[ssIdx], bd64[ip_Subsequences[ssIdx].d], primeList);
      fpu_mulmod_4a_4b_4p(bdck64[ssIdx], bm64, primeList);

      bm64[0] = inv_b[0];
      bm64[1] = inv_b[1];
      bm64[2] = inv_b[2];
      bm64[3] = inv_b[3];
   }

   fpu_powmod_4b_1n_4p(bm64, ii_BestQ, primeList);

   fpu_pop();
   fpu_pop();
   fpu_pop();
   fpu_pop();
}
Old 2020-09-24, 12:44   #60
rogue

fpu_powmod_4b_1n_4p() assumes that there is nothing on the FPU stack, but fpu_mulmod_4a_4b_4p() requires the first four entries on the stack to be preset (per fpu_push_1divp). You would have to create a version of fpu_powmod_4b_1n_4p() that supports precomputed values on the stack.
Old 2020-09-26, 12:15   #61
Happy5214

Quote:
Originally Posted by rogue View Post
fpu_powmod_4b_1n_4p() assumes that there is nothing on the FPU stack, but fpu_mulmod_4a_4b_4p() requires the first four entries on the stack to be preset (per fpu_push_1divp). You would have to create a version of fpu_powmod_4b_1n_4p() that supports precomputed values on the stack.
On that subject, I'm about to implement the ARM version of fpu_powmod_4b_1n_4p() (it'll be the only one I'll have to spill registers for). Is there a reason why fpu_mulmod_4a_4b_4p() expects the 1/p values to be preloaded while fpu_powmod_4b_1n_4p() doesn't? Should I keep the current semantics with the ARM version?

I discovered when I implemented fpu_powmod() that the x87 version does in fact use right-to-left exponentiation, as I noted in a previous post, so you were right on that concern earlier.

Old 2020-09-26, 13:06   #62
rogue

Quote:
Originally Posted by Happy5214 View Post
Is there a reason why fpu_mulmod_4a_4b_4p() preloads the 1/p values and fpu_powmod_4b_1n_4p() doesn't? Should I keep the current semantics with the ARM version? [...]
In most cases fpu_powmod_4b_1n_4p() is called once to set up a loop, while fpu_mulmod_4a_4b_4p() is called within the loop. There would be a small speedup from setting up the stack prior to the call to fpu_powmod_4b_1n_4p(), but it wouldn't benefit all sieves, and most that would benefit would probably see a negligible gain. The long-term goal would be to preset the stack (or registers for ARM) for all fpu functions.