mersenneforum.org Idea for faster sieve
 Register FAQ Search Today's Posts Mark Forums Read

 2007-05-24, 01:02 #23 Citrix     Jun 2003 32·52·7 Posts Geoff, I think I misunderstood your post the last time I read it. Here is what I think will be best. We will look at primes such that p=2^y*t*x*m+1 1) Using a switch (y) you only sieve for values that are p==1 (mod 2^y). This is already implemented. 2) t=p-1%1440. We do SPH over the t value. It would be nice if this could be set as a switch so that the POWER_RESIDUE_DIVISORS, POWER_LCM etc can be modified, so memory needs can be tested out. Btw, why do you have to look stuff up a table, can't you factorize the p-1%1440 over small integers? Of these 1440 values we only need to look at a fraction of t's. *** (See below) 3) We do BSGS algorithm on the rest of x 4) m is the extra non-useful part of the prime. *** So we decide on which t's we want to use based on work done. Work done~= sqrt(remaining number of the first 1440 with no small factors) For example if p-1%1440=36, then 36=2^2*3^2. (We do SPH over this) 1440/36=60 So workdone for 36= sqrt(60)=~7.5 So if we could have a switch to choose primes that fall in a certain work done range. All this can be calculated at the start of the program. Here is the work done for all p-1%1440 that you would look at. Code: p-1%1440 workdone 6 15.4919 10 12 12 10.9545 16 9.48683 18 8.94427 22 26.8328 28 18.9737 30 6.9282 36 6.32456 40 6 42 15.4919 46 26.8328 48 5.47723 52 18.9737 58 26.8328 60 4.89898 66 15.4919 70 12 72 4.47214 76 18.9737 78 15.4919 82 26.8328 88 13.4164 90 4 96 3.87298 100 8.48528 102 15.4919 106 26.8328 108 6.32456 112 9.48683 118 26.8328 120 3.4641 126 8.94427 130 12 132 10.9545 136 13.4164 138 15.4919 142 26.8328 148 18.9737 150 6.9282 156 10.9545 160 3 162 8.94427 166 26.8328 168 7.74597 172 18.9737 178 26.8328 180 2.82843 186 15.4919 190 12 192 3.87298 196 18.9737 198 8.94427 202 26.8328 208 9.48683 210 6.9282 216 4.47214 220 8.48528 222 15.4919 226 26.8328 228 10.9545 232 13.4164 238 26.8328 240 2.44949 246 15.4919 250 12 252 6.32456 256 6.7082 258 15.4919 262 26.8328 268 18.9737 270 4 276 10.9545 280 6 282 15.4919 286 26.8328 288 2.23607 292 18.9737 298 26.8328 300 4.89898 306 8.94427 310 12 312 7.74597 316 18.9737 318 15.4919 322 26.8328 328 13.4164 330 6.9282 336 5.47723 340 8.48528 342 8.94427 346 26.8328 348 10.9545 352 6.7082 358 26.8328 360 2 366 15.4919 370 12 372 10.9545 376 13.4164 378 8.94427 382 26.8328 388 18.9737 390 6.9282 396 6.32456 400 4.24264 402 15.4919 406 26.8328 408 7.74597 412 18.9737 418 26.8328 420 4.89898 426 15.4919 430 12 432 3.16228 436 18.9737 438 15.4919 442 26.8328 448 6.7082 450 4 456 7.74597 460 8.48528 462 15.4919 466 26.8328 468 6.32456 472 13.4164 478 26.8328 480 1.73205 486 8.94427 490 12 492 10.9545 496 9.48683 498 15.4919 502 26.8328 508 18.9737 510 6.9282 516 10.9545 520 6 522 8.94427 526 26.8328 528 5.47723 532 18.9737 538 26.8328 540 2.82843 546 15.4919 550 12 552 7.74597 556 18.9737 558 8.94427 562 26.8328 568 13.4164 570 6.9282 576 2.23607 580 8.48528 582 15.4919 586 26.8328 588 10.9545 592 9.48683 598 26.8328 600 3.4641 606 15.4919 610 12 612 6.32456 616 13.4164 618 15.4919 622 26.8328 628 18.9737 630 4 636 10.9545 640 3 642 15.4919 646 26.8328 648 4.47214 652 18.9737 658 26.8328 660 4.89898 666 8.94427 670 12 672 3.87298 676 18.9737 678 15.4919 682 26.8328 688 9.48683 690 6.9282 696 7.74597 700 8.48528 702 8.94427 706 26.8328 708 10.9545 712 13.4164 718 26.8328 720 1.41421 726 15.4919 730 12 732 10.9545 736 6.7082 738 8.94427 742 26.8328 748 18.9737 750 6.9282 756 6.32456 760 6 762 15.4919 766 26.8328 768 3.87298 772 18.9737 778 26.8328 780 4.89898 786 15.4919 790 12 792 4.47214 796 18.9737 798 15.4919 802 26.8328 808 13.4164 810 4 816 5.47723 820 8.48528 822 15.4919 826 26.8328 828 6.32456 832 6.7082 838 26.8328 840 3.4641 846 8.94427 850 12 852 10.9545 856 13.4164 858 15.4919 862 26.8328 868 18.9737 870 6.9282 876 10.9545 880 4.24264 882 8.94427 886 26.8328 888 7.74597 892 18.9737 898 26.8328 900 2.82843 906 15.4919 910 12 912 5.47723 916 18.9737 918 8.94427 922 26.8328 928 6.7082 930 6.9282 936 4.47214 940 8.48528 942 15.4919 946 26.8328 948 10.9545 952 13.4164 958 26.8328 960 1.73205 966 15.4919 970 12 972 6.32456 976 9.48683 978 15.4919 982 26.8328 988 18.9737 990 4 996 10.9545 1000 6 1002 15.4919 1006 26.8328 1008 3.16228 1012 18.9737 1018 26.8328 1020 4.89898 1026 8.94427 1030 12 1032 7.74597 1036 18.9737 1038 15.4919 1042 26.8328 1048 13.4164 1050 6.9282 1056 3.87298 1060 8.48528 1062 8.94427 1066 26.8328 1068 10.9545 1072 9.48683 1078 26.8328 1080 2 1086 15.4919 1090 12 1092 10.9545 1096 13.4164 1098 8.94427 1102 26.8328 1108 18.9737 1110 6.9282 1116 6.32456 1120 3 1122 15.4919 1126 26.8328 1128 7.74597 1132 18.9737 1138 26.8328 1140 4.89898 1146 15.4919 1150 12 1152 2.23607 1156 18.9737 1158 15.4919 1162 26.8328 1168 9.48683 1170 4 1176 7.74597 1180 8.48528 1182 15.4919 1186 26.8328 1188 6.32456 1192 13.4164 1198 26.8328 1200 2.44949 1206 8.94427 1210 12 1212 10.9545 1216 6.7082 1218 15.4919 1222 26.8328 1228 18.9737 1230 6.9282 1236 10.9545 1240 6 1242 8.94427 1246 26.8328 1248 3.87298 1252 18.9737 1258 26.8328 1260 2.82843 1266 15.4919 1270 12 1272 7.74597 1276 18.9737 1278 8.94427 1282 26.8328 1288 13.4164 1290 6.9282 1296 3.16228 1300 8.48528 1302 15.4919 1306 26.8328 1308 10.9545 1312 6.7082 1318 26.8328 1320 3.4641 1326 15.4919 1330 12 1332 6.32456 1336 13.4164 1338 15.4919 1342 26.8328 1348 18.9737 1350 4 1356 10.9545 1360 4.24264 1362 15.4919 1366 26.8328 1368 4.47214 1372 18.9737 1378 26.8328 1380 4.89898 1386 8.94427 1390 12 1392 5.47723 1396 18.9737 1398 15.4919 1402 26.8328 1408 6.7082 1410 6.9282 1416 7.74597 1420 8.48528 1422 8.94427 1426 26.8328 1428 10.9545 1432 13.4164 1438 26.8328 1440 1 Let me know if you have any questions? Could you please implement this soon. Thanks!
 2007-05-24, 09:33 #24 Citrix     Jun 2003 32×52×7 Posts I was able to compile sr5sieve. 5040 seemed the fastest. (Btw 1440 has 36 factors, I think you meant 5040.) My compiled exe only reached 250kp/sec, but the exe I downloaded from your website gives me 350kp/sec. Do you know why? I used gcc to compile using the make file for pentium 4.
2007-05-25, 03:37   #25
geoff

Mar 2003
New Zealand

22058 Posts

Quote:
 Originally Posted by Citrix I was able to compile sr5sieve. 5040 seemed the fastest. (Btw 1440 has 36 factors, I think you meant 5040.)
Yes my mistake, 5040 has 60 divisors. The ideal setting for POWER_RESIDUE_LCM will be larger if there are fewer sequences in the sieve, or if the n-range is larger, or the BSGS routine is slower.

At the moment POWER_RESIDUE_LCM must be less than 2^15, this can be extended but needs other changes to be made to the source.

Quote:
 My compiled exe only reached 250kp/sec, but the exe I downloaded from your website gives me 350kp/sec. Do you know why? I used gcc to compile using the make file for pentium 4.
I compiled the 1.5.x versions with make ARCH=x86-intel CC="gcc -V3.4"', for some reason I don't understand yet GCC version 3.4 is a bit faster than version 4.1 when compiling the 1.5.x code (It was the other way around with the 1.4.x code).

Quote:
 2) t=p-1%1440. We do SPH over the t value. It would be nice if this could be set as a switch so that the POWER_RESIDUE_DIVISORS, POWER_LCM etc can be modified, so memory needs can be tested out. Btw, why do you have to look stuff up a table, can't you factorize the p-1%1440 over small integers?
Dividing a 64-bit number on a 32-bit CPU is extremely slow, it is worthwhile using just about any trick you can think of to avoid it.

I said before that the algorithm used information about the factors of gcd(p-1,720), but in fact it just needs to know the greatest r dividing 720 such that p=1 (mod r). For example, if p=1 (mod 120) then the algorithm doesn't need to know about the factors of 120.

2007-05-27, 06:34   #26
Citrix

Jun 2003

157510 Posts

Hi Geoff,
I tried to compile the sieve file so I could implement the cutoff method.
1) I changed the LCM value to 360*7*11. This seems to be the fastest.
2) in file bsgs.c, in function uint32_t setup64(uint64_t p), I wanted to add the following code. I tried returning j=0, so that the program would ignore this prime, but it did not work. What am I doing wrong? Could you please implement the following code and compile it.

Thanks!

code attached. (Look at it with word wrap off)
Attached Files
 code.txt (11.8 KB, 89 views)

2007-05-28, 02:22   #27
geoff

Mar 2003
New Zealand

13×89 Posts

Quote:
 Originally Posted by Citrix 2) in file bsgs.c, in function uint32_t setup64(uint64_t p), I wanted to add the following code. I tried returning j=0, so that the program would ignore this prime, but it did not work. What am I doing wrong?
Returning 0 from setup64(p) should cause the prime p to be skipped as you want. I think your problem is the array indexing, you declare
Code:
uint32_t modulus =(p-1)%POWER_RESIDUE_LCM;
uint32_t ar[POWER_RESIDUE_LCM/2]
then test
Code:
if(ar[(modulus/2)-1]==0)
but (modulus/2)-1 could take the value -1 ... ((POWER_RESIDUE_LCM-1)/2)-1
I think you just want to test
Code:
if(ar[modulus/2]==0)
Remember p-1 and POWER_RESIDUE_LCM can both be assumed to be even.

 2007-05-28, 02:32 #28 geoff     Mar 2003 New Zealand 13·89 Posts I have uploaded version 1.5.5x which takes the switch sr2sieve -x X' and only tests for factors of the form p-1 (mod X). X doesn't have to be a power of 2 but it is more efficient if it is. I speculated in another thread about modifying the Sieve of Eratosthenes so that it only sieved numbers of the form N*x+1 for some constant N, but I haven't found a way to do this efficiently, except when N is a power of 2. Citrix, even if this were done, you talk about finding the factors of (p-1)%N, which is another problem altogether. Sr2sieve doesn't need to do this, it only ever looks at the value gcd(p-1,N). Here is the general technique I would use to find the factors of gcd(p-1,N). 0. Precompute T[0] ... T[N-1] where T[i] contains the factors of gcd(i,N). 1. For any p > 0, the factors of gcd(p-1,N) are T[(p-1)%N]. This only requires one division per p and so is a huge improvement over computing (p-1)%n for each divisor n of N. However it can't be used to find factors of (p-1)%N that are not themselves divisors of N.
 2007-05-28, 02:38 #29 Citrix     Jun 2003 110001001112 Posts I don't think this is the problem. Assume that p-1%720=20 Then I want to look at ar[10-1], since the values in the array are 2,4,6,8... ie they start at 2. So the ar[9]=20. Is there anything else that could be going wrong? Are you sure returning j=0 causes the prime to be skipped. edit: By the way when you do gcd(p-1,N), N does not have to have any factors as powers of 2. Powers of 2 can be calculated by looking at the number of zero bits in the end of p-1. Last fiddled with by Citrix on 2007-05-28 at 03:17
2007-05-28, 02:53   #30
geoff

Mar 2003
New Zealand

13×89 Posts

Quote:
 Originally Posted by Citrix Assume that p-1%720=20 Then I want to look at ar[10-1], since the values in the array are 2,4,6,8... ie they start at 2. So the ar[9]=20.
What if (p-1)%720 = 0? Then ar[0-1] is junk.

Quote:
 Is there anything else that could be going wrong? Are you sure returning j=0 causes the prime to be skipped.
setup64() is called like this from prime_sieve() (in primes.c):
Code:
cc = setup64(p);
if (cc > 0)
bsgs64(p,cc);

2007-05-28, 02:58   #31
Citrix

Jun 2003

32·52·7 Posts

Quote:
 Originally Posted by geoff I speculated in another thread about modifying the Sieve of Eratosthenes so that it only sieved numbers of the form N*x+1 for some constant N, but I haven't found a way to do this efficiently, except when N is a power of 2.
There is an easy way, but requires 1 extra step for each prime in the low_primes array. You calculate N' (ie N inverse) mod p for every prime p in the array low_primes and store them in an array.

Then to sieve between start and stop. You find the first candidate where p divides the number ie. (start+x+1)%p=0. I assume that start and x are both divisible by N to make the example easier, though it is not needed.

1) then k*N+N*m+1==0 (mod p) where n*m=xand k*N=start
2) To find m we multiply both sides by N'
3) We get k+m+N'==0 (mod p)
4) Solve for m=-(k+N') (mod p)
5) Add p to m till you reach stop and cancel these values as primes.

If start is not a multiple of N, then you have to do an extra multiplication. But why not use start as a multiple of N and avoid the multiplication.

Hope this helps.

2007-06-10, 20:48   #32
Citrix

Jun 2003

32·52·7 Posts

Hi Geoff,
Any progress on this. May be you din't understand my last post or there is some confusion on what I wanted.

Just to summarize what I have think will be fastest.

1) From the command line the program gets -x X, and factorizes X. if you want to avoid this step I can enter the factors as -fx a*b*c*d...
2) Then we look at all primes such tht p%X==1. We can use the method below in quotes to generate all such numbers. The only problem is that it will use twice as much memory.
3) Since all numbers are multiples of X and factorization of X is known, we can apply SPH over the entire factorization of X. I plan to use X, such that X>range, so there is no need to check gcd(prime-1,720) etc. Just do SPH over the factorization of X.

Could you please implement this soon. Let me know if you have any questions.

Thanks!

Quote:
 Originally Posted by Citrix 1) then k*N+N*m+1==0 (mod p) where n*m=xand k*N=start 2) To find m we multiply both sides by N' 3) We get k+m+N'==0 (mod p) 4) Solve for m=-(k+N') (mod p) 5) Add p to m till you reach stop and cancel these values as primes.

2007-06-10, 22:45   #33
geoff

Mar 2003
New Zealand

13·89 Posts

Quote:
 Originally Posted by Citrix Any progress on this. May be you din't understand my last post or there is some confusion on what I wanted.
No, I just haven't got around to it yet :-). RieselSieve are rapidly approaching the 2^52 limit of the current x86-64 mulmods, so I am working on extending that limit.

BTW I think the existing method I am using for sieving numbers mod 2^y will fail when p > 2^(64-y). For example when sieving GFNs with k=3^16 the limit for the sieve will be 2^59 instead of 2^62.

 Similar Threads Thread Thread Starter Forum Replies Last Post jasong jasong 8 2017-04-07 00:23 Dubslow Forum Feedback 7 2016-12-02 14:26 binu Factoring 3 2013-04-13 16:32 science_man_88 Homework Help 0 2010-04-23 16:33 ThomRuley Lone Mersenne Hunters 5 2003-07-15 05:51

All times are UTC. The time now is 05:07.

Thu Oct 29 05:07:31 UTC 2020 up 49 days, 2:18, 1 user, load averages: 1.25, 1.37, 1.41