mersenneforum.org  

Go Back   mersenneforum.org > Prime Search Projects > And now for something completely different

Reply
 
Thread Tools
Old 2015-01-04, 16:40   #12
axn
 
axn's Avatar
 
Jun 2003

5,051 Posts
Default Testers required

Attached is the source + compile script, a Win32 build using CUDA 3.2 (ancient!) and the corresponding cuda run time dll.

I would appreciate some testing on different NVidia h/w (probably wont work on the latest 9xx series, but should run on everything else).

suggested command line parameters:
  • 17 1 2 B9
  • 18 1000 1001 B9
  • 19 10000 10001 B9
  • 20 100000 100001 B9
Attached Files
File Type: zip src.zip (81.4 KB, 213 views)
File Type: zip cudart32_32_16.zip (130.7 KB, 208 views)
axn is online now   Reply With Quote
Old 2015-01-04, 16:42   #13
axn
 
axn's Avatar
 
Jun 2003

5,051 Posts
Default

Quote:
Originally Posted by rogue View Post
If you run these thru pfgw 3.7.8, it will validate them. Run you file as "pfgw factors.txt" and if all lines return "is Zero", then they are valid factors.
looks like 3.7.8 won't run on my ancient XP :-( I'll try it out later when I'm on Win 7. Thanks for the tip.
axn is online now   Reply With Quote
Old 2015-01-04, 18:12   #14
henryzz
Just call me Henry
 
henryzz's Avatar
 
"David"
Sep 2007
Cambridge (GMT/BST)

588010 Posts
Default

Quote:
Originally Posted by axn View Post
Attached is the source + compile script, a Win32 build using CUDA 3.2 (ancient!) and the corresponding cuda run time dll.

I would appreciate some testing on different NVidia h/w (probably wont work on the latest 9xx series, but should run on everything else).

suggested command line parameters:
  • 17 1 2 B9
  • 18 1000 1001 B9
  • 19 10000 10001 B9
  • 20 100000 100001 B9
It is running on my 750ti which is the younger brother of 9xx. It should work on pretty much any card.
Getting the occasional unable to find qnr message.
Runtime around 4 hours per P for n=15
henryzz is online now   Reply With Quote
Old 2015-01-04, 18:29   #15
Batalov
 
Batalov's Avatar
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

100101000001012 Posts
Smile

I usually validate all factors with a simple reformatter written in perl whose output is fed to gp -q.

Let the fun begin!
Batalov is offline   Reply With Quote
Old 2015-01-04, 18:45   #16
Batalov
 
Batalov's Avatar
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

250516 Posts
Default

Ok, on a 580, I need at least three of them (with B12) to fully load the GPU... but this is common with gfn-sieve (as far as I know from gfn-sieve discussions at primesearch/PrimeGrid forums). I will wait for 2-4P, 4-5P, 5-7P to finish (ETA 30 minutes), then will compare with my sieve results (for false negatives?) and for validity (false positives)...

I even recompiled with b<=100,000... but the speed doesn't depend on it (0.1M, 1M, 100M, all the same; just less output), does it? ;-)

It is works great!
* All factors are of course valid
* No factors seem to be missing!

Last fiddled with by Batalov on 2015-01-04 at 20:59 Reason: (cross-validation on a small sample)
Batalov is offline   Reply With Quote
Old 2015-01-04, 20:19   #17
Batalov
 
Batalov's Avatar
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

36·13 Posts
Default

Here is my primitive CPU sieve prefactor program.
Promise not to laugh; it was hacked together in half an hour.
It takes candidates from stdin and spits factors on stdout.
I use it in bit levels: first pass
seq 2 99999 | awk '{print $1, 131072}' |./PIESsv 1 40 > factors_to_40bits.txt
then remove them, then
cat Survivors | awk '{print $1, 131072}' |./PIESsv 40 50 > factors_to_50bits.txt
and so on.

Note: my m is Yves' 2n-1

Code:
/*********************************************************/
/* A primitive sieve for PIES (b,n) values (from stdin)  */
/*                      (copyleft) 2014 S.Batalov        */
/* Example use:                                          */
/*   gcc -O3 -o EMsieve EMsieve.c                        */
/*   echo 2100011 | ./EMsieve 1 50                       */
/*********************************************************/

#include <stdio.h>
#include <stdlib.h>

typedef unsigned long long uint64_t;

uint64_t mulmod(uint64_t a, uint64_t b, uint64_t c) {
  uint64_t d; /* to hold the result of a*b mod c */
  /* calculates a*b mod c, stores result in d */
  asm ("mov %1, %%rax;"        /* put a into rax */
       "mul %2;"               /* mul a*b -> rdx:rax */
       "div %3;"               /* (a*b)/c -> quot in rax remainder in rdx */
       "mov %%rdx, %0;"        /* store result in d */
       :"=r"(d)                /* output */
       :"r"(a), "r"(b), "r"(c) /* input */
       :"%rax", "%rdx"         /* clobbered registers */
      );
  return d;
}

uint64_t powmod(uint64_t i, uint64_t j, uint64_t k) { /* a general, well-known code */
  uint64_t r = 1;
  while(j > 1){
    if(j&1) r=mulmod(r,i,k);
    i=mulmod(i,i,k);
    j>>=1;
  }
  return mulmod(r,i,k);
}

int gcd(uint64_t a, uint64_t b) {
  return (b) ? gcd( b, a % b ) : a;
}

#define SHM (5*7*11*13*17*19)

main(int argc,char **argv) {
  uint64_t f,b,c,j,k,l,l2,B;
  int p, p2, p6, z;
  char buf[64], *s, *note;
  char ms[SHM];

  for(j=0; j<SHM; j++  ) ms[j]=0;
  for(j=0; j<SHM; j+=5 ) ms[j]=1;
  for(j=0; j<SHM; j+=7 ) ms[j]=1;
  for(j=0; j<SHM; j+=11) ms[j]=1;
  for(j=0; j<SHM; j+=13) ms[j]=1;
  for(j=0; j<SHM; j+=17) ms[j]=1;
  for(j=0; j<SHM; j+=19) ms[j]=1;

  l = 1; l2 = 49;
  if(0 && argc<=2) { printf(" Use: %s from_bits to_bits < input_file > output_file\n"
                       "\t 0 < from_bits < to_bits < 63 are numeric parameters\n"
                       "\t a PIES simple sieve, v.1.1\n"
                       "\t (copyleft) 2014 S.Batalov\n",argv[0]); exit(0); }
  if(argc>1) l = atoll(argv[1]);
  if(argc>2) l2 = atoll(argv[2]);
  printf("# from %lld to %lld bits\n",l,l2);
  l = 1ULL << l;
  l2 = 1ULL << l2;

  while((s=fgets(buf,64,stdin))) {
    if(sscanf(s,"%lld %d",&B,&p)!=2) { /* p for "power"; here, must be 3-smooth, not prime */
      p=73728;
      B=strtol(s,NULL,10);
    }
    k=p; while((k&1)==0) k/=2; while((k%3)==0) k/=3;
    if(k!=1) { printf("!"); continue;}
    if((p&1)==1 && (B%3)==2) { printf("%lld | %d^%d-%d^%d+1\n",3,B,2*p,B,p); continue;}
    if(p<10 || p>(1<<28)) continue;
    p6= p*6;
    for(f=(l/p6+1)*p6+1; f<=l2; f+=p6) {
      if(ms[f%SHM] || gcd(f, 810162134158954261ULL % f)>1) continue; /* cheap ispseudoprime(f) */
      b=powmod(B,p,f);
      c=mulmod(b,b,f);
      if(b==c+1) {
        printf("%lld | %d^%d-%d^%d+1\n",f,B,2*p,B,p); fflush(stdout);
        break;
      }
    }
  }
  exit(0);
}
Batalov is offline   Reply With Quote
Old 2015-01-05, 03:19   #18
axn
 
axn's Avatar
 
Jun 2003

505110 Posts
Default

Quote:
Originally Posted by henryzz View Post
It is running on my 750ti which is the younger brother of 9xx. It should work on pretty much any card.
Getting the occasional unable to find qnr message.
Runtime around 4 hours per P for n=15
Are you getting factor outputs to the screen (like in my post #8?). If not, then it is not working. The CPU portion will run and produce output (which is where the qnr warning comes from), but GPU part might not be running properly. If that is the case, it would probably need to be compiled with latest cuda version (6.5?) specifically targeting the relevant compute version (5.0?).

Quote:
Originally Posted by Batalov View Post
Ok, on a 580, I need at least three of them (with B12) to fully load the GPU... but this is common with gfn-sieve (as far as I know from gfn-sieve discussions at primesearch/PrimeGrid forums).
Two copies should get very nearly the same thruput as 3.

Quote:
Originally Posted by Batalov View Post
I even recompiled with b<=100,000... but the speed doesn't depend on it (0.1M, 1M, 100M, all the same; just less output), does it? ;-)
Yes, it is completely range-agnostic. That's just the nature of the beast :-)

Quote:
Originally Posted by Batalov View Post
It is works great!
* All factors are of course valid
* No factors seem to be missing!
That's great news.
axn is online now   Reply With Quote
Old 2015-01-05, 04:06   #19
Batalov
 
Batalov's Avatar
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

36×13 Posts
Default

I should mention occasional "Unable to find qnr for ..." message as well. Fairly rare. Once an hour, maybe.
(I don't think I've seen "Unable to find cnr ..." !)
Clarification: that is, in the steady stream of factors, of course.

I think you could bump 128 to 256, maybe. It almost never runs to 128 anyway, so what if it runs a bit beyond. From the papers that I flipped thru, the smallest qr is bound with O(p^α, α has some bounds) but overgrow O(log p), and one can imagine that the qnr could have the same behavior. So every once in a while they will be bigger.
Or of course we can sacrifice those missed opportunities, as is written now.

Last fiddled with by Batalov on 2015-01-05 at 04:25
Batalov is offline   Reply With Quote
Old 2015-01-05, 04:34   #20
axn
 
axn's Avatar
 
Jun 2003

5,051 Posts
Default

The "Unable to find qnr" invariably occurs when the input factor is a composite perfect square. I was worried that maybe once in a while it might miss a prime like that, so initial builds of GFN sieve didn't have a restriction, and occasionally went into infinite loop. Then I put in the restriction and print the f, just to see if any prime ever fails.

FWIW, the probability of a given p being a QNR (mod f) is 50%, and so a genuine prime being missed is 2^-128. I think I can live with those odds Honestly, like you said, it wouldn't matter if that is 128 or 256. For most numbers, it would never reach these limits. But those that fail at 128 will also fail at 256.

EDIT:- Technically, initial builds had a restriction of 64 primes, then I removed it because of worries about missing a prime, then users complained about infinite loops, and then finally put back in restriction with a generous 128 primes.

Last fiddled with by axn on 2015-01-05 at 04:38
axn is online now   Reply With Quote
Old 2015-01-05, 05:06   #21
Batalov
 
Batalov's Avatar
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

947710 Posts
Default

Ah! I see! Makes sense.
Indeed,
"...60397458466799617 | 695791^262144-695791^131072+1
Unable to find qnr for 60397977108480001
Unable to find qnr for 60397978091520001
Found 158 ETA 12h34m..."
but
(21:07) gp > factor(60397977108480001)
[245759999 2]

(21:07) gp > factor(60397978091520001)
[245760001 2]

Those pesky prime squares also botched mpqs in lat-sievers. In both cases they just slip through the fast pseudoprime check (and it must be a lazy, inexpensive one, otherwise needless extra time will be spent on testing them).

Last fiddled with by Batalov on 2015-01-05 at 05:09
Batalov is offline   Reply With Quote
Old 2015-01-05, 07:53   #22
axn
 
axn's Avatar
 
Jun 2003

5,051 Posts
Default

So... the general case (2^s3^t) looks doable. I will have to prototype in GP a little bit to see if the math holds. Adapting CycloSv for the general case will be slightly more work that adapting GFNSv for CycloSv, because the code has lot of assumptions regarding power-of-2 stuff.
At any rate, the cyclo tester can't currently handle these, so maybe Serge will be the only customer
axn is online now   Reply With Quote
Reply



Similar Threads
Thread Thread Starter Forum Replies Last Post
Prime 95 and internet connection issue Jwb52z Software 10 2013-01-30 01:09
Twin prime search? MooooMoo Twin Prime Search 115 2010-08-29 17:38
Prime Search at School Unregistered Information & Answers 5 2009-10-15 22:44
Prime Search on PS-3? Kosmaj Riesel Prime Search 6 2006-11-21 15:19
Running prime on PC without internet-connection Ferdy Software 3 2006-04-25 08:53

All times are UTC. The time now is 17:20.


Fri Jul 16 17:20:02 UTC 2021 up 49 days, 15:07, 1 user, load averages: 1.62, 1.75, 1.68

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.