2007-12-28, 19:56  #1 
(loop (#_fork))
Feb 2006
Cambridge, England
2×7×461 Posts 
Counting ideals during singleton removal
I thought the singleton-removal process involved repeated passes which found ideals that occurred in only one relation, and then removed those relations and the ideals in which they occurred from the set of relations.
This would mean that, after a singleton-removal pass, the number of relations and the number of distinct prime ideals would both decrease. But, running a test case (the present state of play for 6^383+1), I get
Code:
found 22130393 hash collisions in 98352468 relations
commencing duplicate removal, pass 2
found 8693969 duplicates and 89658499 unique relations
memory use: 504.8 MB
ignoring smallest 5036341 rational and 5036568 algebraic ideals
filtering ideals above 86691876
need 17123945 more relations than ideals
commencing singleton removal, pass 1
relations with 0 large ideals: 968390
relations with 1 large ideals: 6903778
relations with 2 large ideals: 20314406
relations with 3 large ideals: 30536388
relations with 4 large ideals: 23471476
relations with 5 large ideals: 7464061
relations with 6 large ideals: 0
relations with 7+ large ideals: 0
89658499 relations and about 76965979 large ideals
commencing singleton removal, pass 2
found 26602951 singletons
current dataset: 63055548 relations and about 46701442 large ideals
commencing singleton removal, pass 3
relations with 0 large ideals: 968390
relations with 1 large ideals: 6160702
relations with 2 large ideals: 16120073
relations with 3 large ideals: 21418424
relations with 4 large ideals: 14416423
relations with 5 large ideals: 3971536
relations with 6 large ideals: 0
relations with 7+ large ideals: 0
63055548 relations and about 65568001 large ideals
commencing singleton removal, pass 4
found 27251901 singletons
current dataset: 35803647 relations and about 32163994 large ideals
commencing singleton removal, pass 5
found 8896096 singletons
current dataset: 26907551 relations and about 22272846 large ideals
commencing singleton removal, pass 6
found 3574443 singletons
current dataset: 23333108 relations and about 18498547 large ideals
commencing singleton removal, pass 7
found 1609082 singletons
current dataset: 21724026 relations and about 16844081 large ideals
commencing singleton removal, pass 8
found 776955 singletons
current dataset: 20947071 relations and about 16056136 large ideals
commencing singleton removal, pass 9
found 388921 singletons
current dataset: 20558150 relations and about 15664346 large ideals
commencing singleton removal, final pass
memory use: 674.1 MB
commencing in-memory singleton removal
begin with 20558150 relations and 19331039 unique ideals
reduce to 7745572 relations and 5116329 ideals in 25 passes
Last fiddled with by fivemack on 2007-12-28 at 20:01 
2007-12-29, 04:27  #2  
Tribal Bullet
Oct 2004
3^{2}·5·79 Posts 
Quote:
For extra-large size problems this fixed-size hashtable can become congested. This is bad, because too many ideals collide in the hashtable, which means not enough singletons will be discovered. The filtering code detects when this is likely, and then changes the hash function and refills the table. After one singleton removal pass there are many fewer ideals to add to the table, and the new hash function shuffles the mapping of ideals to bins, so that more singletons can be deleted. The idea is to remove as many singletons as possible before switching to a unique mapping between ideals and bins; this occurs in the pass labeled 'final pass'. When that pass completes, the list of relations and large ideals is resident only in memory, and singleton removal proceeds as expected. All of this code is in gnfs/filter/singleton.c 
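The scheme described above can be illustrated with a toy sketch. Everything here (the 16-bit table, the multiplicative hash, the 8-bit saturating counters, the function names) is an assumption for illustration, not msieve's actual code:

```c
#include <stdint.h>
#include <string.h>

/* Toy model of the approximate pass: a fixed-size table of saturating
 * 8-bit counters indexed by a hash of the ideal. */

#define TABLE_BITS 16
#define TABLE_SIZE (1u << TABLE_BITS)

static uint8_t counts[TABLE_SIZE];

/* Changing 'seed' emulates "changing the hash function and refilling
 * the table" once congestion is detected. */
static uint32_t hash_ideal(uint64_t ideal, uint64_t seed)
{
    uint64_t h = (ideal ^ seed) * UINT64_C(0x9e3779b97f4a7c15);
    return (uint32_t)(h >> (64 - TABLE_BITS));
}

static void count_pass(const uint64_t *ideals, int n, uint64_t seed)
{
    int i;
    memset(counts, 0, sizeof counts);
    for (i = 0; i < n; i++) {
        uint32_t bin = hash_ideal(ideals[i], seed);
        if (counts[bin] < 255)
            counts[bin]++;          /* saturate rather than wrap */
    }
}

/* A bin count of 1 proves the ideal is a singleton; a larger count may
 * merely be a collision of distinct ideals, so those relations survive
 * until a later approximate pass, or the exact final pass, catches them. */
static int is_certain_singleton(uint64_t ideal, uint64_t seed)
{
    return counts[hash_ideal(ideal, seed)] == 1;
}
```

Relations containing a certain singleton would be deleted, the counters rebuilt (possibly with a new seed), and the loop repeated until too few singletons remain, at which point the exact in-memory pass takes over.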

2007-12-29, 12:26  #3 
Bamboozled!
May 2003
Down not across
2·7^{3}·17 Posts 
I've found an even more approximate solution to be worth doing and very easy to code. Remove all the singleton primes (not ideals) first. Assuming that one polynomial is linear, this removes all the singleton ideals on that side. For the other polynomial, removing all the singleton primes usually reduces the data size substantially, thereby making the more sophisticated algorithm run faster in less space.
Paul 
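This pre-pass can be sketched in miniature as follows. The array bound, types and names are illustrative assumptions; the qndsr.c program later in this thread does the same job with a two-bit "seen once / seen twice" map instead of full counters:

```c
#include <stdint.h>

/* Count how often every prime occurs across all relations, then keep
 * only relations all of whose primes occur at least twice. */

#define MAX_PRIME 1000000

static uint32_t prime_count[MAX_PRIME];

/* First pass: tally prime occurrences over every relation. */
static void count_primes(const uint32_t *const *rels, const int *lens, int nrel)
{
    int r, i;
    for (r = 0; r < nrel; r++)
        for (i = 0; i < lens[r]; i++)
            prime_count[rels[r][i]]++;
}

/* Second pass: returns 0 if the relation contains a singleton prime. */
static int keep_relation(const uint32_t *primes, int n)
{
    int i;
    for (i = 0; i < n; i++)
        if (prime_count[primes[i]] < 2)
            return 0;
    return 1;
}
```

Repeating the two passes removes the second-level singletons created when first-level singletons are discarded.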
2007-12-29, 13:11  #4 
Jun 2003
The Texas Hill Country
10001000001_{2} Posts 
And, to carry Paul's suggestion even further, I have been doing that (with line-sieved relations only) even before eliminating the few duplicates that always manage to creep into a data set collected from distributed sources.
With lattice sieving, you will want to eliminate the duplicates first, because the duplication rate is so high. 
2007-12-29, 20:53  #5  
Tribal Bullet
Oct 2004
3^{2}·5·79 Posts 
Quote:
I did find that the difference between the true number of ideals and the number of ideals as estimated by the hashtable was much larger when using primes only (which isn't surprising). That would mean the size of the collisionless hashtable would be unnecessarily large, unless several passes were added that used an approximate hashtable and the ideals. Another possibility is to use the Judy library, which implements a compressed radix-tree associative data structure. Judy can map primes to unique counts in very little memory, and ideals to unique counts using much more memory. Unfortunately, the latter is where the memory bottleneck is anyway, so I figured it wasn't worth the added external library dependency.

Examining the log files that everyone is posting from big jobs, it seems that at the switchover point between the approximate and exact methods the number of ideals goes up by 10-20%, and this still represents hundreds of megabytes of wasted memory during the exact singleton removal. It would be nice to get rid of that somehow; right now this pass takes up about as much memory as the merge phase. 
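The undercounting at the switchover can be reproduced in miniature: with far more distinct keys than bins, the occupied-bin count (a stand-in for the approximate ideal count) is necessarily a lower bound on the true distinct count, so the count jumps when the exact method takes over. The table size, hash and names here are toy assumptions chosen to make the effect visible:

```c
#include <stdint.h>
#include <string.h>

/* Approximate distinct-count: number of occupied bins in a small hash
 * table. Colliding keys share a bin, so this can only undercount. */

#define SMALL_BITS 4
#define SMALL_SIZE (1u << SMALL_BITS)

static uint32_t toy_hash(uint64_t key)
{
    return (uint32_t)((key * UINT64_C(0x9e3779b97f4a7c15)) >> (64 - SMALL_BITS));
}

/* Returns a lower bound on the number of distinct keys, capped at the
 * number of bins. */
static int approx_distinct(const uint64_t *keys, int n)
{
    uint8_t seen[SMALL_SIZE];
    int i, bins = 0;
    memset(seen, 0, sizeof seen);
    for (i = 0; i < n; i++) {
        uint32_t b = toy_hash(keys[i]);
        if (!seen[b]) { seen[b] = 1; bins++; }
    }
    return bins;
}
```

With 40 distinct keys and 16 bins the approximate answer cannot exceed 16, mirroring how a congested table makes the estimated ideal count fall short of the exact one.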

2007-12-30, 11:17  #6  
Bamboozled!
May 2003
Down not across
2·7^{3}·17 Posts 
Quote:
Here is my qndsr.c program. It's small, inelegant and only does one pass per run of the program. Cleaning it up and making it do a better job is left as an exercise. Code:
/* Quick & dirty singleton remover.   PC Leyland 2002-09-04 */

/* Read relations from stdin and write a set of files which have
   first-level singletons removed.  Subsequent runs of the program are
   needed if singletons generated in the preceding pass are to be
   removed.

   The output files are limited to (slightly greater than) 1e9 bytes
   to avoid file size limits on common operating systems.  Uses
   temporary files not exceeding 1Gb each and total size ~< size of
   input.  There must be room for these, together with the output
   file(s).

   This is *not* an elegantly written program.  Anyone who wants to
   make it nicer and/or more robust is welcome to work on it.  The
   input file parser, in particular, is rather brittle.  strtol() from
   stdlib.h could probably be used for cleanliness but may be a bit
   slower because my code doesn't do error checking or work in bases
   other than 10. */

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

static unsigned char ratprimes[1<<27], algprimes[1<<27]; /* Enough for primes to 1G */

static int get_line (char *s, FILE *infile); /* renamed: POSIX now claims getline() */

int main (void)
{
    int linesize, tfileno = 0, tfileoutchars = 0, ofileno = 0, ofileoutchars = 0;
    int nratp, nalgp, p;
    int byte, crumb, maskhi, masklo;
    int i, tf, relcount = 0;
    char linebuf[10240], tfilename[24], ofilename[24], *s;
    FILE *tfile, *ofile;

    sprintf (tfilename, "__TEMP__%03d", tfileno);
    tfile = fopen (tfilename, "w");
    if (tfile == (FILE *) NULL)
    {
        fprintf (stderr, "Can't open temporary file %s for writing\n", tfilename);
        exit (1);
    }

    while ((linesize = get_line (linebuf, stdin)))
    {
        if (linebuf[0] != '0') continue; /* A relation? */
        nratp = linebuf[2] - '0';
        if (nratp < 0 || nratp > 9)
        {
            fprintf (stderr, "Strange # primes in 1st poly, %d\t %s", nratp, linebuf);
            continue;
        }
        nalgp = linebuf[3] - '0';
        if (nalgp < 0 || nalgp > 9)
        {
            fprintf (stderr, "Strange # primes in 2nd poly, %d\t %s", nalgp, linebuf);
            continue;
        }
        relcount++;
        s = linebuf + 5;
        while (*s++ != ' ');  /* Empty, skip a */
        while (*s++ != ' ');  /* Empty, skip b */
        for (i = 1; i <= nratp; i++)
        {
            p = 0;
            while (isdigit (*s)) p = p*10 + (*s++ - '0');
            s++;                       /* Skip space */
            p >>= 1;
            byte = p >> 2;
            crumb = (p & 3) * 2;
            maskhi = 2 << crumb;
            masklo = 1 << crumb;
            if (ratprimes[byte] & masklo)  /* Seen at least once */
                ratprimes[byte] |= maskhi; /* Set as seen at least twice */
            else
                ratprimes[byte] |= masklo; /* Set as seen at least once */
        }
        for (i = 1; i <= nalgp; i++)
        {
            p = 0;
            while (isdigit (*s)) p = p*10 + (*s++ - '0');
            s++;                       /* Skip space */
            p >>= 1;
            byte = p >> 2;
            crumb = (p & 3) * 2;
            maskhi = 2 << crumb;
            masklo = 1 << crumb;
            if (algprimes[byte] & masklo)  /* Seen at least once */
                algprimes[byte] |= maskhi; /* Set as seen at least twice */
            else
                algprimes[byte] |= masklo; /* Set as seen at least once */
        }
        tfileoutchars += fprintf (tfile, "%s", linebuf);
        if (tfileoutchars > 1000000000)
        {
            fclose (tfile);
            sprintf (tfilename, "__TEMP__%03d", ++tfileno);
            tfile = fopen (tfilename, "w");
            if (tfile == (FILE *) NULL)
            {
                fprintf (stderr, "Can't open temporary file %s for writing\n", tfilename);
                exit (1);
            }
            tfileoutchars = 0;
        }
    }
    printf ("Read %d (not necessarily distinct) relations\n", relcount);

    /* OK, now we have a bit map which tells which primes on each side
       have been seen at least twice, and we have one or more files
       containing clean relations (no comments, etc).  Read them in and
       write them out only if all primes in a relation are seen at
       least twice. */

    relcount = 0;
    sprintf (ofilename, "OUT__%03d", ofileno);
    ofile = fopen (ofilename, "w");
    if (ofile == (FILE *) NULL)
    {
        fprintf (stderr, "Can't open output file %s\n", ofilename);
        exit (1);
    }
    for (tf = 0; tf <= tfileno; tf++)
    {
        fclose (tfile);
        sprintf (tfilename, "__TEMP__%03d", tf);
        tfile = fopen (tfilename, "r");
        if (tfile == (FILE *) NULL)
        {
            fprintf (stderr, "Can't open temporary file %s for reading\n", tfilename);
            exit (1);
        }
        while ((linesize = get_line (linebuf, tfile)))
        {
            nratp = linebuf[2] - '0';
            nalgp = linebuf[3] - '0';
            s = linebuf + 5;
            while (*s++ != ' ');  /* Empty, skip a */
            while (*s++ != ' ');  /* Empty, skip b */
            for (i = 1; i <= nratp; i++)
            {
                p = 0;
                while (isdigit (*s)) p = p*10 + (*s++ - '0');
                s++;                   /* Skip space */
                p >>= 1;
                byte = p >> 2;
                crumb = (p & 3) * 2;
                maskhi = 2 << crumb;
                if (!(ratprimes[byte] & maskhi))  /* Not seen at least twice */
                    goto nextrel;
            }
            for (i = 1; i <= nalgp; i++)
            {
                p = 0;
                while (isdigit (*s)) p = p*10 + (*s++ - '0');
                s++;                   /* Skip space */
                p >>= 1;
                byte = p >> 2;
                crumb = (p & 3) * 2;
                maskhi = 2 << crumb;
                if (!(algprimes[byte] & maskhi))  /* Not seen at least twice */
                    goto nextrel;
            }
            /* Seen them all at least twice, so write it out. */
            ofileoutchars += fprintf (ofile, "%s", linebuf);
            relcount++;
            if (ofileoutchars > 1000000000)
            {
                fclose (ofile);
                sprintf (ofilename, "OUT__%03d", ++ofileno);
                ofile = fopen (ofilename, "w");
                if (ofile == (FILE *) NULL)
                {
                    fprintf (stderr, "Can't open output file %s for writing\n", ofilename);
                    exit (1);
                }
                ofileoutchars = 0;
            }
        nextrel: ;  /* Empty */
        }
    }
    fclose (tfile);
    fclose (ofile);
    printf ("Wrote %d (not necessarily distinct) relations\n", relcount);
    return 0;
}

static int get_line (char *s, FILE *infile)
{
    int c, i = 0;
    while ((c = getc (infile)) != EOF && c != '\n')
        if (c != '\r') s[i++] = c;  /* Strip out CR */
    if (c == '\n') s[i++] = c;
    s[i] = '\0';
    return (i);
}
P.S. I forgot to say this is for CWI-format relations. Converting to other formats is also left as an exercise.
Last fiddled with by xilman on 2007-12-30 at 11:22 Reason: Add P.S. 
