Corrupt data files discovered during post-processing
 2018-08-15, 17:27 #1 fivemack (loop (#_fork))     Feb 2006 Cambridge, England 2×3×29×37 Posts
When I run
Code:
gunzip -c L3655A.dat.gz | nl | grep -a "000000[^0-9a-f,]" | tee every-millionth
it suggests that there is a large block of corrupt data in the middle of L3655A.dat.gz: the file contains only about 378M lines, although the log reports 455M relations.

I'm having some difficulty working out which Q-values are affected, because some versions of the sieving client put the special-Q at the end of the list of factors and others sort the factors numerically, so given a line it is not trivial to identify its special-Q.

My inclination is to write a better special-Q-finder, figure out the corrupt ranges, and sieve them myself with 16e; I'm not sure I'll get that done before I go to Australia in eight days.
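One way such a special-Q-finder might work, sketched in Python rather than taken from any tool used in this thread: instead of relying on where the special-Q sits in the factor list, look for factors on the sieved side that fall inside the range of Q that was handed out. The sketch assumes the usual lasieve text layout `a,b:factors:factors` with factors in hexadecimal. The function name and the `side` parameter (0 for the first factor list, 1 for the second) are hypothetical. A line may yield zero candidates (corrupt, or sieved outside the range) or more than one (ambiguous), so the result is a list.

```python
def candidate_special_q(line, q_min, q_max, side):
    """Return the factors on the chosen side that lie in [q_min, q_max].

    Assumes a relation line of the form 'a,b:hex,hex,...:hex,hex,...'.
    side=0 selects the first factor list, side=1 the second.
    Malformed or corrupt lines give an empty list.
    """
    fields = line.strip().split(":")
    if len(fields) != 3:
        return []  # not a well-formed relation line
    try:
        factors = [int(tok, 16) for tok in fields[1 + side].split(",") if tok]
    except ValueError:
        return []  # non-hex debris in the factor list
    return sorted(p for p in factors if q_min <= p <= q_max)
```

Because the search is by value rather than by position, it does not matter whether the client appended the special-Q or sorted the factors numerically. Lines that return more than one candidate would still need a tie-break.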
 2018-08-15, 17:41 #2 pinhodecarlos     "Carlos Pinho" Oct 2011 Milton Keynes, UK 3·1,657 Posts Is it possible to cheat on sieving?
 2018-08-15, 21:38 #3 fivemack (loop (#_fork))     Feb 2006 Cambridge, England 6438₁₀ Posts At a first analysis, there are no special-Q larger than 966392000 in the output file, despite sieving supposedly having been done to 1200M. I don't see any suspiciously large gaps before Q=880M or so.
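The gap check described above could be sketched roughly as follows, assuming the special-Q values have already been extracted from the relations. The function name and the `max_gap` threshold are hypothetical. A sensible threshold depends on how densely relations are expected at each Q, so it is left as a parameter.

```python
def find_gaps(q_values, q_start, q_end, max_gap):
    """Return (lo, hi) ranges inside [q_start, q_end] with no observed special-Q.

    A range is reported only when consecutive observed values (or an observed
    value and either end of the interval) are more than max_gap apart.
    """
    gaps = []
    prev = q_start
    for q in sorted(set(q_values)):
        if q < q_start or q > q_end:
            continue  # ignore values outside the range that was sieved
        if q - prev > max_gap:
            gaps.append((prev, q))
        prev = q
    if q_end - prev > max_gap:
        gaps.append((prev, q_end))  # missing tail, e.g. nothing above the last Q seen
    return gaps
```

With the figures quoted in this post, a largest observed Q of 966392000 against a nominal end of 1200M would be reported as one trailing range from 966392000 to 1200000000.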
 2018-08-15, 21:46 #4 fivemack (loop (#_fork))     Feb 2006 Cambridge, England 2×3×29×37 Posts With my embarrassed hat on, I also need to point out that I forgot to put an lss: 0 directive in the polynomial file, so all the sieving was done on the lower-yielding side - I wondered why the yield was so low. I will fix this; it will take a few tens of thousands of core hours on my side, but that seems a reasonable penalty for me to pay.
2018-08-15, 21:51   #5
RichD

Sep 2008
Kansas

3443₁₀ Posts

Quote:
 Originally Posted by fivemack I will fix this, it will take a few tens of thousands of core hours at my side but that seems a reasonable penalty for me to pay.
You are only human like the rest of us. I have no problem letting the grid help this number out...

 2018-08-16, 00:20 #6 swellman     Jun 2012 6202₈ Posts Agreed. Penance is for sins, not simple mistakes. Use the grid. No stones being thrown here!
 2018-08-16, 01:11 #7 VBCurtis     "Curtis" Feb 2005 Riverside, CA 2⁷·3·13 Posts On my home copies of lasieve4e, Qmax is near 1060M. I get errors above that and the sieve exits without trying any Q's. Perhaps someone might test-sieve Q=1100M or 1200M to see if that's even possible on the BOINC-ified sievers?
2018-08-16, 07:04   #8
frmky

Jul 2003
So Cal

4214₈ Posts

Quote:
 Originally Posted by VBCurtis Perhaps someone might test-sieve Q=1100M or 1200M to see if that's even possible on the BOINC-ified sievers?
It's not for all 14e and 15e versions. IIRC the 64-bit Linux and FreeBSD clients will work, but the Mac and Windows ones will fail.

 2018-08-16, 07:24 #9 fivemack (loop (#_fork))     Feb 2006 Cambridge, England 2·3·29·37 Posts OK: my computers are doing the small gaps and the grid job L3655Ab is taking the strain. I will have to manage some of this on an iPad from the other side of the planet, which might be less efficient than the most efficient protocol but should work.
 2018-08-17, 09:57 #10 fivemack (loop (#_fork))     Feb 2006 Cambridge, England 6438₁₀ Posts I have replaced L3655Ab with L3655Ac, which has a more sensible sieving region and for which, importantly, I remembered to put the alim: and rlim: lines in the polynomial file.
 2018-10-13, 10:57 #11 fivemack (loop (#_fork))     Feb 2006 Cambridge, England 14446₈ Posts Another data quality issue: 5009_73m1 only has 345131720 usable relations despite a claim of 450653495, and so I can't yet build a matrix. I will investigate which regions are missing and put in a new sieving job.
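As a rough first pass at a discrepancy like this, one could compare the number of syntactically well-formed lines with the total line count. The sketch below assumes the `a,b:hex factors:hex factors` text layout. The names `RELATION_RE` and `count_wellformed` are hypothetical. "Usable" in the post-processing sense is likely stricter than "well-formed" (duplicates and relations that fail factor checks would also be dropped), so this only gives an upper bound.

```python
import re

# Assumed layout: signed decimal a, decimal b, then two comma-separated
# hexadecimal factor lists (either of which may be empty).
RELATION_RE = re.compile(r"^-?\d+,\d+:[0-9a-fA-F,]*:[0-9a-fA-F,]*$")

def count_wellformed(lines):
    """Return (well_formed, total) over an iterable of text lines."""
    good = total = 0
    for line in lines:
        total += 1
        if RELATION_RE.match(line.rstrip("\r\n")):
            good += 1
    return good, total
```

The function accepts any iterable of strings, so it can be fed from a file opened with `gzip.open(path, "rt", errors="replace")`. A badly damaged gzip stream may still raise an error partway through.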

