#1 |
Jun 2005
lehigh.edu
210 Posts |
I've been getting -np2 hanging, sometimes a day or more with no output at all, on the c187: once in 4M, and a second time in 5M. After some time fiddling to see which stage1 report(s) were causing the problem, it turns out that one of the -np1's reported an error in its stdout file, from the range "searching leading coefficients from 4000001 to 4400000":
Code:
error generating or reading NFS polynomials
(... file" without locating anything; empty missing lines at the start of the file ...). I hadn't actually looked at the file until I found the line below in msieve.dat.m, and then found a second one for the other range, "from 5000001 to 5400000":
Code:
4007700 24015536 295219382270877590927 801983503937356382677653357465991274
5084748 25083344 280739665478577317867 765040171045381367403062334481129902
The 4M report file hung on more than one line, although many/most of the other lines were OK. The 5M file didn't report any error.
Maybe -np2 should check that each msieve.dat.m line is properly formatted? Losing a few lines (of 1000s, 10000s) isn't a problem; the trouble is the hanging, and not knowing that something's gone wrong, so as to know to go on to the rest of the valid reports. Unless these stage1 reports indicate a problem in the code?
-Bruce
Last fiddled with by bdodson on 2010-11-01 at 01:33 Reason: typo
#2 | ||
May 2008
3×5×73 Posts |
Quote:
Quote:
Code:
/*------------------------------------------------------------------*/
static void
stage1_callback_log(mpz_t high_coeff, mpz_t p, mpz_t m,
		double coeff_bound, void *extra)
{
	FILE *mfile = (FILE *)extra;
	gmp_fprintf(mfile, "%Zd %Zd %Zd\n", high_coeff, p, m);
	fflush(mfile);
}
#3 |
Tribal Bullet
Oct 2004
3578₁₀ Posts |
Do you have multiple poly search processes writing to the same file? That could cause the problems you're seeing. Specifying a different argument to '-s' for each process (if you are not doing so now), or running the msieve binary from different directories, will cause output from different GPUs to go to different output files. Otherwise I'd suspect a filesystem problem that's making file writes collide.
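If several processes really must share one output file, one way to keep their lines from being spliced together is to build each line in memory and hand it to the kernel in a single write() on a descriptor opened with O_APPEND. The following is only a sketch of that idea under POSIX assumptions, not what msieve does (the logging callback quoted in post #2 uses gmp_fprintf() followed by fflush()). The names `open_append` and `append_line` and the 1024-byte buffer are made up for illustration. On a local filesystem each such write is appended as a unit; network filesystems such as NFS do not necessarily honor this.

```c
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: open `path` for appending, creating it if needed. */
int open_append(const char *path)
{
	return open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
}

/* Hypothetical helper: format one complete line into a local buffer and
   pass it to the kernel with a single write() call.
   Returns 0 on success, -1 on a formatting error, an over-long line,
   or a failed or short write. */
int append_line(int fd, const char *fmt, ...)
{
	char buf[1024];
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, sizeof(buf), fmt, ap);
	va_end(ap);

	/* vsnprintf reports the length it wanted; reject truncated lines */
	if (n < 0 || (size_t)n >= sizeof(buf))
		return -1;

	return write(fd, buf, (size_t)n) == (ssize_t)n ? 0 : -1;
}
```

A caller would open the file once with `open_append("msieve.dat.m")` and then call `append_line(fd, "%s %s %s\n", high_coeff_str, p_str, m_str)` for each hit, where the three strings are the decimal forms of the values (for GMP integers, mpz_get_str() can produce them).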
#4 | |
Jun 2005
lehigh.edu
210 Posts |
Quote:
-np2's also in different directories than the -np1's. I suppose I could check for disk errors by "sort -gk4 msieve.dat.m". Turns out that I missed one of the 5M's:
Code:
5000040 282950932555811249513 767572566639277931962886857122963054
5000040 282988014873105079573 767572566770261762635560319792058762
5000040 283489882496584278539 767572566780564045348713900359963743
...
5141820 303759607119684153587 763292097447179536382655903394573001
5141820 303874311272843118707 763292097967737416866712224883219685
5094360 25093256 290781228927362427487 764742170064234883654754500369055132
5084748 25083344 280739665478577317867 765040171045381367403062334481129902
Code:
4000260 264511855886585219909 802595085782688477156868964611859014
...
4128540 277554032589933354403 797544346807989836260266516658633381
4128540 277625813552862465761 797544346932719528504349552877501128
4007700 24015536 295219382270877590927 801983503937356382677653357465991274
4011384 24005864 265672669931552184923 802370401984067760730681286443632763
4010040 24005540 266427315108384809443 802383381517071143811386449187655371
#5 |
May 2008
3×5×73 Posts |
With those corrupted lines, here's where it's getting stuck:
gnfs/poly/stage2/stage2.c in pol_expand():
Code:
	mpz_tdiv_q_2exp(c->gmp_help1, gmp_d, (mp_limb_t)1);
	for (i = 0; i < degree; i++) {
		while (mpz_cmpabs(c->gmp_a[i], c->gmp_help1) > 0) {
			if (mpz_sgn(c->gmp_a[i]) < 0) {
				mpz_add(c->gmp_a[i], c->gmp_a[i], gmp_d);
				mpz_sub(c->gmp_a[i+1], c->gmp_a[i+1], gmp_p);
			}
			else {
				mpz_sub(c->gmp_a[i], c->gmp_a[i], gmp_d);
				mpz_add(c->gmp_a[i+1], c->gmp_a[i+1], gmp_p);
			}
		}
	}
#6 |
Tribal Bullet
Oct 2004
3578₁₀ Posts |
Argh, that while() loop should do two or three iterations at most...
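Since a well-formed line needs only a handful of passes, one possible guard is to count iterations and give up on the line once the count is exceeded. The sketch below mirrors the structure of the loop quoted in post #5, but on plain int64_t values instead of GMP integers so that it stands alone. The names `reduce_coeffs_capped` and `MAX_REDUCE_ITER` are made up, and the cap of 8 is an arbitrary value chosen to sit above the "two or three" iterations mentioned above.

```c
#include <stdint.h>
#include <stdlib.h>

/* Arbitrary illustrative cap; valid input is expected to need far fewer. */
#define MAX_REDUCE_ITER 8

/* Bring each a[i], 0 <= i < degree, into the range |a[i]| <= d/2 by
   adding or subtracting d, carrying the adjustment into a[i+1] as -p or +p.
   The array must hold degree+1 entries.
   Returns 0 on success, or -1 as soon as one coefficient needs more than
   MAX_REDUCE_ITER steps, which is taken as a sign of corrupt input. */
int reduce_coeffs_capped(int64_t *a, int degree, int64_t d, int64_t p)
{
	int64_t half = d / 2;
	int i;

	for (i = 0; i < degree; i++) {
		int iter = 0;

		while (llabs((long long)a[i]) > half) {
			if (++iter > MAX_REDUCE_ITER)
				return -1;	/* bail out instead of spinning */
			if (a[i] < 0) {
				a[i] += d;
				a[i + 1] -= p;
			} else {
				a[i] -= d;
				a[i + 1] += p;
			}
		}
	}
	return 0;
}
```

In the GMP version the same counter would sit inside the while() loop, and the caller would skip the offending line and move on to the next stage1 hit.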
#7 | |
Dec 2008
263₈ Posts |
Quote:
#8 | |
Jun 2005
lehigh.edu
210 Posts |
Quote:
Code:
150672 148717065853967295793 1546349397151620 148003488673044184871 154441084180341103999 8407965507924884
Code:
162060 151898518516855821613 1523978967286095124294544200586784087
162060 157845105824749120277 1523978967289425984216744549275830446
150672 148717065853967295793 1546349397151620 148003488673044184871 1544410841803411039998407965507924884
151164 13151032 128683849570608945163 1545611518056124233627904175463785373
(I'm not sure which hung. Both occur after the last stage1 hit that ran with a stage2 report; the one with 4 fields (of 3!) just shortly after the new one with 5 fields (of 3 ...).)
#9 |
"Serge"
Mar 2008
San Diego, Calif.
10100010011110₂ Posts |
It is probably not always 9 chars.
A couple strings collide in a random place like
XXXXXXXX XXXXXXXXXXXXyyyyyy yyyyyyyyyyyyyyyy yyyyyyyyyyyyyyyyyyyyyyyyy
For this last one, the proper blue string (the part after the |) seems to be
150672 148717065853967295793 1546349397| 151620 148003488673044184871 1544410841803411039998407965507924884
The red line should have its tail somewhere as a line with just one field, and could probably be rescued too.
Instead of sort -gk4, try awk 'NF!=3'
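The same field-count test could in principle be applied in C before a line is handed to stage 2. What follows is a minimal sketch, not code from msieve: `stage1_line_ok` is a hypothetical name, and it assumes that a good line is exactly three whitespace-separated decimal integers (an optional leading minus sign is tolerated). Like the awk filter, it will not catch a truncated line that still happens to have three fields.

```c
#include <ctype.h>

/* Hypothetical check: returns 1 if `line` holds exactly three
   whitespace-separated decimal integers, 0 otherwise. */
int stage1_line_ok(const char *line)
{
	const char *s = line;
	int fields = 0;

	for (;;) {
		/* skip separators, including a trailing newline */
		while (isspace((unsigned char)*s))
			s++;
		if (*s == '\0')
			break;

		/* optional sign, then at least one digit */
		if (*s == '-')
			s++;
		if (!isdigit((unsigned char)*s))
			return 0;
		while (isdigit((unsigned char)*s))
			s++;

		/* a field must end at whitespace or at the end of the string */
		if (*s != '\0' && !isspace((unsigned char)*s))
			return 0;
		fields++;
	}
	return fields == 3;
}
```

A reader loop could call this on every line from fgets() and skip, or log, the ones that fail, so that a few spliced lines cost only those lines rather than a hung run.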
#10 |
"Ed Hall"
Dec 2009
Adirondack Mtns
167C₁₆ Posts |
Is it possible that the fflush(mfile) is happening prior to the full completion of writing a line? Perhaps inserting a brief delay would show. . .
|
#11 |
"Serge"
Mar 2008
San Diego, Calif.
2·3·1,733 Posts |
Yeah, that's what Random Poster said a while ago. But he also said (I think) a deeper thing: that this is not necessarily this application's fault, but rather the fault of either gmp or the system libc. I tend to agree with that.
A similar (but not exactly the same) thing happened to Prime95, which printed some invalid factors with repeated digit patterns (which could hint at a bad memory alloc, but the margins of this message are too narrow to elaborate), and that defect was also OS-specific. I am tempted to look at Prime95's source and see if he simply wrote around the library bug in disgust.
Is libgmp linked statically in this particular binary that emits errors?
Last fiddled with by Batalov on 2010-11-09 at 19:49 Reason: narrow, naroow, tpyos... blegh