[quote=scalabis;210032]Hi!
I have found a problem with this solution. If I use the version of factmsieve.py from 19 March on an 8-core machine, everything runs OK. If I try the "beta test" version dated 20 March, with this patch and the one from the previous post of course, it fails after sieving the first range from 90000 to 100000. If necessary for debugging, I will post the error here after running the current job.[/quote] I'll take a look at the error when you can let me have the error output. Brian |
Hi!
Here it is ... I started factoring Jeff Gilchrist's 100-digit example. First, the screen shows 7 blocks, one for each of the first 7 "cores" to finish its task. Then the last block:
[quote]Total yield: 138207
611/202 mpqs failures, 13907/3924 vain mpqs
milliseconds total: Sieve 121536 Sched 0 medsched 34487
TD 329455 (Init 7574, MPQS 224457) Sieve-Change 137961
TD side 0: init/small/medium/large/search: 1163 10691 1627 2329 28048
sieve: init/small/medium/large/search: 4384 23345 2510 13808 16633
TD side 1: init/small/medium/large/search: 673 9264 1847 3137 263102
sieve: init/small/medium/large/search: 2155 15620 1876 18260 22945
[/quote]
And then:
[quote]appending spairs.out.T0 to spairs.out
appending spairs.out.T1 to spairs.out
appending spairs.out.T2 to spairs.out
appending spairs.out.T3 to spairs.out
appending spairs.out.T4 to spairs.out
appending spairs.out.T5 to spairs.out
appending spairs.out.T6 to spairs.out
appending spairs.out.T7 to spairs.out
appending spairs.out to example.dat
Found 1187224 relations, 19.8% of the estimated minimum (5985000).
compressing spairs.out to spairs.save.gz
[/quote]
And finally:
[quote]
Traceback (most recent call last):
  File "C:\ggnfs\factmsieve.py", line 1985, in <module>
    run_siever(client_id, num_clients, NUM_THREADS, fact_p, lats_p)
  File "C:\ggnfs\factmsieve.py", line 1603, in run_siever
    sieve_lim = make_sieve_jobfile(JOBNAME, fact_p, poly_p, lats_p)
  File "C:\ggnfs\factmsieve.py", line 1491, in make_sieve_jobfile
    .format(ql, qp, qh, t_fname))
ValueError: Unknown format code 'd' for object of type 'float'
siever terminated
C:\ggnfs\example2>
[/quote]
Best regards, scalabis |
Here is a version that attempts to solve the error reported above and seeks to continue sieving if there is an abnormal thread exit.
Brian |
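For reference, the ValueError in the traceback above is what Python 3 raises when a float is passed to a `d` (integer) format code, for example when a range limit has been computed with true division. The sketch below reproduces the error and shows one possible workaround, converting to int before formatting. The function names and sample values are hypothetical and are not taken from factmsieve.py; the actual change made in the script may differ.

```python
def rejects_d_format(value):
    """Return True if '{0:d}' refuses to format the given value."""
    try:
        '{0:d}'.format(value)
    except ValueError:
        # Raised for floats: "Unknown format code 'd' for object of type 'float'"
        return True
    return False


def format_q_range(q_start, q_len):
    """Format a Q range with integer format codes (hypothetical helper).

    int() truncates any float, so the 'd' format code accepts the result.
    """
    return '{0:d} {1:d}'.format(int(q_start), int(q_len))


if __name__ == '__main__':
    per_thread = 10000 / 8               # true division always yields a float
    print(rejects_d_format(per_thread))  # the float is rejected by 'd'
    print(format_q_range(90000, per_thread))
```

Using floor division (`//`) when the value is first computed is an alternative that avoids creating the float at all.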
Hi Brian!
Is this a new version that includes the latest "patches" and corrections, or is it "experimental" like the last one? I mean, should I use the version from 19 March or this one? Thank you for all your work. Best regards, scalabis |
[quote=scalabis;210112]Hi Brian!
Is this a new version that includes the latest "patches" and corrections, or is it "experimental" like the last one? I mean, should I use the version from 19 March or this one? Thank you for all your work.[/quote] I now consider this to be a full version, although it does contain a big change in how thread jobs are allocated. It might contain errors as a result, but it is a much better basis for future work, so it is worth adopting from here on. I am sorry to use you guys as bug finders but I assure you it will be worth it in the end. Brian |
[QUOTE=Brian Gladman;210118]I now consider this to be a full version although it does contain a big change in how thread jobs are allocated. It might contain errors as a result but it is a much better basis for future work so it is worth adopting from here on.
I am sorry to use you guys as bug finders but I assure you it will be worth it in the end. Brian[/QUOTE] Okay, out of 76 snfs jobs I've recently run, the old script stopped on 8 of them because the siever crashed. I finished running all 8 of those overnight with the new script and they all ran to completion. Thank you for the update!
The one thing I've noticed that is different is the Q ranges that were done. Since I'm running 3 threads, there are some rounding issues when picking the 3 ranges. Originally it would choose an increment of 16667. With this script it chose 16666. So, for example, one block in the new script would be from
400000 to 416666
416666 to 433332
433332 to 449998
whereas the old script would do
400000 to 416667
416667 to 433334
433334 to 450001
I know it is a very small difference. It might not be worth changing, but I wanted to let you know all the same. Thanks again for the update. |
Thanks for the report on the changed ranges. I used to sieve up to the nearest integer to the calculated endpoint but I recently discovered that the siever doesn't sieve right up to the endpoint, only close to it. I hence now truncate to integer as the extra work of going to nearest makes no substantive difference.
Brian |
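The difference between the two sets of ranges reported above can be reproduced with a short sketch. It assumes a block of 50000 Q values split over 3 threads (the block size is inferred from the reported increments, not stated in the thread), and the function name is hypothetical rather than anything in factmsieve.py.

```python
def split_q_range(q_start, q_len, threads, to_nearest=False):
    """Split a block of Q values into one (start, end) range per thread.

    With to_nearest=False the per-thread step is truncated to an integer;
    with to_nearest=True it is rounded to the nearest integer instead.
    """
    step = q_len / threads
    step = int(round(step)) if to_nearest else int(step)
    return [(q_start + i * step, q_start + (i + 1) * step)
            for i in range(threads)]


if __name__ == '__main__':
    for lo, hi in split_q_range(400000, 50000, 3):
        print('{0:d} to {1:d}'.format(lo, hi))
    for lo, hi in split_q_range(400000, 50000, 3, to_nearest=True):
        print('{0:d} to {1:d}'.format(lo, hi))
```

Truncation gives a step of 16666 and leaves the last range ending just short of 450000, while rounding gives 16667 and overshoots by one.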
factmsieve and/or msieve woes...
(This may also be an msieve question, but I didn't want to double post. If it needs moving, please do so.)
OK, Gurus, Yesterday was disappointing for all three currently (kind of) running machines, but to the point of this particular one: I had to interrupt it during the linear algebra phase.:sad: It was using factmsieve.py 0.51. Here's the console readout on restart:
[code]
-> ________________________________________________________________
-> | Running factmsieve.py, a Python driver for MSIEVE with GGNFS |
-> | sieving support. It is Copyright, 2010, Brian Gladman and is |
-> | a conversion of factmsieve.pl that is Copyright, 2004, Chris |
-> | Monico. This is version 0.51, dated 8th February 2010.       |
-> |______________________________________________________________|
-> This is client 1 of 1
-> Using 1 threads
-> Working with NAME = test
-> Selected default factorization parameters for 110 digit level.
-> Selected lattice siever: gnfs-lasieve4I13e
-> Creating param file to detect parameter changes...
-> Running setup ...
-> Estimated minimum relations needed: 6.81625e+06
-> cleaning up before a restart
-> Running lattice siever ...
-> entering sieving loop
-> Running matrix solving step ...
-> msieve -s test.dat -l test.log -i test.ini -v -nf test.fb -t 1 -nc2
Msieve v. 1.44
Wed Mar 31 10:27:30 2010
random seeds: 21e5e205 80726848
factoring 46004911275165940307243009671749303433164591319941166183746232677096721432698537657838056180803206525560833971 (110 digits)
searching for 15-digit factors
commencing number field sieve (110-digit input)
R0: -1515233566215398305176
R1: 466586281249
A0: 1672008125872783936096675
A1: 1010228519387346987530
A2: 256220197422426093
A3: -5356375710872
A4: -697665346
A5: 5760
skew 20311.62, size 1.953373e-10, alpha -4.980689, combined = 1.118429e-09
commencing linear algebra
read 381365 cycles
cycles contain 1307088 unique relations
read 0 relations
error: cannot locate relation 7557799
Return value 255. Terminating...
[/code]All subsequent tries produce the same result. (Imagine that!)
I have also tried factmsieve.py 0.62 with the existing data and its result is:
[code]
-> ________________________________________________________________
-> | Running factmsieve.py, a Python driver for MSIEVE with GGNFS |
-> | sieving support. It is Copyright, 2010, Brian Gladman and is |
-> | a conversion of factmsieve.pl that is Copyright, 2004, Chris |
-> | Monico. This is version 0.62, dated 30th March 2010.         |
-> |______________________________________________________________|
-> This is client 1 of 1
-> Using 1 threads
-> Working with NAME = test
-> Selected default factorization parameters for 110 digit level.
-> Selected lattice siever: gnfs-lasieve4I13e
-> Creating param file to detect parameter changes...
-> Running setup ...
-> Estimated minimum relations needed: 6.81625e+06
-> cleaning up before a restart
-> Running lattice siever ...
-> entering sieving loop
-> Running matrix solving step ...
-> ./msieve -s ../aliqueit/ggnfs_46004911275165940307243009671749303433164591319941166183746232677096721432698537657838056180803206525560833971/test.dat -l ../aliqueit/ggnfs_46004911275165940307243009671749303433164591319941166183746232677096721432698537657838056180803206525560833971/test.log -i ../aliqueit/ggnfs_46004911275165940307243009671749303433164591319941166183746232677096721432698537657838056180803206525560833971/test.ini -nf ../aliqueit/ggnfs_46004911275165940307243009671749303433164591319941166183746232677096721432698537657838056180803206525560833971/test.fb -t 1 -nc2
Traceback (most recent call last):
  File "/home/tester/MathStuff/ggnfs/factmsieve.py", line 2014, in <module>
    die('Return value {0:d}. Terminating...'.format(ret))
NameError: name 'ret' is not defined
siever terminated
[/code]Is there a simple fix, or should I just start over? Thanks for all the work and any word of wisdom that can be sent my way... |
The last try raises an error in the script - instead of:
[code]
if run_msieve('-t {0:d} -nc2'.format(NUM_CPUS)):
  die('Return value {0:d}. Terminating...'.format(ret))
[/code]the lines around 2014 should be:
[code]
ret = run_msieve('-t {0:d} -nc2'.format(NUM_CPUS))
if ret:
  die('Return value {0:d}. Terminating...'.format(ret))
[/code]But it says that msieve is returning an error value of some kind. Brian |
Is there a test.dat file with all the relations in it? A while ago (before the .py script existed) I found that forcing the .pl script to terminate during linear algebra may trigger the deletion of the .dat file; the bug has never been found, although no-one has really tried to find it. It may have been carried over to the new script, but did you interrupt the linear algebra or did the error just occur?
|
[B]Thanks Brian,[/B]
I changed the lines and it now just goes to the 255 error. (I guess that's from msieve?)
[B]Thanks 10metreh, also,[/B]
My test.dat file has only "N <110 digit number>" in it, so maybe it was deleted. I interrupted the Linear Algebra with CTRL-C. No errors appeared until the restart. I've since renamed the original directory, started a new run with 0.62 (with edit) and copied the test.poly and spairs.save.gz files from the original into the new directory to see what happens. I am wondering if I should extract the spairs.save...
This particular issue isn't more than a one to two-day loss, really, but the learning might help me if I get into a similar situation with a longer run. I did recently take my 24/7 machine on a trip and had a factmsieve.py error that lost a week's worth, but wasn't able to spend any time working with it, so I swapped over to the .pl script to let it run that number. We're a few versions later with the .py script now, so I'm back to the .py on that machine to see how it does next time.
Thanks again for all the help... |
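On the question of extracting spairs.save.gz: the earlier log in this thread shows the script appending spairs.out to the .dat file and then compressing spairs.out to spairs.save.gz, so one thing to try is appending the decompressed archive back onto a .dat file that has lost its relations. The sketch below does only that. The function name is hypothetical, and whether the archive holds every relation from the run, or whether msieve will accept the result, is an assumption that has not been verified here. Work on a copy of the files.

```python
import gzip
import shutil


def append_saved_relations(gz_path, dat_path):
    """Append the decompressed contents of gz_path to the end of dat_path.

    Hypothetical helper: dat_path is opened in append mode, so an existing
    first line such as "N <number>" is left untouched.
    """
    with gzip.open(gz_path, 'rb') as src, open(dat_path, 'ab') as dst:
        shutil.copyfileobj(src, dst)


if __name__ == '__main__':
    # File names taken from the posts above; adjust paths as needed.
    append_saved_relations('spairs.save.gz', 'test.dat')
```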