mersenneforum.org (https://www.mersenneforum.org/index.php)
-   CADO-NFS (https://www.mersenneforum.org/forumdisplay.php?f=170)
-   -   CADO-NFS error (exit code -6) (https://www.mersenneforum.org/showthread.php?t=25842)

EdH 2020-08-21 14:56

I swapped over to a newer commit (Aug 5) and remembered why I wasn't using it - It won't communicate properly with clients:
[code]
ERROR:root:Invalid workunit file: Error: key STDOUT not recognized
[/code]I wonder if this is a conflict between commits, and the clients have to be on a version closer to the server's. In that case, I won't be able to use later commits, because I still have some Core2 machines. . .

RedGolpe 2020-08-21 15:12

It seems the good guys at INRIA are already looking into my report. They don't seem to require more information for now.

EdH 2020-08-21 16:20

I'll read the posts when I get my digest version. For now, I'm going to run my September commit and see what shows up later. I'll check the latest git again later on and see if the client communication issue has disappeared.

bur 2021-05-03 10:18

Unfortunately, I ran into that error on a C153 which ran over the weekend:

[CODE]Warning:Command: Process with PID 849626 finished with return code -6
Error:Filtering - Duplicate Removal, removal pass: Program run on server failed with exit code -6
Error:Filtering - Duplicate Removal, removal pass: Command line was: /home/florian/Math/cado-nfs/build/florian-Precision-3640-Tower/filter/dup2 -poly ./workdir/AL30081984/1971-C153/c155.poly -nrels 62519376 -renumber ./workdir/AL30081984/1971-C153/c155.renumber.gz ./workdir/AL30081984/1971-C153/c155.dup1//0/dup1.0.0000.gz ./workdir/AL30081984/1971-C153/c155.dup1//0/dup1.0.0001.gz > ./workdir/AL30081984/1971-C153/c155.dup2.slice0.stdout.4 2> ./workdir/AL30081984/1971-C153/c155.dup2.slice0.stderr.4
Error:Filtering - Duplicate Removal, removal pass: Stderr output (last 10 lines only) follow (stored in file ./workdir/AL30081984/1971-C153/c155.dup2.slice0.stderr.4):
Error:Filtering - Duplicate Removal, removal pass: antebuffer set to /home/florian/Math/cado-nfs/build/florian-Precision-3640-Tower/utils/antebuffer
Error:Filtering - Duplicate Removal, removal pass: [checking true duplicates on sample of 750234 cells]
Error:Filtering - Duplicate Removal, removal pass: Allocated hash table of 75023359 entries (286MiB)
Error:Filtering - Duplicate Removal, removal pass: Constructing the two filelists...
Error:Filtering - Duplicate Removal, removal pass: 2 files (2 new and 0 already renumbered)
Error:Filtering - Duplicate Removal, removal pass: Reading files already renumbered:
Error:Filtering - Duplicate Removal, removal pass: Reading new files (using 3 auxiliary threads for roots mod p):
Error:Filtering - Duplicate Removal, removal pass: terminate called after throwing an instance of 'renumber_t::corrupted_table'
Error:Filtering - Duplicate Removal, removal pass: what(): Renumber table is corrupt: cannot find p=0x4a2bfa9, r=0xd70340 on side 1; note: vp=0x4a2bfb6, vr=0xd70340
Error:Filtering - Duplicate Removal, removal pass:
Traceback (most recent call last):
  File "./cado-nfs.py", line 122, in <module>
    factors = factorjob.run()
  File "./scripts/cadofactor/cadotask.py", line 6131, in run
    last_status = task.run()
  File "./scripts/cadofactor/cadotask.py", line 3845, in run
    raise Exception("Program failed")
Exception: Program failed[/CODE]Restarting with parameters.snapshot.0 didn't help.

It seems I can still use the relations by having msieve continue the work? How would I do that?

According to [url]https://www.mersenneforum.org/showthread.php?t=11948&page=21#227[/url] it seems I can cat the gz files and have msieve process them. But if one of the files is apparently corrupted, how do I find out which one? They all have a size between 3 and 7 MB. I did a zcat | grep and the missing 4a2bfa9 prime is present in some relation, but does that help?
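One way to narrow it down might be to grep each compressed slice for the hex value of the offending prime (the error names p=0x4a2bfa9; relation files list primes in hex without the 0x prefix). A minimal sketch, using made-up sample files in place of the real dup1.*.gz slices and an illustrative relation format:

```shell
# Create a scratch directory with two stand-in relation files;
# only the first one mentions the prime 4a2bfa9.
dir=$(mktemp -d)
printf '123,456:4a2bfa9,11:1f,2b\n' | gzip > "$dir/dup1.0.0000.gz"
printf '124,457:abc,11:1f,2b\n'     | gzip > "$dir/dup1.0.0001.gz"

# Report every file whose contents mention the prime.
for f in "$dir"/*.gz; do
    if zcat "$f" | grep -q '4a2bfa9'; then
        echo "$(basename "$f") mentions 4a2bfa9"
    fi
done
```

A hit only tells you which files reference that prime, though, not that the file itself is bad; as discussed below, the corruption appears to be in CADO's renumber table rather than in the relations.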

[SIZE="1"]Please don't tell me all is lost...[/SIZE]

bur 2021-05-03 12:13

So I just ignored the cado error message and used the relations with msieve. In case someone has the same problem in the future:

All required files are in workdir/cxxx.upload.
First combine all gz compressed relations into one rels.dat:
[CODE]zcat *.gz > rels.dat[/CODE]

Then use convert_poly (found in cado-nfs/build/&lt;machine&gt;/misc, where &lt;machine&gt; is the build hostname) to convert the cnnn.poly file to cnnn.fb:
[CODE]convert_poly -if cado -of msieve < c155.poly > c155.fb[/CODE]
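For reference, the conversion essentially just relabels fields between the two formats. With illustrative (made-up) values, a degree-5 CADO .poly and the corresponding msieve .fb look roughly like this:

```text
# CADO c155.poly (excerpt)          # msieve c155.fb
n: 8343...209                       N 8343...209
skew: 1234567.89                    SKEW 1234567.89
c0: -9876543210                     A0 -9876543210
...                                 ...
c5: 360360                          A5 360360
Y0: -1234567890123                  R0 -1234567890123
Y1: 987654321                       R1 987654321
```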

I suggest copying both files to a new directory so nothing gets accidentally modified. Create a cnnn.n file with the number to be factored and then run:
[CODE]../msieve/msieve -i c155.n -s rels.dat -l c155msieve.log -nf c155.fb -t 10 -nc1
../msieve/msieve -i c155.n -s rels.dat -l c155msieve.log -nf c155.fb -t 10 -nc2
../msieve/msieve -i c155.n -s rels.dat -l c155msieve.log -nf c155.fb -t 10 -nc3[/CODE]

Currently I'm at the -nc2 step and it's performing LA with an ETA of 2:20 hours.

For the sake of completeness: if not enough relations are found, see [url]https://www.mersenneforum.org/showthread.php?t=11948&page=21#230[/url] for how to make cado-nfs do more sieving. After that it should be possible to use msieve as explained above.

charybdis 2021-05-03 13:04

[QUOTE=bur;577517][CODE]../msieve/msieve -i c155.n -s rels.dat -l c155msieve.log -nf c155.fb -t 10 -nc1
../msieve/msieve -i c155.n -s rels.dat -l c155msieve.log -nf c155.fb -t 10 -nc2
../msieve/msieve -i c155.n -s rels.dat -l c155msieve.log -nf c155.fb -t 10 -nc3[/CODE][/QUOTE]

[C]-nc[/C] performs all of [C]-nc1[/C], [C]-nc2[/C], [C]-nc3[/C] in succession.

EdH 2021-05-03 13:09

Good post!

I thought I had posted a "How I ..." on using CADO-NFS for poly/sieving and Msieve for LA, but apparently I've been slacking. This is how I run all my larger jobs. I had originally written my own conversion (for the .fb), before I learned of the provided one.

For some of my scripts, I do a check for *.cyc after the -nc1 step. The scripts use the existence of that file to tell whether filtering succeeded or not. Then the scripts can either call -nc2 or call for more sieving.
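The check EdH describes might be sketched like this; the helper function and file names are hypothetical, and a scratch directory stands in for a real msieve -nc1 run:

```shell
# Filtering succeeded iff msieve's -nc1 step left a *.cyc file behind.
filtering_succeeded() {
    ls "$1"/*.cyc >/dev/null 2>&1
}

# Demo with a scratch directory instead of an actual msieve run:
job=$(mktemp -d)
filtering_succeeded "$job" || echo "no cycle file - sieve more relations"
touch "$job/c155.cyc"      # msieve -nc1 would create this on success
filtering_succeeded "$job" && echo "cycle file found - safe to run -nc2"
```

A driver script can branch on that result: call -nc2 when the cycle file exists, otherwise go back to sieving.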

Not sure if you know this (you probably do), but if -nc2 is interrupted, use -ncr to continue. If you use -nc2 again, it will start LA from scratch.

bur 2021-05-03 13:32

Thanks, it's basically your linked post with the small addition of how to convert the poly to fb. I'm glad this error can easily be worked around, otherwise I'd be quite nervous about longer jobs.

Not sure why cado-nfs chokes on the rels while msieve has no problem with them.

[QUOTE]This is how I run all my larger jobs.[/QUOTE]Why is that? Is msieve faster on those steps?

[QUOTE]-nc performs all of -nc1, -nc2, -nc3 in succession.[/QUOTE]Yes, and EdH already mentioned that in his post. I still used the separate steps since I wasn't sure it would work at all with the corruption cado-nfs complained about.

charybdis 2021-05-03 13:47

[QUOTE=bur;577526]Not sure why cado-nfs chokes on othe rels while msieve has no problem with them.[/quote]

I don't think there's anything wrong with the relations, it's a bug in the way that CADO duplicate removal processes them. And if a few relations are bad, then msieve will just ignore them.

[QUOTE]Why is that? Is msieve faster on those steps?[/quote]

The most time-consuming part of the postprocessing, the linear algebra (-nc2), is substantially faster with msieve than with CADO. In addition, CADO uses much more memory than msieve during the filtering stage, so a given machine will be able to run larger numbers with msieve than with CADO.

bur 2021-05-03 14:16

Ah, that's good to know!

Maybe a stupid question, but since msieve is open source, why isn't cado-nfs's linear algebra implementation just taken from msieve?

VBCurtis 2021-05-03 14:45

CADO's algorithm features less interprocess communication during the (longest) first stage of matrix solving than msieve's, which allows jobs to be split among machines fruitfully. This lets larger jobs run on regular hardware.

An ideal solution would be to have an -msieve flag in CADO which runs the matrix using msieve within the cado-nfs.py wrapper.

