mersenneforum.org

wombatman 2021-07-25 13:39

Re-generating the dups files
 
I inadvertently deleted the folder with the dups files, and I can't figure out how to generate them again (or get CADO to do it automatically). This causes an error after CADO has collected enough relations:

OSError: output file /tmp/cado.v9x5al17/c212_snfs215.dup1//1/dup1.0.0000.gz does not exist

How can I re-create this file or set of files?

charybdis 2021-07-25 14:31

Re-create the directory [C]/tmp/cado.v9x5al17/c212_snfs215.dup1[/C], and make subdirectories named [C]0[/C] and [C]1[/C] inside it.
Re-run all of the [C]dup1[/C] and [C]dup2[/C] commands in the [C]c212_snfs215.cmd[/C] file in order. If this runs smoothly then the [C]0[/C] and [C]1[/C] directories should contain some large files with names like [C]dup1.0.0000.gz[/C].
Now restart CADO.
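
A minimal sketch of the first step, in case it helps. The function name and its two arguments are placeholders, not anything CADO defines; in this thread they would be [C]/tmp/cado.v9x5al17[/C] and [C]c212_snfs215[/C].

```shell
# Sketch only: recreate the dup1 output directory and its two slice subdirectories.
# "workdir" and "name" are illustrative variables; substitute your own values.
recreate_dup1_dirs() {
    workdir=$1
    name=$2
    mkdir -p "$workdir/$name.dup1/0" "$workdir/$name.dup1/1"
}

# Example call (paths taken from the error message above):
#   recreate_dup1_dirs /tmp/cado.v9x5al17 c212_snfs215
```

[C]mkdir -p[/C] is harmless if the directories already exist, so it is safe to run more than once.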

wombatman 2021-07-25 16:34

The first step:

[CODE]/home/wombat/cado-nfs/build/Ben-PC/filter/dup1 -prefix dup1.2 -out /tmp/cado.v9x5al17/c212_snfs215.dup1/ -n 1 -filelist /tmp/cado.v9x5al17/c212_snfs215.dup1.filelist.3 > /tmp/cado.v9x5al17/c212_snfs215.dup1.stdout.3 2> /tmp/cado.v9x5al17/c212_snfs215.dup1.stderr.3[/CODE]

works fine.

This command:

[CODE]/home/wombat/cado-nfs/build/Ben-PC/filter/dup2 -poly /tmp/cado.v9x5al17/c212_snfs215.poly -nrels 29153560 -renumber /tmp/cado.v9x5al17/c212_snfs215.renumber.gz /tmp/cado.v9x5al17/c212_snfs215.dup1//0/dup1.0.0000.gz /tmp/cado.v9x5al17/c212_snfs215.dup1//0/dup1.1.0000.gz /tmp/cado.v9x5al17/c212_snfs215.dup1//0/dup1.2.0000.gz > /tmp/cado.v9x5al17/c212_snfs215.dup2.slice0.stdout.3 2> /tmp/cado.v9x5al17/c212_snfs215.dup2.slice0.stderr.3[/CODE]

is aborting.

charybdis 2021-07-25 16:46

[QUOTE=wombatman;583959]The first step:

[CODE]/home/wombat/cado-nfs/build/Ben-PC/filter/dup1 -prefix dup1.2 -out /tmp/cado.v9x5al17/c212_snfs215.dup1/ -n 1 -filelist /tmp/cado.v9x5al17/c212_snfs215.dup1.filelist.3 > /tmp/cado.v9x5al17/c212_snfs215.dup1.stdout.3 2> /tmp/cado.v9x5al17/c212_snfs215.dup1.stderr.3[/CODE]

works fine.
[/QUOTE]

This cannot have been the first dup1 command: the [C].3[/C] endings on the files mean that it was the third. Try emptying the dup1 folder and trying again from the start; make sure you look back far enough in the [C].cmd[/C] file to find all the dup1 and dup2 commands.
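
One way to avoid missing an early command is to list them all with their line numbers. This is a rough sketch that assumes each command sits on a single line of the [C].cmd[/C] file and contains [C]/filter/dup1[/C] or [C]/filter/dup2[/C] followed by a space, as in the commands quoted above; the function name is made up for illustration.

```shell
# Sketch only: print every dup1/dup2 command line from a .cmd file, in file order,
# each prefixed with its line number in that file.
# Assumes one command per line, with the binary path ending in /filter/dup1 or /filter/dup2.
list_dup_commands() {
    grep -nE '/filter/dup[12] ' "$1"
}

# Example call:
#   list_dup_commands /tmp/cado.v9x5al17/c212_snfs215.cmd
```

The [C].1[/C], [C].2[/C], [C].3[/C] endings on the stdout/stderr redirections in each printed line then show which run a command belongs to.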

wombatman 2021-07-25 17:09

You're right, it's not. But even when I run the first instances, it does the same thing. [STRIKE]I notice that the command line contains a parameter "-nrels 24996721". Does this need to be updated to reference a split of the current number of relations (~120M)?[/STRIKE] Didn't work.

Edit: I do notice that there's no dup1.filelist as expected from the first command, and there's nothing in the stdout.1 output file either.

charybdis 2021-07-25 18:09

So you don't have a file [C]c212_snfs215.dup1.filelist.1[/C]? That's odd, it obviously existed when the command ran originally, and it doesn't usually get deleted during the run.

If you definitely have lost the filelist and need to recreate it, use something like
[code]grep -o "in .*gz" c212_snfs215.log | cut -c 5- >> c212_snfs215.dup1.filelist[/code]
and then cut it into chunks: everything up to the first appearance of "Reached target of xxxxxxxx relations" in the logfile is [C]filelist.1[/C], everything from there to the next "Reached target" (NOT including the contents of [C]filelist.1[/C]!) is [C]filelist.2[/C], and so on.
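
The extraction and the chunking can also be done in one pass over the logfile. The following is only a sketch: it mirrors the grep/cut pipeline above, starts a new chunk whenever a line matches "Reached target of <digits> relations", and assumes those marker lines do not themselves name a relation file. The function name and the output prefix are hypothetical.

```shell
# Sketch only: write <prefix>.1, <prefix>.2, ... from a CADO logfile.
#   $1 = logfile, $2 = output prefix (e.g. c212_snfs215.dup1.filelist)
# Every line matching "in .*gz" contributes one path; each "Reached target of N relations"
# line closes the current chunk and starts the next one.
split_filelist() {
    awk -v prefix="$2" '
        BEGIN { n = 1 }
        /Reached target of [0-9]+ relations/ { n++; next }
        match($0, /in .*gz/) {
            # Same effect as: grep -o "in .*gz" | cut -c 5-
            print substr($0, RSTART + 4, RLENGTH - 4) > (prefix "." n)
        }
    ' "$1"
}

# Example call:
#   split_filelist c212_snfs215.log c212_snfs215.dup1.filelist
```

Because each chunk is written to its own file, nothing from [C]filelist.1[/C] can end up in [C]filelist.2[/C].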

The -nrels flag in the dup2 command line refers to the number of relations in one slice, i.e. roughly half the number of relations you had at the time the command was run.
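
If you want a cross-check for that number, counting the lines in one slice's files should give it. This is a sketch under assumptions: it treats every line not starting with [C]#[/C] as one relation, and the function name is invented for the example.

```shell
# Sketch only: count the relations in one slice directory.
#   $1 = slice directory, e.g. /tmp/cado.v9x5al17/c212_snfs215.dup1/0
# Assumes one relation per line; lines starting with "#" are skipped as comments.
count_slice_relations() {
    gzip -dc "$1"/dup1.*.gz | grep -vc '^#'
}

# Example call:
#   count_slice_relations /tmp/cado.v9x5al17/c212_snfs215.dup1/0
```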

wombatman 2021-07-25 19:35

Thanks to your help, I'm getting closer. After rebuilding the filelist, I was able to run the dup2 commands. When I try to start the primary python script, it fails again because it's looking for a file that doesn't exist (dup1.1.0000.gz). I checked both the dup1.filelist.1 file and the purge.filelist.1 file. The purge filelist listed files that no longer exist, so I pared it down to only the two files (plus the freerels.gz) I had created: dup1.0.0000.gz in folders 0 and 1.

Is there another filelist that the main script is pulling from? I've searched the /tmp/cado* directory and don't see anything else.

The grep command is also not collecting all of the completed workunits, missing about 20M q worth. I've confirmed via spot check that the missing workunits appear in the log file with the same format as the collected workunits.

charybdis 2021-07-25 19:50

[QUOTE=wombatman;583968]Thanks to your help, I'm getting closer. After rebuilding the filelist, I was able to run the dup2 commands. When I try to start the primary python script, it fails again because it's looking for a file that doesn't exist (dup1.1.0000.gz). I checked both the dup1.filelist.1 file and the purge.filelist.1 file. The purge filelist listed files that no longer exist, so I pared it down to only the two files (plus the freerels.gz) I had created: dup1.0.0000.gz in folders 0 and 1.[/quote]

dup1.1.0000.gz should be created by the second dup1 run - confusingly, CADO isn't consistent on whether to start its indexing at 0 or 1. Assuming you did run dup1 a second time, what are the contents of dup1.stdout.2 and dup1.stderr.2?
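
To see which files are still missing, something like the sketch below may help. It assumes that dup1 run number k (counting from 1, as in the [C].k[/C] file endings) writes [C]dup1.(k-1).0000.gz[/C] into both slice directories, which matches the commands quoted in this thread but may not cover runs that produce more than one file per slice. The function name is hypothetical.

```shell
# Sketch only: print the dup1 output files that are expected but absent.
#   $1 = the ...dup1 directory, $2 = number of dup1 runs found in the .cmd file
# Assumes run k (1-based) writes dup1.(k-1).0000.gz into slice directories 0 and 1.
missing_dup1_files() {
    dir=$1
    runs=$2
    k=1
    while [ "$k" -le "$runs" ]; do
        idx=$((k - 1))
        for slice in 0 1; do
            f="$dir/$slice/dup1.$idx.0000.gz"
            [ -e "$f" ] || echo "$f"
        done
        k=$((k + 1))
    done
}

# Example call:
#   missing_dup1_files /tmp/cado.v9x5al17/c212_snfs215.dup1 3
```

No output means every expected file is present.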

[quote]The grep command is also not collecting all of the completed workunits, missing about 20M q worth. I've confirmed via spot check that the missing workunits appear in the log file with the same format as the collected workunits.[/QUOTE]

There's a chance I've been an idiot and somehow got the command wrong, so if anyone here spots a mistake, please don't hesitate to correct me. Otherwise you could attach the file and I'll see if I can get grep to work.

wombatman 2021-07-25 20:03

There's a command line in the .cmd file for the 2nd dup1 run, but I would need to generate a second filelist to recreate it. At this point, I would ideally just generate the dup1.0 files and go from there.

For the grep command, here's an example line that's not being picked up by it:
[CODE]PID32137 2021-07-25 14:04:33,789 Info:Lattice Sieving: Found 24569 relations in '/tmp/cado.v9x5al17/c212_snfs215.upload/c212_snfs215.56430000-56440000.xaxayqid.gz', total is now 121252276/130000000[/CODE]

charybdis 2021-07-25 20:17

[QUOTE=wombatman;583971]There's a command line in the .cmd file for the 2nd dup1 run, but I would need to generate a second filelist to recreate it. At this point, I would ideally just generate the dup1.0 files and go from there.[/quote]

That's what I was trying to explain how to do here:

[QUOTE=charybdis;583966]and then cut it into chunks: everything up to the first appearance of "Reached target of xxxxxxxx relations" in the logfile is [C]filelist.1[/C], everything from there to the next "Reached target" (NOT including the contents of [C]filelist.1[/C]!) is [C]filelist.2[/C], and so on.[/QUOTE]

Of course you'll need grep to work properly in order to do this. I inserted your line into one of my old logfiles and grep found it successfully, so I'm mystified.
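
For anyone wanting to repeat that check, here is a small sketch. The first function is the same grep/cut pipeline reading from standard input; the second counts the log lines that report a relation file, assuming they all contain the text [C]relations in '[/C] as in the example line. Both function names are made up.

```shell
# Sketch only: the same extraction as the grep/cut pipeline, reading a log on stdin.
extract_relation_files() {
    grep -o "in .*gz" | cut -c 5-
}

# Count the log lines that report a relation file ($1 = logfile), for comparison
# with the number of paths the extraction produces.
count_reported_files() {
    grep -c "relations in '" "$1"
}

# Example comparison:
#   extract_relation_files < c212_snfs215.log | wc -l
#   count_reported_files c212_snfs215.log
```

If the two numbers differ, the gap is in the extraction step rather than in the logfile.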

wombatman 2021-07-25 20:21

1 Attachment(s)
Here's the output of the grep command as you gave it (without sending it to cut).


All times are UTC.