mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Msieve (https://www.mersenneforum.org/forumdisplay.php?f=83)
-   -   Msieve 1.44 feedback (https://www.mersenneforum.org/showthread.php?t=13067)

jasonp 2010-03-25 00:56

No real reason; earlier filtering code made it a little difficult to determine when the .lp file was truly no longer needed. I'll add code to delete it.

debrouxl 2010-03-25 07:34

I too ran `msieve -h`, before suggesting to use `msieve -r`, but I misinterpreted "-r <num> stop after [i]finding[/i] <num> relations".
Sorry for creating confusion.

Andi47 2010-03-25 07:46

[QUOTE=debrouxl;209475]I too ran `msieve -h`, before suggesting to use `msieve -r`, but I misinterpreted "-r <num> stop after [i]finding[/i] <num> relations".
Sorry for creating confusion.[/QUOTE]

Maybe this should be changed to "stop [I]sieving[/I] after finding <num> relations"?

jasonp 2010-03-25 12:16

Yes, I clarified that and a few other inconsistencies in the usage.

chris2be8 2010-03-25 19:30

Would it be useful for msieve to save a copy of the input with duplicates removed if there are not enough relations to continue? Then the next run won't have so many duplicates to process.

Of course it might only help if there are a lot more duplicates than usual in the input.

Chris K

sleigher 2010-03-25 20:06

[quote=jasonp;209420]Duplicate removal uses one set of files, that are input to singleton removal. Singleton removal creates the .lp file and uses it to remove singletons, possibly in several passes that each create a new .lp file, and finally produces a .s file. The .s file becomes the input to the merge phase, where a great deal of magic happens and the .cyc file is created, which provides the mapping between columns of the matrix and a list of line numbers of relations from the .dat file that participate in that column.

I'm being purposely vague because a new filtering run always does each of these steps in order and all of these files are recreated from scratch.[/quote]

Is there a way to run filtering on a dat file and not have the output go to a binary file? Or I guess a better question is: what data on each line of the dat file is compared to determine if it is a duplicate?

I can use unix tools to do it, e.g. `uniq -u file.dat > somefile.dat`, but I'm not sure if those tools can handle files this big.

One line looks like this, and there are many fields. Is it comma or colon delimited? If I know the delimiter I can probably write an awk program to do it.

-87377953847,34:E2D69,11B3415,11CCB07,1A6B,B,1CF,DDB:10fbdb5,2a4584b,2383,37EA3,D3FE7,2E5FC5,3,5,7,7,D,25,47,7F,DF,E3,2,C43C33

jrk 2010-03-25 20:27

For duplication removal, the only relevant part is the a,b pair before the first colon. The other parts are the relation's factors, which will be the same set in duplicated relations, but they may be arranged in different orders in the dat file.
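The rule above (key on the a,b pair before the first colon, keep the first copy) can be sketched as a one-pass filter. This is a hypothetical external helper, not part of msieve; `dedupe_relations` and the in-memory list are illustrative stand-ins for streaming over a real .dat file:

```python
# Minimal sketch of duplicate removal keyed on the a,b pair
# (everything before the first ':'), as described above.
# dedupe_relations is a hypothetical helper, not an msieve function.

def dedupe_relations(lines):
    """Yield the first copy of each relation, keyed on the a,b prefix."""
    seen = set()
    for line in lines:
        key = line.split(":", 1)[0]  # the a,b pair identifies the relation
        if key not in seen:
            seen.add(key)
            yield line

relations = [
    "-87377953847,34:E2D69,11B3415:10fbdb5,2a4584b",
    "-87377953847,34:11B3415,E2D69:2a4584b,10fbdb5",  # same a,b; factors reordered
    "-87377953848,35:1A6B,B:2383,37EA3",
]
unique = list(dedupe_relations(relations))
print(len(unique))  # prints 2: the two lines sharing an a,b pair collapse to one
```

Note the memory cost: the set holds one entry per unique relation, which for hundreds of millions of relations is substantial (compare jasonp's remark later in the thread about in-memory dedup programs needing far more memory than msieve).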

sleigher 2010-03-25 20:46

So as long as I can match the first pair before the colon I will know I have a dupe?

Thanks for your help.

jrk 2010-03-25 21:01

[QUOTE=sleigher;209525]So as long as I can match the first pair before the colon I will know I have a dupe? [/QUOTE]

Yes

sleigher 2010-03-25 22:25

Thanks

[FONT=Courier New]awk '!arr[$1]++' FS=\:[/FONT] seemed to do the trick nicely.

jasonp 2010-03-26 03:21

See around post 454 in the 'running GGNFS' thread for more details about the duplicate removal. Both fivemack and bdodson have written fairly straightforward C++ programs that produce a new dat file with duplicates removed, at a memory cost much larger than what msieve needs. You may not have enough memory to run those programs if you have 500M relations with 80% duplication.

