mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Factoring (https://www.mersenneforum.org/forumdisplay.php?f=19)
-   -   Running GGNFS (https://www.mersenneforum.org/showthread.php?t=9645)

10metreh 2010-03-22 20:28

Yes, all the relations are kept in the .dat file. The actual size of the file doesn't matter; it's the number of relations that does. When the script runs msieve (or is it you?), this line shows the number of relations:

[code]Sun Mar 21 08:37:09 2010 found 282537224 hash collisions in [COLOR="Red"]578285384[/COLOR] relations[/code]

Alternatively, if you are on Unix, then run "wc -l bignum.dat".
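Since the .dat file stores one relation per line, counting lines counts relations. A minimal sketch with a stand-in file (the contents here are placeholders, not real GGNFS relations):

```shell
# Stand-in relations file: real .dat lines look like "a,b:primes:primes".
printf '1,2:3:5\n4,5:7:11\n6,7:13:17\n' > bignum.dat

# One relation per line, so the line count is the relation count (3 here).
wc -l < bignum.dat
```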

sleigher 2010-03-22 21:29

So I did the wc -l thing. I guess that means we are at 1 relation per line?

I got this: 585981151

So when someone says "remove the last 500M relations", is it as easy as just deleting the last 500 million lines of the file? It seems the script just cats the data in and appends it to the end. Then I will have ~85 million relations and can move forward?

The next step is linear algebra, right? Or should I sieve more? My highest q0 and #q1 are 34000000, so that is as far as it got.

Is there a way to run a scan of the file to look for duplicates? I can do that with Unix tools, but I am wondering whether there is a way using the GGNFS binaries.

BTW, my job stopped yesterday because I ran out of disk space.....
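A quick textual duplicate scan needs no GGNFS binary; standard Unix tools suffice (msieve's filtering later does the real, relation-level duplicate removal, as the logs above show). A minimal sketch with stand-in data:

```shell
# Stand-in relations file containing one exact duplicate line ("x").
printf 'x\ny\nx\nz\n' > bignum.dat

# "uniq -d" prints one copy of each line occurring more than once
# (input must be sorted first), so this counts distinct duplicated lines.
sort bignum.dat | uniq -d | wc -l
```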

debrouxl 2010-03-23 07:25

You don't _have_ to modify the input file: use the "-r <n>" option of msieve to read only the first n relations :smile:
In addition to that, since you're out of disk space, you may want to cut some relations.
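If you do choose to cut relations from the file itself, keeping only the first n lines is enough, since each line is one relation and new results are appended at the end. A minimal sketch with stand-in data (file names and contents are placeholders):

```shell
# Stand-in .dat file with 5 "relations", one per line.
printf 'r1\nr2\nr3\nr4\nr5\n' > bignum.dat

# Keep the first 3 relations, dropping the most recently appended ones.
head -n 3 bignum.dat > trimmed.dat
wc -l < trimmed.dat
```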

sleigher 2010-03-23 15:12

Well I moved everything to another location so I have disk space again.

Gonna try to work out the msieve options for the next phase.

Thanks all of you for your help in this.

siew 2010-03-23 19:27

Hi again,

I got something strange.

Collected with distributed sieving, a 4 GB result file:
[CODE]found 21692982 hash collisions in 58166546 relations
added 122 free relations
commencing duplicate removal, pass 2
found 35251896 duplicates and 22914772 unique relations
[/CODE]
Then I optimised the result .dat file by sorting and removing duplicate lines
(sort and uniq -u in a Unix command shell), giving a 2.7 GB file:

[CODE]found 3210360 hash collisions in 16234915 relations
added 1 free relations
commencing duplicate removal, pass 2
found 3271238 duplicates and 12963678 unique relations
[/CODE]

Can both be true for the same dataset (original and optimized), or is there maybe some error in my sorting/filtering?

10metreh 2010-03-23 19:31

What does "optimized" mean here?

siew 2010-03-23 19:48

So I collected all the results and combined them (by appending) into one file.

The result was a big file, 4.7 GB long, with some duplicate lines inside.

Then I did (the "optimized" step):
"sort result.dat new.dat" (the result file is the same size, 4.7 GB, but all lines sorted)
"uniq -u new.dat final.dat

The resulting final.dat file is 2.4 GB: sorted and without duplicate lines.

henryzz 2010-03-23 20:01

[quote=siew;209307]So I collected all the results and combined them (by appending) into one file.

The result was a big file, 4.7 GB long, with some duplicate lines inside.

Then I did (the "optimized" step):
"sort result.dat new.dat" (the result file is the same size, 4.7 GB, but all lines sorted)
"uniq -u new.dat final.dat

The resulting final.dat file is 2.4 GB: sorted and without duplicate lines.[/quote]
Are those the exact commands? Did you mean:
"sort result.dat >new.dat"
"uniq -u new.dat >final.dat"
If so, "sort -u result.dat >final.dat" would do both steps in one.
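Note that `sort | uniq -u` and `sort -u` are not equivalent: `uniq -u` prints only the lines that occur exactly once, discarding every copy of a duplicated line, while `sort -u` keeps one copy of each distinct line. A minimal sketch:

```shell
# Three lines, one of them duplicated.
printf 'a\na\nb\n' > rels.txt

# uniq -u: only lines occurring exactly once survive -> prints "b"
sort rels.txt | uniq -u

# sort -u: one copy of each distinct line survives -> prints "a" and "b"
sort -u rels.txt
```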

siew 2010-03-23 20:16

Yes, I will try that command too, but for testing purposes I did it in two steps :-)

Anyway, I don't understand why msieve reports

[CODE]found 21692982 hash collisions in 58166546 relations[/CODE]

and on the sorted file it reports

[CODE]found 3210360 hash collisions in 16234915 relations[/CODE]

What does "collision" mean here?
Two equal lines?

axn 2010-03-23 20:19

[QUOTE=siew;209307]So I collected all the results and combined them (by appending) into one file.

The result was a big file, 4.7 GB long, with some duplicate lines inside.

Then I did (the "optimized" step):
"sort result.dat new.dat" (the result file is the same size, 4.7 GB, but all lines sorted)
"uniq -u new.dat final.dat

The resulting final.dat file is 2.4 GB: sorted and without duplicate lines.[/QUOTE]
Duplicate relations are not completely identical! The format is such that the special-q used to find the relation is stored at the end of the line. The same relation can be found with different special-q's, and hence the lines may be textually different.

[CODE]sort -u -k 1 -t ":"[/CODE] might work
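A sketch of that idea with stand-in lines; note that `-k 1,1` limits the comparison to the first `:`-separated field, whereas plain `-k 1` makes sort compare from field 1 through the end of the line, i.e. effectively the whole line:

```shell
# Two textually different lines sharing the same first field (as with the
# same relation found under different special-q), plus a second pair.
# The line format here is only a stand-in, not the real relation format.
printf '3,5:2:7\n3,5:2:11\n4,9:3:13\n' > rels.txt

# With -u, sort compares only the sort key, so exactly one line per
# distinct first field is kept.
sort -u -t ':' -k 1,1 rels.txt
```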

siew 2010-03-23 20:28

Yes, I understand; I see similar lines with different q.

My goal was to remove IDENTICAL lines from the combined text files.

I will try the command you posted too, thanks.

