Yes, all the relations are kept in the .dat file. The actual size of the file doesn't matter; it's the number of relations that does. When the script runs msieve (or is it you?), this line shows the number of relations:
[code]Sun Mar 21 08:37:09 2010  found 282537224 hash collisions in [COLOR="Red"]578285384[/COLOR] relations[/code]
Alternatively, if you are on Unix, run "wc -l bignum.dat".
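Since each relation occupies one line of the .dat file, `wc -l` gives the relation count directly. A tiny demo with made-up stand-in lines (not real relation data):

```shell
# Three fake one-line "relations" standing in for bignum.dat.
printf '1,2:a:b\n3,4:c:d\n5,6:e:f\n' > demo.dat

# One relation per line, so the line count is the relation count.
wc -l demo.dat    # prints: 3 demo.dat
```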
So I did the wc -l thing. I guess that means we are at one relation per line?
I got this: 585981151. So when someone says "remove the last 500M relations", is it as easy as just deleting the last 500 million lines of the file? It seems the script just cats the data in and appends it to the end. Then I would have ~85 million relations and could move forward. The next step is linear algebra, right? Or should I sieve more? My highest q0 and #q1 are 34000000, so that is as far as it got.
Is there a way to run a scan of the file to look for duplicates? I can run unix tools to do that, but I am wondering about a way using the ggnfs binaries.
BTW, my job stopped yesterday because I ran out of disk space.....
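Since each relation is one line, trimming the file really is just keeping the first N lines. A sketch with a small stand-in file (the filenames and counts below are examples, not from the thread):

```shell
# Small stand-in for the relations file (for the real run you would use
# bignum.dat and something like -n 86000000).
seq 1 10 > sample.dat

# Keep only the first 4 lines; write to a NEW file so the original
# survives if the command is interrupted or the disk fills up again.
head -n 4 sample.dat > trimmed.dat

# Sanity-check the trimmed line count.
wc -l trimmed.dat    # prints: 4 trimmed.dat
```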
You don't _have_ to modify the input file: use the "-r <n>" option of msieve to read only the first n relations :smile:
In addition to that, since you're out of disk space, you may want to cut some relations.
Well I moved everything to another location so I have disk space again.
Gonna try to work out the msieve options for the next phase. Thanks, all of you, for your help in this.
Hi again,
I got something strange. Collected with distributed sieving, a 4 GB result file:
[CODE]found 21692982 hash collisions in 58166546 relations
added 122 free relations
commencing duplicate removal, pass 2
found 35251896 duplicates and 22914772 unique relations[/CODE]
Then I "optimized" the result .dat file by sorting and removing duplicate lines (sort and uniq -u in a unix command shell), giving a 2.7 GB file:
[CODE]found 3210360 hash collisions in 16234915 relations
added 1 free relations
commencing duplicate removal, pass 2
found 3271238 duplicates and 12963678 unique relations[/CODE]
Can both of these be true for the same dataset (original and optimized), or is there maybe some error in my sorting/filtering?
What does "optimized" mean here?
So I collected all the results, combining (by appending) them all into one file.
The result was a big file, 4.7 GB, with some duplicate lines inside. Then I did (the "optimizing"):
"sort result.dat new.dat" (the result file is the same size, 4.7 GB, but with all lines sorted)
"uniq -u new.dat final.dat"
The resulting final.dat file is 2.4 GB: sorted and without duplicate lines.
[quote=siew;209307]So I collected all the results, combined (by appending) all in one file. The result was a big file, 4.7 GB, with some duplicate lines inside. Then I did: "sort result.dat new.dat" then "uniq -u new.dat final.dat". The resulting final.dat file is 2.4 GB: sorted and without duplicate lines.[/quote]
Are those the exact commands? Did you mean:
"sort result.dat >new.dat"
"uniq -u new.dat >final.dat"
If so, "sort -u result.dat >final.dat" would do both steps in one.
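One caveat about the commands above (plain coreutils semantics, demo data made up): `uniq -u` does not keep one copy of each duplicated line, it drops *all* copies of any line that repeats, so it is not equivalent to `sort -u`:

```shell
# demo.txt: the line "a" appears twice, "b" once.
printf 'a\na\nb\n' > demo.txt

sort demo.txt | uniq      # keeps ONE copy of every line  -> a, b
sort demo.txt | uniq -u   # keeps only never-repeated lines -> b
sort -u demo.txt          # same result as sort | uniq     -> a, b
```

So if the goal is one copy per distinct line, `sort -u` (or `sort | uniq`) is the safe choice; `uniq -u` silently discards every relation that happened to be found more than once, which would also explain a much larger than expected drop in the relation count.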
Yes, I will try such a command too, but for testing purposes I did it in two steps :-)
Anyway, I don't understand why msieve reports
[CODE]found 21692982 hash collisions in 58166546 relations[/CODE]
on the original file, and on the sorted one reports
[CODE]found 3210360 hash collisions in 16234915 relations[/CODE]
What does "collision" mean here? Two equal lines?
[QUOTE=siew;209307]So I collected all the results, combined (by appending) all in one file. The result was a big file, 4.7 GB, with some duplicate lines inside. Then I did: "sort result.dat new.dat" then "uniq -u new.dat final.dat". The resulting final.dat file is 2.4 GB: sorted and without duplicate lines.[/QUOTE]
Duplicate relations are not completely identical! The format is such that the special-q used to find the relation is stored at the end of the line. The same relation can be found with different special-q's, and hence the lines may be textually different.
[CODE]sort -u -t ':' -k 1,1[/CODE] might work.
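A sketch of that field-keyed dedup on made-up relation-like lines (the a,b pair before the first colon is the key; the trailing field stands in for the special-q, none of this is real sieve output):

```shell
# Two textually different lines that encode the same relation (same a,b
# before the first colon, different trailing special-q).
printf '12,34:abc:q1\n12,34:abc:q2\n56,78:def:q1\n' > rels.dat

# Keep one line per distinct first field (the a,b pair).
# -t ':' splits fields on colons; -k 1,1 limits the sort key to field 1
# only (plain -k 1 would extend the key to the end of the line, which
# compares whole lines and would NOT merge the two variants).
sort -t ':' -k 1,1 -u rels.dat
```

This prints two lines: one representative of the 12,34 relation and the 56,78 one.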
Yes, I understand; I do see similar lines with different q's.
My goal was to remove IDENTICAL lines from the combined text files. Will try the command you posted too, thx.