Breaking up files
Thanks to whoever posted how to combine files, but how do you break them up? I've got a 6.5GB file from newpgen, and when I try to sieve it it says "out of memory." I set the max RAM to use to the maximum it would accept, but it still gave me the same error.
I don't know why it is giving me this error though. I looked at how many k's it left for every billion sieved through, and I should only be getting around 350M total, well under the 1B maximum. :huh: Thanks! |
You probably need to increase the size of virtual memory, but with a file that large, you are going to experience performance issues.
|
Yeah, I tried increasing the Max RAM, that didn't help.
Also, I have another file: 1.5GB, and it was cutting k's off at a speed around 20000/second when sieving in 1B segments. Now when I try to sieve it, it's doing around 15. :glare: It sounds like cutting them up (like newpgen does by itself) is the best way to go. How does one do that?? Thanks! |
I wasn't referring to RAM, but to virtual memory. If you are running Windows, then you can increase the amount of virtual memory which might allow you to use the input file you want. The main problem is that with virtual memory you will be swapping physical memory to disk which can significantly slow the process down.
BTW, to edit the file, I would suggest that you write a small program or script to split it. |
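For anyone who wants to try the "small script" route, here is a minimal Python sketch of one way to do it. It is not a tested tool for sieve files. The function name `split_by_lines`, the `header_lines` parameter, and the piece naming (`prefix000.txt`, `prefix001.txt`, ...) are all made up for illustration. It streams the input one line at a time, so memory use should stay small regardless of file size. If your file begins with a header line that every piece needs to carry (an assumption on my part, so check your own file first), pass `header_lines=1` and it will be copied to the top of each piece.

```python
def split_by_lines(path, lines_per_piece, prefix, header_lines=0):
    """Split a text file into pieces of at most lines_per_piece lines each.

    The first header_lines lines (0 by default) are repeated at the top of
    every piece. Returns the list of piece filenames that were written.
    All names here are hypothetical, chosen only for this sketch.
    """
    if lines_per_piece < 1:
        raise ValueError("lines_per_piece must be at least 1")
    pieces = []
    dst = None
    count = 0
    # Binary mode copies lines byte-for-byte, so line endings are preserved.
    with open(path, "rb") as src:
        header = [src.readline() for _ in range(header_lines)]
        try:
            for line in src:
                if dst is None:
                    # Start a new piece and repeat the header lines in it.
                    name = "%s%03d.txt" % (prefix, len(pieces))
                    dst = open(name, "wb")
                    dst.writelines(header)
                    pieces.append(name)
                    count = 0
                dst.write(line)
                count += 1
                if count == lines_per_piece:
                    dst.close()
                    dst = None
        finally:
            if dst is not None:
                dst.close()
    return pieces
```

Something like `split_by_lines("bigfile.txt", 10000000, "piece")` would then write pieces of ten million lines each into the current directory (the file name is just a placeholder).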
I think that in the future, as more cores are added, people will have less and less RAM per core. Total RAM will keep going up, but I expect the amount of RAM per core to trend downward.
For this reason, I think it would be a good idea for sieving programmers to consider the possibility of having sieving files that do double-duty, meaning one file is being acted on by more than one factoring program thread. This could cut down on total RAM usage and mean that people will be more likely to actually run the sieve software on more cores, meaning more throughput. |
[quote]It sounds like cutting them up (like newpgen does by itself) is the best way to go. How does one do that??[/quote]No idea if this is helpful:
[code]SPLIT(1)                      User Commands                      SPLIT(1)

NAME
       split - split a file into pieces

SYNOPSIS
       split [OPTION] [INPUT [PREFIX]]

DESCRIPTION
       Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...;
       default size is 1000 lines, and default PREFIX is ‘x’. With no
       INPUT, or when INPUT is -, read standard input.

       Mandatory arguments to long options are mandatory for short
       options too.

       -a, --suffix-length=N
              use suffixes of length N (default 2)

       -b, --bytes=SIZE
              put SIZE bytes per output file

       -C, --line-bytes=SIZE
              put at most SIZE bytes of lines per output file

       -d, --numeric-suffixes
              use numeric suffixes instead of alphabetic

       -l, --lines=NUMBER
              put NUMBER lines per output file

       --verbose
              print a diagnostic to standard error just before each
              output file is opened

       --help display this help and exit

       --version
              output version information and exit

       SIZE may have a multiplier suffix: b for 512, k for 1K, m for 1 Meg.

AUTHOR
       Written by Torbjorn Granlund and Richard M. Stallman.

REPORTING BUGS
       Report bugs to <bug-coreutils@gnu.org>.

COPYRIGHT
       Copyright © 2006 Free Software Foundation, Inc.
       This is free software. You may redistribute copies of it under
       the terms of the GNU General Public License
       <http://www.gnu.org/licenses/gpl.html>. There is NO WARRANTY, to
       the extent permitted by law.

SEE ALSO
       The full documentation for split is maintained as a Texinfo
       manual. If the info and split programs are properly installed at
       your site, the command

              info split

       should give you access to the complete manual.

split 5.97                      January 2007                      SPLIT(1)[/code] |
[QUOTE]If you are running Windows, then you can increase the amount of virtual memory which might allow you to use the input file you want.[/QUOTE]
How would I do that?
[QUOTE]BTW, to edit the file, I would suggest that you write a small program or script to split it.[/QUOTE]
Sorry, I don't have any programming skills :down:
@xyzzy: I'm not sure if that's what I'm looking for (ideally a command in DOS, like the combining one "copy /B file1+file2 result"), especially seeing it says "default size is 1000 lines", because my file is 6.5GB = literally millions if not half a billion lines. |
[quote]@xyzzy: I'm not sure if that's what I'm looking for (ideally a command in dos, like the combining one "copy /B file1+file2 result"), especially seeing it says 'default size 1000 lines, because my file is 6.5GB = literally millions if not half a billion lines.[/quote]
From above: [code] -l, --lines=NUMBER put NUMBER lines per output file[/code] We're sure somebody has ported "split" for DOS. |
[QUOTE=Xyzzy;118594]From above:
[code] -l, --lines=NUMBER put NUMBER lines per output file[/code] We're sure somebody has ported "split" for DOS.[/QUOTE]
Yep. It is part of the MKS Toolkit, a pretty good collection of Unix tools that run under DOS. I use them all the time: head, tail, split, sort, uniq, chmod, ps, etc. |
See [url]http://sourceforge.net/project/showfiles.php?group_id=9328[/url] for a collection of Unix utilities.
Just put the files from the wbin directory somewhere on your system path, and you can use them from any directory in a cmd window. |
[quote=Xyzzy;118594]From above:
[code] -l, --lines=NUMBER put NUMBER lines per output file[/code] We're sure somebody has ported "split" for DOS.[/quote]
You can find a similar application for the Windows command line (and presumably DOS too) here: [url]http://www.fourmilab.ch/splits/[/url]
It's less flexible than the Linux one you posted the man page for, but it does the trick as long as you just need to split a file into even-kilobyte chunks. However, the one that smh listed is probably a more faithful port of the Linux one, so it would probably be more flexible. |
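One thing to watch with fixed-size chunks: a cut at an exact byte count can land in the middle of a line, which matters if another program has to read each piece line by line. Below is a rough Python sketch of size-limited splitting that only breaks between lines, in the spirit of the -C option in the man page above. It is not a port of that option. The name `split_by_size` and the piece naming are invented for this example.

```python
def split_by_size(path, max_bytes, prefix):
    """Split a file into pieces of roughly max_bytes, breaking only between lines.

    A piece never exceeds max_bytes unless a single line is itself longer
    than max_bytes, in which case that line gets a piece of its own.
    Returns the list of piece filenames (hypothetical naming scheme).
    """
    if max_bytes < 1:
        raise ValueError("max_bytes must be at least 1")
    pieces = []
    dst = None
    written = 0
    with open(path, "rb") as src:
        try:
            for line in src:
                # Open a new piece at the start, or when this line would
                # push a non-empty piece past the size limit.
                if dst is None or (written and written + len(line) > max_bytes):
                    if dst is not None:
                        dst.close()
                    name = "%s%03d.txt" % (prefix, len(pieces))
                    dst = open(name, "wb")
                    pieces.append(name)
                    written = 0
                dst.write(line)
                written += len(line)
        finally:
            if dst is not None:
                dst.close()
    return pieces
```

For example, `split_by_size("bigfile.txt", 500 * 1024 * 1024, "chunk")` would aim for pieces of about 500 MB each (the file name is a placeholder). Unlike the line-count approach, this does not repeat any header line in each piece.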