mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Information & Answers (https://www.mersenneforum.org/forumdisplay.php?f=38)
-   -   Breaking up files (https://www.mersenneforum.org/showthread.php?t=9599)

roger 2007-11-15 23:39

Breaking up files
 
Thanks to whoever posted how to combine files, but how do you break them up? I've got a 6.5GB file from NewPGen, and when I try to sieve it, it says "out of memory." I set the maximum RAM to use as high as it would accept, but it still gave me the same error.

I don't know why it's giving me this error, though. I looked at how many k's were left after each billion sieved, and I should only end up with around 350M total, well under the 1B maximum. :huh:

Thanks!

rogue 2007-11-16 00:27

You probably need to increase the size of virtual memory, but with a file that large, you are going to experience performance issues.

roger 2007-11-16 01:20

Yeah, I tried increasing the max RAM; that didn't help.
Also, I have another file (1.5GB) that was removing k's at around 20,000/second when sieving in 1B segments. Now when I try to sieve it, it's doing around 15. :glare:

It sounds like cutting them up (like NewPGen does by itself) is the best way to go. How does one do that?

Thanks!

rogue 2007-11-16 04:03

I wasn't referring to RAM, but to virtual memory. If you are running Windows, then you can increase the amount of virtual memory which might allow you to use the input file you want. The main problem is that with virtual memory you will be swapping physical memory to disk which can significantly slow the process down.

BTW, to edit the file, I would suggest that you write a small program or script to split it.

jasong 2007-11-16 04:16

I think that in the future, as more cores are added, people will have less and less RAM per core. Total RAM amounts will keep going up, but I think they will show a downward trend when divided by the number of cores.

For this reason, I think it would be a good idea for sieving programmers to consider having sieve files that do double duty, meaning one file is acted on by more than one factoring program thread. This could cut down on total RAM usage and make people more likely to actually run the sieve software on more cores, meaning more throughput.

Xyzzy 2007-11-16 04:24

[quote]It sounds like cutting them up (like newpgen does by itself) is the best way to go. How does one do that??[/quote]No idea if this is helpful:

[code]SPLIT(1)                         User Commands                        SPLIT(1)

NAME
       split - split a file into pieces

SYNOPSIS
       split [OPTION] [INPUT [PREFIX]]

DESCRIPTION
       Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default
       size is 1000 lines, and default PREFIX is ‘x’. With no INPUT, or when
       INPUT is -, read standard input.

       Mandatory arguments to long options are mandatory for short options
       too.

       -a, --suffix-length=N
              use suffixes of length N (default 2)

       -b, --bytes=SIZE
              put SIZE bytes per output file

       -C, --line-bytes=SIZE
              put at most SIZE bytes of lines per output file

       -d, --numeric-suffixes
              use numeric suffixes instead of alphabetic

       -l, --lines=NUMBER
              put NUMBER lines per output file

       --verbose
              print a diagnostic to standard error just before each output
              file is opened

       --help display this help and exit

       --version
              output version information and exit

       SIZE may have a multiplier suffix: b for 512, k for 1K, m for 1 Meg.

AUTHOR
       Written by Torbjorn Granlund and Richard M. Stallman.

REPORTING BUGS
       Report bugs to <bug-coreutils@gnu.org>.

COPYRIGHT
       Copyright © 2006 Free Software Foundation, Inc.
       This is free software. You may redistribute copies of it under the
       terms of the GNU General Public License
       <http://www.gnu.org/licenses/gpl.html>. There is NO WARRANTY, to the
       extent permitted by law.

SEE ALSO
       The full documentation for split is maintained as a Texinfo manual. If
       the info and split programs are properly installed at your site, the
       command

              info split

       should give you access to the complete manual.

split 5.97                       January 2007                         SPLIT(1)[/code]
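For example (a sketch; "sieve.txt" and the "part_" prefix are placeholder names, and the demo uses a tiny stand-in file instead of a 6.5 GB one):

```shell
# Create a small stand-in file; in practice INPUT is the big sieve file.
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > sieve.txt

# Split into pieces of 4 lines each -> part_aa, part_ab, part_ac.
split -l 4 sieve.txt part_

# split never cuts a line in half, so each piece is itself a valid
# one-candidate-per-line list (note only the first piece keeps any
# header line the original file may carry).
wc -l part_*
```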

roger 2007-11-16 06:08

[QUOTE]If you are running Windows, then you can increase the amount of virtual memory which might allow you to use the input file you want.[/QUOTE]

How would I do that?

[QUOTE]BTW, to edit the file, I would suggest that you write a small program or script to split it.[/QUOTE]

Sorry, I don't have any programming skills :down:

@Xyzzy: I'm not sure that's what I'm looking for (ideally a DOS command, like the combining one, "copy /B file1+file2 result"), especially since it says 'default size is 1000 lines'; my file is 6.5GB, which is literally millions, if not half a billion, lines.

Xyzzy 2007-11-16 11:25

[quote]@Xyzzy: I'm not sure that's what I'm looking for (ideally a DOS command, like the combining one, "copy /B file1+file2 result"), especially since it says 'default size is 1000 lines'; my file is 6.5GB, which is literally millions, if not half a billion, lines.[/quote]
From above:

[code] -l, --lines=NUMBER
put NUMBER lines per output file[/code]
We're sure somebody has ported "split" for DOS.
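A sketch of the whole round trip with -l, plus the reverse of the "copy /B" combine (file names are placeholders; the demo file stands in for the real 6.5 GB one):

```shell
# Count lines first to pick a sensible piece size; wc streams the file,
# so even a huge file needs almost no memory.
printf '%s\n' a b c d e > sieve.txt
wc -l sieve.txt

# -l keeps whole lines together; on the real file use e.g. -l 10000000.
split -l 2 sieve.txt part_

# The pieces concatenate back to the original; split's alphabetic
# suffixes (aa, ab, ...) sort in order, so a glob is safe.
cat part_* > rejoined.txt
cmp sieve.txt rejoined.txt && echo identical
```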

R.D. Silverman 2007-11-16 14:17

[QUOTE=Xyzzy;118594]From above:

[code] -l, --lines=NUMBER
put NUMBER lines per output file[/code]
We're sure somebody has ported "split" for DOS.[/QUOTE]

Yep. It is part of the MKS Toolkit, a pretty good collection of Unix tools that run under DOS. I use them all the time: head, tail, split, sort, uniq, chmod, ps, etc.

smh 2007-11-16 16:00

See [url]http://sourceforge.net/project/showfiles.php?group_id=9328[/url] for a collection of Unix utilities.

Just put the files from the wbin directory somewhere on your system path, and you can use them from any directory in a cmd window.
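For instance, from a cmd window (a sketch; C:\wbin as the extraction folder, and the file names, are assumptions):

```shell
rem Put the folder holding the utilities on PATH for this session only.
rem C:\wbin is an assumed location; adjust to wherever you unpacked wbin.
set PATH=%PATH%;C:\wbin

rem Now split and friends work from any directory:
split --lines=1000000 sieve.txt part_
```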

mdettweiler 2007-11-16 16:33

[quote=Xyzzy;118594]From above:

[code] -l, --lines=NUMBER
put NUMBER lines per output file[/code]
We're sure somebody has ported "split" for DOS.[/quote]
You can find a similar application for the Windows command line (and presumably DOS too) here:
[url]http://www.fourmilab.ch/splits/[/url]
It's less flexible than the Linux one whose man output you listed, but it does the trick as long as you just need to split a file into even-kilobyte chunks.

However, the one smh listed is probably a truer port of the Linux original, so it would probably be more flexible.

