2003-11-06, 12:21   #3
xilman

Originally posted by ET_
Just out of curiosity, may I ask how can you manage data of that size? I mean, what software and resources are needed to handle that huge mass of data?

The data was brought from Austin to Cambridge by sftp over the open ethernet. Richard has ADSL and 24/7 connectivity; Microsoft Research has adequate capacity ;-) It was transferred in several overnight sessions.

My workstation is a fairly ordinary 2.5GHz box with a 40G disk and a rather larger-than-average 1G of RAM. It runs XP Pro and so has the NTFS filesystem, which supports files much larger than the ones needed here. The 40G disk is a bit limiting, and I'll be installing a commodity 160G scratch disk next week. For now, I've been keeping stuff compressed when not needed and dumping files onto other machines for temporary storage.
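For anyone wanting to try the "keep it compressed when not needed" approach on a small disk, here is a minimal sketch. The post does not say which compressor was used, so gzip is an assumption, and the helper names `compress_file` and `decompress_file` are made up for illustration. The point is only that the file is streamed in chunks, so a multi-gigabyte file never has to fit in RAM.

```python
import gzip
import shutil
from pathlib import Path


def compress_file(path, chunk_size=1 << 20):
    """Gzip `path` to `<path>.gz` in chunks, then delete the original."""
    src = Path(path)
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        # copyfileobj reads and writes chunk_size bytes at a time,
        # so memory use stays small regardless of file size.
        shutil.copyfileobj(fin, fout, chunk_size)
    src.unlink()
    return dst


def decompress_file(path, chunk_size=1 << 20):
    """Restore `<name>.gz` to `<name>` in chunks, then delete the archive."""
    src = Path(path)
    if src.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {src.name}")
    dst = src.with_name(src.name[:-3])  # drop the trailing ".gz"
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)
    src.unlink()
    return dst
```

Note that the disk briefly needs room for both the original and the compressed copy, which matters when the disk is already nearly full.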

One stage of the postprocessing needs a large amount of memory, so I used a cluster node. Each node has 2G of RAM and the filter run used 1900M of memory, so it only just fitted.
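To put a number on "only just fitted", here is the arithmetic as a small sketch. It assumes that 2G means 2048M and that the 1900M figure is in the same units; neither is stated in the post, and the `headroom` helper is just an illustrative name.

```python
def headroom(total_mb, used_mb):
    """Return (free megabytes, free share of total as a percentage)."""
    free_mb = total_mb - used_mb
    return free_mb, 100.0 * free_mb / total_mb


# Assumed figures: 2G node = 2 * 1024 MB, filter run = 1900 MB.
free_mb, free_pct = headroom(total_mb=2 * 1024, used_mb=1900)
print(f"{free_mb} MB free ({free_pct:.1f}% of RAM)")
# prints "148 MB free (7.2% of RAM)"
```

Under those assumptions roughly 148M is left over, and that has to cover the operating system and everything else on the node.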

Summary: the only essentials are a filesystem that supports large files, a few dozen gigabytes of disk and a decent amount of RAM.
Oh, and a degree of patience 8-)

The large-memory filter run could have been avoided, at the cost of greater inefficiency later on. In that case the entire postprocessing, up to but excluding the linear algebra, could have been performed on a commodity PC upgraded to 1G of RAM.
