Wednesday, May 09, 2007

Sorting Large Files

I have an 800MB file that I read into memory line by line, and I write out a resulting 800MB file line by line. I want the output sorted. Is there any way to perform the sort without loading the entire file into memory at once?

3 Comments:

At 8:47 AM, Blogger Rob said...

We're having a similar problem here: the output of an "extract" can be huge, and it doesn't come out sorted, but it needs to be. We were sorting it in memory and kept crashing the JVM.

The best approach we've found is to just write the file out and call the Unix sort command on it. It works surprisingly well. Otherwise, the only thing I can think of is to write the data out in discrete chunks, sort each chunk, and then merge the sorted chunks (a merge sort). The I/O cost would be pretty high, since every line makes an extra round trip through a temporary file.
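The chunk-then-merge idea is the classic external merge sort. Below is a minimal Python sketch of it, assuming a UTF-8 text file where each line is one record and lines compare as plain strings (newline included). The names `external_sort` and `_write_run` and the `lines_per_chunk` default are hypothetical choices, not anything from the post. Pass 1 sorts memory-sized chunks and spills each to a temporary "run" file; pass 2 streams a k-way merge of the runs with `heapq.merge`, so memory is bounded by one chunk in pass 1 and one pending line per run in pass 2.

```python
import contextlib
import heapq
import os
import tempfile
from itertools import islice


def _write_run(lines, tmp_dir):
    """Sort one in-memory chunk and spill it to a temporary run file."""
    lines.sort()  # plain str ordering (by code point), newline included
    fd, path = tempfile.mkstemp(suffix=".run", dir=tmp_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.writelines(lines)
    return path


def external_sort(in_path, out_path, lines_per_chunk=1_000_000, tmp_dir=None):
    """Sort a text file by lines without holding the whole file in memory."""
    run_paths = []
    try:
        # Pass 1: read fixed-size chunks, sort each one, write it out as a run.
        with open(in_path, encoding="utf-8") as src:
            while True:
                chunk = list(islice(src, lines_per_chunk))
                if not chunk:
                    break
                # Only the file's final line can lack a newline; add one so it
                # does not fuse with its neighbour when the runs are merged.
                if not chunk[-1].endswith("\n"):
                    chunk[-1] += "\n"
                run_paths.append(_write_run(chunk, tmp_dir))

        # Pass 2: k-way merge. heapq.merge keeps only one pending line per run.
        with contextlib.ExitStack() as stack:
            runs = [stack.enter_context(open(p, encoding="utf-8")) for p in run_paths]
            with open(out_path, "w", encoding="utf-8") as dst:
                dst.writelines(heapq.merge(*runs))
    finally:
        # Remove the temporary runs whether or not the sort succeeded.
        for p in run_paths:
            os.remove(p)
```

Tune `lines_per_chunk` to the memory you can spare. With a very large number of runs you may hit the per-process open-file limit in pass 2, at which point the usual fix is to merge in several passes.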

 
At 10:29 AM, Blogger Ryan said...

I found a link discussing this. You have the beginnings of the right approach.

http://www.webservertalk.com/message1867191.html

It is going to be very difficult to call the Unix sort command from Windows.

 
At 10:33 AM, Blogger Ryan said...

Turns out there's a Windows sort.exe command.
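Handing the job to the platform's own sort is often the least effort. A minimal Python sketch, assuming the input is a plain text file and that `sort` resolves to the system utility (GNU or BSD sort on Unix, sort.exe on Windows); `sort_with_os_tool` is a hypothetical name. Both GNU sort and sort.exe fall back to temporary working files when the input outgrows their memory buffer, so the calling program never has to hold the file in memory.

```python
import os
import subprocess
import sys


def sort_with_os_tool(in_path, out_path):
    """Sort a text file by delegating to the operating system's sort utility."""
    if sys.platform.startswith("win"):
        # Windows sort.exe: /O names the output file.
        cmd = ["sort", in_path, "/O", out_path]
        env = None  # inherit the current environment
    else:
        # Unix sort: -o names the output file. LC_ALL=C forces plain
        # byte-order comparison instead of locale-dependent collation.
        cmd = ["sort", "-o", out_path, in_path]
        env = dict(os.environ, LC_ALL="C")
    # check=True raises CalledProcessError if sort exits non-zero.
    subprocess.run(cmd, check=True, env=env)
```

The two tools may not order lines identically (GNU sort under `LC_ALL=C` compares raw bytes, while sort.exe applies its own locale rules), so if downstream code depends on the exact order, stick to one tool.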

 
