Sorting Large Files
I have an 800MB file that I read line by line, and I write out a resulting 800MB file line by line. I want the output sorted. Is there any way to perform the sort without loading the entire file into memory at one time?
3 Comments:
We're having a similar problem here; the output of an "extract" can be huge, and it doesn't come out sorted. But it needs to be sorted. We were doing it in memory and kept crashing the JVM.
The best approach we've found is to just write the file out and call the Unix sort command on it. It works surprisingly well. Otherwise, the only thing I can think of is to write out the file into discrete chunks, and then do some sort of merge sort on the result. The IO time on this would be pretty high.
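For anyone who wants to try the chunk-and-merge idea from the comment above, here is a minimal sketch in Python. It is only an illustration under some assumptions: the function name external_sort and the lines_per_chunk parameter are made up for this example, the file is assumed to be UTF-8 text, and lines are compared by plain string order. Each chunk is sorted in memory and written to a temporary file, then heapq.merge streams the sorted chunks into the output so only one line per chunk is held at a time.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(input_path, output_path, lines_per_chunk=100_000):
    """Sort the lines of input_path into output_path with bounded memory.

    Hypothetical helper for illustration; lines_per_chunk is an assumed
    tuning knob, not a value taken from the original discussion.
    """
    chunk_paths = []
    try:
        # Phase 1: read a bounded number of lines, sort them, spill to disk.
        with open(input_path, "r", encoding="utf-8") as src:
            while True:
                chunk = list(islice(src, lines_per_chunk))
                if not chunk:
                    break
                # Guarantee a trailing newline so merged lines never run together.
                if not chunk[-1].endswith("\n"):
                    chunk[-1] += "\n"
                chunk.sort()
                fd, path = tempfile.mkstemp(suffix=".chunk")
                with os.fdopen(fd, "w", encoding="utf-8") as tmp:
                    tmp.writelines(chunk)
                chunk_paths.append(path)

        # Phase 2: k-way merge of the sorted chunk files.
        files = [open(p, "r", encoding="utf-8") for p in chunk_paths]
        try:
            with open(output_path, "w", encoding="utf-8") as dst:
                dst.writelines(heapq.merge(*files))
        finally:
            for f in files:
                f.close()
    finally:
        # Remove the temporary chunk files even if something failed.
        for p in chunk_paths:
            os.remove(p)
```

The IO cost mentioned above is real: every line is written twice and read twice. Note also that this sketch opens all chunk files at once during the merge, so a very small lines_per_chunk on a very large file could hit the operating system's open-file limit.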
I found a link discussing this. You have the beginnings of the right approach.
http://www.webservertalk.com/message1867191.html
It is going to be very difficult to call the Unix sort command from Windows.
Turns out there's a Windows sort.exe command.
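Since both platforms ship a sort command, one way to follow the "write the file out and call sort" suggestion is a small wrapper that picks the right arguments. This is a rough sketch, not anyone's actual setup: the helper names build_sort_command and sort_with_os_command are invented for this example, and it assumes the first "sort" found on the PATH is the operating system's own. It uses `sort -o output input` on Unix and `sort input /O output` on Windows.

```python
import subprocess
import sys


def build_sort_command(input_path, output_path, windows=None):
    """Return the argument list for the platform's sort command.

    Hypothetical helper; `windows` defaults to detecting the current platform.
    """
    if windows is None:
        windows = sys.platform.startswith("win")
    if windows:
        # Windows sort.exe: input file first, then /O and the output file.
        return ["sort", input_path, "/O", output_path]
    # Unix sort: -o names the output file.
    return ["sort", "-o", output_path, input_path]


def sort_with_os_command(input_path, output_path):
    """Run the OS sort command; raises CalledProcessError if it fails."""
    subprocess.run(build_sort_command(input_path, output_path), check=True)
```

The two commands do not necessarily order lines the same way, since each follows its own locale and case rules, so output from one platform may differ from the other.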