com.taleo.integration.client.customstep.util.csv
Class MergeSort

java.lang.Object
  extended by com.taleo.integration.client.customstep.util.csv.MergeSort

public class MergeSort
extends java.lang.Object

Implementation of the Merge Sort algorithm to sort large CSV files.

Based on: Sorting-really-BIG-files
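
The general technique behind sorting a file too large for memory can be sketched as follows: cut the input into chunks that fit in memory, sort each chunk, then merge the sorted chunks. This is a minimal illustration only, not the actual implementation of this class. The class name ExternalMergeSortSketch, its methods, and the use of in-memory lists in place of temporary files in a working folder are all assumptions made for the example; here the chunk size is counted in records rather than KB.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative sketch only: hypothetical names, not the MergeSort class documented here.
class ExternalMergeSortSketch {

    // Tracks the next unread record of one sorted chunk.
    private static final class Cursor {
        final List<String[]> chunk;
        int index;

        Cursor(List<String[]> chunk) {
            this.chunk = chunk;
        }

        String[] current() {
            return chunk.get(index);
        }
    }

    // Phase 1: cut the input into chunks of at most chunkRows records and sort each one.
    // A file-based version would write every sorted chunk to a temporary file instead.
    static List<List<String[]>> sortedChunks(List<String[]> rows, int chunkRows,
                                             Comparator<String[]> cmp) {
        if (chunkRows <= 0) {
            throw new IllegalArgumentException("chunkRows must be positive");
        }
        List<List<String[]>> chunks = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += chunkRows) {
            int end = Math.min(start + chunkRows, rows.size());
            List<String[]> chunk = new ArrayList<>(rows.subList(start, end));
            chunk.sort(cmp);
            chunks.add(chunk);
        }
        return chunks;
    }

    // Phase 2: k-way merge. The heap always yields the chunk whose next record is smallest.
    static List<String[]> merge(List<List<String[]>> chunks, Comparator<String[]> cmp) {
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>((a, b) -> cmp.compare(a.current(), b.current()));
        for (List<String[]> chunk : chunks) {
            if (!chunk.isEmpty()) {
                heap.add(new Cursor(chunk));
            }
        }
        List<String[]> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor smallest = heap.poll();
            out.add(smallest.current());
            smallest.index++;
            if (smallest.index < smallest.chunk.size()) {
                heap.add(smallest); // re-insert so its next record competes again
            }
        }
        return out;
    }

    static List<String[]> sort(List<String[]> rows, int chunkRows, Comparator<String[]> cmp) {
        return merge(sortedChunks(rows, chunkRows, cmp), cmp);
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        for (String key : new String[] {"3", "1", "2", "5", "4"}) {
            rows.add(new String[] {key, "value-" + key});
        }
        // Sort by the first column, two records per chunk.
        for (String[] row : sort(rows, 2, (x, y) -> x[0].compareTo(y[0]))) {
            System.out.println(String.join(",", row));
        }
    }
}
```

Only one chunk needs to be held in memory during the first phase, and only one record per chunk during the merge, which is why a smaller chunk size trades memory for a larger number of intermediate chunks.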

Author:
Romain Guay, Taleo Corporation

Field Summary
static int DEFAULT_CHUNK_SIZE
          The default chunk size (KB).
 
Constructor Summary
MergeSort()
           
 
Method Summary
 int getChunkSize()
          Get the chunk size.
 char getCsvDelimiter()
          Get the CSV delimiter.
 char getCsvQuoteCharacter()
          Get the CSV quote character.
 java.lang.String getEncoding()
          Get the encoding.
 java.lang.String getWorkingFolder()
          Get the working folder.
 boolean isCsvHeaderPresent()
          Get the CSV header present flag.
 boolean isRemoveAllDuplicates()
          Get the remove all duplicates flag.
 boolean isRemoveDuplicates()
          Get the remove duplicates flag.
 void setChunkSize(int chunkSize)
          Set the chunk size.
 void setCsvDelimiter(char csvDelimiter)
          Set the CSV delimiter.
 void setCsvHeaderPresent(boolean csvHeaderPresent)
          Set the CSV header present flag.
 void setCsvQuoteCharacter(char csvQuoteCharacter)
          Set the CSV quote character.
 void setEncoding(java.lang.String encoding)
          Set the encoding.
 void setRemoveAllDuplicates(boolean removeAllDuplicates)
          Set the remove all duplicates flag.
 void setRemoveDuplicates(boolean removeDuplicates)
          Set the remove duplicates flag.
 void setWorkingFolder(java.lang.String workingFolder)
          Set the working folder.
 void sort(java.io.File inFile, java.util.Comparator comparator, java.io.File outFile)
          Sort the inFile according to the given Comparator and write the result to the outFile.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_CHUNK_SIZE

public static final int DEFAULT_CHUNK_SIZE
The default chunk size (KB).

See Also:
Constant Field Values

Constructor Detail

MergeSort

public MergeSort()

Method Detail

getChunkSize

public int getChunkSize()
Get the chunk size.

Returns:
The chunk size.

setChunkSize

public void setChunkSize(int chunkSize)
Set the chunk size.

Parameters:
chunkSize - The chunk size.

isRemoveDuplicates

public boolean isRemoveDuplicates()
Get the remove duplicates flag.

Returns:
The remove duplicates flag.

setRemoveDuplicates

public void setRemoveDuplicates(boolean removeDuplicates)
Set the remove duplicates flag.

Parameters:
removeDuplicates - The remove duplicates flag.

isRemoveAllDuplicates

public boolean isRemoveAllDuplicates()
Get the remove all duplicates flag.

Returns:
The remove all duplicates flag.

setRemoveAllDuplicates

public void setRemoveAllDuplicates(boolean removeAllDuplicates)
Set the remove all duplicates flag.

Parameters:
removeAllDuplicates - The remove all duplicates flag.

getCsvDelimiter

public char getCsvDelimiter()
Get the CSV delimiter.

Returns:
The CSV delimiter.

setCsvDelimiter

public void setCsvDelimiter(char csvDelimiter)
Set the CSV delimiter.

Parameters:
csvDelimiter - The CSV delimiter.

getCsvQuoteCharacter

public char getCsvQuoteCharacter()
Get the CSV quote character.

Returns:
The CSV quote character.

setCsvQuoteCharacter

public void setCsvQuoteCharacter(char csvQuoteCharacter)
Set the CSV quote character.

Parameters:
csvQuoteCharacter - The CSV quote character.

isCsvHeaderPresent

public boolean isCsvHeaderPresent()
Get the CSV header present flag.

Returns:
The CSV header present flag.

setCsvHeaderPresent

public void setCsvHeaderPresent(boolean csvHeaderPresent)
Set the CSV header present flag.

Parameters:
csvHeaderPresent - The CSV header present flag.

getEncoding

public java.lang.String getEncoding()
Get the encoding.

Returns:
The encoding.

setEncoding

public void setEncoding(java.lang.String encoding)
Set the encoding.

Parameters:
encoding - The encoding.

getWorkingFolder

public java.lang.String getWorkingFolder()
Get the working folder.

Returns:
The working folder.

setWorkingFolder

public void setWorkingFolder(java.lang.String workingFolder)
Set the working folder.

Parameters:
workingFolder - The working folder.

sort

public void sort(java.io.File inFile,
                 java.util.Comparator comparator,
                 java.io.File outFile)
          throws java.io.IOException,
                 InvalidCSVFileFormat
Sort inFile according to the given Comparator and write the result to outFile. If outFile is the same file as inFile, the input file is overwritten.

The Comparator must be able to compare String arrays (String[]), each of which corresponds to one record in the file.

Parameters:
inFile - The input file.
comparator - The row comparator.
outFile - The output file.
Throws:
java.io.IOException - If an I/O error occurs.
InvalidCSVFileFormat - If the input file is not in a valid CSV format.
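
As described above, the comparator receives each record as a String[]. A minimal sketch of such a comparator follows. The class RowComparators and its helpers byColumn and byIntColumn are hypothetical names introduced for this example and are not part of this package; the example also assumes that every record has at least as many fields as the column index requires.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helpers for illustration; not part of the documented API.
class RowComparators {

    // Orders records by the text of one column (zero-based index), using String order.
    static Comparator<String[]> byColumn(final int column) {
        return (left, right) -> left[column].compareTo(right[column]);
    }

    // Orders records by one column parsed as an integer, so that "9" sorts before "10".
    // Throws NumberFormatException if a field is not a valid integer.
    static Comparator<String[]> byIntColumn(final int column) {
        return Comparator.comparingInt(row -> Integer.parseInt(row[column]));
    }

    public static void main(String[] args) {
        String[][] records = {
            {"10", "Smith"},
            {"9", "Jones"},
            {"25", "Adams"},
        };

        // Sort by the numeric identifier in the first column.
        Arrays.sort(records, byIntColumn(0));
        for (String[] record : records) {
            System.out.println(String.join(",", record));
        }

        // Sort by the name in the second column.
        Arrays.sort(records, byColumn(1));
        for (String[] record : records) {
            System.out.println(String.join(",", record));
        }
    }
}
```

A comparator built this way could then be passed as the comparator argument of sort(inFile, comparator, outFile), for example RowComparators.byIntColumn(0) to order the file by its first column.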