org.apache.hadoop.mrunit
Class TestDriver<K1,V1,K2,V2,T extends TestDriver<K1,V1,K2,V2,T>>

java.lang.Object
  extended by org.apache.hadoop.mrunit.TestDriver<K1,V1,K2,V2,T>
Direct Known Subclasses:
MapDriverBase, MapReduceDriverBase, PipelineMapReduceDriver, ReduceDriverBase

public abstract class TestDriver<K1,V1,K2,V2,T extends TestDriver<K1,V1,K2,V2,T>>
extends Object


Field Summary
protected  org.apache.hadoop.mrunit.internal.counters.CounterWrapper counterWrapper
           
protected  List<Pair<Enum<?>,Long>> expectedEnumCounters
           
protected  List<Pair<K2,V2>> expectedOutputs
           
protected  List<Pair<Pair<String,String>,Long>> expectedStringCounters
           
static org.apache.commons.logging.Log LOG
           
 
Constructor Summary
TestDriver()
           
 
Method Summary
 void addAllOutput(List<Pair<K2,V2>> outputRecords)
          Adds output (k, v)* pairs we expect
 void addCacheArchive(String path)
          Adds an archive to be put on the distributed cache.
 void addCacheArchive(URI uri)
          Adds an archive to be put on the distributed cache.
 void addCacheFile(String path)
          Adds a file to be put on the distributed cache.
 void addCacheFile(URI uri)
          Adds a file to be put on the distributed cache.
 void addOutput(K2 key, V2 val)
          Adds a (k, v) pair we expect as output
 void addOutput(Pair<K2,V2> outputRecord)
          Adds an output (k, v) pair we expect
 void addOutputFromString(String output)
          Deprecated. No replacement, due to lack of type safety and incompatibility with non-Text Writables.
protected  void cleanupDistributedCache()
          Cleans up the test distributed cache by deleting the temporary directory and any extracted cache archives it contains.
protected  <E> E copy(E object)
           
protected  <S,E> Pair<S,E> copyPair(S first, E second)
           
protected static void formatValueList(List<?> values, StringBuilder sb)
           
 org.apache.hadoop.conf.Configuration getConfiguration()
           
 List<Pair<Enum<?>,Long>> getExpectedEnumCounters()
           
 List<Pair<K2,V2>> getExpectedOutputs()
           
 List<Pair<Pair<String,String>,Long>> getExpectedStringCounters()
           
 org.apache.hadoop.conf.Configuration getOutputSerializationConfiguration()
          Gets the Configuration used both when copying output for the run* methods and by the InputFormat that reads output back in when a real OutputFormat is set.
protected  void initDistributedCache()
          Initialises the test distributed cache if required.
protected static List<org.apache.hadoop.io.Text> parseCommaDelimitedList(String commaDelimList)
          Split "val,val,val,val..." into a List of Text(val) objects.
static Pair<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text> parseTabbedPair(String tabSeparatedPair)
          Split "key \t val" into Pair(Text(key), Text(val))
protected  void printPreTestDebugLog()
          Overridable hook for printing pre-test debug information
 void resetExpectedCounters()
          Clears the list of expected counters from this driver
 void resetOutput()
          Clears the list of outputs expected from this driver
abstract  List<Pair<K2,V2>> run()
          Runs the test but returns the result set instead of validating it (any expected outputs set earlier via addOutput() and similar calls are ignored).
 List<Pair<K2,V2>> run(boolean validateCounters)
          Runs the test but returns the result set instead of validating it (any expected outputs set earlier via addOutput() and similar calls are ignored).
 void runTest()
          Runs the test and validates the results
 void runTest(boolean orderMatters)
          Runs the test and validates the results
 void setCacheArchives(URI[] archives)
          Sets the list of archives to put on the distributed cache.
 void setCacheFiles(URI[] files)
          Sets the list of files to put on the distributed cache.
 void setConfiguration(org.apache.hadoop.conf.Configuration configuration)
          Deprecated. Use getConfiguration() to set individual configuration items rather than replacing the entire configuration object, which the driver uses internally.
 void setOutputSerializationConfiguration(org.apache.hadoop.conf.Configuration configuration)
          Sets the Configuration used both when copying output for the run* methods and by the InputFormat that reads output back in when a real OutputFormat is set.
protected  T thisAsTestDriver()
           
protected  void validate(org.apache.hadoop.mrunit.internal.counters.CounterWrapper counterWrapper)
          Checks the actual counters against the expected counters.
protected  void validate(List<Pair<K2,V2>> outputs, boolean orderMatters)
          Checks the actual outputs against the expected outputs.
 T withAllOutput(List<Pair<K2,V2>> outputRecords)
          Functions like addAllOutput() but returns self for fluent programming style
 T withCacheArchive(String archive)
          Adds an archive to be put on the distributed cache.
 T withCacheArchive(URI archive)
          Adds an archive to be put on the distributed cache.
 T withCacheFile(String file)
          Adds a file to be put on the distributed cache.
 T withCacheFile(URI file)
          Adds a file to be put on the distributed cache.
 T withConfiguration(org.apache.hadoop.conf.Configuration configuration)
          Deprecated. Use getConfiguration() to set individual configuration items rather than replacing the entire configuration object, which the driver uses internally.
 T withCounter(Enum<?> e, long expectedValue)
          Register expected enumeration based counter value
 T withCounter(String group, String name, long expectedValue)
          Register expected name based counter value
 T withOutput(K2 key, V2 val)
          Works like addOutput() but returns self for fluent programming style
 T withOutput(Pair<K2,V2> outputRecord)
          Works like addOutput(), but returns self for fluent style
 T withOutputFromString(String output)
          Deprecated. No replacement, due to lack of type safety and incompatibility with non-Text Writables.
 T withOutputSerializationConfiguration(org.apache.hadoop.conf.Configuration configuration)
          Sets the Configuration used both when copying output for the run* methods and by the InputFormat that reads output back in when a real OutputFormat is set.
 T withStrictCounterChecking()
          Enables strict counter checking.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final org.apache.commons.logging.Log LOG

expectedOutputs

protected List<Pair<K2,V2>> expectedOutputs

expectedEnumCounters

protected List<Pair<Enum<?>,Long>> expectedEnumCounters

expectedStringCounters

protected List<Pair<Pair<String,String>,Long>> expectedStringCounters

counterWrapper

protected org.apache.hadoop.mrunit.internal.counters.CounterWrapper counterWrapper
Constructor Detail

TestDriver

public TestDriver()
Method Detail

addAllOutput

public void addAllOutput(List<Pair<K2,V2>> outputRecords)
Adds output (k, v)* pairs we expect

Parameters:
outputRecords - The (k, v)* pairs to add

withAllOutput

public T withAllOutput(List<Pair<K2,V2>> outputRecords)
Functions like addAllOutput() but returns self for fluent programming style

Parameters:
outputRecords - The (k, v)* pairs to add
Returns:
this

addOutput

public void addOutput(Pair<K2,V2> outputRecord)
Adds an output (k, v) pair we expect

Parameters:
outputRecord - The (k, v) pair to add

addOutput

public void addOutput(K2 key,
                      V2 val)
Adds a (k, v) pair we expect as output

Parameters:
key - the key
val - the value

withOutput

public T withOutput(Pair<K2,V2> outputRecord)
Works like addOutput(), but returns self for fluent style

Parameters:
outputRecord - The (k, v) pair to add
Returns:
this

withOutput

public T withOutput(K2 key,
                    V2 val)
Works like addOutput() but returns self for fluent programming style

Parameters:
key - the key
val - the value
Returns:
this

addOutputFromString

@Deprecated
public void addOutputFromString(String output)
Deprecated. No replacement, due to lack of type safety and incompatibility with non-Text Writables.

Expects an input of the form "key \t val". Forces the output types to Text.

Parameters:
output - A string of the form "key \t val". Trims any whitespace.

withOutputFromString

@Deprecated
public T withOutputFromString(String output)
Deprecated. No replacement, due to lack of type safety and incompatibility with non-Text Writables.

Identical to addOutputFromString, but with a fluent programming style

Parameters:
output - A string of the form "key \t val". Trims any whitespace.
Returns:
this

getExpectedOutputs

public List<Pair<K2,V2>> getExpectedOutputs()
Returns:
the list of (k, v) pairs expected as output from this driver

resetOutput

public void resetOutput()
Clears the list of outputs expected from this driver


getExpectedEnumCounters

public List<Pair<Enum<?>,Long>> getExpectedEnumCounters()
Returns:
the expected enumeration-based counters registered with this driver

getExpectedStringCounters

public List<Pair<Pair<String,String>,Long>> getExpectedStringCounters()
Returns:
the expected name-based (group, name) counters registered with this driver

resetExpectedCounters

public void resetExpectedCounters()
Clears the list of expected counters from this driver


thisAsTestDriver

protected T thisAsTestDriver()
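The class signature TestDriver<K1,V1,K2,V2,T extends TestDriver<K1,V1,K2,V2,T>> together with thisAsTestDriver() is what lets the with* methods return the concrete driver type. Below is a minimal, self-contained sketch of that self-bounded generic pattern; FluentDriver, WordCountDriver, self() and withInput() are hypothetical names, not MRUnit API, and expected outputs are stored as strings purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class mirroring the TestDriver<..., T extends TestDriver<..., T>>
// shape: T is the concrete driver type, so fluent methods can return it.
abstract class FluentDriver<K, V, T extends FluentDriver<K, V, T>> {
    private final List<String> expected = new ArrayList<>();

    // Plays the role of thisAsTestDriver(): returns this, typed as T.
    protected abstract T self();

    // Like addOutput(K2, V2): records an expected pair and returns nothing.
    public void addOutput(K key, V val) {
        expected.add(key + "\t" + val);
    }

    // Like withOutput(K2, V2): delegates, then returns the concrete driver.
    public T withOutput(K key, V val) {
        addOutput(key, val);
        return self();
    }

    public List<String> getExpectedOutputs() {
        return expected;
    }
}

// Hypothetical concrete driver adding a method the base class does not know about.
final class WordCountDriver extends FluentDriver<String, Integer, WordCountDriver> {
    private String input = "";

    @Override
    protected WordCountDriver self() {
        return this;
    }

    public WordCountDriver withInput(String line) {
        this.input = line;
        return this;
    }

    public String getInput() {
        return input;
    }

    public static void main(String[] args) {
        // withOutput returns WordCountDriver, so withInput can follow it without a cast.
        WordCountDriver driver = new WordCountDriver()
                .withOutput("hadoop", 1)
                .withInput("hadoop mrunit")
                .withOutput("mrunit", 1);
        System.out.println(driver.getExpectedOutputs());
    }
}
```

An unchecked (T) this cast inside the base class is a common alternative to the abstract self() used here; this sketch does not claim which approach MRUnit itself takes.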

withCounter

public T withCounter(Enum<?> e,
                     long expectedValue)
Register expected enumeration based counter value

Parameters:
e - Enumeration based counter
expectedValue - Expected value
Returns:
this

withCounter

public T withCounter(String group,
                     String name,
                     long expectedValue)
Register expected name based counter value

Parameters:
group - Counter group
name - Counter name
expectedValue - Expected value
Returns:
this

withStrictCounterChecking

public T withStrictCounterChecking()
Changes counter checking to strict mode. After this method is called, the test fails if an actual counter is not matched by an expected counter. By default, the test only checks that every expected counter is present; strict mode additionally ensures that no unexpected counters have been declared.

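The difference between the default and strict modes can be sketched in plain Java. This is an illustration under stated assumptions, not MRUnit's implementation: counters are keyed by hypothetical "group:name" strings, CounterCheckSketch is a made-up class, and a missing expected counter is reported as a failure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of default vs strict counter checking.
// Counters are keyed by "group:name" strings purely for this sketch.
final class CounterCheckSketch {
    private CounterCheckSketch() {}

    // Returns a list of problems; an empty list means the check passed.
    static List<String> checkCounters(Map<String, Long> expected,
                                      Map<String, Long> actual,
                                      boolean strict) {
        List<String> errors = new ArrayList<>();
        // Default mode: every expected counter must be present with the expected value.
        for (Map.Entry<String, Long> e : expected.entrySet()) {
            Long got = actual.get(e.getKey());
            if (got == null) {
                errors.add("missing counter " + e.getKey());
            } else if (!got.equals(e.getValue())) {
                errors.add("counter " + e.getKey() + ": expected " + e.getValue() + ", got " + got);
            }
        }
        // Strict mode: additionally, every actual counter must have been expected.
        if (strict) {
            for (String name : actual.keySet()) {
                if (!expected.containsKey(name)) {
                    errors.add("unexpected counter " + name);
                }
            }
        }
        return errors;
    }
}
```

With an extra, undeclared counter in the actual results, the default check passes while the strict check reports it.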

getConfiguration

public org.apache.hadoop.conf.Configuration getConfiguration()
Returns:
The configuration object that will be given to the mapper and/or reducer associated with the driver

setConfiguration

@Deprecated
public void setConfiguration(org.apache.hadoop.conf.Configuration configuration)
Deprecated. Use getConfiguration() to set individual configuration items rather than replacing the entire configuration object, which the driver uses internally.

Parameters:
configuration - The configuration object that will be given to the mapper and/or reducer associated with the driver. This method should only be called directly after the constructor, as the internal state of the driver depends on the configuration object.

withConfiguration

@Deprecated
public T withConfiguration(org.apache.hadoop.conf.Configuration configuration)
Deprecated. Use getConfiguration() to set individual configuration items rather than replacing the entire configuration object, which the driver uses internally.

Parameters:
configuration - The configuration object that will be given to the mapper and/or reducer associated with the driver. This method should only be called directly after the constructor, as the internal state of the driver depends on the configuration object.
Returns:
this object for fluent coding
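One way replacing the configuration can go wrong is that internal components keep a reference to the original object. The sketch below is hypothetical: ConfigDriverSketch and its Helper stand in for a driver and its internals, and a Map stands in for Hadoop's Configuration. It shows why mutating the object returned by getConfiguration() is safer than swapping the object out after construction.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical driver: a Map stands in for Hadoop's Configuration, and Helper
// stands in for internal components that capture the configuration object.
final class ConfigDriverSketch {
    private Map<String, String> conf = new HashMap<>();
    // Field initializers run in textual order, so Helper receives the map created above.
    private final Helper helper = new Helper(conf);

    static final class Helper {
        private final Map<String, String> conf;

        Helper(Map<String, String> conf) {
            this.conf = conf;
        }

        String lookup(String key) {
            return conf.get(key);
        }
    }

    // Recommended style: mutate the shared object, so internal components see the change.
    Map<String, String> getConfiguration() {
        return conf;
    }

    // Discouraged style: replaces the driver's reference, but Helper still holds the old map.
    void setConfiguration(Map<String, String> replacement) {
        conf = replacement;
    }

    String valueSeenInternally(String key) {
        return helper.lookup(key);
    }
}
```

After setConfiguration(), the driver and its helper disagree about the configuration, which is the kind of inconsistency the deprecation note warns about.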

getOutputSerializationConfiguration

public org.apache.hadoop.conf.Configuration getOutputSerializationConfiguration()
Gets the Configuration used both when copying output for the run* methods and by the InputFormat that reads output back in when a real OutputFormat is set.

Returns:
outputSerializationConfiguration, null when no outputSerializationConfiguration is set

setOutputSerializationConfiguration

public void setOutputSerializationConfiguration(org.apache.hadoop.conf.Configuration configuration)
Sets the Configuration used both when copying output for the run* methods and by the InputFormat that reads output back in when a real OutputFormat is set. When this configuration is not set, MRUnit uses the configuration set with withConfiguration(Configuration) or setConfiguration(Configuration).

Parameters:
configuration - the Configuration to use for output serialization

withOutputSerializationConfiguration

public T withOutputSerializationConfiguration(org.apache.hadoop.conf.Configuration configuration)
Sets the Configuration used both when copying output for the run* methods and by the InputFormat that reads output back in when a real OutputFormat is set. When this configuration is not set, MRUnit uses the configuration set with withConfiguration(Configuration) or setConfiguration(Configuration).

Parameters:
configuration - the Configuration to use for output serialization
Returns:
this for fluent style

addCacheFile

public void addCacheFile(String path)
Adds a file to be put on the distributed cache. The path may be relative; MRUnit attempts to resolve it from the test's classpath.

Parameters:
path - path to the file
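A minimal sketch of resolving a possibly relative cache path, assuming a classpath lookup followed by a fallback to the local file system. CachePathSketch.resolve is a hypothetical helper, and MRUnit's actual resolution rules may differ.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper: try the classpath first, then fall back to the local file system.
final class CachePathSketch {
    private CachePathSketch() {}

    static URI resolve(String path) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        URL resource = (loader == null) ? null : loader.getResource(path);
        if (resource != null) {
            try {
                return resource.toURI();
            } catch (URISyntaxException e) {
                throw new IllegalArgumentException("Unusable classpath URL: " + resource, e);
            }
        }
        // Not on the classpath: interpret the path relative to the working directory.
        return new File(path).toURI();
    }
}
```

A resource bundled with the test (for example under src/test/resources in a Maven layout) would resolve to its classpath URL; anything else becomes a file: URI.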

addCacheFile

public void addCacheFile(URI uri)
Adds a file to be put on the distributed cache.

Parameters:
uri - uri of the file

setCacheFiles

public void setCacheFiles(URI[] files)
Sets the list of files to put on the distributed cache.

Parameters:
files - list of URIs

addCacheArchive

public void addCacheArchive(String path)
Adds an archive to be put on the distributed cache. The path may be relative; MRUnit attempts to resolve it from the test's classpath.

Parameters:
path - path to the archive

addCacheArchive

public void addCacheArchive(URI uri)
Adds an archive to be put on the distributed cache.

Parameters:
uri - uri of the archive

setCacheArchives

public void setCacheArchives(URI[] archives)
Sets the list of archives to put on the distributed cache.

Parameters:
archives - list of URIs

withCacheFile

public T withCacheFile(String file)
Adds a file to be put on the distributed cache. The path may be relative; MRUnit attempts to resolve it from the test's classpath.

Parameters:
file - path to the file
Returns:
the driver

withCacheFile

public T withCacheFile(URI file)
Adds a file to be put on the distributed cache.

Parameters:
file - uri of the file
Returns:
the driver

withCacheArchive

public T withCacheArchive(String archive)
Adds an archive to be put on the distributed cache. The path may be relative; MRUnit attempts to resolve it from the test's classpath.

Parameters:
archive - path to the archive
Returns:
the driver

withCacheArchive

public T withCacheArchive(URI archive)
Adds an archive to be put on the distributed cache.

Parameters:
archive - uri of the archive
Returns:
the driver

run

public List<Pair<K2,V2>> run(boolean validateCounters)
                      throws IOException
Runs the test but returns the result set instead of validating it (any expected outputs set earlier via addOutput() and similar calls are ignored). Also optionally performs counter validation.

Parameters:
validateCounters - whether to run automatic counter validation
Returns:
the list of (k, v) pairs returned as output from the test
Throws:
IOException

initDistributedCache

protected void initDistributedCache()
                             throws IOException
Initialises the test distributed cache if required. This process is referred to as "localizing" by Hadoop, but since this is a unit test all files/archives are already local. Cached files are not moved but cached archives are extracted into a temporary directory.

Throws:
IOException

cleanupDistributedCache

protected void cleanupDistributedCache()
                                throws IOException
Cleans up the test distributed cache by deleting the temporary directory and any extracted cache archives it contains.

Throws:
IOException - if the local fs handle cannot be retrieved
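initDistributedCache() extracts cache archives into a temporary directory, and cleanupDistributedCache() deletes that directory along with its contents. The sketch below shows that create, populate, and recursive-delete lifecycle using java.nio.file; TempCacheSketch and writePlaceholder (a stand-in for archive extraction) are hypothetical, not MRUnit code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class TempCacheSketch {
    private TempCacheSketch() {}

    // Creates a fresh temporary directory to hold "localized" cache content.
    static Path createTempCacheDir() {
        try {
            return Files.createTempDirectory("mrunit-cache-sketch");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical stand-in for archive extraction: writes an empty file under dir.
    static Path writePlaceholder(Path dir, String relativePath) {
        try {
            Path target = dir.resolve(relativePath);
            Files.createDirectories(target.getParent());
            return Files.write(target, new byte[0]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deletes dir and everything beneath it, children before parents.
    static void deleteRecursively(Path dir) {
        if (!Files.exists(dir)) {
            return;
        }
        try {
            List<Path> paths;
            try (Stream<Path> walk = Files.walk(dir)) {
                // A parent path is a prefix of its children's paths, so reverse
                // ordering puts every child before its parent directory.
                paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            }
            for (Path p : paths) {
                Files.delete(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Wrapping IOException in UncheckedIOException is a choice made for this sketch; the documented MRUnit methods declare IOException instead.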

run

public abstract List<Pair<K2,V2>> run()
                               throws IOException
Runs the test but returns the result set instead of validating it (any expected outputs set earlier via addOutput() and similar calls are ignored).

Returns:
the list of (k, v) pairs returned as output from the test
Throws:
IOException

runTest

public void runTest()
             throws IOException
Runs the test and validates the results

Throws:
IOException

runTest

public void runTest(boolean orderMatters)
             throws IOException
Runs the test and validates the results

Parameters:
orderMatters - Whether or not output ordering is important
Throws:
IOException

printPreTestDebugLog

protected void printPreTestDebugLog()
Overridable hook for printing pre-test debug information


parseTabbedPair

public static Pair<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text> parseTabbedPair(String tabSeparatedPair)
Split "key \t val" into Pair(Text(key), Text(val))

Parameters:
tabSeparatedPair - a string of the form "key \t val"
Returns:
a Pair of Text(key) and Text(val)
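A plain-Java sketch of the "key \t val" split, using String in place of org.apache.hadoop.io.Text so it runs without Hadoop. Splitting at the first tab, trimming each side (following the "Trims any whitespace" note on addOutputFromString), and throwing on a missing tab are assumptions, not documented behavior of parseTabbedPair; TabbedPairSketch is a hypothetical name.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

final class TabbedPairSketch {
    private TabbedPairSketch() {}

    // Splits "key\tval" at the first tab and trims both halves.
    // String stands in for Text; the exceptions are assumptions of this sketch.
    static Map.Entry<String, String> parseTabbedPair(String tabSeparatedPair) {
        if (tabSeparatedPair == null) {
            throw new IllegalArgumentException("input must not be null");
        }
        int split = tabSeparatedPair.indexOf('\t');
        if (split < 0) {
            throw new IllegalArgumentException("no tab in: " + tabSeparatedPair);
        }
        String key = tabSeparatedPair.substring(0, split).trim();
        String val = tabSeparatedPair.substring(split + 1).trim();
        return new SimpleImmutableEntry<>(key, val);
    }
}
```

Because only the first tab is used as the separator, any further tabs stay inside the value.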

parseCommaDelimitedList

protected static List<org.apache.hadoop.io.Text> parseCommaDelimitedList(String commaDelimList)
Split "val,val,val,val..." into a List of Text(val) objects.

Parameters:
commaDelimList - A list of values separated by commas
Returns:
a List of Text(val) objects, one per value
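A similar sketch for the comma-delimited case, again with String standing in for Text. CommaListSketch is hypothetical, and trimming each value and returning an empty list for empty input are assumptions rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

final class CommaListSketch {
    private CommaListSketch() {}

    // Splits "val,val,val" into trimmed values; String stands in for Text.
    static List<String> parseCommaDelimitedList(String commaDelimList) {
        List<String> values = new ArrayList<>();
        if (commaDelimList.isEmpty()) {
            return values; // assumption: an empty input yields no values
        }
        // String.split drops trailing empty strings, so "a,b," yields [a, b].
        for (String elem : commaDelimList.split(",")) {
            values.add(elem.trim());
        }
        return values;
    }
}
```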

copy

protected <E> E copy(E object)

copyPair

protected <S,E> Pair<S,E> copyPair(S first,
                                   E second)

validate

protected void validate(List<Pair<K2,V2>> outputs,
                        boolean orderMatters)
Checks the actual outputs against the expected outputs.

Parameters:
outputs - The actual output (k, v) pairs
orderMatters - Whether or not output ordering is important when validating test result
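The orderMatters flag distinguishes a sequence comparison from a multiset comparison. The hypothetical OutputMatchSketch.matches below illustrates the two modes on records represented as strings; it is a sketch of the semantics, not MRUnit's validation code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OutputMatchSketch {
    private OutputMatchSketch() {}

    static <T> boolean matches(List<T> expected, List<T> actual, boolean orderMatters) {
        if (orderMatters) {
            // Same records, same positions.
            return expected.equals(actual);
        }
        if (expected.size() != actual.size()) {
            return false;
        }
        // Order-insensitive: compare as multisets, so duplicates must match in number.
        Map<T, Integer> remaining = new HashMap<>();
        for (T e : expected) {
            remaining.merge(e, 1, Integer::sum);
        }
        for (T a : actual) {
            Integer count = remaining.get(a);
            if (count == null) {
                return false; // unexpected record, or too many copies of one
            }
            if (count == 1) {
                remaining.remove(a);
            } else {
                remaining.put(a, count - 1);
            }
        }
        return remaining.isEmpty();
    }
}
```

Counting occurrences, rather than testing set membership, keeps an output with the wrong number of duplicate records from passing the order-insensitive check.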

validate

protected void validate(org.apache.hadoop.mrunit.internal.counters.CounterWrapper counterWrapper)
Checks the actual counters against the expected counters.


formatValueList

protected static void formatValueList(List<?> values,
                                      StringBuilder sb)


Copyright © 2013 The Apache Software Foundation. All Rights Reserved.