org.apache.hadoop.mrunit
Class MapReduceDriverBase<K1,V1,K2,V2,K3,V3,T extends MapReduceDriverBase<K1,V1,K2,V2,K3,V3,T>>

java.lang.Object
  extended by org.apache.hadoop.mrunit.TestDriver<K1,V1,K3,V3,T>
      extended by org.apache.hadoop.mrunit.MapReduceDriverBase<K1,V1,K2,V2,K3,V3,T>
Direct Known Subclasses:
MapReduceDriver (org.apache.hadoop.mrunit), MapReduceDriver (org.apache.hadoop.mrunit.mapreduce)

public abstract class MapReduceDriverBase<K1,V1,K2,V2,K3,V3,T extends MapReduceDriverBase<K1,V1,K2,V2,K3,V3,T>>
extends TestDriver<K1,V1,K3,V3,T>

Harness that allows you to test a Mapper and a Reducer instance together. You provide the input keys and values to send to the Mapper, and the outputs you expect the Reducer to send to the collector for those inputs. When you call runTest(), the harness delivers the input to the Mapper, feeds the intermediate results to the Reducer (without checking them), and checks the Reducer's outputs against the expected results. This is designed to handle a single (k, v)* -> (k, v)* case from the Mapper/Reducer pair, representing a single unit test.
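The flow described above can be illustrated without any Hadoop or MRUnit dependency. The following is a minimal plain-Java sketch, not MRUnit's implementation: the class HarnessFlowSketch, its word-count "mapper" and "reducer", and the tab-separated output strings are all hypothetical stand-ins chosen for illustration. It shows the difference between returning the results (as run() does) and checking them against expectations (as runTest() does).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the harness flow; not part of MRUnit.
final class HarnessFlowSketch {

    // Analogous to run(): executes map, shuffle, and reduce, then returns the outputs.
    static List<String> run(List<String> inputLines) {
        // Map phase plus shuffle: each word emits a 1, collected under its key in key order.
        Map<String, List<Integer>> shuffled = new TreeMap<>();
        for (String line : inputLines) {
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // a blank line produces no map output
                }
                shuffled.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }
        // Reduce phase: sum the values of each key group.
        List<String> outputs = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> group : shuffled.entrySet()) {
            int sum = 0;
            for (int value : group.getValue()) {
                sum += value;
            }
            outputs.add(group.getKey() + "\t" + sum);
        }
        return outputs;
    }

    // Analogous to runTest(): only the final outputs are compared with the expectations;
    // the intermediate (word, 1) pairs are never checked.
    static boolean runTest(List<String> inputLines, List<String> expectedOutputs) {
        return run(inputLines).equals(expectedOutputs);
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("cat dog", "dog");
        System.out.println(runTest(inputs, Arrays.asList("cat\t1", "dog\t2")));
    }
}
```

In the real driver the expectations are registered beforehand through the inherited withOutput()/addOutput() methods rather than passed as an argument.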


Field Summary
protected  List<Pair<K1,V1>> inputList
           
protected  Comparator<K2> keyGroupComparator
          Key group comparator
protected  Comparator<K2> keyValueOrderComparator
          Key value order comparator
static org.apache.commons.logging.Log LOG
           
protected  org.apache.hadoop.fs.Path mapInputPath
           
 
Fields inherited from class org.apache.hadoop.mrunit.TestDriver
counterWrapper, expectedEnumCounters, expectedOutputs, expectedStringCounters
 
Constructor Summary
MapReduceDriverBase()
           
 
Method Summary
 void addAll(List<Pair<K1,V1>> inputs)
          Adds input to send to the mapper
 void addInput(K1 key, V1 val)
          Adds an input to send to the mapper
 void addInput(Pair<K1,V1> input)
          Adds an input to send to the Mapper
 void addInputFromString(String input)
          Deprecated. No replacement due to lack of type safety and incompatibility with non-Text Writables
 org.apache.hadoop.fs.Path getMapInputPath()
           
protected  void preRunChecks(Object mapper, Object reducer)
           
abstract  List<Pair<K3,V3>> run()
          Runs the test and returns the result set instead of validating it; any expected outputs registered beforehand (addOutput(), etc.) are ignored
 void setKeyGroupingComparator(org.apache.hadoop.io.RawComparator<K2> groupingComparator)
          Sets the key grouping comparator. Similar to JobConf.setOutputValueGroupingComparator(Class) (pre 0.20.1 API) or Job.setGroupingComparatorClass(Class) (0.20.1+ API), but takes a real instance rather than just the class
 void setKeyOrderComparator(org.apache.hadoop.io.RawComparator<K2> orderComparator)
          Sets the key value order comparator. Similar to JobConf.setOutputKeyComparatorClass(Class) (pre 0.20.1 API) or Job.setSortComparatorClass(Class) (0.20.1+ API), but takes a real instance rather than just the class
 void setMapInputPath(org.apache.hadoop.fs.Path mapInputPath)
           
 List<Pair<K2,List<V2>>> shuffle(List<Pair<K2,V2>> mapOutputs)
          Take the outputs from the Mapper, combine all values for the same key, and sort them by key.
 T withAll(List<Pair<K1,V1>> inputs)
          Identical to addAll() but returns self for fluent programming style
 T withInput(K1 key, V1 val)
          Identical to addInput() but returns self for fluent programming style
 T withInput(Pair<K1,V1> input)
          Identical to addInput() but returns self for fluent programming style
 T withInputFromString(String input)
          Deprecated. No replacement due to lack of type safety and incompatibility with non-Text Writables
 T withKeyGroupingComparator(org.apache.hadoop.io.RawComparator<K2> groupingComparator)
          Identical to setKeyGroupingComparator(RawComparator), but with a fluent programming style
 T withKeyOrderComparator(org.apache.hadoop.io.RawComparator<K2> orderComparator)
          Identical to setKeyOrderComparator(RawComparator), but with a fluent programming style
 T withMapInputPath(org.apache.hadoop.fs.Path mapInputPath)
           
 
Methods inherited from class org.apache.hadoop.mrunit.TestDriver
addAllOutput, addCacheArchive, addCacheArchive, addCacheFile, addCacheFile, addOutput, addOutput, addOutputFromString, cleanupDistributedCache, copy, copyPair, formatValueList, getConfiguration, getExpectedEnumCounters, getExpectedOutputs, getExpectedStringCounters, getOutputSerializationConfiguration, initDistributedCache, parseCommaDelimitedList, parseTabbedPair, printPreTestDebugLog, resetExpectedCounters, resetOutput, run, runTest, runTest, setCacheArchives, setCacheFiles, setConfiguration, setOutputSerializationConfiguration, thisAsTestDriver, validate, validate, withAllOutput, withCacheArchive, withCacheArchive, withCacheFile, withCacheFile, withConfiguration, withCounter, withCounter, withOutput, withOutput, withOutputFromString, withOutputSerializationConfiguration, withStrictCounterChecking
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final org.apache.commons.logging.Log LOG

inputList

protected List<Pair<K1,V1>> inputList

mapInputPath

protected org.apache.hadoop.fs.Path mapInputPath

keyGroupComparator

protected Comparator<K2> keyGroupComparator
Key group comparator


keyValueOrderComparator

protected Comparator<K2> keyValueOrderComparator
Key value order comparator

Constructor Detail

MapReduceDriverBase

public MapReduceDriverBase()
Method Detail

addInput

public void addInput(K1 key,
                     V1 val)
Adds an input to send to the mapper

Parameters:
key - The input key to send to the Mapper
val - The input value to send to the Mapper

addInput

public void addInput(Pair<K1,V1> input)
Adds an input to send to the Mapper

Parameters:
input - The (k, v) pair to add to the input list.

addAll

public void addAll(List<Pair<K1,V1>> inputs)
Adds input to send to the mapper

Parameters:
inputs - List of (k, v) pairs to add to the input list

addInputFromString

@Deprecated
public void addInputFromString(String input)
Deprecated. No replacement due to lack of type safety and incompatibility with non-Text Writables

Expects an input of the form "key \t val". Forces the Mapper input types to Text.

Parameters:
input - A string of the form "key \t val". Trims any whitespace.

withInput

public T withInput(K1 key,
                   V1 val)
Identical to addInput() but returns self for fluent programming style

Parameters:
key - The input key to send to the Mapper
val - The input value to send to the Mapper
Returns:
this

withInput

public T withInput(Pair<K1,V1> input)
Identical to addInput() but returns self for fluent programming style

Parameters:
input - The (k, v) pair to add
Returns:
this

withInputFromString

@Deprecated
public T withInputFromString(String input)
Deprecated. No replacement due to lack of type safety and incompatibility with non-Text Writables

Identical to addInputFromString, but with a fluent programming style

Parameters:
input - A string of the form "key \t val". Trims any whitespace.
Returns:
this

withAll

public T withAll(List<Pair<K1,V1>> inputs)
Identical to addAll() but returns self for fluent programming style

Parameters:
inputs - List of (k, v) pairs to add
Returns:
this
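
The with*() methods return T rather than the base class so that a chain of calls keeps the concrete driver type. The following is a minimal plain-Java sketch of that self-typed fluent pattern. The names DriverBaseSketch, ConcreteDriverSketch, self(), withLabel(), and describe() are hypothetical and exist only for this illustration; they are not MRUnit API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class whose type parameter T names the concrete subclass.
abstract class DriverBaseSketch<T extends DriverBaseSketch<T>> {
    protected final List<String> inputs = new ArrayList<>();

    // Each subclass returns itself, so fluent calls keep the subclass type.
    protected abstract T self();

    // Plain setter style: mutates state and returns nothing.
    void addInput(String input) {
        inputs.add(input);
    }

    // Fluent style: same effect as addInput(), but returns the concrete driver.
    T withInput(String input) {
        addInput(input);
        return self();
    }
}

// Hypothetical concrete driver with one extra fluent method of its own.
final class ConcreteDriverSketch extends DriverBaseSketch<ConcreteDriverSketch> {
    private String label = "";

    @Override
    protected ConcreteDriverSketch self() {
        return this;
    }

    ConcreteDriverSketch withLabel(String label) {
        this.label = label;
        return this;
    }

    String describe() {
        return label + inputs;
    }
}

final class FluentDemo {
    public static void main(String[] args) {
        // withLabel() is callable after withInput() because withInput() returns T,
        // which here is ConcreteDriverSketch rather than the base class.
        String description = new ConcreteDriverSketch()
                .withInput("a")
                .withInput("b")
                .withLabel("wc")
                .describe();
        System.out.println(description);
    }
}
```

If withInput() returned the base type instead, the call to withLabel() in the chain would not compile without a cast.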

getMapInputPath

public org.apache.hadoop.fs.Path getMapInputPath()
Returns:
the path passed to the mapper's InputSplit

setMapInputPath

public void setMapInputPath(org.apache.hadoop.fs.Path mapInputPath)
Parameters:
mapInputPath - Path which is to be passed to the mapper's InputSplit

withMapInputPath

public final T withMapInputPath(org.apache.hadoop.fs.Path mapInputPath)
Parameters:
mapInputPath - The Path object which will be given to the mapper
Returns:
this

preRunChecks

protected void preRunChecks(Object mapper,
                            Object reducer)

run

public abstract List<Pair<K3,V3>> run()
                               throws IOException
Description copied from class: TestDriver
Runs the test and returns the result set instead of validating it; any expected outputs registered beforehand (addOutput(), etc.) are ignored

Specified by:
run in class TestDriver<K1,V1,K3,V3,T extends MapReduceDriverBase<K1,V1,K2,V2,K3,V3,T>>
Returns:
the list of (k, v) pairs returned as output from the test
Throws:
IOException

shuffle

public List<Pair<K2,List<V2>>> shuffle(List<Pair<K2,V2>> mapOutputs)
Take the outputs from the Mapper, combine all values for the same key, and sort them by key.

Parameters:
mapOutputs - An unordered list of (key, val) pairs from the mapper
Returns:
the sorted list of (key, list(val))'s to present to the reducer
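
The grouping and sorting that shuffle() describes can be sketched in plain Java as follows. This is an illustration under stated assumptions, not MRUnit's code: ShuffleSketch and pair() are hypothetical names, Map.Entry stands in for MRUnit's Pair, and keys are sorted by their natural order because no comparators are configured here.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the shuffle step; not part of MRUnit.
final class ShuffleSketch {

    // Convenience factory for a (key, value) pair.
    static <K, V> Map.Entry<K, V> pair(K key, V value) {
        return new SimpleEntry<K, V>(key, value);
    }

    // Combines all values that share a key and returns the groups in ascending key order.
    static <K extends Comparable<K>, V> List<Map.Entry<K, List<V>>> shuffle(
            List<Map.Entry<K, V>> mapOutputs) {
        // TreeMap keeps its keys sorted by natural order.
        Map<K, List<V>> grouped = new TreeMap<>();
        for (Map.Entry<K, V> output : mapOutputs) {
            grouped.computeIfAbsent(output.getKey(), k -> new ArrayList<>())
                   .add(output.getValue());
        }
        List<Map.Entry<K, List<V>>> result = new ArrayList<>();
        for (Map.Entry<K, List<V>> group : grouped.entrySet()) {
            result.add(new SimpleEntry<K, List<V>>(group.getKey(), group.getValue()));
        }
        return result;
    }

    // Shuffles three unordered mapper outputs and renders the result as text.
    static String demo() {
        List<Map.Entry<String, Integer>> mapOutputs =
                Arrays.asList(pair("b", 1), pair("a", 2), pair("b", 3));
        return shuffle(mapOutputs).toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [a=[2], b=[1, 3]]
    }
}
```

Values within a group keep the order in which the mapper emitted them in this sketch; whether the real driver guarantees that is not stated in this page.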

setKeyGroupingComparator

public void setKeyGroupingComparator(org.apache.hadoop.io.RawComparator<K2> groupingComparator)
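Sets the key grouping comparator. This is similar to calling the following API calls, but passes a real instance rather than just the class:
pre 0.20.1 API: JobConf.setOutputValueGroupingComparator(Class)
0.20.1+ API: Job.setGroupingComparatorClass(Class)

Parameters:
groupingComparator - Comparator to use in the shuffle stage for key grouping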

setKeyOrderComparator

public void setKeyOrderComparator(org.apache.hadoop.io.RawComparator<K2> orderComparator)
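Sets the key value order comparator. This is similar to calling the following API calls, but passes a real instance rather than just the class:
pre 0.20.1 API: JobConf.setOutputKeyComparatorClass(Class)
0.20.1+ API: Job.setSortComparatorClass(Class)

Parameters:
orderComparator - Comparator to use in the shuffle stage for key value ordering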

withKeyGroupingComparator

public T withKeyGroupingComparator(org.apache.hadoop.io.RawComparator<K2> groupingComparator)
Identical to setKeyGroupingComparator(RawComparator), but with a fluent programming style

Parameters:
groupingComparator - Comparator to use in the shuffle stage for key grouping
Returns:
this

withKeyOrderComparator

public T withKeyOrderComparator(org.apache.hadoop.io.RawComparator<K2> orderComparator)
Identical to setKeyOrderComparator(RawComparator), but with a fluent programming style

Parameters:
orderComparator - Comparator to use in the shuffle stage for key value ordering
Returns:
this
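
The two comparators play different roles in the shuffle stage: the order comparator decides the sequence of keys, while the grouping comparator decides which adjacent keys share one reducer call. The following plain-Java sketch illustrates that split with a secondary-sort style example. It is a sketch under assumptions, not MRUnit's implementation: ComparatorShuffleSketch, the "user#sequence" key format, and the choice to report the first key of each group are all hypothetical, and java.util.Comparator is used in place of Hadoop's RawComparator.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a shuffle that uses two comparators; not part of MRUnit.
final class ComparatorShuffleSketch {

    static Map.Entry<String, Integer> pair(String key, Integer value) {
        return new SimpleEntry<String, Integer>(key, value);
    }

    static List<Map.Entry<String, List<Integer>>> shuffle(
            List<Map.Entry<String, Integer>> mapOutputs,
            Comparator<String> orderComparator,
            Comparator<String> groupingComparator) {
        // Step 1: sort all pairs by key using the order comparator (List.sort is stable).
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(mapOutputs);
        sorted.sort((a, b) -> orderComparator.compare(a.getKey(), b.getKey()));

        // Step 2: walk the sorted pairs and start a new group whenever the
        // grouping comparator reports that the key differs from the group's key.
        List<Map.Entry<String, List<Integer>>> groups = new ArrayList<>();
        String groupKey = null;
        List<Integer> groupValues = null;
        for (Map.Entry<String, Integer> output : sorted) {
            if (groupValues == null
                    || groupingComparator.compare(groupKey, output.getKey()) != 0) {
                groupKey = output.getKey();
                groupValues = new ArrayList<>();
                groups.add(new SimpleEntry<String, List<Integer>>(groupKey, groupValues));
            }
            groupValues.add(output.getValue());
        }
        return groups;
    }

    static String demo() {
        List<Map.Entry<String, Integer>> mapOutputs = Arrays.asList(
                pair("alice#2", 20), pair("bob#1", 30), pair("alice#1", 10));
        // Order by the full composite key; group by the part before '#'.
        Comparator<String> fullKeyOrder = Comparator.naturalOrder();
        Comparator<String> byUser =
                Comparator.comparing(k -> k.substring(0, k.indexOf('#')));
        return shuffle(mapOutputs, fullKeyOrder, byUser).toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [alice#1=[10, 20], bob#1=[30]]
    }
}
```

Because the pairs are sorted by the full key before grouping, each user's values reach the group already ordered by sequence number, which is the usual motivation for configuring the two comparators separately.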


Copyright © 2013 The Apache Software Foundation. All Rights Reserved.