org.apache.hadoop.mrunit
Class PipelineMapReduceDriver<K1,V1,K2,V2>

java.lang.Object
  extended by org.apache.hadoop.mrunit.TestDriver<K1,V1,K2,V2>
      extended by org.apache.hadoop.mrunit.PipelineMapReduceDriver<K1,V1,K2,V2>

public class PipelineMapReduceDriver<K1,V1,K2,V2>
extends TestDriver<K1,V1,K2,V2>

Harness that allows you to test a dataflow through a set of Mappers and Reducers. You provide a set of (Mapper, Reducer) "jobs" that make up a workflow, as well as a set of (key, value) pairs to pass in to the first Mapper. You can also specify the outputs you expect to be sent to the final Reducer in the pipeline. By calling runTest(), the harness will deliver the input to the first Mapper, feed the intermediate results to the first Reducer (without checking them), and proceed to forward this data along to subsequent Mapper/Reducer jobs in the pipeline until the final Reducer. The last Reducer's outputs are checked against the expected results. This is designed for slightly more complicated integration tests than the MapReduceDriver, which is for smaller unit tests. (K1, V1) in the type signature refer to the types associated with the inputs to the first Mapper. (K2, V2) refer to the types associated with the final Reducer's output. No intermediate types are specified.


Field Summary
static org.apache.commons.logging.Log LOG
           
 
Fields inherited from class org.apache.hadoop.mrunit.TestDriver
configuration, expectedOutputs
 
Constructor Summary
PipelineMapReduceDriver()
           
PipelineMapReduceDriver(List<Pair<org.apache.hadoop.mapred.Mapper,org.apache.hadoop.mapred.Reducer>> pipeline)
           
 
Method Summary
 void addInput(K1 key, V1 val)
          Adds an input to send to the mapper
 void addInput(Pair<K1,V1> input)
          Adds an input to send to the Mapper
 void addInputFromString(String input)
          Expects an input of the form "key \t val" Forces the Mapper input types to Text.
 void addMapReduce(org.apache.hadoop.mapred.Mapper m, org.apache.hadoop.mapred.Reducer r)
          Add a Mapper and Reducer instance to the pipeline to use with this test driver
 void addMapReduce(Pair<org.apache.hadoop.mapred.Mapper,org.apache.hadoop.mapred.Reducer> p)
          Add a Mapper and Reducer instance to the pipeline to use with this test driver
 void addOutput(K2 key, V2 val)
          Adds a (k, v) pair we expect as output from the Reducer
 void addOutput(Pair<K2,V2> outputRecord)
          Adds an output (k, v) pair we expect from the Reducer
 void addOutputFromString(String output)
          Expects an input of the form "key \t val" Forces the Reducer output types to Text.
 org.apache.hadoop.mapred.Counters getCounters()
           
 List<Pair<org.apache.hadoop.mapred.Mapper,org.apache.hadoop.mapred.Reducer>> getMapReducePipeline()
           
 List<Pair<K2,V2>> run()
          Runs the test but returns the result set instead of validating it (ignores any addOutput(), etc calls made before this)
 void runTest()
          Runs the test and validates the results
 void setCounters(org.apache.hadoop.mapred.Counters ctrs)
          Sets the counters object to use for this test.
 PipelineMapReduceDriver<K1,V1,K2,V2> withCounters(org.apache.hadoop.mapred.Counters ctrs)
          Sets the counters to use and returns self for fluent style
 PipelineMapReduceDriver<K1,V1,K2,V2> withInput(K1 key, V1 val)
          Identical to addInput() but returns self for fluent programming style
 PipelineMapReduceDriver<K1,V1,K2,V2> withInput(Pair<K1,V1> input)
          Identical to addInput() but returns self for fluent programming style
 PipelineMapReduceDriver<K1,V1,K2,V2> withInputFromString(String input)
          Identical to addInputFromString, but with a fluent programming style
 PipelineMapReduceDriver<K1,V1,K2,V2> withMapReduce(org.apache.hadoop.mapred.Mapper m, org.apache.hadoop.mapred.Reducer r)
          Add a Mapper and Reducer instance to the pipeline to use with this test driver using fluent style
 PipelineMapReduceDriver<K1,V1,K2,V2> withMapReduce(Pair<org.apache.hadoop.mapred.Mapper,org.apache.hadoop.mapred.Reducer> p)
          Add a Mapper and Reducer instance to the pipeline to use with this test driver using fluent style
 PipelineMapReduceDriver<K1,V1,K2,V2> withOutput(K2 key, V2 val)
          Functions like addOutput() but returns self for fluent programming style
 PipelineMapReduceDriver<K1,V1,K2,V2> withOutput(Pair<K2,V2> outputRecord)
          Works like addOutput(), but returns self for fluent style
 PipelineMapReduceDriver<K1,V1,K2,V2> withOutputFromString(String output)
          Identical to addOutputFromString, but with a fluent programming style
 
Methods inherited from class org.apache.hadoop.mrunit.TestDriver
formatValueList, getConfiguration, getExpectedOutputs, parseCommaDelimitedList, parseTabbedPair, resetOutput, setConfiguration, validate
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final org.apache.commons.logging.Log LOG
Constructor Detail

PipelineMapReduceDriver

public PipelineMapReduceDriver(List<Pair<org.apache.hadoop.mapred.Mapper,org.apache.hadoop.mapred.Reducer>> pipeline)

PipelineMapReduceDriver

public PipelineMapReduceDriver()
Method Detail

getCounters

public org.apache.hadoop.mapred.Counters getCounters()
Returns:
the counters used in this test

setCounters

public void setCounters(org.apache.hadoop.mapred.Counters ctrs)
Sets the counters object to use for this test.

Parameters:
ctrs - The counters object to use.

withCounters

public PipelineMapReduceDriver<K1,V1,K2,V2> withCounters(org.apache.hadoop.mapred.Counters ctrs)
Sets the counters to use and returns self for fluent style


addMapReduce

public void addMapReduce(org.apache.hadoop.mapred.Mapper m,
                         org.apache.hadoop.mapred.Reducer r)
Add a Mapper and Reducer instance to the pipeline to use with this test driver

Parameters:
m - The Mapper instance to add to the pipeline
r - The Reducer instance to add to the pipeline

addMapReduce

public void addMapReduce(Pair<org.apache.hadoop.mapred.Mapper,org.apache.hadoop.mapred.Reducer> p)
Add a Mapper and Reducer instance to the pipeline to use with this test driver

Parameters:
p - The Mapper and Reducer instances to add to the pipeline

withMapReduce

public PipelineMapReduceDriver<K1,V1,K2,V2> withMapReduce(org.apache.hadoop.mapred.Mapper m,
                                                          org.apache.hadoop.mapred.Reducer r)
Add a Mapper and Reducer instance to the pipeline to use with this test driver using fluent style

Parameters:
m - The Mapper instance to use
r - The Reducer instance to use

withMapReduce

public PipelineMapReduceDriver<K1,V1,K2,V2> withMapReduce(Pair<org.apache.hadoop.mapred.Mapper,org.apache.hadoop.mapred.Reducer> p)
Add a Mapper and Reducer instance to the pipeline to use with this test driver using fluent style

Parameters:
p - The Mapper and Reducer instances to add to the pipeline

getMapReducePipeline

public List<Pair<org.apache.hadoop.mapred.Mapper,org.apache.hadoop.mapred.Reducer>> getMapReducePipeline()
Returns:
A copy of the list of Mapper and Reducer objects under test

addInput

public void addInput(K1 key,
                     V1 val)
Adds an input to send to the mapper

Parameters:
key -
val -

withInput

public PipelineMapReduceDriver<K1,V1,K2,V2> withInput(K1 key,
                                                      V1 val)
Identical to addInput() but returns self for fluent programming style

Parameters:
key -
val -
Returns:
this

addInput

public void addInput(Pair<K1,V1> input)
Adds an input to send to the Mapper

Parameters:
input - The (k, v) pair to add to the input list.

withInput

public PipelineMapReduceDriver<K1,V1,K2,V2> withInput(Pair<K1,V1> input)
Identical to addInput() but returns self for fluent programming style

Parameters:
input - The (k, v) pair to add
Returns:
this

addOutput

public void addOutput(Pair<K2,V2> outputRecord)
Adds an output (k, v) pair we expect from the Reducer

Parameters:
outputRecord - The (k, v) pair to add

withOutput

public PipelineMapReduceDriver<K1,V1,K2,V2> withOutput(Pair<K2,V2> outputRecord)
Works like addOutput(), but returns self for fluent style

Parameters:
outputRecord -
Returns:
this

addOutput

public void addOutput(K2 key,
                      V2 val)
Adds a (k, v) pair we expect as output from the Reducer

Parameters:
key -
val -

withOutput

public PipelineMapReduceDriver<K1,V1,K2,V2> withOutput(K2 key,
                                                       V2 val)
Functions like addOutput() but returns self for fluent programming style

Parameters:
key -
val -
Returns:
this

addInputFromString

public void addInputFromString(String input)
Expects an input of the form "key \t val" Forces the Mapper input types to Text.

Parameters:
input - A string of the form "key \t val". Trims any whitespace.

withInputFromString

public PipelineMapReduceDriver<K1,V1,K2,V2> withInputFromString(String input)
Identical to addInputFromString, but with a fluent programming style

Parameters:
input - A string of the form "key \t val". Trims any whitespace.
Returns:
this

addOutputFromString

public void addOutputFromString(String output)
Expects an input of the form "key \t val" Forces the Reducer output types to Text.

Parameters:
output - A string of the form "key \t val". Trims any whitespace.

withOutputFromString

public PipelineMapReduceDriver<K1,V1,K2,V2> withOutputFromString(String output)
Identical to addOutputFromString, but with a fluent programming style

Parameters:
output - A string of the form "key \t val". Trims any whitespace.
Returns:
this

run

public List<Pair<K2,V2>> run()
                      throws IOException
Description copied from class: TestDriver
Runs the test but returns the result set instead of validating it (ignores any addOutput(), etc calls made before this)

Specified by:
run in class TestDriver<K1,V1,K2,V2>
Returns:
the list of (k, v) pairs returned as output from the test
Throws:
IOException

runTest

public void runTest()
             throws RuntimeException
Description copied from class: TestDriver
Runs the test and validates the results

Specified by:
runTest in class TestDriver<K1,V1,K2,V2>
Throws:
RuntimeException - if they don't *


Copyright © 2012 Apache Software Foundation. All Rights Reserved.