PipelineMapReduceDriver (MRUnit 1.0.0 API)

This project has retired. For details please refer to its Attic page.

Overview

Package

Class

Use

Tree

Deprecated

Index

Help

PREV CLASS NEXT CLASS

FRAMES NO FRAMES

SUMMARY: NESTED | FIELD | CONSTR | METHOD

DETAIL: FIELD | CONSTR | METHOD

org.apache.hadoop.mrunit
Class PipelineMapReduceDriver<K1,V1,K2,V2>

java.lang.Object
  org.apache.hadoop.mrunit.TestDriver<K1,V1,K2,V2,PipelineMapReduceDriver<K1,V1,K2,V2>>
      org.apache.hadoop.mrunit.PipelineMapReduceDriver<K1,V1,K2,V2>

public class PipelineMapReduceDriver<K1,V1,K2,V2>
extends TestDriver<K1,V1,K2,V2,PipelineMapReduceDriver<K1,V1,K2,V2>>
extends TestDriver<K1,V1,K2,V2,PipelineMapReduceDriver<K1,V1,K2,V2>>

Harness that allows you to test a dataflow through a set of Mappers and Reducers. You provide a set of (Mapper, Reducer) "jobs" that make up a workflow, as well as a set of (key, value) pairs to pass in to the first Mapper. You can also specify the outputs you expect to be sent to the final Reducer in the pipeline. By calling runTest(), the harness will deliver the input to the first Mapper, feed the intermediate results to the first Reducer (without checking them), and proceed to forward this data along to subsequent Mapper/Reducer jobs in the pipeline until the final Reducer. The last Reducer's outputs are checked against the expected results. This is designed for slightly more complicated integration tests than the MapReduceDriver, which is for smaller unit tests. (K1, V1) in the type signature refer to the types associated with the inputs to the first Mapper. (K2, V2) refer to the types associated with the final Reducer's output. No intermediate types are specified.

Field Summary
`static org.apache.commons.logging.Log`	`LOG`
`protected org.apache.hadoop.fs.Path`	`mapInputPath`

Fields inherited from class org.apache.hadoop.mrunit.TestDriver
`counterWrapper, expectedEnumCounters, expectedOutputs, expectedStringCounters`

Constructor Summary
`PipelineMapReduceDriver()`
`PipelineMapReduceDriver(List<Pair<org.apache.hadoop.mapred.Mapper,org.apache.hadoop.mapred.Reducer>> pipeline)`

Method Summary

void addAll(List<Pair<K1,V1>> inputs)
Adds list of inputs to send to the mapper

void addInput(K1 key, V1 val)
Adds an input to send to the mapper

void addInput(Pair<K1,V1> input)
Adds an input to send to the Mapper

void addInputFromString(String input)
Deprecated. No replacement due to lack of type safety and incompatibility with non Text Writables

void addMapReduce(org.apache.hadoop.mapred.Mapper m, org.apache.hadoop.mapred.Reducer r)
Add a Mapper and Reducer instance to the pipeline to use with this test driver

void addMapReduce(Pair<org.apache.hadoop.mapred.Mapper,org.apache.hadoop.mapred.Reducer> p)
Add a Mapper and Reducer instance to the pipeline to use with this test driver

org.apache.hadoop.mapred.Counters getCounters()

org.apache.hadoop.fs.Path getMapInputPath()

List<Pair<org.apache.hadoop.mapred.Mapper,org.apache.hadoop.mapred.Reducer>> getMapReducePipeline()

static



<K1,V1,K2,V2> 


PipelineMapReduceDriver<K1,V1,K2,V2>

newPipelineMapReduceDriver()
Returns a new PipelineMapReduceDriver without having to specify the generic types on the right hand side of the object create statement.

static



<K1,V1,K2,V2> 


PipelineMapReduceDriver<K1,V1,K2,V2>

newPipelineMapReduceDriver(List<Pair<org.apache.hadoop.mapred.Mapper,org.apache.hadoop.mapred.Reducer>> pipeline)
Returns a new PipelineMapReduceDriver without having to specify the generic types on the right hand side of the object create statement.

List<Pair<K2,V2>> run()
Runs the test but returns the result set instead of validating it (ignores any addOutput(), etc calls made before this)

void setCounters(org.apache.hadoop.mapred.Counters ctrs)
Sets the counters object to use for this test.

void setMapInputPath(org.apache.hadoop.fs.Path mapInputPath)

PipelineMapReduceDriver<K1,V1,K2,V2> withAll(List<Pair<K1,V1>> inputRecords)
Identical to addAll() but returns self for fluent programming style

PipelineMapReduceDriver<K1,V1,K2,V2> withCounters(org.apache.hadoop.mapred.Counters ctrs)
Sets the counters to use and returns self for fluent style

PipelineMapReduceDriver<K1,V1,K2,V2> withInput(K1 key, V1 val)
Identical to addInput() but returns self for fluent programming style

PipelineMapReduceDriver<K1,V1,K2,V2> withInput(Pair<K1,V1> input)
Identical to addInput() but returns self for fluent programming style

PipelineMapReduceDriver<K1,V1,K2,V2> withInputFromString(String input)
Deprecated. No replacement due to lack of type safety and incompatibility with non Text Writables

PipelineMapReduceDriver<K1,V1,K2,V2> withMapInputPath(org.apache.hadoop.fs.Path mapInputPath)

PipelineMapReduceDriver<K1,V1,K2,V2> withMapReduce(org.apache.hadoop.mapred.Mapper m, org.apache.hadoop.mapred.Reducer r)
Add a Mapper and Reducer instance to the pipeline to use with this test driver using fluent style

PipelineMapReduceDriver<K1,V1,K2,V2> withMapReduce(Pair<org.apache.hadoop.mapred.Mapper,org.apache.hadoop.mapred.Reducer> p)
Add a Mapper and Reducer instance to the pipeline to use with this test driver using fluent style

Methods inherited from class org.apache.hadoop.mrunit.TestDriver
addAllOutput, addCacheArchive, addCacheArchive, addCacheFile, addCacheFile, addOutput, addOutput, addOutputFromString, cleanupDistributedCache, copy, copyPair, formatValueList, getConfiguration, getExpectedEnumCounters, getExpectedOutputs, getExpectedStringCounters, getOutputSerializationConfiguration, initDistributedCache, parseCommaDelimitedList, parseTabbedPair, printPreTestDebugLog, resetExpectedCounters, resetOutput, run, runTest, runTest, setCacheArchives, setCacheFiles, setConfiguration, setOutputSerializationConfiguration, thisAsTestDriver, validate, validate, withAllOutput, withCacheArchive, withCacheArchive, withCacheFile, withCacheFile, withConfiguration, withCounter, withCounter, withOutput, withOutput, withOutputFromString, withOutputSerializationConfiguration, withStrictCounterChecking

Methods inherited from class org.apache.hadoop.mrunit.TestDriver

addAllOutput, addCacheArchive, addCacheArchive, addCacheFile, addCacheFile, addOutput, addOutput, addOutputFromString, cleanupDistributedCache, copy, copyPair, formatValueList, getConfiguration, getExpectedEnumCounters, getExpectedOutputs, getExpectedStringCounters, getOutputSerializationConfiguration, initDistributedCache, parseCommaDelimitedList, parseTabbedPair, printPreTestDebugLog, resetExpectedCounters, resetOutput, run, runTest, runTest, setCacheArchives, setCacheFiles, setConfiguration, setOutputSerializationConfiguration, thisAsTestDriver, validate, validate, withAllOutput, withCacheArchive, withCacheArchive, withCacheFile, withCacheFile, withConfiguration, withCounter, withCounter, withOutput, withOutput, withOutputFromString, withOutputSerializationConfiguration, withStrictCounterChecking

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

LOG

public static final org.apache.commons.logging.Log LOG

mapInputPath

protected org.apache.hadoop.fs.Path mapInputPath

Constructor Detail

PipelineMapReduceDriver

public PipelineMapReduceDriver(List<Pair<org.apache.hadoop.mapred.Mapper,org.apache.hadoop.mapred.Reducer>> pipeline)

PipelineMapReduceDriver

public PipelineMapReduceDriver()

Method Detail

getCounters

public org.apache.hadoop.mapred.Counters getCounters()

Returns:: the counters used in this test

setCounters

public void setCounters(org.apache.hadoop.mapred.Counters ctrs)

Sets the counters object to use for this test.

Parameters:: ctrs - The counters object to use.

withCounters

public PipelineMapReduceDriver<K1,V1,K2,V2> withCounters(org.apache.hadoop.mapred.Counters ctrs)

Sets the counters to use and returns self for fluent style

addMapReduce

public void addMapReduce(org.apache.hadoop.mapred.Mapper m,
                         org.apache.hadoop.mapred.Reducer r)

Add a Mapper and Reducer instance to the pipeline to use with this test driver

Parameters:: m - The Mapper instance to add to the pipeline; r - The Reducer instance to add to the pipeline

addMapReduce

public void addMapReduce(Pair<org.apache.hadoop.mapred.Mapper,org.apache.hadoop.mapred.Reducer> p)

Add a Mapper and Reducer instance to the pipeline to use with this test driver

Parameters:: p - The Mapper and Reducer instances to add to the pipeline

withMapReduce

public PipelineMapReduceDriver<K1,V1,K2,V2> withMapReduce(org.apache.hadoop.mapred.Mapper m,
                                                          org.apache.hadoop.mapred.Reducer r)

Add a Mapper and Reducer instance to the pipeline to use with this test driver using fluent style

Parameters:: m - The Mapper instance to use; r - The Reducer instance to use

withMapReduce

public PipelineMapReduceDriver<K1,V1,K2,V2> withMapReduce(Pair<org.apache.hadoop.mapred.Mapper,org.apache.hadoop.mapred.Reducer> p)

Add a Mapper and Reducer instance to the pipeline to use with this test driver using fluent style

Parameters:: p - The Mapper and Reducer instances to add to the pipeline

getMapReducePipeline

public List<Pair<org.apache.hadoop.mapred.Mapper,org.apache.hadoop.mapred.Reducer>> getMapReducePipeline()

Returns:: A copy of the list of Mapper and Reducer objects under test

addInput

public void addInput(K1 key,
                     V1 val)

Adds an input to send to the mapper

Parameters:: key -; val -

addAll

public void addAll(List<Pair<K1,V1>> inputs)

Adds list of inputs to send to the mapper

Parameters:: inputs - list of (K, V) pairs

withInput

public PipelineMapReduceDriver<K1,V1,K2,V2> withInput(K1 key,
                                                      V1 val)

Identical to addInput() but returns self for fluent programming style

Parameters:: key -; val -
Returns:: this

addInput

public void addInput(Pair<K1,V1> input)

Adds an input to send to the Mapper

Parameters:: input - The (k, v) pair to add to the input list.

withInput

public PipelineMapReduceDriver<K1,V1,K2,V2> withInput(Pair<K1,V1> input)

Identical to addInput() but returns self for fluent programming style

Parameters:: input - The (k, v) pair to add
Returns:: this

addInputFromString

@Deprecated
public void addInputFromString(String input)

Deprecated. No replacement due to lack of type safety and incompatibility with non Text Writables

Expects an input of the form "key \t val" Forces the Mapper input types to Text.

Parameters:: input - A string of the form "key \t val". Trims any whitespace.

withInputFromString

@Deprecated
public PipelineMapReduceDriver<K1,V1,K2,V2> withInputFromString(String input)

Deprecated. No replacement due to lack of type safety and incompatibility with non Text Writables

Identical to addInputFromString, but with a fluent programming style

Parameters:: input - A string of the form "key \t val". Trims any whitespace.
Returns:: this

withAll

public PipelineMapReduceDriver<K1,V1,K2,V2> withAll(List<Pair<K1,V1>> inputRecords)

Identical to addAll() but returns self for fluent programming style

Parameters:: inputRecords - input key/value pairs
Returns:: this

getMapInputPath

public org.apache.hadoop.fs.Path getMapInputPath()

Returns:: the path passed to the mapper InputSplit

setMapInputPath

public void setMapInputPath(org.apache.hadoop.fs.Path mapInputPath)

Parameters:: mapInputPath - Path which is to be passed to the mappers InputSplit

withMapInputPath

public final PipelineMapReduceDriver<K1,V1,K2,V2> withMapInputPath(org.apache.hadoop.fs.Path mapInputPath)

Parameters:: mapInputPath - The Path object which will be given to the mapper
Returns:

run

public List<Pair<K2,V2>> run()
                      throws IOException

Description copied from class: TestDriver

Runs the test but returns the result set instead of validating it (ignores any addOutput(), etc calls made before this)

Specified by:: run in class TestDriver<K1,V1,K2,V2,PipelineMapReduceDriver<K1,V1,K2,V2>>

Returns:: the list of (k, v) pairs returned as output from the test
Throws:: IOException

newPipelineMapReduceDriver

public static <K1,V1,K2,V2> PipelineMapReduceDriver<K1,V1,K2,V2> newPipelineMapReduceDriver()

Returns a new PipelineMapReduceDriver without having to specify the generic types on the right hand side of the object create statement.

Returns:: new PipelineMapReduceDriver

newPipelineMapReduceDriver

public static <K1,V1,K2,V2> PipelineMapReduceDriver<K1,V1,K2,V2> newPipelineMapReduceDriver(List<Pair<org.apache.hadoop.mapred.Mapper,org.apache.hadoop.mapred.Reducer>> pipeline)

Returns a new PipelineMapReduceDriver without having to specify the generic types on the right hand side of the object create statement.

Parameters:: pipeline - passed to PipelineMapReduceDriver constructor
Returns:: new PipelineMapReduceDriver

Overview

Package

Class

Use

Tree

Deprecated

Index

Help

PREV CLASS NEXT CLASS

FRAMES NO FRAMES

SUMMARY: NESTED | FIELD | CONSTR | METHOD

DETAIL: FIELD | CONSTR | METHOD

org.apache.hadoop.mrunit Class PipelineMapReduceDriver<K1,V1,K2,V2>

LOG

mapInputPath

PipelineMapReduceDriver

PipelineMapReduceDriver

getCounters

setCounters

withCounters

addMapReduce

addMapReduce

withMapReduce

withMapReduce

getMapReducePipeline

addInput

addAll

withInput

addInput

withInput

addInputFromString

withInputFromString

withAll

getMapInputPath

setMapInputPath

withMapInputPath

run

newPipelineMapReduceDriver

newPipelineMapReduceDriver

org.apache.hadoop.mrunit
Class PipelineMapReduceDriver<K1,V1,K2,V2>