MRUnit is a unit test library designed to facilitate easy integration between your MapReduce development process and standard development and testing tools such as JUnit. MRUnit contains mock objects that behave like classes you interact with during MapReduce execution (e.g., InputSplit and OutputCollector) as well as test harness "drivers" that test your program’s correctness while maintaining compliance with the MapReduce semantics. Mapper and Reducer implementations can be tested individually, as well as together to form a full MapReduce job.
This document describes how to get started using MRUnit to unit test your Mapper and Reducer implementations.
Getting Started with MRUnit
MRUnit is compiled as a jar and resides in $HADOOP_HOME/contrib/mrunit. MRUnit is designed to augment an existing unit test framework such as JUnit.
To use MRUnit, add the MRUnit JAR from the above path to the classpath or project build path in your development environment (ant buildfile, Eclipse project, etc.).
An Example
The following example test case demonstrates how to use MRUnit:
    import junit.framework.TestCase;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.lib.IdentityMapper;
    import org.apache.hadoop.mrunit.MapDriver;
    import org.junit.Before;
    import org.junit.Test;

    public class TestExample extends TestCase {

      // The mapper under test and the driver that exercises it.
      private Mapper<Text, Text, Text, Text> mapper;
      private MapDriver<Text, Text, Text, Text> driver;

      @Before
      public void setUp() {
        mapper = new IdentityMapper<Text, Text>();
        driver = MapDriver.newMapDriver(mapper);
      }

      @Test
      public void testIdentityMapper() {
        // One input pair, one expected output pair, then run and compare.
        driver.withInput(new Text("foo"), new Text("bar"))
              .withOutput(new Text("foo"), new Text("bar"))
              .runTest();
      }
    }
In this example, we see an existing Mapper implementation (IdentityMapper) being tested. We made a class named TestExample designed to test this mapper implementation. In JUnit, each particular test is created in a separate method whose name begins with test, and is marked with the @Test annotation. All of the test*() methods are run in succession. Before each test method, the setUp() method (if one is present) is executed. After each test, a tearDown() method, if one exists, is executed. (In this example, no tearDown() takes place.)
MRUnit is designed to let you test the precise actions of a particular class. Here we’re verifying that the IdentityMapper emits the same (key, value) pair that it receives as input. A driver facilitates this process. In the setUp() method, we create an instance of the IdentityMapper that we want to test, as well as a MapDriver to run the test. In the testIdentityMapper() method, we configure the driver to pass the (key, value) pair ("foo", "bar") to our mapper as input and to expect the same pair ("foo", "bar") as output. Once the input and expected output are configured, runTest() invokes the mapper on the input and compares the actual output with the expected output. If they differ, the JUnit test fails.
Test Drivers and Running Tests
Each MRUnit test is designed to test a mapper, a reducer, or a mapper/reducer pair (i.e., a "job"). MRUnit provides three TestDriver classes that are designed to test each of these three scenarios. A MapDriver will provide a single (key, value) pair as input to an instance of the Mapper interface. When its run() or runTest() method is called, the mapper’s map() method is called with the provided (key, value) pair, as well as MRUnit-specific implementations of OutputCollector and Reporter. After the mapper completes, any output (key, value) pairs sent to this OutputCollector are compiled into a list.
If the test was launched via MapDriver.runTest(), the emitted (key, value) pairs are compared with the (key, value) pairs provided to the MapDriver as expected output. The driver uses equals() to determine whether the emitted value list is equal to the expected value list. If these differ, the driver raises a JUnit assertion failure to signify a failing test.
If the test was launched via MapDriver.run(), the emitted (key, value) pair list is returned to the calling method, where you can process the outputs using your own logic to assess the success or failure of the test.
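The difference between run() and runTest() can be modelled in a few lines of plain Java. The sketch below is not MRUnit code: ToyMapper and ToyMapDriver are hypothetical names invented for illustration, and the mapper interface is reduced to a single callback. It only shows the contract described above: run() returns the emitted pairs in order, while runTest() compares them to the expected list with equals() and fails on a mismatch.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical stand-in for a mapper: one input pair in, any number of pairs out.
interface ToyMapper<K1, V1, K2, V2> {
    void map(K1 key, V1 value, BiConsumer<K2, V2> collector);
}

// Plain-Java model of the driver contract; an assumption-laden sketch, not MRUnit's implementation.
class ToyMapDriver<K1, V1, K2, V2> {
    private final ToyMapper<K1, V1, K2, V2> mapper;
    private K1 inputKey;
    private V1 inputValue;
    private final List<Map.Entry<K2, V2>> expected = new ArrayList<>();

    ToyMapDriver(ToyMapper<K1, V1, K2, V2> mapper) {
        this.mapper = mapper;
    }

    ToyMapDriver<K1, V1, K2, V2> withInput(K1 key, V1 value) {
        this.inputKey = key;
        this.inputValue = value;
        return this;
    }

    ToyMapDriver<K1, V1, K2, V2> withOutput(K2 key, V2 value) {
        expected.add(new SimpleImmutableEntry<>(key, value));
        return this;
    }

    // Like run(): call the mapper once and return everything it emitted, in order.
    List<Map.Entry<K2, V2>> run() {
        List<Map.Entry<K2, V2>> actual = new ArrayList<>();
        mapper.map(inputKey, inputValue, (k, v) -> actual.add(new SimpleImmutableEntry<>(k, v)));
        return actual;
    }

    // Like runTest(): run, then compare the emitted list to the expected list with equals().
    void runTest() {
        List<Map.Entry<K2, V2>> actual = run();
        if (!actual.equals(expected)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }
}
```

With an identity mapper written as a lambda, (k, v, out) -> out.accept(k, v), calling withInput("foo", "bar").withOutput("foo", "bar").runTest() passes, while an expected value of "baz" raises an AssertionError.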
Similar to the MapDriver implementation, ReduceDriver will put a single Reducer implementation under test. A single input key is provided, as well as an ordered list of input values. The reducer receives an iterator over this list of values, as well as the input key. It may emit an arbitrary number of output (key, value) pairs; runTest() will compare these against a list provided as expected output.
Finally, one may want to test a complete MapReduce job consisting of a mapper and a reducer composed together. The MapReduceDriver receives a Mapper and Reducer implementation, as well as an arbitrary number of (key, value) pairs as input. These are used as inputs to the Mapper.map() method. The outputs from these map calls are put through a process similar to shuffling when mapred.reduce.tasks is set to 1. No partitioner is called, but the values are aggregated by key and the keys are sorted by their compareTo() methods. The Reducer.reduce() method is called to process these intermediate (key, value list) sets in order. Finally, the output (key, value) pairs are again compared with any expected values provided by the user.
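The shuffle-like step described above (no partitioner, values aggregated by key, keys sorted by compareTo()) can be sketched in plain Java. This is a model under stated assumptions rather than MRUnit's code: ShuffleSketch, shuffle(), and wordCount() are hypothetical names, and a TreeMap stands in for the sort. In this sketch the values for a key keep the order in which they were emitted.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the single-reducer shuffle; not part of MRUnit.
class ShuffleSketch {

    // Group mapper outputs by key. TreeMap iterates keys in compareTo() order.
    static <K extends Comparable<K>, V> TreeMap<K, List<V>> shuffle(List<Map.Entry<K, V>> mapOutputs) {
        TreeMap<K, List<V>> grouped = new TreeMap<>();
        for (Map.Entry<K, V> pair : mapOutputs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Toy job: map each word to (word, 1), shuffle, then sum each group in key order.
    static List<Map.Entry<String, Integer>> wordCount(List<String> words) {
        List<Map.Entry<String, Integer>> mapOutputs = new ArrayList<>();
        for (String word : words) {
            mapOutputs.add(new SimpleImmutableEntry<>(word, 1));
        }
        List<Map.Entry<String, Integer>> reduced = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(mapOutputs).entrySet()) {
            int sum = 0;
            for (int value : group.getValue()) {
                sum += value;
            }
            reduced.add(new SimpleImmutableEntry<>(group.getKey(), sum));
        }
        return reduced;
    }
}
```

For the input words "b", "a", "b", wordCount() returns the pairs (a, 1) and (b, 2) in that order, because the reduce step visits keys in sorted order regardless of the order in which the mapper emitted them.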
Configuring Tests
MRUnit provides multiple ways of configuring individual tests to facilitate different programming styles.
Setter Methods
Various setter methods let you set the mapper or reducer class under test, or the input (key, value) pair; for example, myMapDriver.setInput(key, value). Because a mapper may emit multiple (key, value) pairs as output, expected outputs are added one at a time with myMapDriver.addOutput(key, value). These expected outputs are kept in an ordered list.
Fluent Programming Style
An alternative mechanism for configuring tests (and the author’s preferred one) is to use "fluent" methods. Methods whose names begin with "with" set a configuration value and return this. These calls can be chained together (as in the example above) to concisely specify all the inputs to the test process; e.g., myMapDriver.withInput(k1, v1).withOutput(k2, v2).runTest().
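The relationship between the two styles is easy to see in a small stand-alone class. The following is a sketch only: ToyConfig and its members are hypothetical and not part of MRUnit. It assumes, as the description above suggests, that each fluent method does the same work as a setter and then returns this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical configuration holder showing both styles side by side; not an MRUnit class.
class ToyConfig {
    private String input;
    private final List<String> outputs = new ArrayList<>();

    // Setter style: each call returns void, so calls are written as separate statements.
    void setInput(String input) {
        this.input = input;
    }

    void addOutput(String output) {
        outputs.add(output);
    }

    // Fluent style: do the same work as the setter, then return this so calls chain.
    ToyConfig withInput(String input) {
        setInput(input);
        return this;
    }

    ToyConfig withOutput(String output) {
        addOutput(output);
        return this;
    }

    String describe() {
        return input + " -> " + outputs;
    }
}
```

Both styles leave the object in the same state: new ToyConfig().withInput("k1").withOutput("a").withOutput("b") describes itself exactly as an instance configured with setInput("k1"), addOutput("a"), and addOutput("b").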
Additional API Features
This section describes additional features of the MRUnit API.
Mock Objects
To facilitate calls to map() and reduce(), MRUnit provides mock implementations of the classes used for non-user provided arguments. The MockReporter implementation ignores most of its function calls except getInputSplit(), which returns a MockInputSplit, and the counter-increment methods. MockInputSplit subclasses FileSplit and contains a dummy filename, but otherwise does nothing. The MockOutputCollector aggregates (key, value) pairs sent to it via collect() into a list. This list is then used during the shuffling or output comparison functions. Unlike the full Hadoop job running process, this list is not spilled to disk, nor are any memory management methods used. It is assumed that the volume of data used during MRUnit tests does not exceed the available heap size.
Additional Test Drivers
MRUnit comes with an additional test driver called the PipelineMapReduceDriver which allows testing of a series of MapReduce passes. By calling the addMapReduce() or withMapReduce() methods, an additional mapper and reducer pass can be added to the pipeline under test.
When runTest() is called, the harness delivers the input to the first Mapper, feeds the intermediate results to the first Reducer (without checking them), and forwards this data to each subsequent Mapper/Reducer job in the pipeline until the final Reducer. Only the last Reducer's outputs are checked against the expected results.
This is designed for slightly more complicated integration tests than the MapReduceDriver, which is for smaller unit tests.
In the PipelineMapReduceDriver type signature, (K1, V1) refer to the types of the inputs to the first Mapper, and (K2, V2) refer to the types of the final Reducer's output. No intermediate types are specified.
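The pipeline behaviour can be pictured with a minimal plain-Java sketch. It is not the PipelineMapReduceDriver itself: PipelineSketch and withPass() are hypothetical names, and each pass is simplified to a single function standing in for a whole mapper-plus-reducer job with one record type throughout. What it illustrates is that each pass consumes the previous pass's output and that only the final output is handed back for checking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical model of a chain of MapReduce passes; each Function stands for one full job.
class PipelineSketch<T> {
    private final List<Function<List<T>, List<T>>> passes = new ArrayList<>();

    // Fluent registration of one more pass at the end of the pipeline.
    PipelineSketch<T> withPass(Function<List<T>, List<T>> pass) {
        passes.add(pass);
        return this;
    }

    // Feed the input to the first pass and forward each result to the next pass.
    // Intermediate results are never inspected; only the final list is returned.
    List<T> run(List<T> input) {
        List<T> data = input;
        for (Function<List<T>, List<T>> pass : passes) {
            data = pass.apply(data);
        }
        return data;
    }
}
```

For example, a first pass that doubles every number followed by a second pass that sums them turns the input 1, 2, 3 into the single value 12; a test would assert only on that final list.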
Testing Combiners
The MapReduceDriver also lets you test a combiner in addition to a mapper and reducer. The setCombiner() method configures the driver to pass all mapper output (key, value) pairs through a combiner before they are sent to the reducer under test.
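To make the data flow concrete, here is a plain-Java sketch of a combiner sitting between mapper output and reducer. It is an illustration under assumptions, not MRUnit code: CombinerSketch, pair(), foldByKey(), and runWithCombiner() are hypothetical names, and both combiner and reducer are simplified to binary functions folded over the values of each key.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;

// Hypothetical model of mapper output -> combiner -> reducer; not part of MRUnit.
class CombinerSketch {

    // Small helper to build a (key, value) pair.
    static Map.Entry<String, Integer> pair(String key, int value) {
        return new SimpleImmutableEntry<>(key, value);
    }

    // Group pairs by key (sorted) and fold each key's values with the given function.
    static TreeMap<String, Integer> foldByKey(List<Map.Entry<String, Integer>> pairs,
                                              BinaryOperator<Integer> fn) {
        TreeMap<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            out.merge(p.getKey(), p.getValue(), fn);
        }
        return out;
    }

    static TreeMap<String, Integer> runWithCombiner(List<Map.Entry<String, Integer>> mapOutputs,
                                                    BinaryOperator<Integer> combiner,
                                                    BinaryOperator<Integer> reducer) {
        // Combiner pass: every mapper output pair goes through the combiner first.
        List<Map.Entry<String, Integer>> combined =
            new ArrayList<>(foldByKey(mapOutputs, combiner).entrySet());
        // Reducer pass: the reducer sees the combiner's output, not the raw mapper output.
        return foldByKey(combined, reducer);
    }
}
```

With a summing combiner and a summing reducer the result matches a run without a combiner, which is the usual correctness condition for one. Using a deliberately different combiner, such as Math::max with an Integer::sum reducer on the pairs (a, 3) and (a, 5), yields 5 for key a instead of 8, which shows that the reducer really does receive the combined values.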
Counters
The test drivers support testing of the Counters system in Hadoop. The Reporter.incrCounter() method works as it usually does inside Mapper or Reducer instances under test. The TestDriver implementation itself holds a Counters object which can be retrieved with getCounters(). You can then verify the correct counter values have been set by your code under test.
The setCounters() and withCounters() methods allow you to set the Counters instance being used to accumulate values during testing.
One departure from the typical interface is that in the PipelineMapReduceDriver, all MapReduce passes share the same Counters instance and counter values are not reset. If several MapReduce passes are tested together, their counter values are accumulated together as well.
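The accumulation across passes can be illustrated with a toy counter table. This sketch does not use Hadoop's Counters class: ToyCounters, CounterSketch, and the counter name RECORDS are hypothetical stand-ins chosen for the example. It only models the behaviour described above, where one counters instance is shared by every pass and is never reset in between.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy counter table; stands in for Hadoop's Counters by analogy only.
class ToyCounters {
    private final Map<String, Long> values = new TreeMap<>();

    // Add the given amount to the named counter, creating it on first use.
    void increment(String name, long amount) {
        values.merge(name, amount, Long::sum);
    }

    // Counters that were never incremented read as zero.
    long get(String name) {
        return values.getOrDefault(name, 0L);
    }
}

// Hypothetical pass that bumps a counter once per record it processes.
class CounterSketch {
    static void runPass(int records, ToyCounters counters) {
        for (int i = 0; i < records; i++) {
            counters.increment("RECORDS", 1);
        }
    }
}
```

Running a pass over 3 records and then a pass over 2 records against the same ToyCounters instance leaves RECORDS at 5. A test that needs per-pass values would therefore have to read the counter between passes or give each pass its own instance.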
The New MapReduce API
MRUnit includes support for the "new" (i.e., version 0.20 and later) API as well. MRUnit provides MapDriver, ReduceDriver, and MapReduceDriver implementations compatible with the new MapReduce (Context-based) API in the org.apache.hadoop.mrunit.mapreduce package. These classes work identically to their old-API counterparts in the org.apache.hadoop.mrunit package, but work with org.apache.hadoop.mapreduce.Mapper and org.apache.hadoop.mapreduce.Reducer instances.
Mock implementations of InputSplit, MapContext, OutputCommitter, ReduceContext, and Reporter compatible with the new interfaces are provided.