hadoop custom partitioner in mapreduce example

In our previous posts we discussed the Hadoop MapReduce architecture, how the mapper and reducer functions work, and how to set the job configuration in the MapReduce driver class. Now we will look at a Hadoop custom partitioner in a MapReduce example.

what is partitioner in hadoop

The main purpose of the partitioner is to partition the intermediate key/value pairs of the mapper output. It divides the data according to a condition on the key, much as a hash function assigns items to buckets, and a custom partitioner lets us define that condition ourselves. The total number of partitions equals the total number of reducers in the job (set with job.setNumReduceTasks(n)). Partitioning takes place after the map phase and before the reduce phase. The default partitioning function is hash partitioning, where the hashing is done on the key. However, it can be useful to partition the data according to some other function of the key or the value.

mapreduce partitioner uses

The custom partitioner logic sits between the Mapper class and the Reducer class, because the partitioner receives its input from the mapper. Using the conditions we give it, it divides the mapper output and assigns each part to an individual reducer.

With a custom partitioner we can optimize a MapReduce program, for example by balancing the work across reducers, and we can divide the output files the way we want. Use a custom partitioner only where it suits the problem; there is no need for one in every program.

hadoop partitioner default behaviour

By default the partitioner implementation is HashPartitioner. It takes the hashCode() of the key object modulo the total number of partitions to determine which partition a given (key, value) pair is sent to.

Partitioner provides the getPartition() method, which you can implement yourself if you want a custom partitioning scheme for your job. The getPartition() method receives a key, a value, and the number of partitions to split the data into. It must return a number in the range [0, numPartitions), indicating which partition to send the key and value to. For any two keys k1 and k2, k1.equals(k2) implies getPartition(k1, *, n) == getPartition(k2, *, n).

public int getPartition(K key, V value,
                        int numReduceTasks) {
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
}
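The `& Integer.MAX_VALUE` step matters because hashCode() can be negative, and Java's % operator keeps the sign of its left operand. The following is a minimal sketch in plain Java, with no Hadoop classes; the class name HashPartitionDemo and both helper names are made up for illustration. It compares a naive modulo with the masked version:

```java
public class HashPartitionDemo {

    // Naive version: returns a negative number when hashCode() is negative.
    public static int naivePartition(Object key, int numReduceTasks) {
        return key.hashCode() % numReduceTasks;
    }

    // Same arithmetic as the getPartition() shown above: clearing the sign
    // bit first guarantees a result in [0, numReduceTasks).
    public static int hashPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        Integer key = -7; // Integer.hashCode() is the int value itself
        System.out.println(naivePartition(key, 3)); // -1, not a valid partition
        System.out.println(hashPartition(key, 3));  // 1

        // Equal keys always land in the same partition.
        int a = hashPartition("aaa", 2);
        int b = hashPartition(new String("aaa"), 2);
        System.out.println(a == b); // true
    }
}
```

Because the result depends only on the key's hash code, keys that are equal always reach the same reducer, which is the contract described above.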

hadoop partitioner example

You can either use an existing .txt file on your machine or create a text file like this:

hhh eee iii bbb ccc fff ddd ggg aaa aaa XXX YYY ZZZ hhh eee iii bbb ccc fff ddd ggg aaa aaa XXX YYY ZZZ hhh eee iii bbb ccc fff ddd ggg aaa aaa XXX YYY ZZZ hhh eee iii bbb ccc fff ddd ggg aaa aaa hhh eee iii bbb ccc fff ddd ggg aaa aaa

Now we want to write a MapReduce program that uses a custom partitioner. It will divide the output into 2 separate parts: one output file holding the lowercase words and a second output file holding the uppercase words.
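To see what this job should produce before running it on a cluster, the data flow can be simulated in plain Java. This is only a rough sketch under stated assumptions: it uses no Hadoop classes, the names CaseSplitSimulation, partitionFor and run are invented for illustration, and the rule "first character upper case goes to partition 1" is one possible way to express the lowercase/uppercase split.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CaseSplitSimulation {

    // Assumed partition rule: words starting with an upper-case letter go to
    // partition 1, everything else goes to partition 0.
    public static int partitionFor(String word, int numReduceTasks) {
        if (numReduceTasks < 2) {
            return 0; // with a single reducer there is only one partition
        }
        return Character.isUpperCase(word.charAt(0)) ? 1 : 0;
    }

    // Simulates map -> partition -> reduce for a word count and returns one
    // sorted word-count map per reducer (numReduceTasks must be at least 1).
    public static List<Map<String, Integer>> run(String input, int numReduceTasks) {
        List<Map<String, Integer>> outputs = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            outputs.add(new TreeMap<>()); // TreeMap keeps the keys sorted
        }
        for (String word : input.trim().split("\\s+")) { // "map": emit (word, 1)
            if (word.isEmpty()) {
                continue;
            }
            int p = partitionFor(word, numReduceTasks);  // "partition"
            outputs.get(p).merge(word, 1, Integer::sum); // "reduce": sum the ones
        }
        return outputs;
    }

    public static void main(String[] args) {
        // Rebuild the sample text: five lowercase blocks, the first three
        // followed by the uppercase words.
        String block = "hhh eee iii bbb ccc fff ddd ggg aaa aaa";
        StringBuilder input = new StringBuilder();
        for (int i = 0; i < 5; i++) {
            input.append(block).append(' ');
            if (i < 3) {
                input.append("XXX YYY ZZZ ");
            }
        }
        List<Map<String, Integer>> out = run(input.toString(), 2);
        System.out.println(out.get(0)); // {aaa=10, bbb=5, ccc=5, ddd=5, eee=5, fff=5, ggg=5, hhh=5, iii=5}
        System.out.println(out.get(1)); // {XXX=3, YYY=3, ZZZ=3}
    }
}
```

Calling run with numReduceTasks set to 1 puts every word into a single map, which mirrors what happens when a job ends up with only one reducer.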

The code below holds the MapReduce program configuration.

Driver Class

The code below is the MapReduce program's Mapper function.

Mapper Class

The code below is the MapReduce program's Partitioner function.

Partitioner Class
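A partitioner for this example could look roughly like the sketch below. It is not the original listing. So that it compiles without Hadoop on the classpath, it declares a local stand-in named Partitioner with the same getPartition(key, value, numPartitions) shape as Hadoop's abstract class org.apache.hadoop.mapreduce.Partitioner, and it uses String and Integer where a real job would typically use Text and IntWritable. The names CasePartitionerSketch and CasePartitioner are hypothetical.

```java
public class CasePartitionerSketch {

    // Local stand-in mirroring the method shape of Hadoop's abstract class
    // org.apache.hadoop.mapreduce.Partitioner<KEY, VALUE> (assumption: used
    // here only so the sketch runs without the Hadoop libraries).
    public abstract static class Partitioner<K, V> {
        public abstract int getPartition(K key, V value, int numReduceTasks);
    }

    // Hypothetical partitioner: lower-case words -> partition 0,
    // upper-case words -> partition 1.
    public static class CasePartitioner extends Partitioner<String, Integer> {
        @Override
        public int getPartition(String key, Integer value, int numReduceTasks) {
            if (numReduceTasks < 2 || key.isEmpty()) {
                return 0; // the result must stay inside [0, numReduceTasks)
            }
            return Character.isUpperCase(key.charAt(0)) ? 1 : 0;
        }
    }

    public static void main(String[] args) {
        CasePartitioner p = new CasePartitioner();
        System.out.println(p.getPartition("aaa", 1, 2)); // 0
        System.out.println(p.getPartition("XXX", 1, 2)); // 1
        System.out.println(p.getPartition("XXX", 1, 1)); // 0, only one reducer
    }
}
```

In an actual job the class would extend the Hadoop Partitioner and be registered in the driver with job.setPartitionerClass(...), together with job.setNumReduceTasks(2). If the job runs with a single reducer, every key lands in partition 0 and only one output file is written.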

The code below is the MapReduce program's Reducer function.

Reducer Class

Output 1

aaa 10
bbb 5
ccc 5
ddd 5
eee 5
fff 5
ggg 5
hhh 5
iii 5

Output 2

XXX 3
YYY 3
ZZZ 3

That completes the Hadoop custom partitioner in MapReduce example.

Comments

  1. Looks like the Partitioner example doesn’t work on Windows. All partitioner output ends up in a single file.

  2. This example does not produce two output files when run on Windows, but it does produce two files on Unix. Are any tweaks needed for a Windows run?
