Hadoop Multiple Input Files Example in MapReduce


In our basic MapReduce programs we took a single input file, loaded it from a local path into the MapReduce framework, and got a single output file. In real-world scenarios, however, a job often has to read several input files, and the basic single-mapper setup is not enough when those files differ in format.

In these situations we use dedicated input and output helper classes: the MultipleInputs class for multiple input files and the MultipleOutputs class for multiple output files.

Here are the main configuration steps for the MultipleInputs class in the driver class, followed by a Hadoop multiple input files example in MapReduce.

First, import org.apache.hadoop.mapreduce.lib.input.MultipleInputs from the Hadoop libraries in order to use the MultipleInputs class.

Hadoop multiple input files

This approach is simple and effective. The key is to understand how many mapper classes are needed. A mapper reads its input records from an input file. When there is more than one input file, each input path is assigned a mapper class, so files that differ in format each get their own mapper. For instance, two input files with different layouts need two mapper classes.


We use the MultipleInputs class, which supports MapReduce jobs that have multiple input paths with a different InputFormat and Mapper for each path. To make the concept concrete, take a case where a user wants to read two input files with a similar structure.

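Assume that both input files have two columns, the first holding “Name” and the second holding “Age”. We simply want to combine the data from both files and process it in one job (the output files further down count how many people share each age, sorted by age). What do we need to do? Just two things: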

  1. Write two mapper classes, one per input file format.
  2. Register each input path with its mapper class through MultipleInputs.addInputPath in the run/main method.

Input Files

File1.txt

Aman  19
Tom   20
Tony  15
John  18
Johnny      19
Hugh  17

File2.txt

James,21
Punk,21
Frank,20

Hadoop multiple input files Driver Class

Hadoop multiple input files Counter Mapper Class

Hadoop multiple input files Counter Two Mapper Class

Hadoop multiple input files Counter Reducer Class
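
The four classes named in the headings above can be sketched together in one file. This is a minimal sketch, not the original listing. The class names AgeCountDriver, WhitespaceMapper, CommaMapper, and CountReducer are hypothetical; File1.txt is assumed to be whitespace-separated and File2.txt comma-separated, as in the samples; and the two input paths and the output directory are assumed to arrive as command-line arguments. It needs the Hadoop client libraries on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver: counts how many people share each age across two files.
public class AgeCountDriver {

    private static final IntWritable ONE = new IntWritable(1);

    // Mapper for File1.txt (assumed layout: "Name<whitespace>Age").
    public static class WhitespaceMapper
            extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] fields = line.toString().trim().split("\\s+");
            if (fields.length == 2) { // skip blank or malformed lines
                context.write(new IntWritable(Integer.parseInt(fields[1])), ONE);
            }
        }
    }

    // Mapper for File2.txt (assumed layout: "Name,Age").
    public static class CommaMapper
            extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] fields = line.toString().trim().split(",");
            if (fields.length == 2) { // skip blank or malformed lines
                context.write(new IntWritable(Integer.parseInt(fields[1].trim())), ONE);
            }
        }
    }

    // Reducer shared by both mappers: sums the 1s emitted for each age.
    public static class CountReducer
            extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
        @Override
        protected void reduce(IntWritable age, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable count : counts) {
                sum += count.get();
            }
            context.write(age, new IntWritable(sum));
        }
    }

    // Assumed arguments: <File1 path> <File2 path> <output dir>
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "age count over multiple inputs");
        job.setJarByClass(AgeCountDriver.class);

        // One addInputPath call per input: path, InputFormat, and Mapper.
        MultipleInputs.addInputPath(job, new Path(args[0]),
                TextInputFormat.class, WhitespaceMapper.class);
        MultipleInputs.addInputPath(job, new Path(args[1]),
                TextInputFormat.class, CommaMapper.class);

        job.setReducerClass(CountReducer.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(IntWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(args[2]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that no setMapperClass call is needed: each addInputPath call binds a mapper to its own path. Both mappers must emit the same key and value types, because they feed a single reducer. Packaged into a jar, the sketch could be launched with the hadoop jar command, passing the two input paths and an output directory that does not exist yet.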

Output Files

15    1
17    1
18    1
19    2
20    2
21    2
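
To see where these numbers come from without a cluster, the same logic can be traced in plain Java. This is only an illustrative simulation, assuming the job counts people per age: the class AgeCountSimulation and its methods are hypothetical stand-ins for the two mappers and the reducer, and a TreeMap plays the role of the sorted shuffle.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java simulation of the two-mapper, one-reducer job.
public class AgeCountSimulation {

    // Stand-in for the first mapper: whitespace-separated "Name Age" lines.
    public static int ageFromWhitespaceLine(String line) {
        String[] fields = line.trim().split("\\s+");
        return Integer.parseInt(fields[1]);
    }

    // Stand-in for the second mapper: comma-separated "Name,Age" lines.
    public static int ageFromCommaLine(String line) {
        String[] fields = line.trim().split(",");
        return Integer.parseInt(fields[1].trim());
    }

    // Stand-in for shuffle and reduce: group by age in sorted order, sum the counts.
    public static Map<Integer, Integer> countAges(List<String> file1, List<String> file2) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String line : file1) {
            counts.merge(ageFromWhitespaceLine(line), 1, Integer::sum);
        }
        for (String line : file2) {
            counts.merge(ageFromCommaLine(line), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> file1 = Arrays.asList(
                "Aman  19", "Tom   20", "Tony  15", "John  18", "Johnny      19", "Hugh  17");
        List<String> file2 = Arrays.asList("James,21", "Punk,21", "Frank,20");

        // Prints one "age<TAB>count" line per distinct age, lowest age first.
        countAges(file1, file2).forEach((age, n) -> System.out.println(age + "\t" + n));
    }
}
```

Ages 19, 20, and 21 each end up with a count of 2: 19 appears twice in File1.txt, 20 appears once in each file, and 21 appears twice in File2.txt.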

This is the main concept of Hadoop multiple input files.