MapReduce Program In Detail

In our previous guides, we saw how to run the wordcount MapReduce program on a single-node Hadoop cluster. Now we will look at a MapReduce program in detail, using the wordcount program as the example.

MapReduce is a system for parallel processing of large data sets. It reduces the input data to a set of results, producing a summary of the data. A MapReduce program has two parts: a mapper and a reducer. The reducers begin their reduce work only after all the mappers have finished.

Mapper: It maps input key/value pairs to a set of intermediate key/value pairs.

Reducer: It reduces a set of intermediate values which share a key to a smaller set of values.
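
The two definitions above can be read as two function contracts. The following is a minimal plain-Java sketch of them, not Hadoop's actual API: the names SimpleMapper, SimpleReducer, WORD_MAPPER and SUM_REDUCER are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class MapReduceContracts {
    /** Hypothetical mapper contract: one input pair in, zero or more intermediate pairs out. */
    interface SimpleMapper<K1, V1, K2, V2> {
        List<Map.Entry<K2, V2>> map(K1 key, V1 value);
    }

    /** Hypothetical reducer contract: one key plus all of its values in, a smaller result out. */
    interface SimpleReducer<K2, V2, R> {
        R reduce(K2 key, List<V2> values);
    }

    // Example mapper: emits (word, 1) for every word in a line; the input key is ignored.
    static final SimpleMapper<Long, String, String, Integer> WORD_MAPPER = (offset, line) -> {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) out.add(Map.entry(word, 1));
        }
        return out;
    };

    // Example reducer: sums all values that were collected for one key.
    static final SimpleReducer<String, Integer, Integer> SUM_REDUCER = (word, counts) -> {
        int sum = 0;
        for (int c : counts) sum += c;
        return sum;
    };

    public static void main(String[] args) {
        System.out.println(WORD_MAPPER.map(0L, "to be or not to be"));
        System.out.println(SUM_REDUCER.reduce("to", List.of(1, 1)));
    }
}
```

The mapper turns one line into six (word, 1) pairs, and the reducer collapses the two values collected for "to" into a single count.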

In the wordcount MapReduce program, the input is one or more text files. When the program runs, it goes through the following phases:

Splitting: It splits each line of the input file into words.

Mapping: It forms a key/value pair for each word, where the word is the key and 1 is the value.

Shuffling: Key/value pairs that share the same key are grouped together.

Reducing: The values for each key are added together, giving the count for that word.
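
The four phases can be simulated in memory with plain Java. This is only a sketch of the data flow on a single machine, assuming words are separated by whitespace. It does not use Hadoop, and the class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordCountPhases {
    // Splitting: break one input line into words on whitespace.
    static List<String> split(String line) {
        List<String> words = new ArrayList<>();
        for (String w : line.trim().split("\\s+")) {
            if (!w.isEmpty()) words.add(w);
        }
        return words;
    }

    // Mapping: pair every word with the value 1.
    static List<Map.Entry<String, Integer>> map(List<String> words) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String w : words) pairs.add(Map.entry(w, 1));
        return pairs;
    }

    // Shuffling: collect the values of all pairs that share a key (kept sorted by key).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reducing: add up the grouped values for each key.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) sum += v;
            counts.put(e.getKey(), sum);
        }
        return counts;
    }

    // Run all four phases over the given input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) pairs.addAll(map(split(line)));
        return reduce(shuffle(pairs));
    }

    public static void main(String[] args) {
        // prints {bear=2, car=3, deer=2, river=2}
        System.out.println(run(List.of("deer bear river", "car car river", "deer car bear")));
    }
}
```

In a real cluster the splitting and mapping run in parallel on many nodes and the shuffle moves data across the network. Here everything happens in one process, so only the shape of the data at each step carries over.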

The following figure shows the complete MapReduce workflow.
Figure: MapReduce workflow

Let us now look at the practical implementation of the wordcount program. In a typical MapReduce program, two methods do most of the work: the map and reduce methods. There is also a main method, in which we define the job configuration. You can keep the map, reduce and main methods in separate class files; here I keep them in one class file.

The data types used in the program are Hadoop-specific types, designed for efficient reading and writing of data in massively parallel jobs. Each of them corresponds to a standard Java type: for example, LongWritable is the equivalent of long in Java, IntWritable of int, and Text of String. The input to the mapper is a single line; the mapper forms key/value pairs from it and passes them to the reducer, where the aggregation happens.
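
To make the type correspondence concrete, here is a plain-Java sketch of what the mapper receives and emits, with long, String and int standing in for LongWritable, Text and IntWritable. It assumes Hadoop's default text input format, where the key handed to the mapper is the byte offset of the line and the value is the line itself. The class names and the "\n" line-ending assumption are mine, not part of Hadoop.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class MapperInputSketch {
    /** One input record; long and String stand in for LongWritable and Text. */
    static final class LineRecord {
        final long offset;   // byte offset of the line within the file
        final String line;   // the line itself
        LineRecord(long offset, String line) { this.offset = offset; this.line = line; }
    }

    /** One mapper output pair; String and int stand in for Text and IntWritable. */
    static final class WordCountPair {
        final String word;
        final int count;
        WordCountPair(String word, int count) { this.word = word; this.count = count; }
    }

    // Build (byte offset, line) records from file content, assuming "\n" line endings.
    static List<LineRecord> toRecords(String content) {
        List<LineRecord> records = new ArrayList<>();
        long offset = 0;
        for (String line : content.split("\n")) {
            records.add(new LineRecord(offset, line));
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for '\n'
        }
        return records;
    }

    // The mapper handles one record at a time; wordcount ignores the offset key.
    static List<WordCountPair> map(LineRecord record) {
        List<WordCountPair> out = new ArrayList<>();
        for (String word : record.line.trim().split("\\s+")) {
            if (!word.isEmpty()) out.add(new WordCountPair(word, 1));
        }
        return out;
    }

    public static void main(String[] args) {
        for (LineRecord r : toRecords("hello world\nhello hadoop")) {
            for (WordCountPair p : map(r)) {
                System.out.println(r.offset + "\t" + p.word + "\t" + p.count);
            }
        }
    }
}
```

In the real program the Writable types are used instead of the plain Java ones because Hadoop has to serialize keys and values when it moves them between the map and reduce phases.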
