Apache Spark Examples on Character Count


Apache Spark solves the same kind of problems as Hadoop MapReduce, but compared with MapReduce it can run programs up to 100 times faster in memory and about 10 times faster on disk. Apache Spark is a fast and general engine for large-scale data processing. Here is the Apache Spark example on character count.

This Apache Spark character count example is written in the Scala language. In MapReduce we wrote the code in Java or in other Hadoop Streaming languages, but with Apache Spark most people prefer to write the code in Scala rather than Java. We will discuss Spark Streaming in a future post.

Also Read Introduction To Apache Spark

Apache Spark Examples Input

 

The input file contains many characters, and our task is to produce a listing that shows which characters appear once, which characters appear twice, and so on.

 

Here are the steps for the Apache Spark example on character count.

Starting Spark on the terminal
start-master.sh
start-slave.sh spark://<your-hostname>:7077
spark-shell
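
When spark-shell is launched with no options it normally runs with a local master, so to attach the shell to the master started above you can pass the master URL explicitly (the hostname is only a placeholder for your own machine name):

spark-shell --master spark://<your-hostname>:7077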

Step 1 :: Load the text file from the local file system or HDFS
val textFile = sc.textFile("InputPath")

Step 2 :: Mapper
val counts = textFile.flatMap(line => line.split("").map(char => (char, 1)))
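
To see what this mapper emits, here is a small hedged illustration on a single made-up line (the string "spark" is only an assumed example, not a line taken from the actual input file). On Java 8, split("") yields one array entry per character:

"spark".split("").map(char => (char, 1))
// Array((s,1), (p,1), (a,1), (r,1), (k,1))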

Step 3 :: Reducer
val charCounts = counts.reduceByKey(_ + _)
charCounts.collect()

Step 4 :: Swap each key and value so the pairs become (count, character)
val reverseMap = for ((k, v) <- charCounts) yield (v, k)

Step 5 :: Group the characters by count, sort by count and save as a text file
reverseMap.groupByKey().sortByKey().coalesce(1, true).saveAsTextFile("outputPath")
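
For reference, assuming the values defined in the steps above, the types flowing through this last step are roughly:

// reverseMap              : RDD[(Int, String)]            -> (count, character) pairs
// reverseMap.groupByKey() : RDD[(Int, Iterable[String])]  -> characters that share a count
// .sortByKey()            : same pairs, ordered by the count
// .coalesce(1, true)      : one partition, so a single output part file is written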

Apache Spark Examples Program
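
The program screenshot is not reproduced here, so the listing below is a minimal standalone sketch of the same job, assuming a Spark 1.x style SparkConf/SparkContext setup; the application name, "InputPath" and "outputPath" are placeholders you would replace with your own values:

import org.apache.spark.{SparkConf, SparkContext}

object CharacterCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("CharacterCount")
    val sc = new SparkContext(conf)

    // Load the input file from the local file system or HDFS
    val textFile = sc.textFile("InputPath")

    // Split every line into single characters and pair each character with 1
    val counts = textFile.flatMap(line => line.split("").map(char => (char, 1)))

    // Sum the 1s per character: (character, totalCount)
    val charCounts = counts.reduceByKey(_ + _)

    // Swap to (totalCount, character) so characters can be grouped by frequency
    val reverseMap = for ((k, v) <- charCounts) yield (v, k)

    // Group characters sharing a count, sort by count and write a single output file
    reverseMap.groupByKey().sortByKey().coalesce(1, true).saveAsTextFile("outputPath")

    sc.stop()
  }
}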

Apache Spark Examples on Character Count Output

So this is the Apache Spark example on character count using the Scala language. Please subscribe for more updates from us, and please share and comment your opinion about this post.
