Apache Spark interview questions and answers for Experienced and Freshers


Apache Spark is one of the trending projects in the Apache organisation. Over the last two years, many companies have moved their projects from the MapReduce framework to Apache Spark, which can be seen as an extension of the MapReduce model. Here we provide the most frequently asked Spark interview questions. Below is the list of Apache Spark interview questions and answers for experienced candidates and freshers.

1) What is Apache Spark ?

Apache Spark is a cluster computing platform designed for fast, general-purpose data processing. It is essentially a fast and flexible data processing framework capable of reading data from HDFS, HBase, Cassandra, and other sources. It has an advanced execution engine supporting cyclic data flow and in-memory computation.

2) What are the features of Apache Spark?

  •    In-memory computation
  •    RDD (Resilient Distributed Dataset)
  •    Support for many languages
  •    Integration with Hadoop
  •    Fast processing
  •    Real-time stream processing

3) What is RDD ?

RDD (Resilient Distributed Dataset): a collection of objects that are processed in parallel. The data in an RDD is partitioned, immutable, and distributed in nature.

4) What operations does an Apache Spark RDD support ?

  • Transformations
  • Actions

Transformations are of two types:

  1. Narrow transformations
  2. Wide transformations

5) Define Transformations in Apache Spark ?

“Transformations” are functions applied to an RDD that return a new RDD. Transformations are lazy: they do not execute until an action occurs.

map() and filter() are examples of “transformations”. filter() creates a new RDD by selecting only the elements of the current RDD that satisfy a predicate, as in the sketch below.
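A minimal sketch, assuming an existing SparkContext named sc:

val numbers = sc.parallelize(Seq(1, 2, 3, 4, 5))

// map() is a transformation: it returns a new RDD, but nothing executes yet
val doubled = numbers.map(n => n * 2)

// filter() is also a transformation: it selects the elements matching a predicate
val evens = numbers.filter(n => n % 2 == 0)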


6) Define Actions in Apache Spark ?

“Actions” bring data from the RDD back to the local machine (the driver). Executing an action triggers the execution of all the transformations created before it. fold() is an action that applies the supplied function again and again until only one value is left, as in the sketch below.
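A minimal sketch, again assuming a SparkContext named sc:

val numbers = sc.parallelize(Seq(1, 2, 3, 4))

// collect() is an action: it triggers execution and returns the results to the driver
val all = numbers.collect()                  // Array(1, 2, 3, 4)

// fold() repeatedly applies the given function, starting from the zero value 0
val sum = numbers.fold(0)((a, b) => a + b)   // 10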


7) What are the commonly used Ecosystems in Apache Spark ?

  • Spark Streaming
  • Spark SQL
  • Spark MLlib
  • Spark GraphX

8) What is Spark Core ?

Spark Core performs memory management, job monitoring, fault tolerance, job scheduling, and interaction with storage systems.

The RDD abstraction is what makes Spark Core fault tolerant. An RDD is a collection of items distributed across many nodes that can be manipulated in parallel.


9) What is Spark SQL ?

Spark SQL is a Spark module for structured data processing. Its query syntax is very close to standard SQL, and it also supports the Hive Query Language (HiveQL). There are several ways to interact with Spark SQL, including SQL, the DataFrames API, and the Datasets API.
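A minimal sketch, assuming a SparkSession named spark and a hypothetical JSON file people.json containing name and age fields:

val df = spark.read.json("people.json")
df.createOrReplaceTempView("people")

// The same query through the DataFrames API and through plain SQL
df.filter(df("age") > 21).show()
spark.sql("SELECT name, age FROM people WHERE age > 21").show()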

10) What is Spark Streaming ?

Spark Streaming allows stream processing of live data streams. Data can be ingested from many sources like Kafka, Flume, Twitter, ZeroMQ, Kinesis, or TCP sockets, and can be processed using complex algorithms expressed with high-level functions like map, reduce, join, and window. Finally, processed data can be pushed out to filesystems, databases, and live dashboards.
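A minimal word-count sketch over a TCP socket, using the DStream API (host and port are placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("NetworkWordCount")
val ssc = new StreamingContext(conf, Seconds(10))   // 10-second batches

// Read lines from a socket and count words with high-level operators
val lines = ssc.socketTextStream("localhost", 9999)
val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
counts.print()

ssc.start()
ssc.awaitTermination()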


11) What is Spark GraphX ?

Spark GraphX is the Spark component for graph processing, for example computing friend recommendations on social media.

12) What is Spark MLlib ?

Spark MLlib is Spark's library of machine learning algorithms; before MLlib, Hadoop users typically relied on Apache Mahout for machine learning. It consists of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, and dimensionality reduction, as well as lower-level optimization primitives and higher-level pipeline APIs. Machine learning algorithms are mainly used for predictions, recommendations, and similar purposes.
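A minimal clustering sketch using the RDD-based MLlib KMeans API, assuming a SparkContext named sc:

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

val points = sc.parallelize(Seq(
  Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
  Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1)))

// Cluster the points into 2 groups, with at most 20 iterations
val model = KMeans.train(points, 2, 20)
model.clusterCenters.foreach(println)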

13) Which file systems does Apache Spark support ?

Spark can create distributed datasets from any storage source supported by Hadoop, including your local file system, HDFS, Cassandra, HBase, Amazon S3, etc.
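A minimal sketch; the paths and host names below are placeholders for your own cluster:

val local = sc.textFile("file:///tmp/data.txt")        // local file system
val hdfs  = sc.textFile("hdfs://namenode:9000/data")   // HDFS
val s3    = sc.textFile("s3a://my-bucket/data.txt")    // Amazon S3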

14) What are the cluster modes in Apache Spark ?

The Spark framework supports three kinds of cluster managers:

  • Standalone
  • Apache Mesos
  • YARN

15) What is Hadoop YARN ?

YARN stands for Yet Another Resource Negotiator. YARN is a cluster management technology introduced in Hadoop 2.x, mainly to take over resource management and reduce the burden on MapReduce.


16) What is Apache Mesos ?

Apache Mesos is a cluster management technology like YARN. It “provides efficient resource isolation and sharing across distributed applications, or frameworks”.

17) What is Standalone Mode ?

A Spark cluster can also run without the support of YARN, Apache Mesos, or any other cluster manager; when Spark manages its own cluster, this is called Standalone mode.
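A minimal sketch of pointing an application at a standalone master (the host name is a placeholder; 7077 is the usual default port):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("StandaloneExample")
  .setMaster("spark://master-host:7077")   // standalone cluster manager URL
val sc = new SparkContext(conf)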

18) What is an Apache Spark Executor ?

When the “SparkContext” connects to a cluster manager, it acquires “executors” on the cluster nodes. Executors are Spark processes that run computations and store data on the worker nodes. Finally, the SparkContext sends tasks to the executors to run.

19) What is an Apache Spark Worker Node ?

A Spark worker node is a slave node. “Worker node” refers to any node that can run application code in the cluster.

20) How do you create a Spark RDD ?

We can create an RDD in two ways:

i) parallelize

ii) textFile

// Create an RDD from an in-memory Scala collection with parallelize()
val a = Array(4, 6, 7, 8)
val b = sc.parallelize(a)

// Create an RDD from a text file with textFile()
val input = sc.textFile("input.txt")


21) What does the Spark Engine do ?

The Spark Engine is responsible for scheduling, distributing, and monitoring the data application across the cluster.

22) What are RDD partitions ?

A “partition” is a smaller, logical division of data, similar to a “split” in MapReduce. Partitioning is the process that derives logical units of data in order to speed up data processing.

Here’s an example:  val someRDD = sc.parallelize(1 to 100, 4)

In the above example, the 4 specifies the number of partitions of the RDD.
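You can verify the partition count directly (a minimal sketch, assuming sc exists):

val someRDD = sc.parallelize(1 to 100, 4)
println(someRDD.partitions.length)   // prints 4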

23) Which file formats does Apache Spark support ?

Spark supports text files, SequenceFiles, and any other Hadoop InputFormats.
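A minimal sketch of the three input styles (paths are placeholders):

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat

val text = sc.textFile("hdfs:///data/input.txt")               // plain text files
val seq  = sc.sequenceFile[String, Int]("hdfs:///data/pairs")  // SequenceFiles of (String, Int) pairs
val raw  = sc.hadoopFile[LongWritable, Text, TextInputFormat]("hdfs:///data/raw")  // any Hadoop InputFormat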

These are all frequently repeated Apache Spark interview questions for freshers and experienced candidates.
