Spark SQL Tutorial Introduction

Spark SQL is one of the main components of the Apache Spark ecosystem. It is similar to Hive in many respects but adds several advantages of its own. Spark SQL is Spark’s interface for working with structured and semi-structured data. Structured data is any data that has a schema, that is, a known set of fields for each record. When you have this type of data, Spark SQL makes it both easier and more efficient to load and query. In particular, Spark SQL provides three main capabilities:

i) It can load data from a variety of structured sources (e.g., JSON, Hive, and Parquet).

ii) It lets you query the data using SQL, both inside a Spark program and from external tools that connect to Spark SQL through standard database connectors (JDBC/ODBC), such as business intelligence tools like Tableau.

iii) When used within a Spark program, Spark SQL provides rich integration between SQL and regular Python/Java/Scala code, including the ability to join RDDs and SQL tables, expose custom functions in SQL, and more. Many jobs are easier to write using this combination.
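The three capabilities above can be sketched in a few lines of PySpark. This is a minimal illustration, not a definitive recipe: it assumes a local PySpark installation and uses the SparkSession/DataFrame API from later Spark releases rather than the SchemaRDD API described in this article. The file name people.json, the sample records, and the function name_length are made up for the example.

```python
import json
import os
import tempfile


def name_length(name):
    """Plain Python function that will be exposed to SQL as a custom function."""
    return len(name)


def run_demo():
    # Imported here so the helper above stays usable without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import IntegerType

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("spark-sql-intro")
        .getOrCreate()
    )
    try:
        # (i) Load structured data: write a tiny JSON-lines file, then read it.
        path = os.path.join(tempfile.mkdtemp(), "people.json")
        with open(path, "w") as f:
            for record in [{"name": "Ana", "age": 34}, {"name": "Bo", "age": 19}]:
                f.write(json.dumps(record) + "\n")
        people = spark.read.json(path)

        # (ii) Query the data with SQL by registering it as a temporary view.
        people.createOrReplaceTempView("people")

        # (iii) Mix SQL with regular Python code: expose name_length to SQL.
        spark.udf.register("name_length", name_length, IntegerType())
        rows = spark.sql(
            "SELECT name, name_length(name) AS n FROM people WHERE age >= 21"
        ).collect()
        return [(row["name"], row["n"]) for row in rows]
    finally:
        spark.stop()
```

Calling run_demo() on a machine with PySpark and a Java runtime returns the matching names with their lengths as a list of tuples. The SQL string and the Python function live in the same program, which is the kind of integration point iii) describes.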

To implement these capabilities, Spark SQL provides a special type of RDD called SchemaRDD. A SchemaRDD is an RDD of Row objects, each representing a record. A SchemaRDD also knows the schema (i.e., data fields) of its rows. While SchemaRDDs look like regular RDDs, internally they store data in a more efficient manner, taking advantage of their schema.

In addition, they provide new operations not available on Spark RDDs, such as the ability to run SQL queries. SchemaRDDs can be created from external data sources, from the results of queries, or from regular RDDs.
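To make the idea of "rows plus a known schema" concrete, here is a toy model in plain Python. It is only a conceptual sketch and not how Spark implements SchemaRDDs: the class ToySchemaTable, the Person record type, and the column-per-field layout are assumptions chosen to show why knowing the fields in advance allows a more compact layout and schema-aware operations.

```python
from collections import namedtuple

# Hypothetical record type standing in for a Spark SQL Row object.
Person = namedtuple("Person", ["name", "age"])


class ToySchemaTable:
    """Toy model of a schema-aware dataset. Not Spark's implementation."""

    def __init__(self, schema, rows):
        # The schema is the known, ordered set of field names.
        self.schema = tuple(schema)
        # Every row has the same fields, so values can be stored per column
        # instead of keeping one generic object per record.
        self.columns = {field: [] for field in self.schema}
        for row in rows:
            for field in self.schema:
                self.columns[field].append(getattr(row, field))

    def select(self, field):
        """Return one column; unknown fields are rejected using the schema."""
        if field not in self.columns:
            raise KeyError("no field %r in schema %r" % (field, self.schema))
        return list(self.columns[field])

    def rows(self):
        """Rebuild the records as plain tuples, in schema order."""
        return list(zip(*(self.columns[f] for f in self.schema)))


table = ToySchemaTable(Person._fields, [Person("Ana", 34), Person("Bo", 19)])
ages = table.select("age")
```

Here select("age") reads a single column without touching the names, and a request for a field that is not in the schema fails immediately. A plain collection of opaque objects could offer neither behaviour without inspecting every record.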

In upcoming posts we will discuss in detail how to load a variety of data using Spark SQL. We’ll then describe the Spark SQL JDBC server, which lets you run Spark SQL on a shared server and connect either SQL shells or visualization tools like Tableau to it.

Finally, we’ll discuss some advanced features. Spark SQL is a newer component of Spark and changes substantially in Spark 1.3 and later versions (for example, Spark 1.3 renames SchemaRDD to DataFrame), so consult the most recent documentation for the latest information on Spark SQL and SchemaRDDs.

HadoopTpoint © 2017 Frontier Theme