The Future of SQL on Apache Spark SQL
When the Shark project started 3 years ago, Hive (on MapReduce) was the only choice for SQL on Hadoop. Hive compiled SQL into scalable MapReduce jobs and could work with a variety of formats (through its SerDes). However, it delivered less than ideal performance.
In order to run queries interactively, organizations deployed expensive, proprietary enterprise data warehouses (EDWs) that required rigid and lengthy ETL pipelines. Shark became one of the first interactive SQL on Hadoop systems.
From Shark to Spark SQL:
Shark built on the Hive codebase and achieved performance improvements by swapping out the physical execution engine part of Hive. While this approach enabled Shark users to speed up their Hive queries, Shark inherited a large, complicated code base from Hive that made it hard to optimize and maintain.
Also Read Introduction To Apache Spark
As we moved to push the boundary of performance optimizations and integrating sophisticated analytics with SQL, we were constrained by the legacy that was designed for MapReduce.
It is for this reason that we are ending development in Shark as a separate project and moving all our development resources to Spark SQL, a new component in Spark. We are applying what we learned in Shark to Spark SQL, designed from ground-up to leverage the power of Spark.
This new approach enables us to innovate faster, and ultimately deliver much better experience and power to users.
For SQL users, Spark SQL provides state-of-the-art SQL performance and maintains compatibility with Shark/Hive. In particular, like Shark, Spark SQL supports all existing Hive data formats, user-defined functions (UDF), and the Hive metastore.
While Spark SQL is becoming the standard for SQL on Spark, we do realize many organizations have existing investments in Hive. Many of these organizations, however, are also eager to migrate to Spark.
Share this knowledge ! Join us on Facebook ! Now Whatsapp sharing is supportable ! BookMark our HadoopTpoint.com ! Any Doubts Comment below .