Apache Spark fills a role similar to Hadoop MapReduce, but it can run programs up to 100 times faster than MapReduce in memory and 10 times faster on disk. Apache Spark is a fast and general engine for large-scale data processing. Here is an Apache Spark example on character count. Here is the Apache […]
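As a rough illustration of what a character count involves, here is a minimal plain-Python sketch of the map-then-reduce-by-key pattern that such a job usually follows. It does not use the Spark API, and it is not the example from the post itself; the function name `char_count` and the sample input are assumptions made for illustration only.

```python
from collections import defaultdict
from functools import reduce


def char_count(lines):
    """Count characters with a map step followed by a reduce-by-key step.

    Hypothetical helper: it mimics the shape of a MapReduce-style job
    in plain Python and makes no calls to Spark.
    """
    # Map step: emit a (character, 1) pair for every character in every line.
    pairs = [(ch, 1) for line in lines for ch in line]

    # Shuffle step: group the emitted values by their key (the character).
    grouped = defaultdict(list)
    for ch, one in pairs:
        grouped[ch].append(one)

    # Reduce step: sum the values collected for each key.
    return {ch: reduce(lambda a, b: a + b, ones) for ch, ones in grouped.items()}


if __name__ == "__main__":
    # Sample input is made up for this sketch.
    print(char_count(["spark", "hadoop"]))
```

In a real Spark job the grouping and summing would be distributed across the cluster; here everything runs in a single process, which keeps the sketch easy to follow.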
Apache Spark is an open source big data processing framework built around speed, ease of use, and sophisticated analytics.
The Future of SQL on Apache Spark SQL: When the Shark project started 3 years ago, Hive (on MapReduce) was the only choice for SQL on Hadoop. Hive compiled SQL into scalable MapReduce jobs and could work with a variety of formats (through its SerDes). However, it delivered less-than-ideal performance. In order […]
Introduction to Apache Spark: Apache Spark is an open source processing engine for Hadoop data, built around speed, ease of use, and sophisticated analytics. It was originally developed in 2009 at UC Berkeley’s AMPLab and open sourced in 2010. Spark has quickly become one of the largest open source communities in big […]