The Hadoop ecosystem is diverse and grows by the day, making it impossible to keep track of all the projects that interact with Hadoop. Strictly speaking, Hadoop provides storage (HDFS) and processing (MapReduce), but the term is also used for a family of related projects that fall under the umbrella of infrastructure for distributed computing and large-scale data processing. MapReduce is not for the faint of heart, so the goal of many of these Hadoop-related projects is to make Hadoop more accessible to programmers and nonprogrammers alike.
MapReduce: A distributed data processing model and execution environment that runs on large clusters of commodity machines.
HDFS: A distributed filesystem that runs on large clusters of commodity machines.
Pig: A data flow language and execution environment for exploring very large data sets. Pig runs on HDFS and MapReduce clusters.
Hive: A distributed data warehouse. Hive manages data stored in HDFS and provides a query language based on SQL, which the runtime engine translates into MapReduce jobs, for querying the data.
HBase: A distributed, column-oriented database. HBase uses HDFS for its underlying storage, and supports both batch-style computations using MapReduce and point queries (random reads).
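To make the processing model concrete, here is a toy sketch of the MapReduce data flow (map, then shuffle by key, then reduce) in plain single-process Python. This is not Hadoop's Java API; the names `mapreduce`, `map_words`, and `reduce_counts` are illustrative only, and the sketch shows the classic word-count job that Pig and Hive ultimately compile down to jobs like.

```python
from collections import defaultdict

def mapreduce(records, map_fn, reduce_fn):
    """Toy single-process model of the MapReduce data flow:
    map -> shuffle (group values by key) -> reduce."""
    groups = defaultdict(list)
    # Map phase: each input record yields zero or more (key, value) pairs.
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)  # shuffle: collect values under their key
    # Reduce phase: combine all values observed for each key.
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# Classic word count expressed in this model.
def map_words(line):
    for word in line.split():
        yield word, 1

def reduce_counts(word, counts):
    return sum(counts)

lines = ["the quick brown fox", "the lazy dog"]
print(mapreduce(lines, map_words, reduce_counts))
# {'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 1}
```

In real Hadoop the map and reduce phases run on different machines and the shuffle moves data across the network, but the programming contract is the same: the user supplies only the two functions, and the framework handles distribution.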