Introduction to Hive
What is Hive
Hive is data warehouse software that facilitates querying and managing large data sets residing in distributed storage. Its query language, HiveQL, closely resembles SQL. Hive is designed to make data summarization easy. When it is inconvenient or inefficient to express logic in HiveQL, Hive also lets programmers plug in custom mappers and reducers in the traditional MapReduce style, as well as user-defined functions (UDFs). Hive integrates easily with other data technologies through its JDBC driver.
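As a minimal sketch of the custom-mapper route, Hive's `TRANSFORM ... USING` clause streams each input row to an external script as one tab-separated line on standard input and reads tab-separated rows back from standard output. The script below assumes a hypothetical two-column input (`user_id`, `url`) and emits the host part of each URL; the script name, table name and column names are illustrative only.

```python
# extract_host.py: hypothetical streaming mapper for Hive's TRANSFORM clause.
# Illustrative HiveQL invocation (table and column names are assumptions):
#   ADD FILE extract_host.py;
#   SELECT TRANSFORM (user_id, url) USING 'python extract_host.py'
#   AS (user_id, host) FROM page_views;
import sys
from urllib.parse import urlsplit


def transform_line(line: str) -> str:
    """Turn one tab-separated input row (user_id, url) into (user_id, host)."""
    user_id, url = line.rstrip("\n").split("\t", 1)
    # hostname is None when the URL has no scheme/netloc; emit an empty field then.
    host = urlsplit(url).hostname or ""
    return f"{user_id}\t{host}"


def main(stdin=sys.stdin, stdout=sys.stdout) -> None:
    # Hive sends one row per line, with columns separated by tabs by default.
    for line in stdin:
        if line.strip():
            stdout.write(transform_line(line) + "\n")


if __name__ == "__main__":
    main()
```

Because the script only reads stdin and writes stdout, it can be tested locally by piping a few tab-separated lines into it before registering it with Hive.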
What Hive is Not
Hive is a batch processing system, and Hive jobs take much longer to execute queries than traditional databases such as Oracle. In exchange, Hive scales out across a Hadoop cluster to data sets of many terabytes. Hive aims to provide acceptable (but not optimal) latency for interactive data browsing, queries over small data sets, or test queries. Hive is not designed for online transaction processing and does not offer real-time queries or row-level updates. It is best used for batch jobs over large sets of immutable data, such as web logs.
The Stinger Initiative delivered a fundamentally new Apache Hive, evolving Hive's traditional architecture to make it faster, with richer SQL semantics and petabyte scalability.
Three Key Facets of Hive
Recent Hive Releases
Hive Features Include
i) Tools that make ETL (extract/transform/load) easy
ii) Support for a variety of data formats
iii) Data stored directly on top of HDFS or Apache HBase
iv) MapReduce execution under the hood
v) Best used for batch jobs over large sets of append-only data (like web logs).
vi) A SQL-like interface that suits users who are comfortable with SQL
vii) Developed at Facebook and contributed by Facebook to the Apache Software Foundation
viii) Custom extensions: user-defined functions (UDFs), user-defined aggregate functions (UDAFs), and user-defined table functions (UDTFs)
ix) Works equally well over Thrift, through Hive's Thrift-based server for remote clients
x) Apache Derby is the default metastore database; MySQL can optionally be used instead
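The user-defined functions listed above differ in how many rows go in and how many come out. The Python sketch below models only those shapes; it is not Hive's API (real Hive extensions are written in Java against classes such as `GenericUDF`, `GenericUDAFEvaluator` and `GenericUDTF`), and every function and class name in it is hypothetical. The average aggregate loosely follows the evaluator lifecycle of iterate, terminatePartial, merge and terminate, which is what allows partial aggregation in the map phase and final combination in the reduce phase.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


# UDF shape: one row in, one value out (hypothetical string normalizer).
def udf_lower_trim(value: str) -> str:
    return value.strip().lower()


class AvgEvaluator:
    """UDAF shape: many rows in, one value out (hypothetical AVG)."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def iterate(self, value: float) -> None:
        # Map side: fold one raw input row into the running buffer.
        self.total += value
        self.count += 1

    def terminate_partial(self) -> Tuple[float, int]:
        # Map side: emit (sum, count) so it can be shipped to a reducer.
        return (self.total, self.count)

    def merge(self, partial: Tuple[float, int]) -> None:
        # Reduce side: combine one mapper's partial result.
        self.total += partial[0]
        self.count += partial[1]

    def terminate(self) -> Optional[float]:
        # Final answer; None when no rows were seen (SQL AVG returns NULL).
        return self.total / self.count if self.count else None


def avg_over_splits(splits: List[List[float]]) -> Optional[float]:
    """Simulate one mapper per input split followed by a single reducer."""
    partials = []
    for split in splits:
        mapper = AvgEvaluator()
        for value in split:
            mapper.iterate(value)
        partials.append(mapper.terminate_partial())
    reducer = AvgEvaluator()
    for partial in partials:
        reducer.merge(partial)
    return reducer.terminate()


# UDTF shape: one row in, zero or more rows out (similar in spirit to explode).
def udtf_explode(row_id: int, items: Iterable[str]) -> Iterator[Tuple[int, str]]:
    for item in items:
        yield (row_id, item)
```

Shipping (sum, count) rather than each mapper's average is the key design choice: with splits of different sizes, averaging the per-mapper averages is wrong. For the splits [1, 2], [3] and [6], that shortcut would give (1.5 + 3 + 6) / 3 = 3.5, while `avg_over_splits` correctly returns 12 / 4 = 3.0.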
Hive Applications Include
- Data Mining
- Document Indexing
- Predictive modeling, and Hypothesis testing
- Customer-facing Business Intelligence (e.g., Google Analytics)
- Log processing