Hadoop Hive ORC File Format


ORC stands for Optimized Row Columnar. The ORC file format stores relational data more efficiently than the RC file format; by using ORC, we can reduce the size of the original data by up to 75%. Compared with the Text, Sequence and RC file formats, ORC is the better choice.
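The "columnar" part of the name is the core idea. Instead of storing each row's fields together, ORC stores all values of one column together. The following minimal Python sketch models only that pivot, using a few made-up flight records (the field names are hypothetical); it is not ORC's actual on-disk encoding.

```python
from typing import Dict, List, Tuple

# Hypothetical flight records: (carrier, origin, delay_minutes)
Row = Tuple[str, str, int]

ROWS: List[Row] = [
    ("AA", "JFK", 5),
    ("AA", "JFK", 0),
    ("UA", "SFO", 12),
    ("AA", "LAX", 3),
]

COLUMNS = ("carrier", "origin", "delay")


def to_columnar(rows: List[Row]) -> Dict[str, list]:
    """Pivot row-oriented records into one list per column.

    A columnar format stores each of these lists contiguously, so a query
    that needs only `delay` never has to read `carrier` or `origin`.
    """
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def read_column(columnar: Dict[str, list], name: str) -> list:
    """Read a single column without scanning the others."""
    return columnar[name]


if __name__ == "__main__":
    table = to_columnar(ROWS)
    print(table["carrier"])  # ['AA', 'AA', 'UA', 'AA']
    print(sum(read_column(table, "delay")))  # 20
```

Values within one column share a type and often repeat, which is why storing them side by side compresses far better than interleaved rows.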

Using ORC files improves performance when Hive is reading, writing, and processing data. Among the Text, Sequence, RC and ORC formats, RC and ORC perform better than Text and Sequence files.

Between the RC and ORC formats, ORC comes out ahead: it takes less time to access data than RC and less space to store it. However, ORC increases CPU overhead, because reading the data requires extra time to decompress it. The ORC file format was introduced in Hive 0.11 and cannot be used with earlier versions.
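The space-versus-CPU trade-off described above can be illustrated with Python's standard zlib module (ZLIB is also one of ORC's codecs). This sketch compresses a single, highly repetitive column of the kind columnar storage tends to produce, then times the decompression that every read has to pay for. It is a toy model under stated assumptions: the column contents are made up, real ORC also applies lightweight encodings such as run-length and dictionary encoding before the codec, and actual sizes and timings depend entirely on the data.

```python
import time
import zlib
from typing import Tuple

# Hypothetical "carrier" column: long runs of repeated values, which is
# typical when a single column is stored contiguously.
carrier_column = ("AA," * 5000 + "UA," * 3000 + "DL," * 2000).encode("ascii")


def compress_column(raw: bytes, level: int = 6) -> bytes:
    """Compress one column's bytes; a smaller result means less storage and I/O."""
    return zlib.compress(raw, level)


def timed_decompress(blob: bytes) -> Tuple[bytes, float]:
    """Decompress and measure the elapsed time, the CPU cost paid on every read."""
    start = time.perf_counter()
    data = zlib.decompress(blob)
    return data, time.perf_counter() - start


if __name__ == "__main__":
    compressed = compress_column(carrier_column)
    restored, seconds = timed_decompress(compressed)
    assert restored == carrier_column  # compression is lossless
    print(f"raw={len(carrier_column)} bytes, compressed={len(compressed)} bytes")
    print(f"decompression took {seconds * 1e6:.1f} microseconds")
```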

Syntax to Create an ORC File Format Table

Generally, the ORC file format is used to improve the performance of Hive queries, reduce access time and reduce storage space. To get the most out of it, we should combine ORC with Hive's partitioning and bucketing features. Here is an example of creating a Hive table with partitions, buckets and the ORC file format.
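The general shape of such a statement can be sketched with a small Python helper that renders the HiveQL text. This is a hypothetical helper, not part of Hive: the name orc_table_ddl and the example columns are illustrative, and the validation rules assume the constraints discussed in this article (a known codec name and a row index stride of at least 1,000 rows). It also sets the three ORC table properties orc.compress, orc.stripe.size and orc.row.index.stride.

```python
from typing import List, Tuple

# Codec names from this article plus ZLIB; the exact set accepted depends on
# the Hive/ORC version (assumption: check your release).
ORC_CODECS = {"NONE", "ZLIB", "SNAPPY", "LZO"}


def orc_table_ddl(
    table: str,
    columns: List[Tuple[str, str]],
    partition: Tuple[str, str],
    bucket_col: str,
    num_buckets: int,
    compress: str = "ZLIB",
    stripe_size: int = 256 * 1024 * 1024,
    index_stride: int = 10_000,
) -> str:
    """Render a partitioned, bucketed CREATE TABLE ... STORED AS ORC statement."""
    codec = compress.upper()
    names = [name for name, _ in columns]
    if codec not in ORC_CODECS:
        raise ValueError(f"unsupported orc.compress value: {compress}")
    if index_stride < 1000:
        raise ValueError("orc.row.index.stride must be at least 1000")
    if partition[0] in names:
        # Hive declares partition columns only in PARTITIONED BY.
        raise ValueError("partition column must not appear in the column list")
    if bucket_col not in names or num_buckets <= 0:
        raise ValueError("bucket column must be a table column; buckets must be > 0")

    cols = ",\n  ".join(f"{name} {ctype}" for name, ctype in columns)
    props = {
        "orc.compress": codec,
        "orc.stripe.size": str(stripe_size),
        "orc.row.index.stride": str(index_stride),
    }
    prop_list = ", ".join(f'"{key}"="{value}"' for key, value in props.items())
    return (
        f"CREATE TABLE {table} (\n  {cols}\n)\n"
        f"PARTITIONED BY ({partition[0]} {partition[1]})\n"
        f"CLUSTERED BY ({bucket_col}) INTO {num_buckets} BUCKETS\n"
        "STORED AS ORC\n"
        f"TBLPROPERTIES ({prop_list});"
    )


if __name__ == "__main__":
    print(orc_table_ddl(
        "airanalytics",
        [("carrier", "STRING"), ("origin", "STRING"), ("delay", "INT")],
        ("year", "INT"),
        "carrier",
        4,
        compress="snappy",
    ))
```

Run as a script, it prints a CREATE TABLE statement for airanalytics that ends with STORED AS ORC and TBLPROPERTIES ("orc.compress"="SNAPPY", "orc.stripe.size"="268435456", "orc.row.index.stride"="10000"). Generating the DDL in one place keeps property names and values consistent across tables.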

In the table definition above, we declare the following ORC table properties:

orc.compress sets the compression codec, for example NONE, ZLIB (the default), SNAPPY or LZO.

orc.stripe.size sets the size, in bytes, of each stripe (the large block of rows that ORC writes and reads as a unit).

orc.row.index.stride sets the number of rows between row index entries (10,000 by default; it must be at least 1,000).
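To get a feel for how orc.stripe.size and orc.row.index.stride interact, the following back-of-the-envelope Python sketch estimates how many stripes the data splits into and how many row index entries each stripe carries. The function name is hypothetical, and it assumes a fixed average row size inside a stripe, which is a simplification: real stripe boundaries depend on the encoded column sizes and the writer's memory use.

```python
import math
from typing import Tuple


def orc_layout_estimate(
    total_rows: int,
    avg_row_bytes: int,
    stripe_size: int = 256 * 1024 * 1024,
    index_stride: int = 10_000,
) -> Tuple[int, int]:
    """Estimate (number of stripes, row index entries per full stripe).

    Simplification: every row is assumed to occupy avg_row_bytes inside a
    stripe, so a stripe holds stripe_size // avg_row_bytes rows.
    """
    if avg_row_bytes <= 0 or index_stride <= 0:
        raise ValueError("row size and index stride must be positive")
    rows_per_stripe = max(1, stripe_size // avg_row_bytes)
    stripes = math.ceil(total_rows / rows_per_stripe)
    # One index entry starts every index_stride rows within a stripe.
    entries_per_stripe = math.ceil(min(rows_per_stripe, total_rows) / index_stride)
    return stripes, entries_per_stripe


if __name__ == "__main__":
    # 10 million rows of roughly 100 bytes each: 256 MB vs 64 MB stripes
    print(orc_layout_estimate(10_000_000, 100))  # (4, 269)
    print(orc_layout_estimate(10_000_000, 100, stripe_size=64 * 1024 * 1024))  # (15, 68)
```

Larger stripes mean fewer, longer sequential reads per file, while a smaller stride gives a finer-grained index at the cost of more index entries.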

Inserting the Data into the airanalytics Table
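In classic Hive releases, LOAD DATA only moves files into the table's directory without converting them, so text data is usually loaded into a plain staging table first and then copied into the ORC table with INSERT ... SELECT, which lets Hive write the ORC files, partitions and buckets. The Python helper below renders such a statement. It is a sketch under assumptions: the helper name, the staging table name airanalytics_staging and the column names are hypothetical, and the SET lines assume dynamic partitioning; hive.enforce.bucketing applies to Hive 1.x only, because bucketing is always enforced from Hive 2.0 on.

```python
from typing import List


def render_orc_insert(
    target: str,
    staging: str,
    columns: List[str],
    partition_col: str,
    enforce_bucketing: bool = True,
) -> str:
    """Render a dynamic-partition INSERT OVERWRITE that copies a staging table.

    Hive matches dynamic partition columns by position, so the partition
    column must be the last expression in the SELECT list.
    """
    if partition_col in columns:
        raise ValueError("pass the partition column separately, not in columns")
    statements = [
        "SET hive.exec.dynamic.partition=true;",
        "SET hive.exec.dynamic.partition.mode=nonstrict;",
    ]
    if enforce_bucketing:
        # Needed on Hive 1.x; from Hive 2.0 bucketing is always enforced.
        statements.append("SET hive.enforce.bucketing=true;")
    select_list = ", ".join(columns + [partition_col])
    statements += [
        f"INSERT OVERWRITE TABLE {target} PARTITION ({partition_col})",
        f"SELECT {select_list}",
        f"FROM {staging};",
    ]
    return "\n".join(statements)


if __name__ == "__main__":
    print(render_orc_insert(
        "airanalytics", "airanalytics_staging", ["carrier", "origin", "delay"], "year"
    ))
```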

Advantages of the ORC File Format

i) Each column is stored separately

ii) Stores statistics (min, max, sum, count) at the file, stripe and row-group level

iii) Has a lightweight index

iv) Uses large blocks by default (256 MB)

v) Reduces access time and storage space
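Statistics and the lightweight index work together: ORC keeps min/max (and count/sum) statistics for each group of orc.row.index.stride rows, so a reader can skip any row group whose range cannot satisfy a filter. The Python sketch below models that idea with a deliberately tiny stride of 3 rows (real Hive requires at least 1,000) and a made-up delay column; the class and function names are hypothetical, and this is not ORC's reader implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroupStats:
    """Statistics for one row group, as kept by a lightweight index."""
    start: int    # offset of the group's first row in the column
    count: int
    minimum: int
    maximum: int
    total: int


def build_index(values: List[int], stride: int) -> List[RowGroupStats]:
    """Split a column into row groups of `stride` rows and record their stats."""
    index = []
    for start in range(0, len(values), stride):
        group = values[start:start + stride]
        index.append(RowGroupStats(start, len(group), min(group), max(group), sum(group)))
    return index


def rows_greater_than(
    values: List[int], index: List[RowGroupStats], threshold: int
) -> Tuple[List[int], int]:
    """Return matching values and the number of row groups skipped unread."""
    matches: List[int] = []
    skipped = 0
    for stats in index:
        if stats.maximum <= threshold:
            skipped += 1  # no row in this group can match, so never read it
            continue
        group = values[stats.start:stats.start + stats.count]
        matches.extend(v for v in group if v > threshold)
    return matches, skipped


if __name__ == "__main__":
    delays = [0, 2, 1, 45, 3, 60, 1, 0, 2]  # hypothetical delay column
    index = build_index(delays, stride=3)  # real ORC strides are >= 1,000 rows
    print(rows_greater_than(delays, index, 30))  # ([45, 60], 2)
    print(sum(s.total for s in index))  # 114, answered from stats alone
```

In this model, the filter delay > 30 reads only one of the three row groups, and a SUM over the whole column needs only the stored totals.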


