Built-in Mathematical Functions in Apache Hive


The following built-in mathematical functions are supported in Hive; most return NULL when the argument(s) are NULL.

Finding Frequent Itemsets using Hadoop-MapReduce Model


Frequent sets play an essential role in many data mining tasks that try to find interesting patterns in databases, such as association rules, correlations, sequences, episodes, classifiers, and clusters. Mining association rules is one of the most popular of these problems. The identification of sets of […]

Apache Storm Tutorial


Introduction: Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: […]

Apache Oozie Tutorial


Apache Oozie is a Java web application used to schedule Apache Hadoop jobs. It is integrated with the Hadoop stack and supports Hadoop jobs for Apache MapReduce, Apache Pig, Apache Hive, and Apache Sqoop. There are two basic types of Oozie jobs: 1) Oozie Workflow: An Oozie Workflow is a collection of actions arranged in […]

Hadoop Ecosystem


Distributed File Systems: Apache HDFS: The Hadoop Distributed File System (HDFS) offers a way to store large files across multiple machines. Red Hat GlusterFS: GlusterFS is a scale-out network-attached storage file system. Quantcast File System (QFS): QFS is an open-source distributed file system software package for large-scale MapReduce or other batch-processing workloads. Ceph Filesystem: […]