Hadoop Falcon Tutorial
Hadoop Falcon is a data lifecycle management framework for Hadoop clusters. Put simply, Falcon simplifies the planning and management of data pipelines and their processing in Hadoop. It handles data management and processing pipelines, replication, workflow, and compliance use cases, and it integrates easily with YARN.
Falcon sits at the center of a Hadoop deployment, where it centrally manages the cluster's data governance, maximizes data pipeline reuse, and enforces consistent data lifecycles.
Advantages of Hadoop Falcon
The Apache Falcon community continues to enhance Falcon's operations features, its support for transactional applications, and its tooling.
What Hadoop Falcon Does
Falcon simplifies the development and management of data processing pipelines with a higher layer of abstraction, taking the complex coding out of data processing applications by providing out-of-the-box data management services. This simplifies the configuration and orchestration of data motion, disaster recovery and data retention workflows.
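As a sketch of what such out-of-the-box data management looks like, the following is a minimal Falcon feed entity definition showing retention and replication configured declaratively rather than coded by hand. The entity, cluster, and path names here are hypothetical, and the element layout is a simplified illustration of the feed specification format:

```xml
<!-- Hypothetical feed entity: an hourly "raw-clicks" dataset.
     Falcon handles retention (deleting old data) and replication
     (copying from the source cluster to the target cluster). -->
<feed name="raw-clicks" xmlns="uri:falcon:feed:0.1">
  <frequency>hours(1)</frequency>
  <clusters>
    <!-- Source cluster: keep 90 days of data, then delete. -->
    <cluster name="primary-cluster" type="source">
      <validity start="2024-01-01T00:00Z" end="2024-12-31T00:00Z"/>
      <retention limit="days(90)" action="delete"/>
    </cluster>
    <!-- Target cluster: replicated copy retained for 12 months. -->
    <cluster name="backup-cluster" type="target">
      <validity start="2024-01-01T00:00Z" end="2024-12-31T00:00Z"/>
      <retention limit="months(12)" action="delete"/>
    </cluster>
  </clusters>
  <locations>
    <!-- Date-partitioned HDFS path for each hourly instance. -->
    <location type="data" path="/data/clicks/${YEAR}-${MONTH}-${DAY}"/>
  </locations>
  <ACL owner="etl" group="hadoop" permission="0755"/>
</feed>
```

With a definition like this, data motion, disaster recovery, and retention are expressed as configuration; Falcon schedules the replication and eviction jobs itself instead of requiring custom workflow code.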
Apache Falcon meets enterprise data governance needs in three areas: data motion (including replication), disaster recovery, and data retention.
How Hadoop Falcon Works
A user creates entity specifications and submits them to Falcon using the command-line interface (CLI) or the REST API. Falcon converts the entity specifications into recurring actions through a Hadoop workflow scheduler; all function and workflow state management is delegated to that scheduler. By default, Falcon uses Apache Oozie as the scheduler.
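To make the submission flow concrete, below is a minimal sketch of a process entity of the kind a user would submit. The process, cluster, and workflow path names are hypothetical, and the elements shown are a simplified subset of the process specification format:

```xml
<!-- Hypothetical process entity: run an Oozie workflow every hour.
     Falcon translates this into recurring scheduler actions. -->
<process name="sample-process" xmlns="uri:falcon:process:0.1">
  <clusters>
    <cluster name="primary-cluster">
      <!-- Window during which instances are materialized. -->
      <validity start="2024-01-01T00:00Z" end="2024-12-31T00:00Z"/>
    </cluster>
  </clusters>
  <parallel>1</parallel>          <!-- one instance at a time -->
  <order>FIFO</order>             <!-- oldest pending instance first -->
  <frequency>hours(1)</frequency> <!-- materialize a run every hour -->
  <!-- Delegate actual execution to an Oozie workflow on HDFS. -->
  <workflow engine="oozie" path="/apps/sample/workflow.xml"/>
  <!-- Retry failed instances periodically. -->
  <retry policy="periodic" delay="minutes(10)" attempts="3"/>
</process>
```

A spec like this would typically be submitted and scheduled with CLI commands along the lines of `falcon entity -type process -submit -file process.xml` followed by `falcon entity -type process -schedule -name sample-process`; from that point on, Oozie owns the run-by-run workflow state on Falcon's behalf.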
Recent Versions of Hadoop Falcon