Hadoop History

Hadoop was created by Doug Cutting, who had also created Apache Lucene, the text search library. Hadoop has its origins in Apache Nutch, an open source web search engine that was itself part of the Apache Lucene project. Nutch was started in 2002 as a working crawler and search system. However, the Nutch architecture would not scale up to the billions of pages on the web.

In 2003, Google published a paper describing the architecture of its distributed filesystem, the Google File System (GFS). An architecture like this would solve the storage needs for the very large files generated as part of the web crawl and indexing process.

In 2004, the Nutch developers began writing an open source implementation based on the GFS architecture, called the Nutch Distributed Filesystem (NDFS). Also in 2004, Google published the paper that introduced MapReduce. In 2005, the Nutch developers had a working MapReduce implementation in the Nutch project, and most of the Nutch algorithms had been ported to run using MapReduce and NDFS.

In February 2006, NDFS and the MapReduce implementation moved out of Nutch to form an independent subproject of Lucene called Hadoop. At around the same time, Doug Cutting joined Yahoo!, which provided a dedicated team and the resources to turn Hadoop into a system that ran at web scale. This was demonstrated in February 2008, when Yahoo! announced that its production search index was being generated by a 10,000-core Hadoop cluster.

In January 2008, Hadoop was made its own top-level project at Apache, confirming its success and its diverse, active community. By this time, Hadoop was being used by many other companies besides Yahoo!, such as Last.fm, Facebook, and the New York Times.

In April 2008, Hadoop broke a world record to become the fastest system to sort a terabyte of data. Running on a 910-node cluster, Hadoop sorted one terabyte in 209 seconds (just under 3½ minutes), beating the previous year’s winner of 297 seconds.
