How Hadoop Started --- What Is Hadoop?

Brief summary: Hadoop is an open-source framework developed under the Apache organization, written in Java, for large-scale data storage and distributed computing.

The origin of Hadoop

In 2003-2004, Google published the details of its GFS and MapReduce ideas. Inspired by these papers, Doug Cutting spent about two years of spare time implementing a DFS and a MapReduce mechanism for Nutch, which made Nutch's performance soar. Yahoo then recruited Doug Cutting along with his project.
In 2005, Hadoop was formally introduced as part of Nutch, at that time a subproject of Apache Lucene.
In February 2006 it was split out as a complete, stand-alone piece of software named Hadoop.
The name Hadoop is not an acronym but a coined word: it was the name of a stuffed toy elephant belonging to Doug Cutting's son.
Hadoop's evolution:
Lucene -> Nutch -> Hadoop

To sum up, Hadoop originated in Google's three landmark papers:
GFS: the Google File System, Google's distributed file system
MapReduce: Google's MapReduce distributed parallel computing framework
BigTable: Google's large-scale distributed database

How these evolved into Hadoop components (a small HDFS code sketch follows the mapping below):
GFS -> HDFS 
Google MapReduce -> Hadoop MapReduce 
BigTable -> HBase
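
To make the GFS -> HDFS side of this correspondence concrete, below is a minimal sketch (not part of the original article) that writes and reads a file through the HDFS Java client API. It assumes a Hadoop client library on the classpath and a core-site.xml pointing at a reachable cluster (without one it falls back to the local file system); the path /tmp/hadoop-demo.txt is purely illustrative.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsDemo {
    public static void main(String[] args) throws Exception {
        // Reads fs.defaultFS from core-site.xml if present; otherwise uses the local FS.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical demo path.
        Path path = new Path("/tmp/hadoop-demo.txt");

        // Write a small file; on a real cluster HDFS replicates its blocks across DataNodes.
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.write("hello hadoop\n".getBytes(StandardCharsets.UTF_8));
        }

        // Read the file back.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
            System.out.println(in.readLine());
        }

        // Replication factor recorded for this file (dfs.replication, typically 3 on a cluster).
        System.out.println("replication = " + fs.getFileStatus(path).getReplication());
    }
}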

History of Hadoop development

Hadoop milestones
2004 - Doug Cutting and Mike Cafarella started work on the initial versions of what are now called HDFS and MapReduce.
December 2005 - Nutch was migrated to the new framework; Hadoop ran stably on 20 nodes.
January 2006 - Doug Cutting joined Yahoo. 
February 2006 - The Apache Hadoop project was officially launched to support the independent development of MapReduce and HDFS.
February 2006 - Yahoo's grid computing team adopted Hadoop.
April 2006 - The standard sort benchmark (10 GB per node) ran on 188 nodes in 47.9 hours.
May 2006 - Yahoo set up a 300-node Hadoop research cluster.
May 2006 - The standard sort ran on 500 nodes in 42 hours (on better hardware than in April).
November 2006 - The research cluster grew to 600 nodes.
December 2006 - The standard sort ran in 1.8 hours on 20 nodes, 3.3 hours on 100 nodes, 5.2 hours on 500 nodes, and 7.8 hours on 900 nodes.
January 2007 - The research cluster reached 900 nodes.
April 2007 - The research clusters grew to two clusters of 1,000 nodes each.
April 2008 - Hadoop became the world's fastest system for sorting 1 TB of data, taking 209 seconds on 900 nodes.
July 2008 - Yahoo's test cluster grew to 4,000 nodes.
September 2008 - Hive became a Hadoop subproject.
October 2008 - The research cluster was loading 10 TB of data per day.
November 2008 - Google announced that its MapReduce implementation sorted 1 TB in 68 seconds.
2008 - Taobao began investing in research on a Hadoop-based system called "Ladder". Ladder had a total capacity of about 9.3 PB across 1,100 machines, processing 18,000 jobs and scanning 500 TB of data per day.
March 2009 - 17 clusters with a total of 24,000 machines.
March 2009 - Cloudera launched CDH (Cloudera's Distribution Including Apache Hadoop).
April 2009 - Won the minute sort by sorting 500 GB in 59 seconds (on about 1,400 nodes), and sorted 100 TB of data in 173 minutes (on about 3,400 nodes), both at Yahoo!.
May 2009 - Yahoo's team used Hadoop to sort 1 TB of data in just 62 seconds.
July 2009 - The Hadoop Core project was renamed Hadoop Common.
July 2009 - MapReduce and the Hadoop Distributed File System (HDFS) became independent subprojects of the Hadoop project.
July 2009 - Avro and Chukwa became new Hadoop subprojects.
September 2009 - The Asia-Link Hadoop BI team began following up on and studying Hadoop.
December 2009 - Asia-Link announced its Orange Cloud strategy and began studying Hadoop.
May 2010 - Avro was split out of the Hadoop project and became a top-level Apache project.
May 2010 - HBase was split out of the Hadoop project and became a top-level Apache project.
May 2010 - IBM released its Hadoop-based big data analytics software, InfoSphere BigInsights, in Basic and Enterprise editions.
September 2010 - Hive (from Facebook) was split out of Hadoop and became a top-level Apache project.
September 2010 - Pig was split out of Hadoop and became a top-level Apache project.
January 2011 - ZooKeeper was split out of Hadoop and became a top-level Apache project.
March 2011 - Apache Hadoop won the Media Guardian Innovation Award.
March 2011 - Platform Computing announced support for the Hadoop MapReduce API in its Symphony software.
April 2011 - SGI (Silicon Graphics International) began offering Hadoop-optimized solutions based on its SGI Rackable and CloudRack server product lines.
May 2011 - MapR Technologies introduced its own distributed file system and MapReduce engine, the MapR Distribution for Apache Hadoop.
May 2011 - HCatalog 1.0 was released. Proposed by Hortonworks in March 2010, HCatalog mainly addresses the problem of managing metadata for data stored in Hadoop and relieves an HDFS bottleneck: it provides a place to store state information about the data, so that data cleanup and archiving tools can handle it easily.
May 2011 - EMC launched a new data-center appliance based on open-source Hadoop, Greenplum HD, to help customers meet growing demand for data analysis and accelerate adoption of open-source data analysis software. Greenplum is a data-warehousing company that EMC acquired in July 2010.
May 2011 - After acquiring Engenio, NetApp launched its E5400 storage system products in conjunction with Hadoop applications.
June 2011 - Calxeda (formerly named Smooth-Stone) launched a "pioneer" initiative: a team of ten software companies building support for systems based on Calxeda's upcoming ARM server chips, providing low-power server technology for Hadoop.
June 2011 - Data-integration vendor Informatica released a new version of its flagship product, designed to handle the massive amounts of data produced by today's businesses and social media, with support for Hadoop.
July 2011 - Yahoo! and the Silicon Valley venture capital firm Benchmark Capital founded Hortonworks, aiming to make Hadoop more robust (reliable) and easier for business users to install, manage, and use.
August 2011 - Cloudera announced a plan to benefit its partner ecosystem: creating an ecosystem in which hardware vendors, software vendors, and system integrators can jointly explore how to use Hadoop to gain better insight from data.
August 2011 - Dell and Cloudera jointly launched a Hadoop solution, Cloudera Enterprise, based on Dell PowerEdge C2100 rack servers and Dell PowerConnect 6248 Ethernet switches.

Four characteristics of Hadoop (advantages)

    1. Scalable: Hadoop distributes data and computing tasks across the available computers in a cluster, and these clusters can easily be scaled out to thousands of nodes.
    2. Economical (low cost): Hadoop distributes data and processing across clusters built from ordinary, inexpensive machines, so the total cost is very low.
    3. Efficient: by handling data concurrently, Hadoop can dynamically move data and computation between nodes in parallel, which makes processing very fast (see the MapReduce sketch after this list).
    4. Reliable: Hadoop automatically maintains multiple copies of data and automatically redeploys computing tasks after a failure, so its ability to store and process data bit by bit is worthy of trust.
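
As a concrete illustration of points 1-3, here is a minimal sketch (not from the original article) of the classic word-count job written against the org.apache.hadoop.mapreduce API: the map tasks run in parallel on the nodes holding the input blocks and emit (word, 1) pairs, and the reduce tasks sum the counts. The input and output paths are taken from the command line and are assumed to point at an accessible cluster or local-mode file system.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emit (word, 1) for every word in an input line.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sum the counts for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each node
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Reusing the reducer as a combiner pre-aggregates counts on each node before the shuffle, which reduces the data moved across the network and is part of why the framework scales to large clusters.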

Source: www.cnblogs.com/shun7man/p/11521257.html