Follow this roadmap to learn Hadoop.
1. Three articles by M. Tim Jones:
Distributed Data Processing with Hadoop Part 1 (Introduction): http://www.ibm.com/developerworks/cn/linux/l-hadoop-1/index.html
Distributed Data Processing with Hadoop Part 2 (Advanced): http://www.ibm.com/developerworks/cn/linux/l-hadoop-2/index.html
Distributed Data Processing with Hadoop Part 3 (Application Development): http://www.ibm.com/developerworks/cn/linux/l-hadoop-3/index.html
2. The "Stars in the Galaxy" blog, which includes a Google paper series (including the groundbreaking paper "MapReduce: Simplified Data Processing on Large Clusters") and introductions to search and distributed systems
[Google paper 3] MapReduce: Simplified Data Processing on Large Clusters: http://duanple.blog.163.com/blog/static/709717672010923203501/
A MapReduce program for word frequency counting can be found here: http://blog.csdn.net/shijinupc/article/details/7522446
Google paper series: http://duanple.blog.163.com/blog/#m=0&t=3&c=google
Related links organized by Hadoop component: http://duanple.blog.163.com/blog/static/7097176720119791920962/
3. Other Hadoop articles on IBM developerWorks. Searching developerWorks for the keyword "Hadoop" turns up many articles.
Here are a few worth reading:
Introduction to Hadoop Distributed File System: http://www.ibm.com/developerworks/cn/web/wa-introhdfs/index.html
Processing data with Apache Pig: http://www.ibm.com/developerworks/cn/bigdata/basic.html
4. The introduction in "The Architecture of Open Source Applications"
(Volume 1, Chapter 8) HDFS, the Hadoop Distributed File System: http://www.ituring.com.cn/article/4299
English original: http://www.aosabook.org/en/index.html (Volume 1, Chapter 8)
5. The official blog of the Alibaba Group Data Platform, which shares a lot of Hadoop research and application experience
http://www.alidata.org/archives
6. The official blog of the Baidu Search R&D Department, mainly covering experience in distributed systems (Hadoop), search technology, data mining, large-scale website architecture, etc.
7. Dong's blog, covering research on Hadoop and distributed systems
http://dongxicheng.org/recommend/
8. Of course, the official documentation is indispensable. It mainly covers setting up a Hadoop cluster, using MapReduce, and the HDFS architecture.
Stable version (preferred): http://hadoop.apache.org/docs/stable/
Latest version (including an introduction to YARN, the next generation of MapReduce): http://hadoop.apache.org/docs/current/
9. caibinbupt's blog, Hadoop source code analysis series
http://caibinbupt.iteye.com/?page=6
shirdrn's column, Hadoop-0.20.0 source code analysis
http://blog.csdn.net/shirdrn/article/category/595039/3
10. spork's blog, in particular the series on Hadoop
http://www.cnblogs.com/spork/category/226077.html
11. chinacloud's blog, some experience in Hadoop architecture and distributed system design
http://www.cnblogs.com/chinacloud/archive/2010/12/03/1895369.html
12. beanmoon's blog, the Hadoop series
http://www.cnblogs.com/beanmoon/
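Before working through the resources above, it may help to see the word frequency idea from item 2 in miniature. The following is only a rough single-process sketch in plain Python and is not the Hadoop API. The function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a real Hadoop job would distribute these phases across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key (the word)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce step: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence inside one process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    sample = ["hadoop stores data in hdfs", "hadoop processes data with mapreduce"]
    for word, total in sorted(word_count(sample).items()):
        print(word, total)
```

The three functions mirror the map, shuffle, and reduce phases described in the MapReduce paper. The linked articles show how the same logic is written against the actual Hadoop classes.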
Reprinted from: http://blog.csdn.net/zhoudaxia/article/details/8801769