A Great Way to Learn, Shared by a NetEase Data Engineer

The reason I suddenly picked up my pen is that I have just finished systematically learning Spark in my spare time, and the whole process reminded me of the way I studied the "Analog Electronic Circuits" course back in college. I feel the approach can serve as a template worth sharing. (This article is about how to systematically and efficiently learn a skill or a course; if your goal is last-minute cramming, feel free to skip it.)

Whether it was learning Spark or studying the "Analog Electronics" course, in summary I generally went through the following stages:

1. First acquaintance (10%): systematically go over the entire content once. For "Analog Electronics" this generally meant listening through the teacher's lectures; for Spark it meant reading related material wherever I could find it and writing a little test code in a test environment. This pass does not require special care or deep understanding; having a basic grasp of the concepts is enough. At the acquaintance stage you usually have no way to establish a knowledge architecture yet.
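For Spark, the "little test code" at this stage can be just a few lines in spark-shell. Below is a minimal sketch, assuming a spark-shell session where the `spark` SparkSession is predefined; the numbers are illustrative only:

```scala
// Runnable inside spark-shell, where `spark` is predefined.
val nums  = spark.sparkContext.parallelize(1 to 100) // build an RDD
val evens = nums.filter(_ % 2 == 0)                  // lazy transformation
println(evens.count())                               // action: triggers a job
```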

2. Build a knowledge architecture (20%): after the initial pass over the basic concepts, you need to go over all the content again. The same level of attention to detail is not required, but you do need to systematically identify the core chapters and the core points within each chapter:


First, list the chapters of the course. Spark as a whole, for example, can be divided into several parts: Spark core principles, Spark operations and tuning, SparkSQL, and so on. Spark core principles can be further divided into modules such as an introduction to RDDs, the DAG abstraction, Spark task scheduling, the Spark storage management system, and the Spark resource management system; Spark operations and tuning covers Spark deployment, tuning in operation, the Web UI, and so on.

Further, list the core points of each module. There is no need for many; three or four core points per chapter are enough. For example, in the Spark resource management module you need to focus on the static resource management strategy and the dynamic resource management strategy.
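To make those two strategies concrete: in configuration terms, the static strategy fixes the executor count up front, while the dynamic strategy lets Spark request and release executors on demand. A minimal sketch follows; the property names are real Spark settings, but the values are illustrative only:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("resource-management-demo")
  // Static strategy: a fixed number of executors for the whole application.
  // .config("spark.executor.instances", "4")
  // Dynamic strategy: executors are requested and released on demand.
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.minExecutors", "1")
  .config("spark.dynamicAllocation.maxExecutors", "20")
  // The external shuffle service keeps shuffle files available
  // while executors come and go.
  .config("spark.shuffle.service.enabled", "true")
  .getOrCreate()
```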

After this process, the thick Spark book / "Analog Electronics" textbook can be condensed into a few thin pages, one page per module, each recording that module's core points. I still remember that before the exam, while other students were clutching the thick Analog Electronics textbook and cramming, I took only a few pages with me to review (and got the highest single-subject score in the whole department).

The second step is the most important in the whole methodology. Building a knowledge architecture, a knowledge tree, is critical to a deep understanding of the knowledge landscape. Only with a global view do you know what you have already learned and what you have not. It is like building a house: you need a blueprint first, a global design; nobody builds a garden on the east side today and then decides to put a swimming pool on the west side tomorrow. Many students, when learning a skill, pick up a piece of information here and an optimization trick there; lacking a global view and a systematic approach, what they learn stays fragmented forever. By this stage you can already move through the material with ease and summarize an entire book from the shallow to the deep. The following two figures are the core Spark learning outline I drew up for myself, and the core points of the tuning chapter:

[Figure: my core Spark learning outline]

[Figure: core points of the tuning chapter]

3. In-depth exploration (20%): once the knowledge architecture is set up, you need to go deeper. At this point you can throw away the textbook and break through the core points one at a time, each section focused on a single shining point. For example, to study the dynamic resource scheduling of Spark's resource management module in depth, use every resource available (Google, the official blog, the Spark documentation, YouTube, JIRA, the source code, and so on) to retrieve information about dynamic resource scheduling and understand in depth how it works. After breaking through the points one by one like this, the whole body of knowledge becomes much fuller. This stage demands a strict craftsman's spirit: you need to collect all kinds of material to understand each point and to think about why it works that way. At this point you are already a so-called "expert" at the theoretical level. I have two suggestions for this stage:

(1) Retrieve the classic content: what this stage tests is your search ability and your reading comprehension. I strongly recommend that technical people pay more attention to the blogs of foreign technical experts as well as official blogs and documentation. For Spark, materials worth focusing on include:

Official documentation: http://spark.apache.org/docs/latest/index.html

Official blog: https://databricks.com/blog

YouTube videos (there are many): https://www.youtube.com/watch?v=cs3_3LdCny8

A domestic expert: https://github.com/JerryLead/SparkInternals/tree/master/markdown

(2) Organize with diagrams: there is a great deal of content about Spark on the web, and no shortage of classics. When I come across a classic article I usually read it in one sitting and then bookmark it. Over time, however, much of it blurs, and when you want to look something up again you no longer remember which blog it was in; I believe many people share this frustration. To deal with it, record the content that made the deepest impression on you, and I suggest using a diagramming tool, since as the saying goes, a picture is worth a thousand words. For example, when learning SparkSQL, in order to understand the whole SQL parsing process, I simply represented the execution plan of a simple SQL query with a few diagrams:


[Figure: the execution plan of a simple SQL query, represented in a few diagrams]
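Spark itself can print every stage of that parsing pipeline, which makes a convenient skeleton for such diagrams. A minimal sketch, where the view name and the query are illustrative only:

```scala
// Print the parsed, analyzed, optimized, and physical plans for a query.
spark.range(100).createOrReplaceTempView("t")
spark.sql("SELECT id, id * 2 AS doubled FROM t WHERE id > 10")
  .explain(extended = true) // extended = true shows all four plan stages
```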

4. Practice and exploration (30%): once step three is done, you can probably already hold forth on this skill with anyone, but that is all. The moment someone asks you about a production issue, your eloquence will turn into stammering, because you lack practice. Of course, only practice undertaken after the knowledge architecture is complete is practice in the true sense; with theory as a foundation, practice carries far more meaning. Practice is a process of hitting pits and filling them in; if you never hit a pit, it can hardly be called practice. Only when you run into problems will you make full use of monitoring and log information to trace the entire system's workflow, and only then will you genuinely think about how to improve the system further by changing its configuration or modifying its source code.

What this stage mainly tests is your problem-solving ability. Generally speaking there are three go-to tools: monitoring, logs, and source code. Monitoring divides into hardware monitoring and business monitoring, and you need to be able to read and analyze both. There are also many kinds of logs, such as business logs and GC logs, and you need to be able to guess the cause of a problem from an exception and then verify the guess. If all of those fail, the only option left is to analyze the source code. Diagnosing the problem is only the first step; more importantly, you must also propose an efficient solution, which may well be what your boss or interviewer values most.
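As a concrete example of wiring up those diagnostic hooks in Spark, here is a hedged sketch; the property names and JVM flags are real (JDK 8-style GC flags shown), but the values are illustrative only:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("troubleshooting-demo")
  // Emit GC logs from each executor so GC pauses can be analyzed.
  .config("spark.executor.extraJavaOptions",
          "-XX:+PrintGCDetails -XX:+PrintGCTimeStamps")
  // Keep event logs so the Web UI / History Server can replay the job.
  .config("spark.eventLog.enabled", "true")
  .getOrCreate()
```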

Practice and exploration cannot be accomplished in one stroke; it takes stepping into pits and climbing back out of them again and again, so it requires a strong heart.

5. Share and exchange (20%): the four steps above are all your own understanding of the knowledge; you also need to see how your peers understand it. After the practice stage, be sure to review the whole module in blog form, completely, systematically, and from the shallow to the deep, and share it for discussion! This stage lets you meet more friends in the community, and only by exchanging ideas and discussing together do you keep from feeling lonely.

There are countless ways to learn, and the one that suits you is the best. The above is just a bit of my own thinking about learning; I share it partly to improve myself and partly in the hope that it can serve as a reference for others. Of course, learning has never been an easy thing, but it has never been a terribly hard thing either. Let us encourage one another.

