The most complete big data learning route for 2021 (worth bookmarking)

Personal profile: a non-CS master's graduate of a Double First-Class university who switched from CAE simulation to big data, and is now a big data engineer at a major tech company in Hangzhou!

I can provide you with the most comprehensive [big data learning route], help you build a big data knowledge system from entry level to proficiency, and personally guide you on writing a resume for big data engineer interviews!

Preface

This article offers practical suggestions for the problems non-CS students encounter when switching to big data, to help you avoid detours in the learning process.

Based on the detours I took when changing careers, I have summarized my own detailed big data learning route and recommend the courses and technical books I have used, to help you select resources.

I understand everyone who is about to change careers or is in the middle of changing careers. You may worry about whether you can learn so much big data knowledge, or about what you will do after the age of 35.

I once fell into the same anxiety and confusion. These emotions come entirely from the gap between where you expect to be and where you currently are during the learning process. They are completely normal.

But I think we should not worry too much, because the future is always unpredictable. No one knows what we will be doing after 35; we should not worry prematurely or limit ourselves prematurely. The path we choose now is not necessarily the one we will walk until retirement.

Therefore, right now we should first accumulate skills and build our future core competitiveness. The first step is to earn our first pot of gold: once we have capital, we have more choices and greater possibilities.

1 Big data development prospects

Based on the "14th Five-Year Plan and Outline of Long-term Goals for 2035" released by China, let me show you the following indicator.

In the innovation-driven category, the added value of the digital economy's core industries is targeted to rise from 7.8% of GDP in 2020 to 10%. If that figure doesn't mean much to you yet, follow Xiaolin and look at the next picture below.

Main indicators of economic and social development

The key industries of the digital economy that China is vigorously developing include big data, as shown in the figure below. The government is strongly promoting technological development and innovation in big data to achieve digital transformation, so big data has great development potential!

Key industries of digital economy

2020 was the first year of 5G in China, with the country investing heavily in 5G infrastructure. In 2021 the number of 5G phones will likely keep growing, and it may be the year big data explodes: a 5G network can deliver data on the order of 10 gigabits per second at peak, which will make companies' data volumes balloon.

In addition, China's first big data undergraduate programs opened in 2017, and their first cohorts graduate in 2021. So there is a shortage of talent in big data, with strong demand for data development, data analysis, and data mining engineers.

2 Overview of learning routes

If you want to build a career in the Internet industry, what should you learn? For the big data direction, I personally think there are three main aspects:

First, basic computer knowledge is indispensable. If your fundamentals are solid, when you encounter a problem you can quickly recognize its essence and solve it. I am still strengthening my own study of computer fundamentals;

Second, the technical principles of big data frameworks: for the key frameworks, pay attention to enterprise-level tuning and source code study.

Third, actual project implementation. After learning a large number of technologies, you need to apply them in combination with project scenarios to deepen your understanding of the technology.

Big data is a direction from which you can advance to attack or retreat to defend.

You can develop in the direction of artificial intelligence, but it requires very solid mathematical knowledge.

I very much agree with what my tutor once told me: "Every problem eventually reduces to a mathematics problem"! Good mathematical ability will support you in constantly taking on new problems!

You can develop in the direction of big data application development, but it requires rich experience in framework use and tuning.

2.1 Computer basics
  • Be proficient in one language: Java, C, C++, Python, Go, Scala, etc. (for big data I recommend Java, Scala, or Python). I am learning Java myself. A language is just a tool, so there is no need to agonize over the choice.
  • Data structures and algorithms: linked lists, queues, heaps, binary trees, sorting, searching, greedy algorithms, backtracking, etc. (a small warm-up sketch follows after this list).
  • Computer networks: the OSI seven-layer model and the commonly used TCP/IP four-layer model.
  • Operating system: processes and threads, optimistic locking and pessimistic locking, cache consistency, CPU time slice scheduling.
  • Mathematics: Advanced Mathematics, Linear Algebra, Probability Theory and Mathematical Statistics.
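
As a taste of the data-structures item above, here is a minimal Java warm-up: reversing a singly linked list in place, a classic interview exercise. The ListNode class is a hypothetical helper written just for this sketch, not from any library.

```java
// Classic interview warm-up: reverse a singly linked list in place.
class ListNode {
    int val;
    ListNode next;
    ListNode(int val) { this.val = val; }
}

class ReverseList {
    static ListNode reverse(ListNode head) {
        ListNode prev = null;
        while (head != null) {
            ListNode next = head.next; // remember the rest of the list
            head.next = prev;          // point the current node backwards
            prev = head;
            head = next;
        }
        return prev;                   // prev is the new head
    }
}
```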

Mathematics is recommended because some friends want to further develop in the direction of AI, and mathematics is the cornerstone of machine learning. Only when you have these underlying foundations can you go further!

2.2 Big data components

Learning the entire big data knowledge system takes a long time, and there are many frameworks. The picture below shows the technology stack I chose for self-study, based on the recruitment requirements of my target companies. There are other frameworks, and you can decide whether to learn them depending on your situation.

Java is the basic tool. I personally learned JavaSE, focusing in depth on collections, multi-threading, and the JVM. I did not spend time on JavaEE. If you have plenty of time, for example as a sophomore or a first-year graduate student, you can study JavaEE in depth before moving on. A small sketch of those focus areas follows.
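
As a small illustration of the JavaSE focus areas above (collections plus multi-threading), here is a minimal sketch assuming nothing beyond the standard library: a fixed thread pool computing tasks whose results are gathered into a collection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class JavaSeWarmup {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 1; i <= 4; i++) {
            final int n = i;
            futures.add(pool.submit(() -> n * n)); // submit a Callable<Integer>
        }
        for (Future<Integer> f : futures) {
            System.out.println(f.get());           // blocks until each task finishes
        }
        pool.shutdown();
    }
}
```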

At present, enterprises basically use Linux systems for production. Mastering the basic principles of Linux is an essential skill in the future.

Hadoop is a distributed system infrastructure that mainly solves the storage of massive data and its analysis and computation. It includes three components: HDFS, MapReduce, and Yarn. Other frameworks are not introduced here.

Big data key technology stack

For how to learn a technical framework, you can refer to my video below! My conclusion is that framework learning should proceed in stages, step by step, rather than all at once. Rushing for quick results means you won't learn the technology deeply enough and, more importantly, you will waste time.

2.3 Project practice

The pain point most non-CS students hit is having no practical projects from school. But when job hunting, you need at least 2-3 projects on your resume, including 1-2 highlight projects. For example: in a certain project, what difficulties did you encounter, what technology did you use to solve them, and what optimizations did you make?

Regarding the project, I will have actual project recommendations later!

3 Recommended study materials

As a non-CS career changer myself, I know how much time good introductory materials can save. So I have reviewed my own self-study process and recommend my learning route and materials to everyone.

I hope this gives some reference to friends who are changing careers. It covers introductory videos and book recommendations for three modules: basic computer knowledge, big data framework learning, and project practice!

Students with no background should first learn the basic syntax of the Java language. You can finish JavaSE in about a month, then use interview questions to find and fill the gaps!

Then build a Linux virtual machine platform to prepare for subsequent big data framework learning.

Because my time was tight (I had to finish the tasks my supervisor assigned while squeezing in study time), I interspersed basic computer knowledge with big data framework study, and crammed common interview questions right before interviews. Below is a link to a Java interview questions blog.

The most complete Java interview summary:
https://blog.csdn.net/thinkwon/category_9731418.html

3.1 Basics

Programming language basics : Java basics are the cornerstone of all subsequent big data learning. I initially learned by reading books but didn't get much out of it. Fortunately, I later found Shangxuetang Gao Qi's 300-episode course. The videos explain every knowledge point comprehensively, with detailed examples. If you have no background, start with the videos. Be sure to type out the code yourself; don't just watch without practicing!

The Three Hundred Episodes of Gao Qi:
https://www.bilibili.com/video/BV1oy4y1H7R6?p=16

For a Java book, I recommend "Thinking in Java" (Chinese title "Java Programming Thoughts"), which has an online Chinese version.

In addition, there is the Scala language: you will later learn frameworks such as Spark and Flink, which are far more flexible to program in Scala, so you should learn Scala's programming idioms. For Scala, I recommend the videos from Shang Silicon Valley.

Introduction to Scala language in Shang Silicon Valley:

https://www.bilibili.com/video/BV1Xh411S7bP?p=50

Note: you don't need to learn Scala at this stage; you can pick it up right before learning Spark!


Data structures and algorithms : I highly recommend Zuo Shen's videos. His content maps closely to corporate interview questions and is easy to understand. What I watched was his video course on Niuke.com, covering both basic and advanced algorithms. Before starting, it is best to understand basic data structures first! You can get the video materials and courseware from the Baidu netdisk link below. After finishing the videos you will have a solid foundation for going after an offer!

Data structure and algorithm video link:

https://pan.baidu.com/s/14bGK2Wva2MbyviIKjkhNNQ

Extraction code: 3ojw

If the netdisk link fails, add me on WeChat: a934614406 with the note [Zuo Shen Algorithm], and I will resend it!


Computer networks : What I watched was a video by a teacher on Bilibili (Station B). The lectures are comprehensive and thorough but not too long: 42 sections in total, averaging about 40 minutes each, so you can finish in about a week. It is especially friendly to non-CS students! You should reserve plenty of time for the technical frameworks that come later. After the videos, search for relevant interview questions to check for gaps and fill them in.

Teacher Fang’s computer network link:

https://www.bilibili.com/video/BV1yE411G7Ma?p=23


Operating systems : operating systems cover a lot of ground, and the material is quite detailed. If you have plenty of time and are not rushing to interview, search Bilibili for the course by Professor Li Zhijun of Harbin Institute of Technology, who explains operating system principles from the perspective of Linux kernel code.

Operating system link:

https://www.bilibili.com/video/BV1d4411v7u7?from=search&seid=15412161143884682127

If you are pressed for time and want to deal with the interview directly, here is a summary of key operating system interview knowledge for you!

Add me on WeChat: a934614406 with the note [operating system], and I will send you detailed operating-system interview notes!


Mathematical foundations : once big data meets artificial intelligence, mathematics becomes indispensable. But there is no end to learning mathematics, and few people are as proficient as math majors or PhD students. So be clear: to get started with AI you only need to master the basics, mainly three courses: advanced mathematics, linear algebra, and probability theory and mathematical statistics. Here are three simple introductory articles:

Advanced Mathematics: https://zhuanlan.zhihu.com/p/36311622

Linear algebra: https://zhuanlan.zhihu.com/p/36584206

Probability theory and mathematical statistics: https://zhuanlan.zhihu.com/p/36584335

Recommended notes: "Mathematical Foundations of Machine Learning" and "Mathematical Foundations of Machine Learning at Stanford University"

Link: https://pan.baidu.com/s/1mEPLOurp57IZL9GNOwx2sw

Extraction code: iihb

If the link is invalid, please add me on WeChat: a934614406, remark [Basics of Mathematics]

3.2 Big data framework

Linux : Whether you are doing back-end or big data, Linux has become a baseline requirement when companies select talent. I highly recommend the Linux introductory video tutorial by Han Shunping of Shang Silicon Valley, a Tsinghua graduate. The course logic is clear and the explanations are thorough.

This is practically the default choice for introductory Linux courses in China, and it is the course that left the deepest impression on me. After watching it, all I could say was: "F*ck, he actually explained it that clearly!"

Shang Silicon Valley Han Shunping Linux link:

https://www.bilibili.com/video/av21303002

You can study it together with the book "This Is How You Should Learn Linux" to deepen your understanding of Linux!


Hadoop (key point): Hadoop is one of the most important frameworks in big data technology and the first lesson in learning big data.

Hadoop has evolved from the 1.x versions to the current 3.x versions. It contains three components: the distributed file system HDFS, the massive-data parallel computing framework MapReduce, and the popular resource management system Yarn.

To learn any framework: first set up the environment and run a test case end to end, then delve into its principles.

HDFS has pseudo-distributed, fully distributed, and high-availability (HA) deployment models. Focus on understanding the HA architecture and the responsibilities of each role.

The HDFS architecture mainly includes the following roles: NameNode (Active, Standby), DataNode, JournalNode, DFSZKFailoverController (ZKFC), and SecondaryNameNode.

Although the SecondaryNameNode sees little use nowadays, it is still necessary to understand its working mechanism.
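
To make the client's-eye view of these roles concrete, here is a minimal sketch using the standard Hadoop FileSystem Java API; the NameNode address and the paths are assumptions for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsQuickstart {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000"); // hypothetical NameNode address
        FileSystem fs = FileSystem.get(conf);
        Path dir = new Path("/user/demo");
        fs.mkdirs(dir);                                    // metadata operation on the NameNode
        fs.copyFromLocalFile(new Path("data.txt"), dir);   // blocks are written to DataNodes
        fs.close();
    }
}
```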

You should also master MapReduce's core idea, its detailed workflow, and the Shuffle mechanism; interviews will ask about them.
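
The canonical example of the MapReduce idea is word count: map emits (word, 1) pairs, the Shuffle phase groups them by key, and reduce sums each group. A minimal sketch with the standard org.apache.hadoop.mapreduce API (driver class omitted):

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

class WcMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();
    @Override
    protected void map(LongWritable key, Text value, Context ctx)
            throws IOException, InterruptedException {
        for (String w : value.toString().split("\\s+")) {
            word.set(w);
            ctx.write(word, ONE); // shuffled and sorted by key before reduce
        }
    }
}

class WcReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) sum += v.get(); // sum the grouped counts
        ctx.write(key, new IntWritable(sum));
    }
}
```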

Yarn, the resource management system, serves not only the MapReduce computing framework but also frameworks such as Spark, so its working mechanism is just as important.

I recommend Shang Silicon Valley's Hadoop tutorials, which are very thorough from principles to production tuning; after that you can go deep into the source code.

Shang Silicon Valley Hadoop link:

https://www.bilibili.com/video/av21303002

You can study it in conjunction with "Hadoop: The Definitive Guide" (4th edition).

If you are interested in the Hadoop source code, refer to "Hadoop Technology Insider" (Dong Xicheng) and "Hadoop 2.x HDFS Source Code Analysis".


ZooKeeper : ZooKeeper is a distributed coordination and management component. Its typical application scenarios include data publish/subscribe, distributed coordination/notification, and cluster management.

You can study it together with the book "From Paxos to ZooKeeper", which explains not only the CAP theorem but also ZooKeeper's core principles very thoroughly. Newcomers can get started with the video below.

Shang Silicon Valley ZooKeeper link:

https://space.bilibili.com/302417610/video?keyword=ZooKeeper

Note: The video is only for beginners. To learn in depth, you need to read books and study official documents.
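
For a first feel of the client API, here is a minimal sketch with the raw ZooKeeper Java client: create a znode and read it back. The connect string is an assumption; in production you would typically use Curator with proper retry handling.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZkQuickstart {
    public static void main(String[] args) throws Exception {
        // 15s session timeout; the watcher just logs connection events
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15000,
                event -> System.out.println("event: " + event.getType()));
        // Ephemeral znode: automatically removed when this session ends
        String path = zk.create("/demo", "hello".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        byte[] data = zk.getData(path, false, null);
        System.out.println(new String(data));
        zk.close();
    }
}
```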


Hive : Hive is an open-source data warehouse tool that maps structured data files onto tables and provides SQL-like queries (HQL), which are executed underneath as MapReduce jobs.

Beginners can get started with Hive through videos. The key points are the difference between internal (managed) and external tables, and partitioning and bucketing; see the sketch below.
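
The internal-vs-external distinction is easiest to see in DDL. Here is a minimal sketch submitting HQL through the standard HiveServer2 JDBC driver; the URL, credentials, table names, and paths are assumptions for illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveTables {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver"); // register the HiveServer2 driver
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hive", "");
        Statement st = conn.createStatement();
        // Managed (internal) table: DROP TABLE also deletes the underlying data.
        st.execute("CREATE TABLE IF NOT EXISTS orders_managed (id BIGINT, amount DOUBLE) "
                + "PARTITIONED BY (dt STRING)");
        // External table: Hive tracks only metadata; DROP leaves the HDFS files alone.
        st.execute("CREATE EXTERNAL TABLE IF NOT EXISTS orders_ext (id BIGINT, amount DOUBLE) "
                + "PARTITIONED BY (dt STRING) LOCATION '/warehouse/orders_ext'");
        conn.close();
    }
}
```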

If you want to dig into its internals and tuning, read "Programming Hive" and the Apache official documentation, which cover enterprise-level tuning in detail.

Shang Silicon Valley Hive link:

https://www.bilibili.com/video/BV1EZ4y1G7iL


HBase : HBase is a distributed, scalable NoSQL storage system for structured data that supports massive data volumes. It is a basic framework every big data practitioner should master. Focus on its architectural principles, the responsibilities of each role, the Compaction process, and Region splitting. Below is an introductory HBase video tutorial.

Shang Silicon Valley HBase link:

https://www.bilibili.com/video/BV1Y4411B7jy

Note: You can combine "HBase: The Definitive Guide" and "HBase in Action" (Chinese edition) to deepen your understanding of HBase.
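
To connect the architecture to code, here is a minimal sketch with the standard HBase Java client: one Put and one Get. The table and column-family names are assumptions, and the table is assumed to already exist.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseQuickstart {
    public static void main(String[] args) throws Exception {
        Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
        Table table = conn.getTable(TableName.valueOf("user")); // hypothetical table
        Put put = new Put(Bytes.toBytes("row1"));
        put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("xiaolin"));
        table.put(put);                                   // routed to the Region owning "row1"
        Result r = table.get(new Get(Bytes.toBytes("row1")));
        System.out.println(Bytes.toString(
                r.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
        table.close();
        conn.close();
    }
}
```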


Redis (emphasis!): Redis is an open-source key-value storage system. It supports a richer set of value types than most stores, along with various sorting operations, and for efficiency it keeps data cached in memory.
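
The variety of value types is easiest to see from a client. A minimal sketch using the widely used Jedis Java client, assuming a local Redis instance:

```java
import redis.clients.jedis.Jedis;

public class RedisQuickstart {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.set("greeting", "hello");               // plain string
            jedis.lpush("recent:logins", "alice", "bob"); // list
            jedis.hset("user:1", "name", "xiaolin");      // hash
            jedis.zadd("leaderboard", 42, "alice");       // sorted set (score-ordered)
            jedis.expire("greeting", 60);                 // TTL in seconds
            System.out.println(jedis.get("greeting"));
        }
    }
}
```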

Whether it is backend or big data, this component is a must-know framework. When I learn a new technology, I first get started through videos, and then read relevant books and official documents to understand the technical details in depth.

For Redis, I recommend the course taught by Teacher Zhou Yang of Shang Silicon Valley. That course is a bit old, so it may not cover many of the newer features; I have therefore posted two introductory Redis links:

Link to Teacher Zhou Yang from Shang Silicon Valley Redis:

https://www.bilibili.com/video/BV1oW411u75R

The latest 2021 Redis entry-to-proficiency link:

https://www.bilibili.com/video/BV1Rv41177Af?p=4

Recommended books: "Redis Design and Implementation" and "Redis Deep Adventure: Core Principles and Application Practice"


Kafka (emphasis!): As a high-throughput distributed publish-subscribe messaging system, Kafka can handle all the action-stream data of a consumer-scale website.

Here is a suggestion: first understand what problem Kafka is created to solve, then understand its basic architecture, and finally have a deep understanding of the core implementation principles.
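
To see that architecture from the producer side, here is a minimal sketch with the standard Kafka Java producer; the broker address and topic name are assumptions.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaQuickstart {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The record is appended to a partition of the "events" topic;
            // each consumer group independently reads the full stream.
            producer.send(new ProducerRecord<>("events", "user1", "page_view"));
        }
    }
}
```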

Here is the link to the Kafka getting started video:

Shang Silicon Valley Kafka getting started link:

https://www.bilibili.com/video/BV1a4411B7V9

Recommended books: first, "In-depth Understanding of Kafka: Core Design and Practical Principles". If you want to dig into the Kafka source code, read along with "Apache Kafka Source Code Analysis"; it will give you an epiphany!


Spark (emphasis! emphasis! emphasis!): Spark supports Streaming, SQL, GraphX, MLlib, and other applications, and compared with Hadoop's MapReduce it is roughly 10 to 100 times faster.

In addition, if a node fails during computation, the cost of recomputing the lost work is much lower than in MapReduce. Spark SQL can process structured data.

Spark Streaming is mainly used in real-time stream-processing scenarios and supports multiple data sources; DStream is Spark Streaming's basic abstraction.

Spark MLlib provides a library of common machine learning algorithms, and GraphX is mainly used for graph computation. Below is the Spark introductory link I selected for everyone. The video is based on Scala 2.12 and gives a detailed introduction to the latest Spark 3.0; it is a good resource for beginners. A small sketch with Spark's Java API follows.
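
While the courses above teach Spark in Scala, the same classic word count looks like this through Spark's Java API; a minimal sketch, with the input path as an assumption.

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class SparkWordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("wc").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> lines = sc.textFile("hdfs:///input/data.txt"); // hypothetical path
            lines.flatMap(l -> Arrays.asList(l.split("\\s+")).iterator())  // line -> words
                 .mapToPair(w -> new Tuple2<>(w, 1))                       // (word, 1)
                 .reduceByKey(Integer::sum)                                // sum per word
                 .collect()
                 .forEach(t -> System.out.println(t._1() + " -> " + t._2()));
        }
    }
}
```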

2021 Spark from beginner to mastery link:

https://www.bilibili.com/video/BV11A411L7CK

Note: Before learning Spark, you must first learn the Scala language. In Programming Language Basics, detailed learning recommendations for Scala have been given!

Recommended books: "learning Spark", "In-depth understanding of Spark core ideas and source code analysis"


Flink (emphasis! emphasis! emphasis!): Flink is a distributed processing engine for stateful computation over unbounded and bounded data streams. Flink is fast and flexible, produces accurate results, and has good fault tolerance, so it is widely used in streaming-data scenarios across industries.

At present, domestic companies led by Alibaba, Tencent, JD.com, Didi, Ctrip, and Meituan all use the Flink framework. Flink occupies a very important position in big data stream computing, and every big data engineer should master it. A small sketch follows.
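
To show what "stateful computation over an unbounded stream" means in practice, here is a minimal word-count sketch with Flink's DataStream API; the socket host/port are assumptions (for example fed by `nc -lk 9999`).

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class FlinkWordCount {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.socketTextStream("localhost", 9999)            // unbounded text source
           .flatMap((String line, Collector<Tuple2<String, Integer>> out) -> {
               for (String w : line.split("\\s+")) out.collect(Tuple2.of(w, 1));
           })
           .returns(Types.TUPLE(Types.STRING, Types.INT))  // lambdas lose generic type info
           .keyBy(value -> value.f0)                       // partition by word (keyed state)
           .sum(1)                                         // running count per word
           .print();
        env.execute("wordcount");
    }
}
```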

For Flink, I recommend the class taught by Teacher Wu of Shang Silicon Valley. Teacher Wu, a Tsinghua graduate, analyzes the technical points very thoroughly. The course has two modules: Flink theoretical foundations and a hands-on e-commerce user-behavior-analysis project built on Flink.

Shang Silicon Valley Flink link:

https://www.bilibili.com/video/BV1Qp4y1Y7YN

Recommended book: "Flink Principles, Practical Combat and Performance Optimization"

I haven't studied the data mining and machine learning part yet. After I finish studying it later, I will sort out this part of the content for everyone's reference.

3.3 Project

Projects are the weakest link for non-CS students in interviews. At school it is very hard to work on a real production project, because you basically have no access to one.

Therefore, I suggest you plan internships in advance and gain project experience through them. I started teaching myself programming in the first semester of my second year of graduate school; I had learned a little C++ as an undergraduate, so I had a small foundation.

At that time, I was helping my tutor with projects related to my major while learning big data technology. The picture below is some of the notes I took while studying on my own.

study notes

If you are currently a sophomore or a first-year graduate student, plan internships in advance and proactively learn about real projects at the internship company. But if you are about to look for a job and have not finished every technology stack, first go through the basic technical frameworks, then study the projects I recommend below.


Shang Silicon Valley big data e-commerce warehouse project link:

https://www.bilibili.com/video/BV1Hp4y1z7aZ

Technology selection: Hadoop+ZooKeeper+Hive+Flume+Sqoop+Kafka+Azkaban+Kylin+Spark

This project mainly explains the architecture of a data warehouse and walks through the full closed loop of a warehouse project, from data collection to warehouse modeling to warehouse applications. The project also touches some other technologies, which you can learn along the way.

During the interview, first explain clearly the project architecture, the reasons for the technology choices, and whether there were alternatives; next, explain what problems you encountered in the project and how you solved them; finally, be able to clearly explain the code logic of the parts you were responsible for.

Although e-commerce data warehouse projects are relatively common, they can be used as basic projects if there is no project.


Shang Silicon Valley big data real-time processing (Spark Streaming) project link:

https://www.bilibili.com/video/BV1tp4y1B7qd?spm_id_from=333.788.b_636f6d6d656e74.27

This project uses Spark Streaming to analyze and compute, in real time, the user behavior and order business of an e-commerce platform across different metrics and dimensions. It mainly covers data generation, data transmission, data computation, and final data visualization.

Through it you can master Spark Streaming's real-time computing workflow, as well as a big data collection framework, a high-concurrency distributed message queue, memory-based high-throughput real-time computing, and a massive-scale database supporting millisecond-level queries.


Flink real-time project: this is my own private project. Add me on WeChat and I will send you the Flink project materials.

Disclaimer: most of the books and learning materials mentioned above are Xiaolin's personal recommendations and contain no advertising!

4 Interviews

Job hunting is a huge project for everyone. I still remember the uneasiness of my first interview. I started preparing for the autumn recruitment in the second semester of my second year of graduate school; at the time I had not yet returned to school because of the epidemic.

If you are an intern who cannot convert to a full-time employee, you can prepare for each company's early-batch recruitment around July, but check whether applying in the early batch affects your regular autumn-recruitment application, because the early batch is basically a battle of the gods. I joined just to gain interview experience.

For most people, the most important thing is the autumn recruitment, or the spring recruitment at the beginning of the year. I will share my experience with you from the two aspects of obtaining recruitment information and interview experience.

4.1 How to obtain referral qualifications from each company?
  1. It is recommended that everyone pay attention to the two public accounts of Internal Recommendation Army and School Recruitment Bus, which means joining an internal recommendation group created by the account owner. The account owner will update the internal recommendation codes of various major manufacturers every day.
  2. If you want to go to ByteDance, you can follow the official account of Internal Referral Bear. The account owner is a Byte algorithm engineer and has recommended nearly 1,000 people to ByteDance. It is very reliable.
  3. Niuke.com: employees from each company post referral codes directly on Niuke.com. Generally, you are asked to email your resume to the department leader. Be sure to read the format requirements carefully before sending, otherwise no one will reply to you.
  4. There is a website called Super Resume, which integrates the campus recruitment portals of major companies. The address is: https://www.wondercv.com/jobs/.
  5. If you have seniors or friends you know who work in a certain company, you can ask them about some autumn recruitment information and ask them to help you make referrals.
  6. Follow the WeChat public account of your target company. They publish the year's recruitment schedule; according to the schedule, attend the presentation at the university closest to your city.
  7. For BOSS direct recruitment, you can also submit your resume to many companies!

During the autumn recruitment, Xiaolin mainly submitted resumes through the methods above, but you should also pay attention to the following points:

  • Do not apply for multiple positions at the same company, otherwise HR will not know which position suits you.
  • You can invest in some small companies first and accumulate some interview experience before investing in your target company, but don't wait until too late.
  • Resume submission time: Tuesday to Thursday, 8:00-17:00. HR usually meets on Monday to make weekly plans, and Friday is usually the weekly summary meeting, so on those days HR has little time to check the mailbox.
  • It is best to tailor your resume to the specific requirements of each position and your own abilities. Highlight both your soft skills (communication, management, conflict resolution, etc.) and your technical abilities.
4.2 Interview experience

I applied to more than 100 companies over the autumn recruitment and saw all kinds of interview questions. I strongly recommend writing a summary as soon as possible after each interview, to shore up your technical weak spots.

Through continuous summarization, you will understand that the technical interview questions of each company are not much different, especially for fresh graduates, you are required to have a solid basic knowledge of computers.

Of course, there is one more crucial part: the self-introduction. Write it in advance based on your own situation, and do not simply read out what is already on your resume.

Focus on your own experiences and the things that prove your abilities. Keep the language concise and highlight the technical areas you are best at.

For example: The following is my self-introduction during the autumn recruitment interview

Hello, interviewer! My name is XXX. First of all, thank you for taking the time out of your busy schedule to give me an interview!

During my postgraduate period, in addition to completing my academic tasks, I mainly used my extracurricular time to teach myself basic computer knowledge (data structures and algorithms, computer network basics), JavaSE (such as collections, multi-threading, JVM), Hadoop, and Spark. I once participated in the research and development of the XXXX project, and was mainly responsible for the two modules of XXXXX design and XXXXX analysis. In addition, after studying, I prefer to share what I have learned through various platforms such as blogs and Zhihu. In life, I am an optimistic and cheerful person. I release stress through photography and basketball. I particularly like your company’s XXX culture (you should take the initiative to learn about it in advance) and look forward to working with you!

During the interview, you generally need to pay attention to the following points:

  • If you encounter an algorithm question that you don’t know, you should proactively communicate with the interviewer to find ideas for solving the problem.
  • Face problems honestly and express yourself sincerely (never pretend to know a technology you have barely touched; the interviewer will see through it at a glance)
  • You may not beat candidates with a matching major on paper, so non-CS students must project confidence in interviews

5 Summary

Most of the learning routes and learning materials shared with you above have been learned by me personally. For new technologies, I basically start with videos, and then use books and Google to check for gaps and go deep into the technical principles.

If you run into questions, I recommend searching Google and StackOverflow for answers. Also, during the learning process, remember to share what you learn on a blog or Zhihu; without output, the value of your input is greatly diminished!

Looking back on my three years in graduate school, I helped my supervisor with projects while studying on my own. That time was fulfilling but stressful: you carry the pressure of your supervisor's project tasks while preparing for the job hunt, which is not easy. Finally, I hope every friend lands a satisfying offer as soon as possible.
