How can a beginner get into big data, and what learning route should they follow?



Big data is not a single profession or programming language. It is the combined use of a whole range of technologies.

One way people have summed up big data is with the following equation:

Big Data = programming skills + data structures and algorithms + analytical ability + database skills + mathematics + machine learning + NLP + operating systems + cryptography + parallel programming

The equation is long and there is a lot to learn, but the reward is proportional to the effort you put in. At the very least, the salary is.

Because there is so much to learn, studying the topics in the right order is crucial.

I have put together a learning path for big data to help you avoid detours. It is divided into seven stages: getting-started basics → Java → Scala basics → Hadoop technology modules → Hadoop hands-on projects → Spark → big data hands-on projects.

Stages one through five consist of free courses. In detail:



Stage One: Getting Started

This stage is aimed mainly at beginners. Before going further, you need to master the basics of databases. MySQL is a DBMS (database management system) and one of the most popular relational database management systems. A relational database is built on the relational model and uses a set of algebraic concepts and methods to process the data it holds. MongoDB is a very popular non-relational (NoSQL) database in the IT industry, and its flexible data storage is much favored by today's practitioners.

Redis is an open-source, networked, in-memory key-value database. All of these databases are important to understand.
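Here is a minimal sketch that makes the relational model concrete. It uses Python's built-in sqlite3 module as a stand-in for MySQL. That substitution is an assumption, made only so the example runs without a database server. The table names, column names, and data are hypothetical. The join and group-by are the kind of algebraic operations relational databases are built on. The final dictionary only imitates the look-up-by-key access pattern of a key-value store like Redis; it is not the Redis API.

```python
import sqlite3


def top_spenders():
    # In-memory SQLite database standing in for MySQL (assumption: same SQL concepts)
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Two relations (tables) that share the key user_id (hypothetical schema)
    cur.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
    cur.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)")
    cur.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob"), (3, "carol")])
    cur.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(10, 1, 30.0), (11, 1, 20.0), (12, 2, 15.0), (13, 3, 5.0)],
    )
    # Join + aggregation, expressed declaratively in SQL
    rows = cur.execute(
        """
        SELECT u.name, SUM(o.amount) AS total
        FROM users u JOIN orders o ON u.user_id = o.user_id
        GROUP BY u.name
        ORDER BY total DESC
        """
    ).fetchall()
    conn.close()
    return rows


def as_key_value(rows):
    # A plain dict imitating key-value access (look up by key only), not the Redis API
    return {name: total for name, total in rows}


if __name__ == "__main__":
    rows = top_spenders()
    print(rows)
    print(as_key_value(rows)["bob"])
```

The relational query can combine and aggregate tables in one statement. The key-value view can only answer "what is stored under this key?", and it answers that very quickly, which is the trade-off Redis is designed around.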

1. Linux Basics (new edition)

2. Vim Editor

3. Git Hands-on Tutorial

4. MySQL Fundamentals Course

5. MongoDB Basics Tutorial

6. Redis Basics Tutorial

 

Stage Two: Java Fundamentals

Java is one of the most widely used programming languages today. It has many features that make it particularly well suited for developing big data applications.

Java combines two strengths: it is both powerful and easy to use. It runs cross-platform, and it is easier to use and to get started with than C or C++. It is also simple, object-oriented, distributed, robust, secure, platform-independent and portable, multithreaded, and dynamic. Most important of all, Hadoop is written in Java.

1. The Java Programming Language (new edition)

2. Advanced Java: Design Patterns

3. J2SE Core Development in Practice

4. JDK Core APIs

5. JDBC Getting Started Tutorial

6. Java 8 New Features Guide

 

Stage Three: Scala Fundamentals

Scala is a multi-paradigm programming language originally designed to integrate features of object-oriented and functional programming. Scala runs on the Java platform (the Java Virtual Machine) and is compatible with existing Java programs. As a result, it integrates well with JVM-based big data systems.

1. Scala Development Tutorial

2. Scala Topics Tutorial: Case Classes and Pattern Matching

3. Scala Topics Tutorial: Implicit Conversions and Implicit Parameters

4. Scala Topics Tutorial: Abstract Members

5. Scala Topics Tutorial: Extractors

6. Scala Game Development 24.2

Stage Four: Hadoop Technology Modules

Hadoop is an open-source software framework, released under the Apache 2.0 license, that supports data-intensive distributed applications. It can be used to build large data warehouses and to store, process, analyze, and compute statistics over petabyte-scale data. You may have a choice of programming language, but Hadoop is something every big data learner must study.
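Hadoop's classic processing model is MapReduce. The sketch below is a single-process Python illustration of its three steps (map, shuffle, reduce), applied to the standard word-count example. It is not Hadoop's API. Real Hadoop jobs are usually written against the Java MapReduce API and run distributed across a cluster, with HDFS holding the input. The function names and sample data here are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in one input line
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Shuffle: group all emitted values by key, as the framework does between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Reducer: combine the list of values for each key (here, by summing the counts)
    return {key: sum(values) for key, values in sorted(groups.items())}


def word_count(lines):
    # In Hadoop, many mappers and reducers would run in parallel on different machines
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    data = ["big data with Hadoop", "Hadoop stores big files"]
    print(word_count(data))
```

Each mapper only sees its own input lines, and each reducer only sees the values for its own keys. That independence is what lets Hadoop spread the work across many machines.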

 

1. Hadoop: From Beginner to Advanced

2. Hadoop Deployment and Management

3. HBase Tutorial

4. Hadoop Distributed File System: Importing and Exporting Data

5. Data Collection with Flume

 

Stage Five: Hadoop Hands-on Projects

Once you have the theory down, it is time for hands-on practice. Hadoop projects help deepen your understanding of the material and build your practical skills.

 

1. Hadoop Graph Processing: "Hadoop Application Framework"

 

Stage Six: Spark Technology Modules

Hadoop and Spark are both big data frameworks. Hadoop offers features that Spark does not, such as a distributed file system. Spark, in turn, provides fast in-memory processing for the data sets that need it. So learning Spark is just as necessary.
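In-memory processing pays off most for iterative jobs that pass over the same data many times. The toy sketch below is plain Python, not the Spark API. A hypothetical `Dataset` class rereads a file on every pass unless told to keep the parsed data in memory. Keeping the data in memory loosely mirrors calling `cache()` on a Spark RDD or DataFrame. Rereading on every pass roughly corresponds to chaining disk-based MapReduce jobs. The iterative update is an arbitrary example chosen only to force repeated passes.

```python
import os
import tempfile


class Dataset:
    """Hypothetical file-backed dataset that counts how often it is read from disk."""

    def __init__(self, path):
        self.path = path
        self.disk_reads = 0
        self._cache = None

    def load(self, use_cache):
        # With use_cache=True the parsed values stay in memory after the first read;
        # otherwise every call goes back to the file on disk.
        if use_cache and self._cache is not None:
            return self._cache
        self.disk_reads += 1
        with open(self.path) as f:
            values = [float(x) for x in f.read().split()]
        if use_cache:
            self._cache = values
        return values


def iterate(dataset, iterations, use_cache):
    # Toy iterative algorithm: move an estimate halfway toward the data mean on each pass
    estimate = 0.0
    for _ in range(iterations):
        values = dataset.load(use_cache)
        mean = sum(values) / len(values)
        estimate = (estimate + mean) / 2
    return estimate


def demo(iterations=5):
    # Write a small sample file, run the same job with and without caching, then clean up
    fd, path = tempfile.mkstemp(text=True)
    with os.fdopen(fd, "w") as f:
        f.write("1 2 3 4 5 6 7 8 9 10")
    try:
        uncached, cached = Dataset(path), Dataset(path)
        a = iterate(uncached, iterations, use_cache=False)
        b = iterate(cached, iterations, use_cache=True)
        return a, b, uncached.disk_reads, cached.disk_reads
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Both runs compute the same result. The uncached run reads the file once per iteration, while the cached run reads it only once. On real clusters, where disk and network I/O dominate, that difference is much of why Spark is faster for iterative workloads such as machine learning.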

1. Spark 2.x Quick Start Tutorial

2. Spark Big Data Hands-on Lab

3. Spark Basics: Learning the GraphX Graph Computation Framework

4. Spark Basics: Basic Concepts of DataFrames

5. Spark Basics: Advanced DataFrame Techniques

6. Spark Basics: Spark Streaming Quick Start

7. Spark Basics: Spark SQL Quick Start

8. Spark Basics: Using the MLlib Machine Learning Library

9. Spark Basics: SparkR Quick Start

10. Streaming Real-time Log Analysis System ("Spark Best Practices")

11. Flight Data Analysis with Spark and D3.js

Stage Seven: Big Data Hands-on Projects

The last stage provides real big data projects that apply commonly used skills such as machine learning, modeling, and analysis. Working through them is an important step toward becoming a big data engineer.

1. eBay Online Auction Data Analysis

2. Streaming Real-time Log Analysis System ("Spark Best Practices")

3. Big Data Reveals the Secrets of Hailing a Taxi

4. Twitter Data Sentiment Analysis

5. Spark Traffic Log Analysis

6. Computing E-commerce Product Popularity with Spark Streaming

7. Spark Frequent Pattern Mining Algorithm: FPGrowth

 


Big data is where the future is heading. It challenges both our analytical skills and the way we understand the world. So keep up with the times, embrace change, and keep growing: mastering the core technologies of big data is where the real value lies.
 

I hope this helps, and I wish you every success in becoming a good big data engineer.
----------------
Disclaimer: This is an original article by CSDN blogger "Tekken tiger", licensed under the CC 4.0 BY-SA copyright agreement. If you reproduce it, please include the original source link and this statement.
Original link: https://blog.csdn.net/juan189/article/details/84321549


Origin www.cnblogs.com/baijindashuju666/p/11654746.html