Big Data Learning Roadmap: What You Need to Learn

 


 

Big data development learning path:

Stage 1: Hadoop ecosystem technologies

1. Basic languages

Java: understand it and practice it. Deep knowledge of JVM memory management, multithreading, thread pools, design patterns, and parallelization is not required.

Linux: installation, basic commands, network configuration, the Vim editor, process management, shell scripts, familiarity with the virtual machine menus, and so on.

Python: basic syntax, data structures, functions, conditionals, and loops.

2. Prepare the environment

This step covers building a fully distributed cluster on a Windows computer, with a master node and two worker nodes.

Prepare VMware virtual machines, a Linux system (CentOS 6.5), and the Hadoop installation package, then use them to set up a fully distributed Hadoop cluster environment.

3. MapReduce

MapReduce is an offline distributed computing framework and the core programming model of Hadoop.
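To make the programming model concrete, here is a minimal sketch of the map, shuffle, and reduce phases as a word count in plain Python. It runs in a single process and does not use Hadoop; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this illustration.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    # In Hadoop the map and reduce calls run on many machines;
    # here they run one after another in a single process.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data big ideas", "data pipelines"])
print(counts)
```

In a real job only the map and reduce functions are written by hand; the framework takes care of the shuffle and of distributing the work.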

4. HDFS 1.0 / 2.0

HDFS provides high-throughput data access for applications that work with large data sets.

5. YARN (Hadoop 2.0)

YARN is a resource management platform responsible for allocating resources to tasks.

6. Hive

Hive is a data warehouse whose data is all stored on HDFS. Working with Hive mainly means writing HQL.

7. Spark

Spark is a fast, general-purpose computing engine designed for large-scale data processing.

8. Spark Streaming

Spark Streaming is a real-time processing framework that processes data one batch at a time.
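The batch-at-a-time idea can be sketched without Spark. The plain-Python example below groups timestamped events into fixed-length intervals, which is roughly what a micro-batch engine does before handing each batch to the processing logic. The function micro_batches and the batch_interval parameter are hypothetical names for this illustration, not part of the Spark API.

```python
from collections import defaultdict

def micro_batches(events, batch_interval):
    """Group (timestamp, value) events into consecutive fixed-length batches.

    Intervals that received no events are skipped in this sketch.
    """
    batches = defaultdict(list)
    for timestamp, value in events:
        batch_index = int(timestamp // batch_interval)
        batches[batch_index].append(value)
    return [batches[i] for i in sorted(batches)]

# Timestamps are in seconds; each batch covers one second.
events = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (2.5, "d")]
batches = micro_batches(events, batch_interval=1.0)

for batch in batches:
    print(len(batch), batch)
```

A shorter interval lowers latency but produces more, smaller batches.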

9. Spark and Hive

With Spark as Hive's execution engine, Hive queries are submitted as Spark jobs and computed on the Spark cluster, which can improve Hive query performance.

10. Storm

Storm is a real-time computing framework. It processes each new record as it arrives, one at a time, which keeps data processing timely.
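Record-at-a-time processing can be imitated with Python generators. In the sketch below each record flows through the whole pipeline before the next one is read. The names borrow Storm's spout (source) and bolt (processing step) terminology, but this is plain Python written for illustration, not the Storm API.

```python
def sentence_spout(sentences):
    """Source: emit one sentence at a time."""
    for sentence in sentences:
        yield sentence

def split_bolt(stream):
    """Processing step: split each incoming sentence into words."""
    for sentence in stream:
        for word in sentence.split():
            yield word

def count_bolt(stream):
    """Processing step: emit the updated running count after every word."""
    counts = {}
    for word in stream:
        counts[word] = counts.get(word, 0) + 1
        yield word, counts[word]

pipeline = count_bolt(split_bolt(sentence_spout(["a b", "a"])))
results = list(pipeline)
print(results)
```

Because generators are lazy, a result is available as soon as a single record has passed through, with no waiting for a batch to fill.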

11. ZooKeeper

ZooKeeper is the foundation of many big data frameworks and serves as the cluster manager.

12. HBase

HBase is a NoSQL database: a highly reliable, column-oriented, scalable, distributed database.
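As a rough mental model of the column-oriented layout, a table can be pictured as a map from row key to a sparse map of "family:qualifier" columns. The sketch below is a plain-Python toy under that assumption. The put and get helpers are hypothetical, and real HBase also versions each cell by timestamp, which is omitted here.

```python
from collections import defaultdict

# row key -> {"family:qualifier": value}; rows need not share columns
table = defaultdict(dict)

def put(row_key, column, value):
    """Store one cell; only the columns actually written take up space."""
    table[row_key][column] = value

def get(row_key, column, default=None):
    """Read one cell, returning default if the row or column is absent."""
    return table.get(row_key, {}).get(column, default)

put("user1", "info:name", "Ada")
put("user1", "info:city", "London")
put("user2", "info:name", "Lin")

print(get("user1", "info:city"))
print(get("user2", "info:city"))
```

The second lookup finds nothing because user2 never wrote that column. A sparse layout does not store empty cells.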

13. Kafka

Kafka is message middleware that serves as an intermediate buffer layer.
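The buffering role can be pictured as an append-only log that producers write to and consumers read from at their own pace, each keeping track of its own offset. The MessageLog class below is a hypothetical in-memory stand-in written for this illustration. It is not the Kafka client API.

```python
class MessageLog:
    """A toy append-only log that decouples producers from consumers."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Producer side: add a message and return its offset."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset, max_messages):
        """Consumer side: read up to max_messages starting at offset."""
        return self._messages[offset:offset + max_messages]

log = MessageLog()
for i in range(5):
    log.append(f"event-{i}")

# The consumer pulls in small chunks and advances its own offset.
offset = 0
consumed = []
while True:
    batch = log.read(offset, max_messages=2)
    if not batch:
        break
    consumed.extend(batch)
    offset += len(batch)

print(offset, consumed)
```

Because reading does not delete anything, a slow consumer simply falls behind and catches up later, which is what lets the log absorb bursts.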

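14. Flume

Flume is most commonly used to collect data from the log files that applications produce. There are generally two flows.

In one, Flume collects the data and stores it in Kafka, so that Storm or Spark Streaming can process it in real time.

In the other, Flume stores the collected data on HDFS for later offline processing with Hadoop or Spark.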
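The two collection flows can be sketched as one collector function pointed at two different sinks. Here plain Python lists stand in for Kafka and HDFS. The collect function and the variable names are hypothetical, chosen for this illustration only.

```python
def collect(log_lines, sink):
    """Read raw log lines, drop empty ones, and deliver the rest to a sink."""
    delivered = 0
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        sink.append(line)
        delivered += 1
    return delivered

raw_logs = ["GET /index\n", "\n", "POST /login\n"]

realtime_buffer = []   # stands in for Kafka (real-time flow)
offline_store = []     # stands in for HDFS (offline flow)

sent_realtime = collect(raw_logs, realtime_buffer)
sent_offline = collect(raw_logs, offline_store)

print(sent_realtime, realtime_buffer)
print(sent_offline, offline_store)
```

The same collected lines end up in both places; what differs is who reads them afterwards, a streaming job in the first case and a batch job in the second.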

Stage 2: Data mining algorithms

1. Chinese word segmentation

Offline and online use of open-source word segmentation libraries

2. Natural language processing

Text relevance algorithms

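3. Recommendation algorithms

Content-based (CB) and collaborative filtering (CF) approaches, normalization, and Mahout applications.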

4. Classification algorithms

Naive Bayes (NB), SVM

5. Regression algorithms

LR, decision trees

6. Clustering algorithms

Hierarchical clustering, K-means

7. Neural networks and deep learning

NN, TensorFlow

The above is a detailed roadmap for learning Hadoop development.

Which technologies do you need to master to learn big data development?

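(1) Java language fundamentals

Introduction to Java development, familiarity with the Eclipse development tool, Java language basics, Java flow control, Java strings, Java arrays, classes and objects, number-handling classes and core techniques, I/O and reflection, multithreading, Swing programs, and collection classes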

(2) HTML, CSS, and JavaScript

PC website layout, HTML5 + CSS3 basics, WebApp page layout, native JavaScript interaction development, Ajax asynchronous interaction, jQuery usage

(3) JavaWeb and databases

Databases, JavaWeb development core, JavaWeb development internals

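Linux & Hadoop ecosystem

Linux system, Hadoop offline computing outline, the distributed database HBase, the data warehouse Hive, the data migration tool Sqoop, the Flume distributed log framework

Distributed computing frameworks and the Spark & Storm ecosystem

(1) Distributed computing frameworks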

Python programming language, Scala programming language, Spark big data processing, Spark Streaming big data processing, Spark MLlib machine learning, Spark GraphX graph computation, hands-on project 1: a Spark-based recommendation system (a real project from a top-tier company), hands-on project 2: Sina (www.sina.com.cn)

(2) Storm technical architecture

Storm principles and fundamentals, the Kafka message queue, Redis tools, ZooKeeper, hands-on big data project: data acquisition, data processing, data analysis, data presentation, data applications

Big data analysis - AI (artificial intelligence)

Data analysis: work environment preparation & data analysis fundamentals, data visualization, Python machine learning

Python machine learning 2, image recognition & neural networks, natural language processing & social network processing, hands-on project: outdoor equipment recognition analysis

Origin blog.csdn.net/mnbvxiaoxin/article/details/104261344