Learning Big Data: the prerequisites and a learning route

Learning the Basics of Big Data

Big Data Basics

Why learn big data?

1. Purpose: to get a good job (better pay)

2. Comparison: Java development versus big data development

[Figure: comparison of Java development and big data development]

What is big data?

For example:

1. Product recommendation. Questions:

(1) How can a large number of orders be stored?

(2) How can computations be run over a large number of orders?

2. Weather forecasting. Questions:

(1) How can a large amount of weather data be stored?

(2) How can computations be run over a large amount of weather data?

What is the essence of big data?

(1) Data storage: a distributed file system (distributed storage)

(2) Data computation: distributed computing
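The two ideas above can be illustrated with a small sketch in plain Python. It is only a toy model, not how any real distributed file system is implemented: the block size, the replication factor, the node names and the order amounts are all made-up assumptions. The data set is cut into blocks, each block is copied to more than one node, every node computes a partial result on the blocks it holds, and the partial results are then combined.

```python
from collections import defaultdict

BLOCK_SIZE = 4                          # records per block (hypothetical, tiny on purpose)
REPLICATION = 2                         # copies kept of each block (hypothetical)
NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names


def split_into_blocks(records, block_size=BLOCK_SIZE):
    """Distributed storage, step 1: cut the data set into fixed-size blocks."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def place_blocks(blocks, nodes=NODES, replication=REPLICATION):
    """Distributed storage, step 2: store each block on `replication` nodes."""
    placement = defaultdict(list)  # node name -> list of (block id, block)
    for block_id, block in enumerate(blocks):
        for r in range(replication):
            node = nodes[(block_id + r) % len(nodes)]  # simple round-robin placement
            placement[node].append((block_id, block))
    return placement


def distributed_sum(placement):
    """Distributed computing: per-node partial sums, then one combined total."""
    processed = set()   # block ids already summed, so replicas are not counted twice
    partial_sums = {}
    for node, stored in placement.items():
        local = 0
        for block_id, block in stored:
            if block_id not in processed:
                processed.add(block_id)
                local += sum(block)
        partial_sums[node] = local
    return partial_sums, sum(partial_sums.values())


order_amounts = list(range(1, 11))      # ten made-up order amounts: 1..10
blocks = split_into_blocks(order_amounts)
placement = place_blocks(blocks)
partial_sums, total = distributed_sum(placement)
print(len(blocks), total)
```

Because every block lives on two nodes in this sketch, losing one node would not lose any data, which is the main reason a distributed file system keeps replicas.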

What is the relationship between Java and big data?

1. Hadoop: developed in Java

2. Spark: developed in Scala, and Scala runs on the Java Virtual Machine (JVM) and interoperates with Java

Prerequisites and learning route for big data

1. Prerequisites for learning big data:

Java basics (Java SE) -> classes, inheritance, I/O, reflection, generics *****

Linux basics (Linux operation) -> creating files and directories, the vi editor ***

2. Learning route:

(1) Java basics and Linux basics

(2) Learning Hadoop: architecture, principles, programming

(*) Stage 1: HDFS, MapReduce, HBase (a NoSQL database)

(*) Stage 2: data analysis engines -> Hive, Pig

Data collection engines -> Sqoop, Flume

(*) Stage 3: HUE: a web-based management tool

ZooKeeper: used to implement HA (high availability) for Hadoop

Oozie: a workflow engine
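To give a feel for the MapReduce programming model from the first Hadoop stage, here is a minimal single-process sketch in plain Python. It does not use the Hadoop API, and the function names and the two input lines are assumptions chosen for illustration. It only shows the three conceptual steps: map each record to key/value pairs, shuffle the pairs so that equal keys end up together, and reduce each group to one result.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


lines = ["big data needs storage", "big data needs computing"]  # made-up input
counts = word_count(lines)
print(counts)
```

On a real cluster the map calls would run on the nodes that hold the input blocks and the reduce calls would run in parallel per key group; the sketch keeps everything in one process so the data flow is easy to follow.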

(3) Learning Spark

(*) Stage 1: the Scala programming language

(*) Stage 2: Spark Core -----> in-memory data computation

(*) Stage 3: Spark SQL -----> similar to SQL statements in Oracle

(*) Stage 4: Spark Streaming -----> real-time (stream) computation, for example monitoring a waterworks

(4) Apache Storm: similar to Spark Streaming -> real-time (stream) computation, for example monitoring a waterworks

(*) NoSQL: Redis, an in-memory database
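The real-time (stream) computation mentioned for Spark Streaming and Storm, with the waterworks as the example, can be sketched in plain Python as follows. This is a conceptual sketch only and uses neither framework: the sensor readings, the batch size and the alert threshold are invented values. Readings arrive one at a time, are cut into small batches, and each batch is evaluated as soon as it is complete instead of waiting for all the data.

```python
from itertools import islice


def sensor_stream():
    """Hypothetical water-quality readings arriving one at a time."""
    for reading in [0.8, 0.9, 1.1, 1.0, 3.5, 3.9, 4.1, 3.7, 0.9, 1.0]:
        yield reading


def micro_batches(stream, batch_size):
    """Cut a stream into small batches; the last batch may be shorter."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def monitor(stream, batch_size=4, threshold=2.0):
    """Yield (batch average, alert flag) for each batch as soon as it is complete."""
    for batch in micro_batches(stream, batch_size):
        average = sum(batch) / len(batch)
        yield average, average > threshold


results = list(monitor(sensor_stream()))
for average, alert in results:
    print(round(average, 2), "ALERT" if alert else "ok")
```

Processing small batches is the approach Spark Streaming takes, while Storm handles records one at a time; setting `batch_size=1` in this sketch gives a rough picture of the second style.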

Source: blog.csdn.net/dvfghj/article/details/95463855