Learning the Basics of Big Data
Big Data Basics
Why learn big data?
1. Purpose: to get a good (well-paid) job
2. Comparison: Java development versus big data development
What is big data?
For example:
1. Product recommendation. Questions:
(1) How do we store a large number of orders?
(2) How do we compute over a large number of orders?
2. Weather forecasting. Questions:
(1) How do we store a large amount of weather data?
(2) How do we compute over a large amount of weather data?
What is the essence of big data?
(1) Data storage: distributed file system (distributed storage)
(2) Data computing: distributed computing
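
The two ideas above can be illustrated with a minimal single-process sketch in Python. It is not how HDFS or any real framework is implemented; the block size, node names, replication factor, and the helper functions split_into_blocks, place_blocks, and distributed_sum are all assumptions made up for illustration. Data is cut into fixed-size blocks and each block is assigned to more than one node (distributed storage); each block is then processed on its own and the partial results are combined (distributed computing).

```python
def split_into_blocks(records, block_size):
    """Cut a list of records into fixed-size blocks (the last one may be shorter)."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]

def place_blocks(num_blocks, nodes, replicas=2):
    """Assign each block to `replicas` nodes, round-robin.

    The nodes for one block are distinct as long as replicas <= len(nodes).
    """
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [nodes[(block_id + r) % len(nodes)] for r in range(replicas)]
    return placement

def distributed_sum(blocks):
    """Compute one partial sum per block, then combine the partial sums."""
    # In a real cluster each partial sum would be computed on the node holding the block.
    partial_sums = [sum(block) for block in blocks]
    return sum(partial_sums)

order_amounts = [120, 35, 60, 15, 200, 80, 45]   # hypothetical order totals
nodes = ["node-a", "node-b", "node-c"]           # hypothetical node names

blocks = split_into_blocks(order_amounts, block_size=3)
placement = place_blocks(len(blocks), nodes, replicas=2)
total = distributed_sum(blocks)

print(blocks)      # three blocks of at most 3 records each
print(placement)   # which nodes hold a copy of each block
print(total)       # same result as summing all records in one place
```

Keeping two copies of every block means the data is still available if one node fails, and summing per block means no single machine has to hold all the orders.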
What is the relationship between Java and big data?
1. Hadoop: developed in Java
2. Spark: developed in Scala, a language that runs on the Java Virtual Machine (JVM)
Prerequisites and learning path for big data
1. Prerequisites:
Java basics (Java SE) -> classes, inheritance, I/O, reflection, generics *****
Linux basics (Linux operation) -> creating files and directories, the vi editor ***
2. Learning path:
(1) Java and Linux basics
(2) Hadoop: architecture, principles, programming
(*) Stage 1: HDFS, MapReduce, HBase (NoSQL database)
(*) Stage 2: data analysis engines -> Hive, Pig
data collection engines -> Sqoop, Flume
(*) Stage 3: HUE: web administration tool
ZooKeeper: used to implement Hadoop HA (high availability)
Oozie: workflow engine
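
To give a feel for the MapReduce programming model listed in the first Hadoop stage, here is a rough sketch of a word count written in plain Python. It does not use the Hadoop API; the function names map_phase, shuffle_phase, and reduce_phase and the sample input are assumptions chosen for illustration, and everything runs in one process rather than on a cluster.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Group emitted values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

lines = ["big data is big", "data is stored and computed"]   # hypothetical input
counts = word_count(lines)
print(counts)
```

Because map_phase looks at one line at a time and reduce_phase looks at one key at a time, both steps could in principle run on many machines at once, which is the point of the model.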
(3) Spark
(*) Stage 1: the Scala programming language
(*) Stage 2: Spark Core -> in-memory data computing
(*) Stage 3: Spark SQL -> SQL statements similar to those used in Oracle
(*) Stage 4: Spark Streaming -> real-time (stream) computing, for example a water purification plant, which treats water continuously as it flows through
(4) Apache Storm: similar to Spark Streaming -> real-time (stream) computing, for example a waterworks
(*) NoSQL: Redis, an in-memory database
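
The real-time (stream) computing mentioned for Spark Streaming and Apache Storm can be pictured with a small Python sketch. It uses neither framework; the sliding_average and sensor_stream functions, the window size, and the sensor readings are hypothetical and only illustrate the idea of producing a result as each record arrives instead of waiting for the whole data set.

```python
from collections import deque

def sliding_average(readings, window_size):
    """Yield the average of the most recent `window_size` readings after each new one."""
    window = deque(maxlen=window_size)   # old readings fall out automatically
    for value in readings:               # `readings` may be an endless generator
        window.append(value)
        yield sum(window) / len(window)

def sensor_stream():
    """Hypothetical water-quality readings arriving one at a time."""
    for value in [4.0, 6.0, 8.0, 2.0]:
        yield value

averages = list(sliding_average(sensor_stream(), window_size=2))
print(averages)
```

Like water moving through a treatment plant, each reading is handled as it passes through, and only the small window of recent readings is kept in memory.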