Do you know the three must-learn big data skills?

Big data and artificial intelligence are leading today's technology trends and have thrown open the door to the era of big data. With strong national policy support, the outlook for the field is excellent, and learners are pouring into big data like carp crossing the river, in an endless stream. The whole field is thriving! Here we share some technical material to help you learn big data technology. Pay attention to the quality of your training and you can get twice the result with half the effort. Below are the three compulsory courses of big data!

One, Hadoop ecosystem

Hadoop is a distributed system infrastructure developed by the Apache Foundation. Users can develop distributed applications without knowing the underlying details, making full use of the cluster's capacity for high-speed computation and storage. Hadoop implements a distributed file system, the Hadoop Distributed File System, known as HDFS.
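As a first taste of HDFS, here is a minimal sketch in Scala that writes a file into the distributed file system and reads it back through Hadoop's Java `FileSystem` API. The NameNode address, port, and file path are placeholders of our own choosing, and the snippet assumes the `hadoop-client` dependency is on the classpath.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsDemo {
  def main(args: Array[String]): Unit = {
    // Point the client at the cluster's NameNode (address is a placeholder).
    val conf = new Configuration()
    conf.set("fs.defaultFS", "hdfs://namenode-host:9000")
    val fs = FileSystem.get(conf)

    // Write a small file into HDFS.
    val path = new Path("/demo/hello.txt")
    val out = fs.create(path)
    out.writeBytes("hello hdfs\n")
    out.close()

    // Read it back to confirm the round trip.
    val in = fs.open(path)
    scala.io.Source.fromInputStream(in).getLines().foreach(println)
    in.close()
    fs.close()
  }
}
```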

Hadoop "stack" comprised of multiple components. include: 

1. Hadoop Distributed File System (HDFS): the default storage layer of every Hadoop cluster.

2. NameNode: the node in a Hadoop cluster that provides information on where data is stored and on node failures.

3. Secondary NameNode: a backup of the NameNode that periodically replicates the data stored by the NameNode, guarding against NameNode failure.

4. JobTracker: the node in a Hadoop cluster that initiates and coordinates MapReduce data-processing tasks or jobs (a word-count sketch follows this list).

5. Slave nodes: the ordinary nodes in a Hadoop cluster; they store data and take data-processing instructions from the JobTracker.
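To make the JobTracker's role concrete, below is a sketch of the classic MapReduce word count, written in Scala against Hadoop's Java MapReduce API. It is illustrative only: the class names are ours, the input and output paths come from `args`, and it assumes the Hadoop MapReduce client libraries are available.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
import scala.jdk.CollectionConverters._

// Map phase: emit (word, 1) for every word in a line.
class TokenMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one  = new IntWritable(1)
  private val word = new Text()
  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit =
    value.toString.split("\\s+").filter(_.nonEmpty).foreach { w =>
      word.set(w)
      ctx.write(word, one)
    }
}

// Reduce phase: sum the counts for each word.
class SumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      ctx: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit =
    ctx.write(key, new IntWritable(values.asScala.map(_.get).sum))
}

object WordCount {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "word count")
    job.setJarByClass(classOf[TokenMapper])
    job.setMapperClass(classOf[TokenMapper])
    job.setReducerClass(classOf[SumReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```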

Two, Spark ecosystem

Spark is an open-source cluster computing environment similar to Hadoop, but there are some differences between the two. These differences make Spark superior for certain workloads: Spark enables in-memory distributed datasets, which not only support interactive queries but also optimize iterative workloads.

Spark is implemented in the Scala language, which it uses as its application framework. Unlike Hadoop, Spark is tightly integrated with Scala, so Scala can manipulate distributed datasets as easily as local collection objects.
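A minimal sketch of what that looks like in practice: the Scala word count below builds a distributed dataset and caches it in memory, which is what makes the repeated interactive or iterative passes mentioned above cheap. The input path and the `local[*]` master are placeholders for demonstration; a real job would get its master URL from spark-submit.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SparkWordCount {
  def main(args: Array[String]): Unit = {
    // local[*] runs on all local cores, for demonstration only.
    val conf = new SparkConf().setAppName("spark-word-count").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // A distributed dataset (RDD), cached in memory so that repeated
    // actions over it do not re-read the file.
    val lines = sc.textFile("input.txt").cache()

    // Operate on the distributed dataset like a local collection.
    val counts = lines
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.take(10).foreach(println)
    sc.stop()
  }
}
```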

Three, Storm real-time development

Storm is a free and open-source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data: just as Hadoop processes data in large batches, Storm processes data in real time. Storm is simple and can be used with any programming language.

Storm has the following characteristics: 

1. Simple programming: developers only need to focus on application logic; as with Hadoop, the programming primitives Storm provides are very simple (a minimal topology sketch follows this list).

2. High performance, low latency: suited to scenarios such as search-engine advertising, where advertisers' operations require a real-time response.

3. Distributed: easily handles volumes of data that a single machine cannot.

4. Scalable: as the business grows and the volumes of data and computation increase, the system can be scaled out horizontally.

5. Fault tolerant: a single node going down does not affect the application.

6. No message loss: message processing is guaranteed.
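To show how simple those primitives are, here is a minimal word-count topology sketch in Scala, assuming the Storm 2.x Java API with `storm-client` on the classpath. The spout, bolt, component names, and word list are all illustrative, and `LocalCluster` runs everything in-process for testing rather than on a real cluster (a production topology would be submitted via `StormSubmitter`).

```scala
import java.util.{Map => JMap}
import scala.collection.mutable
import scala.util.Random
import org.apache.storm.{Config, LocalCluster}
import org.apache.storm.spout.SpoutOutputCollector
import org.apache.storm.task.TopologyContext
import org.apache.storm.topology.{BasicOutputCollector, OutputFieldsDeclarer, TopologyBuilder}
import org.apache.storm.topology.base.{BaseBasicBolt, BaseRichSpout}
import org.apache.storm.tuple.{Fields, Tuple, Values}

// Spout: the source of the stream; here it just emits random words forever.
class WordSpout extends BaseRichSpout {
  private var out: SpoutOutputCollector = _
  private val words = Array("hadoop", "spark", "storm")
  override def open(conf: JMap[String, AnyRef], ctx: TopologyContext,
                    collector: SpoutOutputCollector): Unit = out = collector
  override def nextTuple(): Unit = {
    Thread.sleep(100) // throttle the demo stream
    out.emit(new Values(words(Random.nextInt(words.length))))
  }
  override def declareOutputFields(d: OutputFieldsDeclarer): Unit =
    d.declare(new Fields("word"))
}

// Bolt: application logic only -- count each word as it streams through.
class CountBolt extends BaseBasicBolt {
  private val counts = mutable.HashMap.empty[String, Long]
  override def execute(tuple: Tuple, collector: BasicOutputCollector): Unit = {
    val w = tuple.getStringByField("word")
    counts(w) = counts.getOrElse(w, 0L) + 1
    println(s"$w -> ${counts(w)}")
  }
  override def declareOutputFields(d: OutputFieldsDeclarer): Unit = ()
}

object WordCountTopology {
  def main(args: Array[String]): Unit = {
    val builder = new TopologyBuilder
    builder.setSpout("words", new WordSpout)
    // fieldsGrouping routes the same word to the same bolt instance.
    builder.setBolt("counter", new CountBolt).fieldsGrouping("words", new Fields("word"))

    // In-process cluster for local testing only.
    val cluster = new LocalCluster()
    cluster.submitTopology("word-count", new Config, builder.createTopology())
    Thread.sleep(10000)
    cluster.shutdown()
  }
}
```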

