Three compulsory courses for getting into big data quickly

Big data and artificial intelligence are leading the technology trend and have opened the door to the era of big data! National policy is behind it, and the outlook is outstanding! Learners flocking to big data are like carp crossing the river, an endless stream, and the whole field is thriving! Here we offer some practical technical help to lend you a hand: when learning big data technology, pay attention to the quality of your training, and you will get twice the result with half the effort. Below are the three compulsory courses of big data!

One, the Hadoop ecosystem
 
  Hadoop is a distributed system framework developed by the Apache Foundation. It lets users develop distributed applications without knowing the underlying details of distribution, making full use of a cluster's high-speed computing power and storage. Hadoop implements a distributed file system, the Hadoop Distributed File System, abbreviated HDFS. 
  The Hadoop "stack" is made up of multiple components, including: 
  1. Hadoop Distributed File System (HDFS): the default storage layer of every Hadoop cluster. 
  2. NameNode: the node in a Hadoop cluster that provides information about where data is stored and about node failures. 
  3. Secondary NameNode: a backup for the NameNode; it periodically replicates the NameNode's data in case the NameNode fails. 
  4. JobTracker: the node in a Hadoop cluster that initiates and coordinates MapReduce data processing tasks, or jobs. 
  5. Slave nodes: the ordinary nodes of a Hadoop cluster; they store data and take data processing instructions from the JobTracker.
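  To give a concrete feel for programming against this stack, here is a minimal sketch of the classic WordCount MapReduce job in Java, written against the org.apache.hadoop.mapreduce API. The class names and the use of command-line arguments for the input and output paths follow the well-known example shipped with Hadoop; treat it as an illustration rather than production code.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Map phase: emit (word, 1) for every word in the input split
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private final Text word = new Text();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, one);
                }
            }
        }

        // Reduce phase: sum the counts collected for each word
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

  Compiled into a jar, it is submitted with a command of the form hadoop jar wordcount.jar WordCount <input> <output>, and the JobTracker (or YARN on newer versions) schedules the map and reduce tasks across the slave nodes.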

Two, the Spark ecosystem
  
  Spark is an open-source cluster computing environment similar to Hadoop, but there are some differences between the two, and those differences make Spark behave more favorably on certain workloads. In other words, Spark enables in-memory distributed datasets, which, besides supporting interactive queries, can also optimize iterative workloads. 
  Spark is implemented in the Scala language, and it uses Scala as its application framework. Unlike Hadoop, Spark is tightly integrated with Scala, so that Scala can manipulate distributed datasets as easily as local collection objects.
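  To make the in-memory dataset idea concrete, below is a minimal sketch using Spark's Java API (the same RDD operations exist in Scala). The application name, the local[*] master, and the sample numbers are placeholders invented for this illustration; cache() is the call that keeps the dataset in memory, which is what benefits iterative and interactive use.

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class SparkSquares {
        public static void main(String[] args) {
            // Run locally on all cores; on a real cluster the master is set at submit time
            SparkConf conf = new SparkConf().setAppName("SquareSum").setMaster("local[*]");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // A distributed dataset built from an ordinary local collection
            JavaRDD<Integer> nums = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));

            // cache() keeps the transformed RDD in memory for repeated (iterative) use
            JavaRDD<Integer> squares = nums.map(x -> x * x).cache();

            System.out.println("Sum of squares: " + squares.reduce(Integer::sum));
            sc.stop();
        }
    }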

Three, real-time development with Storm
  
  Storm is a free and open-source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data: just as Hadoop handles batch processing of large volumes of data, Storm handles data in real time. Storm is simple and can be used with any programming language. 
  Storm has the following characteristics: 
  1. Simple programming: developers only need to focus on application logic, and, much as with Hadoop, the programming primitives Storm provides are very simple (see the sketch after this list). 
  2. High performance, low latency: suitable for scenarios such as search engine advertising, where advertisers' operations must receive a real-time response. 
  3. Distributed: easily handles volumes of data that a single machine cannot. 
  4. Scalable: as the business grows and the volumes of data and computation increase, the system can be scaled out horizontally. 
  5. Fault-tolerant: the failure of a single node does not bring down the application. 
  6. No message loss: message delivery is guaranteed.
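  To show what those primitives look like, here is a minimal sketch of a Storm topology in Java: a spout emitting an unbounded stream of sentences and a bolt splitting them into words, wired together and run on an in-process LocalCluster. The class names and sample sentences are invented for this illustration, and the org.apache.storm package layout assumes Storm 1.x or later.

    import java.util.Map;
    import org.apache.storm.Config;
    import org.apache.storm.LocalCluster;
    import org.apache.storm.spout.SpoutOutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.BasicOutputCollector;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.TopologyBuilder;
    import org.apache.storm.topology.base.BaseBasicBolt;
    import org.apache.storm.topology.base.BaseRichSpout;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;
    import org.apache.storm.utils.Utils;

    public class SentenceTopology {

        // Spout: the source of an unbounded stream of tuples
        public static class SentenceSpout extends BaseRichSpout {
            private SpoutOutputCollector collector;
            private final String[] sentences = {
                "the cow jumped over the moon",
                "an apple a day keeps the doctor away"
            };
            private int index = 0;

            @Override
            public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
                this.collector = collector;
            }

            @Override
            public void nextTuple() {
                Utils.sleep(100); // throttle the demo stream
                collector.emit(new Values(sentences[index++ % sentences.length]));
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("sentence"));
            }
        }

        // Bolt: one processing step; here it splits each sentence into words
        public static class SplitBolt extends BaseBasicBolt {
            @Override
            public void execute(Tuple tuple, BasicOutputCollector collector) {
                for (String word : tuple.getStringByField("sentence").split(" ")) {
                    collector.emit(new Values(word));
                }
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("word"));
            }
        }

        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("sentences", new SentenceSpout());
            // Two parallel bolt instances: scaling out is a one-line change
            builder.setBolt("split", new SplitBolt(), 2).shuffleGrouping("sentences");

            LocalCluster cluster = new LocalCluster(); // in-process cluster for testing
            cluster.submitTopology("demo", new Config(), builder.createTopology());
            Utils.sleep(10000);
            cluster.shutdown();
        }
    }

  In production the topology would be submitted with StormSubmitter rather than run on a LocalCluster, and because Storm tracks each tuple through the topology, a reliable spout can re-emit tuples that fail downstream, which is how the no-message-loss guarantee is realized.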

Source: blog.csdn.net/kangshufu/article/details/92427415