Three must-learn big data technologies to get your skills up quickly

Big data and artificial intelligence are leading the technology trend and have thrown open the door to the big data era. With strong national policy support, the outlook is excellent, and newcomers are streaming into big data like carp crossing the river. To give you a hand, here are some practical technical notes: when learning big data technology, pay attention to the quality of your training so you get twice the result for half the effort. Below are the three must-learn areas of big data.
1. The Hadoop ecosystem

Hadoop is a distributed system infrastructure developed by the Apache Foundation. It lets users develop distributed applications without knowing the underlying distributed details, making full use of a cluster's high-speed computing power and storage. Hadoop implements a distributed file system, the Hadoop Distributed File System, known as HDFS.
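As a small illustration, here is a sketch that writes a file to HDFS and reads it back through Hadoop's Java FileSystem API; the NameNode address and file path are placeholders for your own cluster.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Write a small file into HDFS, then read it back.
public class HdfsHello {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // hdfs://localhost:9000 is a placeholder; substitute your NameNode address.
    FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), conf);

    Path path = new Path("/tmp/hello.txt");
    try (FSDataOutputStream out = fs.create(path, true)) {   // overwrite if present
      out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
    }

    try (BufferedReader in = new BufferedReader(
             new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
      System.out.println(in.readLine());                     // prints: hello hdfs
    }
    fs.close();
  }
}
```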


Hadoop "stack" comprised of multiple components. include:

1.Hadoop Distributed File System (HDFS): the default storage layer all Hadoop cluster

2. The name of the node: in Hadoop cluster, there is provided a data storage location node failure and node information.

3. two nodes: node name of the backup, it will periodically replicated data store name and node name of the node failure case.

4. Job Tracker: Hadoop cluster node initiating and coordinating MapReduce data processing task or job.

5. From the node: obtaining data processing instruction from the job tracker where ordinary node Hadoop cluster nodes and stores data from.
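To show how these components work together, here is the classic WordCount job written against the Hadoop MapReduce Java API: the input and output live in HDFS, and the cluster coordinates map and reduce tasks across the slave nodes. The input and output paths are taken from the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Classic WordCount: mappers emit (word, 1), reducers sum the counts.
public class WordCount {

  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);   // emit (word, 1)
      }
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();           // add up the counts for this word
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // pre-aggregate on the map side
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input dir
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output dir
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```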

2. The Spark ecosystem

Spark is an open-source cluster computing environment similar to Hadoop, but there are some useful differences that make Spark superior for certain workloads: Spark keeps distributed datasets in memory, so in addition to supporting interactive queries it can also optimize iterative workloads.

Spark is implemented in the Scala language, and it uses Scala as its application framework. Unlike Hadoop, Spark is tightly integrated with Scala, which lets Scala operate on distributed datasets as easily as on local collection objects.
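Spark's primary API is Scala, but to keep every sketch in this post in one language, here is the same idea through Spark's Java API. The example caches an RDD in memory so that the second action reuses it instead of re-reading from storage; the input path and local master setting are placeholders.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Demonstrates Spark's in-memory datasets: cache() keeps the RDD in memory,
// so the second action reuses it instead of re-reading from storage.
public class SparkCacheDemo {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf()
        .setAppName("cache-demo")
        .setMaster("local[*]");                 // run locally for the sketch
    JavaSparkContext sc = new JavaSparkContext(conf);

    // input.txt is a placeholder path.
    JavaRDD<String> lines = sc.textFile("input.txt").cache();

    long total = lines.count();                                   // first pass: reads the file
    long errors = lines.filter(l -> l.contains("ERROR")).count(); // second pass: served from memory

    System.out.println(total + " lines, " + errors + " contain ERROR");
    sc.stop();
  }
}
```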

3. Storm real-time development

Storm is a free, open-source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data: where Hadoop handles batch processing of large volumes of data, Storm handles data in real time. Storm is simple to use and works with any programming language.
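As a concrete illustration, here is a minimal topology written against the Storm Java API (assuming Storm 2.x, package org.apache.storm); the spout and bolt are toy components invented for this sketch.

```java
import java.util.Map;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

public class RandomWordTopology {

  // Spout: emits an unbounded stream of words.
  public static class WordSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private final String[] words = {"storm", "stream", "tuple", "bolt"};
    private int i = 0;

    @Override
    public void open(Map<String, Object> conf, TopologyContext ctx,
                     SpoutOutputCollector collector) {
      this.collector = collector;
    }

    @Override
    public void nextTuple() {
      Utils.sleep(100);                              // throttle the stream
      collector.emit(new Values(words[i++ % words.length]));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer d) {
      d.declare(new Fields("word"));
    }
  }

  // Bolt: prints each word as it arrives (a stand-in for real processing).
  public static class PrintBolt extends BaseRichBolt {
    @Override
    public void prepare(Map<String, Object> conf, TopologyContext ctx,
                        OutputCollector collector) { }

    @Override
    public void execute(Tuple tuple) {
      System.out.println(tuple.getStringByField("word"));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer d) { }
  }

  public static void main(String[] args) throws Exception {
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("words", new WordSpout());
    builder.setBolt("printer", new PrintBolt(), 2)   // two parallel bolt tasks
           .shuffleGrouping("words");                // tuples distributed randomly

    try (LocalCluster cluster = new LocalCluster()) { // in-process cluster for testing
      cluster.submitTopology("demo", new Config(), builder.createTopology());
      Utils.sleep(10_000);                            // let it run for ten seconds
      cluster.killTopology("demo");
    }
  }
}
```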

Storm has the following characteristics:

1. Simple programming: developers only need to focus on application logic; like Hadoop, Storm provides very simple programming primitives.

2. High performance, low latency: suitable for scenarios such as search-engine advertising, where an advertiser's operations need a real-time response.

3. Distributed: easily handles volumes of data that a single machine cannot.

4. Scalable: as the business grows and the volumes of data and computation increase, the system can be scaled out horizontally.

5. Fault tolerant: a single node hanging does not affect the application.

6. No message loss: message delivery is guaranteed (see the reliability sketch after this list).
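Point 6 rests on Storm's acking mechanism. The sketch below, again assuming the Storm 2.x Java API and a hypothetical upstream "word" field, shows a bolt that anchors its output to the input tuple and then acks it; if an ack never arrives within the timeout, the spout replays the original tuple, which is how Storm avoids losing messages.

```java
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

// Anchoring and acking: the emitted tuple is linked to its input, and the ack
// tells Storm the input was fully processed. If the ack never arrives, the
// spout replays the original tuple.
public class ReliableBolt extends BaseRichBolt {
  private OutputCollector collector;

  @Override
  public void prepare(Map<String, Object> conf, TopologyContext ctx,
                      OutputCollector collector) {
    this.collector = collector;
  }

  @Override
  public void execute(Tuple input) {
    String word = input.getStringByField("word");
    collector.emit(input, new Values(word.toUpperCase())); // anchored emit
    collector.ack(input);                                  // mark as fully processed
  }

  @Override
  public void declareOutputFields(OutputFieldsDeclarer d) {
    d.declare(new Fields("word"));
  }
}
```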


Source: blog.51cto.com/14296550/2404107