Six noteworthy features of the big data framework Apache Spark

Apache Spark is a powerful open-source data processing engine for Hadoop, built around speed, ease of use, and sophisticated analytics. It was originally developed at UC Berkeley's AMPLab and later donated to the Apache Software Foundation.

  What is Apache Spark? If you work on the analysis of large data sets, you will probably want to know whether Spark is worth using. I hope this article helps answer the questions that may be lingering in your mind.

  Apache Spark is essentially a parallel data processing framework that works together with Apache Hadoop to make jobs faster and easier to develop. Spark makes it possible to combine big data and fast data applications, covering everything from interactive data analysis to stream processing.

  This article introduces the six most noteworthy features of Apache Spark.

  1. Ultra-fast data processing

  When it comes to big data, processing speed is always essential: we want to handle massive amounts of data as quickly as possible. Spark enables applications in Hadoop clusters to run up to 100 times faster in memory, and up to 10 times faster even when running on disk.

  Spark achieves this by reducing the number of disk reads and writes: intermediate processing results are kept in memory. It uses a concept called a Resilient Distributed Dataset (RDD), which allows data to be stored transparently in memory and written out to disk only when necessary. This eliminates most of the disk I/O during data processing, which is usually the most time-consuming factor.

  (Figure: performance of Spark compared with Hadoop.)
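
  To make the idea concrete, here is a minimal sketch of keeping an intermediate RDD in memory with a fall-back to disk; the input path and the filter condition are made up for the example:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    object PersistExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("PersistExample").setMaster("local[*]")
        val sc = new SparkContext(conf)

        // Hypothetical input path, used only for illustration.
        val lines = sc.textFile("hdfs:///data/events.log")

        // Keep the filtered intermediate RDD in memory, falling back to disk
        // only for partitions that do not fit.
        val errors = lines
          .filter(_.contains("ERROR"))
          .persist(StorageLevel.MEMORY_AND_DISK)

        // Both actions below reuse the persisted RDD instead of re-reading the file.
        println("error lines: " + errors.count())
        errors.take(5).foreach(println)

        sc.stop()
      }
    }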

  2. Multi-Language Support

  Spark lets you quickly write applications in Java, Scala, or Python, so developers can create and run applications in a programming language they already know. It comes with a built-in set of more than 80 high-level operators, and you can use it to query data interactively from the shell.
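
  As a small illustration, the classic word count can be written in a few lines of Scala inside the interactive spark-shell; the input path below is just a placeholder:

    // Inside spark-shell, `sc` (the SparkContext) is already available.
    // The input path is a placeholder for this sketch.
    val counts = sc.textFile("hdfs:///data/input.txt")
      .flatMap(line => line.split(" "))   // high-level operator: flatMap
      .map(word => (word, 1))             // high-level operator: map
      .reduceByKey(_ + _)                 // high-level operator: reduceByKey

    counts.take(10).foreach(println)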

  3. Support for complex queries

  In addition to simple map and reduce operations, Spark also supports SQL queries, streaming data, and complex analytics such as machine learning and graph algorithms, out of the box. What's more, users can seamlessly combine all of these capabilities in a single workflow.
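
  The sketch below hints at what such a combined workflow can look like, mixing a SQL query with DataFrame transformations in one program; the Purchase record layout and the sample values are invented for the example:

    import org.apache.spark.sql.SparkSession

    object MixedWorkflow {
      // Hypothetical record layout, for illustration only.
      case class Purchase(user: String, amount: Double, category: String)

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("MixedWorkflow")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // A tiny in-memory data set standing in for a real table.
        val purchases = Seq(
          Purchase("alice", 20.0, "books"),
          Purchase("bob",   35.5, "games"),
          Purchase("alice", 12.0, "books")
        ).toDS()
        purchases.createOrReplaceTempView("purchases")

        // A SQL query ...
        val perUser = spark.sql(
          "SELECT user, SUM(amount) AS total FROM purchases GROUP BY user")

        // ... combined with functional transformations on the same result.
        val topSpenders = perUser
          .filter($"total" > 25.0)
          .orderBy($"total".desc)

        topSpenders.show()
        spark.stop()
      }
    }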

  4. Real-time streaming

  Spark can process real-time streaming data, whereas MapReduce mainly handles data that has already "landed" in storage. Spark uses Spark Streaming to work with real-time data, although similar stream processing can also be achieved with other frameworks on top of Hadoop.

  Spark Streaming Features:

  Simple: built on Spark's lightweight yet powerful API, Spark Streaming lets you develop streaming applications quickly.

  Fault tolerant: unlike other streaming solutions (e.g. Storm), Spark Streaming recovers lost work out of the box and provides exactly-once semantics without extra code or configuration.

  Integrated: the same code can be reused for batch and stream processing, and streaming data can even be joined with historical data.

  (Figure: stream-processing performance of Spark compared with Storm.)
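
  As a concrete illustration, here is a minimal Spark Streaming sketch in the spirit of the well-known network word-count example; the host and port of the socket source are placeholders:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object StreamingWordCount {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("StreamingWordCount").setMaster("local[2]")
        // Process the incoming stream in 1-second micro-batches.
        val ssc = new StreamingContext(conf, Seconds(1))

        // Placeholder source: text lines arriving on a TCP socket.
        val lines = ssc.socketTextStream("localhost", 9999)

        // The same operators used on batch RDDs apply to the stream.
        val counts = lines
          .flatMap(_.split(" "))
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        counts.print()

        ssc.start()
        ssc.awaitTermination()
      }
    }

  The stream is processed as a sequence of small batches, which is why the familiar batch operators can be reused unchanged.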

  5. Integration with existing Hadoop data and functionality

  Spark can run on its own, but it can also run on top of Hadoop 2's YARN cluster manager and read any existing Hadoop data. This is a huge advantage! It can read from any Hadoop data source, such as HBase or HDFS. If the use case genuinely suits Spark, this feature makes it easy to migrate existing pure Hadoop applications to Spark.
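
  For instance, a Spark job can read an existing HDFS file directly with the ordinary RDD API; the HDFS location below is a placeholder, and the cluster master is normally supplied when the job is submitted (for example with spark-submit --master yarn):

    import org.apache.spark.{SparkConf, SparkContext}

    object ReadFromHdfs {
      def main(args: Array[String]): Unit = {
        // The master URL is usually provided by spark-submit (e.g. --master yarn).
        val conf = new SparkConf().setAppName("ReadFromHdfs")
        val sc = new SparkContext(conf)

        // Placeholder HDFS location; other Hadoop data sources work similarly.
        val records = sc.textFile("hdfs://namenode:8020/user/data/records.csv")

        println("record count: " + records.count())
        sc.stop()
      }
    }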

  6. Active user community

  Apache Spark was created by a broad group of developers from more than 50 companies. The project started in 2009, and so far more than 250 developers have contributed to it. It has an active mailing list and a JIRA issue tracker.
