What are the application scenarios of Spark?

  

  Spark is an open source cluster computing environment similar to Hadoop. It is a fast, general-purpose engine designed for large-scale data processing, and it has grown into a rapidly developing, widely used ecosystem. Its main application scenarios are as follows:

 

  1. Spark is a memory-based iterative computing framework. It suits applications that operate on the same data set many times. The more often the data is reused and the larger the data set being read, the greater the benefit. Where the data volume is small but the computation is intensive, the benefit is comparatively small;

 

  2. Because RDDs are read-only and can only be changed through coarse-grained transformations that produce new RDDs, Spark is not suited to applications that update state asynchronously and at a fine grain, such as the storage layer of a web service or an incremental web crawler and indexer. In short, it does not fit an incremental-modification application model;

 

  3. Applications whose data volume is not particularly large but that require real-time statistical analysis.
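  To illustrate point 1, here is a minimal sketch of an iterative job that reuses one cached data set. It assumes Spark 3.x with the spark-sql module on the classpath. The object name `IterativeCaching`, the toy task (fitting a slope `w` in y ≈ w * x by gradient descent) and its parameters are hypothetical illustrations, not taken from the original. `cache()` keeps the RDD's partitions in memory after the first action, so every later pass reads them from memory.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object IterativeCaching {
  // Hypothetical toy task: fit w in y ≈ w * x by gradient descent.
  // Every iteration makes one full pass over the same data set.
  def fitSlope(points: RDD[(Double, Double)], iterations: Int, learningRate: Double): Double = {
    // First action: if points is cached, this materializes it in memory
    val n = points.count().toDouble
    var w = 0.0
    for (_ <- 1 to iterations) {
      val currentW = w // capture a stable value in the closure sent to executors
      // Mean gradient of the squared error; RDD[Double].sum() comes from DoubleRDDFunctions
      val gradient = points.map { case (x, y) => (currentW * x - y) * x }.sum() / n
      w -= learningRate * gradient
    }
    w
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode for illustration; a cluster would use a different master URL
    val spark = SparkSession.builder()
      .appName("iterative-caching")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Synthetic data where y = 2x exactly; a real job would load it from HDFS or similar
    val points = sc
      .parallelize((1 to 1000).map { i => val x = i / 1000.0; (x, 2.0 * x) })
      .cache() // keep partitions in memory across the iterations below

    val w = fitSlope(points, iterations = 50, learningRate = 0.5)
    println(f"estimated slope: $w%.4f")

    points.unpersist()
    spark.stop()
  }
}
```

  Without `cache()` the estimate would be the same, but each of the 50 iterations would recompute `points` from its lineage. When the input comes from disk or a remote store, that repeated reading is the cost point 1 refers to, which is why reuse-heavy workloads benefit most.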

 

  Workloads that match points 1 and 3, and avoid the pattern described in point 2, are good candidates for Spark. In practice, Internet companies currently apply big data mainly to advertising, reporting, recommendation systems, and similar businesses. Advertising needs big data for application analysis, effect analysis, targeting optimization, and so on; recommendation systems need it for ranking optimization, personalized recommendations, and hot-click analysis.

 

  What these scenarios have in common is a heavy computational load combined with strict efficiency requirements, and Spark can meet both. Since its launch, the project has drawn wide attention and praise from the open source community, and over the past two years it has become one of the most popular open source projects in big data processing.

 

 

  Spark is implemented in Scala, an object-oriented and functional programming language, and its Scala API lets developers manipulate distributed datasets as easily as local collection objects. Spark is suitable for most batch workloads and has become a preferred technology for enterprise big data processing. Representative users include Tencent, Yahoo, Taobao, and Youku Tudou.
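  To make the point about local collections concrete, the following minimal sketch runs the same `filter`/`map`/`fold` chain on an ordinary Scala `Seq` and on an RDD created with `SparkContext.parallelize`. It assumes Spark 3.x with spark-sql on the classpath; the object and method names are hypothetical.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

object LocalVsDistributed {
  // Ordinary Scala collection: evaluated eagerly inside this JVM
  def localSumOfEvenSquares(xs: Seq[Int]): Int =
    xs.filter(_ % 2 == 0).map(x => x * x).fold(0)(_ + _)

  // Same operator chain on an RDD: filter and map are lazy transformations,
  // and fold is the action that runs a parallel job over the partitions
  def distributedSumOfEvenSquares(sc: SparkContext, xs: Seq[Int]): Int =
    sc.parallelize(xs, numSlices = 4).filter(_ % 2 == 0).map(x => x * x).fold(0)(_ + _)

  def main(args: Array[String]): Unit = {
    // Assumption: local mode; "local[*]" uses all cores of this machine
    val spark = SparkSession.builder()
      .appName("local-vs-distributed")
      .master("local[*]")
      .getOrCreate()

    val numbers = 1 to 10
    println(s"local       = ${localSumOfEvenSquares(numbers)}")                         // 220
    println(s"distributed = ${distributedSumOfEvenSquares(spark.sparkContext, numbers)}") // 220

    spark.stop()
  }
}
```

  Both lines print 220 (4 + 16 + 36 + 64 + 100). The code reads the same, but on the RDD Spark splits the data into four partitions, applies the lambdas to each partition, and combines the partial sums.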

