Integration and deployment solutions for big data, computing, and storage platform technologies

Author: Zen and the Art of Computer Programming

1 Introduction

In recent years, with the rapid development of Internet technology, big data technology has grown explosively, and data collection, processing, and analysis now generate massive amounts of data. Extracting value from that data has become increasingly urgent, and cloud service providers with big data offerings, such as Amazon AWS and Microsoft Azure, have emerged in response. Beyond the traditional IT technology stack, cloud services also involve big data platform technologies such as Hadoop, Spark, Hive, and Pig, which help users quickly build, manage, and maintain big data platforms. To use such a platform successfully, users need to install the corresponding components on the cloud platform, configure cluster parameters, and then run the relevant application jobs and perform the corresponding data analysis.
For these reasons, this article uses Amazon AWS as an example to explain how to integrate and deploy big data platform technologies. The solution described is mainly suited to scenarios such as data analysis, machine learning, high-performance computing, massive data storage, and data exchange. The article covers the following aspects:

  1. Data Lake Infrastructure Architecture

  2. Selection of big data computing engines

  3. Enterprise-level computing cluster hardware selection

  4. HDFS storage optimization strategy

  5. Using Hive/Impala and tuning their configuration

  6. Using Spark Streaming and tuning its configuration

  7. Configuration adjustments for the YARN resource manager

  8. Data warehouse construction plan

  9. SQL query optimization and slow log troubleshooting

  10. Choice of visualization tools

  11. Choice of Hadoop cluster management tools

  12. Machine learning system architecture and principles

About the author: Wang Yanan (Tencent) is a product manager for cloud computing products with ten years of cloud experience, and has worked at first-tier Internet companies such as Tencent, Alibaba, and Baidu.

Origin blog.csdn.net/universsky2015/article/details/132002258