The big data industry: appreciating the profound impact of big data on human society, the economy, business, government affairs, and other areas from different perspectives

Author: Zen and the Art of Computer Programming

1. Introduction

The widely cited "5Vs" of big data (Volume, Velocity, Variety, Veracity, and Value) characterize massive data and the core value created by exploring, integrating, analyzing, interacting with, and feeding back that data.
  
  This series of articles aims to share the development trends of cutting-edge technologies in the big data field and to provide in-depth analysis based on specific cases. The hope is that readers will come to appreciate the profound impact of big data on human society, the economy, business, and government affairs from different perspectives, deepen their understanding and command of big data technology, and thereby help advance technological progress and economic development.
  Everyone is welcome to offer valuable opinions and jointly promote the development of the big data field.

2. Explanation of core concepts and terminology

1. Hadoop
    ① Hadoop is an open-source project initiated by the Apache Software Foundation for distributed storage, data processing, and ultra-large-scale computing. It provides the Hadoop Distributed File System (HDFS) and the MapReduce computing framework, and it offers high fault tolerance, reliability, and elastic scalability.
    ② The Hadoop ecosystem includes multiple open-source products such as Apache Hive, Apache Pig, Apache HBase, Apache Mahout, and Apache Spark. Hive can be used for data warehouse construction, ETL, and data querying; Pig is suited to large-scale data processing; HBase is a column-oriented NoSQL database for fast retrieval of massive data; Mahout is a machine learning library that can be used to implement complex machine learning algorithms; and Spark is a fast, general-purpose cluster computing engine for real-time data processing, stream processing, and offline analysis (see the PySpark sketch after this list).
  2. Data warehouse
    ① A data warehouse is an integrated collection of data drawn from multiple business departments or systems; its purpose is to speed up information acquisition, improve decision-making, and meet user needs. It is generally modeled with a star schema or a snowflake schema, and its tables are typically organized around two kinds of fields: dimensions and measures (a small star-schema example also follows this list).
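
Returning to the Spark component mentioned in the Hadoop item above, the following is a minimal PySpark sketch of a MapReduce-style word count. It assumes a working Spark installation with the pyspark package available, and the HDFS input path is a hypothetical placeholder rather than a path from this article.

```python
# Minimal PySpark word-count sketch (hypothetical input path).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("WordCount").getOrCreate()

# Read lines from a (hypothetical) text file stored on HDFS.
lines = spark.read.text("hdfs:///data/input.txt").rdd.map(lambda row: row[0])

# Classic MapReduce-style pipeline: split into words, map to (word, 1),
# then reduce by key to sum the counts per word.
counts = (
    lines.flatMap(lambda line: line.split())
         .map(lambda word: (word, 1))
         .reduceByKey(lambda a, b: a + b)
)

# Print a small sample of the results.
for word, count in counts.take(10):
    print(word, count)

spark.stop()
```

The same pipeline structure (map, shuffle by key, reduce) is what the MapReduce framework mentioned above executes on HDFS data; Spark simply runs it in memory across the cluster.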
 
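To make the dimension and measure concepts from the data warehouse item concrete, here is a small, hypothetical star-schema sketch in Python using pandas. All table and column names are illustrative assumptions, not taken from the article: a central fact table carries the numeric measure and foreign keys, while the dimension tables hold the descriptive attributes used for grouping.

```python
# Tiny, hypothetical star schema: one fact table (with a measure) joined
# to two dimension tables. Requires pandas.
import pandas as pd

# Dimension tables: descriptive attributes used to slice the data.
dim_product = pd.DataFrame({
    "product_id": [1, 2],
    "product_name": ["Laptop", "Phone"],
    "category": ["Computers", "Mobile"],
})
dim_date = pd.DataFrame({
    "date_id": [20240101, 20240102],
    "month": ["2024-01", "2024-01"],
})

# Fact table: foreign keys to the dimensions plus the numeric measure.
fact_sales = pd.DataFrame({
    "product_id": [1, 1, 2],
    "date_id": [20240101, 20240102, 20240101],
    "amount": [1200.0, 900.0, 600.0],
})

# A typical warehouse query: join the fact table to its dimensions,
# then aggregate the measure by dimension attributes.
report = (
    fact_sales.merge(dim_product, on="product_id")
              .merge(dim_date, on="date_id")
              .groupby(["category", "month"])["amount"]
              .sum()
              .reset_index()
)
print(report)
```

A snowflake schema would differ only in that the dimension tables themselves are further normalized into sub-dimension tables.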
