GreenPlum 与hadoop

gp can handle large amounts of data, and hadoop can handle massive amounts.
gp can only handle lake or river volume. It cannot handle massive amounts.

 

Greenplum adopts the PostgreSQL framework, which is an important application of the PostgreSQL system. From this point of view, GreenPlum is a relational database. The Hadoop framework is a distributed platform design concept. It is not a database by itself. Impala can be considered as a non-relational database, and Hive is equivalent to SQL.

 

GreenPlum's components are divided into three parts: MASTER/SEGMENT and GNET, an efficient interconnection technology between MASTER and SEGMENT. Among them, MASTER and SEGMENT are their own independent database SERVER. The difference is that MASTER is only responsible for the connection of the application, generating and splitting the execution plan, assigning the execution plan to the SEGMENT node, and returning the final result to the application. It only stores some metadata of the database and is not responsible for the operation, so it will not. become the bottleneck of system performance. This is also an important difference between GREENPLUM and traditional MPP schema databases. The SEGMENT node stores the user's business data, and is responsible for processing the business data according to the obtained execution plan. That is, the data of the user relationship table will be scattered and distributed to each SEGENGT node. When data access is performed, first all segments process data related to themselves in parallel. If segments are required, they can interact with each other through interconnect. The more segment nodes, the more scattered the data will be, and the faster the processing speed will be. Therefore, unlike the SHARE ALL database cluster, by increasing the number of SEGMENT node servers, the performance of GREENPLUM will increase linearly. GREENPLUM is a typical relational database product and a query-oriented relational database. Its main features are fast query speed, fast data loading speed, and fast batch DML processing. And performance can increase linearly with the addition of hardware, with very good scalability. Therefore, it is mainly suitable for analysis-oriented applications. Based on the advanced machine learning functions of Apache MADLib, GreenPlum supports fast and complex query analysis and meets the needs of various BI users. So, greenplum is a distributed database system. Apache hadoop is a framework for large-scale distributed computing, involving distributed storage HDFS, distributed parallel computing framework MapReduce, Hadoop Yarn job scheduling and cluster resource management framework, and hadoop architecture-related frameworks HBase, Hive, Pig, ZooKeeper, and the hot spark. It can be seen that hadoop is more like a distributed computing framework. More and more application frameworks will use the hadoop framework to complete big data analysis. You can even deploy Greenplum to hadoop to complete the analysis and processing of big data.

Guess you like

Origin http://43.154.161.224:23101/article/api/json?id=326487007&siteId=291194637