Comparison of several major databases

Hadoop's HDFS supports massive data storage, and MapReduce supports distributed processing of massive data.
Oracle can also be deployed as a cluster, but once the data volume passes a certain point, queries become very slow and the hardware requirements become very high.
In fact, the two are not the same kind of thing. Hadoop is a distributed processing architecture oriented toward data computation, while Oracle is a relational database oriented toward data storage. A fairer comparison is between HBase and Oracle.
HBase is a NoSQL, column-oriented database that supports massive data storage and lets columns be added freely. Its queries are more involved than those of a relational database such as Oracle, and it supports only one index (the row key). With a well-designed table structure, however, HBase query speed has little to do with the size of the data, and lookups can reach millisecond-level latency.
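To make the point about the single index concrete, here is a minimal sketch of why a lookup by row key scales well. It is a toy in-memory model, not the HBase client API: the class name `SortedRowStore`, the `user#date` key layout, and the sample rows are all assumptions made for illustration. The idea is only that rows kept sorted by key can be found by binary search, so the cost grows with the logarithm of the table size rather than with the size itself.

```python
import bisect


class SortedRowStore:
    """Toy stand-in for a table kept sorted by row key (not the HBase API)."""

    def __init__(self, rows):
        # rows: dict mapping row_key -> value; store them sorted by key
        items = sorted(rows.items())
        self._keys = [k for k, _ in items]
        self._values = [v for _, v in items]

    def get(self, row_key):
        # Binary search on the sorted keys: O(log n) comparisons
        i = bisect.bisect_left(self._keys, row_key)
        if i < len(self._keys) and self._keys[i] == row_key:
            return self._values[i]
        return None

    def scan_prefix(self, prefix):
        # Jump to the first key >= prefix, then read while the prefix matches
        i = bisect.bisect_left(self._keys, prefix)
        result = []
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            result.append((self._keys[i], self._values[i]))
            i += 1
        return result


# Hypothetical row keys: the user id comes first so one user's rows sit together
store = SortedRowStore({
    "user42#2024-01-01": "login",
    "user42#2024-01-02": "purchase",
    "user77#2024-01-01": "login",
})
print(store.get("user42#2024-01-02"))
print(store.scan_prefix("user42#"))
```

Any query that cannot be phrased as a key lookup or a key-range scan falls back to reading the whole table in this model, which is why the row key design matters so much.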

PostgreSQL

PostgreSQL is an object-relational database management system (ORDBMS) based on POSTGRES, Version 4.2, which was developed at the Computer Science Department of the University of California, Berkeley. PostgreSQL supports most of the SQL standard and offers many modern features: complex queries, foreign keys, triggers, views, transactional integrity, and multiversion concurrency control (MVCC). It can also be extended in many ways, for example by adding new data types, functions, operators, aggregate functions, index methods, and procedural languages. And because of its liberal license, anyone can use, modify, and distribute PostgreSQL free of charge for any purpose, whether private, commercial, or academic.

Greenplum

In the OLTP applications in common use today, users access a central database. For that workload, an SMP architecture is much more efficient than an MPP architecture. MPP systems show their advantages in decision support and data mining. In general, MPP is the better choice when the operations are independent of one another and the processing units need to communicate relatively little; otherwise it is not a good fit.
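The kind of work that suits MPP can be sketched in a few lines. The example below is a simplified model, not Greenplum code: the function names, the use of `zlib.crc32` as the distribution hash, and the sample rows are assumptions. Rows are hash-distributed by customer, each segment aggregates only its own rows, and the coordinator merely merges the partial results, so the segments never need to talk to each other.

```python
import zlib
from collections import defaultdict


def segment_for(key, num_segments):
    # Deterministic hash distribution (crc32 is an illustrative choice)
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, num_segments):
    # Send every row to the segment chosen by its distribution key
    segments = [[] for _ in range(num_segments)]
    for customer, amount in rows:
        segments[segment_for(customer, num_segments)].append((customer, amount))
    return segments


def local_totals(segment_rows):
    # Each segment aggregates only its own rows; no cross-segment traffic
    totals = defaultdict(int)
    for customer, amount in segment_rows:
        totals[customer] += amount
    return dict(totals)


def mpp_sum_by_customer(rows, num_segments):
    # All rows of one customer live on one segment, so the partial
    # results have disjoint keys and can simply be merged
    merged = {}
    for segment_rows in distribute(rows, num_segments):
        merged.update(local_totals(segment_rows))
    return merged


rows = [("alice", 10), ("bob", 5), ("alice", 7), ("carol", 3), ("bob", 1)]
result = mpp_sum_by_customer(rows, num_segments=4)
print(result)
```

If the query grouped by a column other than the distribution key, rows would have to be moved between segments before aggregating, which is exactly the communication cost that makes MPP a poor fit for some workloads.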

Software Advantages

Data Storage

Data volumes keep growing, and a database system with an MPP architecture is well suited to managing massive data.

Greenplum supports storing and processing massive data at the 50 PB scale (1 PB = 1000 TB). It integrates data from different source systems, departments, and platforms into one database for centralized storage, and keeps detailed historical data. Business users no longer face one information silo after another or get confused by discrepancies between different versions of the data, and IT staff have less management and maintenance complexity to deal with.

High Concurrency

As business intelligence spreads rapidly through the enterprise, both how often BI users access the analytics platform and how complex their queries are keep rising. The underlying database system therefore has to support highly concurrent queries. Greenplum supports concurrency through its powerful parallel processing capabilities.

Greenplum provides a resource management function (workload management) to manage database resources. Resource queues allocate resources by user group, for example the number of concurrently active sessions and the maximum resource value. With resource management, resources can be allocated by user level, the priority of users' SQL queries can be managed, and low-quality SQL (such as multi-table joins without join conditions) can be kept from consuming excessive system resources.
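The admission behaviour of a resource queue can be illustrated with a small model. This is a sketch of the concept only, not Greenplum's implementation or its SQL syntax: the `ResourceQueue` class, the `active_limit` parameter, and the queue name `adhoc_users` are hypothetical. A queue admits queries until its limit of active statements is reached and makes later queries wait until a slot frees up.

```python
from collections import deque


class ResourceQueue:
    """Toy admission control: at most `active_limit` queries run at once."""

    def __init__(self, name, active_limit):
        self.name = name
        self.active_limit = active_limit
        self.active = set()
        self.waiting = deque()

    def submit(self, query_id):
        # Admit the query if a slot is free, otherwise make it wait
        if len(self.active) < self.active_limit:
            self.active.add(query_id)
            return "running"
        self.waiting.append(query_id)
        return "queued"

    def finish(self, query_id):
        # Free the slot and promote the oldest waiting query, if any
        self.active.discard(query_id)
        if self.waiting and len(self.active) < self.active_limit:
            self.active.add(self.waiting.popleft())


# Hypothetical queue for ad hoc users, limited to two concurrent queries
adhoc = ResourceQueue("adhoc_users", active_limit=2)
states = [adhoc.submit(q) for q in ("q1", "q2", "q3")]
adhoc.finish("q1")
print(states, sorted(adhoc.active))
```

Assigning heavy ad hoc users to a queue with a low limit is one way a runaway query is kept from starving everyone else.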

Linear expansion

Like other distributed big data products such as Yonghong Z-DataMart, Greenplum uses a general-purpose MPP parallel processing architecture. Adding nodes to an MPP architecture increases the system's storage and processing capacity linearly. Scaling out nodes in Greenplum is straightforward, and data redistribution can be completed in a very short time.

Greenplum's support for linear expansion gives a technical guarantee for future growth of the data analysis system: users can expand capacity and performance as their needs require.
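A rough way to see why adding nodes helps is to count how many rows each segment holds before and after an expansion. The sketch below assumes a simple modulo-of-hash distribution with `zlib.crc32` and synthetic `order-N` keys; Greenplum's actual distribution and redistribution logic may differ. The function `rows_per_segment` is a hypothetical helper.

```python
import zlib


def rows_per_segment(keys, num_segments):
    # Count how many rows land on each segment under hash distribution
    counts = [0] * num_segments
    for key in keys:
        counts[zlib.crc32(key.encode("utf-8")) % num_segments] += 1
    return counts


# Synthetic distribution keys (an assumption for the example)
keys = [f"order-{i}" for i in range(100_000)]

before = rows_per_segment(keys, 4)   # cluster with 4 segments
after = rows_per_segment(keys, 8)    # same data after doubling to 8

print("before:", before)
print("after: ", after)
```

With an evenly spreading hash, each segment should hold roughly half as many rows after the number of segments doubles, so each one scans and stores about half as much. The rows that change segment during the expansion are the data that has to be redistributed.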

Cost-effectiveness

Greenplum Database nodes run on a variety of open hardware platforms, such as PC servers from vendors like SUN, HP, and DELL, and achieve high performance on ordinary x86 servers, so the cost-effectiveness is very high. Compared with dedicated, closed data warehouse systems, Greenplum's investment per TB is one fifth or even less. Likewise, the maintenance cost of Greenplum products is much lower than that of comparable vendors.

Speed of Response

We face a rapidly changing market. Whoever perceives market needs and changes first can take the lead, seize the initiative, and stay ahead of the competition.

Greenplum keeps the data warehouse updated in real time through near-real-time and real-time data loading, which makes an active data warehouse (ADW) possible. On top of the active data warehouse, business users can run real-time BI analysis on current business data ("Just In Time BI"), so the enterprise can sense market changes quickly and respond faster with decision support.

High Availability

Greenplum is a highly available system. Existing deployments include MPP clusters of up to 96 machines. In addition to hardware-level RAID, Greenplum provides mirroring at the database layer: each node's data is synchronously mirrored on other nodes, so the failure of a single node does not affect use of the whole system.

For the master node, Greenplum provides a Master/Standby mechanism for fault tolerance. When the master node fails, service can be switched over to the standby node and continue.
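The segment mirroring described in this section can be modelled in a few lines. This is an illustrative sketch under stated assumptions, not Greenplum's placement algorithm: the host names `sdw1` to `sdw3`, the ring-style placement (each mirror on the next host), and both function names are hypothetical. The point is only that when every primary has a mirror on a different host, losing one host still leaves a live copy of every segment.

```python
def place_segments(hosts, segments_per_host):
    """Give each primary segment a mirror on the next host in the ring."""
    placement = {}
    seg_id = 0
    for h, host in enumerate(hosts):
        mirror_host = hosts[(h + 1) % len(hosts)]
        for _ in range(segments_per_host):
            placement[seg_id] = {"primary": host, "mirror": mirror_host}
            seg_id += 1
    return placement


def serving_host(copies, failed_hosts):
    # Prefer the primary; fall back to the mirror if the primary's host is down
    if copies["primary"] not in failed_hosts:
        return copies["primary"]
    if copies["mirror"] not in failed_hosts:
        return copies["mirror"]
    return None  # both copies lost: the segment is unavailable


placement = place_segments(["sdw1", "sdw2", "sdw3"], segments_per_host=2)

# Simulate the loss of one host and see who serves each segment
after_failure = {
    seg: serving_host(copies, failed_hosts={"sdw2"})
    for seg, copies in placement.items()
}
print(after_failure)
```

In this model a second failure on the host that holds the mirrors of the first failed host would make some segments unavailable, which is why mirroring complements hardware-level RAID rather than replacing it.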

Easy to use system

Greenplum products are developed based on the popular PostgreSQL. Almost all PostgreSQL client tools and PostgreSQL applications can run on the Greenplum platform. There are abundant PostgreSQL resources on the Internet for users to refer to.

The latest development

Greenplum was acquired by EMC and integrated into EMC's cloud computing strategy.

In short: GP (Greenplum) is built on top of the open source PostgreSQL. GP itself was released as a closed-source commercial product, although its core was later open-sourced in 2015, while PostgreSQL has been open source all along.
