Main requirements
We need a stable, high-performance big-data database for statistical OLAP queries at moderate scale. The specific usage is as follows:
- Data can be written and queried in real time; concurrency (TPS) is not very high
- Data warehouse: mainly star schema, with some snowflake schema or wide tables
- Front-end presentation falls into three categories: Saiku, Grafana, and custom C# code
- Data volume: 300-500 million rows in the fact table; large dimension tables around 5 million rows
- Data integration: must integrate seamlessly with our current Kettle setup
Based on these requirements, we initially used TiDB, but its performance on general-scale OLAP queries was not very good. We also used TiSpark for offline computation, but it could not meet the system's latency requirements. We then took a first look at Greenplum's MPP architecture and ran a simple preliminary comparison.
Basic test: data table description
Data table
A wide table with approximately 300 fields
Basic test results (concurrency tests not included)
Basic cluster configuration:
Greenplum: 4 machines, 8 cores and 56 GB each, 9 segments; column-oriented storage, no indexes
TiDB: 6 machines, 8 cores and 56 GB each, SSD
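The Greenplum side of this test used column-oriented storage with no indexes. As a minimal sketch of how such a table might be declared, the snippet below builds the DDL as a string. The table name, column names, and distribution key are hypothetical; the `appendonly` and `orientation` storage options and the `DISTRIBUTED BY` clause are standard Greenplum `CREATE TABLE` syntax, but the actual table definition used in the test is not given here.

```python
def column_store_ddl(table, columns, dist_key):
    """Build a CREATE TABLE statement for an append-optimized,
    column-oriented Greenplum table with no indexes.

    columns is a list of (name, sql_type) pairs; dist_key must be one of them.
    """
    names = [name for name, _ in columns]
    if dist_key not in names:
        raise ValueError(f"distribution key {dist_key!r} is not a column")
    col_defs = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {table} (\n    {col_defs}\n)\n"
        "WITH (appendonly=true, orientation=column)\n"
        f"DISTRIBUTED BY ({dist_key});"
    )


if __name__ == "__main__":
    # Hypothetical three-column excerpt of a wide table.
    print(column_store_ddl(
        "order_wide",
        [("order_id", "bigint"), ("order_date", "date"), ("amount", "numeric(18,2)")],
        "order_id",
    ))
```

Choosing a high-cardinality distribution key such as an order ID is a common way to spread rows evenly across segments, though the right key depends on the join patterns of the real workload.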
TPC-DS
TPC-H
Remaining tests
Summary
- For OLAP queries and statistical analysis, Greenplum performs better than TiDB
- Without indexes, Greenplum is much slower than TiDB on queries that depend on an index; after adding the corresponding indexes, performance is roughly equal, although indexes are generally not recommended on Greenplum
- With column-oriented storage in Greenplum, the number of columns a query touches has a large impact on performance
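The last point follows from how a column store reads data: only the columns a query references are scanned, while a row store reads whole rows. A rough back-of-the-envelope sketch is shown below. The 8-byte average field width is an assumption, compression is ignored, and the row and field counts are taken from the requirement figures above for illustration only; none of these are measurements from the tests.

```python
def scanned_bytes(rows, total_cols, queried_cols, avg_field_bytes=8, columnar=True):
    """Estimate the bytes a full scan must read.

    A column store reads only the queried columns; a row store reads every
    column of every row. Compression and metadata are ignored.
    """
    if not 0 < queried_cols <= total_cols:
        raise ValueError("queried_cols must be between 1 and total_cols")
    cols_read = queried_cols if columnar else total_cols
    return rows * cols_read * avg_field_bytes


if __name__ == "__main__":
    rows, total_cols = 300_000_000, 300  # illustrative: fact-table rows, wide-table fields
    for queried in (5, 50, 300):
        col_gb = scanned_bytes(rows, total_cols, queried) / 1e9
        print(f"{queried:>3} columns queried: about {col_gb:.0f} GB scanned (columnar)")
    row_gb = scanned_bytes(rows, total_cols, 5, columnar=False) / 1e9
    print(f"row store, any column count: about {row_gb:.0f} GB scanned")
```

Under these assumptions a query that touches 5 of the 300 fields scans roughly 12 GB, while one that touches all 300 scans roughly 720 GB, the same as a row store, which is why selecting only the needed columns matters so much with columnar storage.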
Next verification steps
1. Performance under a star schema, with a fact table of 300 million rows and dimension tables of 5 million rows
2. Whether 230 million orders require a partitioned table
3. Whether Greenplum can be used for report-export scenarios
4. Whether SQL Server stored procedures can be migrated to Greenplum
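For the partitioning question in step 2, a test table could be declared with Greenplum's range partitioning. The sketch below generates such DDL with monthly partitions. The table name, columns, distribution key, and date range are all hypothetical, and monthly granularity is only one option to evaluate, not a recommendation drawn from the tests above.

```python
from datetime import date


def monthly_partition_ddl(table, columns, dist_key, part_col, start, end):
    """Build Greenplum DDL for a table range-partitioned by month.

    start and end are ISO dates (YYYY-MM-DD); start is inclusive, end exclusive.
    """
    names = [name for name, _ in columns]
    for key in (dist_key, part_col):
        if key not in names:
            raise ValueError(f"{key!r} is not a column")
    if date.fromisoformat(start) >= date.fromisoformat(end):
        raise ValueError("start must be earlier than end")
    col_defs = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {table} (\n    {col_defs}\n)\n"
        f"DISTRIBUTED BY ({dist_key})\n"
        f"PARTITION BY RANGE ({part_col})\n"
        f"(START (date '{start}') INCLUSIVE\n"
        f" END (date '{end}') EXCLUSIVE\n"
        " EVERY (INTERVAL '1 month'));"
    )


if __name__ == "__main__":
    # Hypothetical orders table covering one year.
    print(monthly_partition_ddl(
        "orders",
        [("order_id", "bigint"), ("order_date", "date"), ("amount", "numeric(18,2)")],
        "order_id", "order_date", "2019-01-01", "2020-01-01",
    ))
```

Running the same queries against this table and an unpartitioned copy, with filters on the partition column, would show whether partition elimination pays off at the order volume in question.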