Reading Notes on "The Road to Big Data: Alibaba's Big Data Practice", Chapter 14: Storage and Cost Management

  • In the big data era, applications such as mobile internet, social networks, data analysis, and cloud services have spread rapidly and placed revolutionary demands on data centers. Storage management has become one of the core concerns of IT. With data growing explosively, storage management faces a series of challenges, and its goal is to reduce the consumption of storage resources and save storage cost.

 

1. Data compression

  • To improve data availability and performance, a distributed file system usually stores data in 3 replicas, so 1 TB of logical data actually occupies 3 TB of physical space. MaxCompute currently provides an archive compression method. It uses a compression algorithm with a higher compression ratio and saves the data as a RAID file: instead of simply keeping 3 replicas, the data is stored in the Pangu RAID file format with the default (6, 3) layout, that is, 6 data blocks + 3 parity blocks. This lowers the storage ratio from 1:3 to about 1:1.5, saving roughly half of the physical space. Archive compression does carry some risk: if a data block is corrupted or a machine fails, the lost block has to be rebuilt from the remaining data and parity blocks, which takes longer than reading another replica, and read performance suffers. For this reason, archive compression is generally applied to cold backup data and log data. For example, some very large Taobao log data is rarely accessed once it is older than a certain period, yet it is important data that cannot be regenerated. For such data, consider archiving the historical partitions so that they are stored as RAID files and take up less space.

 

ALTER TABLE A PARTITION (ds='20130101') ARCHIVE;

 

 

The command output reports the logical storage (File size) before and after the archive.
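
The space figures from the data compression section can be checked with simple arithmetic. The sketch below is only an illustration in Python, not MaxCompute code: the function names are hypothetical, and it models 3-replica storage and the (6, 3) layout only, ignoring any extra gain from the compression algorithm itself.

```python
def physical_size_replicated(logical_tb: float, replicas: int = 3) -> float:
    """Physical space when every block is stored `replicas` times."""
    return logical_tb * replicas


def physical_size_raid(logical_tb: float, data_blocks: int = 6,
                       parity_blocks: int = 3) -> float:
    """Physical space under a (data_blocks, parity_blocks) layout:
    every `data_blocks` blocks of data carry `parity_blocks` extra blocks."""
    return logical_tb * (data_blocks + parity_blocks) / data_blocks


logical_tb = 1.0                                 # 1 TB of logical data
before = physical_size_replicated(logical_tb)    # 3 replicas
after = physical_size_raid(logical_tb)           # (6, 3) layout
saving = 1 - after / before                      # fraction of space saved

print(f"replicated: {before:.1f} TB, archived: {after:.1f} TB, saved: {saving:.0%}")
```

Running it prints `replicated: 3.0 TB, archived: 1.5 TB, saved: 50%`, which matches the 1:3 and 1:1.5 storage ratios quoted in the section.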


Origin blog.csdn.net/u012965373/article/details/105509524