POI data storage

  POI data storage covers recording and storing both the result data of the whole data-processing flow and the intermediate data that flow generates. Data stored at different stages of the process serves different purposes.

  For ease of description, each ingested source record is referred to here as a pp, and the fused result is called a poi.

  By purpose, we divide the stored data into several categories:

  1. Data output: the final result of the process, i.e. the POI data itself. This is the most important data.

  2. Data calculation: intermediate results of the process, for example the normalized output of each ingested pp. Each POI we output is fused from the several pp records that have been determined to describe the same POI, so the normalized result of every ingested pp is used again each time a fusion runs.

  3. History tracking: used for troubleshooting or for analyzing data changes, i.e. how a POI has changed over its history. For example, searching for Gubei Water Town on Gaode Map (Amap) returns a result containing its location, telephone number, ticketing information, reviews, and so on. These data may come from several sources: suppose the location comes from Gaode's own data production, the ticketing information from the Fliggy platform, and the reviews from Dianping and other platforms, and that these three sources are ingested as separate pp records. The poi then starts out with only the basic information and gradually gains the ticketing and review information. By analyzing each historical version of the poi together with the pp data associated with it, we can discover problems.
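  How the three categories relate can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline described here: the `PP` record, `normalize`, the source names, and the field-priority rule in `fuse` are all hypothetical, and matching pp records to the same POI is assumed to have been done already. It replays the Gubei Water Town example with placeholder values.

```python
from dataclasses import dataclass


@dataclass
class PP:
    """Hypothetical ingested source record ("pp"), already matched to a POI."""
    source: str   # illustrative source name, e.g. "gaode", "fliggy", "dianping"
    poi_id: str   # id of the POI this pp was matched to (matching not shown)
    attrs: dict   # raw attributes as delivered by the source


# Assumed conflict rule: sources earlier in this list win when fields collide.
PRIORITY = ["gaode", "fliggy", "dianping"]


def normalize(pp: PP) -> dict:
    """Assumed normalization: strip/lower-case keys, strip string values."""
    return {k.strip().lower(): v.strip() if isinstance(v, str) else v
            for k, v in pp.attrs.items()}


def rank(source: str) -> int:
    # Unknown sources sort after all known ones.
    return PRIORITY.index(source) if source in PRIORITY else len(PRIORITY)


def fuse(by_source: dict) -> dict:
    """Fuse the normalized pp records of one POI into a single poi."""
    poi = {}
    for source in sorted(by_source, key=rank):
        for key, value in by_source[source].items():
            poi.setdefault(key, value)  # keep the higher-priority value
    return poi


class Pipeline:
    def __init__(self):
        self.normalized = {}  # data calculation: poi_id -> {source: normalized pp}
        self.output = {}      # data output: poi_id -> latest fused poi
        self.history = {}     # history tracking: poi_id -> every fused version

    def ingest(self, pp: PP) -> dict:
        self.normalized.setdefault(pp.poi_id, {})[pp.source] = normalize(pp)
        poi = fuse(self.normalized[pp.poi_id])  # re-fuse from all stored pp
        self.output[pp.poi_id] = poi
        self.history.setdefault(pp.poi_id, []).append(dict(poi))
        return poi


# Gubei Water Town example; attribute values are placeholders.
p = Pipeline()
p.ingest(PP("gaode", "gubei", {"Name": "Gubei Water Town", "Location": "placeholder"}))
p.ingest(PP("fliggy", "gubei", {"Ticket": "placeholder ticket info"}))
p.ingest(PP("dianping", "gubei", {"Review": "placeholder review"}))
for version in p.history["gubei"]:
    print(sorted(version))
# ['location', 'name']
# ['location', 'name', 'ticket']
# ['location', 'name', 'review', 'ticket']
```

  Keeping the normalized pp per source is what lets every new ingestion re-run the fusion without re-normalizing earlier records, while the history list shows the poi gaining fields version by version.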

  Data stored for different purposes may be the same data. Take the poi: after fusion it is the result data output to users or to the serving side; it is also used in the data calculation process; and history tracking records the poi produced by every fusion run.

  How output data is stored depends on how it is output. When users need real-time access, a KV database is the appropriate choice, although a relational database can also work as long as its performance is sufficient. When data has to be output in batches, especially nationwide POI data (the total POI counts of Gaode (Amap), Baidu Maps and Tencent Maps have each reached about 70 million), a database clearly cannot meet the demand, and batch output is best done through Hive. Historical data must be queryable by id across multiple versions, for which HBase or another cost-effective KV database is the better option.
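  The history lookup (every version of a POI by id) can be illustrated with a common row-key design for ordered KV stores such as HBase: prefix the key with the POI id and append a reversed timestamp, so a prefix scan returns versions newest first. HBase can also keep multiple timestamped versions per cell on its own; the sketch below is only an in-memory stand-in under the reversed-key assumption. `VersionedKV`, `row_key`, `versions`, and `MAX_TS` are hypothetical names, not HBase's client API.

```python
import bisect
import json

MAX_TS = 10**13 - 1  # assumed bound: millisecond timestamps fit in 13 digits


class VersionedKV:
    """In-memory stand-in for an ordered KV store; not a real HBase client."""

    def __init__(self):
        self._keys = []    # row keys kept in sorted order
        self._values = {}  # row key -> serialized poi version

    @staticmethod
    def row_key(poi_id: str, ts_ms: int) -> str:
        # poi_id + reversed, zero-padded timestamp, so newer versions sort first.
        # Assumes poi_id never contains '#'.
        return f"{poi_id}#{MAX_TS - ts_ms:013d}"

    def put(self, poi_id: str, ts_ms: int, poi: dict) -> None:
        key = self.row_key(poi_id, ts_ms)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = json.dumps(poi, sort_keys=True)

    def versions(self, poi_id: str, limit=None) -> list:
        """Prefix scan: up to `limit` versions of poi_id, newest first."""
        prefix = f"{poi_id}#"
        out = []
        for key in self._keys[bisect.bisect_left(self._keys, prefix):]:
            if not key.startswith(prefix) or (limit is not None and len(out) >= limit):
                break
            out.append(json.loads(self._values[key]))
        return out


store = VersionedKV()
store.put("gubei", 1_000, {"name": "Gubei Water Town"})
store.put("gubei", 2_000, {"name": "Gubei Water Town", "ticket": "placeholder"})
store.put("other", 1_500, {"name": "another poi"})
print(store.versions("gubei", limit=1))  # only the newest version
print(len(store.versions("gubei")))      # 2
```

  Because all versions of one POI share a key prefix and sort newest first, reading "the latest version" or "the last N versions" is a short range scan rather than a full-table search, which is the access pattern history tracking needs.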

  Among these, data output and data calculation have the most urgent requirements: their storage must be stable and performant. History tracking, by contrast, does not affect current online services and is less urgent, while its storage volume far exceeds that of the other two, so part of its performance requirements can be sacrificed.


Origin www.cnblogs.com/dlgh/p/12041060.html