[Bottleneck problems encountered in the development of Mycat]

The two problems below are not unique to Mycat; other distributed systems run into them as well. Any solution is a compromise: either trade time for space, or trade space for time.

1. Multiple aggregation problem

For example, suppose I have a log table with the columns department, user, module, and access time. The requirement is real-time statistics: at a given moment, which department, which user, and which system module produce the most accesses.

 

           select
                  department, `user`, access_time, module, count(*) as cn
           from access_log   -- hypothetical name for the log table
           group by department, `user`, access_time, module
           order by cn desc;

 

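With massive data, Mycat simply stalls. Each data node can run the GROUP BY on its own rows, but the same (department, user, access time, module) group may exist on several nodes, so Mycat has to merge the partial counts from every node and sort the combined result itself. When access time is fine-grained, almost every row becomes its own group, so that intermediate result is nearly as large as the raw log.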
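A minimal Python sketch of the merge step a sharding middleware has to perform for the query above, assuming each data node has already run the GROUP BY locally and returned a mapping from group key to partial count. It only illustrates the general scatter-gather idea; it is not Mycat's implementation (Mycat is written in Java), and `merge_partial_counts` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Iterable

# Hypothetical group key: (department, user, access_time, module)
GroupKey = tuple[str, str, str, str]


def merge_partial_counts(
    shard_results: Iterable[dict[GroupKey, int]],
) -> list[tuple[GroupKey, int]]:
    """Combine per-shard GROUP BY counts, then apply ORDER BY cn DESC.

    The same group key can appear on several shards, so the partial counts
    must be added together. Every distinct key has to be held in memory
    before the final sort, which is what hurts when the key has high
    cardinality (a fine-grained access_time makes most rows their own group).
    """
    totals: Counter = Counter()
    for partial in shard_results:
        totals.update(partial)  # Counter.update adds counts instead of replacing them
    # Highest count first; ties broken by key so the output is deterministic
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical partial results from two data nodes
shard_a = {("dev", "alice", "10:00", "billing"): 3, ("dev", "bob", "10:00", "crm"): 1}
shard_b = {("dev", "alice", "10:00", "billing"): 2, ("ops", "carol", "10:00", "crm"): 4}
print(merge_partial_counts([shard_a, shard_b]))
```

The memory needed grows with the number of distinct groups across all shards, not with the number of rows finally shown to the user, which is why adding a high-cardinality column to the GROUP BY is so costly.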

 

 

2. Deep paging problem

 

Deep paging in a clustered system

To understand why deep paging is a problem, suppose we are searching an index with five primary shards. When we request the first page of results (results 1 to 10), each shard produces its own top 10 results and returns them to the requesting node, which then sorts all 50 results to select the overall top 10.

Now suppose we request a much deeper page: results 10,001 to 10,010. The process is the same, except that each shard must now produce its top 10,010 results. The requesting node then sorts all 50,050 of them and discards 50,040!
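The scatter-gather procedure just described can be sketched in Python as follows. This is an illustrative model, not code from Mycat or any search engine: `shard_top_n`, `global_page`, and the synthetic shard contents are hypothetical. Because each shard's list already arrives sorted, the requesting node uses `heapq.merge` (a k-way merge) instead of a full re-sort, but the shards still have to produce and ship all of their top offset + size rows.

```python
import heapq
import itertools
from typing import Iterable, Sequence


def shard_top_n(rows: Iterable[int], n: int) -> list[int]:
    """Work done on each shard: return its own top n values, largest first."""
    return heapq.nlargest(n, rows)


def global_page(
    shards: Sequence[Iterable[int]], offset: int, size: int
) -> tuple[list[int], int]:
    """Requesting-node side of OFFSET/LIMIT paging across shards.

    Every shard ships its top (offset + size) rows, because any of them might
    belong on the requested page. The requesting node merges the sorted lists,
    skips the first `offset` rows and keeps `size`; the rest is thrown away.
    Returns the page and the number of rows shipped to the requesting node.
    """
    n = offset + size
    per_shard = [shard_top_n(rows, n) for rows in shards]
    shipped = sum(len(rows) for rows in per_shard)
    merged = heapq.merge(*per_shard, reverse=True)  # k-way merge of descending lists
    return list(itertools.islice(merged, offset, offset + size)), shipped


# Five synthetic shards with 20,000 values each: shard i stores i, i+5, i+10, ...
shards = [range(i, 100_000, 5) for i in range(5)]
page, shipped = global_page(shards, offset=10_000, size=10)  # results 10,001 to 10,010
print(page)
print(shipped)  # 50050
```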

 

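As you can see, in a distributed system the cost of sorting grows as paging goes deeper: every shard must produce more rows for each deeper page, and the requesting node must receive and sort shards × (offset + page size) of them, keeping only one page. This is one reason why web search engines do not return more than 1,000 results for any query.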
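The numbers in the example above (50 rows for the first page; 50,050 fetched and 50,040 discarded for results 10,001 to 10,010) follow from a simple count: with S shards, a page starting at offset o with page size k requires the requesting node to receive S × (o + k) rows and throw away all but k of them. The small Python helper below makes this concrete; the function names are illustrative only.

```python
def rows_fetched(shards: int, offset: int, size: int) -> int:
    """Rows the requesting node must receive and sort for one OFFSET/LIMIT query."""
    return shards * (offset + size)


def rows_discarded(shards: int, offset: int, size: int) -> int:
    """Rows that are fetched and sorted but never returned to the client."""
    return rows_fetched(shards, offset, size) - size


# Five shards, 10 results per page, at increasing depths (offset = rows skipped)
for offset in (0, 90, 990, 10_000):
    print(offset, rows_fetched(5, offset, 10), rows_discarded(5, offset, 10))
```

The rows fetched per query grow in proportion to the offset, and sorting them on the requesting node adds a logarithmic factor on top of that.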

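Why does requesting results 10,001 to 10,010 require each shard to return 10,010 results?

Because when results are sorted by an arbitrary dimension, no shard knows where its rows fall in the global ordering of the whole system. The rows that end up on the requested page could come from any shard in any proportion; in the worst case, all of the top 10,010 rows live on a single shard. Each shard must therefore send its own top 10,010 rows so that the requesting node can do the final sort.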
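A tiny hypothetical counterexample in Python shows why a shard cannot simply apply the offset to its own rows. Two shards hold interleaved values; asking for the 3rd and 4th largest values overall (offset 2, size 2) gives the wrong answer if each shard skips its own first two rows, and the right answer once each shard returns its top offset + size rows. `naive_page` and `correct_page` are names made up for this illustration.

```python
import heapq
from typing import Sequence


def naive_page(shards: Sequence[Sequence[int]], offset: int, size: int) -> list[int]:
    """Deliberately WRONG: each shard applies OFFSET/LIMIT to its own rows."""
    partial: list[int] = []
    for rows in shards:
        partial.extend(sorted(rows, reverse=True)[offset:offset + size])
    return sorted(partial, reverse=True)[:size]


def correct_page(shards: Sequence[Sequence[int]], offset: int, size: int) -> list[int]:
    """Each shard returns its top (offset + size) rows; only the requesting node skips `offset`."""
    partial: list[int] = []
    for rows in shards:
        partial.extend(heapq.nlargest(offset + size, rows))
    return sorted(partial, reverse=True)[offset:offset + size]


# Two shards whose values interleave in the global order: 10, 9, 8, 7, 6, 5, 4, 3
shard_a = [10, 8, 6, 4]
shard_b = [9, 7, 5, 3]
print(naive_page([shard_a, shard_b], offset=2, size=2))    # [6, 5]
print(correct_page([shard_a, shard_b], offset=2, size=2))  # [8, 7]
```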
