Lessons from studying the source code of complex Java open source software

    Reading source code can be great fun, much like thinking through a game of chess.
    In my usual J2EE projects I concentrate on business logic. With a framework in place, most of the work is writing beans, DAOs, services, and actions, and the functionality is mainly create, read, update, and delete (CRUD). Code like that is bound to be dull. But I have analyzed the code of several open source systems, and studying them turned out to be very interesting. Some of their designs have natural prototypes in everyday life: sometimes it feels like designing a factory that processes products, sometimes like designing a playground that serves the public. These are the impressions I most often take away from such code:
    [Depth] - The design ideas in the code reflect the depth of the author's thinking;
    [Norms] - The coding style reflects the author's working attitude and understanding of what conventions are for;
    [Integration] - The code is modular yet fits together as a whole, showing high cohesion and low coupling;
    [Breadth] - New technologies are used extensively, reflecting the author's in-depth study of similar code and technologies.
    This article draws on my earlier analysis of Alibaba's Druid (Alibaba's database connection pool, known for its monitoring features) to share what I learned, and to work out how, faced with the same functional requirements, one would build such a product step by step. The three basic features of object orientation are encapsulation, inheritance, and polymorphism, but when designing complex software the points below are the ones I have felt most strongly. (The writing is a bit long-winded, but some of the processes need to be worked through carefully, and spelling them out saves you from retracing them later.)

1. Combination (that is, composition: holding or referencing another object) is one of the most important techniques.
    A human being is, in essence, the sum of his social relationships. Human society is complex because the relationships among individuals and organizations are interwoven.
1. Long-term combination
    Whether it is a car, a fly, or even a person, everything is made up of countless subsystems and parts, so combination is an essential technique for building complex software.
    Combination mainly means that one class appears as a reference field of another class. Put simply, you can use another object's functions only if you hold a reference to it, that is, only if you know where it is. Very often the references are mutual: I know the other party, and the other party knows me. The usual way to code this is that when I take a reference to the other object, I pass my own this to it.
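    A minimal sketch of such a mutual reference follows. Pool and Housekeeper are hypothetical classes invented for this illustration and are not taken from Druid; the point is only that the owner passes this when it creates the object it holds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes for illustration; they are not part of Druid.
class Pool {
    private final List<String> connections = new ArrayList<>();
    // Long-term combination: the pool holds its housekeeper for its whole life.
    private final Housekeeper housekeeper;

    Pool() {
        // Pass `this` so that the housekeeper can call back into the pool.
        this.housekeeper = new Housekeeper(this);
    }

    void add(String connection) { connections.add(connection); }
    int size() { return connections.size(); }
    Housekeeper housekeeper() { return housekeeper; }
}

class Housekeeper {
    private final Pool pool; // the reference back to the owner

    Housekeeper(Pool pool) { this.pool = pool; }

    // Top the pool up to the requested minimum size.
    void fill(int minSize) {
        while (pool.size() < minSize) {
            pool.add("conn-" + pool.size());
        }
    }
}

class MutualReferenceDemo {
    public static void main(String[] args) {
        Pool pool = new Pool();
        pool.housekeeper().fill(3);
        System.out.println(pool.size()); // prints 3
    }
}
```

    The housekeeper can work only because it knows where its pool is, and the pool can delegate only because it holds the housekeeper.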
    A more complicated case is when a composite object has already been assembled and it establishes a reference relationship with yet another object, so that the other object can use the objects inside the composite. In a large software system the references between objects become very complex... Only by abstracting a dynamic and static model of the relationships among the core objects can complex software be built.
    Another kind of object that an object holds internally is a thread, which works for its owner the whole time.
    For example, DruidDataSource is the data source object. It has many properties, and the important objects it holds are:
    DruidConnectionHolder[]: for now, think of the data source as having an array that contains all of its connections. This is the so-called connection pool.
    CreateConnectionThread and DestroyConnectionThread: these two thread objects create and destroy connections. It is like a large room that switches on more fluorescent lamps when many people are present and switches some off when few are; in the same way the threads change the size of the connection pool dynamically.
    ReentrantLock: the data source is a shared service, and wherever callers compete for it, a lock is needed.
    List<Filter>: clearly a list of filters. Since each data source holds its own filters, filters are configured independently per data source. Why not configure them once for everything? For flexibility. Why not configure them on smaller objects? That level of detail is probably unnecessary; this is a trade-off learned from experience. Why not configure them on the connection object? Connections are constantly created and destroyed, so they are not stable enough. Why not build one new object that contains, say, both the ReentrantLock and the List<Filter>, and let the data source hold that? Quite simply, a company does not merge a business department with an administrative department to form a new one. And why not hold each filter in a separate field? First, the number of filters is not fixed, so a list is more flexible to configure. Second, tools of the same kind are better grouped together, just as a general does not command soldiers one by one.
    The services and DAOs in everyday code get this simple, one-directional form of long-term combination through Spring. Before Spring, some people passed the dependency in through the constructor, some passed it in at the point of use, and some set the field directly, which was messy.
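    The objects held by the data source can be pictured with the sketch below. It is a heavily simplified illustration under stated assumptions, not Druid's implementation: MiniDataSource, MiniHolder, and MiniFilter are hypothetical names, only a create thread is shown (no destroy thread), and the two Condition objects are one plausible way to let the lock coordinate callers with that thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, heavily simplified stand-ins; these are not Druid's real classes.
interface MiniFilter {
    String name();
}

class MiniHolder {
    final int id;
    MiniHolder(int id) { this.id = id; }
}

class MiniDataSource {
    private final MiniHolder[] pool;   // the "connection pool": idle holders live here
    private int poolingCount = 0;      // how many slots of the array are filled
    private int created = 0;           // how many holders exist in total
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition(); // a holder was added
    private final Condition needMore = lock.newCondition(); // a caller found the pool empty
    private final List<MiniFilter> filters = new ArrayList<>(); // configured per data source
    private final Thread createThread; // a worker that serves only this data source

    MiniDataSource(int maxActive) {
        pool = new MiniHolder[maxActive];
        createThread = new Thread(this::createLoop, "mini-create");
        createThread.setDaemon(true);  // do not keep the JVM alive
        createThread.start();
    }

    List<MiniFilter> filters() { return filters; }

    // Borrow a holder; if none is idle, wake the create thread and wait.
    MiniHolder take() {
        lock.lock();
        try {
            while (poolingCount == 0) {
                needMore.signal();
                notEmpty.awaitUninterruptibly();
            }
            MiniHolder holder = pool[--poolingCount];
            pool[poolingCount] = null;
            return holder;
        } finally {
            lock.unlock();
        }
    }

    // Give a borrowed holder back to the pool.
    void recycle(MiniHolder holder) {
        lock.lock();
        try {
            pool[poolingCount++] = holder;
            notEmpty.signal();
        } finally {
            lock.unlock();
        }
    }

    // Body of the create thread: add one holder whenever the pool is empty and below its limit.
    private void createLoop() {
        while (true) {
            lock.lock();
            try {
                while (poolingCount > 0 || created >= pool.length) {
                    needMore.awaitUninterruptibly();
                }
                pool[poolingCount++] = new MiniHolder(++created);
                notEmpty.signal();
            } finally {
                lock.unlock();
            }
        }
    }
}

class MiniDataSourceDemo {
    public static void main(String[] args) {
        MiniDataSource ds = new MiniDataSource(2);
        MiniHolder a = ds.take();
        MiniHolder b = ds.take();
        System.out.println(a.id + " " + b.id);
        ds.recycle(a);
        ds.recycle(b);
    }
}
```

    Every field here is a long-term combination: the array, the lock, the filter list, and the thread all live exactly as long as the data source that holds them.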

2. Temporary combination (holding)
    Temporary combination usually involves a relatively stable object handling an object that comes and goes, and it can be more complicated than the long-term kind. It resembles providing a service: a doctor and a hospital form a relatively stable combination, while a doctor and a patient form a temporary one. It is also like a printer and paper: the sheet goes in blank and comes out with text and images.
    Each connection in Druid is a changing object, a bit like students in a school or soldiers in a barracks: the camp stays, the soldiers come and go. When there are too few, more are recruited; when there are too many, some are discharged. DruidConnectionHolder[] holds the connections, and the two threads check the count from time to time.
    Another important changing object in Druid is the filter chain, FilterChain. The Filter was mentioned earlier; it is held by the data source, DruidDataSource. How are these three related? Take a medical checkup: the checkup form in each person's hand is the filter chain, each department (doctor) is a filter, and the hospital is the data source. Every time a new person arrives for a checkup, a new form is created, and the form holds a reference to the hospital; you cannot switch hospitals halfway through. Or take a factory with several machines: each processing order is a filter chain. The filter chain is a light, temporary object, while a filter is a heavy, permanent one.
    What does the actual processing look like? First, a data source holds a set of filters (for example a statistics filter, a logging filter, and a security filter). Each time a monitored operation runs, for example getConnection, the data source creates a filter chain, filterChain, if it has filters configured. (Every filter chain holds the same data source, and through the data source it can find the filters.) The filter chain is responsible for running the filters before and after the actual function. The core of the filter chain is a counter, like the check marks on a checkup form that show which items are done. The last step in the filter chain is always the direct execution of the actual function. Before that, the chain hands the work to the filters of the data source it holds, one by one, advancing the counter as it goes.

    To summarize so far: to execute a function, first create a filter chain, let the chain find the filters and run them one by one, and finally execute the actual function.
    If it worked exactly like a checkup, it would be simple: run the filters one by one, then perform the core function. But it turns out to be more complicated: when the filter chain calls a filter, it passes both itself and the data source to the filter. Why pass itself to the filter? Why hand the checkup form to the doctor? Why must I tell the doctor which hospital this is? Could I not keep the form, tick off each item myself when it is done, and then go on to the next one?
    The reason is that a filter does not necessarily act only before the core function; it can also act after the core function has completed. That is where recursive calls come in. You come to me for a checkup item, but I, the doctor (filter), require the other items and the core function to be finished before I complete my own step, so your case stays pending with me. You therefore have to give me the checkup form and the hospital. I send the form on so that the next doctor (filter) works first, and that next step needs your form and the information about the other doctors (the filters are held by the data source, which is why the data source is passed in). The next doctor may do exactly the same: take the form and the hospital and send the form on to the following step. The result is a call stack.

    Take obtaining a connection with getConnection from DruidDataSource as an example, to review the whole process and why certain parameters are passed:
    1. If the data source has filters configured, it creates a new filter chain, filterChain, passing this to represent the data source. Otherwise it obtains the connection directly.
    2. When a filterChain is needed, the task of obtaining the connection is handed over to it. Why not have the chain only run the filters and then hand control back so that the data source obtains the connection itself? For the reason given above: filtering happens both before and after the core function, so there is recursion.
    3. Since the core function is now performed by the filterChain, the chain must be equipped to perform it. The real work still has to be done by DruidDataSource, so the DruidDataSource is passed to the filterChain as a parameter; in other words, the data source passes its own this so that the chain can call back into it at the right moment. Listeners and callbacks follow a similar pattern. Note that this is passed once at construction in step 1 (long-term combination) and again when the work is requested (temporary combination); more on this later.
    4. The filterChain's method is the entry point that wraps the core function in filtering; look at its dataSource_connect method. If the counter shows that filters remain, the next filter is asked to produce the connection; if the counter shows that all filters have run, the chain produces the connection directly and returns it. Does this already look a little like a recursive call?
    5. The filterChain now calls a filter to do its work. We can guess that the filter will not do the core work directly. So which parameters are passed when the filter is called? First, the filterChain has to pass itself in, because once the filter has done its part it lets the filterChain continue; the filterChain lets the next filter work, and that filter calls back into the filterChain, for as long as filters remain...
    6. In addition, the DruidDataSource parameter is also passed to the filter. The filterChain has to do the core work, so it needs this parameter. But why does the filter need it? In fact the filter simply hands it back when it calls back into the filterChain. A real-life analogy: I am about to eat and holding a bowl when I decide to go to the toilet, so I tell someone about myself and let him hold the bowl; when he calls me back later, he returns the bowl to me. I am a little unclear on this point myself. The chain has held the bowl ever since it was created in step 1. When it wants to eat, why must it be told where the bowl is? While I was at the toilet I was in fact holding the bowl the whole time; I did not need to give it to someone else and have it handed back. It need not be so troublesome, right?
    7. After the filterChain passes itself and the data source to the filter, the filter does its own work and also calls the filterChain again, returning the bowl to it. The order of these two things can be arranged as needed: the filter may do its recording before the call, or after it.
    8. Finally, a word about the proxies introduced later. At the end the FilterChain does the core work, such as creating a connection or a result set. The objects it returns are all wrappers around the original objects.
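    The steps above can be sketched as follows. This is a simplified illustration, not Druid's code: SketchDataSource, SketchChain, SketchFilter, and LoggingFilter are hypothetical names, and a plain String stands in for the connection. The sketch shows the counter in the chain, the data source passing this both at construction and in the call, and a filter acting before and after the core function.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified types; Druid's real Filter and FilterChain are far larger.
interface SketchFilter {
    // The filter receives the chain so it can hand control back,
    // and the data source so it can pass it along on that call.
    String connect(SketchChain chain, SketchDataSource dataSource);
}

class SketchChain {
    private final SketchDataSource owner; // long-term reference, set at construction
    private int pos = 0;                  // the counter: index of the next filter to run

    SketchChain(SketchDataSource owner) { this.owner = owner; }

    String connect(SketchDataSource dataSource) {
        List<SketchFilter> filters = owner.filters();
        if (pos < filters.size()) {
            // Filters remain: advance the counter and let the next filter take over.
            return filters.get(pos++).connect(this, dataSource);
        }
        // Every filter has been entered: perform the core operation.
        return dataSource.createRawConnection();
    }
}

class SketchDataSource {
    private final List<SketchFilter> filters = new ArrayList<>();
    final List<String> log = new ArrayList<>(); // records the order of events

    List<SketchFilter> filters() { return filters; }

    String getConnection() {
        if (filters.isEmpty()) {
            return createRawConnection(); // no filters configured: connect directly
        }
        // One short-lived chain per operation; `this` is passed twice, once to
        // construct the chain and once as the argument of the call.
        return new SketchChain(this).connect(this);
    }

    String createRawConnection() {
        log.add("core");
        return "raw-connection";
    }
}

class LoggingFilter implements SketchFilter {
    private final String name;

    LoggingFilter(String name) { this.name = name; }

    @Override
    public String connect(SketchChain chain, SketchDataSource dataSource) {
        dataSource.log.add(name + " before");    // work done before the core function
        String conn = chain.connect(dataSource); // hand the "bowl" back to the chain
        dataSource.log.add(name + " after");     // work done after the core function
        return conn;
    }
}

class FilterChainDemo {
    public static void main(String[] args) {
        SketchDataSource ds = new SketchDataSource();
        ds.filters().add(new LoggingFilter("stat"));
        ds.filters().add(new LoggingFilter("log"));
        ds.getConnection();
        System.out.println(ds.log);
        // prints [stat before, log before, core, log after, stat after]
    }
}
```

    The nested before/after order in the output is the call stack described earlier: each filter is still waiting on the chain while the later filters and the core function run.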



    Summary:
    As you can see, the calls and references among the three core objects, the data source (hospital), the filters (doctors), and each filter chain (checkup form), are complicated, but they are easy to understand once you think through the dynamic process. The filters configured in web.xml in J2EE have a doFilter method whose core works the same way. This is not copying code but borrowing an idea.
    Speaking of recursion, mutual recursion between objects is slightly more complicated than recursion within a method. Since the objects call each other, they must reference each other. Here the mutual reference is between the filter and the filter chain: I call you and pass myself to you, you call me again and pass yourself to me... What, the chain reaches the filters through the data source rather than holding them directly? That is only because the data source is what holds the filters.

    Extension:
    I have also seen code that uses FreeMarker in which an object is passed to a template, the template uses the object, and the object in turn calls the template... That too is a fairly rare kind of code.
    I also once saw some code on the Internet in which three threads have to print their own content in turn. The original code uses a lock: after a thread acquires the lock it has to check whether it is its own turn, and after running it notifies the other waiting threads. But the notification may wake the wrong thread, or the same thread again, which is inefficient. It occurred to me that the three thread objects could be strung together recursively instead: a coordinator object holds all three, and each object holds only the references it needs. It is still multi-threaded, but the efficiency problem goes away. A real-life picture: an adult supervises three children eating. The adult tells the first child to eat and to report back when done (the adult passes himself to the child). When the child has eaten, the child tells the adult (the child passes himself to the adult), so the adult can decide who is next and check how much is left. The calls continue in turn until everything is finished.
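    One possible way to realize "each object only references what it needs" is a ring of semaphores, sketched below. This is a substitute technique chosen for illustration, not the code found online and not necessarily what the author had in mind; TurnWorker and TurnCoordinator are hypothetical names. Each worker waits on its own semaphore and releases only its successor's, so no thread is ever woken just to discover that it is not its turn.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Semaphore;

// Hypothetical sketch: each worker knows only its own semaphore and its successor's.
class TurnWorker implements Runnable {
    private final String text;
    private final Semaphore mine;
    private final Semaphore next;
    private final int rounds;
    private final List<String> out;

    TurnWorker(String text, Semaphore mine, Semaphore next, int rounds, List<String> out) {
        this.text = text;
        this.mine = mine;
        this.next = next;
        this.rounds = rounds;
        this.out = out;
    }

    @Override
    public void run() {
        for (int i = 0; i < rounds; i++) {
            mine.acquireUninterruptibly(); // wait for my turn
            out.add(text);                 // "print" my content
            next.release();                // wake exactly one thread: my successor
        }
    }
}

class TurnCoordinator {
    // Starts one thread per text and returns everything they emitted, in order.
    static List<String> run(int rounds, String... texts) {
        int n = texts.length;
        List<String> out = Collections.synchronizedList(new ArrayList<>());
        Semaphore[] turns = new Semaphore[n];
        for (int i = 0; i < n; i++) {
            turns[i] = new Semaphore(i == 0 ? 1 : 0); // only the first worker may start
        }
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            threads[i] = new Thread(
                    new TurnWorker(texts[i], turns[i], turns[(i + 1) % n], rounds, out));
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return out;
    }
}

class TurnDemo {
    public static void main(String[] args) {
        System.out.println(TurnCoordinator.run(2, "A", "B", "C"));
        // prints [A, B, C, A, B, C]
    }
}
```

    The coordinator holds all the workers and wires up the ring, which corresponds to the adult in the analogy deciding who eats next.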
   
    One more question. The analyses I have read before left me a little dizzy when I looked at their class diagrams, while their swimlane diagrams were too simple. I do not know of any UML diagram that clearly expresses this combination of dynamic and static design. Perhaps it would be a class diagram that shows how the initial state changes, perhaps an animation of the calls between classes, or perhaps this common pattern of recursive calls between objects could become a UML template.


2. Proxy (also called wrapper or adapter) is another important technique.
    It is related to the combination described above. If the part you combine is a complete module already developed by someone else and you want to use it, this is the technique to pay attention to.
    The most obvious feature of Druid's design is that almost every existing JDBC object has been turned into a Proxy object. To monitor the operation of each JDBC object, that is, to insert your own work at every step, each object must be given a proxy. Everything is handed to the proxy; the proxy does the added work (filtering) in between, and then the core object does the original work.
    java.sql.Connection is proxied as ConnectionProxy. A proxy object must hold the object it proxies. There are also proxies built by inheritance: the overriding method does its own work and then calls super to do the real work.
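    The two styles can be compared in a small sketch. Channel, RawChannel, and the two counting classes are hypothetical names used only for illustration; Druid's own proxies wrap JDBC interfaces such as java.sql.Connection, which have far too many methods to show here.

```java
// Hypothetical interface and classes; not Druid's proxy types.
interface Channel {
    String send(String message);
}

class RawChannel implements Channel {
    @Override
    public String send(String message) {
        return "sent:" + message;
    }
}

// Proxy by holding: implements the same interface and delegates to the wrapped object.
class CountingChannelProxy implements Channel {
    private final Channel target; // the proxied object
    private int calls = 0;

    CountingChannelProxy(Channel target) { this.target = target; }

    @Override
    public String send(String message) {
        calls++;                     // the added work (monitoring)
        return target.send(message); // the original work
    }

    int calls() { return calls; }
}

// Proxy by inheritance: do the added work, then call super for the original work.
class CountingRawChannel extends RawChannel {
    private int calls = 0;

    @Override
    public String send(String message) {
        calls++;
        return super.send(message);
    }

    int calls() { return calls; }
}

class ProxyDemo {
    public static void main(String[] args) {
        CountingChannelProxy proxy = new CountingChannelProxy(new RawChannel());
        System.out.println(proxy.send("hello")); // prints sent:hello
        System.out.println(proxy.calls());       // prints 1
    }
}
```

    The holding style can wrap any implementation of the interface, including one written by someone else, while the inheritance style is tied to one concrete class.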
    Back to monitoring in Druid: every operation must be monitored, in other words passed through the filters once. A filter chain is created for this, and it is born and dies within the operation, so its life cycle is very short. It follows that a filter should be a singleton that holds no per-operation data, while a filter chain is created for each operation and holds the counter, and the two call each other recursively.

    Take createBlob() in ConnectionProxyImpl as an example:
    1. createChain() --- A chain should be created for each operation, and in the source code this amounts to little more than a new. So why is it written as a separate method?
    2. Blob value = chain.connection_createBlob(this); --- The starting point of the recursive filter calls must be a method of the filter chain, and the final function must also be performed in the filter chain.
    3. recycleFilterChain(chain); --- After each method of ConnectionProxyImpl has been called, the count is reset to 0. There is in fact no need to create a new chain each time: once the count is back at 0, the chain is as good as new. That is why the code is written this way. It also implies that the methods of one ConnectionProxyImpl must not be called concurrently, otherwise there would be a problem.
    4. When the connection is created, a filter chain is created with it, and the connection itself holds that filter chain. Every time the connection does something, the count is reset. The core function is still performed by the native object the connection holds.
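    The reuse idea in these steps can be sketched as below, following only what the text describes: hand out a chain, run the operation, reset the counter, and keep the chain for next time. ReusableChain and ProxySketch are hypothetical names, the single cached slot is an assumption, and the sketch is deliberately not thread-safe, which illustrates why such a proxy must not be used concurrently.

```java
// Hypothetical sketch of the reuse idea; names and structure are illustrative only.
class ReusableChain {
    private final int filterCount;
    private int pos = 0; // the counter: how many filters have been entered

    ReusableChain(int filterCount) { this.filterCount = filterCount; }

    boolean hasNextFilter() { return pos < filterCount; }
    int nextFilterIndex() { return pos++; }
    int position() { return pos; }
    void reset() { pos = 0; } // a chain whose counter is 0 is as good as new
}

class ProxySketch {
    private final int filterCount;
    private ReusableChain cached; // at most one idle chain is kept (an assumption)

    ProxySketch(int filterCount) { this.filterCount = filterCount; }

    // Hand out the cached chain if there is one, otherwise create a new chain.
    ReusableChain createChain() {
        ReusableChain chain = cached;
        if (chain == null) {
            return new ReusableChain(filterCount);
        }
        cached = null; // the chain is now in use
        return chain;
    }

    // Reset the counter and keep the chain for the next operation.
    void recycleFilterChain(ReusableChain chain) {
        chain.reset();
        cached = chain;
    }

    // One proxied operation: walk all filters, then do the core work.
    String operate() {
        ReusableChain chain = createChain();
        while (chain.hasNextFilter()) {
            chain.nextFilterIndex(); // a real filter would run here
        }
        String result = "done after " + chain.position() + " filters";
        recycleFilterChain(chain);
        return result;
    }
}

class ChainReuseDemo {
    public static void main(String[] args) {
        ProxySketch proxy = new ProxySketch(2);
        System.out.println(proxy.operate()); // prints done after 2 filters
        System.out.println(proxy.operate()); // prints done after 2 filters
    }
}
```

    The second call to operate() reuses the chain object from the first call; only the counter had to be reset.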

    Fundamentally, the caller does not know what it is actually calling: the original object, a proxy, or an adapter. The adapter in turn may not know which target it adapts to; that depends on the caller's parameters. As for adapters, the place I have seen them used most is Alibaba's Dubbo, for example to adapt to different communication protocols.


3. The core functions of complex software are not hard to understand, but they are quite difficult to implement
    To do it well, your knowledge must be comprehensive, covering depth, breadth, and norms, and for every part you develop you must both understand the overall picture and consider details like those above. The list of topics is long: MBean, SPI, mock, NIO, protocol, ZooKeeper, ClassLoader, Redis, factory, serialization, annotation, multicast, Netty, invoke, thread pool, ReentrantLock, handler, holder, LoadBalance, Cluster, ConsistentHash, MD5, SHA-1, LRU... You also need to work with other existing products, such as parser, init...
    Compared with the technologies we use in our everyday projects, this list is a world apart. If you are learning Java, Thinking in Java is only an introductory book. Only by reading a few brick-thick books, keeping up with the latest technology, and building on the work of predecessors can you make an excellent product.

    Hats off to the experts at Alibaba!
