In a recent project, we had to manage many state transitions, so we chose to express the state flow with a state machine engine. Thanks to the expressive power of a state machine DSL (Domain Specific Language), the code is more elegant and easier to understand than equivalent if-else code. On the other hand, a state machine is very simple, nowhere near as flashy as a process engine.

At first we chose an open source state machine engine, but I did not find it easy to use, so I wrote a simple state machine that meets our requirements and is very much in the spirit of KISS (Keep It Simple and Stupid).

As part of the COLA open source project, I have open sourced this state machine (cola-statemachine); you can get it at https://github.com/alibaba/COLA.

While implementing the state machine, I was lucky enough to read "Domain Specific Languages" by Martin Fowler. The book gave me a different understanding of DSL.

That is why I wrote this article. I hope that after reading it, you will have a new understanding of what a DSL is, how to use DSLs, and how to use state machines.

DSL

Before introducing how to implement a state machine, let us first look at what a DSL is. Martin Fowler's book "Domain Specific Languages" opens by introducing DSLs through a state machine example. If you have time, I strongly recommend reading the book. If you don't, the content below will give you a general idea.

Below, I distill the key points of the book to give you a deeper understanding of DSL.

What is a DSL

DSL is a tool, and its core value is that it provides a means to communicate the intent of a certain part of the system more clearly.

This clarity is not just an aesthetic pursuit. The easier a piece of code is to understand, the easier it is to find errors in it and to modify the system. That is why we encourage meaningful variable names, clear documentation, and clear code structure. For the same reason, we should also encourage the adoption of DSLs.

By definition, a DSL is a computer programming language of limited expressiveness focused on a particular domain. This definition contains three key elements:

Language nature: a DSL is a programming language, so it must have a sense of fluency, where expressiveness comes not only from individual expressions but also from the way they are combined.
Limited expressiveness: a general-purpose programming language provides a wide range of capabilities, supporting varied data, control, and abstraction structures. These capabilities are useful, but they also make the language harder to learn and use. A DSL supports only the minimal set of features its domain requires. You cannot build an entire system with a DSL; instead, you use it to solve one particular aspect of the system.
Domain focus: a language with such limited capabilities is only useful in a small, well-defined domain. The domain focus is what makes such a language worthwhile.

For example, the regular expression /\d{3}-\d{3}-\d{4}/ is a typical DSL: it solves the problem of string matching in that specific domain.
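To see the regex DSL at work inside a general-purpose language, here is a small Java sketch using the standard java.util.regex API. The class and method names are illustrative only; the hand-written method performs the same check without the DSL, for comparison.

```java
import java.util.regex.Pattern;

class PhoneNumberMatcher {
    // The regex DSL states the rule in one line:
    // three digits, dash, three digits, dash, four digits.
    private static final Pattern PHONE = Pattern.compile("\\d{3}-\\d{3}-\\d{4}");

    // matches() succeeds only if the whole input fits the pattern.
    static boolean isPhoneNumber(String input) {
        return PHONE.matcher(input).matches();
    }

    // The same rule written by hand, without the DSL.
    static boolean isPhoneNumberByHand(String s) {
        if (s.length() != 12) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (i == 3 || i == 7) {
                if (c != '-') {
                    return false; // dashes must sit at positions 3 and 7
                }
            } else if (c < '0' || c > '9') {
                return false; // every other position must be an ASCII digit
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isPhoneNumber("555-123-4567"));       // true
        System.out.println(isPhoneNumberByHand("555-123-4567")); // true
        System.out.println(isPhoneNumber("5551234567"));         // false
    }
}
```

The regex version states the intent directly, while the loop buries the same rule in index arithmetic; that gap in clarity is exactly what a DSL closes.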

Classification of DSL

By type, DSLs fall into three categories: internal DSLs (Internal DSL), external DSLs (External DSL), and language workbenches (Language Workbench).

An internal DSL is a particular way of using a general-purpose language. A script written in an internal DSL is a legal program in the host language, but it has a specific style and uses only a subset of the language's features to handle one small aspect of the overall system. Programs written in such a DSL have a custom style that differs from ordinary code in the host language. Our state machine, for example, is an internal DSL: it does not support script configuration and is still plain Java when used, but that does not stop it from being a DSL.

    builder.externalTransition()
            .from(States.STATE1)
            .to(States.STATE2)
            .on(Events.EVENT1)
            .when(checkCondition())
            .perform(doAction());

An external DSL is a language that is separate from the main language of the application it works with. External DSLs usually have a custom syntax, but borrowing another language's syntax is also common (XML is a frequent choice); examples include the XML configuration files used by frameworks such as Struts and Hibernate.
A language workbench is a specialized IDE for defining and building DSLs. Simply put, a workbench is a productized, visual form of DSL.
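For contrast with the internal DSL snippet above, the following is a minimal sketch of an external DSL: a hypothetical plain-text format with one "SOURCE EVENT TARGET" transition per line, which Java code parses at runtime. The format and the TextTransitionTable class are assumptions made for illustration; they are not part of cola-statemachine.

```java
import java.util.HashMap;
import java.util.Map;

class TextTransitionTable {
    // Maps "SOURCE EVENT" to the target state.
    private final Map<String, String> targets = new HashMap<>();

    // Parses a script in the hypothetical format:
    // one "SOURCE EVENT TARGET" per line, '#' starts a comment line.
    static TextTransitionTable parse(String script) {
        TextTransitionTable table = new TextTransitionTable();
        for (String raw : script.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            String[] parts = line.split("\\s+");
            if (parts.length != 3) {
                throw new IllegalArgumentException("Expected SOURCE EVENT TARGET: " + line);
            }
            table.targets.put(parts[0] + " " + parts[1], parts[2]);
        }
        return table;
    }

    // Returns the target state, or null when no transition is defined.
    String next(String source, String event) {
        return targets.get(source + " " + event);
    }

    public static void main(String[] args) {
        // In a real system this text could be loaded from a file or a
        // configuration service, so rules change without redeploying code.
        String script = "# source event target\n"
                + "STATE1 EVENT1 STATE2\n"
                + "STATE2 EVENT2 STATE3\n";
        TextTransitionTable table = parse(script);
        System.out.println(table.next("STATE1", "EVENT1")); // STATE2
    }
}
```

The price of this flexibility is that you now own a parser and its error handling, which is part of why an external DSL costs more than an internal one.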

The three categories form a progression. An internal DSL is the simplest and cheapest to implement, but it does not support external configuration. A workbench supports both configuration and visualization, but it is also the most expensive to implement. Their relationship is shown in the figure below:

How to choose a DSL type

Each DSL type has its own use cases. When choosing, you can reason as follows:

Internal DSL: if you just want to make code easier to understand and do not need external configuration, I suggest an internal DSL; it is simple, convenient, and intuitive.
External DSL: if you need to change configuration at runtime, or do not want to redeploy code after changing configuration, consider this approach. For example, if you have a rule engine and want to add rules without republishing the code, an external DSL is worth considering.
Workbench: neither configuration files nor DSL scripts are user-friendly for non-developers. At Taobao, for example, the various product-oriented campaigns and control rules are very complicated and change rapidly. We need to give operations staff a workbench where they can set rules themselves and have them take effect promptly. In that situation a workbench is very useful.

All in all, use the right solution in the right place; no single trick works everywhere. The most notorious DSL, the process engine, is a case in point: it is heavily abused and over-designed, a typical example of making a simple problem complicated.

It is best not to add complexity for no reason. However, keeping things simple is not easy, especially in large companies, where we are expected not only to write code but also to build up impressive technology, ideally the kind that catches the boss's eye. As Nassim Nicholas Taleb wrote in "Antifragile":

In modern life, simplicity has become difficult to achieve, because it goes against the spirit of some people who seek complexity to justify their work.

Fluent Interfaces

When writing a software library, we have two choices: provide a command-query API, or provide a fluent interface. For example, Mockito's API when(mockedList.get(anyInt())).thenReturn("element") is a typical use of a fluent interface.

Fluent interfaces are an important way to implement an internal DSL. Why?

Because the fluency of such an interface improves readability and comprehensibility, what it provides is not just an API but also a domain language, that is, an internal DSL.

Take Mockito's API when(mockedList.get(anyInt())).thenReturn("element") again: it is well suited to the fluent style, and it is in fact a DSL for the specific domain of unit testing.

If this fluent interface were replaced with a command-query API, it would be much harder to express the domain of the test framework:

String element = mockedList.get(anyInt());
boolean isExpected = "element".equals(element);
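The difference is easier to see when both styles are implemented side by side. The sketch below is a hypothetical, heavily simplified stub class (it is not how Mockito works internally): setAnswer is a command-query style method, while when(...).thenReturn(...) reads like a sentence because when returns an intermediate object whose only method is thenReturn.

```java
import java.util.HashMap;
import java.util.Map;

class StubbedList {
    private final Map<Integer, String> answers = new HashMap<>();

    // Command-query style: one method with positional arguments.
    void setAnswer(int index, String value) {
        answers.put(index, value);
    }

    // Fluent style: when(...) starts a sentence that thenReturn(...) finishes.
    Stubbing when(int index) {
        return new Stubbing(index);
    }

    // The stubbed query.
    String get(int index) {
        return answers.get(index);
    }

    // Intermediate object: the only thing a caller can do next is thenReturn.
    final class Stubbing {
        private final int index;

        private Stubbing(int index) {
            this.index = index;
        }

        void thenReturn(String value) {
            answers.put(index, value);
        }
    }

    public static void main(String[] args) {
        StubbedList list = new StubbedList();
        list.setAnswer(0, "first");         // command-query
        list.when(1).thenReturn("element"); // fluent
        System.out.println(list.get(1));    // element
    }
}
```

Both calls store the same data; only the fluent one reads as the domain statement "when index 1 is requested, return element".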

Note that a fluent interface does more than provide method chaining and builder-style cascading calls, such as those of the Builder in OkHttpClient:

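import java.util.concurrent.TimeUnit;
import okhttp3.OkHttpClient;

// 5-second timeouts (5*1000 with TimeUnit.SECONDS would mean 5000 seconds)
OkHttpClient.Builder builder = new OkHttpClient.Builder();
OkHttpClient okHttpClient = builder
        .readTimeout(5, TimeUnit.SECONDS)
        .writeTimeout(5, TimeUnit.SECONDS)
        .connectTimeout(5, TimeUnit.SECONDS)
        .build();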

Its more important role is to constrain the order of method calls. For example, when building a state machine, to can only be called after from. A plain Builder cannot enforce this.

How is this done? We can combine the Builder pattern with fluent interfaces; I will explain this further in the state machine implementation section below.
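The core trick can be sketched in isolation: each step of the chain returns an interface that exposes only the next legal call, so an out-of-order chain fails to compile. The sketch below is a hypothetical, self-contained miniature; the interface names From, To and On mirror the article's, but this is not cola-statemachine's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class StepwiseBuilder {
    enum State { CREATED, PAID }
    enum Event { PAY }

    // Each step interface exposes only the call that may legally come next.
    interface From { To to(State target); }
    interface To { On on(Event event); }
    interface On { void perform(Runnable action); }

    // Registered transitions: "SOURCE --EVENT--> TARGET" mapped to its action.
    final Map<String, Runnable> transitions = new LinkedHashMap<>();

    // The only entry point is from(...), which hands back a From.
    From from(State source) {
        return new Steps(source);
    }

    // One object implements every step, but callers only ever see the
    // narrow interface returned by the previous call.
    private final class Steps implements From, To, On {
        private final State source;
        private State target;
        private Event event;

        private Steps(State source) {
            this.source = source;
        }

        @Override
        public To to(State target) {
            this.target = target;
            return this;
        }

        @Override
        public On on(Event event) {
            this.event = event;
            return this;
        }

        @Override
        public void perform(Runnable action) {
            transitions.put(source + " --" + event + "--> " + target, action);
        }
    }

    public static void main(String[] args) {
        StepwiseBuilder builder = new StepwiseBuilder();
        builder.from(State.CREATED).to(State.PAID).on(Event.PAY)
                .perform(() -> System.out.println("paid"));
        // builder.from(State.CREATED).on(Event.PAY); // does not compile: From has no on(...)
        System.out.println(builder.transitions.keySet()); // [CREATED --PAY--> PAID]
    }
}
```

Compared with a plain Builder such as OkHttpClient.Builder, where setters can be called in any order, here the compiler itself rejects a call to on(...) made directly after from(...).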

State machine

Okay, so much about DSL. Next, let us see how to implement an Internal DSL state machine engine.

State machine selection

I am opposed to abusing process engines, but I am not against state machines, mainly for the following two reasons:

First, a state machine can be implemented very lightly. The simplest state machine can be implemented with an Enum at essentially zero cost.
Second, expressing state flow with a state machine DSL makes the semantics clearer and improves the readability and maintainability of the code.
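As an illustration of the Enum approach, here is a minimal sketch with hypothetical order states, where each enum constant decides its own successor. It needs no library and no objects beyond the enum constants themselves.

```java
class EnumStateMachine {
    enum Event { PAY, SHIP, RECEIVE }

    // Each constant overrides next(...) to name its own successor.
    enum OrderState {
        CREATED {
            @Override
            OrderState next(Event event) {
                return event == Event.PAY ? PAID : this;
            }
        },
        PAID {
            @Override
            OrderState next(Event event) {
                return event == Event.SHIP ? SHIPPED : this;
            }
        },
        SHIPPED {
            @Override
            OrderState next(Event event) {
                return event == Event.RECEIVE ? COMPLETED : this;
            }
        },
        COMPLETED {
            @Override
            OrderState next(Event event) {
                return this; // final state: no outgoing transitions
            }
        };

        // Returns the next state; an event that does not apply leaves the state unchanged.
        abstract OrderState next(Event event);
    }

    public static void main(String[] args) {
        OrderState state = OrderState.CREATED;
        for (Event event : Event.values()) {
            state = state.next(event);
        }
        System.out.println(state); // COMPLETED
    }
}
```

Once transitions need guard conditions, actions, or several source states sharing one event, this per-constant logic grows quickly.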

However, although our business scenario is not particularly complicated, it still goes beyond what an Enum can handle, since an Enum only supports linear state transitions. So I first looked at existing solutions.

Open source state machine is too complicated

As with process engines, there are not many open source state machine engines. I focused on two of them: Spring Statemachine and Squirrel statemachine, currently the two most popular state machine implementations on GitHub. Their strength is that they are feature-complete; their weakness is also that they are feature-complete.

Of course, the authors of open source software cannot be blamed for this. An open source project has to cover broad needs, so at the very least it must support all the features listed in the UML State Machine specification.

For our project (and in fact for most projects), we really do not need so many advanced state machine features, such as nested states (substates), parallel states (parallel, fork, join), sub-state machines, and so on.

Poor performance of open source state machines

In addition, there is another problem I cannot tolerate: these open source state machines are all stateful. At first glance, a state machine should naturally maintain state. But on closer thought, that state is unnecessary. Because the state machine instance holds state, it is not thread-safe, and since our application servers are distributed and multi-threaded, a new state machine instance has to be built for every incoming request.

Take e-commerce transactions as an example. After the user places an order, we call a state machine instance to change the order state to "Order Placed". When the user then pays for the order, the request may be handled by another thread or even another server, so we must create a new state machine instance, because the original instance is not thread-safe.

Even setting aside the extra power consumption, this new-instance-per-request approach will run into performance problems if building the state machine is expensive and QPS is high.

Weighing complexity and performance (and the company's electricity bill), we decided to implement a state machine engine ourselves, with two clear design goals:

A concise state machine that only supports state transitions, without advanced features such as nested or parallel states.
The state machine itself must be stateless, so that a single singleton instance can serve all state transition requests.

State machine implementation

State machine domain model

Since our goal is a state machine that only supports simple state transitions, its core concepts are shown in the figure below and include:

State: a state
Event: an event; events trigger state changes
Transition: a transition from one state to another
External Transition: a transition between two different states
Internal Transition: a transition within the same state
Condition: a condition that decides whether a certain state may be reached
Action: the action performed when a certain state is reached
StateMachine: the state machine itself

The core Semantic Model of the entire state machine is also very simple, as shown in the following figure:

Note: I call it a Semantic Model here because that is the term used in the "DSL" book; you can also think of it as the domain model of the state machine. Martin uses the word "semantic" to contrast the external DSL script, which represents syntax, with the model behind it, which represents semantics. I think this metaphor is very apt.

OK, the core code of the state machine semantic model is as follows:

//StateMachine
public class StateMachineImpl<S,E,C> implements StateMachine<S, E, C> {

    private String machineId;

    private final Map<S, State<S,E,C>> stateMap;

    ...
}

//State
public class StateImpl<S,E,C> implements State<S,E,C> {

    protected final S stateId;

    private Map<E, Transition<S, E,C>> transitions = new HashMap<>();

    ...
}

//Transition
public class TransitionImpl<S,E,C> implements Transition<S,E,C> {

    private State<S, E, C> source;

    private State<S, E, C> target;

    private E event;

    private Condition<C> condition;

    private Action<S,E,C> action;

    ...
}
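The excerpt above elides the behavior. As a self-contained sketch of how these pieces could fit together, here is a tiny model using the same vocabulary (states, events, transitions, Condition, Action). The names MiniStateMachine, addTransition and fireEvent are illustrative assumptions, and returning the source state when nothing matches is a design choice of this sketch, not necessarily what cola-statemachine does.

```java
import java.util.HashMap;
import java.util.Map;

// Condition: decides whether a transition may fire for the given context.
interface Condition<C> {
    boolean isSatisfied(C context);
}

// Action: runs when a transition fires.
interface Action<S, E, C> {
    void execute(S from, S to, E event, C context);
}

// Transition: source -> target, guarded by a condition.
final class Transition<S, E, C> {
    final S source;
    final S target;
    final Condition<C> condition;
    final Action<S, E, C> action;

    Transition(S source, S target, Condition<C> condition, Action<S, E, C> action) {
        this.source = source;
        this.target = target;
        this.condition = condition;
        this.action = action;
    }
}

class MiniStateMachine<S, E, C> {
    // For each state, its outgoing transitions keyed by event.
    private final Map<S, Map<E, Transition<S, E, C>>> stateMap = new HashMap<>();

    void addTransition(S source, E event, S target, Condition<C> condition, Action<S, E, C> action) {
        stateMap.computeIfAbsent(source, s -> new HashMap<>())
                .put(event, new Transition<>(source, target, condition, action));
    }

    // Finds the transition, checks its condition, runs its action and returns
    // the target state. If nothing applies, the source state is returned.
    S fireEvent(S source, E event, C context) {
        Map<E, Transition<S, E, C>> outgoing = stateMap.get(source);
        Transition<S, E, C> t = outgoing == null ? null : outgoing.get(event);
        if (t == null || !t.condition.isSatisfied(context)) {
            return source;
        }
        t.action.execute(t.source, t.target, event, context);
        return t.target;
    }

    public static void main(String[] args) {
        MiniStateMachine<String, String, Integer> sm = new MiniStateMachine<>();
        sm.addTransition("CREATED", "PAY", "PAID",
                amount -> amount > 0,
                (from, to, event, amount) -> System.out.println(from + " -> " + to));
        System.out.println(sm.fireEvent("CREATED", "PAY", 100)); // prints "CREATED -> PAID", then "PAID"
        System.out.println(sm.fireEvent("CREATED", "PAY", 0));   // condition fails: "CREATED"
    }
}
```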

Fluent API for state machines

In fact, I wrote more code for the Builder and the fluent interface than for the core model. For example, our TransitionBuilder looks like this:

// One builder class implements every step interface; each method returns
// only the interface for the next legal step in the chain.
class TransitionBuilderImpl<S,E,C> implements ExternalTransitionBuilder<S,E,C>, InternalTransitionBuilder<S,E,C>, From<S,E,C>, On<S,E,C>, To<S,E,C> {

    final Map<S, State<S, E, C>> stateMap;

    private State<S, E, C> source;

    protected State<S, E, C> target;

    private Transition<S, E, C> transition;

    final TransitionType transitionType;

    public TransitionBuilderImpl(Map<S, State<S, E, C>> stateMap, TransitionType transitionType) {
        this.stateMap = stateMap;
        this.transitionType = transitionType;
    }

    // external transition, step 1: choose the source state
    @Override
    public From<S, E, C> from(S stateId) {
        source = StateHelper.getState(stateMap, stateId);
        return this;
    }

    // external transition, step 2: choose the target state
    @Override
    public To<S, E, C> to(S stateId) {
        target = StateHelper.getState(stateMap, stateId);
        return this;
    }

    // internal transition: source and target are the same state
    @Override
    public To<S, E, C> within(S stateId) {
        source = target = StateHelper.getState(stateMap, stateId);
        return this;
    }

    // attach a guard condition to the transition created by on()
    @Override
    public When<S, E, C> when(Condition<C> condition) {
        transition.setCondition(condition);
        return this;
    }

    // register the transition on the source state, triggered by this event
    @Override
    public On<S, E, C> on(E event) {
        transition = source.addTransition(event, target, transitionType);
        return this;
    }

    // last step: attach the action to run when the transition fires
    @Override
    public void perform(Action<S, E, C> action) {
        transition.setAction(action);
    }

}

With this fluent interface design, we enforce the order of calls. As shown in the figure below, from can only be called after externalTransition, and to can only be called after from, which ensures the correctness and coherence of the state machine's construction semantics.

Stateless design of state machine

So far, I have introduced the core model of the state machine and its fluent interface. One performance problem remains, the one I mentioned earlier: making the state machine stateless.

Looking at the open source state machine engines on the market, it is not hard to see that they are stateful mainly because they keep two states inside the state machine: the initial state and the current state. If we remove these two instance variables, the state machine becomes stateless, and only one instance of it is needed.

The key question is: can we do without these two states? Of course we can. The only side effect is that we can no longer ask a state machine instance for its current state. But I do not need that: when using the state machine, I simply pass in the source state, check the condition, execute the action, and get back the target state. The state machine is just a DSL expression of state flow, nothing more, so the whole operation can be completely stateless.

With the stateless design, a single state machine instance can serve all requests, which greatly improves performance.
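A minimal sketch of that idea, with hypothetical order states: the transition table is built once and never modified, there is no current-state field, and the caller passes in the source state (for example, the status stored on the order record). Because nothing is mutated after construction, one shared instance can serve concurrent requests. This illustrates the design principle only; it is not cola-statemachine's code.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class StatelessOrderMachine {
    enum State { CREATED, PAID, SHIPPED }
    enum Event { PAY, SHIP }

    // The single shared instance.
    static final StatelessOrderMachine INSTANCE = new StatelessOrderMachine();

    // Filled once in the constructor, then wrapped read-only;
    // there is no "current state" field anywhere.
    private final Map<State, Map<Event, State>> table;

    private StatelessOrderMachine() {
        Map<State, Map<Event, State>> t = new EnumMap<>(State.class);
        t.put(State.CREATED, Map.of(Event.PAY, State.PAID)); // Map.of needs Java 9+
        t.put(State.PAID, Map.of(Event.SHIP, State.SHIPPED));
        this.table = Collections.unmodifiableMap(t);
    }

    // Depends only on its arguments: (source, event) -> target.
    // An event with no transition leaves the state unchanged.
    State fire(State source, Event event) {
        Map<Event, State> outgoing = table.get(source);
        State target = outgoing == null ? null : outgoing.get(event);
        return target == null ? source : target;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Two orders in different states, handled concurrently by one instance.
        Future<State> first = pool.submit(() -> INSTANCE.fire(State.CREATED, Event.PAY));
        Future<State> second = pool.submit(() -> INSTANCE.fire(State.PAID, Event.SHIP));
        System.out.println(first.get() + " " + second.get()); // PAID SHIPPED
        pool.shutdown();
    }
}
```

The trade-off is the one described above: the machine can no longer report a current state, so that state has to live with the business data, such as an order's status field.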

Use state machine

Implementing the state machine is simple, and using it is not hard either. The following code shows all three kinds of transitions supported by cola-statemachine.

StateMachineBuilder<States, Events, Context> builder = StateMachineBuilderFactory.create();
//external transition
builder.externalTransition()
        .from(States.STATE1)
        .to(States.STATE2)
        .on(Events.EVENT1)
        .when(checkCondition())
        .perform(doAction());

//internal transition
builder.internalTransition()
        .within(States.STATE2)
        .on(Events.INTERNAL_EVENT)
        .when(checkCondition())
        .perform(doAction());

//external transitions
builder.externalTransitions()
        .fromAmong(States.STATE1, States.STATE2, States.STATE3)
        .to(States.STATE4)
        .on(Events.EVENT4)
        .when(checkCondition())
        .perform(doAction());

builder.build(machineId);

StateMachine<States, Events, Context> stateMachine = StateMachineFactory.get(machineId);
stateMachine.showStateMachine();

As you can see, this internal DSL state machine significantly improves the readability and understandability of the code, especially for relatively complex business state flows. For example, the following figure is the PlantUML diagram of a real project of ours, generated with cola-statemachine. Without a state machine, business code like this would be hard to understand and maintain.

This is the core value of DSL: expressing the design intent and business semantics of a part of the system more clearly. Of course, the configurability and flexibility of an external DSL are also very valuable, but cola-statemachine does not support them yet, for a simple reason: we do not need them for now.

In closing

If you found this article useful, please also check out my new book, "The Road to Code Improvement".

One last thing: my team is hiring. If you feel a little lost on your technical career path, you are welcome to join my team.
