Spring JPA: What is the cost of saveAndFlush vs save?

skyman :

I have an application built from a set of microservices. One service receives data, persists it via Spring JPA and Eclipse link and then sends an alert (AMQP) to a second service.

Based on specific conditions, the second service then calls a RESTful web service against the persisted data to retrieve the saved information.

I have noticed that sometimes the RESTful service returns a null data set even though the data has been previously saved. Looking at the code for the persisting service, save has been used instead of saveAndFlush, so I assume that the data is not being flushed fast enough for the downstream service to query.

  • Is there a cost to saveAndFlush that I should be wary of, or is it reasonable to use it by default?
  • Would it ensure immediacy of data availability to downstream applications?

I should say that the original persistence function is wrapped in @Transactional.

Edwin Dalorzo :

Possible Prognosis of the Problem

I believe the issue here has nothing to do with save vs saveAndFlush. The problem seems related to the nature of Spring @Transactional methods, and a wrongful use of these transactions within a distributed environment that involves both your database and an AMQP broker; and perhaps, add to that poisonous mix, some basic misunderstandings of how the JPA context works.

In your explanation, you seem to imply that you start your JPA transaction within a @Transactional method, and during the transaction (but before it has committed) you send messages to an AMQP broker; and later, at the other side of the queue, a consumer application gets the messages and makes a REST service invocation. At this point you notice that the transactional changes from the publisher side have not yet been committed to the database and therefore are not visible to the consumer side.

The problem seems to be that you propagate those AMQP messages within your JPA transaction, before it has committed to disk. By the time the consumer reads a message and processes it, your transaction from the publishing side may not be finished yet, so those changes are not visible to the consumer application.
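
To make that concrete, here is a minimal sketch of that problematic pattern, assuming a Spring service with a Spring Data repository and a RabbitTemplate (the class, repository, entity and queue names are hypothetical, not taken from your application):

import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class IngestionService {

    private final DataRecordRepository repository;  // hypothetical Spring Data JpaRepository
    private final RabbitTemplate rabbitTemplate;

    public IngestionService(DataRecordRepository repository, RabbitTemplate rabbitTemplate) {
        this.repository = repository;
        this.rabbitTemplate = rabbitTemplate;
    }

    @Transactional
    public void persistAndNotify(DataRecord record) {
        repository.save(record);  // only staged in the JPA persistence context for now

        // With a non-transacted channel this message reaches the broker immediately,
        // while the database transaction above is still open.
        rabbitTemplate.convertAndSend("alerts", record.getId());

        // The database transaction commits only when this method returns.
    }
}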

If your AMQP implementation is Rabbit, then I have seen this problem before: when you start a @Transactional method that uses a database transaction manager, and within that method you use a RabbitTemplate to send a corresponding message.

If your RabbitTemplate is not using a transacted channel (i.e. channelTransacted=true), then your message is delivered before the database transaction has committed. I believe that by enabling transacted channels (disabled by default) in your RabbitTemplate you solve part of the problem.

<rabbit:template id="rabbitTemplate" 
                 connection-factory="connectionFactory" 
                 channel-transacted="true"/>

When the channel is transacted, the RabbitTemplate "joins" the current database transaction (which apparently is a JPA transaction). Once your JPA transaction commits, it runs some epilogue code that also commits the changes in your Rabbit channel, which forces the actual "sending" of the message.
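
If you happen to use Java configuration rather than XML, a sketch of the equivalent setup (class and bean names assumed) would be:

import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RabbitConfig {

    @Bean
    public RabbitTemplate rabbitTemplate(ConnectionFactory connectionFactory) {
        RabbitTemplate template = new RabbitTemplate(connectionFactory);
        // Make the channel transacted so that publishing joins the surrounding
        // transaction and the message is only sent once that transaction commits.
        template.setChannelTransacted(true);
        return template;
    }
}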

About save vs saveAndFlush

You might think that flushing the changes in your JPA context should have solved the problem, but you'd be wrong. Flushing your JPA context just forces the changes in your entities (until that point held only in memory) to be written to disk, but they are still written within a corresponding database transaction, which won't commit until your JPA transaction commits. That happens at the end of your @Transactional method (and, unfortunately, some time after you have already sent your AMQP messages, if you don't use a transacted channel as explained above).

So, even if you flush your JPA context, your consumer application won't see those changes (as per classical database isolation level rules) until your @Transactional method has finished in your publisher application.

When you invoke save(entity), the EntityManager does not need to synchronize any changes right away. Most JPA implementations just mark the entities as dirty in memory and wait until the last minute to synchronize all changes with the database and commit those changes at the database level.

Note: there are cases in which you may want some of those changes to go down to disk right away, and not wait until the whimsical EntityManager decides to do so. A classical example of this happens when there is a trigger on a database table that you need to run in order to generate some additional records that you will need later during your transaction. In that case you force a flush of the changes to disk so that the trigger is forced to run.
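
As a sketch of that scenario (the entities, repositories and the trigger itself are hypothetical, and the repositories are assumed to be injected Spring Data repositories):

@Transactional
public void createOrder(Order order) {
    // saveAndFlush forces the INSERT to be executed right away, which makes the
    // (hypothetical) trigger on the orders table fire and write its extra records.
    orderRepository.saveAndFlush(order);

    // Those trigger-generated rows are visible within this same transaction,
    // so they can be read before the method (and the transaction) ends.
    List<OrderAudit> auditRows = orderAuditRepository.findByOrderId(order.getId());
    // ... continue working with auditRows ...
}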

By flushing the context, you’re simply forcing a synchronization of the in-memory changes to disk, but this does not imply an instant database commit of those modifications. Hence, the changes you flush won't necessarily be visible to other transactions; most likely they won't be, given traditional database isolation levels.
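
Put in code form (again with hypothetical repository and entity names):

@Transactional
public void illustrateFlushVersusCommit(DataRecord record) {
    repository.save(record);          // entity is managed; the INSERT may be deferred

    repository.saveAndFlush(record);  // the INSERT is pushed to the database now...

    // ...but inside the still-open transaction. Under normal isolation levels other
    // transactions (e.g. the consumer's REST call) still cannot see the row; it only
    // becomes visible when this method returns and the transaction commits.
}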

The 2PC Problem

Another classical problem here is that your database and your AMQP broker are two independent systems. If this is about Rabbit, then you don't have a 2PC (two-phase commit).

So you may want to account for interesting scenarios, e.g. your database transaction successfully commits, but then Rabbit fails to commit your message, in which case you will have to repeat the entire transaction, possibly skipping the database side effects and just re-attempting to send the message to Rabbit.

You should probably read this article on Distributed transactions in Spring, with and without XA; the section on chained transactions is particularly helpful for addressing this problem.

They suggest a more complex transaction manager definition. For example:

<bean id="jdbcTransactionManager" class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
    <property name="dataSource" ref="dataSource"/>
</bean>

<bean id="rabbitTransactionManager" class="org.springframework.amqp.rabbit.transaction.RabbitTransactionManager">
    <property name="connectionFactory" ref="connectionFactory"/>
</bean>

<bean id="chainedTransactionManager" class="org.springframework.data.transaction.ChainedTransactionManager">
    <constructor-arg name="transactionManagers">
        <array>
            <ref bean="rabbitTransactionManager"/>
            <ref bean="jdbcTransactionManager"/>
        </array>
    </constructor-arg>
</bean>
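
If you prefer Java configuration, a roughly equivalent sketch (assuming the same bean names and that ChainedTransactionManager from Spring Data is on your classpath) would be:

import javax.sql.DataSource;

import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.amqp.rabbit.transaction.RabbitTransactionManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.transaction.ChainedTransactionManager;
import org.springframework.jdbc.datasource.DataSourceTransactionManager;

@Configuration
public class TransactionConfig {

    @Bean
    public DataSourceTransactionManager jdbcTransactionManager(DataSource dataSource) {
        return new DataSourceTransactionManager(dataSource);
    }

    @Bean
    public RabbitTransactionManager rabbitTransactionManager(ConnectionFactory connectionFactory) {
        return new RabbitTransactionManager(connectionFactory);
    }

    @Bean
    public ChainedTransactionManager chainedTransactionManager(
            RabbitTransactionManager rabbitTransactionManager,
            DataSourceTransactionManager jdbcTransactionManager) {
        // Same order as in the XML definition above: Rabbit first, then JDBC.
        return new ChainedTransactionManager(rabbitTransactionManager, jdbcTransactionManager);
    }
}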

And then, in your code, you just use that chained transaction manager to coordinate both your database transactional part and your Rabbit transactional part.

Now, there is still the potential that you commit your database part, but that your Rabbit transaction part fails.

So, imagine something like this:

@Retry // placeholder for your retry mechanism of choice, e.g. Spring Retry's @Retryable
@Transactional("chainedTransactionManager")
public void myServiceOperation() {
    if (workNotDone()) {
        doDatabaseTransactionWork();  // skipped on a retry if the database part already succeeded
    }
    sendMessagesToRabbit();
}

In this manner, if your Rabbit transactional part failed for any reason, and you were forced to retry the entire chained transaction, you would avoid repeating the database side effects and simply make sure to send the failed message to Rabbit.

At the same time, if your database part fails, then you never sent the message to Rabbit and there would be no problems.

Alternatively, if your database side effects are idempotent, then you can skip the check, just reapply the database changes, and re-attempt to send the message to Rabbit.

The truth is that what you're trying to do initially seems deceptively easy, but once you delve into the different problems and understand them, you realize it is a tricky business to do this the right way.
