Reduce Maven build execution time

Current state

We have a multi-module Spring Web MVC 4 application with about 100k lines of code. We use Maven and Azure DevOps. A simple build pipeline builds the application and runs all unit tests, about 2.8k of them. Well, to be honest, I would call most of them component tests or even integration tests. Let's clarify some definitions first.

What is a unit test?

There are many definitions of a unit test. I like the one by Roy Osherove:

"A unit test is an automated piece of code that invokes the unit of work being tested, and then checks some assumptions about a single end result of that unit. A unit test is almost always written using a unit testing framework. It can be written easily and runs quickly. It's trustworthy, readable, and maintainable. It's consistent in its results as long as production code hasn't changed."

There is a nice article about unit tests in Martin Fowler's bliki, so I won't go into much detail.

Suppose we expect the following from unit testing:

  • fast feedback (it has to run quickly)
  • short (a small amount of code)
  • tests one thing (one logical assertion)
  • fails for a reason

What is a component test?

Again, there is a nice article in the bliki.

Since our application uses Spring Data JPA, we have lots of repositories. Many of them contain custom query methods. We use Spring Test DBUnit to help us test them. The idea is simple: you set up a mock database (H2), import some test data before a test, run the test, assert, and then clean up the DB at the end. Since this approach creates the DB and a (small) Spring context with all necessary beans, I treat these tests as component tests.

These tests run longer than unit tests.

What is an integration test?

The bliki defines the purpose of integration tests as:

As the name suggests, the purpose of integration testing is to test whether many separately developed modules work together as expected.

In our case, we have Spring Web MVC controllers that expose RESTful APIs. For these, we usually create a complete Spring context (with a mocked H2 database) and test them with MockMvc. The benefit of this type of test is that you exercise the API together with all underlying services (business logic) and repositories (persistence). The huge disadvantage is the execution time. Creating a Spring context can take a long time (depending on the size of your application), and in many cases the context is destroyed and recreated. All of this adds up in the total build time.

What tests do I need?

If you are now inclined to keep just the unit tests because they are fast, and to ditch your component/integration tests because they are slow: hang on, not so fast. You need tests at all levels, in a reasonable ratio. In large applications you might split the test runs into fast and slow ones to speed up the feedback loop. You also need UI-driven tests, system tests, performance tests ... OK, the whole topic of high-quality software testing is very complex, and you might start reading here if you are interested. Let's get back to our topic for now.

The path to improve the situation

There is a nice article about what you can do. It tells you to rethink whether you need that many integration tests and to rewrite them as unit tests. As described above, do this with caution.

It is important to define the test strategy early. Otherwise, by the time you conclude that the problem has grown too large, you have to deal with many old tests that are costly to change. In corporate, large-scale projects this happens too often to be ignored. Whether the tests are old or not, there is still something we can do with reasonable effort and a positive return on investment.

Spring context

For various reasons, our Spring context was created 17 times in total during the test run. Each start takes about 30 seconds, so that adds 8.5 minutes (17 × 30 s) to the test run. You can use the suggestions from the article mentioned above to make sure the context is created only once. That would save us 8 minutes here.

DBUnit

If you use DBUnit, each test starts with importing data into the DB and ends with a cleanup. There are several strategies for this. Spring Test DBUnit expects these annotations on the test method:

@DatabaseSetup(value = "data.xml")
@DatabaseTearDown(value = "data.xml")

The default strategy is CLEAN_INSERT for both setup and teardown, which in fact deletes all data from the affected tables and then inserts it again. With pure DBUnit, you can configure the strategy explicitly, for example:

AbstractDatabaseTester databaseTester = new DataSourceDatabaseTester(dataSource);
databaseTester.setSetUpOperation(DatabaseOperation.CLEAN_INSERT);
databaseTester.setTearDownOperation(DatabaseOperation.DELETE_ALL);

Consider changing these strategies to faster ones. For example, use a plain INSERT instead of CLEAN_INSERT for setup. Of course, if the previous test did not clean up its tables correctly, this will cause problems. The CLEAN_INSERT teardown can be replaced with TRUNCATE_TABLE, which deletes everything even faster. There is a catch with TRUNCATE_TABLE on the H2 database; see the appendix for more information.

We have about 1,200 tests running with DBUnit. Imagine we save 100 ms on each setup and each teardown. That is 1,200 × 2 × 100 ms = 240 seconds, so we can save 4 minutes with relatively little effort.

Build time analysis

However, our situation was a bit more complicated. Our build execution time grew from about 9 minutes to about 25 minutes on average over the past 6 months.
[Chart: pipeline duration]
(The sudden peaks are caused by the fact that we recycle the build agents every now and then, and hence the Maven cache is gone.)

In addition, we experienced some builds that took up to 45 minutes, and some even exceeded 1 hour.

We conducted a simple analysis of the build log. The tests are configured to write trace-level output to the console, and every line carries a timestamp. There, we can see that certain operations suddenly took longer than usual. For example, the database teardown:

2020-03-13T12:59:21.9660309Z INFO  DefaultPrepAndExpectedTestCase - cleanupData: about to clean up 13 tables=[...]
2020-03-13T12:59:29.0545094Z INFO  MockServletContext - Initializing Spring FrameworkServlet ''

The first message is the test database teardown; the second is a new Spring context starting for another test. Pay attention to the timestamps: 7 seconds to clean up the database! Remember, we do this 1,200 times. After analyzing the output of one build (which took 45 minutes), we calculated the total time spent cleaning the database: 1,800 seconds. That is 2/3 of the build. It certainly shouldn't take that long.

Of course we already checked the infrastructure. The build agents are pretty decent t2.large EC2 instances with 2 vCPUs and 8GiB RAM powering Ubuntu. Should be OK.

I like quick and easy solutions to complex problems. We analyzed the build output using simple tools (shell and Excel). Let us proceed step by step:

  1. extract the timestamps when cleanup DB starts:

    cat buildoutput.log | grep "DefaultPrepAndExpectedTestCase - cleanupData" | sed 's/\([0123456789T.:-]*\).*DefaultPrepAndExpectedTestCase.*/\1/g' > cleanup_start.out
    

    (grep for the lines matching the substring, then using sed to only output the timestamp)

  2. extract the timestamps of the next messages - this will roughly be the end time of the cleanups:

    cat buildoutput.log | grep "DefaultPrepAndExpectedTestCase - cleanupData" -A 1 | grep -v "\-\-" | grep -v "DefaultPrepAndExpectedTestCase" | sed 's/\([0123456789T.:-]*\).*/\1/g' > cleanup_end.out
    

    (grep for the lines matching the substring and 1 line after, grep out the first line and the line with '--', then using sed to only output the timestamp)

  3. load both files into Excel

    • choose Data -> From Text/CSV (Alt+A, FT)
    • select the first file, cleanup_start.out
    • in the Power Query Editor, the Source and Change type steps should be added automatically; add a few more
    • the new Added column contains Time.Hour([Column1])*60*60+Time.Minute([Column1])*60+Time.Second([Column1])
    • the Removed columns step just deletes Column1
    • the sheet now contains the number of seconds (incl. fraction) elapsed since the start of the day
    • repeat the same for the second file, cleanup_end.out
    • add a diff column (end minus start)

    Now the diff column contains roughly the duration of each DB cleanup. I simply put a sum at the end of the column to calculate the 1,800 seconds mentioned above.

  4. plot a chart

    • select the whole third column
    • add a new line chart

    The chart now shows how long each DB cleanup took during the build. You can see important information here. It was all fine, and then we see sudden blocks of peaks. The pattern looks very suspicious.

    A short stare at the chart, some experience, and the 'blink' moment: the garbage collector! This explains why it takes 7 seconds to clean up the DB: a full GC is running in the background.
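
If you would rather skip the shell and Excel round trip, steps 1 to 3 can be approximated in a few lines of Java. This is only a sketch under some assumptions: the class name CleanupDurations and its methods are made up for illustration, every log line is assumed to start with an ISO-8601 timestamp followed by a space, and the end of a cleanup is approximated by the timestamp of the very next log line, as in step 2.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original build tooling.
public class CleanupDurations {
    // Substring that marks the start of a DB cleanup in the build log.
    static final String MARKER = "DefaultPrepAndExpectedTestCase - cleanupData";

    // Assumption: the line starts with an ISO-8601 timestamp followed by a space.
    public static Instant timestampOf(String line) {
        return Instant.parse(line.substring(0, line.indexOf(' ')));
    }

    // For each cleanup line, measure the time until the next log line.
    public static List<Duration> cleanupDurations(List<String> lines) {
        List<Duration> result = new ArrayList<>();
        for (int i = 0; i + 1 < lines.size(); i++) {
            if (lines.get(i).contains(MARKER)) {
                Instant start = timestampOf(lines.get(i));
                Instant end = timestampOf(lines.get(i + 1));
                result.add(Duration.between(start, end));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The two log lines quoted earlier in this article.
        List<String> lines = Arrays.asList(
            "2020-03-13T12:59:21.9660309Z INFO  DefaultPrepAndExpectedTestCase - cleanupData: about to clean up 13 tables=[...]",
            "2020-03-13T12:59:29.0545094Z INFO  MockServletContext - Initializing Spring FrameworkServlet ''");
        Duration total = Duration.ZERO;
        for (Duration d : cleanupDurations(lines)) {
            System.out.println(d.toMillis() + " ms");
            total = total.plus(d);
        }
        System.out.println("total: " + total.getSeconds() + " s");
    }
}
```

To run it against a real log, replace the hard-coded list with Files.readAllLines(Paths.get("buildoutput.log")). Plotting the durations still needs a charting tool of your choice.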

Improving the situation

We checked the POM of the affected module, and guess what we found in the surefire plugin configuration:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-surefire-plugin</artifactId>
    <version>${surefire.version}</version>
    <configuration>
        <argLine>${surefireArgLine} -Dfile.encoding=UTF-8 -Xmx1024m</argLine>
    </configuration>
</plugin>

Yes, -Xmx1024m: the Java maximum heap space is explicitly set to 1024m. This is not enough for our tests. Even if we set Xmx in the build pipeline, it would not be picked up, because surefire runs the tests in a forked JVM that takes its options from argLine. This explains everything: when the heap reaches its maximum, a full GC is triggered to free some memory. This slows the test execution down substantially.

The first thing to do: increase Xmx to at least 2048m (or leave it out so that the JVM default applies, typically 1/4 of the physical RAM).
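
To confirm which limit the forked test JVM actually runs with, one quick check is to print the maximum heap from inside that JVM, for example from a throwaway test. The sketch below uses only Runtime.getRuntime().maxMemory() from the standard library; the class name HeapCheck is made up for illustration.

```java
// Hypothetical helper for checking the effective heap limit of the current JVM.
public class HeapCheck {
    // Maximum heap the JVM will attempt to use, in MiB.
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap available to this JVM: " + maxHeapMiB() + " MiB");
    }
}
```

The reported value can be slightly lower than the configured -Xmx, depending on the garbage collector in use, so treat it as a rough confirmation rather than an exact readout.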

Since we are here, let me tell you one more thing. If your tests are well isolated and can run independently, and your hardware is powerful enough, use the following surefire configuration:

<configuration>
    <forkCount>2</forkCount>
    <parallel>classes</parallel>
    <threadCountClasses>2</threadCountClasses>
</configuration>

This setting means that there will be 2 surefire booter forks (the fork is what actually executes the tests, in an isolated JVM), that 2 threads will be used to run the tests, and that the test suite is split by classes for distribution. The important thing to remember here is that threadCountClasses applies within the same (forked) JVM, meaning that there will be 2 threads per fork. Therefore, if you run into race conditions, simply reduce threadCountClasses to 1. The tests will still run in parallel, with 1 thread per fork. And since there is usually one H2 instance per JVM, that should be fine, unless you have another kind of shared data outside the JVM.

The result we achieved: the build time went down from 25 minutes to 11 minutes, which is only 44% of the previous time.

Conclusion

Understand the tools you use (e.g. MockMvc, DBUnit, Spring Test DBUnit, the Surefire plugin). Analyze your build output. Don't just say that your application is growing, the test suite is growing, and so the build time will get longer. Of course, this is true, but you should always reserve some time to refactor and improve the code.

Appendix

DBUnit H2 TRUNCATE_TABLE operation

Truncate table is usually a faster operation than delete from. There are some catches, though. Read more about it here. H2 won't allow you to truncate a table if there are foreign keys referencing that table. However, there is a special H2 syntax that you can make use of. We implemented the following DBUnit operation:

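import java.sql.SQLException;
import java.util.HashSet;
import java.util.Set;
import java.util.Stack;

import org.dbunit.DatabaseUnitException;
import org.dbunit.database.DatabaseConfig;
import org.dbunit.database.IDatabaseConnection;
import org.dbunit.database.statement.IBatchStatement;
import org.dbunit.database.statement.IStatementFactory;
import org.dbunit.dataset.IDataSet;
import org.dbunit.dataset.ITableIterator;
import org.dbunit.dataset.ITableMetaData;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class H2TruncateOperation extends org.dbunit.operation.AbstractOperation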
{
    private final Logger logger = LoggerFactory.getLogger(H2TruncateOperation.class);

    @Override
    public void execute(IDatabaseConnection connection, IDataSet dataSet)
            throws DatabaseUnitException, SQLException
    {
        logger.debug("execute(connection={}, dataSet={}) - start", connection, dataSet);
        IDataSet databaseDataSet = connection.createDataSet();
        DatabaseConfig databaseConfig = connection.getConfig();
        IStatementFactory statementFactory = (IStatementFactory) databaseConfig
                .getProperty("http://www.dbunit.org/properties/statementFactory");
        IBatchStatement statement = statementFactory.createBatchStatement(connection);

        try
        {
            int count = 0;
            Stack<String> tableNames = new Stack<>();
            Set<String> tablesSeen = new HashSet<>();
            ITableIterator iterator = dataSet.iterator();

            String tableName;
            while(iterator.next())
            {
                tableName = iterator.getTableMetaData().getTableName();
                if(!tablesSeen.contains(tableName))
                {
                    tableNames.push(tableName);
                    tablesSeen.add(tableName);
                }
            }

            if(!tableNames.isEmpty())
            {
                statement.addBatch("SET FOREIGN_KEY_CHECKS=0");
            }

            for(; !tableNames.isEmpty(); ++count)
            {
                tableName = tableNames.pop();
                ITableMetaData databaseMetaData = databaseDataSet.getTableMetaData(tableName);
                tableName = databaseMetaData.getTableName();
                String sql = "TRUNCATE TABLE " +
                             this.getQualifiedName(connection.getSchema(), tableName,
                                                   connection) +
                             " RESTART IDENTITY";
                statement.addBatch(sql);
                if(logger.isDebugEnabled())
                {
                    logger.debug("Added SQL: {}", sql);
                }
            }

            if(count > 0)
            {
                statement.addBatch("SET FOREIGN_KEY_CHECKS=1");
                statement.executeBatch();
                statement.clearBatch();
            }
        }
        finally
        {
            statement.close();
        }
    }
}

Example SQL statement:

SET FOREIGN_KEY_CHECKS=0;
TRUNCATE TABLE table1 RESTART IDENTITY;
TRUNCATE TABLE table2 RESTART IDENTITY;
SET FOREIGN_KEY_CHECKS=1;
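
To see which batch the operation builds for a given data set without starting a database, the statement-building part can be reproduced with plain collections. This is a library-free sketch, not the operation itself: the class name TruncateBatchSketch is made up, and the schema qualification done by getQualifiedName in the real code is left out. Note that, as in the operation above, tables are popped from a stack, so they are truncated in the reverse order of their first appearance in the data set.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, library-free illustration of the batch built by H2TruncateOperation.
public class TruncateBatchSketch {
    public static List<String> buildBatch(List<String> dataSetTables) {
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        for (String table : dataSetTables) {
            // Keep only the first occurrence of each table name.
            if (seen.add(table)) {
                stack.push(table);
            }
        }

        List<String> batch = new ArrayList<>();
        if (stack.isEmpty()) {
            return batch; // nothing to truncate, nothing to execute
        }
        batch.add("SET FOREIGN_KEY_CHECKS=0");
        while (!stack.isEmpty()) {
            // Last table seen is truncated first.
            batch.add("TRUNCATE TABLE " + stack.pop() + " RESTART IDENTITY");
        }
        batch.add("SET FOREIGN_KEY_CHECKS=1");
        return batch;
    }

    public static void main(String[] args) {
        for (String sql : buildBatch(Arrays.asList("table1", "table2", "table1"))) {
            System.out.println(sql + ";");
        }
    }
}
```

For the input table1, table2, table1 the sketch emits the truncate for table2 before the one for table1, wrapped in the two SET FOREIGN_KEY_CHECKS statements.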

You can pass this operation to the plain DBUnit configuration as follows:

AbstractDatabaseTester databaseTester = new DataSourceDatabaseTester(dataSource);
databaseTester.setTearDownOperation(new H2TruncateOperation());

Or implement a new database operation lookup (if you are using Spring Test DBUnit):

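import com.github.springtestdbunit.annotation.DatabaseOperation;
import com.github.springtestdbunit.operation.DefaultDatabaseOperationLookup;

public class H2SpecificDatabaseOperationLookup extends DefaultDatabaseOperationLookup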
{
    @Override
    public org.dbunit.operation.DatabaseOperation get(DatabaseOperation operation)
    {
        return operation == DatabaseOperation.TRUNCATE_TABLE ?
               new H2TruncateOperation() :
               super.get(operation);
    }
}

And use annotations:

@DbUnitConfiguration(databaseOperationLookup = H2SpecificDatabaseOperationLookup.class)

from: https://dev.to//vladonemo/reducing-the-maven-build-execution-time-1o7k
