How to limit the number of active Spring WebClient calls

PVS:

I have a requirement where I read a bunch of rows (thousands) from a SQL DB using Spring Batch and call a REST Service to enrich content before writing them to a Kafka topic.

When using the Spring Reactive WebClient, how do I limit the number of active non-blocking service calls? Should I somehow introduce a Flux in the loop after I read data using Spring Batch?

(I understand the usage of delayElements and that it serves a different purpose: slowing things down when a single GET service call brings in a lot of data. My use case here is a bit different: I have many WebClient calls to make and would like to limit the number of concurrent calls to avoid out-of-memory issues, while still gaining the advantages of non-blocking invocations.)

Edwin Dalorzo:

Very interesting question. I gave it some thought and came up with a couple of ideas on how this could be done. I will share them here, and hopefully some of them help you with your investigation.

Unfortunately, I'm not familiar with Spring Batch. However, this sounds like a problem of rate limiting, or the classical producer-consumer problem.

So, we have a producer that produces so many messages that our consumer cannot catch up, and the buffering in the middle becomes unbearable.

The problem I see is that your Spring Batch process, as you describe it, is not working as a stream or pipeline, but your reactive Web client is.

So, if we were able to read the data as a stream, then as records start getting into the pipeline they would get processed by the reactive web client and, using back pressure, we could control the flow of the stream from the producer/database side.

The Producer Side

So, the first thing I would change is how records get extracted from the database. We need to control how many records get read from the database at a time, either by paging our data retrieval or by controlling the fetch size, and then, with back pressure, control how many of those are sent downstream through the reactive pipeline.

So, consider the following (rudimentary) database data retrieval, wrapped in a Flux.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;
import reactor.core.publisher.Flux;

Flux<String> getData(DataSource ds) {
    return Flux.create(sink -> {
        try {
            Connection con = ds.getConnection();
            con.setAutoCommit(false);
            //TYPE_FORWARD_ONLY and CONCUR_READ_ONLY let the driver stream rows
            PreparedStatement stm = con.prepareStatement(
                    "SELECT order_number FROM orders WHERE order_date >= '2018-08-12'",
                    ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
            stm.setFetchSize(1000);
            ResultSet rs = stm.executeQuery();

            //emit only as many rows as the subscriber has requested (back pressure)
            sink.onRequest(batchSize -> {
                try {
                    for (int i = 0; i < batchSize; i++) {
                        if (!rs.next()) {
                            //no more data, close resources!
                            rs.close();
                            stm.close();
                            con.close();
                            sink.complete();
                            break;
                        }
                        sink.next(rs.getString(1));
                    }
                } catch (SQLException e) {
                    //TODO: close resources here
                    sink.error(e);
                }
            });
        }
        catch (SQLException e) {
            //TODO: close resources here
            sink.error(e);
        }
    });
}

In the example above:

  • I control the number of records we read per batch to be 1000 by setting the fetch size.
  • The sink will send the number of records requested by the subscriber (i.e. batchSize) and then wait for it to request more using back pressure.
  • When there are no more records in the result set, then we complete the sink and close resources.
  • If an error occurs at any point, we send back the error and close resources.
  • Alternatively, I could have used paging to read the data, which would probably simplify resource handling, since a fresh query is issued at every request cycle (see the sketch after this list).
  • You may also want to consider doing something if the subscription is cancelled or disposed (sink.onCancel, sink.onDispose), since closing the connection and other resources is fundamental here.
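To make that paging alternative concrete, here is a rough sketch of what it could look like. It assumes a SQL dialect that supports OFFSET ... FETCH (e.g. SQL Server 2012+ or PostgreSQL) and, like the subscriber shown in the next section, that the consumer requests bounded batches rather than Long.MAX_VALUE.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.concurrent.atomic.AtomicLong;
import javax.sql.DataSource;
import reactor.core.publisher.Flux;

Flux<String> getDataPaged(DataSource ds) {
    return Flux.create(sink -> {
        AtomicLong offset = new AtomicLong(0);
        //each request cycle opens, uses and closes its own resources
        sink.onRequest(batchSize -> {
            try (Connection con = ds.getConnection();
                 PreparedStatement stm = con.prepareStatement(
                         "SELECT order_number FROM orders WHERE order_date >= '2018-08-12' " +
                         "ORDER BY order_number OFFSET ? ROWS FETCH NEXT ? ROWS ONLY")) {
                stm.setLong(1, offset.getAndAdd(batchSize));
                stm.setLong(2, batchSize);
                try (ResultSet rs = stm.executeQuery()) {
                    long emitted = 0;
                    while (rs.next()) {
                        sink.next(rs.getString(1));
                        emitted++;
                    }
                    //a short (or empty) page means we ran out of rows
                    if (emitted < batchSize) {
                        sink.complete();
                    }
                }
            } catch (SQLException e) {
                sink.error(e);
            }
        });
    });
}

Each batch pays the cost of a fresh query, but nothing has to stay open between requests, which removes the cancel/dispose cleanup concerns of the streaming version.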

The Consumer Side

On the consumer side, you register a subscriber that requests messages 1000 at a time and only requests more once it has processed that batch.

import org.reactivestreams.Subscription;
import reactor.core.publisher.BaseSubscriber;

getData(source).subscribe(new BaseSubscriber<String>() {

    private int messages = 0;

    @Override
    protected void hookOnSubscribe(Subscription subscription) {
        //ask the publisher for the first batch of 1000
        subscription.request(1000);
    }

    @Override
    protected void hookOnNext(String value) {
        //make http request
        System.out.println(value);
        messages++;
        if (messages % 1000 == 0) {
            //when we're done with a batch
            //then we're ready to request more
            upstream().request(1000);
        }
    }
});

In the example above, when the subscription starts, it requests a first batch of 1000 messages. In hookOnNext we process that batch, making HTTP requests using the WebClient.

Once the batch is complete, we request another batch of 1000 from the publisher, and so on.

And there you have it! Using back pressure, you control how many open HTTP requests you have at a time.
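As a closing thought, going back to the original question about capping WebClient calls: instead of hand-rolling the batching subscriber, you could also let Reactor's flatMap do the limiting, since its flatMap(mapper, concurrency) overload subscribes to at most that many inner publishers at a time. Here is a minimal sketch of wiring the WebClient into the pipeline this way; the base URL, the /enrich/{id} endpoint and the EnrichedOrder type are made-up assumptions for illustration, not part of any real API.

import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Flux;

//hypothetical enrichment service; URL, endpoint and EnrichedOrder are illustrative
WebClient client = WebClient.create("http://enrichment-service");

Flux<EnrichedOrder> enriched = getData(dataSource)
        .flatMap(orderNumber -> client.get()
                        .uri("/enrich/{id}", orderNumber)
                        .retrieve()
                        .bodyToMono(EnrichedOrder.class),
                50); //at most 50 HTTP calls in flight at any given time

//e.g. hand each enriched record to a Kafka producer as it arrives
enriched.subscribe(record -> { /* send to Kafka here */ });

Because flatMap only requests more elements from getData as in-flight calls complete, the same back pressure still reaches the database sink above.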

My example is very rudimentary and will require some extra work to make it production ready, but I hope it offers some ideas that can be adapted to your Spring Batch scenario.
