How to manage backpressure with Apache Beam

Victor Bartel :

I have a very basic Apache Beam pipeline that runs on GCP Dataflow. It reads data from Pub/Sub, transforms it, and writes it to a Postgres DB, all with the standard reader/writer components of Apache Beam. The issue is that when the pipeline starts receiving a really large amount of data, my Postgres end suffers from deadlock errors caused by processes waiting on ShareLocks.

It seems obvious that this happens because the Postgres end is being overloaded. My pipeline tries to write too much, too quickly, so to avoid this it simply needs to slow down. A mechanism such as backpressure would do that. I've tried to dig up information about configuring backpressure in Apache Beam, but unfortunately the official documentation seems to be silent on the matter.

I get overwhelmed with exceptions of the following kind:

java.sql.BatchUpdateException: Batch entry <NUMBER>
<MY_STATEMENT>
 was aborted: ERROR: deadlock detected
  Detail: Process 87768 waits for ShareLock on transaction 1939992; blocked by process 87769.
Process 87769 waits for ShareLock on transaction 1939997; blocked by process 87768.
  Hint: See server log for query details.
  Where: while inserting index tuple (5997152,9) in relation "<MY_TABLE>"  Call getNextException to see other errors in the batch.

I would like to know whether there is any backpressure toolkit or something similar that would help me manage this issue without writing my own PostgresIO.Writer.

Many thanks.

Alexey Romanenko :

Assuming that you use JdbcIO to write into Postgres, you can try increasing the batch size (see withBatchSize(long batchSize)), which is 1K records by default and is probably not enough.

Also, if you want to retry on SQL exceptions, make sure that you use a proper retry strategy (see withRetryStrategy(RetryStrategy retryStrategy)). In that case, FluentBackoff will be applied.
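
To make the retry idea concrete, here is a minimal sketch of a predicate that decides whether a failed batch looks like the deadlock from the question. It uses only java.sql, and the class and method names (DeadlockRetry, isDeadlock) are made up for illustration. It relies on PostgreSQL reporting deadlock_detected as SQLSTATE 40P01, and it walks the getNextException chain because the stack trace above suggests the real cause may be attached behind the BatchUpdateException. Whether the default JdbcIO retry strategy already covers this code may depend on your Beam version, so treat this as a starting point rather than the recommended implementation.

```java
import java.sql.SQLException;

// Hypothetical helper, not part of Apache Beam.
final class DeadlockRetry {
    // PostgreSQL reports "deadlock_detected" with SQLSTATE 40P01.
    static final String DEADLOCK_SQLSTATE = "40P01";

    // Upper bound on chain length, in case a driver builds a cyclic chain.
    private static final int MAX_CHAIN = 100;

    private DeadlockRetry() {}

    // Returns true if the exception, or any exception chained to it via
    // getNextException(), carries the deadlock SQLSTATE.
    static boolean isDeadlock(SQLException e) {
        int hops = 0;
        for (SQLException cur = e; cur != null && hops < MAX_CHAIN;
                cur = cur.getNextException(), hops++) {
            if (DEADLOCK_SQLSTATE.equals(cur.getSQLState())) {
                return true;
            }
        }
        return false;
    }
}
```

Assuming your JdbcIO write is built with JdbcIO.<T>write(), a predicate like this could presumably be passed as withRetryStrategy(DeadlockRetry::isDeadlock), since the retry strategy takes a SQLException and returns a boolean. Note that retries only smooth over deadlocks after they occur; they do not slow the pipeline down by themselves.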
