DataX parameter tuning

1. Speed tuning

If only a total speed limit is set, channels can end up running at very different speeds. This causes data skew, and the slowest channel can make the whole job very slow.

For example, suppose the total speed limit is 100 records per second, the first channel runs at 99 records per second, and the second channel runs at 1 record per second, which adds up to 100 records per second. If each channel needs to process 10,000 records, the first channel finishes very early, while the second takes far longer and holds up the job. This is similar to data skew in Hadoop. To avoid the problem, configure a rate limit for each individual channel.
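The arithmetic behind this example can be checked with a short sketch. It assumes each channel drains its 10,000 records at a constant rate and that the job ends when the slowest channel ends; the function names and the evenly split 50/50 case are illustrative, not taken from DataX.

```python
RECORDS_PER_CHANNEL = 10_000  # records each channel must process (from the example)


def channel_time(records: int, records_per_second: float) -> float:
    """Seconds one channel needs to process its share at a fixed rate."""
    return records / records_per_second


def job_time(rates: list[float]) -> float:
    """The job finishes only when its slowest channel finishes."""
    return max(channel_time(RECORDS_PER_CHANNEL, rate) for rate in rates)


skewed = [99, 1]   # 99 + 1 = 100 records/s in total, as in the example
even = [50, 50]    # same total, split evenly (illustrative)

print(job_time(skewed))  # 10000.0 seconds, dominated by the 1 record/s channel
print(job_time(even))    # 200.0 seconds
```

Both configurations respect the same total limit of 100 records per second, yet under these assumptions the skewed one takes 50 times longer.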
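A per-channel limit might be expressed as in the sketch below. It assumes the key names `job.setting.speed.record` (job-wide cap) and `core.transport.channel.speed.record` (per-channel cap), which is how DataX job files are commonly written; check them against your DataX version. The per-channel value of 50 records per second and the helper names are hypothetical, and the channel-count formula is an assumption about how the two limits interact.

```python
import json

TOTAL_RECORDS_PER_SECOND = 100    # job-wide cap from the example above
CHANNEL_RECORDS_PER_SECOND = 50   # hypothetical per-channel cap


def build_speed_settings(total_rps: int, channel_rps: int) -> dict:
    """Build the speed-related fragment of a job file (assumed key names)."""
    if channel_rps <= 0 or total_rps < channel_rps:
        raise ValueError("per-channel limit must be positive and <= total limit")
    return {
        "core": {"transport": {"channel": {"speed": {"record": channel_rps}}}},
        "job": {"setting": {"speed": {"record": total_rps}}},
    }


def expected_channels(total_rps: int, channel_rps: int) -> int:
    """Assumed relation: channel count = total limit // per-channel limit."""
    return total_rps // channel_rps


settings = build_speed_settings(TOTAL_RECORDS_PER_SECOND, CHANNEL_RECORDS_PER_SECOND)
print(json.dumps(settings, indent=2))
print(expected_channels(TOTAL_RECORDS_PER_SECOND, CHANNEL_RECORDS_PER_SECOND))
```

The printed fragment would be merged into the job JSON alongside the reader and writer definitions. With these numbers no single channel can run faster than 50 records per second, so the 99/1 split from the example cannot occur.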

2. Memory optimization

Increasing the number of concurrent channels in a DataX job raises memory usage significantly, because DataX, as a data exchange channel, caches more data in memory. Each Channel holds a buffer for temporary data exchange, and some Readers and Writers keep buffers of their own. To prevent OOM and similar errors, increase the JVM heap size.

  1. Setting the heap to 4G or 8G is recommended.
  2. Pass the corresponding JVM parameters when starting the job: python datax/bin/datax.py --jvm="-Xms8G -Xmx8G" /path/to/your/job.json
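When jobs are launched from a script, the command above can be assembled programmatically. The following is a minimal sketch; the helper name `build_datax_command` and its defaults are hypothetical, and it only uses the `--jvm` option and paths shown above. It builds the argument list without running anything.

```python
import shlex


def build_datax_command(heap_gb: int, job_path: str,
                        datax_py: str = "datax/bin/datax.py") -> list[str]:
    """Return the argv list for a DataX run with a fixed-size JVM heap."""
    if heap_gb <= 0:
        raise ValueError("heap size must be a positive number of gigabytes")
    # Setting -Xms and -Xmx to the same value, as in the command above.
    jvm_opts = f"-Xms{heap_gb}G -Xmx{heap_gb}G"
    return ["python", datax_py, f"--jvm={jvm_opts}", job_path]


cmd = build_datax_command(8, "/path/to/your/job.json")
# shlex.join quotes the --jvm argument so it can be pasted into a shell.
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues around the space inside the `--jvm` value.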

Origin blog.csdn.net/m0_37759590/article/details/132710141