How Kafka ensures data is not lost

Generally, when using Kafka we inevitably have to consider how to ensure data is not lost, and the question also comes up in interviews. Whenever this problem is raised, it breaks down into three areas: the producer side, the consumer side, and the broker side.

Here we look at how Kafka ensures data is not lost from each of these three aspects.

I. How the producer side ensures that data is not lost

  1. The acks configuration strategy

  acks = 0
    The producer does not wait for any response from the server after sending a message; once sent, it pays no further attention. If an exception occurs during sending and the broker never receives the message, the message is lost. In fact the producer merely writes the message into the socket buffer (a cache) and does not care whether it ever reaches the broker, so there is no guarantee the broker received it. Retries also have no effect under this configuration, because the producer never learns whether an error occurred and always gets back an offset of -1 (the broker may never even have started writing the data). Why does such an unsafe configuration exist? Kafka is often used to collect massive amounts of data; in scenarios such as log collection, where losing a certain amount of data is acceptable, this configuration can be used.
  
  acks = 1 (the default)
    After the producer sends a message, as soon as the leader replica of the partition writes it successfully, the producer receives a success response from the server. In other words, the message is sent only to the leader, and the leader returns an ack to the producer once it has received the message. If the message cannot be written to the leader (during a leader election, a crash, and so on), the producer receives an error response and can choose to resend the message to avoid losing it. But if the message is written successfully and the leader crashes before the other replicas have synchronized the data, the message is still lost, because the newly elected leader never received it. Setting acks to 1 is therefore a compromise between message reliability and throughput.

  acks = all (or -1)
    After sending a message, the producer must wait until all replicas in the ISR have successfully written it before it receives a success response from the server. All else being equal, this configuration achieves the strongest reliability: after a message is sent, the leader must synchronize the data to the followers, and only after every broker in the ISR has saved the message is the ack returned to indicate success.
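  As a rough sketch, here is how acks is set with the Kafka Java producer client (the topic name and broker address below are placeholders for illustration, not values from this post):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class AcksDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            // "0"   -> fire and forget: the message may be lost
            // "1"   -> leader ack only (default): lost if the leader crashes before syncing
            // "all" -> wait for every replica in the ISR: strongest guarantee
            props.put(ProducerConfig.ACKS_CONFIG, "all");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("demo-topic", "key", "value"));
            }
        }
    }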
  
    

  2. The retries configuration strategy

  Kafka errors fall into two kinds: recoverable and unrecoverable.
  Recoverable errors:
      for example a leader election or network jitter. If we have configured retries greater than zero, the operation is retried; once the leader election finishes or the network stabilizes, the error recovers and the data is resent to the broker normally. Note that the interval between retries must be long enough for the recoverable error to have actually recovered by the time the retry fires.
  Unrecoverable errors:
      for example a message exceeding the maximum request size (max.request.size). This kind of error cannot be fixed by retrying, and if it is not handled the data is lost. So when such an exception occurs we need to write these messages to a DB, a cache, or a local file, recording the unsuccessful data so that it can be resent to the broker once the bug is fixed (see the sketch below).
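  A minimal sketch of this pattern with the Java client: retries and retry.backoff.ms let the client retry recoverable errors automatically, while the send callback parks unrecoverable failures in a local file. The file name, topic, broker address, and retry values here are illustrative assumptions:

    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.errors.RetriableException;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class RetryDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.ACKS_CONFIG, "all");
            props.put(ProducerConfig.RETRIES_CONFIG, 3);             // retry recoverable errors
            props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 1000); // give the error time to recover

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                ProducerRecord<String, String> record =
                        new ProducerRecord<>("demo-topic", "key", "value");
                producer.send(record, (metadata, exception) -> {
                    if (exception != null && !(exception instanceof RetriableException)) {
                        // unrecoverable (e.g. RecordTooLargeException): park the record in a
                        // local file so it can be resent after the bug is fixed
                        try {
                            Files.write(Paths.get("failed-records.log"), // hypothetical file name
                                    (record.value() + System.lineSeparator())
                                            .getBytes(StandardCharsets.UTF_8),
                                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                });
            }
        }
    }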

  Having covered what these two configurations do, let's look at how to combine them in real scenarios; a configuration sketch follows the list below.

3. How to choose

1. Availability type
  Configuration: acks = all, retries > 0, plus a retry interval (set the interval and number of retries according to how long recoverable errors actually take to recover in your environment)
  Advantages: ensures that every message the producer sends is delivered successfully; if a send fails, the message is cached and resent after the anomaly recovers.
  Disadvantages: guarantees high availability, but cluster throughput is not very high, because after data reaches the broker the leader must also sync it to the followers; if network bandwidth is low or unstable, the ack response time grows.
2. Compromise type
  Configuration: acks = 1, retries > 0, plus a retry interval (set according to the actual situation)
  Advantages: balances message throughput and reliability; a compromise solution.
  Disadvantages: performance sits between the other two options.

3. High-throughput type
  Configuration: acks = 0
  Advantages: tolerates some data loss in exchange for throughput; can accept a large number of requests.
  Disadvantages: there is no way to know whether a message was sent successfully.
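As Java client properties, the three profiles might look like this; the retry counts and backoff intervals are illustrative assumptions to be tuned per environment:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.ProducerConfig;

    public class ProducerProfiles {
        // 1. Availability type: strongest delivery guarantee, lower throughput
        static Properties availability() {
            Properties p = new Properties();
            p.put(ProducerConfig.ACKS_CONFIG, "all");
            p.put(ProducerConfig.RETRIES_CONFIG, 5);             // illustrative value
            p.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 1000); // illustrative value
            return p;
        }

        // 2. Compromise type: balances reliability and throughput
        static Properties compromise() {
            Properties p = new Properties();
            p.put(ProducerConfig.ACKS_CONFIG, "1");
            p.put(ProducerConfig.RETRIES_CONFIG, 3);             // illustrative value
            p.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 500);  // illustrative value
            return p;
        }

        // 3. High-throughput type: fire and forget, some loss is tolerated
        static Properties highThroughput() {
            Properties p = new Properties();
            p.put(ProducerConfig.ACKS_CONFIG, "0");
            return p;
        }
    }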

 

II. How the consumer side ensures that data is not lost

  1. Consumer-side configuration

 group.id: the ID of the consumer group
 auto.offset.reset = earliest (start from the earliest offset) / latest (start from the latest offset)
 enable.auto.commit = true / false (whether offsets are committed automatically)

 To avoid losing messages on the consumer side, the usual approach is to set enable.auto.commit = false and commit the offset manually only after the records have been processed, as sketched below.
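 A minimal sketch of these settings with the Java consumer client (the group ID, topic, and broker address are placeholders); because the commit happens after processing, a crash before the commit means re-delivery rather than loss:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class ConsumerDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");              // consumer group ID
            props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");       // earliest / latest
            props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");         // commit manually
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("demo-topic"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.println(record.offset() + ": " + record.value()); // process
                    }
                    // commit only after the records are actually processed
                    consumer.commitSync();
                }
            }
        }
    }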

 Reference: https://www.cnblogs.com/zeussbook/p/11284396.html

 

Origin: www.cnblogs.com/MrRightZhao/p/11498952.html