Notes on a Flink local-mode startup failure

1. Symptom: The xsink task kept failing to start and succeeded only occasionally. The following error appeared in the log:
Exception in thread "main" java.io.IOException: Unable to open BLOB Server in specified port range: 0
at org.apache.flink.runtime.blob.BlobServer.<init>(BlobServer.java:199)
at org.apache.flink.runtime.minicluster.MiniCluster.start(MiniCluster.java:319)
at org.apache.flink.client.program.PerJobMiniClusterFactory.submitJob(PerJobMiniClusterFactory.java:87)
at org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1812)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1713)
at org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(LocalStreamEnvironment.java:74)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1699)
2. Code investigation:
Reading the log and tracing the code shows that when Flink starts in local mode, the BlobServer initializes a ServerSocket and binds it to a port. The port defaults to 0, and when a socket binds to port 0 the operating system picks a free port from its configured local port range. In this case no free port was left in that range, so the bind failed and the task could not start.
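The port-0 behaviour can be observed with a plain ServerSocket. This is a minimal sketch using only the JDK, not Flink's BlobServer code; the class name EphemeralPortDemo is made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

public class EphemeralPortDemo {

  // Binds to port 0 and returns the port the operating system actually assigned.
  public static int bindToAnyPort() {
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    } catch (IOException e) {
      // Raised, for example, when the OS has no free port left to hand out.
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("OS-assigned port: " + bindToAnyPort());
  }
}
```

On a healthy machine every run prints some free port; on a machine whose local port range is exhausted the bind fails, which is the situation behind the exception in the log.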

3. Port occupancy investigation
Check the range from which the server automatically picks ports:
cat /proc/sys/net/ipv4/ip_local_port_range
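The number of ports in that range follows from the two values the file contains. A minimal sketch, assuming the usual format of two whitespace-separated integers (first and last port, both inclusive); the class name PortRangeSize is hypothetical and the program only works on Linux.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class PortRangeSize {

  // Parses "low high" and returns how many ports the range holds.
  public static int rangeSize(String content) {
    String[] parts = content.trim().split("\\s+");
    int low = Integer.parseInt(parts[0]);
    int high = Integer.parseInt(parts[1]);
    return high - low + 1; // both bounds are inclusive
  }

  public static void main(String[] args) throws IOException {
    byte[] raw = Files.readAllBytes(Paths.get("/proc/sys/net/ipv4/ip_local_port_range"));
    System.out.println("Ports in local range: " + rangeSize(new String(raw)));
  }
}
```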
The range contains 32767 ports in total.
Command to list the ports in LISTEN state: netstat -tulnp >./1.txt
Running netstat -anpt >./1.txt to check port usage shows 32745 ports in use in total, including those in TIME_WAIT state, so the range is almost exhausted. The code used for the count is as follows:

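import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FileTest {

  // Lower bound of the local port range on this server.
  private static final int MIN_PORT = 32768;

  // Matches a netstat line with a local and a foreign IPv4 address.
  // Group 3 is the local port, group 6 the foreign port (null when netstat prints "*").
  private static final Pattern LINE = Pattern.compile(
      ".*([0-9]+\\.){3}([0-9]+)*:([1-9][0-9]*).*([0-9]+\\.){3}([0-9]+)*:([1-9][0-9]*)?.*");

  public static void main(String[] args) throws IOException {
    // TreeSet removes duplicates and keeps the ports in numeric order.
    Set<Integer> ports = new TreeSet<>();
    // Saved netstat output, copied from the server.
    try (BufferedReader reader = new BufferedReader(new FileReader("C:\\Users\\Desktop\\222.txt"))) {
      String line;
      while ((line = reader.readLine()) != null) {
        Matcher matcher = LINE.matcher(line);
        if (matcher.matches()) {
          addIfInRange(ports, matcher.group(3));
          addIfInRange(ports, matcher.group(6));
        }
      }
    }
    StringBuilder sb = new StringBuilder();
    for (int p : ports) {
      sb.append(p).append(",");
    }
    System.out.println(sb);
    System.out.println("Total ports in use: " + ports.size());
  }

  // Records the port if it is present and falls inside the local port range.
  private static void addIfInRange(Set<Integer> ports, String port) {
    if (port != null && Integer.parseInt(port) >= MIN_PORT) {
      ports.add(Integer.parseInt(port));
    }
  }
}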


4. Solution:
By configuring a port segment for xsink, Flink can be made to take its ports from a segment that still has free ports. (The configuration itself was shown as a screenshot.)
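Taking ports from a fixed segment amounts to trying each port in the segment until one binds. The sketch below shows that idea with the JDK only; it is not Flink's code, and the class name PortRangeBinder and the segment 50100-50200 are made-up examples. In Flink itself the BLOB server port is controlled by the blob.server.port option, which accepts a range; whether that is the exact option the author changed is an assumption.

```java
import java.io.IOException;
import java.net.ServerSocket;

public class PortRangeBinder {

  // Tries each port in [low, high] in order and returns the first socket that binds.
  public static ServerSocket bindInRange(int low, int high) {
    for (int port = low; port <= high; port++) {
      try {
        return new ServerSocket(port);
      } catch (IOException e) {
        // Port is in use or not permitted; try the next one.
      }
    }
    throw new IllegalStateException("No free port in range " + low + "-" + high);
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical segment chosen outside the OS's automatic range.
    try (ServerSocket socket = bindInRange(50100, 50200)) {
      System.out.println("Bound to port " + socket.getLocalPort());
    }
  }
}
```

Choosing a segment outside the range reported by ip_local_port_range keeps these ports from competing with automatically assigned ones, at the cost of capping the number of tasks at the size of the segment.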

With the above configuration the tasks start successfully, but a limitation remains: if too many tasks are started, the configured ports run out, which limits scalability. After some further reading, I found that the server can be set to reclaim ports in TIME_WAIT state faster:
echo "1" > ./tcp_tw_recycle (run in /proc/sys/net/ipv4)
After adding this setting, the number of ports in use dropped sharply, to 3497; the task starts normally and the problem is solved. Note that tcp_tw_recycle was removed in Linux 4.12, so this setting only exists on older kernels.
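To check the effect of such a change, the saved netstat output can be tallied by connection state. A minimal sketch, assuming the netstat -anpt column layout in which the state is the sixth whitespace-separated field of each tcp line; the class name StateTally and the sample lines are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class StateTally {

  // Counts tcp lines per connection state (LISTEN, ESTABLISHED, TIME_WAIT, ...).
  public static Map<String, Integer> countByState(List<String> lines) {
    Map<String, Integer> counts = new TreeMap<>();
    for (String line : lines) {
      String[] fields = line.trim().split("\\s+");
      if (fields.length < 6 || !fields[0].startsWith("tcp")) {
        continue; // header or unrelated line
      }
      counts.merge(fields[5], 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    List<String> sample = Arrays.asList(
        "Proto Recv-Q Send-Q Local Address    Foreign Address  State      PID/Program name",
        "tcp        0      0 0.0.0.0:22       0.0.0.0:*        LISTEN     1234/sshd",
        "tcp        0      0 10.0.0.1:45678   10.0.0.2:80      TIME_WAIT  -",
        "tcp        0      0 10.0.0.1:45680   10.0.0.2:80      TIME_WAIT  -");
    System.out.println(countByState(sample));
  }
}
```

Comparing the TIME_WAIT count before and after the change shows how much of the port usage came from connections waiting to be reclaimed.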


Origin blog.csdn.net/myhappy_huang/article/details/115348793