Tomcat is running on a Linux server but the browser cannot reach it: causes and fixes

I recently bought a lightweight application server from Alibaba Cloud. After deploying Tomcat and starting the service, the browser could not reach http://[public IP]:8080 for a long time, so I started a long debugging journey.

The normal case

1. Ensure that the Alibaba Cloud Security Group has opened port 8080

2. Ensure that the Linux firewall allows port 8080

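# Check the status of the firewalld service
systemctl status firewalld

# Start, restart, or stop firewalld.service
# Start
service firewalld start
# Restart
service firewalld restart
# Stop
service firewalld stop

# View the firewall rules
firewall-cmd --list-all    # show everything
firewall-cmd --list-ports  # show open ports only

# Open the port
firewall-cmd --zone=public --add-port=8080/tcp --permanent
# Restart the firewall so the permanent rule takes effect
systemctl restart firewalld.service

# What the options mean:
# --zone                 the zone the rule applies to
# --add-port=8080/tcp    the port to add, in the form port/protocol
# --permanent            keep the rule permanently; without it the rule is lost after a restart

# Check that the port is listening
netstat -an | grep 8080
tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN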
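The two checks above can also be scripted. The following is only a sketch: the helper names port_allowed and listen_scope are made up for illustration, and both read command output from standard input so they can be tried without touching the firewall. listen_scope additionally shows whether Tomcat is bound only to the loopback address, in which case a remote browser cannot connect even with the port open.

```shell
# Succeeds if "<port>/tcp" appears in `firewall-cmd --list-ports` output read from stdin.
# Note: port ranges such as 8000-8100/tcp are not recognized by this simple check.
port_allowed() {
    tr ' ' '\n' | grep -qxF "$1/tcp"
}

# Reads `netstat -an` output from stdin and prints how the port is bound:
#   external  listening on a non-loopback address (for example 0.0.0.0 or ::)
#   loopback  listening only on 127.x.x.x or ::1, so remote clients cannot connect
#   none      no LISTEN socket found for the port
listen_scope() {
    awk -v port="$1" '
        $6 == "LISTEN" {
            n = split($4, parts, ":")            # the port is the text after the last colon
            if (parts[n] != port) next
            addr = substr($4, 1, length($4) - length(parts[n]) - 1)
            if (addr ~ /^127\./ || addr == "::1") {
                if (scope != "external") scope = "loopback"
            } else {
                scope = "external"
            }
        }
        END { print (scope == "" ? "none" : scope) }
    '
}
```

Typical use would be firewall-cmd --list-ports | port_allowed 8080 and netstat -an | listen_scope 8080.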

Normally, once these checks pass and Tomcat is started, the browser can reach it. But things are not always normal.

The abnormal case

1. The problem

Here is what I ran into:

  1. After starting the Tomcat service, the browser could not load the page; it just kept spinning.

  2. I then tried to shut the service down and restart it, and got this error:

    SEVERE: Could not contact localhost:8005. Tomcat may not be running.
    SEVERE: Catalina.stop: 
    java.net.ConnectException: Connection refused
            at java.net.PlainSocketImpl.socketConnect(Native Method)
            at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
            at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
            at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
            at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
            at java.net.Socket.connect(Socket.java:579)
            at java.net.Socket.connect(Socket.java:528)
            at java.net.Socket.<init>(Socket.java:425)
            at java.net.Socket.<init>(Socket.java:208)
            at org.apache.catalina.startup.Catalina.stopServer(Catalina.java:450)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:606)
            at org.apache.catalina.startup.Bootstrap.stopServer(Bootstrap.java:400)
            at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:487)
    

2. Finding the cause

The clue is in this line: SEVERE: Could not contact localhost:8005. Tomcat may not be running.

In other words, Tomcat had not started, so there was nothing to stop. But then why had it not started, with no error reported?

That reading is not quite accurate: Tomcat was still starting, and I had tried to shut it down before startup finished, when the shutdown port 8005 was not yet open.

Since no error was reported, the startup should eventually succeed, so I decided to wait.

After 6 minutes, Tomcat finally came up.
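Rather than guessing how long to wait, the startup time can be read from the Tomcat log, which records a "Server startup in ..." line once startup completes. The sketch below uses the hypothetical helper names startup_ms and wait_for_startup and takes the log path as an argument. It assumes the log was rotated or emptied before this start, since an older startup line would otherwise be picked up.

```shell
# Prints the most recent startup time in milliseconds found in a Tomcat log.
# Fails if no "Server startup in" line exists yet (Tomcat is still starting).
# Handles both "Server startup in 1234 ms" and "Server startup in [1234] milliseconds".
startup_ms() {
    ms=$(grep 'Server startup in' "$1" 2>/dev/null | tail -n 1 |
         sed 's/.*Server startup in//' | tr -cd '0-9')
    [ -n "$ms" ] && printf '%s\n' "$ms"
}

# Polls the log once per second until startup completes or the timeout (seconds) expires.
wait_for_startup() {
    log=$1
    timeout=$2
    waited=0
    until startup_ms "$log"; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
}
```

For example, wait_for_startup logs/catalina.out 600 waits up to ten minutes and prints the startup time once Tomcat is ready.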

3. Investigating the cause

So the procedure was fine; the service was just starting far too slowly.

But why so slowly?

I checked the process list: the JVM process hosting Tomcat was running, which ruled out a JVM exit. The JVM must have been blocked on something.

Analysis

A blocked program is generally waiting for some resource, yet every resource on the machine looked sufficient. So where exactly was Tomcat stuck, and what was the code doing at that point? I decided to try strace, a tool that traces system calls. Whether a program is written in Java or Python, most of its resource requests (opening files, creating threads, reading and writing data, waiting for I/O) end up as system calls. With strace I could at least see which system call Tomcat was stuck in, and infer the cause from there.

Tomcat had finished starting while I was investigating, so shut it down first:

./shutdown.sh

Then start it again under strace:

strace -f -o strace.out ./catalina.sh run

strace has many options; I used two:

  • -f traces child processes created by fork; in plain terms, it traces the system calls of every thread
  • -o writes the output to a file

When this step is done, look at the result:

cat strace.out

Read the trace from the bottom up, since the blocking call must be near the end. First skip past the system calls caused by stopping Tomcat, which are not what we are after: search backward from the end for SIGINT.

[Screenshot: strace output around SIGINT, with the blocking system call highlighted in red]

The part marked in red is the system call that was blocking. Above it are many futex calls. futex is a lightweight synchronization mechanism in Linux, so some system call further up must be the real culprit. Skipping past all the futex calls:

[Screenshot: the read call that precedes the futex calls]

This read is what the following series of futex calls is waiting on. strace helpfully shows not only the system call but also its arguments and return value: here read is reading file descriptor 61 and has not returned (unfinished).

Following this lead, let's see what file descriptor 61 is:

[Screenshot: the open call that returned file descriptor 61, /dev/random]

/dev/random is a random number generator device on Linux; reading from it yields random bytes.

A quick search shows that, put simply, /dev/random derives its random numbers from environmental noise in the system, such as mouse and keyboard activity. When not enough noise has been collected, reads block.
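The manual search through strace.out (find the unfinished read, then find which file its descriptor belongs to) can be approximated with a short awk script. This is a rough sketch under simplifying assumptions: the helper name blocked_reads is hypothetical, it only understands open/openat lines that completed on a single line, and it reports every unfinished read rather than deciding which one is the culprit.

```shell
# Prints "<fd> <path>" for every read() that strace recorded as unfinished,
# using earlier open()/openat() lines to map the descriptor back to a path.
blocked_reads() {
    awk '
        # Completed open, e.g.:  2345  open("/dev/random", O_RDONLY) = 61
        NF > 2 && /open(at)?\(/ && $(NF-1) == "=" && $NF ~ /^[0-9]+$/ {
            split($0, q, "\"")        # q[2] is the text between the first pair of quotes
            path[$NF] = q[2]
            next
        }
        # Blocked read, e.g.:    2345  read(61,  <unfinished ...>
        /(^|[ \t])read\(/ && /<unfinished/ {
            rest = substr($0, index($0, "read(") + 5)
            fd = rest + 0             # numeric prefix of the argument list
            if (fd in path) print fd, path[fd]
        }
    ' "$1"
}
```

Running blocked_reads strace.out on a trace like the one described here would be expected to list descriptor 61 together with /dev/random.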

In-depth analysis

Searching for "Tomcat /dev/random" answers most of the remaining questions. Tomcat's session IDs are computed with the SHA1 algorithm, which needs a key. To improve security, Tomcat generates this key randomly at startup, and the random number comes from /dev/random, the Linux random number generator that many encryption and decryption programs rely on.

Solving the problem

Once the cause is understood, the fix is simple: replace the random number generator /dev/random with the pseudo-random number generator /dev/./urandom. There are two ways to do this:

  • Edit the Tomcat startup script catalina.sh and add the JVM option
    -Djava.security.egd=file:/dev/./urandom
  • Edit the java.security file in the JRE and set securerandom.source=file:/dev/./urandom
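Instead of editing catalina.sh itself, the same JVM option can be placed in bin/setenv.sh, which catalina.sh sources at startup when the file exists. The following is a minimal sketch of such a file; the variable name EGD_OPT is only for illustration, and the check simply avoids adding the option twice.

```shell
# $CATALINA_BASE/bin/setenv.sh (sourced by catalina.sh when the file exists)
# Seed SecureRandom from the non-blocking device instead of /dev/random.
EGD_OPT="-Djava.security.egd=file:/dev/./urandom"   # illustrative variable name

case " ${CATALINA_OPTS:-} " in
    *" $EGD_OPT "*) ;;   # option already present, leave CATALINA_OPTS unchanged
    *) CATALINA_OPTS="${CATALINA_OPTS:+$CATALINA_OPTS }$EGD_OPT" ;;
esac
export CATALINA_OPTS
```

Keeping the change in setenv.sh means it survives a Tomcat upgrade that replaces catalina.sh.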

4. Solving the problem at the root

On Linux (CentOS), random numbers can be read from two special files: /dev/urandom and /dev/random.

Both work by using the system's current entropy pool to compute a fixed number of random bits and returning those bits as a byte stream. The entropy pool holds the environmental noise the kernel has collected; entropy is a measure of how disordered a system is. The kernel gathers this noise from sources such as interrupt timing, disk I/O, and keyboard and mouse input.

The two methods above both replace /dev/random with /dev/urandom. There is also a third method: feeding more entropy into the pool behind /dev/random. The root cause is that the pool does not hold enough entropy, so increasing it is the most thorough fix.

The following command shows the entropy currently available:

cat /proc/sys/kernel/random/entropy_avail
189

We need a way to raise this value. If your CPU has a hardware random number generator (DRNG), it can be used to fill the entropy pool much faster.

You can check whether your CPU supports it with cat /proc/cpuinfo | grep rdrand. Generally, Intel CPUs based on the Ivy Bridge architecture and later support it (for i3 and i5, check whether the model uses such an architecture; i7 and Xeon mostly do), and AMD CPUs produced after 2015 support it. On a virtual machine, extra parameters have to be enabled. If your hardware does not support it, that is fine: /dev/urandom can be used as the "entropy source" instead.
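Both checks can be wrapped in small helpers. This is a sketch only: the names has_rdrand and entropy_low are hypothetical, the default threshold of 1000 is an arbitrary assumption rather than a kernel constant, and newer kernels may report entropy_avail differently from the values shown in this article. The file paths are parameters so the helpers can be tried against sample files.

```shell
# Succeeds if the CPU flags in the given cpuinfo file include rdrand.
has_rdrand() {
    grep -qw rdrand "${1:-/proc/cpuinfo}"
}

# Succeeds if the available entropy is below the threshold.
#   $1  threshold (default 1000, an arbitrary value chosen for illustration)
#   $2  file to read (default /proc/sys/kernel/random/entropy_avail)
entropy_low() {
    avail=$(cat "${2:-/proc/sys/kernel/random/entropy_avail}")
    [ "$avail" -lt "${1:-1000}" ]
}
```

A startup script could, for instance, print a warning when entropy_low succeeds before launching Tomcat.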

Taking CentOS 7 as an example:

  • yum install rng-tools installs the rngd service (the entropy daemon)

  • systemctl start rngd starts the service. If your CPU supports DRNG, you can check the result right away:

    cat /proc/sys/kernel/random/entropy_avail
    3009

  • If your CPU does not support DRNG, use /dev/urandom as the source instead:

  • cp /usr/lib/systemd/system/rngd.service /etc/systemd/system copies the unit file so it can be edited

  • vim /etc/systemd/system/rngd.service opens the copy for editing

  • Change the ExecStart line to ExecStart=/sbin/rngd -f -r /dev/urandom

  • systemctl daemon-reload reloads the unit files

  • systemctl restart rngd restarts the service
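As an alternative to copying and editing the whole unit file, systemd also accepts a drop-in override that replaces only the ExecStart line. The sketch below writes such a drop-in; the function name write_rngd_override is hypothetical, and the target directory is a parameter so the result can be inspected in a scratch directory first. The empty ExecStart= line clears the original command before the new one is set.

```shell
# Writes a systemd drop-in that makes rngd read from /dev/urandom.
#   $1  drop-in directory (default /etc/systemd/system/rngd.service.d)
write_rngd_override() {
    dir=${1:-/etc/systemd/system/rngd.service.d}
    mkdir -p "$dir" &&
    printf '%s\n' \
        '[Service]' \
        'ExecStart=' \
        'ExecStart=/sbin/rngd -f -r /dev/urandom' > "$dir/override.conf"
}
```

After writing the file as root, systemctl daemon-reload and systemctl restart rngd apply it, exactly as in the steps above.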

5. Which solution to choose

I recommend the third method. The entropy pool is not used by Tomcat alone: every application on Linux that needs random numbers draws on it, so Tomcat is not the only program that can block. A search shows that Apache, Nginx, and OpenSSL have been bitten by the same problem. Changing the Java configuration fixes the problem only for Java applications; it treats the symptom, not the root cause. The real cure is to speed up random number generation with rngd.

6. How to reproduce the fault

The fault described above is easy to reproduce:

  • systemctl stop rngd stops the rngd service (if you started it)
  • cat /proc/sys/kernel/random/entropy_avail shows the current size of the entropy pool
  • head -c1024 /dev/random forces 1024 random bytes to be consumed; the command hangs for a long time, so interrupt it with Ctrl+C
  • Run cat /proc/sys/kernel/random/entropy_avail again and make sure the value is as small as possible
  • Start Tomcat; you will find that startup takes a long time
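The "hangs, then press Ctrl+C" step can be turned into a check that ends on its own. The sketch below assumes the GNU coreutils timeout command is available; the function name read_blocks is hypothetical.

```shell
# Succeeds if reading BYTES from DEVICE does not finish within SECONDS.
#   $1  device or file to read, e.g. /dev/random
#   $2  number of bytes to read
#   $3  time limit in seconds
read_blocks() {
    rc=0
    timeout "$3" head -c "$2" "$1" > /dev/null 2>&1 || rc=$?
    [ "$rc" -eq 124 ]   # GNU timeout exits with 124 when the time limit is reached
}
```

For example, read_blocks /dev/random 1024 5 succeeds when the system cannot supply 1024 random bytes within five seconds, which is the condition under which Tomcat's startup stalls.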

Origin blog.csdn.net/The_Beacon/article/details/109219626