Troubleshooting a 502 error in a distributed OpenDevOps deployment

I built OpenDevOps in distributed mode, following the official guide at https://docs.opendevops.cn/zh/guide/install/distribute/ . At the end, the gateway returned a 502 error, and it took me more than half a day to find the cause and fix it. I am recording the whole process here. The steps were as follows.

The error

After setting up the front end and back end, I used the health-check commands to verify each service before deploying the gateway. Everything returned 200, so I went straight on to the next step.
After deploying the gateway, I tested it with this command:

[root@localhost api-gateway]# curl -I -X GET -m 10 -o /dev/null -s -w %{http_code} http://gw.opendevops.cn:8888/api/accounts/are_you_ok/

It returned 502.
The FAQ section of the documentation does have an entry on the 502 error.
Since a 502 points to a gateway configuration problem, both the gateway configuration and the DNS configuration need checking, so I went over these two areas again.

DNS check

The DNS was deployed strictly according to the document, and every service returned 200 in the earlier health checks. All the domain names could also be pinged, so in principle name resolution should be fine. Just in case, I checked the configuration files anyway.

Check the /etc/resolv.conf file

[root@localhost api-gateway]# vim /etc/resolv.conf

In this file, 192.168.134.156 is the IP address of the internal DNS server. It is indeed the first entry, so there is no problem here.

Check the /etc/resolv.dnsmasq file

[root@localhost api-gateway]# vim /etc/resolv.dnsmasq

The build document says that this file sets the upstream DNS. In my earlier tests, however, I was reaching the Internet through a proxy, and with this upstream DNS configured the OpenDevOps domain names could not be pinged, so I commented it out.
No problem here.

Check the /etc/dnsmasqhosts file

[root@localhost api-gateway]# vim /etc/dnsmasqhosts

I deployed all modules on a single machine, so every domain name points to the same IP, and each one does resolve to that IP. I looked this over carefully and found no problem.
Later it occurred to me that DNS listens on port 53/udp. Could the services in Docker be failing to reach this port when they query the DNS server, so that they never get an IP and the gateway returns 502? I used telnet from another server to check the port and found that it really was blocked, so I opened port 53 on the firewall:

[root@localhost api-gateway]# firewall-cmd --zone=public --add-port=53/udp --permanent

Reload the firewall rules

[root@localhost api-gateway]# firewall-cmd --reload
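
Telnet only opens TCP connections, so it cannot really confirm that 53/udp is reachable. A more direct check is to send a real query to the internal DNS server with dig. The sketch below assumes dig is installed on the machine doing the test. The helper name `dns_lookup` is made up for illustration; the server address and domain name in the usage note are the ones from this deployment.

```shell
# dns_lookup SERVER NAME  (hypothetical helper, not part of OpenDevOps)
# Ask SERVER for the A record of NAME over UDP (dig's default transport),
# with a short timeout so a blocked port fails quickly.
# Prints the answer and succeeds only if at least one address came back.
dns_lookup() {
    answer=$(dig @"$1" "$2" A +short +time=2 +tries=1) || return 1
    [ -n "$answer" ] || return 1
    printf '%s\n' "$answer"
}
```

For example, `dns_lookup 192.168.134.156 gw.opendevops.cn` run from another server should print the gateway IP. A non-zero exit status suggests that the query is not getting through or that the name is not configured.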

Still no luck: the 502 error persisted.

Port 53 does need to be open, though. Later, after the 502 problem was solved, I tried closing port 53 again and the 502 came back, with the failure showing up in the log.

At this point the DNS check was complete. Not ready to give up, I restarted the DNS service one last time, but the result was the same: dig could resolve every domain name, yet the gateway still returned 502.
I could only move on to the next step.

Gateway check

The gateway check simply means going back through the gateway setup steps one by one.

Check the nginx.conf file

[root@localhost api-gateway]# vim /opt/codo/api-gateway/conf/nginx.conf

The only thing I had changed in nginx.conf was the DNS address of the resolver, and it is indeed the address of the local DNS server. No problem.

Check the gw.conf file

[root@localhost api-gateway]# vim /opt/codo/api-gateway/conf/conf.d/gw.conf

After reading it carefully, I concluded that the defaults are fine here. I had not modified this file before and there is no need to, so this part is fine.

Check the configs.lua file

[root@localhost api-gateway]# vim /opt/codo/api-gateway/lua/configs.lua

In the first half of the file, the main things to check are the redis configuration and whether token_secret and rewrite_cache_token match the token_secret and secret_key in the management backend configuration file /opt/codo/codo-admin/settings.py. After a careful comparison, there is no problem here.
In the second half, the main thing to check is whether the URIs, domain names and ports match. I had not modified anything here and kept the defaults. After a careful comparison, there is no problem.

Check the docker-compose.yml file

[root@localhost api-gateway]# vim /opt/codo/api-gateway/docker-compose.yml

The docker-compose.yml file was copied and pasted directly from the document without any modification, so I only had to check that it matches the document. It does, so there is no problem.
At this point, the gateway file check was complete.
I kept feeling that this should not be happening: every step followed the document, and yet it ended in an error. So I downloaded the source code again, imported the image, modified the configuration, then rebuilt and started the container. I checked again, full of hope. Still 502.

View the gateway log: opening ports

Following the troubleshooting approach from the official site had not turned up the problem. Then it occurred to me to look at the gateway log. (I should have looked at it from the start, but I was focused on following the official site and overlooked it.)

The gateway error log is /usr/local/openresty/nginx/logs/error.log. The official documentation does not mention it specifically, but the path can be found in the configuration file.
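
When the log is long, it can help to tally the distinct errors before reading individual entries. The following is a minimal sketch, assuming the usual nginx convention of writing the error as `(code: message)` in parentheses. The function name `summarize_errors` is hypothetical.

```shell
# summarize_errors LOGFILE  (hypothetical helper)
# Extract every "(number: message)" fragment from an nginx-style error log
# and print each distinct one with its count, most frequent first.
summarize_errors() {
    grep -oE '\([0-9]+: [^)]+\)' "$1" | sort | uniq -c | sort -rn
}
```

It can be pointed at the gateway log with `summarize_errors /usr/local/openresty/nginx/logs/error.log`, which makes it easier to see whether the kind of error changes after each fix.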

I saw an error in the log. (I did not keep the original log entry, so it was reproduced afterwards.)

The log showed "No route to host", which suggested that the required ports were not open. So I opened the following ports on the firewall:

[root@localhost supervisor]# firewall-cmd --zone=public --add-port=8010/tcp --permanent
[root@localhost supervisor]# firewall-cmd --zone=public --add-port=8020/tcp --permanent
[root@localhost supervisor]# firewall-cmd --zone=public --add-port=8030/tcp --permanent
[root@localhost supervisor]# firewall-cmd --zone=public --add-port=8040/tcp --permanent
[root@localhost supervisor]# firewall-cmd --zone=public --add-port=8050/tcp --permanent
[root@localhost supervisor]# firewall-cmd --zone=public --add-port=8060/tcp --permanent
[root@localhost supervisor]# firewall-cmd --zone=public --add-port=8080/tcp --permanent
[root@localhost supervisor]# firewall-cmd --zone=public --add-port=8888/tcp --permanent
[root@localhost supervisor]# firewall-cmd --zone=public --add-port=9900/tcp --permanent
[root@localhost supervisor]# firewall-cmd --reload
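
The same rules can also be applied in a loop, which makes the port list easier to review. This is only a sketch: `open_tcp_ports` is a made-up helper, and the optional `FIREWALL_CMD` variable is an assumption added here so the function can be dry-run with `echo`.

```shell
# open_tcp_ports PORT...  (hypothetical helper)
# Permanently open each TCP port in the public zone, then reload firewalld.
# Set FIREWALL_CMD=echo to print the commands instead of running them.
open_tcp_ports() {
    fw=${FIREWALL_CMD:-firewall-cmd}
    for port in "$@"; do
        "$fw" --zone=public --add-port="${port}/tcp" --permanent || return 1
    done
    "$fw" --reload
}
```

Calling `open_tcp_ports 8010 8020 8030 8040 8050 8060 8080 8888 9900` is equivalent to the commands above, and `firewall-cmd --zone=public --list-ports` then shows which ports are actually open.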

When I checked the gateway log again, the "No route to host" error was gone, but it had started reporting (5: Operation refused).

I took this "refused" to mean that the domain names of the back-end modules were being reached but the connection was being refused. I began to suspect that some modules had not started properly, so I went to check the back-end modules.

View each module's log

Front end

The front-end logs are under /var/log/nginx. There are six log files in total, of which error.log is the error log, so the main thing is to check whether error.log contains any errors.

[root@localhost api-gateway]# vim /var/log/nginx/error.log

The only errors were "domain name unreachable" errors from the start of the deployment. My test machine is on an internal network, cannot access the Internet directly, and uses a proxy to reach the external network. With the proxy set, the front-end nginx reports that it cannot find the domain names. The fix is to unset the proxy and then restart the front end.

Back end

All back-end logs are under /var/log/supervisor/, where:

  • cmdb_cron.log and cmdb.log are the logs of the asset management module
  • codo_dns is the log of the domain name management module
  • cron_jobs.log and cron.log are the logs of the scheduled task module
  • exec_task.log, task_cron_app.log, task_other.log and task_scheduler.log are the logs of the task system module
  • kerrigan.log is the log of the configuration center module
  • tools.log is the log of the operation and maintenance tool module
  • mg.log is the log of the management backend.

The logs may contain errors about failing to connect to MySQL, Redis, MQ and so on. In my case the error involved port 5672, the default RabbitMQ port, which means RabbitMQ could not be reached. On checking, it turned out to be a wrong IP in the MQ configuration. If these logs contain errors, check the configuration of the relevant module first. If the configuration is confirmed to be correct and the error persists, the port may not be open on the firewall. Use the following command to open port 5672 (it is best to open the default Redis and MySQL ports as well):

[root@localhost supervisor]# firewall-cmd --zone=public --add-port=5672/tcp --permanent

Reload the firewall rules

[root@localhost supervisor]# firewall-cmd  --reload
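
To tell a blocked port apart from a wrong address in the configuration, it helps to probe the port from the machine that runs the module. A minimal sketch using bash's /dev/tcp redirection follows. `tcp_port_open` is a hypothetical helper, and it assumes that bash and the coreutils `timeout` command are available.

```shell
# tcp_port_open HOST PORT  (hypothetical helper)
# Succeed if a TCP connection to HOST:PORT can be opened within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device and the coreutils timeout command.
tcp_port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}
```

Running `tcp_port_open HOST 5672 && echo reachable`, with HOST replaced by the address in the module's MQ setting, shows whether RabbitMQ can be reached at all. The same probe works for the Redis and MySQL ports.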

After restarting the module and looking at the log again, there were no more errors.
After checking all the modules this way and restarting the gateway, the test still returned 502. The 502 is actually not related to the health of the back-end services, because if a service has not started properly the browser reports a 500 error instead. Still, it does no harm to check.
Then I went back to the gateway log:

[root@localhost supervisor]# vim /usr/local/openresty/nginx/logs/error.log

It was still reporting (5: Operation refused).

View the gateway log: solving the problem

I had been through every place that could go wrong, and the problem was still not solved. In the end I decided to start from the log message and search Baidu for "5: operation refused nginx". Going through the results one by one, I found this article:

"Pit" in NGINX resolver configuration

One paragraph in it relates the problem to the nginx version and the resolver configuration, so I entered the Docker container to check my nginx version:

[root@localhost supervisor]# docker exec -it api-gateway_gateway_1 bash
[root@84229c61f571 sbin]# cd /usr/local/openresty/nginx/sbin/ && ./nginx -v

The version is openresty/1.15.8.1, which is indeed higher than 1.11.5.
So I modified the nginx configuration file in the container and added the parameter ipv6=off to the resolver directive.

[root@84229c61f571 sbin]# vi /usr/local/openresty/nginx/conf/nginx.conf
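
The change amounts to appending ipv6=off to the existing resolver directive. As a minimal sketch, the same edit could be scripted with sed. This assumes GNU sed and works on a scratch file rather than the real nginx.conf. The sample content is illustrative, not the full configuration, and the resolver address is the internal DNS server from this deployment.

```shell
# Build a scratch file containing only a resolver line (illustrative content).
conf=$(mktemp)
printf 'http {\n    resolver 192.168.134.156;\n}\n' > "$conf"

# On resolver lines that do not already carry ipv6=off,
# insert the parameter before the closing semicolon.
sed -E -i '/^[[:space:]]*resolver[[:space:]]/ {
    /ipv6=off/!s/;[[:space:]]*$/ ipv6=off;/
}' "$conf"

# Show the resulting directive.
grep -n 'resolver' "$conf"
```

With ipv6=off, nginx stops looking up IPv6 addresses when it resolves names, so only the IPv4 records served by the internal DNS are requested.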

After restarting the gateway container and checking again, the result was a normal 200.

[root@localhost api-gateway]# curl -I -X GET -m 10 -o /dev/null -s -w %{http_code} http://gw.opendevops.cn:8888/api/accounts/are_you_ok/

Then open the site in the browser -> enter the account and password -> login succeeds.

Origin blog.csdn.net/xiguashixiaoyu/article/details/106665793