A Record of Debugging a Stuck Scrapy Process

Finding the problem

During a routine check of our data storage, I noticed that the timestamp of the most recent data was stuck at early that morning. I immediately logged in to the remote server to try to locate the problem.

  1. Check whether the scheduled task ran, and whether there was any error output

    crontab -l

    The check showed that the scheduled task had been triggered as expected, and there were no error records.

  2. Check the system processes to see whether the crawler program was running

    ps -ef | grep xxxappspider

    The output showed that the process had started successfully at 01:40 in the morning, but it was no longer making any progress. A deadlock in the code, perhaps? The log contained nothing useful.

  3. Review the code and try to reproduce the bug

    Running the program manually on the server worked fine, and a quick read of the code didn't reveal anything that could cause a deadlock.

Solving the problem

Since I had more urgent work at hand, I just added more detailed logging to the program, killed the stuck process, and let the scheduled task run again.

The next time the task ran, the problem appeared again: the same scheduled job got stuck in the early morning. First I ruled out the server, since the other tasks on the same machine were running normally. Next I ruled out storage: all of our crawl results go into a unified Kafka queue and, after a series of processing steps, are written to the database. That Kafka queue is shared by every application, so if it had a problem, far more than this one task would be affected. I could therefore roughly conclude that this particular task was getting stuck, for some reason, during its early-morning run.

Well, it was time to bring out the big gun: py-spy.

py-spy is a sampling profiler for Python programs. I first heard about the library on the podcast 《捕蛇者说》, and this was the perfect chance to put it to use.

Let's take a quick look at how to use it:

[test@localhost ~]# py-spy --help
py-spy 0.1.11
A sampling profiler for Python programs

USAGE:
    py-spy [FLAGS] [OPTIONS] --pid <pid> [python_program]...

FLAGS:
        --dump           Dump the current stack traces to stdout
    -F, --function       Aggregate samples by function name instead of by line number
    -h, --help           Prints help information
        --nonblocking    Don't pause the python process when collecting samples. Setting this option will reduce the
                         performance impact of sampling, but may lead to inaccurate results
    -V, --version        Prints version information

OPTIONS:
    -d, --duration <duration>    The number of seconds to sample for when generating a flame graph [default: 2]
    -f, --flame <flamefile>      Generate a flame graph and write to a file
    -p, --pid <pid>              PID of a running python program to spy on
    -r, --rate <rate>            The number of samples to collect per second [default: 100]

ARGS:
    <python_program>...    commandline of a python program to run

You only need to give it the pid of a running Python process, and py-spy can show you visually where the program is spending its time. Better yet, it requires no code changes and no restart, which made it a perfect fit for the situation we were in.
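
Of the flags above, two are especially useful for a process that is already hung: --dump prints the current stack traces to stdout once and exits, and --flame samples the process for a while and writes a flame graph. For example (32179 is the stuck process's pid, found with ps as shown below; profile.svg is an arbitrary filename of my choosing):

py-spy --dump --pid 32179
py-spy --flame profile.svg --duration 10 --pid 32179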

Installation is very simple:

pip install py-spy

And it's just as simple to use:

# First, find the pid of the stuck Python process
ps -ef | grep python | grep ***
# Then attach py-spy to that process
py-spy --pid 32179

The output was as follows:

As you can see, the program was stuck in the part that establishes a network connection. hand_request is a function that signs requests for a certain app, kept on its own in the utils directory. The rest was easy: I found the function, and there on line 43 was a post request. Well, post or get, it doesn't really matter; what matters is that the request had no timeout parameter!!!

The Requests documentation spells it out clearly: without a timeout parameter, the program may hang forever.

Timeouts

You can tell Requests to stop waiting for a response after a given number of seconds with the timeout parameter. Nearly all production code should use this parameter in all requests. Failure to do so can cause your program to hang indefinitely:

>>> requests.get('http://github.com', timeout=0.001)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
requests.exceptions.Timeout: HTTPConnectionPool(host='github.com', port=80): Request timed out. (timeout=0.001)

Note

timeout only applies to the connection process, not to the downloading of the response body. timeout is not a time limit on the entire response download; rather, an exception is raised if the server has not issued a response for timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds). If no timeout is specified explicitly, requests do not time out.
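
So the fix was simply to give that post request a timeout. The real hand_request body isn't shown in this post, so what follows is only a minimal sketch of the idea, with a placeholder endpoint and payload:

import requests

# A sketch of the repaired hand_request in utils (the URL and payload are
# placeholders, not the project's real code): the post call now carries a
# timeout, so an unresponsive server raises requests.exceptions.Timeout
# instead of hanging forever.
def hand_request(payload):
    resp = requests.post(
        "https://example.com/sign",  # hypothetical signing endpoint
        data=payload,
        timeout=10,  # seconds without any response before giving up
    )
    resp.raise_for_status()
    return resp.text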

At this point, the debugging was done.

Summary

A bug this basic, and yes, I wrote it myself.

I overlooked the issue when I first wrote the code, and since nothing went wrong in testing, it slipped through. The first time the problem surfaced, my investigation was not thorough: I only skimmed the spider code under the spiders directory and never checked the utility code under utils, so I missed the real cause. The second time, with py-spy's help, I found and fixed the problem.

After solving the problem, I thought about the likely root cause: the app probably undergoes maintenance in the early morning, so the request never received a response, and with no timeout set, the program simply hung there forever.
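
On top of the timeout itself, one more safeguard is worth considering (my own suggestion, not part of the original fix): catch the Timeout where the signing helper is called, so the task logs a clear error and moves on instead of dying silently:

import logging
import requests

from utils import hand_request  # the signing helper sketched above (import path is assumed)

logger = logging.getLogger(__name__)

def sign_or_skip(payload):
    # Returns None on timeout so the spider can skip this run cleanly and
    # let the next scheduled run try again.
    try:
        return hand_request(payload)
    except requests.exceptions.Timeout:
        logger.error("signing request timed out; the app may be under maintenance")
        return None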

Finally, a recommendation: 《捕蛇者说》 is a Chinese podcast about "programming, programmers, and Python". Listening to these veterans chat is a genuinely good way to learn.

References

  1. py-spy, the official repository
  2. Ep 02. 开发中的碎碎念, from 《捕蛇者说》
  3. Timeouts, from the official Requests documentation
