BrokenPipeError and the Python subprocess.run() timeout parameter being ignored on Windows

1. Discovery of the problem

Today, a Python script that runs fine on Windows failed on Linux with the error BrokenPipeError: [Errno 32] Broken pipe. Investigation showed that the cause is a difference in how the timeout parameter of subprocess.run behaves on Linux and on Windows.

try:
    # cmd is the command string built elsewhere in the script
    ret = subprocess.run(cmd, shell=True, check=True, timeout=5,
                         stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
except Exception as e:
    logging.debug("Runner FAIL: %s", e)

2. Problem description

To illustrate the problem, I wrote the following example. subprocess.run calls a program that takes 10 s to execute, but with a timeout of 1 s. In theory this code should give up with a timeout after 1 s, but on Windows it doesn't.

import subprocess
import time

t = time.perf_counter()
args = 'python -c "import time; time.sleep(10)"'
try:
    p = subprocess.run(args, shell=True, check=True, timeout=1,
                       stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
except Exception as e:
    print(f"except is {e}")
print(f'coast:{time.perf_counter() - t:.8f}s')

Tested on Windows:

PS C:\Users\peng\Desktop> Get-ComputerInfo | select WindowsProductName, WindowsVersion, OsHardwareAbstractionLayer

WindowsProductName WindowsVersion OsHardwareAbstractionLayer
------------------ -------------- --------------------------
Windows 10 Pro     2009           10.0.19041.2251
PS C:\Users\peng\Desktop> python  .\test_subprocess.py
except is Command 'python -c "import time; time.sleep(10)"' timed out after 1 seconds
coast:10.03642740s
PS C:\Users\peng\Desktop>

Tested on Linux:

21:51:31 wp@PowerEdge:~/bak$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.3 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.3 LTS"
VERSION_ID="20.04"
21:56:45 wp@PowerEdge:~/bak$ python test_subprocess.py
except is Command 'python -c "import time; time.sleep(10)"' timed out after 1 seconds
coast:1.00303393s
21:57:02 wp@PowerEdge:~/bak$

On Windows the TimeoutExpired exception is still raised, but only after the full 10 s, so the timeout parameter of subprocess.run effectively has no effect there. subprocess.run executes the specified command, waits for it to complete, and returns an instance of the CompletedProcess class containing the result. The prototype of this function is:

subprocess.run(args, *, stdin=None, input=None, stdout=None, stderr=None, capture_output=False, 
  shell=False, cwd=None, timeout=None, check=False, encoding=None, errors=None, text=None, env=None, universal_newlines=None)

args: The command to execute. Either a string or a sequence of program arguments.

stdin, stdout, and stderr: Standard input, output, and error for the child process. Each value can be subprocess.PIPE, subprocess.DEVNULL, an existing file descriptor, an already open file object, or None. subprocess.PIPE creates a new pipe to the child process. subprocess.DEVNULL uses os.devnull. The default is None, which means no redirection takes place. In addition, stderr can be merged into stdout by passing stderr=subprocess.STDOUT.

timeout: The command timeout in seconds. If the timeout expires, the child process is killed and waited for, and a TimeoutExpired exception is raised.

check: If this parameter is True and the process exits with a non-zero status code, a CalledProcessError exception is raised.

encoding: If this parameter is specified, stdin, stdout, and stderr are opened in text mode with this encoding and carry string data. Otherwise they carry bytes.

shell: If this parameter is True, the specified command will be executed through the shell of the operating system.
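
To make the interplay of check and timeout concrete, here is a minimal sketch that tells the two exceptions apart. The helper name run_checked and its return convention are assumptions made for illustration; they are not part of the subprocess module or of the original script.

```python
import subprocess
import sys


def run_checked(args, timeout):
    """Run args without a shell and classify the outcome.

    Hypothetical helper: returns ("ok", output_bytes),
    ("failed", returncode) or ("timeout", None).
    """
    try:
        done = subprocess.run(args, check=True, timeout=timeout,
                              stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT)
    except subprocess.TimeoutExpired:
        # Raised when the command runs longer than `timeout` seconds.
        return "timeout", None
    except subprocess.CalledProcessError as e:
        # Raised because check=True and the exit status was non-zero.
        return "failed", e.returncode
    return "ok", done.stdout


if __name__ == "__main__":
    print(run_checked([sys.executable, "-c", "print('hi')"], timeout=10))
    print(run_checked([sys.executable, "-c", "import sys; sys.exit(3)"],
                      timeout=10))
```

Catching the two exception types separately, rather than a bare Exception as in the snippets above, makes it visible in the logs whether a run failed or merely took too long.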

3. Problem Analysis

subprocess.run waits for the process to terminate and handles the TimeoutExpired exception. On POSIX, the exception object contains the stdout and stderr bytes read so far. The main reason the test above fails on Windows is that shell mode is combined with a pipe: the pipe handle may be inherited by one or more descendant processes (for example through shell=True). When the timeout expires, the shell process is killed, but the programs started by the shell, in this case the python process, keep running, so subprocess.run cannot return until every process using the pipe has exited. With shell=False, the 1 s result appears on Windows as well:

python  .\test_subprocess.py
except is Command 'python -c "import time; time.sleep(10)"' timed out after 1 seconds
coast:1.00460970s
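
A sketch of the shell=False variant of the test script is shown below. It assumes the command can be written as an argument list; timed_run and SLEEPER are names invented for this illustration, and sys.executable stands in for the python found on PATH in the original test.

```python
import subprocess
import sys
import time


def timed_run(args, timeout, shell=False):
    """Run args and return (elapsed_seconds, timed_out)."""
    start = time.perf_counter()
    timed_out = False
    try:
        subprocess.run(args, shell=shell, check=True, timeout=timeout,
                       stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    except subprocess.TimeoutExpired:
        timed_out = True
    return time.perf_counter() - start, timed_out


# An argument list instead of a command string: no shell sits between
# run() and the interpreter, so kill() reaches the process that holds
# the pipe.
SLEEPER = [sys.executable, "-c", "import time; time.sleep(10)"]

if __name__ == "__main__":
    elapsed, timed_out = timed_run(SLEEPER, timeout=1)
    print(f"timed_out={timed_out} elapsed={elapsed:.2f}s")
```

This only helps when the command itself does not spawn further processes that inherit the pipe; a child that starts its own long-running children can hold the pipe open in the same way.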

This can be regarded as a defect in the Windows implementation of subprocess. For details, see the following issue, quoted below:
[subprocess] run() sometimes ignores timeout in Windows · Issue #87512 · python/cpython · GitHub

subprocess.run() handles TimeoutExpired by terminating the process and waiting on it. On POSIX, the exception object contains the partially read stdout and stderr bytes. For example:

cmd = 'echo spam; echo eggs >&2; sleep 2'
try: p = subprocess.run(cmd, shell=True, capture_output=True,
                        text=True, timeout=1)
except subprocess.TimeoutExpired as e: ex = e
 
>>> ex.stdout, ex.stderr
(b'spam\n', b'eggs\n')

On Windows, subprocess.run() has to finish reading output with a second communicate() call, after which it manually sets the exception's stdout and stderr attributes.
This poses the problem that the second communicate() call may block indefinitely, even though the child process has terminated.
The primary issue is that the pipe handles may be inherited by one or more descendant processes (e.g. via shell=True), which are all regarded as potential writers that keep the pipe from closing. Reading from an open pipe that's empty will block until data becomes available. This is generally desirable for efficiency, compared to polling in a loop. But in this case, the downside is that run() in Windows will effectively ignore the given timeout.
Another problem is that _communicate() writes the input to stdin on the calling thread with a single write() call. If the input exceeds the pipe capacity (4 KiB by default -- but a pipesize 'suggested' size could be supported), the write will block until the child process reads the excess data. This could block indefinitely, which will effectively ignore a given timeout. The POSIX implementation, in contrast, correctly handles a timeout in this case.
Also, Popen.__exit__() closes the stdout, stderr, and stdin files without regard to the _communicate() worker threads. This may seem innocuous, but if a worker thread is blocked on synchronous I/O with one of these files, WinAPI CloseHandle() will also block if it's closing the last handle for the file in the current process. (In this case, the kernel I/O manager has a close procedure that waits to acquire the file for the current thread before performing various housekeeping operations, primarily in the filesystem, such as clearing byte-range locks set by the current process.) A blocked close() is easy to demonstrate. For example:

args = 'python -c "import time; time.sleep(99)"'
p = subprocess.Popen(args, shell=True, stdout=subprocess.PIPE)
try: p.communicate(timeout=1)
except: pass

p.kill() # terminates the shell process -- not python.exe
with p: pass # stdout.close() blocks until python.exe exits

The Windows implementation of Popen._communicate() could be redesigned as follows:

read in chunks, with a size from 1 byte up to the maximum available, as determined by _winapi.PeekNamedPipe()

write to the child's stdin on a separate thread

after communicate() has started, ensure that synchronous I/O in worker threads has been canceled via CancelSynchronousIo() before closing the pipes.

The _winapi module would need to wrap OpenThread() and CancelSynchronousIo(), plus define the TERMINATE_THREAD (0x0001) access right.

With the proposed changes, subprocess.run() would no longer special case TimeoutExpired on Windows.
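
On Python versions affected by this issue, one possible workaround from the caller's side is to avoid the pipe altogether and to kill the whole process tree on timeout. The sketch below is an illustration under assumptions, not the fix proposed in the issue: run_with_hard_timeout is a hypothetical helper, output is captured through a temporary file instead of subprocess.PIPE, and the tree is ended with the Windows taskkill command or, on POSIX, by signalling the child's process group.

```python
import os
import signal
import subprocess
import sys
import tempfile


def run_with_hard_timeout(cmd, timeout, shell=False):
    """Run cmd with merged output captured in a temp file.

    Hypothetical helper: returns (returncode, output_bytes, timed_out).
    """
    popen_kwargs = {}
    if os.name == "posix":
        # Give the child its own session so its process group id equals
        # its pid and the whole group can be signalled at once.
        popen_kwargs["start_new_session"] = True
    with tempfile.TemporaryFile() as out:
        proc = subprocess.Popen(cmd, shell=shell, stdout=out,
                                stderr=subprocess.STDOUT, **popen_kwargs)
        timed_out = False
        try:
            # No pipes are involved, so this only waits on the process.
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            timed_out = True
            if os.name == "nt":
                # /T also ends child processes, /F forces termination.
                subprocess.run(
                    ["taskkill", "/PID", str(proc.pid), "/T", "/F"],
                    stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
            else:
                try:
                    os.killpg(proc.pid, signal.SIGKILL)
                except ProcessLookupError:
                    pass  # the group is already gone
            proc.wait()
        # The child wrote through the same file, so rewind before reading.
        out.seek(0)
        return proc.returncode, out.read(), timed_out


if __name__ == "__main__":
    command = f'"{sys.executable}" -c "import time; time.sleep(10)"'
    print(run_with_hard_timeout(command, timeout=1, shell=True))
```

Because the parent never reads from a pipe, descendants that are still alive cannot keep it blocked, and killing the tree keeps the sleeping python process from lingering after the shell is gone. The trade-off is that output only becomes available once the command has finished or been killed.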