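Python tips: reading files and the newline problem

Problem: on Windows the line terminator is '\r\n'. For cross-platform compatibility, Python's text mode by default treats '\r', '\n', and '\r\n' all as line endings when reading a file. A Windows file, however, may contain a lone '\n' or '\r' inside what is logically a single line terminated by '\r\n'. In that case Python's default behavior splits the one line into several, which breaks the expected result.

The fix is to pass the newline parameter to open(), which changes how Python handles line endings.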

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

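newline accepts five values: None, '', '\n', '\r', and '\r\n'.

On input (from file to program), newline defines which sequences end a line:

1. If newline is None, '\r', '\n', and '\r\n' all count as line endings, and each is translated to '\n' before being returned.

2. If newline is '', '\r', '\n', and '\r\n' still all count as line endings, but they are returned untranslated.

3. If newline is '\r', '\n', or '\r\n', that string is explicitly the only line terminator, and the characters in each line are returned untranslated.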
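
The three input rules can be checked side by side with a minimal sketch. The helper name read_lines, the constant RAW, and the file name sample.txt are illustrative assumptions, not part of the original post. The bytes are written in binary mode so that no translation happens on the way out, whatever the platform.

```python
import os
import tempfile

# Raw file content with a lone '\r', a lone '\n', and a final '\r\n'.
RAW = b"I am a\r good\n boy.\r\n"


def read_lines(raw, newline):
    # Write the bytes untouched (binary mode), then read them back in
    # text mode with the requested newline setting.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(raw)
        with open(path, "r", encoding="ascii", newline=newline) as f:
            return list(f)


if __name__ == "__main__":
    for newline in (None, "", "\r", "\n", "\r\n"):
        print(repr(newline), read_lines(RAW, newline))
```

Only newline='\r\n' returns the content as a single line. None and '' both split it into three pieces, and they differ only in whether the terminators are translated to '\n'.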

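On output (from program to file), newline determines what each '\n' is translated to:

1. If newline is None, every '\n' is translated to the system line separator, os.linesep.

2. If newline is '' or '\n', no translation takes place.

3. If newline is '\r' or '\r\n', every '\n' is translated to '\r' or '\r\n' respectively.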
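
One way to see the output rules directly is to write in text mode and then read the file back in binary mode, so the bytes on disk are visible without any read-side translation. This is a sketch under assumptions: the helper name written_bytes, the constant TEXT, and the file name sample.txt are made up for illustration.

```python
import os
import tempfile

# Text containing a lone '\n', a '\r\n' pair, and a lone '\r'.
TEXT = "a\nb\r\nc\r"


def written_bytes(text, newline):
    # Write in text mode with the requested newline setting, then return
    # the exact bytes that ended up in the file.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")  # hypothetical file name
        with open(path, "w", encoding="ascii", newline=newline) as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()


if __name__ == "__main__":
    for newline in (None, "", "\n", "\r", "\r\n"):
        print(repr(newline), written_bytes(TEXT, newline))
```

The result for None depends on the platform because it uses os.linesep. The other four results are the same everywhere. In every case a '\r' that is already in the text is written unchanged, which is why an existing '\r\n' turns into '\r\r\n' when '\n' is translated to '\r\n'.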

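Example 1: writing without newline. On Windows every '\n' is replaced with '\r\n', including the '\n' inside an existing '\r\n', which therefore becomes '\r\r\n'.

def file_seperator_test1():
    # output: newline is not given, so each '\n' becomes os.linesep
    with open("medical.txt", "w") as f:
        f.write("I am a\r good\n boy.\r\n")
    # input: only '\r\n' ends a line, and nothing is translated
    with open("medical.txt", "r", newline="\r\n") as f:
        print(list(f))

if __name__ == "__main__":
    file_seperator_test1()

Output (on Windows):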

['I am a\r good\r\n', ' boy.\r\r\n']

Example 2: writing with newline set to '' or '\n'. Nothing is translated.

def file_seperator_test2():
    # output: both settings write the string exactly as given
    with open("medical.txt", "w", newline="") as f:
        f.write("I am a\r good\n boy.\r\n")
    with open("medical2.txt", "w", newline="\n") as f:
        f.write("I am a\r good\n boy.\r\n")

    # input: only '\r\n' ends a line, and nothing is translated
    with open("medical.txt", "r", newline="\r\n") as f:
        print(list(f))
    with open("medical2.txt", "r", newline="\r\n") as f:
        print(list(f))

if __name__ == "__main__":
    file_seperator_test2()

Output:

['I am a\r good\n boy.\r\n']
['I am a\r good\n boy.\r\n']

Example 3: writing with newline set to '\r' or '\r\n'. Every '\n' is replaced. When every '\n' is replaced with '\r', the file no longer contains any '\r\n', so on Windows the line breaks disappear and all the lines collapse into one.

def file_seperator_test3():
    # output: each '\n' becomes '\r' in medical.txt and '\r\n' in medical2.txt
    with open("medical.txt", "w", newline="\r") as f:
        f.write("I am a\r good\n boy.\r\n where should\r\n I change the line ?\r\n")
        f.write("I can't stop\r\n")
    with open("medical2.txt", "w", newline="\r\n") as f:
        f.write("I am a\r good\n boy.\r\n")

    # input: only '\r\n' ends a line, and nothing is translated
    with open("medical.txt", "r", newline="\r\n") as f:
        print(list(f))
    with open("medical2.txt", "r", newline="\r\n") as f:
        print(list(f))


if __name__ == "__main__":
    file_seperator_test3()

Output:

["I am a\r good\r boy.\r\r where should\r\r I change the line ?\r\rI can't stop\r\r"]
['I am a\r good\r\n', ' boy.\r\r\n']

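Example 4: reading without newline. By default all three sequences are treated as line endings, and all of them are translated to '\n'.

def file_seperator_test4():
    # output: newline="" writes the string exactly as given
    with open("medical.txt", "w", newline="") as f:
        f.write("I am a\r good\n boy.\r\n")
    # input: newline is not given, so universal newlines mode applies
    with open("medical.txt", "r") as f:
        print(list(f))


if __name__ == "__main__":
    file_seperator_test4()

Output: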

['I am a\n', ' good\n', ' boy.\n']

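Example 5: reading with newline set to ''. All three sequences are still treated as line endings, but they are not translated.

def file_seperator_test5():
    # output: newline="" writes the string exactly as given
    with open("medical.txt", "w", newline="") as f:
        f.write("I am a\r good\n boy.\r\n")
    # input: lines split on '\r', '\n', and '\r\n', terminators kept as-is
    with open("medical.txt", "r", newline="") as f:
        print(list(f))

if __name__ == "__main__":
    file_seperator_test5()

Output: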

['I am a\r', ' good\n', ' boy.\r\n']

Example 6: reading with newline set to '\r', '\n', or '\r\n'. The line terminator is given explicitly, and a line ends only where that exact string occurs.

def file_seperator_test6():
    # output: write the same content, untranslated, to three files
    with open("medical.txt", "w", newline="") as f:
        f.write("I am a\r good\n boy.\r\n where should\r\n I change the line ?\r\n")
        f.write("I can't stop\r\n")
    with open("medical2.txt", "w", newline="") as f:
        f.write("I am a\r good\n boy.\r\n where should\r\n I change the line ?\r\n")
        f.write("I can't stop\r\n")
    with open("medical3.txt", "w", newline="") as f:
        f.write("I am a\r good\n boy.\r\n where should\r\n I change the line ?\r\n")
        f.write("I can't stop\r\n")

    # input: read each file with a different explicit line terminator
    with open("medical.txt", "r", newline="\r") as f:
        print(list(f))
    with open("medical2.txt", "r", newline="\n") as f:
        print(list(f))
    with open("medical3.txt", "r", newline="\r\n") as f:
        print(list(f))

if __name__ == "__main__":
    file_seperator_test6()

Output:

['I am a\r', ' good\n boy.\r', '\n where should\r', '\n I change the line ?\r', "\nI can't stop\r", '\n']
['I am a\r good\n', ' boy.\r\n', ' where should\r\n', ' I change the line ?\r\n', "I can't stop\r\n"]
['I am a\r good\n boy.\r\n', ' where should\r\n', ' I change the line ?\r\n', "I can't stop\r\n"]

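Conclusions:

1. To write lines that contain '\n', set newline to '' or '\n' so that Python does not alter the '\n'.

2. To read lines that contain an embedded '\n', set newline to '\r\n' so that only '\r\n' counts as a line terminator.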
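
The two conclusions can be combined into a round trip, sketched below. The function names write_records and read_records and the file name records.txt are hypothetical, and the sketch assumes that no record itself contains the pair '\r\n', since that pair is reserved as the terminator.

```python
import os
import tempfile


def write_records(path, records):
    # newline="" turns off translation, so an embedded '\n' or '\r' and the
    # CRLF terminator are all written exactly as given.
    with open(path, "w", encoding="utf-8", newline="") as f:
        for record in records:
            f.write(record + "\r\n")


def read_records(path):
    # newline="\r\n" makes CRLF the only terminator, so a lone '\r' or '\n'
    # stays inside its record. The terminator is stripped before returning.
    with open(path, "r", encoding="utf-8", newline="\r\n") as f:
        return [line[:-2] if line.endswith("\r\n") else line for line in f]


if __name__ == "__main__":
    records = ["I am a\r good\n boy.", " where should", "I can't stop"]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "records.txt")  # hypothetical file name
        write_records(path, records)
        print(read_records(path) == records)
```

Writing with newline='' rather than the default keeps the file identical on every platform, which is what lets the read side rely on '\r\n' alone.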

Reposted from www.cnblogs.com/luoheng23/p/9492732.html