Question:
Sometimes, when a file that was edited on Windows is opened with vim on Linux, an extra "^M" appears at the end of each line. What is going on?
1. Introduction to CR/LF
CR stands for Carriage Return;
LF stands for Line Feed.
CR and LF are holdovers from the days when computer terminals were teleprinters. Teletypewriters work just like regular typewriters.
At the end of each line, the CR command moves the print head back to the left. The LF command advances the paper one line.
Although the days of rolling paper terminals are over, the CR and LF commands still exist and are still used as delimiters by many applications and network protocols.
Linux (Unix) and macOS use "\n" as the newline character by default (classic Mac OS, before Mac OS X, used "\r");
Windows uses "\r\n" as the newline sequence by default.
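The byte values behind these two conventions can be checked with a short Python sketch. Python is used here only for illustration, and the sample text and the helper name caret_notation are made up for this example. It prints the codes of "\r" and "\n", shows why caret notation renders character 13 as "^M", and shows that splitting Windows-style text on "\n" alone leaves a "\r" at the end of every line, which is the character vim is displaying.

```python
# Byte values of the two control characters.
CR = "\r"
LF = "\n"
print(ord(CR), hex(ord(CR)))  # 13 0xd
print(ord(LF), hex(ord(LF)))  # 10 0xa


def caret_notation(ch: str) -> str:
    """Render an ASCII control character (codes 0-31) in caret notation."""
    return "^" + chr(ord(ch) ^ 0x40)


print(caret_notation(CR))  # ^M

# Hypothetical sample: two lines saved with Windows (CRLF) line endings.
windows_text = b"first line\r\nsecond line\r\n"

# A Unix tool treats only LF as the line terminator, so CR stays in the line.
unix_view = windows_text.split(b"\n")
print(unix_view)  # [b'first line\r', b'second line\r', b'']
```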
2. Newline character on Unix (Linux)
The newline character on Linux is "\n".
"\n" corresponds to LF in the ASCII table; its ASCII value is 10, which is 0x0a in hexadecimal.
3. Newline character on Windows
The newline sequence on Windows is "\r\n".
"\r" corresponds to CR in the ASCII table; its ASCII value is 13, which is 0x0d in hexadecimal.
vim displays a leftover "\r" as "^M".
4. Converting between Unix and Windows newlines
4.1 Conversion tools on Linux
- dos2unix: convert Windows-style newlines to Unix-style newlines
- unix2dos: convert Unix-style newlines to Windows-style newlines
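What the two tools do to the bytes can be sketched in a few lines of Python. This is only an illustration of the idea, not how dos2unix or unix2dos are implemented; the function names crlf_to_lf and lf_to_crlf are invented for this example, and the real tools also deal with things this sketch ignores, such as binary files and conversion modes.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Windows -> Unix: replace every CRLF pair with a single LF."""
    return data.replace(b"\r\n", b"\n")


def lf_to_crlf(data: bytes) -> bytes:
    """Unix -> Windows: end every line with CRLF.

    Existing CRLF pairs are normalised to LF first so that they are not
    turned into CR CR LF.
    """
    return crlf_to_lf(data).replace(b"\n", b"\r\n")


sample = b"a\r\nb\r\n"
print(crlf_to_lf(sample))            # b'a\nb\n'
print(lf_to_crlf(b"a\nb\n"))         # b'a\r\nb\r\n'
print(lf_to_crlf(sample) == sample)  # True
```

Working on bytes rather than text avoids any dependence on the file's character encoding for this simple case.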
4.2 Converting between CRLF and LF on Windows
4.2.1 Using dos2unix/unix2dos
Download the Windows build of dos2unix/unix2dos from SourceForge:
dos2unix - Browse /dos2unix/7.5.1 at SourceForge.net
For usage, see the EXAMPLES and RECURSIVE CONVERSION sections of the manual shipped with the tool:
dos2unix-7.5.1-win64-nls/share/doc/dos2unix-7.5.1/dos2unix.htm
4.2.2 Using a code editor
Common code editors on Windows generally support converting between CRLF and LF.
In VS Code, for example, you can switch between LF and CRLF with the indicator in the lower right corner;
other editors work similarly.
To change the default line ending, adjust it in the editor's settings (in VS Code, the files.eol setting).
5. Git settings related to line endings
5.1 core.autocrlf
The core.autocrlf option accepts three values:
- true: convert to LF on commit and to CRLF on checkout
- false (default): commit files as they are and check them out unchanged; no newline characters are converted
- input: convert to LF on commit; do not convert on checkout
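The three values can be compared side by side with a toy model in Python. This is a deliberately simplified sketch of the three bullet points above and nothing more: the functions on_commit and on_checkout are hypothetical names, and real Git applies further rules (for example, deciding whether a file is text at all) that are not modelled here.

```python
def to_lf(data: bytes) -> bytes:
    return data.replace(b"\r\n", b"\n")


def to_crlf(data: bytes) -> bytes:
    # Normalise first so existing CRLF pairs do not become CR CR LF.
    return to_lf(data).replace(b"\n", b"\r\n")


def on_commit(worktree: bytes, autocrlf: str) -> bytes:
    """Bytes stored in the repository for a given core.autocrlf value."""
    if autocrlf in ("true", "input"):
        return to_lf(worktree)
    if autocrlf == "false":
        return worktree
    raise ValueError(f"unknown core.autocrlf value: {autocrlf!r}")


def on_checkout(stored: bytes, autocrlf: str) -> bytes:
    """Bytes written to the working directory for a given core.autocrlf value."""
    if autocrlf == "true":
        return to_crlf(stored)
    if autocrlf in ("input", "false"):
        return stored
    raise ValueError(f"unknown core.autocrlf value: {autocrlf!r}")


# Hypothetical file edited on Windows, so it has CRLF endings.
edited_on_windows = b"x\r\ny\r\n"
for mode in ("true", "false", "input"):
    stored = on_commit(edited_on_windows, mode)
    print(mode, stored, on_checkout(stored, mode))
```

In this model only "false" lets CRLF reach the repository, and only "true" produces CRLF in the working directory.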
5.2 core.eol
The core.eol option sets the line ending style used in the working directory for files that git treats as text. It is ignored when core.autocrlf is set to true or input.
- lf: use LF as the line ending style.
- crlf: use CRLF as the line ending style.
- native (default): use the operating system's default line ending style.
5.3 core.safecrlf
The core.safecrlf option guards against line ending conversions that cannot be reversed, which in practice means files with mixed newlines. It accepts three values:
- false: disable the check and allow files with mixed newlines.
- warn (default): enable the check and print a warning when a file with mixed newlines is found.
- true: enable the check, print an error and reject the commit when a file with mixed newlines is found.
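To make "mixed newlines" concrete, the following Python sketch counts the different line endings in a byte string. It is not the check Git performs internally; it only illustrates what a file with mixed newlines looks like. The function names count_line_endings and has_mixed_line_endings are invented for this example.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone LFs and lone CRs in a byte string."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LFs that are not part of a CRLF pair
    cr = data.count(b"\r") - crlf   # CRs that are not part of a CRLF pair
    return {"crlf": crlf, "lf": lf, "cr": cr}


def has_mixed_line_endings(data: bytes) -> bool:
    """True when more than one kind of line ending is present."""
    counts = count_line_endings(data)
    return sum(1 for n in counts.values() if n > 0) > 1


mixed = b"a\r\nb\nc\r\n"
print(count_line_endings(mixed))           # {'crlf': 2, 'lf': 1, 'cr': 0}
print(has_mixed_line_endings(mixed))       # True
print(has_mixed_line_endings(b"a\nb\n"))   # False
```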
5.4 Recommended git configuration
Some commands for viewing and setting the git configuration:
# View the git configuration
git config -l
# View the git configuration together with the file each value comes from
git config --list --show-origin
# Set the option globally
git config --global core.autocrlf true
- Development environment: Windows; code compilation/running environment: Windows. Recommended configuration: core.autocrlf = true
- Development environment: Windows; code compilation/running environment: Linux/Mac. Recommended configuration: core.autocrlf = input
- Development environment: Linux/Mac; code compilation/running environment: Linux/Mac. Recommended configuration: core.autocrlf = false (keep the default configuration)
- Development environment: Linux/Mac; code compilation/running environment: Windows. Recommended configuration: core.autocrlf = true
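The four cases above amount to a small lookup table, which can be written down in Python as follows. The table is transcribed from the recommendations in this section; the function name recommend_autocrlf and the accepted OS spellings are assumptions made for this sketch.

```python
# (development OS family, build/run OS family) -> suggested core.autocrlf value,
# transcribed from the four cases listed above.
RECOMMENDED_AUTOCRLF = {
    ("windows", "windows"): "true",
    ("windows", "unix"): "input",
    ("unix", "unix"): "false",
    ("unix", "windows"): "true",
}


def os_family(name: str) -> str:
    """Map an OS name to 'windows' or 'unix' (Linux and Mac are grouped together)."""
    name = name.strip().lower()
    if name == "windows":
        return "windows"
    if name in ("linux", "mac", "macos", "unix"):
        return "unix"
    raise ValueError(f"unknown operating system: {name!r}")


def recommend_autocrlf(dev_os: str, run_os: str) -> str:
    return RECOMMENDED_AUTOCRLF[(os_family(dev_os), os_family(run_os))]


print("git config --global core.autocrlf", recommend_autocrlf("Windows", "Linux"))
# git config --global core.autocrlf input
```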
Personally, I keep the default configuration.
About 99% of the time I commit code on Linux and run it on Linux;
only very rarely do I commit a .bat script from Linux.
For .bat scripts committed from a Linux environment, I convert them to CRLF format manually.
Appendix 1. ASCII code table
The entries relevant to this article are LF (decimal 10, 0x0a) and CR (decimal 13, 0x0d).
Appendix 2. How to use dos2unix
EXAMPLES
Read input from 'stdin' and write output to 'stdout':
dos2unix < a.txt
cat a.txt | dos2unix
Convert and replace a.txt. Convert and replace b.txt:
dos2unix a.txt b.txt
dos2unix -o a.txt b.txt
Convert and replace a.txt in ascii conversion mode:
dos2unix a.txt
Convert and replace a.txt in ascii conversion mode, convert and replace
b.txt in 7bit conversion mode:
dos2unix a.txt -c 7bit b.txt
dos2unix -c ascii a.txt -c 7bit b.txt
dos2unix -ascii a.txt -7 b.txt
Convert a.txt from Mac to Unix format:
dos2unix -c mac a.txt
mac2unix a.txt
Convert a.txt from Unix to Mac format:
unix2dos -c mac a.txt
unix2mac a.txt
Convert and replace a.txt while keeping original date stamp:
dos2unix -k a.txt
dos2unix -k -o a.txt
Convert a.txt and write to e.txt:
dos2unix -n a.txt e.txt
Convert a.txt and write to e.txt, keep date stamp of e.txt same as
a.txt:
dos2unix -k -n a.txt e.txt
Convert and replace a.txt, convert b.txt and write to e.txt:
dos2unix a.txt -n b.txt e.txt
dos2unix -o a.txt -n b.txt e.txt
Convert c.txt and write to e.txt, convert and replace a.txt, convert and
replace b.txt, convert d.txt and write to f.txt:
dos2unix -n c.txt e.txt -o a.txt b.txt -n d.txt f.txt
RECURSIVE CONVERSION
In a Unix shell the find(1) and xargs(1) commands can be used to run
dos2unix recursively over all text files in a directory tree. For
instance to convert all .txt files in the directory tree under the
current directory type:
find . -name '*.txt' -print0 |xargs -0 dos2unix
The find(1) option "-print0" and corresponding xargs(1) option -0 are
needed when there are files with spaces or quotes in the name. Otherwise
these options can be omitted. Another option is to use find(1) with the
"-exec" option:
find . -name '*.txt' -exec dos2unix {} \;
In a Windows Command Prompt the following command can be used:
for /R %G in (*.txt) do dos2unix "%G"
PowerShell users can use the following command in Windows PowerShell:
get-childitem -path . -filter '*.txt' -recurse | foreach-object {dos2unix $_.Fullname}
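Where dos2unix is not installed but Python is, the same recursive conversion can be approximated with the standard library. The sketch below is an assumption-laden stand-in and not part of dos2unix: the function name convert_tree_to_lf is invented, it only replaces CRLF with LF, and it makes no attempt to skip binary files. The demonstration runs in a temporary directory so that no real files are touched.

```python
import tempfile
from pathlib import Path


def convert_tree_to_lf(root: str, pattern: str = "*.txt") -> list:
    """Rewrite matching files under root with LF endings; return the changed paths."""
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        original = path.read_bytes()
        converted = original.replace(b"\r\n", b"\n")
        if converted != original:  # leave files that are already LF untouched
            path.write_bytes(converted)
            changed.append(path)
    return changed


# Demonstration on throwaway files with hypothetical names.
with tempfile.TemporaryDirectory() as tmp:
    sub = Path(tmp) / "sub"
    sub.mkdir()
    (sub / "a.txt").write_bytes(b"one\r\ntwo\r\n")
    (Path(tmp) / "b.txt").write_bytes(b"already unix\n")
    changed_names = [p.name for p in convert_tree_to_lf(tmp)]
    a_after = (sub / "a.txt").read_bytes()

print(changed_names)  # ['a.txt']
print(a_after)        # b'one\ntwo\n'
```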