tail and head
- see the first several lines of the file README.txt
head README.txt # show the first 10 lines of README.txt
head -n 50 README.txt # show the first 50 lines of README.txt
- see last several lines of the file README.txt
tail README.txt # see the last 10 lines of file README.txt
tail -n 30 README.txt | head -n 10 # first 10 lines of the last 30 lines
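The same two commands can be chained in the other order to pull a range out of the middle of a file. A minimal sketch, assuming a throwaway file named sample.txt (a hypothetical name, generated with seq so the example runs anywhere):

```shell
# Build a 100-line sample file (hypothetical name) in a temporary directory
tmpdir=$(mktemp -d)
seq 1 100 > "$tmpdir/sample.txt"

# Lines 21-30: keep the first 30 lines, then the last 10 of those
middle=$(head -n 30 "$tmpdir/sample.txt" | tail -n 10)
echo "$middle"

# tail first, then head: lines 71-80 of the 100-line file
near_end=$(tail -n 30 "$tmpdir/sample.txt" | head -n 10)
echo "$near_end"
```

Which command goes first decides whether the range is counted from the top or from the bottom of the file.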
Word count (wc command)
- get number of lines in the file
wc -l README.txt # count the lines in README.txt
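When the count is needed inside a script, the file name that wc prints after the number gets in the way. A small sketch, assuming a hypothetical 42-line file sample.txt:

```shell
# Build a 42-line sample file (hypothetical name) in a temporary directory
tmpdir=$(mktemp -d)
seq 1 42 > "$tmpdir/sample.txt"

# With a file argument, wc prints the count followed by the file name
wc -l "$tmpdir/sample.txt"

# Reading from standard input makes wc print the count alone
lines=$(wc -l < "$tmpdir/sample.txt")

# Arithmetic expansion drops any padding some wc implementations add
echo $((lines + 0))
```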
split
- Divide a large file into several smaller subfiles
split -l 1000 README.txt # split file into sub_files that have 1000 lines each
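By default split names its output files xaa, xab, and so on in the current directory. A sketch of giving the pieces a recognisable prefix instead, assuming a hypothetical 2500-line input data.txt and a hypothetical prefix part_:

```shell
# Build a 2500-line sample file (hypothetical name) in a temporary directory
tmpdir=$(mktemp -d)
seq 1 2500 > "$tmpdir/data.txt"

# The optional last argument is the output prefix; suffixes aa, ab, ... are appended
split -l 1000 "$tmpdir/data.txt" "$tmpdir/part_"

ls "$tmpdir"/part_*      # part_aa, part_ab, part_ac
wc -l "$tmpdir"/part_*   # 1000, 1000 and 500 lines, plus a total

# Concatenating the pieces in name order restores the original file
cat "$tmpdir"/part_* | cmp - "$tmpdir/data.txt" && echo "pieces match the original"
```

The last piece simply holds whatever lines are left over, so it can be shorter than the rest.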
shuffle (shuf command)
- Shuffle a data set and then split it; useful when preparing data sets for machine learning
shuf MQ2007 | split -l 1000 # shuffle and then split
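For a simple train/test split, the shuffled output can be saved once and then cut with head and tail. This is only a sketch: the file names and the 80/20 ratio are assumptions made for illustration, and shuf is part of GNU coreutils, so it may need to be installed separately on non-Linux systems.

```shell
# Build a 100-record sample data set (hypothetical name) in a temporary directory
tmpdir=$(mktemp -d)
seq 1 100 > "$tmpdir/data.txt"

# Shuffle once and save the result, so both subsets come from the same permutation
shuf "$tmpdir/data.txt" > "$tmpdir/shuffled.txt"

# First 80 lines for training, everything from line 81 onward for testing
head -n 80 "$tmpdir/shuffled.txt" > "$tmpdir/train.txt"
tail -n +81 "$tmpdir/shuffled.txt" > "$tmpdir/test.txt"

wc -l "$tmpdir/train.txt" "$tmpdir/test.txt"
```

Saving the shuffled file first matters: running shuf twice would produce two different orderings, and the two subsets could then overlap.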
cat
- Print a file's contents to standard output (can be piped into a subsequent command)
cat file | grep 'carnegie mellon university' # show all lines containing 'carnegie mellon university'
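grep can also read the file directly, which saves the extra cat process. A short sketch, assuming a hypothetical file schools.txt; -i ignores case and -c counts matching lines instead of printing them:

```shell
# Build a small sample file (hypothetical name and contents)
tmpdir=$(mktemp -d)
printf 'Carnegie Mellon University\nStanford University\ncarnegie mellon university\n' > "$tmpdir/schools.txt"

# Pass the file as an argument; -i makes the match case-insensitive
matches=$(grep -i 'carnegie mellon university' "$tmpdir/schools.txt")
echo "$matches"

# -c prints the number of matching lines instead of the lines themselves
count=$(grep -ic 'carnegie mellon university' "$tmpdir/schools.txt")
echo "$count"
```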
uniq
- The Linux uniq command detects and removes repeated lines in a text file; it is generally used together with the sort command (excerpt from 'rookie tutorial')
sort testfile1 | uniq -c # sort the file first, then deduplicate, printing how many times each line occurs
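A common extension is to sort the counts as well, which gives a frequency ranking. A minimal sketch, assuming a hypothetical file fruits.txt:

```shell
# Build a small sample file (hypothetical name and contents)
tmpdir=$(mktemp -d)
printf 'pear\napple\npear\nbanana\npear\napple\n' > "$tmpdir/fruits.txt"

# uniq only collapses adjacent duplicates, so the input must be sorted first;
# the final sort -nr orders the result by count, largest first
counts=$(sort "$tmpdir/fruits.txt" | uniq -c | sort -nr)
echo "$counts"

# The most frequent line is now on top; awk picks out its text
top=$(echo "$counts" | head -n 1 | awk '{print $2}')
echo "$top"
```

Without the first sort, the three pear lines would not be adjacent and uniq would report them as separate entries.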