Suppose we run into a log file larger than 2 GB. Every text editor simply gives up on opening it:
At this point my heart sinks a little, but let's first see how many lines the file actually has.
You can count the lines in a file with wc -l <file name>:
$ wc -l lesson_20201205.log
More than 12 million lines of data.
Then take the first N lines with head -n <count> <file name> > <new file>:
$ head -n 1000000 lesson_20201205.log > lesson_20201205_100.log
This gives us a 163 MB file containing 1 million lines of data.
Next, we need to pull the user IDs out of the log, and there are many duplicates among them. We obviously can't paste a million IDs into Excel and click "Remove Duplicates" there; we have to solve it the programmer's way.
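The original doesn't show how the IDs were extracted, so here is one possible sketch. It assumes a hypothetical log format where each line contains a field like user_id=12345; the field name and the file names are assumptions, so adjust the pattern to match your actual log:

```shell
# Assumed log format: each line contains a field like "user_id=12345".
# grep -oE prints only the matching part of each line;
# sed then strips the "user_id=" prefix, leaving just the numeric ID.
grep -oE 'user_id=[0-9]+' lesson_20201205_100.log \
  | sed 's/user_id=//' > lesson_id_100.log
```

If the log is column-based instead (say, the ID is the third whitespace-separated field), awk '{print $3}' would do the same job.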
We use cat <file name> | sort | uniq > <deduplicated file name>:
$ cat lesson_id_100.log | sort | uniq > lesson_id_100_uniq.log
This leaves us with the IDs deduplicated and sorted in ascending order.
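As a side note, the same result can be had with one flag instead of a pipeline, since sort -u sorts and deduplicates in a single step (the file names below match the ones used above):

```shell
# Equivalent to "cat file | sort | uniq", without the extra cat process:
sort -u lesson_id_100.log > lesson_id_100_uniq.log

# Bonus: count how many times each ID appears, most frequent first.
sort lesson_id_100.log | uniq -c | sort -rn | head
```

uniq only collapses adjacent duplicate lines, which is why the input must be sorted first in either form.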
That's it!