Log Analysis of Large Distributed Websites

Most companies run their online application servers on Linux or Unix, so being comfortable with a handful of commonly used shell commands makes log analysis much more effective. Basic entry-level commands include ls, cp, mv, rm, mkdir, and touch.

1. View the content of the file

 The cat command is a convenient tool for displaying the content of text files. If a log file is relatively small, you can directly use the cat command to print its content for viewing.

 cat access.log

 124.119.223.36 455 GET www.xxx.com/list.html www.xx.com 404 432004

 124.119.223.36 230 GET www.xxx.com/list.html www.xx.com 500 432004

 ...

 The -n parameter displays line numbers:

 cat -n access.log

 1  124.119.223.36 455 GET www.xxx.com/list.html www.xx.com 404 432004

 2  124.119.223.36 230 GET www.xxx.com/list.html www.xx.com 500 432004

 ...

 

2. View the file in pages

 The disadvantage of cat is that once it is executed, you can no longer interact with or control the output. The more command displays the contents of a file page by page: press Enter to show the next line, press the space bar or the F key to show the next screen, and press the B key to show the previous screen.

 more access.log

  124.119.223.36 455 GET www.xxx.com/list.html www.xx.com 404 432004

  124.119.223.36 230 GET www.xxx.com/list.html www.xx.com 500 432004

   ...

   --More--

   The less command provides richer functionality than more, including content search and highlighting.

   less access.log

 

  3. Show end of file

  Use the tail command to view the last few lines of a file. This is very useful for log files, which in most cases are appended to, so the newest content is at the end of the file.

  tail -n2 access.log

  The number following the -n parameter specifies how many lines from the end of the file to display; -n2 displays the last 2 lines. Adding the -f parameter keeps tail running so that it continues to display lines as they are appended to the file.

  tail -n2 -f access.log

 

4. Display the file header

   Similar to the tail command, the head command displays a number of lines from the beginning of a file.

   head -n2 access.log

  The -n parameter specifies how many lines from the beginning of the file to display; here it is 2, so the first 2 lines of access.log are displayed.
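
  The head and tail commands can also be chained with a pipe to pull a range of lines out of the middle of a file. The following is only a minimal sketch; the file sample.log and its contents are made up for the illustration.

```shell
# Build a small sample file (hypothetical data, one entry per line).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'line1\nline2\nline3\nline4\nline5\n' > "$tmpdir/sample.log"

# First 2 lines of the file.
first_two=$(head -n 2 "$tmpdir/sample.log")

# Last 2 lines of the file.
last_two=$(tail -n 2 "$tmpdir/sample.log")

# Lines 3 to 4: take the first 4 lines, then keep the last 2 of those.
middle=$(head -n 4 "$tmpdir/sample.log" | tail -n 2)

printf '%s\n' "$first_two" "$last_two" "$middle"
```

  The same pattern works for any range: head limits where the range ends, and tail keeps only its last part.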

 

5. Content Sorting

 A file often contains many lines, and it is frequently necessary to sort them by the value of a column. The sort command sorts the data.

 cat sortfile

5

90

2

5

7

9

12

343

432

 sort -n sortfile

2

5

5

7

9

12

...

The cat command shows that the numbers in sortfile are unordered; with sort -n, they are sorted numerically from smallest to largest.
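
The reason for the -n parameter is easier to see next to a plain sort, which compares lines as text. The sketch below uses a made-up file of numbers, not the sortfile shown above.

```shell
# Sample numbers, one per line (hypothetical data).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf '5\n90\n2\n12\n343\n' > "$tmpdir/numbers"

# Plain sort compares character by character, so "12" comes before "2".
text_order=$(sort "$tmpdir/numbers")

# -n compares the values as numbers.
numeric_order=$(sort -n "$tmpdir/numbers")

# -r reverses the result: largest number first.
numeric_desc=$(sort -n -r "$tmpdir/numbers")

echo "text:    " $text_order
echo "numeric: " $numeric_order
echo "reverse: " $numeric_desc
```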

 6. Character Statistics

The wc command can be used to count the number of characters, words, and lines in the specified file and output the statistical results

 wc  -l access.log

 11001 access.log

 The -l parameter counts the number of lines in the file; the output above shows that the log file has 11001 lines.

 wc -c access.log

 781633 access.log

 The -c parameter displays the number of bytes in the file; the output above shows that the log file is 781633 bytes.
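
 When the count is needed inside a script, the file name in the output gets in the way. One common approach, sketched here with a made-up two-line file, is to feed the file to wc on standard input so that only the number is printed.

```shell
# Two sample lines (hypothetical data).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'GET /list.html 404\nGET /info.html 200\n' > "$tmpdir/sample.log"

# With input redirection wc has no file name to print, only the count.
lines=$(wc -l < "$tmpdir/sample.log")
words=$(wc -w < "$tmpdir/sample.log")
bytes=$(wc -c < "$tmpdir/sample.log")

# $(( )) strips the padding that some wc implementations add before the number.
echo "lines=$((lines)) words=$((words)) bytes=$((bytes))"
```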

7. Check for repeated lines

 The uniq command can display how many times each line is repeated in a file, display only the lines that occur once, or display only the repeated lines. Because uniq only compares adjacent lines, it is usually used in combination with sort.

 cat uniqfile

 aaa

 bbb

 ccc

The content above is not sorted. Sort it with sort first, then use uniq to count the deduplicated lines.

  sort uniqfile | uniq -c

 3 aaa

 4 bbb

 1 ccc

The -c parameter prefixes each output line with the number of times that line occurs.

Show lines that appear only once:

 sort uniqfile | uniq -c -u

 1 ccc

 1 eee

Add the parameter -u to display only the lines that appear once.
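
Why sort has to come first can be checked with a small experiment. The sketch below uses its own made-up file, in which no two identical lines are next to each other, and also shows the -d parameter, which prints one copy of each repeated line.

```shell
# Sample file where duplicates are never adjacent (hypothetical data).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'bbb\naaa\nbbb\nccc\naaa\nbbb\n' > "$tmpdir/uniqfile"

# Without sorting, uniq finds no adjacent duplicates and removes nothing.
unsorted_count=$(uniq "$tmpdir/uniqfile" | wc -l)

# After sorting, identical lines are adjacent and uniq collapses them.
sorted_count=$(sort "$tmpdir/uniqfile" | uniq | wc -l)

# -d prints one copy of each repeated line; -u prints lines that occur once.
repeated=$(sort "$tmpdir/uniqfile" | uniq -d)
single=$(sort "$tmpdir/uniqfile" | uniq -u)

echo "lines after uniq alone:    $((unsorted_count))"
echo "lines after sort and uniq: $((sorted_count))"
echo "repeated:" $repeated
echo "single:  " $single
```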

8. String search

String search is one of the most common operations when querying logs. Use the grep command to find lines in a file that match a given string; every line that contains a match is printed.

  grep qq access.log

 124.119.20.23 GET www.xxx.com/list.html www.qq.com 404 432004

 124.119.29.30 GET www.xxx.com/list.html www.qq.com 500 432004

qq is the search string, and access.log is the file name.

With the -c parameter, grep displays the number of matching lines instead of the lines themselves.

 grep -c qq access.log

 2262
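
A few other grep parameters are often useful on logs. The sketch below is only an illustration on made-up log lines; it assumes the status code is a space-separated field, as in the sample lines above, and the pattern would also match any other field that happened to equal 404 or 500.

```shell
# Three sample log lines (hypothetical data).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cat > "$tmpdir/access.log" <<'EOF'
124.119.20.23 120 GET www.xxx.com/list.html www.qq.com 404 432004
124.119.29.30 230 GET www.xxx.com/list.html www.qq.com 500 432004
124.119.29.31 80 GET www.xxx.com/info.html www.xx.com 200 1024
EOF

# -n prefixes each matching line with its line number in the file.
numbered=$(grep -n 'info.html' "$tmpdir/access.log")

# -v inverts the match: count the lines that do NOT contain qq.
not_qq=$(grep -c -v 'qq' "$tmpdir/access.log")

# -E enables extended regular expressions: lines with a 404 or 500 field.
errors=$(grep -c -E ' (404|500) ' "$tmpdir/access.log")

echo "$numbered"
echo "lines without qq: $not_qq, error lines: $errors"
```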

9. File lookup

 Often you need to modify a file but know only its name and not its path, or you simply need to find where a file is located. In those cases, use the file search command find.

 $ find /home/longlong -name access.log

 /home/longlong/temp/access.log

Search for a file named access.log in the /home/longlong path, and the file path found is /home/longlong/temp/access.log

Find files whose names end with the .txt suffix:

 $ find /home/longlong -name "*.txt"

/home/longlong/active-cpp/test/test1.txt

/home/longlong/active-cpp/test/test2.txt

You can also use the find command to recursively print all files in the current directory

 $ find . -print
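
Searches can be narrowed further with the -type parameter. The sketch below builds a throwaway directory tree (all names in it are made up for the example) and then searches it.

```shell
# Throwaway directory tree (hypothetical names).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
mkdir -p "$tmpdir/app/logs" "$tmpdir/app/conf"
touch "$tmpdir/app/logs/access.log" "$tmpdir/app/logs/error.log" "$tmpdir/app/conf/notes.txt"

# -type f restricts results to regular files. The pattern is quoted so that
# the shell passes it to find unexpanded.
log_files=$(find "$tmpdir" -type f -name '*.log' | sort)

# -type d lists directories only: app, app/logs and app/conf.
dir_count=$(find "$tmpdir/app" -type d | wc -l)

echo "$log_files"
echo "directories: $((dir_count))"
```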

10. URL Access Tools

 To access web documents over HTTP from the command line, you can use curl. It supports HTTP, HTTPS, FTP, FTPS, Telnet, and other protocols, and is often used to fetch web pages and to monitor the status of a web server from the command line.

Make a web request:

 $ curl www.baidu.com

<html><head><meta http-equiv="content-type" content="text/html;charset=utf-8">
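
For monitoring, usually only the HTTP status code matters, not the page body. The following is a rough sketch of one way to get it with curl; the URL is a placeholder chosen for the example, and the 5 second limit is an arbitrary assumption.

```shell
# Placeholder URL for illustration; substitute the server to be monitored.
url="http://www.example.com/"

# -s silences progress output, -o /dev/null discards the body,
# -w '%{http_code}' prints only the HTTP status code, and
# --max-time bounds the whole request to 5 seconds.
# curl prints 000 when no response was received; the fallback also covers
# the case where curl itself could not be run.
status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url") || status=000

if [ "$status" = "200" ]; then
    echo "OK: $url returned $status"
else
    echo "WARN: $url returned $status"
fi
```

A check like this can be run periodically, for example from cron, to notice when a server stops answering normally.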

11. View request traffic

 For example, the top 10 IP addresses by number of requests:

 $ cat access.log | cut -f1 -d " " | sort | uniq -c | sort -k 1 -n -r | head -10

 1455 174.119.232.29

 1437 124.119.22.59

The top 10 URLs by number of requests:

 $ cat access.log | cut -f4 -d " " | sort | uniq -c | sort -k 1 -n -r | head -10

 2280 www.xxx.com/list.html

 2236 www.xxx.com/info.html

The commands are connected with pipes: extract the specified column from the access log, sort it, collapse duplicates while counting them, then sort in descending order by number of occurrences and take the first 10 records.
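
To make the stages concrete, here is the same kind of pipeline run on a handful of made-up log lines, small enough to check by hand. It keeps the top 2 instead of the top 10, and the awk step at the end is an addition that only tidies the padding uniq -c puts in front of each count.

```shell
# Six sample log lines (hypothetical data).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cat > "$tmpdir/access.log" <<'EOF'
174.119.232.29 120 GET www.xxx.com/list.html www.xx.com 200 1024
124.119.22.59 230 GET www.xxx.com/info.html www.xx.com 200 2048
174.119.232.29 80 GET www.xxx.com/list.html www.xx.com 200 1024
174.119.232.29 95 GET www.xxx.com/info.html www.xx.com 404 512
124.119.22.59 310 GET www.xxx.com/list.html www.xx.com 200 1024
10.0.0.7 60 GET www.xxx.com/list.html www.xx.com 200 1024
EOF

# Stage 1: cut keeps only column 1 (the client IP).
# Stage 2: sort makes identical IPs adjacent.
# Stage 3: uniq -c collapses them and prefixes each with its count.
# Stage 4: sort -k 1 -n -r orders by that count, largest first.
# Stage 5: head keeps the top 2.
top_ips=$(cut -d ' ' -f 1 "$tmpdir/access.log" | sort | uniq -c | sort -k 1 -n -r | head -n 2)

# Print as "count ip" with single spaces.
printf '%s\n' "$top_ips" | awk '{print $1, $2}'
```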

12. View the most time-consuming pages

 For developers, page response time deserves close attention, and we often need to find the slow pages in order to optimize them:

 $ cat access.log | sort -k 2 -n -r | head -10

 174.119.232.29 740 POST www.xxx.com.userinfo.html www.taobao.com 404 2789

 174.119.232.29 740 POST www.xxx.com.userinfo.html www.taobao.com 301 49397

 174.119.232.29 740 POST www.xxx.com.userinfo.html www.taobao.com 200 432004

 174.119.232.29 740 POST www.xxx.com.userinfo.html www.sina.com 500 48243

The second column of access.log holds the response time. Sort by that column in descending numeric order, then use the head command to take the 10 slowest requests.
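
When only the time and the URL are of interest, the sorted lines can be trimmed with awk. This is a small sketch on made-up log lines, assuming the same column layout as the sample lines above (response time in column 2, URL in column 4).

```shell
# Three sample log lines (hypothetical data).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cat > "$tmpdir/access.log" <<'EOF'
174.119.232.29 740 POST www.xxx.com/userinfo.html www.taobao.com 200 432004
124.119.22.59 95 GET www.xxx.com/list.html www.xx.com 200 1024
124.119.22.60 1280 GET www.xxx.com/info.html www.xx.com 200 2048
EOF

# -t ' ' splits fields on single spaces, -k 2,2 limits the sort key to
# column 2 alone, -n compares it as a number, -r puts the largest first.
# awk then prints only the response time and the URL.
slowest=$(sort -t ' ' -k 2,2 -n -r "$tmpdir/access.log" | head -n 2 | awk '{print $2, $4}')

echo "$slowest"
```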
