Linux project practical exercises

Exercise 1:

1. Extract all the links from https://testing-studio.com/

curl https://testing-studio.com | grep -oE "https?://[^ '\"]*"
Regex explained: -o prints only the matched part of each line; -E enables extended regular expressions; https? matches http or https (the ? makes the s optional); [^ '\"]* matches a run of characters that is not a space, single quote, or double quote, so each match ends at the first such character.
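As a quick sanity check (using a made-up HTML fragment rather than the real page), the same pattern can be tested locally:

echo '<a href="https://testing-studio.com/about">a</a> <a href="http://example.com/topics/1">b</a>' | grep -oE "https?://[^ '\"]*"
# prints https://testing-studio.com/about and http://example.com/topics/1, one per line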

2. Remove the digits from the links and output all the links with the numbers stripped

curl https://testing-studio.com | grep -oE "https?://[^ '\"]*" | sed -e 's/[0-9]//g' -e 's/%.*%.*//g'
's/[0-9]//g' replaces every digit 0-9 with nothing, i.e. it deletes the digits; the second expression 's/%.*%.*//g' additionally strips everything from the first % onward when a URL contains percent-encoded characters.
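For example, on a single made-up link the digit substitution behaves like this:

echo "https://testing-studio.com/topics/1234" | sed -e 's/[0-9]//g'
# prints https://testing-studio.com/topics/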

Exercise 2:

1. Using the shell, find all the 404 and 500 server error entries in /tmp/nginx.log, then remove the URL from those entries
① Find all the 404 and 500 error entries

awk '$9~/404/ || $9~/500/' /tmp/nginx.log |less
Match the lines in the log file whose ninth field ($9) is 404 or 500.
less is also a tool for paging through a file or other output, and it can be considered the standard Linux tool for viewing file contents; it is extremely powerful. less is more flexible than more: with more you cannot scroll back, you can only read forward, but with less you can use keys such as [PageUp] and [PageDown] to move back and forth through the file, which makes it much easier to inspect a file's contents. In addition, less has richer search features: you can search both forward and backward.
More: https://www.cnblogs.com/peida/archive/2012/11/05/2754477.html
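For example, after piping into less as above, the standard keys are:
Space or PageDown  scroll forward; b or PageUp  scroll back
/500  search forward for "500"; ?500  search backward; n  repeat the search; q  quit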

② Remove the URL from the matched lines

awk '$9~/404/ || $9~/500/' /tmp/nginx.log | sed 's/"GET.*HTTP\/[0-9]\.[0-9]"//g'
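A hedged illustration with a single made-up nginx log line (the real field layout of /tmp/nginx.log may differ slightly):

echo '1.2.3.4 - - [01/Jan/2019:00:00:00 +0000] "GET /topics/1234 HTTP/1.1" 404 152 "-" "curl/7.58"' | sed 's/"GET.*HTTP\/[0-9]\.[0-9]"//g'
# prints: 1.2.3.4 - - [01/Jan/2019:00:00:00 +0000]  404 152 "-" "curl/7.58"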

Exercise 3:

1. Find the IP addresses with the most visits

awk '{print $1}' /tmp/nginx.log |sort |uniq -c | sort -n |tail -10
Select the first field ($1) of the log, which is the client IP address.
sort sorts its input and writes the sorted result to standard output.
Usage: sort [options] [file]
-n: sort numerically by value;
-r: sort in reverse (descending) order;
-d: when sorting, consider only letters, digits, and blanks, ignoring other characters;
The uniq command reports or filters out repeated adjacent lines in a file; it is usually used together with sort.
Usage: uniq [options] [file]

-c or --count: prefix each output line with the number of times it occurred;
-d or --repeated: only print lines that occur more than once;
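The tail -10 at the end keeps the 10 most frequent IPs. As a toy illustration (with made-up IPs instead of the real log):

printf '1.1.1.1\n2.2.2.2\n1.1.1.1\n3.3.3.3\n1.1.1.1\n' | sort | uniq -c | sort -n | tail -1
# prints the most frequent IP with its count:       3 1.1.1.1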

Exercise 4:

1. Find the most visited page, where /topics/1234 and /topics/4567 count as the same page

awk '{print $7}' /tmp/nginx.log |sort | sed 's/\/[0-9].*//g' |uniq -c |sort -n | tail -1
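Here $7 is the request path; the sed deletes everything from the first /<digit> segment onward, so /topics/1234 and /topics/4567 both collapse to /topics before counting. A toy run with made-up paths:

printf '/topics/1234\n/about\n/topics/4567\n/topics/9\n' | sort | sed 's/\/[0-9].*//g' | uniq -c | sort -n | tail -1
# prints the most visited page after collapsing the numeric ids:       3 /topics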

Exercise 5:

Extract all the links from https://testing-studio.com/
Find the links that cannot be accessed (done)
Wrap this up as a function: given a website, automatically check the links on that site (done)

#!/bin/bash
# Author: Frank
# Design a function: pass in a website URL, check the HTTP return code of every link found on the page,
# and report the links that cannot be accessed.
findFailedUrl() {
    urls=$(curl -s "$1" | grep -oE "https?://[^ '\"]*" | sort | uniq)
    for url in $urls; do
        code=$(curl -I -m 10 -o /dev/null -s -w "%{http_code}" "$url")
        if (( code >= 400 )); then echo "$url access failed!"; fi
    done
}
findFailedUrl https://testing-studio.com
# code=$(curl -I -m 10 -o /dev/null -s -w "%{http_code}" $url)
# -I              request only the HTTP headers
# -m 10           time out after at most 10 seconds
# -o /dev/null    discard the response body
# -s              silent mode, suppress progress output
# -w "%{http_code}"  print only the HTTP status code
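For instance, checking a single URL by hand (the status code shown is just what one would expect, not a recorded result):

curl -I -m 10 -o /dev/null -s -w "%{http_code}\n" https://testing-studio.com
# prints 200 when the site is reachable, or an error code / 000 when it is not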


