Linux command wget

wget is a command-line file-download tool on Linux. It is an essential tool for Linux users: we often have to download software or restore backups from remote servers to local machines. wget supports the HTTP, HTTPS, and FTP protocols and can work through an HTTP proxy. "Automatic download" means that wget can keep running after the user logs out of the system: you can log in, start a wget download task, and then log out, and wget will keep working in the background until the task is complete. Compared with graphical tools that require the user to stay logged in for the whole download, this saves a great deal of trouble.

wget can follow the links on an HTML page and download them in turn to create a local copy of the remote server, completely recreating the directory structure of the original site. This is often referred to as "recursive downloading". When downloading recursively, wget respects the Robot Exclusion standard (/robots.txt). While downloading, wget can also convert links to point to local files, to facilitate offline browsing.

wget is very stable and copes well with narrow bandwidth and unstable networks. If a download fails for network reasons, wget keeps retrying until the entire file has been fetched. If the server interrupts the download, it reconnects and resumes from where it left off. This is useful for downloading large files from servers that limit connection time.

1. Command format:

wget [parameters] [URL address]
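For instance, the following command (reusing a URL from the examples later in this article) downloads a file with at most three connection attempts and a custom output name:

wget -t 3 -O wordpress.zip http://www.minjieren.com/wordpress-3.1-zh_CN.zip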

2. Command function:

It downloads resources from the network. If no directory is specified, files are saved to the current directory. Although wget is powerful, it is relatively simple to use:

1) It supports resuming interrupted downloads; this was the biggest selling point of NetAnts and FlashGet back in the day, and wget offers the same feature, so users with unreliable networks can rest easy;

2) It supports both FTP and HTTP downloads; although most software can now be fetched over HTTP, downloading over FTP is still occasionally necessary;

3) It supports proxy servers; high-security environments generally do not expose their systems directly to the Internet, so proxy support is a must-have for a download tool;

4) Configuration is simple and convenient; users accustomed to graphical interfaces may not be entirely at home on the command line, but the command line actually has advantages for configuration: far fewer mouse clicks, and no worrying about clicking the wrong thing;

5) It is a small program and completely free; being small hardly matters now that hard disks are so large, but being completely free is still worth considering, since much of the so-called free software on the Internet comes stuffed with advertisements.

3. Command parameters:

Start parameters:

-V, --version display wget version and exit

-h, --help print syntax help

-b, --background switch to background execution after startup

-e, --execute=COMMAND Execute commands in `.wgetrc' format, see /etc/wgetrc or ~/.wgetrc for wgetrc format

Record and input file parameters:

-o, --output-file=FILE write log messages to FILE

-a, --append-output=FILE append log messages to FILE

-d, --debug print debug output

-q, --quiet quiet mode (no output)

-v, --verbose verbose mode (this is the default)

-nv, --non-verbose turn off verbose mode, but not quiet mode

-i, --input-file=FILE download the URLs listed in FILE

-F, --force-html treat input files as HTML format files

-B, --base=URL resolve relative links in the file given with -F -i against the base URL

--sslcertfile=FILE optional client certificate

--sslcertkey=KEYFILE optional client certificate key file

--egd-file=FILE specifies the file name of the EGD socket

Download parameters:

--bind-address=ADDRESS specifies the local use address (hostname or IP, used when there are multiple local IPs or names)

-t, --tries=NUMBER Set the maximum number of connection attempts (0 means unlimited).

-O, --output-document=FILE write the document to FILE

-nc, --no-clobber don't overwrite existing files or use .# suffixes

-c, --continue resume downloading a partially downloaded file

--progress=TYPE set the progress display type

-N, --timestamping don't re-download files unless they are newer than local files

-S, --server-response print server response

--spider don't download anything

-T, --timeout=SECONDS set response timeout in seconds

-w, --wait=SECONDS wait SECONDS seconds between attempts

--waitretry=SECONDS wait 1...SECONDS seconds between reconnections

--random-wait wait 0...2*WAIT seconds between downloads

-Y, --proxy=on/off Turn proxy on or off

-Q, --quota=NUMBER set a total download quota (size limit)

--limit-rate=RATE limit download rate

Directory parameters:

-nd, --no-directories do not create directories

-x, --force-directories force create directories

-nH, --no-host-directories do not create host directories

-P, --directory-prefix=PREFIX save files to directory PREFIX/…

--cut-dirs=NUMBER ignore NUMBER levels of remote directories

HTTP option parameters:

--http-user=USER set the HTTP user name to USER

--http-passwd=PASS set the HTTP password to PASS

-C, --cache=on/off enable/disable server-side data caching (normally allowed)

-E, --html-extension save all text/html documents with .html extension

--ignore-length ignore `Content-Length' header field

--header=STRING insert string STRING in headers

--proxy-user=USER set proxy user name as USER

--proxy-passwd=PASS set proxy password to PASS

--referer=URL include `Referer: URL' header in HTTP requests

-s, --save-headers save HTTP headers to file

-U, --user-agent=AGENT set the user agent to AGENT instead of Wget/VERSION

--no-http-keep-alive disable HTTP keep-alive (persistent connections)

--cookies=off do not use cookies

--load-cookies=FILE load cookies from file FILE before starting session

--save-cookies=FILE save cookies to file FILE after session ends

FTP option parameters:

-nr, --dont-remove-listing don't remove `.listing' files

-g, --glob=on/off Turn filename globbing on or off

--passive-ftp use passive transfer mode (default).

--active-ftp use active transfer mode

--retr-symlinks when recursing, download the files that symbolic links point to (not directories)

Recursive download parameters:

-r, --recursive download recursively -- use with caution!

-l, --level=NUMBER maximum recursion depth (inf or 0 for infinite)

--delete-after delete files locally after downloading them

-k, --convert-links convert non-relative links to relative links

-K, --backup-converted backup file X as X.orig before converting it

-m, --mirror is equivalent to -r -N -l inf -nr

-p, --page-requisites download all files needed to display the HTML pages (images, etc.)

Inclusions and exclusions (accept/reject) in recursive downloads:

-A, --accept=LIST comma-separated list of accepted extensions

-R, --reject=LIST comma-separated list of rejected extensions

-D, --domains=LIST comma-separated list of accepted domains

--exclude-domains=LIST comma-separated list of excluded domains

--follow-ftp follow FTP links in HTML documents

--follow-tags=LIST comma-separated list of HTML tags to follow

-G, --ignore-tags=LIST comma-separated list of HTML tags to ignore

-H, --span-hosts go to foreign hosts when recursing

-L, --relative only follow relative links

-I, --include-directories=LIST list of allowed directories

-X, --exclude-directories=LIST list of excluded directories

-np, --no-parent don't trace back to parent directory

Note: wget -S --spider URL does not download anything; it only shows the request process.
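As a rough illustration of combining several of these parameters (the URL is just a placeholder reused from the examples below), a polite two-level recursive download into a local directory might look like:

wget -r -l 2 -w 1 --limit-rate=200k -P ./downloads http://www.minjieren.com/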

4. Usage examples:

Example 1: Download a single file using wget

Command:

wget http://www.minjieren.com/wordpress-3.1-zh_CN.zip

Description:

This command downloads a single file from the network and saves it in the current directory. A progress bar is shown during the download, including the completion percentage, bytes downloaded so far, the current download speed, and the estimated remaining time.

Example 2: Download with wget -O and save with a different filename

Command:

wget -O wordpress.zip http://www.minjieren.com/download.aspx?id=1080

Description:

By default, wget names the saved file after the last part of the URL (everything following the final "/"), which usually produces the wrong file name for dynamically generated links.

Wrong: the following command saves the file under the name download.aspx?id=1080:

wget http://www.minjieren.com/download.aspx?id=1080

Even though the downloaded file is a zip archive, it is still named download.aspx?id=1080.

Correct: to solve this problem, use the -O parameter to specify a file name:

wget -O wordpress.zip http://www.minjieren.com/download.aspx?id=1080
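Note that when a URL contains characters such as ? or &, it is safest to quote it so the shell does not interpret them:

wget -O wordpress.zip "http://www.minjieren.com/download.aspx?id=1080"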

Example 3: Use wget --limit-rate to limit the download speed

Command:

wget --limit-rate=300k http://www.minjieren.com/wordpress-3.1-zh_CN.zip

Description:

By default, wget uses all available bandwidth. When you are downloading a large file and still need the connection for other downloads, it is worth limiting the rate.

Example 4: Use wget -c to resume an interrupted download

Command:

wget -c http://www.minjieren.com/wordpress-3.1-zh_CN.zip

Description:

Use wget -c to restart an interrupted download. This is very helpful when the download of a large file breaks off because of network problems: we can resume it instead of fetching the whole file again.
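A minimal sketch for a stubborn large download, combining -c with unlimited retries and a retry delay (all flags as documented in section 3; the URL is reused from above):

wget -c -t 0 --waitretry=10 http://www.minjieren.com/wordpress-3.1-zh_CN.zip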

Example 5: Use wget -b to download in the background

Command:

wget -b http://www.minjieren.com/wordpress-3.1-zh_CN.zip

Description:

For very large files, the -b parameter downloads in the background:

wget -b http://www.minjieren.com/wordpress-3.1-zh_CN.zip

Continuing in background, pid 1840.

Output will be written to `wget-log'.

You can check the download progress with the following command:

tail -f wget-log

Example 6: Masquerade the user agent for the download

Command:

wget --user-agent="Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.204 Safari/534.16" http://www.minjieren.com/wordpress-3.1-zh_CN.zip

Description:

Some sites reject download requests when the user agent does not look like a browser's. The --user-agent parameter lets you masquerade as one.

Example 7: Use wget --spider to test a download link

Command:

wget --spider URL

Description:

When you plan a scheduled download, you should first test whether the link works at the scheduled time. Add the --spider parameter to perform the check:

wget --spider URL

If the download link is valid, the output looks like this:

wget --spider URL

Spider mode enabled. Check if remote file exists.

HTTP request sent, awaiting response... 200 OK

Length: unspecified [text/html]

Remote file exists and could contain further links,

but recursion is disabled -- not retrieving.

This confirms that the download will go through at the scheduled time. If you give a broken link instead, the following error is shown:

wget --spider url

Spider mode enabled. Check if remote file exists.

HTTP request sent, awaiting response... 404 Not Found

Remote file does not exist -- broken link!!!

The --spider parameter is useful in the following situations:

Check before scheduled download

Check whether the website is available at regular intervals (see the cron sketch after this list)

Check website pages for dead links
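For the interval check, a minimal cron sketch might look like the following (the schedule, URL, and log path are hypothetical):

*/30 * * * * wget --spider -q http://www.minjieren.com/ || echo "site down at $(date)" >> /var/log/sitecheck.log

wget exits with a non-zero status when the --spider check fails, so the echo only runs when the site is unreachable.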

Example 8: Use wget --tries to increase the number of retries

Command:

wget --tries=40 URL

Description:

A download may still fail when there are network problems or the file is large. By default, wget retries 20 times to connect and download the file; you can increase the number of retries with --tries if needed.

Example 9: Download multiple files using wget -i

Command:

wget -i filelist.txt

Description:

First, save the download links to a file:

cat > filelist.txt

url1

url2

url3

url4

Then download them all by passing this file to the -i parameter, as in the sketch below.
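Putting it together, a one-shot sketch (the URLs are placeholders reused from earlier examples) could be:

cat > filelist.txt <<EOF
http://www.minjieren.com/wordpress-3.1-zh_CN.zip
http://www.minjieren.com/download.aspx?id=1080
EOF
wget -i filelist.txt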

Example 10: Use wget --mirror to mirror a website

Command:

wget --mirror -p --convert-links -P ./LOCAL URL

Description:

Download an entire website locally.

--mirror: turn on mirroring

-p: download all files needed to display the HTML pages properly

--convert-links: after downloading, convert links to point to local files

-P ./LOCAL: save all files and directories under the specified local directory
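Since --mirror is equivalent to -r -N -l inf -nr (as listed in section 3), the command above can also be spelled out roughly as:

wget -r -N -l inf -nr -p --convert-links -P ./LOCAL URL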

Example 11: Use wget --reject to filter out a specified format

Command:

wget --reject=gif URL

Description:

To download a website without downloading its images, use the command above.

Example 12: Use wget -o to store download information in a log file

Command:

wget -o download.log URL

Description:

If you want the download information stored in a log file rather than shown on the terminal, use the -o parameter as above.

Example 13: Use wget -Q to limit the total download size

Command:

wget -Q5m -i filelist.txt

Description:

Use this when you want wget to stop once the total downloaded size exceeds 5 MB. Note: the quota has no effect when downloading a single file; it only applies to recursive downloads or downloads from a URL list.

Example 14: Use wget -r -A to download files of a specified format

Command:

wget -r -A .pdf URL

Description:

You can use this feature in the following situations:

Download all images from a website (see the sketch after this list)

Download all videos of a website

Download all PDF files of a website
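For instance, a sketch for the image case (the extension list and URL are illustrative; -A takes a comma-separated list):

wget -r -A jpg,jpeg,png,gif http://www.minjieren.com/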

Example 15: FTP download using wget

Command:

wget ftp-url

wget --ftp-user=USERNAME --ftp-password=PASSWORD url

Description:

wget can also download from FTP links.

Anonymous FTP download:

wget ftp-url

FTP download with username and password authentication:

wget --ftp-user=USERNAME --ftp-password=PASSWORD url
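Credentials can also be embedded directly in the URL (the host and path here are placeholders), although this leaves the password visible in the shell history:

wget ftp://USERNAME:PASSWORD@ftp.example.com/path/to/file.zip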

Remarks: compiling and installing from source

wget can be compiled and installed with the following commands:

# tar zxvf wget-1.9.1.tar.gz 

# cd wget-1.9.1 

# ./configure 

# make 

# make install 
