wget is a command-line tool for downloading files on Linux. It is essential for Linux users: we often need to download software or restore a backup from a remote server to a local one. wget supports the HTTP, HTTPS and FTP protocols, and it can work through an HTTP proxy. The so-called automatic download means that wget can keep running after the user logs out of the system: you can log in, start a wget download task, then log out, and wget keeps working in the background until the task completes. This saves a great deal of trouble.
wget can follow the links on an HTML page and download them recursively to create a local copy of the remote server, completely recreating the directory structure of the original site. This is often called "recursive downloading". While recursing, wget obeys the Robot Exclusion Standard (/robots.txt). wget can also convert the links in downloaded pages to point to the local files, for offline browsing.
wget is very stable, and it copes well with narrow bandwidth and unstable networks. If a download fails because of a network problem, wget keeps retrying until the whole file has been fetched. If the server interrupts the download, wget reconnects and resumes from where it left off. This is useful for fetching large files from servers that limit connection time.
1. Command format:
wget [parameters] [URL address]
2. Command function:
wget downloads resources from the network. If no directory is specified, the downloaded resource is saved in the current directory. Although wget is powerful, it is relatively simple to use:
1) Resumable downloads are supported. This was the biggest selling point of NetAnts and FlashGet back in the day; wget offers the same feature, so users with poor connections can relax.
2) Both FTP and HTTP downloads are supported. Although most software can now be fetched over HTTP, FTP is still occasionally necessary.
3) Proxy servers are supported. High-security environments usually do not expose their systems to the Internet directly, so proxy support is a must-have for a download tool.
4) Configuration is simple and convenient. Users accustomed to graphical interfaces may not be used to the command line, but the command line actually has advantages for configuration: fewer mouse clicks, and no worrying about misclicks.
5) It is a small program, and completely free. Small size hardly matters now that disks are so large, but "completely free" is worth considering: there is plenty of so-called free software on the Internet, yet its bundled advertisements are not something we like.
3. Command parameters:
Start parameters:
-V, --version display wget version and exit
-h, --help print syntax help
-b, --background switch to background execution after startup
-e, --execute=COMMAND Execute commands in `.wgetrc' format, see /etc/wgetrc or ~/.wgetrc for wgetrc format
Record and input file parameters:
-o, --output-file=FILE write records to FILE file
-a, --append-output=FILE append records to FILE file
-d, --debug print debug output
-q, --quiet quiet mode (no output)
-v, --verbose verbose mode (this is the default)
-nv, --non-verbose turn off verbose mode, but not quiet mode
-i, --input-file=FILE download URLs appearing in FILE file
-F, --force-html treat input files as HTML format files
-B, --base=URL Prefix relative links appearing in the file specified by the -F -i parameters with the URL
--sslcertfile=FILE optional client certificate
--sslcertkey=KEYFILE optional client certificate KEYFILE
--egd-file=FILE specifies the file name of the EGD socket
Download parameters:
--bind-address=ADDRESS specifies the local use address (hostname or IP, used when there are multiple local IPs or names)
-t, --tries=NUMBER Set the maximum number of connection attempts (0 means unlimited).
-O --output-document=FILE write the document to the FILE file
-nc, --no-clobber don't overwrite existing files (without this option, duplicate downloads get numbered suffixes)
-c, --continue Continue to download the unfinished files
--progress=TYPE set the progress bar label
-N, --timestamping don't re-download files unless they are newer than local files
-S, --server-response print server response
--spider don't download anything
-T, --timeout=SECONDS set response timeout in seconds
-w, --wait=SECONDS wait SECONDS seconds between attempts
--waitretry=SECONDS wait 1...SECONDS seconds between reconnections
--random-wait wait 0...2*WAIT seconds between downloads
-Y, --proxy=on/off Turn proxy on or off
-Q, --quota=NUMBER set download capacity limit
--limit-rate=RATE limit download rate
Directory parameters:
-nd, --no-directories do not create directories
-x, --force-directories force create directories
-nH, --no-host-directories do not create host directories
-P, --directory-prefix=PREFIX save files to directory PREFIX/…
--cut-dirs=NUMBER ignore NUMBER levels of remote directories
HTTP option parameters:
--http-user=USER Set the HTTP user name to USER.
--http-passwd=PASS set http password to PASS
-C, --cache=on/off enable/disable server-side data caching (normally allowed)
-E, --html-extension save all text/html documents with .html extension
--ignore-length ignore `Content-Length' header field
--header=STRING insert string STRING in headers
--proxy-user=USER set proxy user name as USER
--proxy-passwd=PASS set proxy password to PASS
--referer=URL include `Referer: URL' header in HTTP requests
-s, --save-headers save HTTP headers to file
-U, --user-agent=AGENT set agent name to AGENT instead of Wget/VERSION
--no-http-keep-alive disable HTTP keep-alive (persistent connections)
--cookies=off do not use cookies
--load-cookies=FILE load cookies from file FILE before starting session
--save-cookies=FILE save cookies to file FILE after session ends
FTP option parameters:
-nr, --dont-remove-listing don't remove `.listing' files
-g, --glob=on/off Turn filename globbing on or off
--passive-ftp use passive transfer mode (default).
--active-ftp use active transfer mode
--retr-symlinks when recursing, retrieve the files that symbolic links point to (not directories)
Recursive download parameters:
-r, --recursive download recursively -- use with caution!
-l, --level=NUMBER maximum recursion depth (inf or 0 for infinite)
--delete-after delete the downloaded files from the local disk after the download completes
-k, --convert-links convert non-relative links to relative links
-K, --backup-converted backup file X as X.orig before converting it
-m, --mirror is equivalent to -r -N -l inf -nr
-p, --page-requisites download all files needed to display HTML pages (images, stylesheets, and so on)
Inclusions and exclusions (accept/reject) in recursive downloads:
-A, --accept=LIST semicolon separated list of accepted extensions
-R, --reject=LIST semicolon separated list of rejected extensions
-D, --domains=LIST semicolon separated list of accepted domains
--exclude-domains=LIST semicolon separated list of excluded domains
--follow-ftp follow FTP links in HTML documents
--follow-tags=LIST semicolon separated list of followed HTML tags
-G, --ignore-tags=LIST semicolon separated list of ignored HTML tags
-H, --span-hosts go to foreign hosts when recursing
-L, --relative only follow relative links
-I, --include-directories=LIST list of allowed directories
-X, --exclude-directories=LIST list of excluded directories
-np, --no-parent don't trace back to parent directory
wget -S --spider URL prints the request process and server response without downloading anything
4. Example of use:
Example 1: Download a single file using wget
Command:
wget http://www.minjieren.com/wordpress-3.1-zh_CN.zip
Explanation:
The following example downloads a file from the network and saves it in the current directory. During the download a progress bar is shown, including the completion percentage, bytes downloaded so far, current download speed, and estimated remaining time.
Example 2: Use wget -O to download and save under a different filename
Command:
wget -O wordpress.zip http://www.minjieren.com/download.aspx?id=1080
Explanation:
By default, wget names the saved file after the last component of the URL (everything after the final "/"), which usually produces a wrong filename for dynamically generated links.
Wrong: the following command downloads the file but saves it under the name download.aspx?id=1080:
wget http://www.minjieren.com/download.aspx?id=1080
Even though the downloaded file is a zip archive, it is still named download.aspx?id=1080.
Correct: to solve this problem, use the -O parameter to specify a filename:
wget -O wordpress.zip http://www.minjieren.com/download.aspx?id=1080
Example 3: Use wget --limit-rate to limit the download speed
Command:
wget --limit-rate=300k http://www.minjieren.com/wordpress-3.1-zh_CN.zip
Explanation:
By default, wget uses all available bandwidth. When you are downloading a large file and still need the connection for other downloads, it is worth limiting the speed.
Example 4: Use wget -c to resume an interrupted download
Command:
wget -c http://www.minjieren.com/wordpress-3.1-zh_CN.zip
Explanation:
Use wget -c to restart an interrupted download. This is very helpful when a large download is cut off by network problems: instead of starting over, pass -c to continue from where the transfer stopped.
Example 5: Use wget -b to download in the background
Command:
wget -b http://www.minjieren.com/wordpress-3.1-zh_CN.zip
Explanation:
For very large files, the -b parameter runs the download in the background.
wget -b http://www.minjieren.com/wordpress-3.1-zh_CN.zip
Continuing in background, pid 1840.
Output will be written to `wget-log'.
You can check the download progress with the following command:
tail -f wget-log
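If you want a script to check whether a background download has finished, you can look in the log for wget's final "saved" status line. A minimal sketch, assuming the default log name wget-log; the helper name wget_done is made up for illustration:

```shell
# wget_done: succeed if the given wget log contains a completed "saved" line
# (wget prints "... 'file' saved [bytes/bytes]" when a download finishes)
wget_done() {
  grep -q "saved" "$1" 2>/dev/null
}

# Usage: wget_done wget-log && echo "download finished"
```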
Example 6: Masquerade the user agent for the download
Command:
wget --user-agent="Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.204 Safari/534.16" http://www.minjieren.com/wordpress-3.1-zh_CN.zip
Explanation:
Some sites inspect the User-Agent header and refuse the request when it does not come from a browser. You can masquerade as a browser with the --user-agent parameter.
Example 7: Use wget --spider to test a download link
Command:
wget --spider URL
Explanation:
When you plan a scheduled download, you should test in advance whether the link works. Adding the --spider parameter makes wget check the link without downloading anything.
If the link is valid, it displays:
wget --spider URL
Spider mode enabled. Check if remote file exists.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Remote file exists and could contain further links,
but recursion is disabled -- not retrieving.
This confirms that the download can run at the scheduled time. If you give a broken link instead, the following error is displayed:
wget --spider url
Spider mode enabled. Check if remote file exists.
HTTP request sent, awaiting response... 404 Not Found
Remote file does not exist -- broken link!!!
You can use the spider parameter in the following situations:
Check before scheduled download
Check whether the website is available at intervals
Check website pages for dead links
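The checks above can be scripted by testing wget's exit status: wget --spider exits with 0 when the remote file exists and non-zero otherwise. A minimal sketch; the function name check_link is an assumption:

```shell
# check_link: probe a URL with wget --spider and report whether it is reachable.
# wget exits 0 if the remote file exists, non-zero otherwise
# (e.g. 8 when the server issues an error response such as 404)
check_link() {
  if wget --spider -q "$1"; then
    echo "OK: $1"
  else
    echo "BROKEN: $1"
  fi
}

# Usage: check_link http://www.minjieren.com/wordpress-3.1-zh_CN.zip
```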
Example 8: Use wget --tries to increase the number of retries
Command:
wget --tries=40 URL
Explanation:
A download can still fail when the network is unreliable or the file is large. By default, wget retries 20 times; you can raise the limit with --tries if needed.
Example 9: Download multiple files using wget -i
Command:
wget -i filelist.txt
Explanation:
First, save the download links to a file:
cat > filelist.txt
url1
url2
url3
url4
Then pass that file to wget with the -i parameter to download every URL in it.
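As a concrete sketch, the list can also be built non-interactively with printf instead of cat; the URLs below are placeholders taken from the earlier examples:

```shell
# Build a URL list file, one URL per line (placeholder URLs)
printf '%s\n' \
  "http://www.minjieren.com/wordpress-3.1-zh_CN.zip" \
  "http://www.minjieren.com/download.aspx?id=1080" \
  > filelist.txt

# wget -i filelist.txt   # then download every URL in the list
```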
Example 10: Use wget --mirror to mirror a website
Command:
wget --mirror -p --convert-links -P ./LOCAL URL
Explanation:
Download an entire website to the local disk.
--mirror: turn on mirroring (equivalent to -r -N -l inf -nr)
-p: download all files needed to display the HTML pages properly
--convert-links: after downloading, convert the links to point to local files
-P ./LOCAL: save all files and directories under the local directory ./LOCAL
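A common way to use this is a small backup script that mirrors the site into a date-stamped directory, so each run is kept separately. A minimal sketch: the site URL and directory naming are assumptions, and the wget line is left commented so the script is safe to dry-run:

```shell
#!/bin/sh
# Mirror a site into a dated backup directory (URL below is a placeholder)
SITE="http://www.minjieren.com"
DEST="./backup-$(date +%Y%m%d)"
mkdir -p "$DEST"
# wget --mirror -p --convert-links -P "$DEST" "$SITE"
echo "mirroring $SITE into $DEST"
```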
Example 11: Use wget --reject to filter out a specified format
Command:
wget --reject=gif URL
Explanation:
If you want to download a website but not its images, reject the image extension as in the command above.
Example 12: Use wget -o to store download information in a log file
Command:
wget -o download.log URL
Explanation:
If you want the download information written to a log file instead of shown on the terminal, use the -o parameter.
Example 13: Using wget -Q to limit the total download file size
Command:
wget -Q5m -i filelist.txt
Explanation:
This stops downloading once the total size reaches 5M. Note: the quota has no effect on a single-file download; it only applies to recursive downloads or downloads from a URL list.
Example 14: Use wget -r -A to download specified format files
Command:
wget -r -A.pdf url
Explanation:
You can use this feature in the following situations:
Download all images for a website
Download all videos of a website
Download all PDF files of a website
Example 15: FTP download using wget
Command:
wget ftp-url
wget --ftp-user=USERNAME --ftp-password=PASSWORD url
Explanation:
wget can download directly from FTP links.
Anonymous FTP download:
wget ftp-url
FTP download with username and password authentication:
wget --ftp-user=USERNAME --ftp-password=PASSWORD url
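Instead of putting the password on the command line, where it is visible in the process list and shell history, credentials can be stored in ~/.netrc, which wget reads automatically. A sketch; the machine name and credentials below are placeholders:

```shell
# Add a ~/.netrc entry so wget can log in without credentials on the command line
# (machine name, login and password below are placeholders)
cat >> "$HOME/.netrc" <<'EOF'
machine ftp.example.com
login USERNAME
password PASSWORD
EOF
chmod 600 "$HOME/.netrc"   # keep the credentials file private
```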
Remarks: compiling and installing
wget can be compiled and installed from source with the following commands:
# tar zxvf wget-1.9.1.tar.gz
# cd wget-1.9.1
# ./configure
# make
# make install