Web cache—Squid proxy service

1. Squid related knowledge
 1.1 The concept of squid

Squid servers cache frequently requested web pages, media files, and other content to shorten response times and reduce bandwidth congestion.

A Squid proxy server is generally installed on a separate machine rather than on the web server that holds the original files. Squid works by tracking how objects on the network are used. It first acts as an intermediary, passing each client request to the server and storing a copy of the requested object. If the same client, or another client, requests the same object while it is still in the Squid cache, Squid serves it immediately, which speeds up the download and saves bandwidth.
Squid mainly provides cache acceleration and application-layer filtering control.

1.2 The working mechanism of squid proxy 
(1) Squid requests data from the website on behalf of the client, which hides the user's real IP address.

(2) Squid saves the web page data it obtains (static web elements) in its cache and sends it to the client, so the next request for the same data gets a quick response.
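
As a rough illustration of step (2), the look-up-then-fetch idea can be sketched in shell. This is a toy model and not how Squid stores objects: the cache directory, the function names cache_key and cached_fetch, and the use of curl as the fetcher are all assumptions made for this example.

```shell
#!/bin/sh
# Toy caching fetcher: illustrates hit/miss only, Squid's real store is far more elaborate.
CACHE_DIR="${CACHE_DIR:-/tmp/toy-proxy-cache}"   # hypothetical location

# Map a URL to a cache file name (here: its checksum).
cache_key() {
    printf '%s' "$1" | cksum | awk '{ print $1 }'
}

# Print the object for a URL, contacting the origin server only on a miss.
cached_fetch() {
    url="$1"
    file="$CACHE_DIR/$(cache_key "$url")"
    mkdir -p "$CACHE_DIR"
    if [ -f "$file" ]; then
        echo "HIT $url" >&2
    else
        echo "MISS $url" >&2
        # Download to a temporary name so a failed transfer is never cached.
        if ! curl -fsS "$url" -o "$file.tmp"; then
            rm -f "$file.tmp"
            return 1
        fi
        mv "$file.tmp" "$file"
    fi
    cat "$file"
}
```

Calling cached_fetch twice with the same URL would report a MISS on the first call and a HIT on the second, with the second answer read from disk rather than from the web server.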

1.3 The concept and function of proxy server:

A proxy server is a server that sits between the client and the original (resource) server. To obtain content from the original server, the client sends a request to the proxy server and names the target original server; the proxy server forwards the request to the original server and returns the content it obtains to the client.

Caching proxies are crucial to the web, especially for large, heavily loaded websites. Caching is an important means of performance optimization and can greatly reduce the load on the back-end server. Usually static resources are cached, that is, resources that are updated infrequently, such as pictures, CSS or JS files. When the browser refreshes, these are read from the cache instead of being requested again, which reduces traffic and pressure on the server.

1.4 Its main functions are:

Resource acquisition: the proxy server obtains resources from the original server on behalf of the client;
Accelerated access: the proxy server may be closer to the original server, which gives some acceleration;
Cache function: the proxy server saves the resources obtained from the original server, so that clients can obtain them quickly;
Hiding the real address: because the proxy server fetches the original server's resources on the client's behalf, the client's real information stays hidden.

The most basic functions: improving the speed of web access and hiding the client's real IP address.

1.5 Types of Squid Proxies 

Traditional Proxy: Applicable to Internet forward proxy, the address and port of the proxy server need to be specified on the client computer.

Transparent proxy: The client does not need to specify the address and port of the proxy server, but redirects the Web access to the proxy server through the default route and firewall policy.

Reverse proxy: If the requested resource is cached on the Squid reverse proxy server, it is returned directly to the client. Otherwise, the reverse proxy server requests the resource from the back-end web server, returns the response to the client, and caches the response locally for the next requester.
 

 1.6 The difference between squid proxy server and SNAT|DNAT
Judging only by their role in forwarding data, it is easy to confuse a Squid proxy server with SNAT|DNAT and assume the two are much the same, but they differ in essence.

Main differences between SNAT/DNAT and proxy server mode:

Working layer
    SNAT and DNAT: network layer
    Proxy server: application layer
Workflow
    SNAT and DNAT: when LAN users access the external network in SNAT mode, the packet stays the same except that the source address in its header is changed before it is sent to the Internet.
    Proxy server: the packet header is not rewritten; the proxy requests the data from the Internet server on the client's behalf and filters at the application layer.
Purpose
    SNAT and DNAT: intranet users reach the external network (SNAT), and intranet services are published to the public network (DNAT).
    Proxy server: caches pages, speeds up access, and controls resource access with ACLs.

The proxy server follows application-layer protocols: http, ftp, pop, smtp, p2p, etc.

Reverse proxy:

When Internet users access a LAN server, Squid acts as a reverse proxy server and forwards each request to the real server behind it. This can provide load balancing and, by caching pages that users visit frequently, improve access speed.

 2. Installation and operation of Squid service
2.1 Install Squid service
 #Install the build dependencies
yum -y install gcc gcc-c++ make
 #Unpack the Squid source package
tar zxvf squid-3.5.28.tar.gz -C /opt/
 #Switch to the source directory, then set the installation path and the modules to build
cd /opt/squid-3.5.28
 
./configure --prefix=/usr/local/squid \
--sysconfdir=/etc \
--enable-arp-acl \
--enable-linux-netfilter \
--enable-linux-tproxy \
--enable-async-io=100 \
--enable-err-language="Simplify_Chinese" \
--enable-underscore \
--disable-poll \
--enable-epoll \
--enable-gnuregex
#################### Option notes ####################
--prefix=/usr/local/squid                   #Installation directory
--sysconfdir=/etc                           #Configuration file directory
--enable-arp-acl                            #MAC address control, to prevent clients from spoofing IP addresses
--enable-linux-netfilter                    #Use kernel filtering
--enable-linux-tproxy                       #Support transparent mode
--enable-async-io=100                       #Asynchronous I/O to improve storage performance: write to the cache first, then to the hard disk
--enable-err-language="Simplify_Chinese"    #Language of error messages
--enable-underscore                         #Allow underscores in URLs
--disable-poll                              #Turn off the default poll mode
--enable-epoll                              #Enable epoll mode to improve performance; epoll supports I/O multiplexing and asynchronous, non-blocking operation
--enable-gnuregex                           #Use GNU regular expressions

make -j2 && make install                        #Compile with 2 parallel jobs, then install
ln -s /usr/local/squid/sbin/* /usr/local/sbin/  #Link the binaries so the system finds the squid command directly
useradd -M -s /sbin/nologin squid               #Create the squid user with no home directory and no login shell
chown -R squid:squid /usr/local/squid/var/      #Change the owner and group; this directory stores the cache files

  2.2 Modify the Squid configuration file
vim /etc/squid.conf


 #--line 56--insert
 http_access allow all
 #Placed before http_access deny all, this allows any client to use the proxy service; access rules are matched from top to bottom
 http_access deny all
 http_port 3128
 #Address and port the proxy service listens on (the default port is 3128)
 #--line 61--insert
 cache_effective_user squid
 #Added: the program user, i.e. the account used for initialization and for the runtime cache; without it, startup fails
 cache_effective_group squid
 #Added: the account's primary group
 coredump_dir /usr/local/squid/var/cache/squid
 #Directory where Squid writes core dump files. This line exists by default and generally does not need to be changed
 

2.3 Operation control of Squid 
 #Check whether the syntax of the configuration file is correct
squid -k parse
 #Start Squid. The first time the Squid service starts, the cache directory is initialized automatically
squid -z    #The -z option initializes the cache directory
squid       #Start the squid service
netstat -anpt | grep "squid"    #Check whether the startup succeeded


2.4 Create Squid service script for system service management 
 vim /etc/init.d/squid
#!/bin/bash
#chkconfig: 2345 90 25
PID="/usr/local/squid/var/run/squid.pid"
CONF="/etc/squid.conf"
CMD="/usr/local/squid/sbin/squid"
case "$1" in
    start)
      netstat -natp | grep squid &> /dev/null
      if [ $? -eq 0 ]
      then
        echo "squid is running"
      else
        echo "starting squid..."
        $CMD
      fi
    ;;
    stop)
      #"squid -k kill" ends the process but does not delete the PID file; delete it manually, otherwise the next startup will have problems
      $CMD -k kill &> /dev/null
      rm -rf "$PID" &> /dev/null
    ;;
    status)
      [ -f $PID ] &> /dev/null
         if [ $? -eq 0 ]
           then
             netstat -natp | grep squid
           else
             echo "squid is not running"
         fi
    ;;
    restart)
       $0 stop &> /dev/null
       echo "Squid is shutting down..."
       $0 start &> /dev/null
       echo "Squid is starting..."
    ;;
    reload)
       $CMD -k reconfigure
    ;;
    check)
       $CMD -k parse
    ;;
    *)
       echo "Usage: $0 {start|stop|status|reload|check|restart}"
    ;;
 esac

 #2345 are the run levels in which the service starts by default; a "-" in that position means it does not start automatically at any run level. 90 is the start priority and 25 is the stop priority; priorities range from 0 to 99, and a larger number means a lower priority.
chmod +x /etc/init.d/squid     #Make the script executable
chkconfig --add squid          #Register it for system service management
chkconfig --list squid         #Check at which run levels it starts automatically




 


 3. Experimental design: building a traditional proxy server (forward proxy)
Experimental requirements
Build a Squid service that caches resources from the web server on behalf of the client (in this mode the proxy server must be added manually on the client)

 Experimental component deployment
Squid proxy server: 192.168.50.26/24

web server: 192.168.50.25/24

Client: 192.168.50.24/24

1 Configure proxy server
vim /etc/squid.conf
......
http_access allow all
http_access deny all
http_port 3128
cache_effective_user squid
cache_effective_group squid
 #--line 63--insert
cache_mem 1024 MB
 #Memory used by the cache function, which keeps frequently accessed web objects. The value is best a multiple of 4, in MB; 1/4 of physical memory is recommended.
reply_body_max_size 100 MB
 #Maximum size of a file users are allowed to download. When a downloaded web object exceeds this size, the browser's error page shows "request or access is too large". The default setting of 0 means no restriction; if no restriction is wanted, comment this line out.
maximum_object_size 100 MB
 #Maximum size of an object that may be saved to the cache. Files above the limit are not cached but forwarded directly to the user. Letting the web server answer directly for large, rarely used files reduces the space used on the cache server.
service squid restart    #Restart the squid service
 #In a production environment the firewall rules also need to be modified
iptables -F
iptables -I INPUT -p tcp --dport 3128 -j ACCEPT
 #Allow TCP traffic on port 3128 to pass
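
The guideline above (cache_mem at about 1/4 of physical memory, as a multiple of 4 MB) can be worked out with a little shell arithmetic. A minimal sketch, assuming a Linux host where /proc/meminfo reports MemTotal in kB; the function name suggest_cache_mem is made up for this example.

```shell
#!/bin/sh
# Suggest a cache_mem line: a quarter of physical memory,
# rounded down to a multiple of 4 MB.
# Optional argument: total memory in kB (defaults to MemTotal from /proc/meminfo).
suggest_cache_mem() {
    total_kb="${1:-$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)}"
    quarter_mb=$(( total_kb / 1024 / 4 ))
    echo "cache_mem $(( quarter_mb / 4 * 4 )) MB"
}
```

For a machine with exactly 4 GB of memory (4194304 kB), suggest_cache_mem 4194304 prints "cache_mem 1024 MB", which matches the value used in the configuration above.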

 2. Install nginx on the web server
systemctl stop firewalld
setenforce 0
#Compile and install nginx on the web server with an installation script, then create a test page
echo "<h1> this is web1 side</h1>" > /usr/local/nginx/html/index.html

 

3. Configure the proxy on the client and access the web server
 Open the browser: Tools --> Internet Options --> Connections --> LAN settings --> enable the proxy server
 (address: the Squid server's IP address, port: 3128)
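
On a client without a graphical browser, the same forward-proxy setting can be exercised from the command line. A minimal sketch, assuming curl is installed on the client; via_proxy is a hypothetical helper name, and the addresses in the comments are the ones from the deployment above.

```shell
#!/bin/sh
# Send one HEAD request through the proxy and print only the response headers.
via_proxy() {
    proxy="$1"   # e.g. http://192.168.50.26:3128
    url="$2"     # e.g. http://192.168.50.25/
    curl -sS -x "$proxy" -I "$url"
}
```

Running via_proxy http://192.168.50.26:3128 http://192.168.50.25/ should return the web server's headers by way of Squid. Squid normally adds Via and X-Cache headers to the responses it relays, which is a quick way to confirm that the request really went through the proxy.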

 4. Check cache hits in the new records of the Squid access log
tail -f /usr/local/squid/var/logs/access.log
TCP_MISS/200       #Indicates a cache miss
TCP_MEM_HIT/200    #Indicates a cache hit
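
Rather than reading the log line by line, the hits and misses can be tallied. A minimal sketch, assuming the log uses Squid's native format, in which the fourth field is the result code followed by the HTTP status (for example TCP_MISS/200); cache_summary is a hypothetical helper name.

```shell
#!/bin/sh
# Count Squid result codes (the part of the 4th field before the slash).
# Reads the files given as arguments, or standard input when there are none.
cache_summary() {
    awk '{ split($4, part, "/"); count[part[1]]++ }
         END { for (code in count) print code, count[code] }' "$@" | sort
}
```

It can be run as cache_summary /usr/local/squid/var/logs/access.log. A growing TCP_MEM_HIT (or TCP_HIT) count relative to TCP_MISS suggests the cache is doing its job.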

 5. View new records in the Web access log

tail -f /var/log/httpd/access_log
 Enter the web server's IP address in the browser, then check the web server's access log. It shows that the proxy server makes the request on the client's behalf: the recorded address is the proxy server's address, not the client's real address.

 4. Transparent proxy
Experimental requirements

The proxy server must be the gateway server.
The gateway server has at least two network cards, with routing and forwarding enabled.
Redirection rules are set in the firewall (iptables/firewalld).
Users do not need to set the proxy server manually. When a user requests the same resource a second time, it is served by the proxy server.

Experimental component 
 squid proxy server: ens33: 192.168.50.25 ens35: 12.0.0.254

webserver: 12.0.0.12/24

Client: 192.168.50.24

How to set up the network for the 3 machines:

Squid host:

Add a network card (NAT mode) and confirm with ifconfig that it is ens35

Edit ens33: comment out the gateway and DNS

Edit ens35: set the IP to 12.0.0.254 and comment out the gateway and DNS

Client:

Edit ens33: set the gateway to the Squid ens33 address 192.168.50.25 and comment out DNS

Web server:

Edit ens33: set the gateway to the Squid ens35 address 12.0.0.254 and comment out DNS

(1) Configuration of the Squid server
vim /etc/squid.conf    #Edit the configuration file
 ......
 http_access allow all
 http_access deny all
 #--line 60--modify: add the IP address that serves the intranet and the transparent proxy option "transparent"
 http_port 192.168.73.110:3128 transparent
 #Use the address of the network card on the client's network segment, i.e. the internal card. Squid listens on its own intranet address with the transparent proxy option enabled
systemctl restart squid    #Restart the squid service
 #Enable routing and forwarding so this machine forwards between the two network segments
echo 'net.ipv4.ip_forward = 1' >> /etc/sysctl.conf
sysctl -p
 #Modify the firewall rules and set the redirection rules
iptables -F
iptables -t nat -F
 #HTTP: redirect port 80 to port 3128, so the proxy server makes the request
iptables -t nat -A PREROUTING -i ens33 -s 192.168.73.0/24 -p tcp --dport 80 -j REDIRECT --to-ports 3128
 #HTTPS: redirect port 443 to port 3128, so the proxy server makes the request
iptables -t nat -A PREROUTING -i ens33 -s 192.168.73.0/24 -p tcp --dport 443 -j REDIRECT --to-ports 3128
 #Accept incoming TCP traffic on port 3128
iptables -A INPUT -p tcp --dport 3128 -j ACCEPT

 

(2) Web server configuration


systemctl stop firewalld
setenforce 0
yum install -y httpd
systemctl start httpd
echo "this is test" > /var/www/html/index.html
 ​

(3) Client configuration: change the gateway address; the browser does not use the proxy

 Change the gateway address to the proxy server's intranet network card address: 192.168.73.110
 Turn off the proxy setting in the browser before testing.
 #View new records in the Squid access log
 tail -f /usr/local/squid/var/logs/access.log
 #View new records in the web access log; they show the proxy server's external interface, not the client, making the requests
 tail -f /var/log/httpd/access_log







 


 5. ACL access control
In the configuration file squid.conf, ACL access control is implemented through the following two steps:

(1) Use the acl configuration item to define the conditions that need to be controlled;

(2) Use the http_access configuration item to control "allow" or "deny" access to the defined list.

Define the format of the access list: 

 acl list-name list-type list-content...
EXPERIMENTAL


 First of all, both clients need to configure Squid's forward proxy
5.1 Configuration  application
1. Edit the configuration file and define the control access list
vim /etc/squid.conf
 ......
 acl localhost src 192.168.73.112/32
 #Client source address 192.168.73.112
 acl MYLAN src 192.168.73.0/24 12.0.0.0/24
 #Client network segments
 acl destinationhost dst 12.0.0.12/32
 #Destination address 12.0.0.12
 acl MC20 maxconn 20
 #At most 20 concurrent connections
 acl PORT port 21
 #Destination port 21
 acl DMBLOCK dstdomain .qq.com
 #Destination domain; matches every site in the domain
 acl BURL url_regex -i ^rtsp:// ^emule://
 #Destination URLs beginning with rtsp:// or emule://; -i ignores case; the patterns are regular expressions
 acl PURL urlpath_regex -i \.mp3$ \.mp4$ \.rmvb$
 #Destination URL paths ending in .mp3, .mp4 or .rmvb
 acl WORKTIME time MTWHF 08:30-17:30
 #Monday to Friday, 8:30 to 17:30; "MTWHF" are the letter codes for Monday to Friday
 http_access deny (or allow) destinationhost
 #Note: a deny rule needs to be placed before http_access allow all
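
Patterns for urlpath_regex can be sanity-checked before they go into squid.conf. The sketch below uses grep -E as a stand-in for Squid's regex engine, which is an approximation (the two may differ in corner cases); path_matches is a hypothetical helper name. It also shows why the dot in a file extension is worth escaping: an unescaped dot matches any character.

```shell
#!/bin/sh
# Succeeds when the URL path ($2) matches the extended regex ($1),
# ignoring case as the -i flag does in an acl line.
path_matches() {
    printf '%s\n' "$2" | grep -Eiq -- "$1"
}

# Example checks (the paths are invented for illustration):
path_matches '\.mp3$' '/music/song.MP3'      && echo "blocked: /music/song.MP3"
path_matches '\.mp3$' '/docs/notes-on-mp3'   || echo "allowed: /docs/notes-on-mp3"
path_matches '.mp3$'  '/docs/notes-on-mp3'   && echo "unescaped dot also matches /docs/notes-on-mp3"
```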
 

 
 
 


 
 
 

 5.2 Application of ACL access control
5.2.1 Application 1
 1. Edit configuration file, define control access list
 vim /etc/squid.conf
 ......
  25 acl CONNECT method CONNECT

  #Define a list named myhost with source address 192.168.50.20; it must go at line 26, otherwise it has no effect
  26 acl myhost src 192.168.50.20/32
  27
  28 http_access deny myhost #Deny access from the addresses in the myhost list
  #Note: the deny rule needs to be placed before http_access allow all
  systemctl restart squid    #Restart the squid service
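
The reason a deny rule has to come before http_access allow all is that the rules are checked from top to bottom and the first match decides. The toy model below illustrates only that ordering; it is not Squid's implementation. It handles only single source addresses and the keyword all, denies when nothing matches as a simplification, and first_match is a hypothetical name.

```shell
#!/bin/sh
# Toy model of http_access ordering.
# Usage: first_match CLIENT_IP "action source" ["action source" ...]
# The first rule whose source is "all" or equals the client address decides.
first_match() {
    client="$1"; shift
    for rule in "$@"; do
        action="${rule%% *}"   # text before the first space
        src="${rule#* }"       # text after the first space
        if [ "$src" = "all" ] || [ "$src" = "$client" ]; then
            echo "$action"
            return 0
        fi
    done
    echo "deny"   # simplification: deny when no rule matches
}
```

With the deny rule first, first_match 192.168.50.20 "deny 192.168.50.20" "allow all" prints deny. With the order reversed, the same client hits allow all first and the deny rule is never reached.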




 Test effect

 Access from other addresses:

5.2.2 Application 2

Manage the object list in a separate file

 mkdir /etc/squid
 vim /etc/squid/dest.list #Write the list of addresses to manage
 192.168.50.21


#Edit the configuration file
 vim /etc/squid.conf
 ......
 acl destinationhost dst "/etc/squid/dest.list" #Load the list content from the specified file
 ......
 http_access deny (or allow) destinationhost

 #Note: a deny rule needs to be placed before http_access allow all


As a result, access to every address listed in dest.list is denied, and the list can be maintained without editing the configuration file

Access test:

 6. Reverse proxy
6.1 Working mechanism of reverse proxy 
 
If the requested resource is cached on the Squid reverse proxy server, it is returned directly to the client. Otherwise, the reverse proxy server requests the resource from the back-end web server, returns the response to the client, and caches the response locally for the next requester.

Static or otherwise cacheable content, once written to the cache, is served by the cache server. Requests for dynamic content are forwarded by the proxy server to the web server.


Working mechanism:

Cache web page objects to reduce repeated requests.
Distribute Internet requests to the intranet web servers in rotation (round-robin) or by weight.
Proxy user requests so that users cannot access the web server directly, which improves security.

6.2 Build Squid reverse proxy server 
squid server: 192.168.50.26/24 

web1 server: 192.168.50.22/24

web2 server: 192.168.50.23/24

Client: 192.168.50.20/24

 vim /etc/squid.conf
 ......
 #--line 60--modify, inserting the following lines
 http_port 192.168.73.110:80 accel vhost vport
 cache_peer 192.168.73.111 parent 80 0 no-query originserver round-robin max_conn=30 weight=1 name=web1
 cache_peer 192.168.73.112 parent 80 0 no-query originserver round-robin max_conn=30 weight=1 name=web2
 cache_peer_domain web1 web2 www.yang.com
 #Requests for www.yang.com are sent by Squid to port 80 of 192.168.73.111 and 192.168.73.112
 -------------------- Notes (everything between the dashed lines is commentary) --------------------
 http_port 80 accel vhost vport
 Squid changes from a plain cache into reverse proxy acceleration mode for a web server. It listens for requests on port 80 and is bound to the web server's request port (vhost vport). A request that reaches Squid is not simply forwarded: Squid either answers it from the cache or fetches the data directly from the bound port.
 accel: reverse proxy acceleration mode.
 vhost: proxy nodes may be identified by domain name or host name.
 vport: proxy nodes may be identified by IP address and port.
 parent: the peer is a parent node (an upper-lower relationship, not a sibling relationship).
 80: proxy port 80 of the internal web server.
 0: the ICP port; 0 means ICP is not used, i.e. there is only one Squid server.
 no-query: do not query the peer; fetch the data directly.
 originserver: the peer is an origin server.
 round-robin: Squid distributes requests among the parent nodes in rotation.
 max_conn: the maximum number of connections.
 weight: the weight.
 name: an alias for the peer.
 ----------------------------------------------------------------------------------------------------
 #Clear the iptables rules configured earlier for transparent mode
 iptables -F
 iptables -t nat -F
 netstat -natp | grep :80    #Check whether port 80 is in use; if it is, httpd needs to be stopped
 systemctl stop httpd        #Avoid a conflict between port 80 used by httpd and the listening port of the Squid reverse proxy
 systemctl restart squid     #Restart the squid service
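
To make the round-robin option concrete, here is a toy sketch of plain rotation between peers of equal weight, as in the configuration above where both peers have weight=1. It is an illustration only: Squid's own peer selection also accounts for weights, connection limits and peer health. pick_peer is a hypothetical name.

```shell
#!/bin/sh
# Toy round-robin: request number n (counting from 0) goes to peer n mod N.
# Usage: pick_peer N PEER [PEER ...]
pick_peer() {
    n="$1"; shift
    shift $(( n % $# ))   # skip as many peers as the remainder says
    echo "$1"
}

# Requests 0..3 alternate between the two peers.
for n in 0 1 2 3; do
    pick_peer "$n" web1 web2
done
```

The loop prints web1, web2, web1, web2, which is the alternation one would expect to see between the two test pages when the site is requested repeatedly and the responses are not served from cache.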




(2) Configuration of two web servers
 systemctl stop firewalld
 setenforce 0
 yum install -y httpd
 systemctl start httpd
 #web1:
 echo "this is test1" > /var/www/html/index.html
 #web2:
 echo "this is test2" > /var/www/html/index.html
 echo "this web2 test" > /var/www/html/test.html
 
 

(3) Domain name mapping on the client, for access verification
Windows: edit the C:\Windows\System32\drivers\etc\hosts file
 192.168.73.110 www.yang.com
Linux: edit the /etc/hosts file and add the mapping
 echo "192.168.73.110 www.yang.com" >> /etc/hosts #Squid server address
In the browser, with the proxy turned off, visit
 http://www.yang.com
 http://www.yang.com/test.html
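
The same check can be made without editing the hosts file. A minimal sketch, assuming curl is available on the client: its --resolve option pins a host name to an address for a single request. check_site is a hypothetical helper name; the host name and address are the ones used above.

```shell
#!/bin/sh
# Request a page from the reverse proxy by host name, without touching /etc/hosts.
# Usage: check_site HOST PROXY_IP [PATH]
check_site() {
    host="$1"
    proxy_ip="$2"
    path="${3:-/}"
    curl -sS --resolve "$host:80:$proxy_ip" "http://$host$path"
}
```

For example, check_site www.yang.com 192.168.73.110 /test.html requests the page that exists only on web2, by way of the Squid server.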

 7. Squid log analysis
#Install the image processing packages

yum install -y pcre-devel gd gd-devel

mkdir /usr/local/sarg
tar zxvf sarg-2.3.7.tar.gz -C /opt/

cd /opt/sarg-2.3.7
#--sysconfdir: configuration file directory (the default is /usr/local/etc)
#--enable-extraprotection: additional security protection
./configure --prefix=/usr/local/sarg \
--sysconfdir=/etc/sarg \
--enable-extraprotection
make && make install

Modify the configuration file
vim /etc/sarg/sarg.conf

--line 7--uncomment
access_log /usr/local/squid/var/logs/access.log #Access log file
--line 25--uncomment
title "Squid User Access Reports" #Web page title
--line 120--uncomment, modify
output_dir /var/www/html/sarg #Report output directory
--line 178--uncomment
user_ip no #Display by user name
--line 184--uncomment, modify
topuser_sort_field connect reverse #In the top-user ranking, sort by number of connections in descending order ("normal" is ascending)
--line 190--uncomment, modify
user_sort_field connect reverse #For user access records, sort by number of connections in descending order
--line 206--uncomment, modify
exclude_hosts /usr/local/sarg/noreport #File listing the sites excluded from the ranking
--line 257--uncomment
overwrite_report no #Whether to overwrite a report with the same name and date
--line 289--uncomment, modify
mail_utility mailq.postfix #Command used to send the report by mail
--line 434--uncomment, modify
charset UTF-8 #Character set
--line 518--uncomment
weekdays 0-6 #Days of the week covered by the top ranking
--line 525--uncomment
hours 0-23 #Hours covered by the top ranking
--line 633--uncomment
www_document_root /var/www/html #Root directory of the web pages

#Create the file of excluded sites; domain names added to it will not appear in the ranking
touch /usr/local/sarg/noreport


ln -s /usr/local/sarg/bin/sarg /usr/local/bin/
sarg --help

#verify
yum install httpd -y
systemctl start httpd

# run
sarg


Access http://192.168.50.26/sarg with a browser to view the sarg report page.

Origin blog.csdn.net/zl965230/article/details/130803497