Cookie anti-crawler application and bypass principle

To be continued...

One, understanding cookies

1. Cookies

For background, see "Basic knowledge of web crawlers: HTTP and HTTPS, cookies and sessions".

2. Working with cookies in Django

1. Get the cookie content:

request.COOKIES.get('uuid')   # returns None if the cookie is missing
request.COOKIES['uuid']       # raises KeyError if the cookie is missing

2. Delete cookies from the response content:

response = HttpResponse('hello world!')
response.delete_cookie('key')
return response

3. Add cookies to the response content:

response = HttpResponse('hello world!')
response.set_cookie('key', 'value')
return response
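
Django keeps a response's cookies in a standard-library http.cookies.SimpleCookie, so the header-level effect of the three operations above can be sketched without Django. This is only a minimal illustration: the helper names build_set_cookie, build_delete_cookie, and parse_cookie_header are hypothetical and are not Django APIs.

```python
from http.cookies import SimpleCookie


def build_set_cookie(key, value, max_age=None, path="/"):
    """Return the Set-Cookie header value for one cookie (hypothetical helper)."""
    cookie = SimpleCookie()
    cookie[key] = value
    cookie[key]["path"] = path
    if max_age is not None:
        cookie[key]["max-age"] = max_age
    return cookie[key].OutputString()


def build_delete_cookie(key, path="/"):
    """Deleting a cookie means re-sending it with an expiry in the past."""
    cookie = SimpleCookie()
    cookie[key] = ""
    cookie[key]["path"] = path
    cookie[key]["max-age"] = 0
    cookie[key]["expires"] = "Thu, 01 Jan 1970 00:00:00 GMT"
    return cookie[key].OutputString()


def parse_cookie_header(header):
    """Parse a request Cookie header into a dict, similar to request.COOKIES."""
    cookie = SimpleCookie()
    cookie.load(header)
    return {name: morsel.value for name, morsel in cookie.items()}


if __name__ == "__main__":
    print(build_set_cookie("key", "value", max_age=30))
    print(build_delete_cookie("key"))
    print(parse_cookie_header("uuid=id; theme=dark"))
```

The point of the sketch is that "deleting" a cookie is not a separate protocol operation: the server simply sends the same cookie again with an expiry that has already passed.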

Two, cookie anti-crawler

1. Principle of cookie anti-crawler

First, let's look at how Django implements cookie operations.
Django's set_cookie() accepts several parameters that give fine-grained control over cookies:

    def set_cookie(self, key, value='', max_age=None, expires=None, path='/',
                   domain=None, secure=False, httponly=False, samesite=None):
        """
        Set a cookie.
        ``expires`` can be:
        - a string in the correct format,
        - a naive ``datetime.datetime`` object in UTC,
        - an aware ``datetime.datetime`` object in any time zone.
        If it is a ``datetime.datetime`` object then calculate ``max_age``.
        """
        self.cookies[key] = value
        if expires is not None:
            if isinstance(expires, datetime.datetime):
                if timezone.is_aware(expires):
                    expires = timezone.make_naive(expires, timezone.utc)
                delta = expires - expires.utcnow()
                # Add one second so the date matches exactly (a fraction of
                # time gets lost between converting to a timedelta and
                # then the date string).
                delta = delta + datetime.timedelta(seconds=1)
                # Just set max_age - the max_age logic will set expires.
                expires = None
                max_age = max(0, delta.days * 86400 + delta.seconds)
            else:
                self.cookies[key]['expires'] = expires
        else:
            self.cookies[key]['expires'] = ''
        if max_age is not None:
            self.cookies[key]['max-age'] = max_age
            # IE requires expires, so set it if hasn't been already.
            if not expires:
                self.cookies[key]['expires'] = http_date(time.time() + max_age)
        if path is not None:
            self.cookies[key]['path'] = path
        if domain is not None:
            self.cookies[key]['domain'] = domain
        if secure:
            self.cookies[key]['secure'] = True
        if httponly:
            self.cookies[key]['httponly'] = True
        if samesite:
            if samesite.lower() not in ('lax', 'strict'):
                raise ValueError('samesite must be "lax" or "strict".')
            self.cookies[key]['samesite'] = samesite

Its parameters are:

  • key, the cookie name
  • value='', the cookie value
  • max_age=None, the lifetime in seconds; if neither max_age nor expires is set, the cookie expires when the browser is closed
  • expires=None, the expiry time as a date string or a datetime object; if only max_age is given, expires is derived from it (IE requires expires)
  • path='/', the path where the cookie takes effect; a cookie on the root path is sent for every URL on the site
  • domain=None, the domain name where the cookie takes effect
  • secure=False, whether the cookie is sent only over HTTPS
  • httponly=False, when True the cookie is still sent with HTTP requests but cannot be read from JavaScript
  • samesite=None, must be "lax" or "strict"; limits cross-site sending of the cookie, which helps against CSRF attacks
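
The relationship between expires and max_age in the source above can be sketched with the standard library alone. This is a minimal illustration that mirrors the arithmetic in set_cookie(); the function names are hypothetical, and email.utils.formatdate stands in here for Django's http_date.

```python
import datetime
from email.utils import formatdate


def max_age_from_expires(expires, now):
    """Convert a naive UTC expiry datetime into whole seconds from `now`."""
    # One second is added, as in set_cookie(), to compensate for the
    # fraction lost when the timedelta is truncated to whole seconds.
    delta = expires - now + datetime.timedelta(seconds=1)
    return max(0, delta.days * 86400 + delta.seconds)


def expires_from_max_age(max_age, now_timestamp):
    """Format the Expires date that corresponds to a max_age in seconds."""
    return formatdate(now_timestamp + max_age, usegmt=True)


if __name__ == "__main__":
    now = datetime.datetime(2020, 8, 25, 12, 0, 0)
    # An expiry 29.5 seconds ahead becomes a max_age of 30 seconds.
    print(max_age_from_expires(now + datetime.timedelta(seconds=29.5), now))
    # An expiry in the past is clamped to 0, so the cookie expires at once.
    print(max_age_from_expires(now - datetime.timedelta(seconds=10), now))
    print(expires_from_max_age(30, 0))
```

Either parameter alone is enough to bound the cookie's lifetime. Passing a datetime as expires is converted to max_age, and max_age in turn fills in expires.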

In cookie anti-crawling, max_age, expires, and path are usually set. The first two control how long the cookie lives, and the last one helps hide where the cookie is generated.
However, set_cookie() stores the value as given, so the value has to be protected separately to keep a client from forging it. To simplify this, Django provides the set_signed_cookie() method, which signs the value before setting it. The signed value stays readable, but any tampering is detected on verification:

    def set_signed_cookie(self, key, value, salt='', **kwargs):
        value = signing.get_cookie_signer(salt=key + salt).sign(value)
        return self.set_cookie(key, value, **kwargs)

Its parameters are:

  • key, the cookie name
  • value, the cookie value to sign
  • salt, an extra string mixed into the signature; the same salt must be passed when the cookie is verified
  • kwargs, the optional parameters passed through to set_cookie()

set_signed_cookie() calls set_cookie(), which adds a Set-Cookie header to the response. The browser parses the response, stores the cookie locally, and sends it with later requests.
The signing in set_signed_cookie() is implemented by get_cookie_signer():
def get_cookie_signer(salt='django.core.signing.get_cookie_signer'):
    Signer = import_string(settings.SIGNING_BACKEND)
    key = force_bytes(settings.SECRET_KEY)  # SECRET_KEY may be str or bytes.
    return Signer(b'django.http.cookies' + key, salt=salt)
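
The idea behind a timestamped signature of the kind this signer produces can be sketched with the standard library. This is a simplified stand-in, not Django's actual implementation: the value:timestamp:signature layout, the SHA-256 HMAC, and the names sign_value and unsign_value are assumptions made for illustration, and SECRET is a placeholder key.

```python
import base64
import hashlib
import hmac
import time

SECRET = b"placeholder-secret-key"  # assumption: stands in for SECRET_KEY


def _signature(payload, salt):
    # Mix the salt into the key so that the same value signed under a
    # different salt yields a different signature.
    key = hashlib.sha256(salt.encode() + SECRET).digest()
    digest = hmac.new(key, payload.encode(), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode()


def sign_value(value, salt, now=None):
    """Return 'value:timestamp:signature'. The value itself stays readable."""
    timestamp = int(time.time() if now is None else now)
    payload = f"{value}:{timestamp}"
    return f"{payload}:{_signature(payload, salt)}"


def unsign_value(signed, salt, max_age=None, now=None):
    """Return the original value, or raise ValueError if invalid or expired."""
    payload, _, signature = signed.rpartition(":")
    expected = _signature(payload, salt)
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        raise ValueError("signature mismatch")
    value, _, timestamp = payload.rpartition(":")
    if max_age is not None:
        age = (time.time() if now is None else now) - int(timestamp)
        if age > max_age:
            raise ValueError("signature expired")
    return value


if __name__ == "__main__":
    cookie_value = sign_value("id", salt="uuidMyDj")
    print(cookie_value)
    print(unsign_value(cookie_value, salt="uuidMyDj", max_age=30))
```

The salt "uuidMyDj" in the usage lines follows the salt=key + salt concatenation shown in set_signed_cookie() above, for a cookie named uuid with salt MyDj. A client that edits the value or timestamp cannot recompute the signature without the secret key, so verification fails.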

It reads two settings:

  • SIGNING_BACKEND, the signer class used for cookies; it can be overridden in settings.py.
  • SECRET_KEY, the security key, which is generated automatically in settings.py when the project is created.
To sum up: a cookie anti-crawler checks a specific cookie on each request. The server verifies the cookie with the same signer and salt that produced it, and then judges whether the information is valid. The cookie generation step should also be hidden, for example behind AJAX requests.

2. Implementation of cookie anti-crawler in django

index/views.py:
from django.core.signing import BadSignature
from django.http import Http404, HttpResponse
from django.shortcuts import render, redirect
from django.urls import reverse

def index(request):
    # The create view redirects here, which hides where the cookie is set.
    return render(request, 'index.html')

def create(request):
    r = redirect(reverse('index:index'))
    # Add a signed cookie that the browser keeps for 30 seconds
    r.set_signed_cookie('uuid', 'id', salt='MyDj', max_age=30)
    return r

def myCookie(request):
    cookieExist = request.COOKIES.get('uuid', '')
    if cookieExist:
        # Check that the signed cookie is valid
        try:
            request.get_signed_cookie('uuid', salt='MyDj')
        except BadSignature:
            raise Http404('The current cookie is invalid!')
        return HttpResponse('The current cookie is: ' + cookieExist)
    else:
        raise Http404('This request has no cookie!')

Here, cookie generation is hidden by having the create view redirect straight to the home page.

MyDjango/settings.py:
# Default engine used to sign and verify cookies
SIGNING_BACKEND = 'django.core.signing.TimestampSigner'
# A custom engine can be used as well
templates/index.html:
<!DOCTYPE html>
<html>
<body>
<div class="page-header">
    <h3>Hello Cookies</h3>
</div>
<a class='link_1' href="{% url 'index:create' %}">First create the cookie</a>
<br>
<a class='link_2' href="{% url 'index:myCookie' %}">Then view the cookie</a>
</body>
</html>
index/urls.py:
from django.urls import path
from . import views

urlpatterns = [
    # Route definitions
    path('', views.index, name='index'),
    path('create', views.create, name='create'),
    path('myCookie', views.myCookie, name='myCookie'),
]

First clear the browser data, and then visit http://127.0.0.1:8000/. On the first visit, no cookie has been generated yet.
After the cookie is created, it appears in the browser's cookie storage.
Viewing the cookie then returns its signed value.
The cookie automatically becomes invalid after the timeout.

Three, cookie anti-crawler bypass

1. Development environment preparation

1. Depends on the above django project:

To be continued...

Origin blog.csdn.net/dangfulin/article/details/108209173