Four: Crawlers - Cookie and Session in Practice

When browsing websites, we often run into pages that can only be accessed after logging in. Once logged in, you can visit the site many times in a row, but sometimes you have to log in again after a while. Other sites log you in automatically when you open the browser and stay valid for a long time. The mechanisms behind all of this are Session and Cookie.

(1) Cookie

A Cookie identifies the user through information recorded on the client.

HTTP is a stateless protocol. The interaction between client and server is limited to a single request/response exchange, and the server keeps no memory of it once it ends, so it treats the next request as coming from a new client. To keep continuity between requests and let the server know that a request comes from the same user as before, the client's information has to be saved somewhere. A Cookie saves it on the client, which sends it back to the server with each later request.
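The round trip described above can be sketched with Python's standard `http.cookies` module. This is only an illustration: the header value and the cookie name `session_id` are made-up examples, not values from any real site.

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie header value, as a server might send it after login.
set_cookie_header = "session_id=abc123; Path=/; HttpOnly"

# Client side: parse the header and remember the cookie.
jar = SimpleCookie()
jar.load(set_cookie_header)

# On the next request the client sends back only the name=value pairs;
# attributes such as Path and HttpOnly stay on the client.
cookie_header = "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
print(cookie_header)  # session_id=abc123
```

A crawler does the same thing a browser does here: it stores what the server set and replays it in the `Cookie` request header.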

(2) Session

A Session identifies the user through information recorded on the server. The word originally refers to a series of actions with a beginning and an end. For example, in a phone call, the whole sequence from picking up the phone, to dialing, to hanging up can be called a Session.
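As a rough sketch of what "information recorded on the server" can look like, the store below maps a random session ID to a user name and an expiry time. The class name `SessionStore`, the 30-minute lifetime, and the explicit `now` parameter are all assumptions made for the example; real frameworks differ in the details.

```python
import secrets

SESSION_TTL = 1800  # assumed lifetime in seconds (30 minutes), not from the article


class SessionStore:
    """Minimal in-memory session store: session_id -> (username, expires_at)."""

    def __init__(self):
        self._sessions = {}

    def create(self, username, now):
        # A random, hard-to-guess ID is what the client will later present.
        session_id = secrets.token_hex(16)
        self._sessions[session_id] = (username, now + SESSION_TTL)
        return session_id

    def lookup(self, session_id, now):
        entry = self._sessions.get(session_id)
        if entry is None:
            return None  # unknown ID: not logged in
        username, expires_at = entry
        if now >= expires_at:
            del self._sessions[session_id]  # expired: user must log in again
            return None
        return username


store = SessionStore()
sid = store.create("alice", now=0)
print(store.lookup(sid, now=60))    # alice
print(store.lookup(sid, now=3600))  # None
```

The expiry check is one way to get the "log in again after a while" behaviour mentioned at the start: once the session has lapsed on the server, the ID the client still holds is no longer recognised.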

Cookie and Session in crawlers: the login flow:

[Figure: Cookie and Session login flow]

Analysis of the flow:

1. Logging in means verifying the account and password created at registration: the server queries the database to check whether the current user exists. If the user exists, the login succeeds, and the server returns a session_id generated by an encryption algorithm.

2. set_cookie is returned by the server, because only a server response carries set_cookie (the Set-Cookie header).

3. On later requests, the server takes the session_id carried in the cookie and searches the database named session (assuming it exists) to check whether that session_id is stored there.

4. The advantage of this is that the user only needs to enter the account and password once. After that, each page request only needs to carry a Cookie containing the session_id in its headers, and the backend can determine from the session_id whether the user is logged in.
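The four steps above can be sketched end to end with the standard library alone. Everything here is illustrative: the `USERS` table, the `login` and `current_user` helpers, and the cookie name `session_id` are hypothetical, and the unsalted SHA-256 hash stands in for whatever password scheme a real site uses.

```python
import hashlib
import secrets
from http.cookies import SimpleCookie

# Hypothetical user table: username -> SHA-256 hash of the password.
# (Assumption for the sketch; real systems use a salted, slow password hash.)
USERS = {"alice": hashlib.sha256(b"secret").hexdigest()}
SESSIONS = {}  # the "session" table: session_id -> username


def login(username, password):
    """Steps 1-2: verify the credentials, create a session, return a Set-Cookie value."""
    hashed = hashlib.sha256(password.encode()).hexdigest()
    if USERS.get(username) != hashed:
        return None  # unknown user or wrong password
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = username
    return f"session_id={session_id}; Path=/; HttpOnly"


def current_user(cookie_header):
    """Step 3: read session_id from the Cookie header and look it up."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    morsel = cookie.get("session_id")
    return SESSIONS.get(morsel.value) if morsel else None


# Step 4: the client logs in once, then only replays the cookie.
client = SimpleCookie()
client.load(login("alice", "secret"))
cookie_header = f"session_id={client['session_id'].value}"

print(current_user(cookie_header))        # alice
print(current_user("session_id=forged"))  # None
```

For a crawler, the practical consequence is the last part: as long as the copied cookie still maps to a live session on the server, requests that carry it are treated as logged in.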

Cookie and Session practical case: 12306 ticket query example:

import requests

headers = {
    "Accept": "*/*",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Cache-Control": "no-cache",
    "Connection": "keep-alive",
    "If-Modified-Since": "0",
    "Pragma": "no-cache",
    "Referer": "https://kyfw.12306.cn/otn/leftTicket/init?linktypeid=dc",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-origin",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest",
    "sec-ch-ua": "^\\^Google",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "\"Windows\""
}
cookies = {
    "_uab_collina": "170210568210505922888592",
    "JSESSIONID": "44EBFDF0F56EDAAB390BD3821713F910",
    "BIGipServerpassport": "921174282.50215.0000",
    "guidesStatus": "off",
    "highContrastMode": "defaltMode",
    "cursorStatus": "off",
    "route": "495c805987d0f5c8c84b14f60212447d",
    "BIGipServerotn": "2698445066.64545.0000",
    "_jc_save_fromStation": "%u5317%u4EAC%2CBJP",
    "_jc_save_toStation": "%u4E0A%u6D77%2CSHH",
    "_jc_save_toDate": "2023-12-09",
    "_jc_save_wfdc_flag": "dc",
    "_jc_save_fromDate": "2023-12-10"
}
url = "https://kyfw.12306.cn/otn/leftTicket/query"
params = {
    "leftTicketDTO.train_date": "2023-12-10",
    "leftTicketDTO.from_station": "BJP",
    "leftTicketDTO.to_station": "SHH",
    "purpose_codes": "ADULT"
}
response = requests.get(url, headers=headers, cookies=cookies, params=params)

data = response.json()
# print(data,type(data))

result = data['data']['result']
# print(result,type(result))

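for item in result:  # each item is the record for one train, a '|'-separated string
    # print(item)  # still a string at this point
    data_li = item.split('|')  # split the record into its individual fields
    # print(data_li)  # now a list
    # To find the indexes of the train number and the first-class seat field,
    # print every field together with its position:
    # for i, f in enumerate(data_li):
    #     print(i, f)

    # Train number: index 3
    # First-class seats: index 31

    # "无" means "none"; keep the literal, since it is compared against the response data
    if data_li[31] != "无" and data_li[31] != "":
        print(data_li[3], "tickets available,", "first-class seats left:", data_li[31])
    else:
        print(data_li[3], "no tickets")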
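The field-splitting logic in the loop can also be pulled out into a small function so it can be checked without calling the site. This is a sketch under assumptions: the helper name `parse_first_class` is invented here, the sample rows are synthetic, and only the two indexes (3 and 31) come from the code above.

```python
def parse_first_class(row):
    """Return (train_number, seats, available) for one '|'-separated record."""
    fields = row.split("|")
    train, seats = fields[3], fields[31]
    # An empty field or "无" ("none") both mean no first-class tickets.
    available = seats not in ("", "无")
    return train, seats, available


def make_row(train, seats, width=40):
    """Build a synthetic record for testing; real records carry many more values."""
    fields = [""] * width
    fields[3] = train
    fields[31] = seats
    return "|".join(fields)


print(parse_first_class(make_row("G1", "5")))   # ('G1', '5', True)
print(parse_first_class(make_row("G3", "无")))  # ('G3', '无', False)
```

Because the indexes were found by inspecting one response, they may shift if the site changes its format, so keeping them in one place makes that easier to adjust.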


Origin blog.csdn.net/qiao_yue/article/details/134902988