Using a proxy IP with selenium

1. Apply for a proxy IP

If a user visits a website many times in a short period, the site may identify the client as a crawler and restrict access from its IP. On more formal websites the anti-crawling system is strong and this is very likely to happen, so sometimes a proxy IP is necessary. I generally choose a random dynamic proxy IP, which makes each visit look like it comes from a random user rather than the same fixed one.
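The rotating idea can be sketched in a few lines: pick a random proxy from a pool before each request. The proxy addresses below are placeholders for illustration, not real working endpoints.

```python
import random

# Hypothetical pool of proxy addresses (ip:port) -- placeholders, not real endpoints.
PROXY_POOL = [
    "222.89.32.159:21079",
    "111.72.25.103:23564",
    "120.24.76.81:30001",
]

def random_proxies():
    """Build a requests-style proxies dict from a randomly chosen proxy."""
    proxy = random.choice(PROXY_POOL)
    return {"http": "http://" + proxy, "https": "http://" + proxy}

# Each call may pick a different proxy, so successive visits
# look like they come from different users.
print(random_proxies())
```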

Without further ado: register with IPIDEA and you get 100M of free traffic on sign-up. If you have bigger needs, you can purchase more:

http://www.ipidea.net/?utm-source=gejing&utm-keyword=?gejing

Generate API:

Click the generate link, then copy the generated API link and save it for later use.
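Once you have the API link, fetching a proxy is just an HTTP GET. The response format assumed below (plain text, one ip:port per line) is typical for this kind of provider but may differ for yours:

```python
def parse_proxies(text):
    """Split a plain-text API response into a list of ip:port strings."""
    return [line.strip() for line in text.splitlines() if line.strip()]

# Typical use (assumes the endpoint returns plain text, one proxy per line):
# import requests
# res = requests.get(api_url)          # api_url = the link you copied
# proxy_list = parse_proxies(res.text)

# Example with a canned response:
print(parse_proxies("222.89.32.159:21079\n111.72.25.103:23564\n"))
```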

2. Using a proxy IP in selenium: practice (1)

The basic format for setting a proxy with requests:

import requests

proxies = {
    'http': 'http://222.89.32.159:21079',
    'https': 'http://222.89.32.159:21079'
}
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36"
}
url = 'https://www.example.com/'  # the page you want to request
res = requests.get(url=url, headers=headers, proxies=proxies)

I thought for a long time and couldn't decide which sites have strong anti-crawling, so I just picked a site to test. You can also try your school's educational administration system, 12306, Facebook, and so on...

Destination URL:

https://www.taobao.com/

So positioning is easy:

driver.find_element_by_name('q')

I wrote before about using a proxy for crawling with the requests module, but personally, the more I learn, the more I find myself using selenium and gradually abandoning requests, so here is how to add a proxy in selenium.

The method is simple:

ops.add_argument('--proxy-server=http://%s' % a)  # add the proxy

Note that the format of a here is ip:port.
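A malformed proxy string silently breaks the --proxy-server argument, so a small sanity check helps. This helper is my own addition, not part of selenium:

```python
import re

def is_ip_port(s):
    """Return True if s looks like 'ip:port', e.g. '222.89.32.159:21079'."""
    m = re.fullmatch(r"(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}):(\d{1,5})", s)
    if not m:
        return False
    octets_ok = all(int(g) <= 255 for g in m.groups()[:4])  # each octet 0-255
    port_ok = 0 < int(m.group(5)) <= 65535                  # valid TCP port
    return octets_ok and port_ok

print(is_ip_port("222.89.32.159:21079"))  # True
print(is_ip_port("222.89.32.159"))        # False (missing port)
```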

Note: using the proxy IP this way requires the selenium-wire module to be installed:

pip install selenium-wire

Your import should then be:

from seleniumwire import webdriver

instead of:

from selenium import webdriver

For example, searching Taobao for a Huawei phone.

Full code:

from seleniumwire import webdriver  # per the note above
from fake_useragent import UserAgent
from selenium.webdriver.chrome.options import Options

headers = {
    'User-Agent': UserAgent().random}
ops = Options()

# First launch a plain driver just to fetch a proxy from the API link
driver = webdriver.Chrome(r'D:\360安全浏览器下载\chromedriver.exe')

api_url = 'the proxy API link you copied earlier'

driver.get(api_url)

a = driver.find_element_by_xpath('/html/body/pre').text  # get a proxy (ip:port)
driver.quit()

ops.add_argument('--proxy-server=http://%s' % a)  # add the proxy

# Restart Chrome with the proxy applied
driver = webdriver.Chrome(r'D:\360安全浏览器下载\chromedriver.exe', options=ops)

driver.delete_all_cookies()  # clear cookies

driver.get('https://www.taobao.com/')

driver.find_element_by_name('q').send_keys('华为手机')

The next step is to click the search button: locate the element, then click it with ActionChains:

from selenium.webdriver import ActionChains

b = driver.find_element_by_class_name('search-button')  # locate the search button

ActionChains(driver).click(b).perform()

Could it be that the anti-scraping mechanism was triggered? A login is now required. I don't know my Taobao account password, so I'll just type something in for demonstration; the rest you can operate yourself.
Here is the analysis of the account and password fields:
The account and password inputs are located as follows. I set the account input to chuanchuan and the password input to 123456, which is pure nonsense for the demo; use your actual account. It is nothing more than a bit more positioning.

driver.find_element_by_name('fm-login-id').send_keys('chuanchuan')  # enter the account

driver.find_element_by_name('fm-login-password').send_keys('123456')  # enter the password

The effect is as follows:

2. Using a proxy IP in selenium: practice (2)

Note: crawling this site through a proxy requires an overseas network environment. For the demonstration I had to buy an overseas server to test; see: overseas environment server
For example:

https://www.facebook.com/

Analyze the account and password login fields:

The code is:

from fake_useragent import UserAgent
import requests
from selenium import webdriver
from selenium.webdriver import ChromeOptions

headers = {
    'User-Agent': UserAgent().random}

api_url = 'paste your API link here'

res = requests.post(api_url, headers=headers, verify=True)
PROXY = res.text
print(PROXY)

ops = ChromeOptions()

ops.add_argument('--proxy-server=%s' % PROXY)  # add the proxy

driver = webdriver.Chrome(r'D:\360安全浏览器下载\chromedriver.exe', options=ops)

driver.get("https://m.facebook.com/")
driver.find_element_by_name('email').send_keys("your account")
driver.find_element_by_name('pass').send_keys('your password')
# login button
btnSubmit = driver.find_element_by_name('login')
btnSubmit.click()

The effect is as follows:

My account ended up blocked, so I won't demonstrate any further. You can carry on yourself using the selenium points covered here; it is nothing more than locating element after element.

3. Selenium single-element positioning practice review

3.1 Locating the input box

Take Microsoft search engine as an example:

https://cn.bing.com/?mkt=zh-CN

Analysis: the search box can be located by its name attribute, so:

from selenium import webdriver

driver = webdriver.Chrome(r'D:\360安全浏览器下载\chromedriver.exe')

driver.get('https://cn.bing.com/?mkt=zh-CN')

driver.find_element_by_name('q').send_keys('川川菜鸟')

As follows:
You can also locate it in the following two ways:

driver.find_element_by_id('sb_form_q').send_keys('川川菜鸟')
driver.find_element_by_class_name('sb_form_q').send_keys('川川菜鸟')

The send_keys function fills in the text.

3.2 Click to search

Analysis: the search button can be located by id or class:

b=driver.find_element_by_id('search_icon')

ActionChains(driver).click(b).perform()

As follows:
The above uses id positioning; the class-based version is:

b = driver.find_element_by_class_name('search')
ActionChains(driver).click(b).perform()

Interestingly, the class name to use is search, not the full search icon tooltip. Personally, I think this is because of the spaces: a class attribute containing spaces is actually several class names, and find_element_by_class_name takes only one of them. Fortunately I have crawler experience, otherwise it would have been easy to get stuck here.
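Splitting the attribute makes this concrete: in HTML, class is a space-separated list of individual class names, and find_element_by_class_name expects exactly one of them. A quick sketch, assuming the button's class attribute is "search icon tooltip" as it appeared on the page:

```python
# The class attribute string as it appears in the page source (assumed value).
class_attr = "search icon tooltip"

# Splitting on whitespace yields the individual class names;
# any single one of these is valid for find_element_by_class_name.
class_names = class_attr.split()
print(class_names)  # ['search', 'icon', 'tooltip']
```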

3.3 Complete code

Please replace the proxy api with yours and apply according to my method:

# coding=gbk
"""
Author: 川川
Official account: 玩转大数据
@Time: 2022/3/3 17:11
Group: 428335755
"""
from selenium import webdriver
from selenium.webdriver import ActionChains

driver = webdriver.Chrome(r'D:\360安全浏览器下载\chromedriver.exe')

driver.get('https://cn.bing.com/?mkt=zh-CN')

# driver.find_element_by_name('q').send_keys('川川菜鸟')

# driver.find_element_by_id('sb_form_q').send_keys('川川菜鸟')

driver.find_element_by_class_name('sb_form_q').send_keys('川川菜鸟')

# b = driver.find_element_by_id('search_icon')
b = driver.find_element_by_class_name('search')
ActionChains(driver).click(b).perform()

Origin: blog.csdn.net/weixin_46211269/article/details/123251070