1. Applying for a proxy IP
If you visit a website many times, you may be identified as a crawler and have your client IP restricted. Larger, more established websites tend to have strong anti-crawling systems, so this is especially likely there, and sometimes a proxy IP is necessary. I generally use random dynamic proxy IPs, so that each visit appears to come from a different, random user rather than the same fixed one.
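The "random user on every visit" idea can be sketched as picking a different proxy from a pool for each request. This is a minimal sketch, assuming you already have a list of "ip:port" strings; the addresses are hypothetical documentation addresses and `pick_proxy` is an illustrative name, not part of any library.

```python
import random

# Hypothetical proxy pool; real entries would come from your provider's API link.
PROXY_POOL = [
    "203.0.113.10:8080",
    "203.0.113.11:8080",
    "203.0.113.12:8080",
]


def pick_proxy(pool, last=None):
    """Return a random "ip:port" from pool, avoiding `last` when another choice exists."""
    if not pool:
        raise ValueError("proxy pool is empty")
    candidates = [p for p in pool if p != last] or list(pool)
    return random.choice(candidates)


previous = None
for visit in range(3):
    proxy = pick_proxy(PROXY_POOL, last=previous)
    print(f"visit {visit}: using {proxy}")
    previous = proxy
```

Avoiding the previous proxy is a design choice for this sketch: with a small pool, plain `random.choice` would often reuse the same exit IP on consecutive visits.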
Without further ado: register with IPIDEA and you get 100 MB of free traffic on sign-up. If you need more, you can buy additional traffic:
http://www.ipidea.net/?utm-source=gejing&utm-keyword=?gejing
Generate an API link: click the generate link, then copy the link and save it for later use.
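The API link returns proxies as plain text. A minimal sketch of turning that text into a clean list, assuming one "ip:port" per line (check your provider's actual format); `parse_proxy_list` is a hypothetical helper and the sample addresses are made up.

```python
import ipaddress


def parse_proxy_list(text):
    """Parse "ip:port" lines from a proxy API response, skipping malformed ones."""
    proxies = []
    for line in text.splitlines():  # handles both \n and \r\n line endings
        line = line.strip()
        if not line:
            continue
        host, sep, port = line.rpartition(":")
        if not sep or not port.isdigit() or not 0 < int(port) < 65536:
            continue  # no port, or port out of range
        try:
            ipaddress.IPv4Address(host)  # assumption: the API returns IPv4 addresses
        except ValueError:
            continue
        proxies.append(f"{host}:{int(port)}")
    return proxies


sample = "203.0.113.10:8080\r\n203.0.113.11:3128\r\nnot-a-proxy\r\n"
print(parse_proxy_list(sample))
```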
2. Using a proxy IP in selenium in practice (1)
The basic format for setting a proxy (shown here with requests):
import requests

url = 'https://www.taobao.com/'  # target URL; replace with the page you want to fetch
proxies = {
    'http': 'http://222.89.32.159:21079',
    'https': 'http://222.89.32.159:21079'
}
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36"
}
res = requests.get(url=url, headers=headers, proxies=proxies)
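The proxies dict maps each target URL scheme to a proxy address. A small helper can build it from the "ip:port" string the API returns. Both keys use an http:// proxy URL because the client talks to the proxy itself over plain HTTP, and HTTPS requests are tunnelled through it. `build_proxies` is a hypothetical name; the address is the one from the example above.

```python
def build_proxies(addr, scheme="http"):
    """Build a requests-style proxies dict from an "ip:port" string."""
    proxy_url = f"{scheme}://{addr.strip()}"
    # Same proxy for both target schemes; HTTPS traffic is tunnelled through it.
    return {"http": proxy_url, "https": proxy_url}


proxies = build_proxies("222.89.32.159:21079")
print(proxies)
# Usage (needs network access and a working proxy):
# requests.get("https://www.taobao.com/", headers=headers, proxies=proxies, timeout=10)
```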
I thought about it for a long time but couldn't decide which websites have the strongest anti-crawling measures, so I just picked one to test. You could also try your school's academic administration system, 12360, Facebook, and so on.
Destination URL:
https://www.taobao.com/
The search box is easy to locate by its name:
from selenium.webdriver.common.by import By
driver.find_element(By.NAME, 'q')
I have written before about crawling through a proxy with the requests module. But the more I learn, the more I find myself using selenium and gradually dropping requests, so here is how to add a proxy in selenium.
It is simple:
ops.add_argument('--proxy-server=http://%s' % a)  # add the proxy
Note that a here has the format ip:port.
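Because a must be exactly ip:port, it can be worth validating the API response before handing it to Chrome; a stray newline or an error message from the API would otherwise end up inside the flag. A minimal sketch, assuming an IPv4 proxy; `proxy_server_arg` is a hypothetical helper that returns the same `--proxy-server=http://ip:port` string as the line above.

```python
import ipaddress


def proxy_server_arg(a):
    """Return '--proxy-server=http://ip:port' after checking that a is ip:port."""
    host, sep, port = a.strip().rpartition(":")
    if not sep or not port.isdigit() or not 0 < int(port) < 65536:
        raise ValueError(f"expected ip:port, got {a!r}")
    ipaddress.IPv4Address(host)  # raises ValueError if host is not an IPv4 address
    return "--proxy-server=http://%s:%s" % (host, int(port))


print(proxy_server_arg("222.89.32.159:21079\n"))
# ops.add_argument(proxy_server_arg(a))
```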
Note: the --proxy-server argument works with plain selenium. If your proxy requires a username and password, install the selenium-wire module:
pip install selenium-wire
and then use:
from seleniumwire import webdriver
instead of:
from selenium import webdriver
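selenium-wire is particularly useful when the proxy needs a username and password, which Chrome's --proxy-server flag cannot carry. selenium-wire takes the proxy through a seleniumwire_options dict with a 'proxy' key. A minimal sketch that only builds that dict, so it runs without a browser; it assumes a plain HTTP proxy (same http:// URL for both keys), and the host, credentials, and the `wire_proxy_options` name are hypothetical.

```python
def wire_proxy_options(host, port, user=None, password=None):
    """Build a seleniumwire_options dict for an HTTP proxy, optionally with credentials."""
    auth = f"{user}:{password}@" if user is not None else ""
    proxy_url = f"http://{auth}{host}:{port}"
    return {
        "proxy": {
            "http": proxy_url,
            "https": proxy_url,  # assumption: a plain HTTP proxy handles both schemes
            "no_proxy": "localhost,127.0.0.1",  # keep local traffic off the proxy
        }
    }


opts = wire_proxy_options("203.0.113.10", 8080, user="user", password="secret")
print(opts["proxy"]["http"])
# With selenium-wire installed:
# from seleniumwire import webdriver
# driver = webdriver.Chrome(seleniumwire_options=opts)
```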
For example, search X-bao (Taobao) for "XX phone":
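Full code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from fake_useragent import UserAgent

driver_path = r'D:\360安全浏览器下载\chromedriver.exe'  # path to your chromedriver
api_url = '<the proxy API link you copied>'

# Open the API link in a temporary browser and read the proxy from the page
tmp = webdriver.Chrome(service=Service(driver_path))
tmp.get(api_url)
a = tmp.find_element(By.XPATH, '/html/body/pre').text.split()[0]  # first proxy (ip:port)
tmp.quit()

# Options only take effect if they are passed when the browser starts
ops = Options()
ops.add_argument('--proxy-server=http://%s' % a)  # add the proxy
ops.add_argument('--user-agent=%s' % UserAgent().random)  # random User-Agent
driver = webdriver.Chrome(service=Service(driver_path), options=ops)
driver.delete_all_cookies()  # clear cookies
driver.get('https://www.taobao.com/')
driver.find_element(By.NAME, 'q').send_keys('华为手机')  # search for "Huawei phone"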
The next step is to click the search button:
locate the element to click, then click it with ActionChains:
from selenium.webdriver import ActionChains
b = driver.find_element(By.CLASS_NAME, 'search-button')  # locate the search button
ActionChains(driver).click(b).perform()
It looks like the anti-crawling mechanism was triggered: the site asks me to log in. I don't know my X-bao account password, so I'll just enter something for the demonstration; the rest is up to you.
Here is the analysis of the account and password fields:
The account and password fields are located as follows. For the account I enter chuanchuan and for the password 123456; these are just placeholder values, so use your actual account. I won't go further, since it is just a matter of locating one element after another.
driver.find_element(By.NAME, 'fm-login-id').send_keys('chuanchuan')  # enter the account
driver.find_element(By.NAME, 'fm-login-password').send_keys('123456')  # enter the password
The effect is as follows:
3. Using a proxy IP in selenium in practice (2)
Note: crawling this site through a proxy requires an overseas network environment. For the demonstration I had to buy an overseas server for testing.
For example:
https://www.facebook.com/
Analysis of the account and password login form:
The code is:
from fake_useragent import UserAgent
import requests
from selenium import webdriver
from selenium.webdriver import ChromeOptions
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

headers = {
    'User-Agent': UserAgent().random}
api_url = '<paste your API link here>'
res = requests.post(api_url, headers=headers, verify=True)
PROXY = res.text.strip()  # ip:port, without the trailing newline
print(PROXY)
ops = ChromeOptions()
ops.add_argument('--proxy-server=%s' % PROXY)  # add the proxy
driver = webdriver.Chrome(service=Service(r'D:\360安全浏览器下载\chromedriver.exe'), options=ops)
driver.get("https://m.facebook.com/")
driver.find_element(By.NAME, 'email').send_keys("your account")
driver.find_element(By.NAME, 'pass').send_keys('your password')
# login button
btnSubmit = driver.find_element(By.NAME, 'login')
btnSubmit.click()
The effect is as follows:
My account has been blocked, so I won't continue the demonstration. You can carry on yourself using the selenium techniques covered here; it is nothing more than locating elements one by one and saving the data.
Three: Review of locating single elements in selenium
3.1 Locating and filling in the search box
Take Microsoft's Bing search engine as an example:
https://cn.bing.com/?mkt=zh-CN
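Analysis: the search box can be located by its name attribute, so:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
driver = webdriver.Chrome(service=Service(r'D:\360安全浏览器下载\chromedriver.exe'))
driver.get('https://cn.bing.com/?mkt=zh-CN')
driver.find_element(By.NAME, 'q').send_keys('川川菜鸟')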
As follows:
It can also be located in two other ways:
driver.find_element(By.ID, 'sb_form_q').send_keys('川川菜鸟')
driver.find_element(By.CLASS_NAME, 'sb_form_q').send_keys('川川菜鸟')
The send_keys() method types the given text into the located element.
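The name, id and class-name lookups above are interchangeable here because Selenium's Python bindings rewrite these three strategies into CSS selectors before sending them to the browser. The sketch below is a simplified re-implementation of that mapping for illustration only; `to_css` is a hypothetical name, not a Selenium function, and the strategy strings are the values of By.NAME, By.ID and By.CLASS_NAME.

```python
def to_css(strategy, value):
    """Simplified illustration of how id / name / class-name lookups map to CSS."""
    if strategy == "id":
        return f'[id="{value}"]'
    if strategy == "name":
        return f'[name="{value}"]'
    if strategy == "class name":
        return f".{value}"
    raise ValueError(f"unsupported strategy: {strategy}")


for strategy, value in [("name", "q"), ("id", "sb_form_q"), ("class name", "sb_form_q")]:
    print(strategy, "->", to_css(strategy, value))
```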
3.2 Clicking search
Analysis: the search button can be located by either its id or its class.
from selenium.webdriver import ActionChains
b = driver.find_element(By.ID, 'search_icon')
ActionChains(driver).click(b).perform()
As follows:
That locates the button by id; locating it by class works as well:
b = driver.find_element(By.CLASS_NAME, 'search')
ActionChains(driver).click(b).perform()
Interestingly, the class to use is 'search', not the full attribute value 'search icon tooltip'. That attribute holds three space-separated class names, and locating by class name accepts only a single one, so passing the whole string with its spaces fails. Fortunately I have some crawler experience; otherwise I would have been stuck here.
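If you do want to match on all three classes at once, a CSS selector can combine them by chaining class selectors with dots and no spaces. A minimal sketch; `classes_to_css` is a hypothetical helper, and 'search icon tooltip' is the class attribute discussed above.

```python
def classes_to_css(class_attr):
    """Turn a class attribute like 'search icon tooltip' into '.search.icon.tooltip'."""
    names = class_attr.split()  # split on runs of whitespace
    if not names:
        raise ValueError("empty class attribute")
    return "".join("." + name for name in names)


selector = classes_to_css("search icon tooltip")
print(selector)  # .search.icon.tooltip
# In Selenium 4:
# from selenium.webdriver.common.by import By
# b = driver.find_element(By.CSS_SELECTOR, selector)
```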
3.3 Complete code
Replace the proxy API link with your own, obtained as described above:
# coding=gbk
"""
Author: 川川
WeChat official account: 玩转大数据
@Time: 2022/3/3 17:11
Group: 428335755
"""
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

driver = webdriver.Chrome(service=Service(r'D:\360安全浏览器下载\chromedriver.exe'))
driver.get('https://cn.bing.com/?mkt=zh-CN')
# driver.find_element(By.NAME, 'q').send_keys('川川菜鸟')
# driver.find_element(By.ID, 'sb_form_q').send_keys('川川菜鸟')
driver.find_element(By.CLASS_NAME, 'sb_form_q').send_keys('川川菜鸟')
# b = driver.find_element(By.ID, 'search_icon')
b = driver.find_element(By.CLASS_NAME, 'search')
ActionChains(driver).click(b).perform()