A brief discussion of using Python for automated testing

What is Python?


  • Python is an interpreted, object-oriented, high-level programming language with dynamic typing.
  • Python was invented by Guido van Rossum at the end of 1989, and the first public release appeared in 1991.
  • Like Perl, Python's source code is open source (note that modern Python is released under the PSF license rather than the GPL).
  • It was officially announced that updates to Python 2 would stop on January 1, 2020.
  • Python 2.7 was confirmed as the last Python 2.x release.

Comparison of Hello World programs written in various languages

  • C++ version:
#include <iostream>
using namespace std;

int main() {
    cout << "Hello World!" << endl;
    return 0;
}
  • Java version:
package test;

public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello World!");
    }
}
  • Python version:
print("Hello World!")

As you can see, the Python version needs the least code; simplicity is exactly what Python aims for.

How to "slack off" with Python

  • Many office software packages on the market offer plenty of features, but they are more or less criticized as "inflexible".

  • Much of the current talk around Python centers on the idea of office automation, because machines excel at repetition, and many of our daily tasks are repetitive. Why not hand them over to the machine?

  • Python can handle most day-to-day work thanks to its streamlined syntax on the one hand and its rich third-party libraries on the other. These libraries are very powerful and can even complete simple Photoshop-style jobs for you automatically, as the sketch after this list shows.

  • Today I will mainly introduce crawlers and data ETL in Python.
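
As an example of the image-editing point above, here is a minimal sketch using the Pillow third-party library (the file names are hypothetical, and Pillow is assumed to be installed via pip install Pillow):

# A minimal sketch of automated image editing with Pillow
# (hypothetical file names; assumes `pip install Pillow`)
from PIL import Image

# Open a source image, convert it to grayscale, shrink it, and save a copy
img = Image.open("photo.jpg")
thumb = img.convert("L").resize((200, 200))
thumb.save("photo_thumb.jpg")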

What is a crawler?


  • The technology of web crawlers is not new; it has simply become popular recently.
  • The name "crawler" may sound unfamiliar, but everyone has actually encountered one: Baidu, which we use in our daily lives, is itself a huge crawler.
  • As the name suggests, we can imagine the entire Internet as a huge spider web covered with bugs (the data resources we need). A crawler is a spider crawling on this web: through programming, we can make it find the data we want on the web and collect it for us.

Before we officially enter the world of crawlers, we still need a little front-end knowledge.

What happens when a browser visits a website?

  1. Initiate a request
    The browser sends a Request to the target site over HTTP; the request can carry additional headers and other information, and the browser then waits for the server to respond. There are many request methods (GET, POST, DELETE, PUT, PATCH...), but the first two are by far the most commonly used.
  2. Get the response content
    If the server responds normally, the browser receives a Response whose content is the page to be displayed. The type may be HTML, JSON, or binary data (such as image data).
  3. Parse and display the content
    After receiving the Response from the server, the front end parses the returned body with JavaScript code, then loads the parsed data into the page for display according to the set rules.
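
Here is a minimal sketch of this whole cycle, using only the Python standard library (the URL is just an example, and the simple regex merely stands in for real front-end parsing):

from urllib.request import urlopen
import re

# 1. Initiate a request and 2. get the response content
with urlopen("http://www.baidu.com") as resp:
    body = resp.read().decode("utf-8", errors="replace")

# 3. Parse the content: pull the page title out of the returned HTML
match = re.search(r"<title>(.*?)</title>", body, re.S)
print(match.group(1) if match else "no title found")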

What is Request?

1. Request method
Mainly GET and POST; there are also HEAD, PUT, DELETE, and others.

2. Request URL
URL stands for Uniform Resource Locator. Any resource, such as a web page, an image, or a video, can be uniquely identified by its URL.

3. Request headers
The header information carried with the request, such as User-Agent, Host, Cookies, and so on.

4. Request body
The additional data carried in the request, such as the form data sent when a form is submitted.
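A minimal sketch that puts these four parts together, assuming the requests library is installed; httpbin.org is a public service that echoes requests back, and the form fields here are made up:

import requests

response = requests.post(                       # 1. request method: POST
    "http://httpbin.org/post",                  # 2. request URL
    headers={"User-Agent": "demo-client/1.0"},  # 3. request headers
    data={"user": "test", "pwd": "123456"},     # 4. request body (form data)
)
print(response.status_code)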

What does Response contain?

1. Response status
There are many response status codes, for example 200 for success, 301 for a redirect, 404 for page not found, and 502 for a server error.
2. Response headers
Such as the content type, content length, server information, cookie settings, and so on.

3. Response body
The main part of the response, containing the content of the requested resource, such as a web page's HTML or an image's binary data.
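A minimal sketch, again assuming the requests library, that inspects all three parts of a response:

import requests

response = requests.get("http://www.baidu.com")
print(response.status_code)                  # 1. response status, e.g. 200
print(response.headers.get("Content-Type"))  # 2. one of the response headers
print(response.text[:200])                   # 3. response body (first 200 characters)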

Start a simple crawler

Step1 Simulate an HTTP request

  • In essence, the working mechanism of a crawler is to simulate the process of an HTTP request in code.
  • This is easy to accomplish in Python. Python 3 has several libraries for it (requests, urllib3...), but the former is recommended because it is more highly encapsulated and more convenient to use.
# Import the requests library
import requests

# Call requests' get method to send a GET request to www.baidu.com
response = requests.get("http://www.baidu.com")

# Print the response information
print("status code:", response.status_code)
print("response body:", response.text)

Step2 Add request headers

  • Many websites now have anti-crawling measures in place, because too many crawling programs put extra pressure on their servers. We therefore have to disguise ourselves as a browser; the trick is to construct a User-Agent header.
# Import the requests library
import requests

# Set the headers, pretending to be a Chrome browser
my_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"
}

# Call requests' get method to send a GET request to www.baidu.com
response = requests.get("http://www.baidu.com", headers=my_headers)

# Print the response information
print("status code:", response.status_code)
print("response body:", response.text)

After running this code, you will find that the returned response body contains noticeably more content than before. Most likely, Baidu's server identified the previous visit as a robot and therefore returned a stripped-down response body.

Data processing

  • The format of the data returned by the server is rarely exactly what we want, so we need to normalize it ourselves to extract the data we need.

JSON data processing

  • Often the data we get from the server is returned in JSON format. Processing JSON in Python is very convenient: unlike Java, there is no need to parse it into specific entity classes. Because Python is dynamically typed, a JSON string can be parsed directly into a Python dictionary (dict).
# Import the library for handling JSON
import json

test_json_str = """
{
	"people": [{
		"school": "ttt",
		"name": "zs",
		"age": 15
	}, {
		"school": "du",
		"name": "ls",
		"age": 17
	}, {
		"school": "rw",
		"name": "ww",
		"age": 18
	}],
	"status": "OK"
}
"""

json_ = json.loads(test_json_str)

print("状态:",json_["status"])
print("人员: ",json_["people"])
print("第一个人的学校名:",json_["people"][0]["school"])

Structured data processing

  • For structured data, Python also has a very powerful library to handle it, called pandas.
import pandas as pd

data_json = {
    "name": ["Zhang San", "Li Si", "Wang Wu"],
    "age": [18, 19, 17],
    "math": [120, 112, 99],
    "english": [95, 89, 120],
    "chinese": [110, 102, 113]
}

sc_df = pd.DataFrame(data_json)
# Select the students who failed English
print(sc_df[sc_df["english"] < 90])
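
Other typical ETL steps, such as deriving a new column and sorting, are just as short; a small sketch reusing the same data as above:

import pandas as pd

sc_df = pd.DataFrame({
    "name": ["Zhang San", "Li Si", "Wang Wu"],
    "math": [120, 112, 99],
    "english": [95, 89, 120],
    "chinese": [110, 102, 113],
})
# Derive a total-score column, then sort by it in descending order
sc_df["total"] = sc_df[["math", "english", "chinese"]].sum(axis=1)
print(sc_df.sort_values("total", ascending=False))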

Selenium: a powerful web automation testing tool

What is Selenium?

Selenium is a tool for testing web applications. Selenium tests run directly in the browser, just as a real user would operate it. Supported browsers include IE (7, 8, 9, 10, 11), Mozilla Firefox, Safari, Google Chrome, Opera, and others. Its main capabilities are browser compatibility testing (checking that your application works well on different browsers and operating systems) and system functional testing (creating regression tests to verify software functions and user requirements).

Use Python's Selenium library to operate the browser

Step1 Open a web page

# Import the web driver module
from selenium import webdriver
# Create a driver object for the Chrome browser
browser = webdriver.Chrome()
# Enter Baidu's address in the browser and visit the page
browser.get("http://www.baidu.com")

Step2 Automatically search for a term

# Import the web driver module
from selenium import webdriver
# Create a driver object for the Chrome browser
browser = webdriver.Chrome()
# Enter Baidu's address in the browser and visit the page
browser.get("http://www.baidu.com")
# Locate the search input box and type the term to search for
browser.find_element_by_xpath('//*[@id="kw"]').send_keys("爬虫")  # "crawler"
# Click the search button
browser.find_element_by_xpath('//*[@id="su"]').click()
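
Continuing from the code above, you can also read the search results back; a rough sketch (the XPath for Baidu's result titles is a guess and may break whenever the page layout changes):

import time

# Crude wait for the results page to load (WebDriverWait would be more robust)
time.sleep(2)
# Hypothetical XPath for the result title links; Baidu's markup may change
for link in browser.find_elements_by_xpath("//h3/a"):
    print(link.text)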

Step3 Show the web page source code

# Import the web driver module
from selenium import webdriver
# Create a driver object for the Chrome browser
browser = webdriver.Chrome()
# Enter Baidu's address in the browser and visit the page
browser.get("http://www.baidu.com")
# Print the page source
print(browser.page_source)
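
When you are finished, it is good practice to close the browser and end the WebDriver session:

# Release the browser when done
browser.quit()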
