Essential for novice programmers leveling up: a summary of common Python modules and their usage (dense content, worth bookmarking)

Many novice programmers learning Python get headaches over its modules and libraries, because there are so many of them! Nobody can remember how every module is used, so I have summarized the commonly used modules and their usage here, hoping it helps everyone!


Table of contents

1. OS module

2. Sys module

3. XPath module

4. re module

5. Parsel module

6. Urlparse module

7. Socket module

8. Threading module

9. Types module

10. Selenium module

11. Pygame module

12. NumPy

13. pandas

14. Requests

15. Beautiful Soup


1. OS module

The os module provides convenient access to operating system functionality.

Common os functions (partial list):

os.remove() deletes a file

os.unlink() deletes a file (the same operation as os.remove())

os.rename() renames a file or directory

os.listdir() lists all files and directories in the specified directory

os.getcwd() returns the current working directory

os.mkdir() creates a new directory

os.rmdir() deletes an empty directory (to delete a non-empty directory, use shutil.rmtree())

os.makedirs() creates multi-level directories

os.system() executes an operating system command in a subshell

os.execvp() executes an external program, replacing the current process (Unix-style exec; it does not return on success)
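The directory and file functions listed above can be tried safely inside a temporary directory. The following is a minimal sketch; the file and directory names (a, b, old.txt, new.txt) are made up for illustration.

```python
import os
import shutil
import tempfile

# Work inside a throwaway directory so nothing real is touched.
base = tempfile.mkdtemp()

nested = os.path.join(base, "a", "b")
os.makedirs(nested)                      # create multi-level directories

old_path = os.path.join(nested, "old.txt")
new_path = os.path.join(nested, "new.txt")
with open(old_path, "w") as f:
    f.write("hello")

os.rename(old_path, new_path)            # rename the file
names = os.listdir(nested)               # list the directory contents
print(names)

os.remove(new_path)                      # delete the file
os.rmdir(nested)                         # works because the directory is now empty
shutil.rmtree(base)                      # remove the remaining non-empty tree
print(os.path.exists(base))
```

Note that os.rmdir() only succeeds once the file has been removed, which is why shutil.rmtree() is used for the outer directory that still contains a subdirectory.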


2. Sys module

The sys module provides access to variables used or maintained by the interpreter and to functions that interact with the interpreter.

Simply put, os handles the interaction between the program and the operating system, providing the interface through which the program reaches low-level operating system facilities; sys handles the interaction between the program and the Python interpreter, providing a set of functions and variables for controlling the Python runtime environment.

Common sys attributes and functions:

sys.argv the list of command-line arguments; the first element is the path of the program itself

sys.modules.keys() returns the names of all imported modules

sys.exit(n) exits the program; use sys.exit(0) for a normal exit

sys.version the version information of the Python interpreter

sys.platform the operating system platform name

sys.stdout standard output

sys.stdout.writelines() writes a sequence of strings without adding newlines

sys.stdin standard input

sys.stdin.readline() reads one line of input (sys.stdin.read() reads everything up to end of file)

sys.stderr standard error output

sys.executable the path of the Python interpreter

sys.getwindowsversion() returns the Windows version (available on Windows only)
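A short sketch of several of these attributes in use. sys.version_info is not in the list above; it is the structured counterpart of sys.version and is included here only for illustration.

```python
import sys

# sys.argv[0] is the script path; any further items are user-supplied arguments.
script_args = sys.argv[1:]

# sys.version_info is a structured companion to the sys.version string.
major = sys.version_info.major

sys.stdout.write("platform: " + sys.platform + "\n")
sys.stdout.writelines(["no ", "newline ", "added\n"])   # strings are written back to back
sys.stderr.write("this goes to the error stream\n")

json_loaded = "json" in sys.modules   # True only if something has already imported json
print(sys.executable)
```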

3. XPath module

XPath is a language for finding information in XML documents, and it includes a standard library of functions. In short, XPath is a syntax for locating elements in an XML document by path. Strictly speaking it is a query language rather than a Python module; in Python it is used through libraries such as lxml or parsel.

Common XPath terms:

Element: A tag in the document tree is an element.

Node: A position in the XML document tree. For example, / is the root node, the starting position of the document tree; an element can also be regarded as a node at some position.

Attribute: lang in <title lang="eng">Harry Potter</title> is an attribute of a certain node.

Text: Harry Potter in <title lang="eng">Harry Potter</title> is the text.
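The terms above can be seen in a small example. This sketch uses the standard library's xml.etree.ElementTree, which supports only a limited subset of XPath (full XPath needs a library such as lxml). The bookstore document, including the second book, is invented for illustration.

```python
import xml.etree.ElementTree as ET

xml_text = """
<bookstore>
  <book><title lang="eng">Harry Potter</title></book>
  <book><title lang="fr">Le Petit Prince</title></book>
</bookstore>
"""

root = ET.fromstring(xml_text)

# .//title selects every title element anywhere below the root node.
all_titles = [t.text for t in root.findall(".//title")]

# [@lang='eng'] filters on the attribute; .text is the element's text.
eng = root.find(".//title[@lang='eng']")
print(all_titles)
print(eng.get("lang"), eng.text)
```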

4. re module

A regular expression (often abbreviated as regex, regexp, or RE) is a concept in computer science. A regular expression uses a single string to describe and match a set of strings that follow certain syntax rules. Many text editors use regular expressions to find and replace text that matches a pattern. In short, regular expressions use special characters to match specific text in order to extract data.

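Common regular expression syntax in re (partial list):

'^' matches the start of the string

'$' matches the end of the string

'\d' matches a digit, equivalent to [0-9] for ASCII text; re.findall(r'\d', '电话:10086') returns ['1', '0', '0', '8', '6']

'\D' matches a non-digit, equivalent to [^0-9]; re.findall(r'\D', '电话:10086') returns ['电', '话', ':']

'\w' matches letters, digits, and the underscore, equivalent to [A-Za-z0-9_] for ASCII text; re.findall(r'\w', 'alex123,./;;;') returns ['a', 'l', 'e', 'x', '1', '2', '3']

'\s' matches a whitespace character; re.findall(r'\s', '3*ds \t\n') returns [' ', '\t', '\n']

'\S' matches a non-whitespace character; re.findall(r'\S', '3*ds \t\n') returns ['3', '*', 'd', 's']

'\A' matches only at the start of the string

'\Z' matches only at the end of the string

'\b' matches at the start or end of a word; a word is defined as a sequence of alphanumeric characters, so the end of a word is marked by whitespace or a non-alphanumeric character

'\B' the opposite of \b; matches only where the current position is not at a word boundary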
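The syntax elements above combine into larger patterns. The following is a small sketch on a made-up sentence; the sample text and the variable names are purely illustrative.

```python
import re

text = "Order 66 shipped to alex_01 on 2021-07-26"

digits = re.findall(r"\d+", text)             # runs of digits
words = re.findall(r"\b[a-z]+_\d+\b", text)   # a lowercase word, an underscore, then digits
starts_with_order = bool(re.match(r"^Order", text))
date = re.search(r"(\d{4})-(\d{2})-(\d{2})", text)

print(digits)
print(words)
print(starts_with_order)
print(date.groups())
```

Writing patterns as raw strings (the r prefix) keeps Python from interpreting the backslashes before the re module sees them.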

5. Parsel module

parsel is a well-known third-party Python library whose functionality combines CSS selectors, XPath, and regular expressions (re) in one package. Compared with using separate parsing tools such as BeautifulSoup or an XPath library on their own, many users find parsel efficient and easy to use.

import requests
import parsel

url = 'https://example.com'  # placeholder URL; replace with the page you want to parse
response = requests.get(url)
sel = parsel.Selector(response.text)  # note the capital S in Selector

# Regular expression
# print(sel.re('regex pattern'))

# XPath
# print(sel.xpath('xpath expression').getall())  # getall() returns every match

# CSS selector
# print(sel.css('css selector ::text').extract_first())  # returns the first match

6. Urlparse module

The urlparse module is mainly used to parse the parts of a URL and to split or join URLs according to a fixed format. It was renamed urllib.parse in Python 3.

urlparse is part of the standard library (not a third-party module) and contains functions such as urljoin, urlsplit, urlunsplit, and urlparse.

For example, the urlparse() function:

It breaks a URL down into 6 components and returns a named tuple containing the protocol, server address, path, and so on.

from urllib.parse import urlparse  # in Python 2: import urlparse, then call urlparse.urlparse()

url = urlparse('http://blog.csdn.net/?ref=toolbar')
print(url)

The output is:

ParseResult(scheme='http', netloc='blog.csdn.net', path='/', params='', query='ref=toolbar', fragment='')

scheme is the protocol, netloc is the server address, path is the relative path, params holds the parameters of the last path segment, query is the query string, and fragment is the part after #.

If you know the address of the server, you can use the address of the server as the base address to splice other relative paths to form a new URL.
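Joining a base address with relative paths is done with urljoin. Here is a minimal sketch using Python 3's urllib.parse; the paths articles/index.html, page2.html, and /about are invented for illustration.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

base = "http://blog.csdn.net/articles/index.html"

# A relative path replaces the last segment of the base path.
joined_relative = urljoin(base, "page2.html")
# A path starting with / is resolved against the server address (netloc).
joined_absolute = urljoin(base, "/about")
print(joined_relative)
print(joined_absolute)

# urlsplit and urlunsplit round-trip an already well-formed URL.
parts = urlsplit("http://blog.csdn.net/?ref=toolbar")
print(parts.query)
print(urlunsplit(parts))
```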

7. Socket module

A socket describes an IP address and a port, and is one endpoint of a communication channel.

Sockets originated in Unix, and one of the basic philosophies of Unix/Linux is "everything is a file": files are operated on with an [open] [read and write] [close] pattern. A socket is one implementation of this pattern. It behaves like a special file, and the socket functions are the operations on it (open, read/write I/O, close).

The difference between socket and file:

File operations [open] [read and write] [close] a specified file

Socket operations [open] [read and write] [close] the server-side and client-side sockets
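The open, read/write, close pattern can be seen in a tiny loopback exchange. This is a sketch, not a production server: the helper name echo_once is hypothetical, the server handles exactly one connection, and port 0 asks the operating system for any free port.

```python
import socket
import threading

def echo_once(server_sock):
    # Accept one client, read its message, and write back an upper-cased copy.
    conn, _addr = server_sock.accept()
    with conn:                                  # closes the connection afterwards
        data = conn.recv(1024)                  # read
        conn.sendall(data.upper())              # write

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # open
server.bind(("127.0.0.1", 0))
server.listen(1)
host, port = server.getsockname()

t = threading.Thread(target=echo_once, args=(server,))
t.start()

with socket.create_connection((host, port)) as client:       # open
    client.sendall(b"hello")                                  # write
    reply = client.recv(1024)                                 # read

t.join()
server.close()                                                # close
print(reply)
```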

8. Threading module

The threading module builds a higher-level threading interface on top of the lower-level thread module (named _thread in Python 3). See also the queue module (Queue in Python 2); the mutex module mentioned in older documentation no longer exists in Python 3.

threading provides a higher-level API than the thread module for running threads concurrently. These threads run concurrently and share memory.

Let's look at how the threading module is used:

Using Thread: pass a target function to create a Thread object. Each Thread object represents one thread, which starts running when its start() method is called.

Here is a comparison between using multi-threaded concurrency and not applying multi-threaded concurrency:

The first is an operation that does not use multithreading:

The code is as follows:

#!/usr/bin/python
# compare for multi threads
import time

def worker():
    print("worker")
    time.sleep(1)

if __name__ == "__main__":
    # Run the five calls one after another.
    for i in range(5):
        worker()

The following operations are performed concurrently using multiple threads:

The code is as follows:

#!/usr/bin/python
import threading
import time

def worker():
    print("worker")
    time.sleep(1)

# Start five threads; each one runs worker() concurrently.
for i in range(5):
    t = threading.Thread(target=worker)
    t.start()

Running both versions shows that the multi-threaded one takes much less time: the five one-second sleeps overlap instead of running back to back.
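To measure the difference rather than eyeball it, both variants can be timed in one script. This is a sketch under a few assumptions: the sleep is shortened to 0.1 seconds, worker takes a delay argument, and join() is added so the clock stops only after every thread has finished.

```python
import threading
import time

def worker(delay):
    time.sleep(delay)   # stands in for a blocking wait such as network I/O

DELAY = 0.1

start = time.perf_counter()
for _ in range(5):
    worker(DELAY)
sequential_elapsed = time.perf_counter() - start

start = time.perf_counter()
threads = [threading.Thread(target=worker, args=(DELAY,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()            # wait for every thread before stopping the clock
threaded_elapsed = time.perf_counter() - start

print(f"sequential: {sequential_elapsed:.2f}s, threaded: {threaded_elapsed:.2f}s")
```

The speedup comes from overlapping waits. For CPU-bound work in CPython, threads generally do not give the same gain.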

9. Types module

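What the types module contains:

In Python 2, the types module contains names for the common built-in data types, such as IntType (integer) and FloatType (floating point). These aliases were removed in Python 3, where int and float are used directly; types still provides names such as FunctionType and MethodType.

Common usage of types (Python 2):

>>> import types
# Is 100 an integer?
>>> isinstance(100, types.IntType)
True
>>> type(100)
<type 'int'>
# The source of types shows that types.IntType is simply int
>>> types.IntType is int
True

But some types are not simple data types like int: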

class Foo:
    def run(self):
        return None

def bark(self):
    print('barking')

a = Foo()

print(type(1))
print(type(Foo))
print(type(Foo.run))
print(type(Foo().run))
print(type(bark))

Output result:

<class 'int'>

<class 'type'>

<class 'function'>

<class 'method'>

<class 'function'>
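In Python 3 the types module is mostly useful for checking the kinds of objects shown in this output. The following sketch reuses the Foo and bark names from the example above; the checks against len and a lambda are additions for illustration.

```python
import types

class Foo:
    def run(self):
        return None

def bark():
    print("barking")

is_function = isinstance(bark, types.FunctionType)         # plain function
class_attr = isinstance(Foo.run, types.FunctionType)       # accessed on the class: still a function
bound = isinstance(Foo().run, types.MethodType)            # accessed on an instance: bound method
builtin = isinstance(len, types.BuiltinFunctionType)       # implemented in C
is_lambda = isinstance(lambda: 0, types.LambdaType)        # LambdaType is an alias of FunctionType

print(is_function, class_attr, bound, builtin, is_lambda)
```

All five checks evaluate to True, which matches the function and method classes printed in the output above.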

10. Selenium module

Selenium was originally an automated testing tool; crawlers use it mainly because requests cannot execute JavaScript.

In essence, Selenium drives a real browser and fully simulates user operations such as navigating, typing, clicking, and scrolling, so you get the page as it looks after rendering. It supports multiple browsers.

from selenium import webdriver

# Create a driver for one browser; uncomment the one you have installed.
browser = webdriver.Chrome()
# browser = webdriver.Firefox()
# browser = webdriver.PhantomJS()  # older Selenium releases only; removed in Selenium 4
# browser = webdriver.Safari()
# browser = webdriver.Edge()



from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By  # how to locate elements: By.ID, By.CSS_SELECTOR
from selenium.webdriver.common.keys import Keys  # keyboard key operations
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait  # wait for elements to load

browser = webdriver.Chrome()

try:
    browser.get('https://www.baidu.com')

    # find_element_by_id() was removed in Selenium 4; use find_element(By.ID, ...)
    input_tag = browser.find_element(By.ID, 'kw')
    input_tag.send_keys('美女')  # in Python 2, prefix non-ASCII strings with u
    input_tag.send_keys(Keys.ENTER)  # press Enter

    wait = WebDriverWait(browser, 10)
    # Wait up to 10 seconds for the element with id content_left to load
    wait.until(EC.presence_of_element_located((By.ID, 'content_left')))

    print(browser.page_source)
    print(browser.current_url)
    print(browser.get_cookies())

finally:
    browser.close()

11. Pygame module

Pygame is a simple library for game development.

To develop games in Python, the pygame module is usually used.

Overview of pygame modules (module: purpose):

cdrom: manages CD-ROM devices and audio playback

cursors: loads cursor images, including the standard cursors

display: controls the display window or screen

draw: draws simple shapes on a surface

event: manages events and the event queue

font: creates and renders TrueType fonts

image: saves and loads images

joystick: manages joystick devices

key: manages the keyboard

mouse: manages the mouse

movie: plays MPEG movies

sndarray: manipulates sound data as arrays

surfarray: manipulates image data as arrays

time: controls timing

transform: scales, rotates, and flips images

12. NumPy

NumPy (Numerical Python) is an extension library for Python that supports large multidimensional arrays and matrix operations, and provides a large collection of mathematical functions for operating on arrays. NumPy can store and process large matrices far more efficiently than Python's own nested lists (which can also represent matrices). It is often said that NumPy effectively turns Python into a free, more powerful MATLAB-like system.

NumPy is a very fast mathematical library, mainly used for array computation. It includes:

A powerful N-dimensional array object, ndarray

Broadcasting functions

Tools for integrating C/C++/Fortran code

Linear algebra, Fourier transforms, random number generation, and other functions

One of the most important objects in NumPy is its N-dimensional array object, ndarray, which is a collection of data of the same type whose items can be accessed using 0-based indexing.

The ndarray object is a multidimensional array used to store elements of the same type. Each element occupies a block of the same size in memory, and every element has the same data type, described by a data-type object (dtype).

numpy.array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)

 Generally, only the object, dtype, and ndmin parameters are commonly used; the others are rarely needed.
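The three commonly used parameters, along with broadcasting and 0-based indexing, can be seen in a few lines. This is a minimal sketch with made-up values, assuming NumPy is installed.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)
print(a.shape, a.dtype, a.itemsize)   # every element uses the same number of bytes

b = np.array([1, 2, 3], ndmin=2)      # ndmin pads the shape to at least 2 dimensions
print(b.shape)

# Broadcasting: the 1-D row is stretched across both rows of a.
row = np.array([10, 20, 30])
total = a + row
print(total)

print(a[0, 2])                        # 0-based indexing: first row, third column
```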

13. pandas

pandas is a NumPy-based tool created for data analysis tasks. It includes a number of libraries and standard data models and provides the tools needed to work efficiently with large datasets. pandas offers a large number of functions and methods for processing data quickly and easily, and it is one of the things that makes Python a powerful and efficient data analysis environment.

Common data types:

 Series (a one-dimensional array, similar to a one-dimensional NumPy array. Both are also close to Python's basic list structure. A Series can store different data types: strings, boolean values, numbers, and so on.)

 DataFrame (a two-dimensional tabular data structure. Many of its features resemble data.frame in R. A DataFrame can be understood as a container of Series.)

Panel (a three-dimensional array, which can be understood as a container of DataFrames. Panel has been removed from recent pandas versions.) …
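The relationship between Series and DataFrame shows up in a short example. This sketch assumes pandas is installed; the names, ages, and prices are invented sample data.

```python
import pandas as pd

s = pd.Series([1.5, 2.5, 4.0], index=["a", "b", "c"], name="price")
print(s["b"])                         # label-based access

df = pd.DataFrame({"name": ["ann", "bob", "cy"], "age": [31, 25, 40]})
col = df["age"]                       # each column of a DataFrame is a Series
print(type(col).__name__)

over_30 = df[df["age"] > 30]["name"].tolist()   # boolean filtering on a column
print(over_30)
print(df["age"].mean())
```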

14. Requests

Requests is an HTTP library released under the Apache 2.0 license. Written in Python, it is more concise than the urllib2 module.

Requests supports HTTP keep-alive and connection pooling, session persistence with cookies, file uploads, automatic detection of response content encoding, and automatic encoding of internationalized URLs and POST data.

It is a high-level wrapper over lower-level HTTP machinery (it is built on urllib3) that makes network requests in Python far more user-friendly. With Requests you can easily perform almost any operation a browser can. It is a staple module for crawlers!

The requests library has a method for each HTTP request type:

1.requests.get():

The main method for fetching a web page; corresponds to HTTP GET

2.requests.post():

Submits a POST request to a URL; corresponds to HTTP POST

3.requests.head():

Fetches only the header information of a web page; corresponds to HTTP HEAD

4.requests.put():

Submits a PUT request to a URL; corresponds to HTTP PUT

5.requests.patch():

Submits a partial modification request to a URL; corresponds to HTTP PATCH

6.requests.delete():

Submits a delete request to a URL; corresponds to HTTP DELETE
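What these methods send can be inspected without touching the network by building a request and preparing it. The sketch below assumes requests is installed; the example.com URLs and the parameter values are placeholders.

```python
import requests

# Build the request objects locally; .prepare() does not contact any server.
get_req = requests.Request(
    "GET", "https://example.com/search", params={"q": "python"}
).prepare()
post_req = requests.Request(
    "POST", "https://example.com/login", data={"user": "demo"}
).prepare()

print(get_req.method, get_req.url)              # params end up in the query string
print(post_req.method, post_req.body)           # data ends up in the form-encoded body
print(post_req.headers["Content-Type"])
```

To actually send the same requests, the equivalent calls would be requests.get(url, params=...) and requests.post(url, data=...).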

15. Beautiful Soup

Beautiful Soup is a Python library for parsing HTML and XML; its main use is extracting the data we need from web pages. It parses HTML into a tree of objects that can be navigated and searched much like dictionaries and lists. Compared with regular expressions, this greatly simplifies processing.

Basic usage:

from bs4 import BeautifulSoup
import requests, re

req_obj = requests.get('https://www.baidu.com')
soup = BeautifulSoup(req_obj.text, 'lxml')

'''Finding tags'''
print(soup.title)  # only the first match
print(soup.find('title'))  # same result as above
print(soup.find_all('div'))  # every div tag

'''Getting a tag's attributes'''
tag = soup.div
# tag['name'] raises KeyError if the attribute is missing; tag.get('name') returns None instead
print(tag['class'])  # a multi-valued attribute comes back as a list
print(tag['id'])  # the tag's id attribute
print(tag.attrs)  # all of the tag's attributes as a dict (name: value)

'''Strings inside a tag'''
tag = soup.title
print(tag.string)  # the string inside the tag
tag.string.replace_with("haha")  # strings cannot be edited in place, but they can be replaced

'''Child nodes'''
tag = soup.head
print(tag.title)  # get the head tag, then the child tag it contains

'''.contents and .children'''
tag = soup.body
print(tag.contents)  # the tag's child nodes as a list
print([child for child in tag.children])  # same output as above

'''.descendants'''
tag = soup.body
for child_tag in tag.descendants:  # all children and their descendants
    print(child_tag)

'''.strings and .stripped_strings'''
tag = soup.body
for text in tag.strings:  # all text content
    print(text)
for text in tag.stripped_strings:  # all text content, with whitespace and blank lines stripped
    print(text)

'''.parent and .parents'''
tag = soup.title
print(tag.parent)  # the tag's parent
for parent in tag.parents:  # every ancestor
    print(parent)

'''.next_siblings and .previous_siblings
    iterate over all sibling nodes
'''
'''.next_element and .previous_element
    the node parsed immediately after or before this one (not necessarily a sibling)
'''
'''Keyword arguments of find_all'''
soup.find_all(id='link2')  # tags whose id attribute is 'link2'
soup.find_all(href=re.compile("elsie"))  # tags whose href attribute matches the pattern
soup.find_all(id=True)  # all tags that have an id attribute
soup.find_all(href=re.compile("elsie"), id='link1')  # filters can be combined
soup.find_all(attrs={"attribute name": "attribute value"})  # a dict of attributes also works
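The find_all filters above refer to ids such as link1 and link2, which a live page is unlikely to contain. Here is a self-contained sketch on an inline HTML snippet so the results are predictable. The snippet and its example.com links are invented, and the built-in html.parser is used in place of lxml to avoid an extra dependency.

```python
import re
from bs4 import BeautifulSoup

html = """
<html><head><title>Sisters</title></head>
<body>
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")   # built-in parser, no lxml needed

title = soup.title.string
ids = [a["id"] for a in soup.find_all("a")]
by_id = soup.find_all(id="link2")[0].get_text()
by_href = soup.find_all(href=re.compile("elsie"))[0]["class"]
combined = soup.find_all(href=re.compile("elsie"), id="link2")   # both filters must match

print(title, ids, by_id, by_href, len(combined))
```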


Origin blog.csdn.net/Modeler_xiaoyu/article/details/119110957