[Python crawler development basics ①] Python basics (variables and their naming conventions)

Because the articles in this column focus on crawlers, they cannot cover everything about Python, so only the key points are covered here.
If you feel something has been left out, you are welcome to add it~



1 Python variable types explained

Variables are indispensable in every programming language. Compared with C or Java, defining a variable in Python is relatively simple. There is no need to declare a type in front of the variable name; just write variable name = variable value, and the Python interpreter infers the type from the value assigned.

Every language has different types of variables, and Python is no exception. It provides six standard data types: numbers (Number), strings (String), lists (List), tuples (Tuple), sets (Set), and dictionaries (Dictionary). Each is introduced in detail below:

1.1 Numbers

Python supports four numeric types: integer, floating point, complex, and Boolean.

  • An integer is very simple to define. It is similar to the int type in C, except that a Python int has no fixed size limit. It is defined as follows:
a = 10	# defines an integer variable a with the value 10
  • Floating-point numbers are decimals. Unlike C or Java, Python does not require choosing between single precision (float) and double precision (double) in advance. They are defined as follows:
b = 3.1415926
  • Complex numbers are rarely used. They consist of a real part and an imaginary part. In Python the real part is written as a number and the imaginary part as a number followed by j. They are defined as follows:
c = 3+4j	# defines a complex number c with real part 3 and imaginary part 4
  • The Boolean type is used to express true and false and has only two values, True and False. It is defined as follows:
d = True	# defines a Boolean variable d with the value True
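
As a quick check of the four numeric types above, the following minimal sketch prints the type Python infers for each value. The variable big is added here purely for illustration.

```python
# Python infers the type from the assigned value; type() reports it.
a = 10          # int
b = 3.1415926   # float
c = 3 + 4j      # complex
d = True        # bool

for value in (a, b, c, d):
    print(value, type(value).__name__)

# A name can later be rebound to a value of a different type.
a = "ten"
print(type(a).__name__)   # str

# The real and imaginary parts of a complex number are stored as floats.
print(c.real, c.imag)     # 3.0 4.0

# int has no fixed width, so large values do not overflow.
big = 2 ** 100
print(big)
```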

1.2 Sequences

1.2.1 Strings

Strings are convenient to define and use in Python. A string is a data type that describes text and can consist of any number of characters. Defining and using strings follows these rules:

  • A Python string can be defined with either single quotes or double quotes. For example:
str1 = 'I am a student.'
str2 = "I am a student too."
  • If a string contains a single quote, it should be defined with double quotes, for example:
str3 = "It's a big city!"
  • When a string contains double quotes, use single quotes to define, for example:
str4 = 'When I was doing homework, "I love you!", she said.'
  • A backslash (\) acts as the escape character in a Python string, for example:
str5 = 'It\'s a big city!'	# prints It's a big city!
  • To keep backslashes from being treated as escape characters, use a raw string by adding the letter r before the opening quote, for example:
str6 = r'D:\nBaiduNetdiskDownload'	# prints D:\nBaiduNetdiskDownload
  • A multi-line string, which can also serve as a multi-line comment, is created with triple quotes (three single or three double quotes on each side):
'''
This is a multi-line string
This is a multi-line string
'''
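
To make the difference between an escaped string and a raw string concrete, here is a small sketch. The variable names escaped, raw, and multi are chosen for illustration only.

```python
# Without the r prefix, \n is interpreted as a single newline character.
escaped = 'D:\nBaiduNetdiskDownload'
# With the r prefix, the backslash and the n are kept as two characters.
raw = r'D:\nBaiduNetdiskDownload'

print('\n' in escaped)            # True
print('\n' in raw)                # False
print(len(raw) - len(escaped))    # 1

# A triple-quoted string keeps its line breaks.
multi = '''line one
line two'''
print(multi.splitlines())         # ['line one', 'line two']
```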

1.2.2 List

In fact, lists, tuples, sets, and dictionaries are all Python data containers. Let's start with one of them, the list (List).
A list is roughly equivalent to an array in C, but it is simpler to define, can hold elements of different types, and comes with a variety of built-in methods. The rules for Python lists are as follows:

  • Each item of data in a list is called an element
  • A list is delimited by []
  • Elements in a list are separated by commas
  • A list can hold a single element, multiple elements, or nested lists

There are also many ways to create lists, for example:

  • Define a list by placing the elements directly inside square brackets
list1 = [1, 2, 3, 4]
list2 = ['IT', 123, True]
list3 = [[1, 2, 3], [4, 5, 6]]
list4 = []	# an empty list can also be defined directly

As a container, a list has to support many kinds of use, and how elements are accessed affects how efficiently code can be written. Python therefore provides several ways to access elements, and the same holds for the tuples, sets, and dictionaries described below.

  • First, as in C and Java, elements can be accessed by subscript, so that needs little explanation here. Second, containers such as lists are widely used in for loops, as follows:
list5 = [1, 2, 3, 4, 5, 6]
for num in list5:
    print(num)
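
Since subscript access is only mentioned in passing above, here is a minimal sketch of it alongside two common list operations. The list pages and its contents are made up for illustration.

```python
pages = ['index', 'about', 'contact']   # hypothetical page names

print(pages[0])     # index   (subscripts start at 0)
print(pages[-1])    # contact (negative subscripts count from the end)

pages[1] = 'news'       # lists are mutable, so an element can be replaced
pages.append('help')    # append() adds an element to the end

# enumerate() yields the subscript together with the element.
for position, page in enumerate(pages):
    print(position, page)
```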

1.2.3 Tuples

Unlike lists, tuples cannot be modified once defined, but like lists they can hold multiple elements of different types. So when a program needs to group data and that data must not be tampered with, a tuple is a good fit.
A tuple is defined with parentheses, with commas separating the elements, which may be of different data types. A tuple with a single element needs a trailing comma, for example (1,).

  • Tuples are also created in a similar way to lists:
t1 = (1, 2, 3, 4)
t2 = ('IT', 123, True)
t3 = ((1, 2, 3), (4, 5, 6))

Python also provides several operations on tuples: index() looks up a value and returns its subscript if it exists, otherwise raising a ValueError; count() counts how many times a value appears in the tuple; and the built-in function len() returns the number of elements in the tuple.
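
The operations just described can be tried out with a short sketch like the one below. The tuple t is an arbitrary example; the two try blocks show the errors raised for a missing value and for an attempted modification.

```python
t = (1, 2, 3, 2)

print(t.index(2))   # 1 (subscript of the first occurrence)
print(t.count(2))   # 2
print(len(t))       # 4

# index() raises ValueError when the value is not present.
try:
    t.index(99)
except ValueError as error:
    print('not found:', error)

# Tuples are immutable, so assigning to an element raises TypeError.
try:
    t[0] = 100
except TypeError as error:
    print('cannot modify:', error)
```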

1.2.4 Sequence slicing

First, what is a sequence?

A sequence is a data container whose contents are contiguous and ordered and which can be indexed by subscript.
Lists, tuples, and strings can all be treated as sequences.

What is a slice?

Slicing means taking a subsequence out of a sequence.

In Python, the slicing syntax is sequence[start subscript:end subscript:step size]. Starting at the specified position, elements are taken in order until the specified end position, producing a new sequence.

Points to note when slicing:

  1. The start subscript indicates where to start. It can be left blank, which means starting from the beginning
  2. The end subscript (not included) indicates where to stop. It can be left blank, which means slicing through to the end
  3. The step size indicates the interval between the elements taken. It can be left blank, which means a step size of 1

About the step size:

  • A step size of 1 means taking elements one by one
  • A step size of 2 means skipping one element each time
  • A step size of N means skipping N-1 elements each time
  • A negative step size means taking elements in reverse (note that the start and end subscripts must then also be given in reverse order)

Demonstration of sequence slicing (forward):

my_list = [1, 2, 3, 4, 5]
new_list = my_list[1:4]	# start at subscript 1, end at subscript 4 (not included), step 1
print(new_list)		# result: [2, 3, 4]

my_tuple = (1, 2, 3, 4, 5)
new_tuple = my_tuple[:]	# from the beginning to the end, step 1
print(new_tuple)		# result: (1, 2, 3, 4, 5)

my_list = [1, 2, 3, 4, 5]
new_list = my_list[::2]		# from the beginning to the end, step 2
print(new_list)		# result: [1, 3, 5]

my_str = "12345"
new_str = my_str[:4:2]	# from the beginning to subscript 4 (not included), step 2
print(new_str)		# result: "13"

Demonstration of sequence slicing (reverse):

my_str = "12345"
new_str = my_str[::-1]	# from the last element back to the first, step -1 (reversed)
print(new_str)		# result: "54321"

my_list = [1, 2, 3, 4, 5]
new_list = my_list[3:1:-1]	# start at subscript 3, end at subscript 1 (not included), step -1 (reversed)
print(new_list)		# result: [4, 3]

my_tuple = (1, 2, 3, 4, 5)
new_tuple = my_tuple[:1:-2] 	# from the last element to subscript 1 (not included), step -2 (reversed)
print(new_tuple)		# result: (5, 3)
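
Two further properties of slicing are worth a quick illustration: a slice is a new sequence, so changing it leaves the original untouched, and subscripts beyond the end are tolerated rather than raising an error. A minimal sketch, with names chosen for illustration:

```python
numbers = [1, 2, 3, 4, 5]

# A full slice creates a new list (a shallow copy of the original).
copied = numbers[:]
copied[0] = 100
print(numbers)          # [1, 2, 3, 4, 5]
print(copied)           # [100, 2, 3, 4, 5]

# An end subscript past the end of the sequence is simply clipped.
print(numbers[2:100])   # [3, 4, 5]

# Negative subscripts count from the end.
print(numbers[-2:])     # [4, 5]
```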

1.3 Sets

Like the three types above, a set is also a data container provided by Python, but it has a special feature: deduplication. A list, tuple, or string allows the same element to appear more than once, but when an element being added to a set is already present, it is not added again.

Sets can be created using curly braces { } or the set() function.

Note: To create an empty set, you must use set() instead of { }, because { } creates an empty dictionary.
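
The note above is easy to verify with type(). The names empty_dict and empty_set are used here only for illustration.

```python
empty_dict = {}       # curly braces alone create a dictionary
empty_set = set()     # set() creates an empty set

print(type(empty_dict).__name__)   # dict
print(type(empty_set).__name__)    # set

# Adding the same element twice leaves only one copy in the set.
empty_set.add(1)
empty_set.add(1)
print(empty_set)                   # {1}
```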

  • Demonstration of deduplication:
s1 = {1, 2, 3, 4, 4, 3, 2, 1}
print(s1)		# {1, 2, 3, 4}
  • Sets store their elements without order, so the printed order may differ from run to run:
a = set('abcdefg')
b = set('abc')
print(a)	# {'f', 'd', 'e', 'b', 'g', 'c', 'a'}
print(b)	# {'b', 'c', 'a'}
  • Set operations such as difference, symmetric difference, intersection, and union can be performed between sets:
print(a - b)	# difference: {'d', 'g', 'f', 'e'}
print(a ^ b)	# symmetric difference: {'e', 'f', 'd', 'g'}
print(a & b)	# intersection: {'b', 'c', 'a'}
print(a | b)	# union: {'b', 'e', 'f', 'd', 'g', 'c', 'a'}
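
Because a set rejects duplicates, one plausible use in a crawler is remembering which links have already been seen. The sketch below is only an illustration: the URLs are made up, and the names found_urls, seen, and unique_urls are hypothetical.

```python
# Hypothetical URLs; a real crawler would collect these from pages.
found_urls = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/a",
    "https://example.com/c",
    "https://example.com/b",
]

seen = set()        # URLs encountered so far
unique_urls = []    # first occurrences, kept in their original order

for url in found_urls:
    if url not in seen:     # membership tests on a set are fast
        seen.add(url)
        unique_urls.append(url)

print(unique_urls)
print(len(seen))    # 3
```

The extra list is kept because a set alone does not remember the order in which the URLs were found.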

1.4 Dictionaries

In crawler work, many parts of a response body are made up of dictionaries, so dictionaries are particularly important in crawler development. A dictionary can also be converted to and from JSON, which will be introduced in a later article.
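
As a brief preview of that conversion, here is a minimal sketch using the standard library json module. The response text is a made-up example, not output from any real site.

```python
import json

# Hypothetical response body, as a crawler might receive it in text form.
response_text = '{"code": 200, "data": {"title": "Example", "tags": ["a", "b"]}}'

# json.loads() turns JSON text into Python dictionaries and lists.
payload = json.loads(response_text)
print(type(payload).__name__)      # dict
print(payload["data"]["title"])    # Example

# json.dumps() turns the dictionary back into JSON text.
text_again = json.dumps(payload, ensure_ascii=False)
print(text_again)
```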

Dictionaries, like sets, are written with {}. The difference is that each element of a dictionary has two parts separated by a colon: the first is called the key and the second is called the value. The rules are as follows:

  • Elements are stored inside { }, and each element is a key-value pair
  • Each key-value pair contains a Key and a Value (separated by a colon)
  • Key-value pairs are separated by commas
  • The Value can be any type of data. The Key must be a hashable type, so it cannot be a dictionary (or a list or set)
  • Keys cannot be repeated. Assigning to an existing key overwrites the original value
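
The last two rules can be seen in a short sketch. The dictionaries point_names and settings are illustrative only.

```python
# A tuple can be used as a key because it is hashable.
point_names = {(0, 0): 'origin', (1, 0): 'unit x'}
print(point_names[(0, 0)])    # origin

# A list cannot be used as a key; Python raises TypeError.
try:
    bad = {[0, 0]: 'origin'}
except TypeError as error:
    print('unhashable key:', error)

# When a key is repeated, the later value overwrites the earlier one.
settings = {'timeout': 5, 'timeout': 10}
print(settings)               # {'timeout': 10}
```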

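Dictionaries can be defined in the following forms (note that naming a variable dict shadows the built-in dict type, so a different name is preferable in real code):

dict = {"name": "张三", "age": 18, "sex": "male"}
dict1 = {}	# defines an empty dictionary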

Dictionaries can also be nested, although a key still cannot be a dictionary, for example:

stuScore = {
    "张三": {"Chinese": 89, "Math": 78, "English": 77},
    "李四": {"Chinese": 69, "Math": 94, "English": 79},
}

To get content out of a dictionary, use the key to look up the value, for example:

print(stuScore["张三"])	# {'Chinese': 89, 'Math': 78, 'English': 77}
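
Lookups can be chained to reach into a nested dictionary, and get() offers a way to avoid an error when a key is missing. A minimal sketch, using a self-contained dictionary named scores and a name "王五" that is deliberately absent:

```python
scores = {
    "张三": {"Chinese": 89, "Math": 78, "English": 77},
    "李四": {"Chinese": 69, "Math": 94, "English": 79},
}

# Chain two lookups to reach a value inside the nested dictionary.
print(scores["张三"]["Math"])    # 78

# Looking up a missing key with [] raises KeyError.
try:
    scores["王五"]
except KeyError as error:
    print('no such key:', error)

# get() returns None, or a supplied default, instead of raising.
print(scores.get("王五"))        # None
print(scores.get("王五", {}))    # {}
```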

In addition, elements of a dictionary can be added, modified, and deleted:

dict = {"name": "张三", "age": 18, "sex": "male"}	# defines a dictionary with three key-value pairs
dict["score"] = 100	# add an element
dict["name"] = "老六"	# modify an element
del dict["sex"]	# delete an element

Python provides the following operations for dictionaries:

  • len(dictionary): returns the number of key-value pairs in the dictionary
  • dictionary.pop(Key): returns the Value for the Key and removes that key-value pair from the dictionary
  • dictionary.clear(): empties the dictionary
  • dictionary.keys(): returns all the keys of the dictionary, which can be used to loop over it
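
The operations listed above can be tried together in a short sketch. The dictionary info and its contents are illustrative.

```python
info = {"name": "张三", "age": 18, "score": 100}

print(len(info))            # 3

# pop() returns the value and removes the key-value pair.
age = info.pop("age")
print(age)                  # 18
print(list(info.keys()))    # ['name', 'score']

# With a second argument, pop() returns that default for a missing key.
missing = info.pop("sex", None)
print(missing)              # None

# clear() removes every key-value pair.
info.clear()
print(info)                 # {}
```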

The keys(), values(), and items() methods can all be used to traverse a dictionary:

# 1. Using keys
for i in dict.keys():
    print(i)
# 2. Using values
for i in dict.values():
    print(i)
# 3. Using items (each item is a (key, value) tuple)
for i in dict.items():
    print(i)
# 4. items can be unpacked to get the key and the value
for key, value in dict.items():
    print(key, value)

1.5 Comparison of Data Containers

|  | List | Tuple | String | Set | Dictionary |
| --- | --- | --- | --- | --- | --- |
| Element type | any | any | characters only | any hashable type | Key: any hashable type (not a dictionary); Value: any |
| Subscript index | supported | supported | supported | not supported | not supported |
| Duplicate elements | supported | supported | supported | not supported | not supported (keys) |
| Modifiable | yes | no | no | yes | yes |
| Ordered | yes | yes | yes | no | no positional order (insertion order is kept since Python 3.7) |
| Use case | A batch of records that can be modified and may repeat | A batch of records that cannot be modified but may repeat | A string of characters | Records that must not repeat | Looking up a Value by its Key |

2 Python variable naming conventions

First, the naming rules for variables are much the same as in C, Java, and other languages:

  1. A name can only consist of letters, digits, and underscores
  2. It cannot start with a digit; for example, 1s is an invalid variable name
  3. It cannot be a Python keyword, such as for, while, etc.
  4. Names are strictly case-sensitive; for example, Stu and stu are not the same variable
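
The rules above can be checked in code with str.isidentifier() and the standard library keyword module. The helper is_valid_variable_name below is a hypothetical function written for this illustration, not part of Python. Note that isidentifier() also accepts non-ASCII letters.

```python
import keyword


def is_valid_variable_name(name):
    """Return True if name is a legal identifier and not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ("stu", "Stu", "my_name", "1s", "for", "page-rank"):
    print(candidate, is_valid_variable_name(candidate))

# Case sensitivity: these are two different variables.
Stu = "upper"
stu = "lower"
print(Stu != stu)    # True
```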

Second, as a naming habit, variables are often written in camel case, where the first word starts with a lowercase letter and each following word starts with a capital, such as pageRank. Note that Python's official style guide, PEP 8, recommends lowercase words joined by underscores for variable names instead, such as page_rank.

Finally, variable names should be self-explanatory. For example, a variable holding a name should be called name or my_name rather than a, b, or c, since meaningless names make a project harder to develop and maintain.

Origin blog.csdn.net/z135733/article/details/131068353