2020 Python Data Analysis Study Notes: Functions Explained (3)

Table of Contents

 

1. Function writing

2. JSON file analysis

3. Processing of strings

(1) Escape character

(2) String formatting

(3) String method

4. Python advanced functions (commonly used in data analysis)

(1) Lambda function (anonymous function)

(2) map function

(3) Reduce function

(4) filter function

5. Introduction to commonly used libraries for Python data analysis

(1) NumPy

(2) Pandas

(3) Matplotlib (drawing library)

(4) Introduction to other related libraries


1. Function writing

Function definition: a function groups a set of statements together so that they can be run repeatedly in a program.

The significance of using functions: they improve programming efficiency and avoid a lot of repetitive work.

Built-in functions: functions that can be called directly, without importing anything.

Module functions: functions provided by a standard-library or third-party module, which must be imported before use.

User-defined functions: program segments you write yourself according to certain specifications.

int(10)   # convert to an integer: 10
str(10)   # convert to a string: '10'

a = list((1, 2, 3, 4))  # create a list from a tuple
max(a)     # largest value
min(a)     # smallest value
round(2.643335345, 2)    # round to 2 decimal places
type(a)      # type of the object
len(a)       # number of elements
isinstance(a, list)   # type check; returns True or False
# enumerate() yields (index, element) pairs
a = ['student', 'teacher', 'parents']
for i, j in enumerate(a):
    print(i, j)
# >>> 0 student
# >>> 1 teacher
# >>> 2 parents
# zip() combines the elements at the same position of each sequence into a tuple
list1 = ['a', 'b', 'c', 'd']
list2 = [1, 2, 3, 4]
print(list(zip(list1, list2)))
# >>> [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
import math    # standard-library module: must be imported, but needs no installation

print(math.floor(4.8))  # round down
# >>> 4
print(math.ceil(4.8))   # round up
# >>> 5

import numpy as np    # third-party module

print(np.min([1, 5, 3, 9]))     # smallest value
# >>> 1
print(np.argmax([1, 5, 3, 9]))   # index of the largest value
# >>> 3
Custom function:

1. Start with the def keyword, followed by the function name and parentheses;

2. The function name identifies the function and is used to call it;

3. Put any incoming parameters inside the parentheses and end the line with a colon;

4. After the colon, write the function body on indented lines;

5. Generally, the function ends with a return statement, where expression is the value to be returned.

# General form (a template, not runnable code):
# def <function name>(<parameter list>):
#     <function statements>
#     return <return value>
# Parameters and the return value are both optional.
def ListSum(L):
    result = 0
    for i in L:
        result = result + i
    return result

List = [1, 2, 3, 4]
Sum = ListSum(List)
print(Sum)

# >>> 10
def Cube(a, b, c):
    # Replace any argument passed as None with a default value
    if a is None:
        a = 1
    if b is None:
        b = 2
    if c is None:
        c = 3
    return a * b + c

print(Cube(None, None, 4))   # a and b fall back to 1 and 2

# >>> 6
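
The None checks in Cube can also be written with default parameter values, which Python fills in when an argument is omitted. The following is a minimal sketch of that alternative, not the original author's code; the name cube_with_defaults is hypothetical.

```python
def cube_with_defaults(a=1, b=2, c=3):
    """Return a * b + c, using the defaults for any omitted argument."""
    return a * b + c

print(cube_with_defaults(c=4))   # a and b fall back to 1 and 2
# >>> 6
print(cube_with_defaults(2, 5))  # c falls back to 3
# >>> 13
```

Defaults keep the function body short, while the explicit None checks are still useful when a caller may pass None deliberately.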

 

2. JSON file analysis

Definition: JSON stands for JavaScript Object Notation. It is a lightweight, text-based data exchange format that is compact to store and fast to parse.

Data structure: a JSON document typically maps to nested Python dictionaries. The value for a key can be more than a number: it may also be a string, a boolean, null, an array (a Python list), or another object (a Python dict).
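
To make the nesting concrete, here is a small sketch that parses a JSON string and looks at the Python types that come out. The sample document is made up for illustration.

```python
import json

# A small JSON document (made up for illustration)
text = '{"name": "Tom", "age": 20, "scores": [90, 85.5], "address": {"city": "Guangzhou"}, "graduated": false}'

record = json.loads(text)            # JSON object -> Python dict
print(type(record).__name__)
# >>> dict
print(record["scores"][1])           # JSON array -> Python list
# >>> 85.5
print(record["address"]["city"])     # nested object -> nested dict
# >>> Guangzhou
print(record["graduated"])           # JSON false -> Python False
# >>> False

# json.dumps goes the other way, from Python objects back to a JSON string
print(json.dumps(record["address"]))
# >>> {"city": "Guangzhou"}
```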

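# Read a JSON file
import json
with open('filename.json', mode='r', encoding='utf-8') as f:     # open the JSON file
    f_read = f.read()     # read the file contents as a string

data = json.loads(f_read)   # parse the JSON string into Python objects (dict, list, ...)


# Write Python data to a file in JSON format
import json
json_data = data   # any JSON-serializable object; here, the data loaded above
with open('filename.json', mode='w', encoding='utf-8') as f:
    json.dump(json_data, f, indent=4)   # indent=4 pretty-prints the output
# No f.close() is needed: the with block closes the file automatically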

 

3. Processing of strings

String tutorial: https://www.runoob.com/python/python-strings.html

(1) Escape character

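print("Life is short, I use Python.")
# >>> Life is short, I use Python.
print("Life is short,\nI use Python.")    # \n starts a new line
# >>> Life is short,
# >>> I use Python.
print(r"Life is short,\nI use Python.")   # the r prefix makes a raw string, so \n is not treated as an escape
# >>> Life is short,\nI use Python.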

(2) String formatting

Formatted output:

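# Formatted output examples
# 1. Print a string: %s
print("My name is %s ." % "Tom")
# >>> My name is Tom .

# 2. Print an integer: %d
print("My age is %d ." % 20)
# >>> My age is 20 .

# 3. Print a floating-point number: %f
print("My height is %f m" % 1.8)
# >>> My height is 1.800000 m

# 4. Print a floating-point number with a fixed number of decimal places: %.2f
print("My height is %.2f m" % 1.8)
# >>> My height is 1.80 m
# A longer example
class teacher(object):
    def speak(self):  # define a method
        print("%s says: Class, you will graduate in %s days!" % (self.name, self.day))


T = teacher()  # create an instance
T.name = "Mr. Zhang"  # add attributes to the instance
T.day = "300"

T.speak()
# >>> Mr. Zhang says: Class, you will graduate in 300 days!
print('I am {:.2f} years old and currently study at {}.'.format(20, 'Guangzhou University'))   # the .format() method
# >>> I am 20.00 years old and currently study at Guangzhou University.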
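
Besides the % operator and .format(), Python 3.6 and later also support f-strings, which are not covered in the original notes. The sketch below shows the same kind of formatting with that syntax; the variable names are chosen for illustration.

```python
name = "Tom"
age = 20
height = 1.8

# Expressions inside {} are evaluated; a format spec follows a colon, as with .format()
print(f"My name is {name}, I am {age} years old and {height:.2f} m tall.")
# >>> My name is Tom, I am 20 years old and 1.80 m tall.

message = f"Next year I will be {age + 1}."
print(message)
# >>> Next year I will be 21.
```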

(3) String method

String tutorial: https://www.runoob.com/python/python-strings.html
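
As a quick orientation, the sketch below runs a few string methods that come up often when cleaning data. The selection and the sample text are this editor's choice, not taken from the linked tutorial.

```python
raw = "  Python,data,analysis  "

cleaned = raw.strip()                  # remove leading and trailing whitespace
print(cleaned)
# >>> Python,data,analysis
words = cleaned.split(",")             # split into a list on commas
print(words)
# >>> ['Python', 'data', 'analysis']
joined = " ".join(words)               # join the list back together with spaces
print(joined)
# >>> Python data analysis
print(joined.upper())                  # upper-case copy
# >>> PYTHON DATA ANALYSIS
print(joined.replace("data", "DATA"))  # replace a substring
# >>> Python DATA analysis
print(joined.startswith("Python"), joined.find("analysis"))
# >>> True 12
```

Strings are immutable, so each of these methods returns a new string and leaves the original unchanged.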

4. Python advanced functions (commonly used in data analysis)

(1) Lambda function (anonymous function)

a. No need to use def to define the function;

b. The function has no specific name;

c. Use the lambda keyword to define the function.

# lambda with one parameter
y = lambda x: x**2
print(y(3))
# >>> 9

# lambda with a conditional expression
f = lambda x: 'A' if x == 1 else 'B'
print(f(50))
# >>> B

# lambda with two parameters
t = lambda x, y: x + y*2
print(t(3, 4))
# >>> 11

# lambda with three parameters
k = lambda x, y, z: 2*x + 3*y + z
print(k(2, 6, 4))
# >>> 26
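
In practice, lambdas are most often passed directly to another function rather than assigned to a name. The sketch below uses one as the key argument of sorted() and max(); the sample data is made up for illustration.

```python
students = [("Tom", 20), ("Alice", 19), ("Bob", 22)]

# Sort by the second item of each tuple (the age)
by_age = sorted(students, key=lambda s: s[1])
print(by_age)
# >>> [('Alice', 19), ('Tom', 20), ('Bob', 22)]

# Pick the tuple with the longest name
longest = max(students, key=lambda s: len(s[0]))
print(longest)
# >>> ('Alice', 19)
```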

(2) map function

Definition: map() applies a function to each element of a sequence and produces a new sequence of the return values. Each element is passed as the argument x to a function f(x), so the result consists of f(x1), f(x2), f(x3), ... In Python 3, map() returns a lazy iterator, so wrap it in list() to see the values.

# Call signature
# map(function, list_input)
# function: the function to apply
# list_input: the input sequence

Example code:

# Method 1: a named function
items = [1, 2, 3, 4, 5, 6, 7]
def f(x):
    return x**2
s = list(map(f, items))
print(s)
# >>> [1, 4, 9, 16, 25, 36, 49]


# Method 2: a lambda
items = [1, 2, 3, 4, 5, 6, 7]
a = list(map(lambda x: x**2, items))
print(a)
# >>> [1, 4, 9, 16, 25, 36, 49]
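
Two details of map() are easy to miss: it accepts more than one input sequence, and in Python 3 its result can be consumed only once. The sketch below illustrates both, with made-up sample data.

```python
prices = [10, 20, 30]
quantities = [1, 2, 3]

# With two input sequences, the function receives one element from each
totals = map(lambda p, q: p * q, prices, quantities)
print(list(totals))
# >>> [10, 40, 90]
print(list(totals))   # the iterator is now exhausted
# >>> []

# The same result written as a list comprehension
print([p * q for p, q in zip(prices, quantities)])
# >>> [10, 40, 90]
```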

(3) Reduce function

Definition: reduce() walks through a sequence and folds it into a single value. It first passes the first two elements (and only those two) to the function. It then passes that result and the third element to the function as its two arguments, then that result and the fourth element, and so on until the sequence is exhausted.

# Call signature
# reduce(function, iterable)
# function: a function that takes two arguments
# iterable: the sequence to reduce

Cumulative summation example:

# Import reduce
from functools import reduce
# Define the function
def f(x, y):
    return x + y

# Define the sequence: the integers 1 to 10
items = range(1, 11)
# items = [1,2,3,4,5,6,7,8,9,10]

result = reduce(f, items)
print(result)

# >>> 55

String concatenation example:

from functools import reduce    # import reduce

def str_add(x, y):              # define the function
    return x + y

items = ['c', 'h', 'i', 'n', 'a']   # define the sequence
result = reduce(str_add, items)
print(result)

# >>> china
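
reduce() also accepts an optional third argument, an initial value, which the examples above do not use. The sketch below shows it, together with a trace of how reduce pairs up its arguments; the steps list is a hypothetical helper added only to record the calls.

```python
from functools import reduce

# Product of 1..5, starting from the initial value 1
product = reduce(lambda x, y: x * y, range(1, 6), 1)
print(product)
# >>> 120

# With an initial value, an empty sequence returns that value instead of raising TypeError
empty_sum = reduce(lambda x, y: x + y, [], 0)
print(empty_sum)
# >>> 0

# Record each (x, y) pair passed to the function while summing 1..5
steps = []
reduce(lambda x, y: steps.append((x, y)) or x + y, [1, 2, 3, 4, 5])
print(steps)
# >>> [(1, 2), (3, 3), (6, 4), (10, 5)]
```

For plain sums, the built-in sum() is simpler; reduce() is most useful when the combining step is something other than addition.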

(4) filter function

a. filter() is used to filter a sequence: it drops the elements that do not meet the condition and returns the elements that do.
b. It applies function to each item of the sequence in turn, that is, function(item), and keeps the items for which the return value is truthy. (In Python 2 the result was a list, string, or tuple, depending on the type of the sequence.)

c. In Python 3, filter() always returns an iterator, so wrap it in list() to get a list.

Example: finding even numbers:

a = list(filter(lambda x: x % 2 == 0, range(21)))    # filter is a built-in, so no import is needed
print(a)

# >>> [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

Example: finding integers:

# Method 1: a lambda
items = [1, 2, 3, 4, 5, 'python', '3.1415', -7]
a = list(filter(lambda x: 1 if isinstance(x, int) else 0, items))    # isinstance checks the type of each item
print(a)

# >>> [1, 2, 3, 4, 5, -7]


# Method 2: a named function
items = [1, 2, 3, 4, 5, 'python', '3.1415', -7]
def int_num(x):
    return isinstance(x, int)   # already True or False
a = list(filter(int_num, items))
print(a)

# >>> [1, 2, 3, 4, 5, -7]
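
Two things are worth knowing when filtering by type like this: bool is a subclass of int, so True and False pass an isinstance(x, int) check, and a list comprehension can express the same filter. The sketch below illustrates both with made-up sample data.

```python
items = [1, 2, 'python', '3.1415', -7, True, 0, None]

# List comprehension equivalent of filter(lambda x: isinstance(x, int), items)
ints = [x for x in items if isinstance(x, int)]
print(ints)
# >>> [1, 2, -7, True, 0]

# bool is a subclass of int, so exclude it explicitly if only true integers are wanted
strict_ints = list(filter(lambda x: isinstance(x, int) and not isinstance(x, bool), items))
print(strict_ints)
# >>> [1, 2, -7, 0]

# Passing None as the function keeps only the truthy items
truthy = list(filter(None, items))
print(truthy)
# >>> [1, 2, 'python', '3.1415', -7, True]
```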

5. Introduction to commonly used libraries for Python data analysis

(1) NumPy

NumPy is the fundamental module for numerical computing in data science. It is built around array operations, which makes it efficient, and it provides many advanced functions, including linear algebra routines.
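
A short sketch of what "array operations" means in practice follows. It assumes NumPy is installed; the arrays are made up for illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
print(a * 2)          # element-wise arithmetic, no explicit loop
# >>> [2 4 6 8]
print(a.mean())       # built-in aggregate
# >>> 2.5

m = np.array([[1, 2], [3, 4]])
v = np.array([1, 1])
print(m @ v)          # matrix-vector product (linear algebra)
# >>> [3 7]
print(m.T)            # transpose
# >>> [[1 3]
# >>>  [2 4]]
```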

(2) Pandas

Introduction: Pandas is designed for data processing and analysis. It is efficient and concise to use, offers many sophisticated functions, and is one of the most widely used libraries in data analysis.

Application: Pandas has a wide range of applications, including finance, e-commerce, and academic research. It supports SQL-like data processing, offers a wealth of data processing functions, and supports time series analysis.

import pandas as pd

# Create a one-dimensional labeled array (Series)
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
print(s)

# >>> a    1
# >>> b    2
# >>> c    3
# >>> dtype: int64
# Create a two-dimensional table (DataFrame)
import pandas as pd

data = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c'])
print(data)

'''Output:
   a  b  c
0  1  2  3
1  4  5  6
'''
import pandas as pd

data = pd.read_excel('filename.xlsx')     # read an Excel file
print(data.head(5))     # show the first five rows
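
The "SQL-like" processing mentioned above can be sketched as follows. This assumes pandas is installed; the table and column names are made up for illustration.

```python
import pandas as pd

# Made-up sample data for illustration
df = pd.DataFrame({
    'city': ['Guangzhou', 'Beijing', 'Guangzhou', 'Beijing'],
    'sales': [100, 200, 150, 50],
})

# Like SQL: SELECT * FROM df WHERE sales > 90
high = df[df['sales'] > 90]
print(len(high))
# >>> 3

# Like SQL: SELECT city, SUM(sales) FROM df GROUP BY city
totals = df.groupby('city')['sales'].sum()
print(totals['Guangzhou'], totals['Beijing'])
# >>> 250 250
```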

(3) Matplotlib (drawing library)

matplotlib: the basic module for data visualization.

import matplotlib.pyplot as plt    # pyplot is the main plotting interface
import numpy as np

# Create 1000 evenly spaced points between 0 and 10
x = np.linspace(0, 10, 1000)
y = np.sin(x)

plt.plot(x, y, label='y=sinx', color='red', linewidth=2)   # draw the curve; label is the legend text
plt.xlabel('Times')              # x-axis label
plt.ylabel('vol')                # y-axis label
plt.title('this is a title.')    # chart title
plt.legend(loc='upper left')     # legend position; call after plot() so the label exists
plt.show()        # display the chart

(4) Introduction to other related libraries

scikit-learn: a dedicated machine learning library that provides a complete machine learning toolbox;

seaborn: a statistical visualization library built on matplotlib, used to draw more refined graphics;

scipy: scientific computing, such as linear algebra, integration, and interpolation;

statsmodels: commonly used for statistical modeling and analysis.

Origin blog.csdn.net/weixin_44940488/article/details/106431364