Advanced Python
1. Modules and packages 2. random (random numbers) 3. datetime (date module) 4. pandas and DataFrame
1. Modules and packages
1.1 Module: a module is a file ending in .py. Functions, classes, variables, and executable code can be defined in it. Python modules are divided into built-in modules and third-party modules.
(1) Installing a module: pip3 install module_name -i domestic_mirror_address, or install it in PyCharm via Settings > Python Interpreter. Mirror sources:
Huawei https://mirrors.huaweicloud.com/
Alibaba Cloud http://mirrors.aliyun.com/pypi/simple/
University of Science and Technology of China http://pypi.mirrors.ustc.edu.cn/simple/
Tsinghua University https://pypi.tuna.tsinghua.edu.cn/simple/
Zhejiang University Open Source Mirror Station http://mirrors.zju.edu.cn/
Tencent Open Source Mirror Station http://mirrors.cloud.tencent.com/pypi/simple
Douban http://pypi.douban.com/simple/
NetEase Open Source Mirror Site http://mirrors.163.com/
Sohu Open Source Mirror http://mirrors.sohu.com/
(2) Importing a whole module: import module_name1 as alias1, module_name2 as alias2
Importing only some functions: from module_name import function_name1 as alias1, function_name2 as alias2
(3) Using a module after importing it: module_name_or_alias.function_or_variable_name (names brought in with from ... import are used directly)
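A minimal sketch of the import forms in (2) and (3), using the standard-library modules math and random purely as stand-ins for "module name 1" and "module name 2":

```python
# Import whole modules, each under an alias (one line, as in the syntax above)
import math as m, random as rnd

# Import only selected names from a module, each optionally under an alias
from math import sqrt as square_root, pi

# Use members through the module name or alias; names from "from ... import" are used directly
print(m.floor(3.7))          # 3
print(rnd.randint(1, 1))     # 1 (the range contains a single value)
print(square_root(16.0))     # 4.0
print(round(pi, 2))          # 3.14
```

PEP 8 recommends one import per line; the combined form is shown only because it mirrors the syntax in (2).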
1.2 Package: a package groups multiple modules, so modules with similar or related functionality can be managed together. A regular package is a directory that contains an __init__.py file.
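To make the idea concrete, the sketch below builds a tiny package on disk in a temporary directory and imports a module from it. The names mypkg, greet, and hello are hypothetical and exist only for this illustration:

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk (mypkg and greet are hypothetical names):
#   <tmp>/mypkg/__init__.py
#   <tmp>/mypkg/greet.py
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "mypkg")
os.makedirs(pkg_dir)

# An empty __init__.py marks the directory as a regular package
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")

# One module inside the package
with open(os.path.join(pkg_dir, "greet.py"), "w") as f:
    f.write("def hello(name):\n    return 'Hello, ' + name\n")

# Put the package's parent directory on the import path, then import from the package
sys.path.insert(0, root)
importlib.invalidate_caches()
from mypkg import greet

print(greet.hello("Python"))  # Hello, Python
```

In a real project the package directory simply sits next to your script (or is installed), so the sys.path step is unnecessary.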
1.3 Common methods of the os module:
(1) Get the current working directory: os.getcwd()
(2) List the files and folders in a directory: os.listdir(path); os.walk(path) traverses the whole tree recursively, yielding (dirpath, dirnames, filenames) for each directory
(3) Check whether a path exists: os.path.exists("path")
(4) Create a folder: os.mkdir("path") creates a single folder and raises FileExistsError if it already exists; os.makedirs("path") recursively creates multiple nested folders
(5) Delete a folder: os.rmdir("path"); only empty folders can be deleted
(6) Join paths: os.path.join("path1", "path2")
(7) Split a path: os.path.split(path) returns a (directory part, file name) tuple
(8) Get the directory part: os.path.dirname(path); get the file name: os.path.basename(path); to get an absolute path, use os.path.abspath(path)
(9) Check whether a path is a folder: os.path.isdir(path); check whether it is a file: os.path.isfile(path)
(10) Path separator of the current operating system: os.path.sep
(11) File size in bytes: os.path.getsize(path)
(12) List the names defined in a module: the built-in dir(), e.g. dir(os); with no argument, dir() lists the names in the current scope
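The os functions listed above can be tried together in one short script. This sketch works inside a temporary directory (created with tempfile.mkdtemp) so it never touches real files; the folder names a, b and the file notes.txt are made up for the demo:

```python
import os
import tempfile

# Work inside a throwaway directory so the demo never touches real files
base = tempfile.mkdtemp()
print(os.getcwd())                          # current working directory (varies)

nested = os.path.join(base, "a", "b")       # join with the OS-specific separator
os.makedirs(nested)                         # creates a/ and a/b/ in one call
print(os.path.exists(nested), os.path.isdir(nested))  # True True

try:
    os.mkdir(nested)                        # mkdir refuses an existing folder
except FileExistsError:
    print("already exists")

file_path = os.path.join(nested, "notes.txt")
with open(file_path, "w") as f:
    f.write("hello")

size = os.path.getsize(file_path)           # 5 bytes
head, tail = os.path.split(file_path)       # (directory part, "notes.txt")
print(size, tail, os.path.basename(file_path), os.path.dirname(file_path) == head)
print(os.listdir(nested))                   # ['notes.txt']

# os.walk yields (dirpath, dirnames, filenames) for each directory in the tree
for dirpath, dirnames, filenames in os.walk(base):
    print(dirpath, dirnames, filenames)

# Clean up: rmdir only removes empty folders, so delete the file first
os.remove(file_path)
os.rmdir(nested)
os.rmdir(os.path.join(base, "a"))
os.rmdir(base)
print(os.path.exists(base))                 # False
```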
2. random (random numbers)
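2.1 Commonly used functions:
(1) random.random(): returns a random float in the range [0.0, 1.0)
(2) random.randint(a, b): returns a random integer N with a <= N <= b (both ends included)
(3) random.uniform(a, b): returns a random float between a and b
(4) random.choice(seq): returns one randomly chosen element of the sequence
(5) random.sample(seq, k): returns a new list of k distinct elements chosen at random from the sequence
(6) random.shuffle(seq): shuffles a mutable sequence (such as a list) in place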
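A quick sketch of the six functions above. The seed value 42 and the sample lists are arbitrary; seeding only makes repeated runs produce the same sequence:

```python
import random

random.seed(42)                    # fix the seed so repeated runs match

x = random.random()                # float in [0.0, 1.0)
n = random.randint(1, 6)           # integer from 1 to 6, both ends included
u = random.uniform(1.5, 2.5)       # float between 1.5 and 2.5
colors = ["red", "green", "blue", "yellow"]
pick = random.choice(colors)       # one element of the list
picks = random.sample(colors, 2)   # new list of 2 distinct elements
deck = list(range(10))
random.shuffle(deck)               # shuffles deck in place and returns None

print(x, n, u, pick, picks, deck)
```

Note that shuffle needs a mutable sequence, so pass a list rather than a tuple or string.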
3. datetime (date module)
3.1 date (1) Get the current date: datetime.date.today(); its year, month, and day are available as the .year, .month, and .day attributes
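A short sketch of datetime.date. The fixed date 2024-02-29 is an arbitrary example chosen so the output is reproducible; today() naturally changes from day to day:

```python
import datetime

today = datetime.date.today()               # current local date; changes daily
print(today, today.year, today.month, today.day)

d = datetime.date(2024, 2, 29)              # a fixed date gives reproducible output
print(d.isoformat())                        # 2024-02-29
print(d.year, d.month, d.day)               # 2024 2 29
print(d.weekday())                          # 3 (Monday is 0, so this is a Thursday)
```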
3.2 time: a time of day (hour, minute, second, microsecond) independent of any date; used for time conversion
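A small sketch of datetime.time, including conversion to and from strings. The times used are arbitrary, and time.fromisoformat assumes Python 3.7 or later:

```python
import datetime

t = datetime.time(14, 30, 15)               # 14:30:15, no date attached
print(t.hour, t.minute, t.second)           # 14 30 15
print(t.isoformat())                        # 14:30:15
print(t.strftime("%H:%M"))                  # 14:30

# Convert back from an ISO-format string (Python 3.7+)
t2 = datetime.time.fromisoformat("09:05:00")
print(t2 < t)                               # True: 09:05 is earlier than 14:30
```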
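3.3 datetime: now() returns the current date and time; today() returns the current local date and time (the same as now() called without a tz argument)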
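A sketch of now() and today(), plus the common conversion between a datetime and a string with strftime/strptime. The timestamp 2024-05-17 08:30 and the format string are illustrative choices:

```python
import datetime

now = datetime.datetime.now()       # current local date and time
today = datetime.datetime.today()   # also the current local date and time (no tz)
print(now, today)

# Converting between datetime and str with a format string
stamp = datetime.datetime(2024, 5, 17, 8, 30)
text = stamp.strftime("%Y-%m-%d %H:%M")
print(text)                                                  # 2024-05-17 08:30
parsed = datetime.datetime.strptime(text, "%Y-%m-%d %H:%M")
print(parsed == stamp)                                       # True
print(stamp.date(), stamp.time())                            # 2024-05-17 08:30:00
```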
3.4 timedelta: a duration, i.e. the difference between two dates or datetimes
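A sketch of timedelta arithmetic: adding a duration to a datetime, and subtracting two datetimes or dates to get a duration. The dates are arbitrary; the last line crosses February 2024, which has 29 days:

```python
import datetime

start = datetime.datetime(2024, 1, 30, 22, 0)
delta = datetime.timedelta(days=2, hours=5)

end = start + delta                     # shift a datetime by a duration
print(end)                              # 2024-02-02 03:00:00

gap = end - start                       # subtracting datetimes gives a timedelta
print(gap, gap.total_seconds())         # 2 days, 5:00:00 190800.0

span = datetime.date(2024, 3, 1) - datetime.date(2024, 2, 1)
print(span.days)                        # 29
```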
3.5 tzinfo: an abstract base class for time zone information
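Because tzinfo is abstract, code usually works with a concrete subclass. The sketch below uses datetime.timezone, the standard library's tzinfo subclass for fixed UTC offsets; the "UTC+8" name is an illustrative label. For named zones with daylight-saving rules, Python 3.9+ also provides zoneinfo.ZoneInfo:

```python
import datetime

# datetime.timezone is a concrete tzinfo subclass for fixed UTC offsets
utc = datetime.timezone.utc
utc_plus_8 = datetime.timezone(datetime.timedelta(hours=8), "UTC+8")

moment = datetime.datetime(2024, 5, 17, 0, 0, tzinfo=utc)   # timezone-aware datetime
local = moment.astimezone(utc_plus_8)                       # same instant, other zone
print(local)                                 # 2024-05-17 08:00:00+08:00
print(local.tzname())                        # UTC+8
print(local == moment)                       # True: aware datetimes compare as instants
print(isinstance(utc_plus_8, datetime.tzinfo))  # True
```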
4. pandas and DataFrame
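4.1 Series: a one-dimensional, variable-length, ordered, labeled array in pandas
(1) Creating a Series: from a list, a dict, a scalar, or another iterable (an index can be specified)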
(2) Vectorized Series arithmetic:
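numpy.nan marks a missing value: element-wise arithmetic involving NaN yields NaN, so fill it first (for example with fillna(0)) if you need a number; aggregations such as sum() skip NaN by default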
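Index alignment: values with the same label are combined directly; a label present in only one Series is filled with NaN, so its result is NaN
(3) Positional indexing (via iloc[]) and label indexing (via loc[])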
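(4) Slicing: by position, series_name.iloc[start:stop:step] (start included, stop excluded), or by label, series_name.loc[start:stop:step] (both ends included)
(5) Adding or modifying values: assigning to an existing label modifies the value, and assigning to a new label adds it (positional indexing can only modify existing values, not add new ones)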
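(6) Deleting elements with drop() (inplace defaults to False): with inplace=True the original Series is modified directly; with inplace=False a copy is made, the elements are removed from the copy, and the copy is returned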
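(7) Other common methods: add(), sub(), div(), mul(), isnull(), notnull(), sum(), max(), and so on; replace(): replaces values; str.contains(): checks whether each element contains a given substring, returning one bool per element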
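The Series operations (1) through (7) can be walked through on small made-up data. This sketch assumes pandas and NumPy are installed; the labels and values are arbitrary:

```python
import numpy as np
import pandas as pd

# (1) Create Series from a list (with explicit labels), a dict, and a scalar
s1 = pd.Series([10, 20, 30], index=["a", "b", "c"])
s2 = pd.Series({"b": 1, "c": np.nan, "d": 3})
s3 = pd.Series(5, index=["x", "y"])        # the scalar is repeated for every label
print(s3.tolist())                         # [5, 5]

# (2) Vectorized arithmetic aligns on labels:
# "a" and "d" exist in only one Series and "c" is NaN in s2, so those results are NaN
total = s1 + s2
print(total.isna().tolist())               # [True, False, True, True]
# fill_value=0 treats missing/NaN entries as 0 before adding
filled = s1.add(s2, fill_value=0)
print(filled.tolist())                     # [10.0, 21.0, 30.0, 3.0]
print(s2.fillna(0).sum(), s2.sum())        # 4.0 4.0 (sum() skips NaN by default)

# (3) Positional vs label indexing
print(s1.iloc[0], s1.loc["b"])             # 10 20

# (4) Slicing: iloc excludes stop, loc includes it
print(s1.iloc[0:2].tolist())               # [10, 20]
print(s1.loc["a":"b"].tolist())            # [10, 20]

# (5) Assigning to an existing label modifies it; a new label is appended
s1.loc["b"] = 25
s1.loc["e"] = 50
print(s1.tolist())                         # [10, 25, 30, 50]

# (6) drop returns a copy by default; inplace=True modifies s1 itself
smaller = s1.drop("e")
s1.drop("a", inplace=True)
print(smaller.tolist(), s1.tolist())       # [10, 25, 30] [25, 30, 50]

# (7) Other common methods
print(s1.sum(), s1.max())                  # 105 50
print(s2.isnull().tolist())                # [False, True, False]
words = pd.Series(["apple", "banana", "cherry"])
print(words.str.contains("an").tolist())   # [False, True, False]
print(words.replace("apple", "apricot").tolist())  # ['apricot', 'banana', 'cherry']
```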
4.2 DataFrame: a two-dimensional, table-like data structure (loosely, several Series combined as columns)
(1) Creating a DataFrame: from a list;
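from a dict; or convert a Series into a DataFrame with .to_frame()
(2) Reading tables: pandas.read_excel or pandas.read_csv, depending on the file format
(3) Writing tables: df.to_csv, df.to_excel, etc., to write a CSV or another file type
(4) Changing a column's data type: df = pandas.read_csv(r"E:\desktop\1_copy.csv", usecols=["column_name"]) followed by df["column_name"] = df["column_name"].astype(float)
(5) Inspecting data: row labels: df.index (number of rows: len(df)); column names: list(df); summary information: df.info(); first n rows (default 5): df.head(n); last n rows (default 5): df.tail(n); shape: df.shape; n randomly sampled rows (default 1): df.sample(n); a single column: df["column_label"] or df.column_label; several columns: df[["column_label1", "column_label2"]]
(6) Adding or modifying a single column: df["column_label"] = column_data (modifies the column if the label exists, otherwise adds it). Adding or modifying several columns: df.assign(column_label1=data1, column_label2=data2, ...), which returns a new DataFrame
(7) Deleting a single column: df.drop("column_label", axis=1) (axis defaults to 0, meaning rows; axis=1 means columns). Deleting several columns: df.drop(["column_label1", "column_label2", ...], axis=1). del df["column_label"] and df.pop("column_label") also delete a column in place; pop additionally returns it
(8) Positional and label indexing. Slicing by position: df.iloc[start:stop:step] (stop excluded); slicing by label: df.loc[start:stop:step] (both ends included)
(9) Adding a single row (for reference only, not recommended): df._append(pandas.DataFrame([[value1, value2, ...]], columns=[column1, column2, ...])). _append is a private method; the public DataFrame.append was removed in pandas 2.0.
Adding rows (common): pandas.concat([df1, df2, df3, ...], keys=..., axis=...). keys labels the pieces being combined: pass a list with one key per input, and each key becomes the outer level of the resulting index (the row index when axis=0, the column index when axis=1). axis sets the direction: the default 0 stacks the inputs vertically (more rows), and 1 places them side by side (more columns)
(10) Deleting rows: df.drop([row_label1, row_label2, ...], inplace=True/False); inplace defaults to False, which returns a new DataFrame and leaves df unchanged
(11) Getting a single element: df.loc[3].loc["c"] can be combined into df.loc[3, "c"], whereas df.iloc[2].loc["c"] mixes index types and cannot be combined into one call. For a randomly generated df with row labels 1-10 and column labels a-e, selecting the row first and then the column can be done in four equivalent ways: loc[].loc[], loc[].iloc[], iloc[].loc[], iloc[].iloc[]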
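Selecting the column first, then the row: df.column_label.loc[row_label] (df.column_label returns that column as a Series, which is then indexed by row)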
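(12) Getting multiple elements: pass a list inside the [] brackets, e.g. df.loc[[row1, row2]] selects several rows, and df.loc[[row1, row2], [col1, col2]] selects several rows and columns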
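The DataFrame operations above can be walked through on small made-up data. This sketch assumes pandas and NumPy are installed. An in-memory io.StringIO buffer stands in for a real CSV file, and the 10 x 5 frame with rows 1-10 and columns a-e mirrors the randomly generated df from (11), here built with a fixed seed via numpy.random.default_rng:

```python
import io

import numpy as np
import pandas as pd

# (1) Create DataFrames from a list of rows, a dict of columns, and a Series
df_list = pd.DataFrame([[1, "x"], [2, "y"]], columns=["num", "letter"])
df_dict = pd.DataFrame({"num": [3, 4], "letter": ["z", "w"]})
df_series = pd.Series([1.5, 2.5], name="price").to_frame()

# (2)-(4) Write to CSV and read it back (a buffer stands in for a file),
# keeping one column with usecols, then change that column's dtype
buffer = io.StringIO()
df_dict.to_csv(buffer, index=False)
buffer.seek(0)
round_trip = pd.read_csv(buffer, usecols=["num"])
round_trip["num"] = round_trip["num"].astype(float)

# A random 10 x 5 DataFrame with row labels 1-10 and column labels a-e (fixed seed)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(10, 5)),
                  index=range(1, 11), columns=list("abcde"))

# (5) Inspect the data
print(list(df), df.shape, len(df))   # ['a', 'b', 'c', 'd', 'e'] (10, 5) 10
print(df.head(3))
df.info()

# (6) Add a column, then add two more with assign (returns a new DataFrame)
df["f"] = df["a"] * 2
df2 = df.assign(g=1, h=df["b"] + 1)

# (7) Delete columns: drop returns a copy; pop and del modify df2 in place
df = df.drop("f", axis=1)
popped = df2.pop("g")
del df2["h"]

# (8) Slicing: iloc excludes stop, loc includes it; both select labels 1 and 2 here
print(df.iloc[0:2])
print(df.loc[1:2])

# Stack rows from two DataFrames; keys become the outer level of the row index
stacked = pd.concat([df_list, df_dict], keys=["first", "second"])
print(stacked.loc["second"])         # the rows that came from df_dict

# (10) Drop rows by label; inplace=False by default, so df keeps all 10 rows
trimmed = df.drop([1, 2])

# (11) One element: the combined form and the four row-then-column forms
v = df.loc[3, "c"]
same = [df.loc[3].loc["c"], df.loc[3].iloc[2], df.iloc[2].loc["c"], df.iloc[2].iloc[2]]
print(all(x == v for x in same))     # True
print(df.c.loc[3] == v)              # column first, then row: True

# (12) Several rows and columns at once by passing lists
print(df.loc[[3, 5], ["a", "c"]])
```

The private _append from (9) is left out on purpose; pandas.concat covers the same need through a public API.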