Python Advanced: Modules and Packages / random / datetime / pandas and DataFrame

Python advanced

1. Modules and packages

2. random: random numbers

3. datetime: the date module

4. pandas and DataFrame

1. Modules and packages

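1.1 Module: a module is a file ending in .py. Functions, classes, variables, and executable code can all be defined in a module.

Python modules are divided into built-in modules, which ship with Python, and third-party modules, which must be installed separately.

(1) Installing modules: pip3 install module_name1 module_name2 -i mirror_address, where -i points pip at a PyPI mirror in China such as one of those listed below (in PyCharm, modules can also be installed through the Python interpreter settings)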

Huawei mirror source https://mirrors.huaweicloud.com/

Alibaba Cloud http://mirrors.aliyun.com/pypi/simple/

University of Science and Technology of China http://pypi.mirrors.ustc.edu.cn/simple/

Tsinghua University https://pypi.tuna.tsinghua.edu.cn/simple/

Zhejiang University Open Source Mirror Station http://mirrors.zju.edu.cn/

Tencent open source mirror station http://mirrors.cloud.tencent.com/pypi/simple

Douban http://pypi.douban.com/simple/

NetEase open source mirror site http://mirrors.163.com/

Sohu open source mirror http://mirrors.sohu.com/

(2) Import whole modules: import module_name1 as alias1, module_name2 as alias2

Import only some functions: from module_name import function_name1 as alias1, function_name2 as alias2

(3) Using an imported module: module_name.function_name, or alias.variable_name when an alias was given
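A minimal sketch of these import forms, using the standard-library modules math and random as stand-ins for the placeholder module names (the aliases are arbitrary choices for illustration):

```python
# Import whole modules, each with an alias
import math as m, random as rnd

# Import only selected functions, each with an alias
from math import sqrt as square_root, floor as round_down

# Access members through the alias, or call imported functions directly
area = m.pi * 2 ** 2          # a variable from the math module
root = square_root(16)        # math.sqrt under its alias
low = round_down(3.7)         # math.floor under its alias
number = rnd.randint(1, 6)    # random.randint through the module alias
print(area, root, low, number)
```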

1.2 Package: a package is a folder that contains multiple modules (a regular package also contains an __init__.py file), so that modules with similar or related functions can be managed together.
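To make the folder-of-modules idea concrete, this sketch builds a throwaway package on disk and imports from it. The names mypkg, greet, and hello are hypothetical, chosen only for illustration:

```python
import importlib
import pathlib
import sys
import tempfile

# Build a hypothetical package in a temporary folder:
#   mypkg/
#       __init__.py
#       greet.py
root = pathlib.Path(tempfile.mkdtemp())
pkg = root / "mypkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("")  # marks mypkg as a regular package
(pkg / "greet.py").write_text(
    "def hello(name):\n"
    "    return f'Hello, {name}!'\n"
)

# Make the parent folder importable, then import from the package
sys.path.insert(0, str(root))
importlib.invalidate_caches()         # files were created at runtime
from mypkg import greet               # import a module from the package
from mypkg.greet import hello         # import one function from that module

print(greet.hello("Python"))
print(hello("package"))
```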

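1.3 Common functions in the os module:

(1) Get the current working directory: os.getcwd()

(2) List files and folders: os.listdir(path) returns the names in one folder; os.walk(path) traverses the whole folder tree, yielding (dirpath, dirnames, filenames) for each folder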

(3) Check whether a path exists: os.path.exists(path)

(4) Create folders: os.mkdir(path) creates a single folder and raises FileExistsError if it already exists;

os.makedirs(path) recursively creates multiple nested folders (pass exist_ok=True to tolerate folders that already exist)

(5) Delete a folder: os.rmdir(path); only empty folders can be deleted
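A short sketch of items (1) to (5). It works inside a temporary folder so nothing real is touched; the folder names "single" and "a/b/c" are made up:

```python
import os
import tempfile

base = tempfile.mkdtemp()               # scratch folder for the demo
print(os.getcwd())                      # current working directory

single = os.path.join(base, "single")
os.mkdir(single)                        # creates exactly one folder
try:
    os.mkdir(single)                    # fails: the folder already exists
except FileExistsError:
    print("already exists")

nested = os.path.join(base, "a", "b", "c")
os.makedirs(nested)                     # creates a, a/b and a/b/c in one call
print(os.path.exists(nested))           # True

print(sorted(os.listdir(base)))         # ['a', 'single'], one level only
for dirpath, dirnames, filenames in os.walk(base):
    print(dirpath, dirnames, filenames) # every level, top-down

os.rmdir(single)                        # works because 'single' is empty
try:
    os.rmdir(os.path.join(base, "a"))   # fails: 'a' still contains 'b'
except OSError:
    print("not empty")
```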

(6) Join paths: os.path.join("path1", "path2")

(7) Split a path: os.path.split(path) returns a (head, tail) tuple, i.e. the directory part and the last component (the file name)

(8) Get the directory part: os.path.dirname(path) (the absolute path comes from os.path.abspath(path)); get the file name: os.path.basename(path)

(9) Determine whether the path is a folder: os.path.isdir (path)

     Determine whether the path is a file: os.path.isfile(path)

(10) Return the path separator under the current operating system: os.path.sep

(11) Get the file size in bytes: os.path.getsize(path)

(12) List the names in the current scope: dir(); dir(module) lists the functions and other attributes a module provides
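A sketch of the os.path helpers in items (6) to (12). The path data/reports/sales.csv is hypothetical and never has to exist; the checks that need a real file use a small temporary file:

```python
import os
import tempfile

path = os.path.join("data", "reports", "sales.csv")  # joined with the OS separator
print(path)                        # data/reports/sales.csv on Linux/macOS

head, tail = os.path.split(path)   # (directory part, last component)
print(head, tail)
print(os.path.dirname(path))       # same as head
print(os.path.basename(path))      # same as tail
print(os.path.abspath(path))       # absolute path, resolved against os.getcwd()
print(repr(os.path.sep))           # '/' on Linux/macOS, '\\' on Windows

# isdir/isfile/getsize need a real path, so create a small temporary file
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")              # 5 bytes
print(os.path.isdir(f.name), os.path.isfile(f.name))  # False True
size = os.path.getsize(f.name)
print(size)                        # 5

# dir(module) lists what a module provides; keep only the "is..." checks
print([name for name in dir(os.path) if name.startswith("is")])
```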

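2. random: random numbers

2.1 Commonly used functions:

(1) random.random(): returns a random float in the range [0.0, 1.0)

(2) random.randint(a, b): returns a random integer N with a <= N <= b (both ends included)

(3) random.uniform(a, b): returns a random float between a and b

(4) random.choice(sequence): returns one randomly chosen element of a non-empty sequence

(5) random.sample(sequence, k): returns a new list of k elements chosen at random without replacement

(6) random.shuffle(list): shuffles a list in place and returns None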
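The six functions side by side. The seed value 42 and the sample lists are arbitrary; seeding only makes repeated runs print the same numbers:

```python
import random

random.seed(42)                    # fixed seed so runs are repeatable

x = random.random()                # float in [0.0, 1.0)
n = random.randint(1, 10)          # int from 1 to 10, both ends included
u = random.uniform(1.5, 2.5)       # float between 1.5 and 2.5

colors = ["red", "green", "blue", "yellow"]
pick = random.choice(colors)       # one element
two = random.sample(colors, 2)     # new list of 2 elements, no repeats
deck = list(range(10))
result = random.shuffle(deck)      # shuffles deck itself
print(x, n, u, pick, two, deck, result)  # result is None
```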

3. datetime: the date module

3.1 date

(1) Get the current date: datetime.date.today(); its year, month, and day attributes give the year, month, and day
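A small sketch of date: today() depends on when the code runs, so a fixed date (chosen arbitrarily) is added for output that can be checked:

```python
import datetime

today = datetime.date.today()
print(today)                           # the current date, e.g. 2023-08-11
print(today.year, today.month, today.day)

d = datetime.date(2023, 8, 11)         # a fixed date for comparison
print(d.isoformat())                   # 2023-08-11
print(d.weekday())                     # 4: Monday is 0, so this is a Friday
```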

     

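3.2 time

Time conversion: a datetime.time object holds a time of day (hour, minute, second, microsecond) without a date; strftime() converts a date, time, or datetime object to a string, and datetime.datetime.strptime() parses a string back into a datetime

3.3 datetime: datetime.datetime.now() returns the current local date and time; datetime.datetime.today() returns the same (now() additionally accepts an optional tz argument)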
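A sketch of time objects and string conversion. The timestamp 2023-08-11 14:30:05 and the format string are arbitrary examples:

```python
import datetime

t = datetime.time(14, 30, 5)                 # a time of day, no date attached
print(t.hour, t.minute, t.second)            # 14 30 5

dt = datetime.datetime(2023, 8, 11, 14, 30, 5)
text = dt.strftime("%Y-%m-%d %H:%M:%S")      # datetime -> string
back = datetime.datetime.strptime(text, "%Y-%m-%d %H:%M:%S")  # string -> datetime
print(text, back == dt)                      # 2023-08-11 14:30:05 True
print(dt.date(), dt.time())                  # split into date and time parts

print(datetime.datetime.now())               # current local date and time
print(datetime.datetime.today())             # also the current local date and time
```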

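3.4 timedelta: a duration, such as the difference between two dates or datetimes; it can be added to or subtracted from them

3.5 tzinfo: an abstract base class for time zone information; datetime.timezone is a concrete subclass (for example datetime.timezone.utc)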
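A sketch of timedelta arithmetic and of timezone as a concrete tzinfo. The dates and the UTC+8 offset are illustrative values only:

```python
import datetime

start = datetime.datetime(2023, 8, 11, 9, 0)
delta = datetime.timedelta(days=2, hours=3)
end = start + delta                          # 2023-08-13 12:00:00
print(end, end - start == delta)             # subtracting gives the delta back

span = datetime.date(2023, 12, 31) - datetime.date(2023, 1, 1)
print(span.days)                             # 364

# timezone is the tzinfo subclass that ships with the datetime module
utc_now = datetime.datetime.now(datetime.timezone.utc)
utc8 = datetime.timezone(datetime.timedelta(hours=8), "UTC+8")
print(utc_now.astimezone(utc8))              # same instant, shown in UTC+8
print(isinstance(utc8, datetime.tzinfo))     # True
```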

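4. pandas and DataFrame

4.1 Series: a one-dimensional, ordered, labeled array in pandas whose length can change

(1) Creating a Series: from a list, a dict, a scalar, or another array-like object (an index can be specified)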
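Four ways to build a Series, as a sketch; it assumes pandas is installed, and the values and labels are made up:

```python
import pandas as pd

s1 = pd.Series([10, 20, 30])                          # default index 0, 1, 2
s2 = pd.Series([10, 20, 30], index=["a", "b", "c"])  # custom labels
s3 = pd.Series({"x": 1, "y": 2})                      # dict keys become the index
s4 = pd.Series(5, index=["p", "q", "r"])              # scalar repeated per label
print(s1, s2, s3, s4, sep="\n\n")
```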

 

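(2) Vectorized Series arithmetic: operations apply element by element, without explicit loops

numpy.nan marks a missing value: any arithmetic involving NaN produces NaN, so fill it first (for example with fillna(0)) before calculating

Index alignment: elements with the same label are combined directly; a label present in only one Series is filled with NaN, so the result for that label is NaN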
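A sketch of NaN propagation and label alignment with made-up data. fill_value is a real parameter of Series.add() that treats a missing side as the given value:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan], index=["a", "b", "c"])
print(s * 10)                     # a: 10.0, b: 20.0, c: NaN
print(s.fillna(0) * 10)           # NaN filled with 0 first, so c: 0.0

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])
print(s1 + s2)                    # a: NaN, b: 12.0, c: 23.0, d: NaN
print(s1.add(s2, fill_value=0))   # a: 1.0, b: 12.0, c: 23.0, d: 30.0
```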

(3) Positional indexing (values via iloc[]) and label indexing (values via loc[])

(4) Slicing:

By position: series.iloc[start:stop:step] (stop excluded)

By label: series.loc[start:stop:step] (stop included)
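The difference between the two kinds of slice shows up clearly on a small Series with letter labels (sample data):

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

print(s.iloc[1])          # 20: second element by position
print(s.loc["b"])         # 20: element labeled "b"

print(s.iloc[1:4])        # positions 1, 2, 3 -> b, c, d (stop excluded)
print(s.loc["b":"d"])     # labels b through d -> b, c, d (stop included)
print(s.iloc[::2])        # every second element -> a, c, e
```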

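(5) Adding or modifying values: assigning to an existing label modifies its value; assigning to a new label adds an element (positional indexing can only modify existing elements, it cannot add new ones this way)

(6) Deleting elements with drop(): by default inplace=False, which deletes from a copy and returns it while leaving the original unchanged; inplace=True deletes from the original object itself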
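A sketch of adding, modifying, and deleting Series elements, including the IndexError that iloc raises when asked to enlarge a Series (sample data only):

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])

s["a"] = 100          # label exists -> value is modified
s["d"] = 4            # label is new -> element is added
s.iloc[1] = 200       # by position: can modify an existing element...
try:
    s.iloc[10] = 5    # ...but cannot add a new one
except IndexError:
    print("iloc cannot enlarge a Series")

dropped_copy = s.drop("d")    # inplace=False: returns a new Series, s unchanged
s.drop("c", inplace=True)     # inplace=True: modifies s itself, returns None
print(dropped_copy, s, sep="\n\n")
```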

 

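(7) Other common methods: add(), sub(), div(), mul(), isnull(), notnull(), sum(), max(), and so on

replace(): replaces values

str.contains(): checks whether each string element contains a given substring, returning one bool per element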
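A quick tour of these methods on made-up data. Note that contains() is reached through the .str accessor on a Series of strings:

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 2, np.nan])
b = pd.Series([10, 20, 30])
print(a.add(b), a.sub(b), a.mul(b), a.div(b), sep="\n\n")  # same as + - * /
print(a.isnull().tolist())        # [False, False, True]
print(a.notnull().tolist())       # [True, True, False]
print(a.sum(), a.max())           # 3.0 2.0: these reductions skip NaN by default

names = pd.Series(["apple", "banana", "cherry"])
print(names.replace("banana", "blueberry").tolist())
print(names.str.contains("an").tolist())   # [False, True, False]
```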

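4.2 DataFrame: a two-dimensional labeled table (it can be thought of as several Series sharing one index)

(1) Ways to create one:

From a list (each inner list is a row)

From a dict (keys become column names)

From a Series: series.to_frame()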
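The three creation routes, sketched with invented columns id, name, and score:

```python
import pandas as pd

# From a list of rows, with column names
df1 = pd.DataFrame([[1, "Tom"], [2, "Ann"]], columns=["id", "name"])

# From a dict: keys become column names, values become column data
df2 = pd.DataFrame({"id": [1, 2], "name": ["Tom", "Ann"]})

# From a Series: to_frame() turns it into a one-column DataFrame
s = pd.Series([90, 85], name="score")
df3 = s.to_frame()
print(df1, df2, df3, sep="\n\n")
```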

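(2) Reading tables: pandas.read_excel() reads Excel files and pandas.read_csv() reads CSV files

(3) Writing tables: df.to_csv(), df.to_excel(), and similar methods write CSV or other file formats

(4) Changing a column's data type:

import pandas
df = pandas.read_csv(r"E:\desktop\1_copy.csv", usecols=["column_name"])

df["column_name"] = df["column_name"].astype(float)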
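A self-contained round trip covering (2) to (4) without touching the disk. It uses CSV only, since read_excel and to_excel need an extra engine package such as openpyxl; the name and price columns are invented:

```python
import io
import pandas as pd

df = pd.DataFrame({"name": ["Tom", "Ann"], "price": [15, 20]})

# (3) write: with no path, to_csv returns the CSV text instead of writing a file
csv_text = df.to_csv(index=False)
print(csv_text)

# (2) read: read_csv accepts any file-like object, so StringIO stands in for a file
loaded = pd.read_csv(io.StringIO(csv_text), usecols=["price"])
print(loaded["price"].dtype)       # integers were parsed as an int dtype

# (4) change the column's type
loaded["price"] = loaded["price"].astype(float)
print(loaded["price"].dtype)       # float64
```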

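(5) View the row index: df.index (len(df) gives the number of rows)

Column names: list(df)

Summary information: df.info()

First n rows, 5 by default: df.head(n)

Last n rows, 5 by default: df.tail(n)

Shape as (rows, columns): df.shape

Random sample of n rows, 1 by default: df.sample(n)

A single column: df["column"] or df.column (attribute access only works when the name is a valid Python identifier)

Multiple columns: df[["column1", "column2"]]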
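The inspection calls above on a small invented table. random_state is a real sample() parameter, used here only to make the sample repeatable:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Tom", "Ann", "Bob", "Eve", "Joe", "Sue"],
                   "age": [20, 31, 25, 40, 22, 28]})

print(df.index)                      # RangeIndex(start=0, stop=6, step=1)
print(len(df))                       # 6 rows
print(list(df))                      # ['name', 'age']
df.info()                            # dtypes, non-null counts, memory usage
print(df.head(3))                    # first 3 rows
print(df.tail())                     # last 5 rows (default)
print(df.shape)                      # (6, 2)
print(df.sample(2, random_state=0))  # 2 random rows, reproducible
print(df["age"])                     # one column as a Series; df.age is equivalent
print(df[["name", "age"]])           # several columns as a DataFrame
```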

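(6) Add or modify a single column: df["column"] = values (modifies the column if it exists, otherwise adds it)

Add or modify multiple columns: df.assign(column1=values1, column2=values2, ...) (returns a new DataFrame and leaves df unchanged)

(7) Delete a single column: df.drop("column", axis=1) (axis defaults to 0, meaning rows; axis=1 means columns)

Delete multiple columns: df.drop(["column1", "column2", ...], axis=1)

del df["column"] deletes a column in place; df.pop("column") deletes it and returns it as a Series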
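A sketch contrasting the copy-returning calls (assign, drop) with the in-place ones (column assignment, del, pop); columns a to f are placeholders:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]})

df["a"] = df["a"] * 10                     # existing column -> modified
df["e"] = [9, 10]                          # new column -> added
df2 = df.assign(f=df["a"] + df["b"], b=0)  # new DataFrame; df is unchanged

no_c = df.drop("c", axis=1)                # drop one column (copy)
no_cd = df.drop(["c", "d"], axis=1)        # drop several columns (copy)

del df["e"]                                # remove a column in place
popped = df.pop("d")                       # remove a column in place and return it
print(df2, no_cd, df, popped, sep="\n\n")
```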

(8) Positional indexing and label indexing:

Slicing by position: df.iloc[start:stop:step] (stop excluded);

Slicing by label: df.loc[start:stop:step] (stop included)
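The same slicing rules applied to DataFrame rows, on a toy table with row labels r1 to r5:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [6, 7, 8, 9, 10]},
                  index=["r1", "r2", "r3", "r4", "r5"])

print(df.iloc[1:3])            # rows at positions 1 and 2 (r2, r3)
print(df.loc["r2":"r4"])       # rows r2, r3 and r4: stop label included
print(df.iloc[::2])            # every second row: r1, r3, r5
print(df.loc["r2":"r4", "y"])  # a row slice and one column together
```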

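(9) Add a row (for reference only, not recommended): df._append(pandas.DataFrame([[value1, value2, ...]], columns=[column1, column2, ...])). _append() is a private method; the public DataFrame.append() was removed in pandas 2.0

Add rows (common): pandas.concat([df1, df2, df3, ...], keys=..., axis=...). keys labels each input and becomes the outer level of the resulting hierarchical index (of the column labels when axis=1); pass several keys as a list. axis sets the direction: 0, the default, stacks the inputs as more rows; 1 joins them side by side as more columns

(10) Delete rows: df.drop([row_label1, row_label2, ...], inplace=True/False); inplace defaults to False, which returns a new DataFrame instead of modifying df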
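A sketch of concat along both axes, with and without keys, followed by row deletion. ignore_index is a real concat() parameter that renumbers the result; the data are made up:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [5], "b": [6]})

rows = pd.concat([df1, df2], ignore_index=True)           # axis=0: stacked rows
print(rows)                                               # 3 rows, index 0..2

tagged = pd.concat([df1, df2], keys=["first", "second"])  # outer index level
print(tagged.loc["second"])                               # only df2's rows

cols = pd.concat([df1, df1 * 10], axis=1, keys=["raw", "x10"])  # side by side
print(cols)                                 # columns (raw, a) ... (x10, b)

kept = rows.drop([0, 2])                    # returns a copy without rows 0 and 2
rows.drop([1], inplace=True)                # modifies rows itself
print(kept, rows, sep="\n\n")
```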

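(11) Get a single element from the table:

df.loc[3].loc["c"] can be combined into df.loc[3, "c"]; a mixed form such as df.iloc[2].loc["c"] uses two different kinds of index and cannot be combined into one call

The examples here use a randomly generated df with row labels 1-10 and column labels a-e.

Row first, then column: there are four equivalent ways: loc[].iloc[], loc[].loc[], iloc[].loc[], iloc[].iloc[]

Column first, then row: df.column_label[row_label], for example df.c[3] (df.column_label returns the column as a Series, which is then indexed by row label)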
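The original screenshot of the random df is missing, so this sketch rebuilds one with numpy's default_rng (an assumption; any random values work) and checks that every access path returns the same element:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(10, 5)),
                  index=range(1, 11), columns=list("abcde"))

v = df.loc[3, "c"]                  # row label 3, column label "c"
same = [
    df.loc[3].loc["c"],             # loc + loc
    df.loc[3].iloc[2],              # loc + iloc ("c" is column position 2)
    df.iloc[2].loc["c"],            # iloc + loc (label 3 is row position 2)
    df.iloc[2].iloc[2],             # iloc + iloc
    df.iloc[2, 2],                  # positions only, in one call
    df.c[3],                        # column first, then row label
    df["c"].loc[3],                 # the same, written explicitly
]
print(v, all(x == v for x in same))
```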

            

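(12) Get multiple elements: put a list inside the square brackets

Multiple rows: df.loc[[row_label1, row_label2]] or df.iloc[[position1, position2]]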
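Passing lists to loc and iloc selects several rows, or a block of rows and columns at once (toy data with row labels 10 to 40):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": [9, 10, 11, 12]},
                  index=[10, 20, 30, 40])

rows = df.loc[[10, 30]]                 # several rows by label
rows_pos = df.iloc[[0, 2]]              # the same rows by position
block = df.loc[[10, 40], ["a", "c"]]    # chosen rows and chosen columns
print(rows, rows_pos, block, sep="\n\n")
```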

      


Origin blog.csdn.net/weixin_63713552/article/details/132216347