Data analysis tool pandas tutorial series (2): Starting from the powerful DataFrame

In the previous article, Data analysis tool pandas tutorial series (1): Starting from Series, I covered Series, one of the underlying pandas data structures, in detail. Today I will talk about the other data structure, DataFrame.


A DataFrame is a tabular data structure made up of an ordered set of columns, and it can be viewed as a dictionary of Series. For example:

    name     sex   course  grade
0    Bob    male     math     99
1  Alice  female  english     92
2    Joe    male  chinese     89
3    Bob    male  chinese     88
4  Alice  female  chinese     95
5    Joe    male  english     93
6    Bob    male  english     95
7  Alice  female     math     79
8    Joe    male     math     89
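To make the "dictionary of Series" view concrete, here is a minimal sketch, assuming pandas is installed; the three-row frame is a trimmed-down version of the table above, made up only for illustration. The column names behave like dictionary keys, and looking one up gives a Series.

```python
import pandas as pd

# Each dict key becomes a column name, each value becomes that column
df = pd.DataFrame({'name': ['Bob', 'Alice', 'Joe'],
                   'grade': [99, 92, 89]})

# Dict-like behaviour: keys() returns the column labels,
# "in" tests column names (not cell values), and a key lookup gives a Series
print(list(df.keys()))      # ['name', 'grade']
print('grade' in df)        # True
print('Bob' in df)          # False, 'Bob' is a value, not a column name
print(type(df['grade']))    # <class 'pandas.core.series.Series'>
```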

Common ways to create a DataFrame

Like a Series, a DataFrame has an index. The difference is that a Series has only one column of values besides its index, whereas a DataFrame usually has many columns. The DataFrame above has four, and each has a name: name, sex, course, grade. These names can be used to index a column, so they are called column names (the column index). To tell the two apart, I prefer to call them the row index and the column index.

There are many ways to create a DataFrame, and there is no need to master every one of them; only two or three are commonly used. I do not intend to go through all of them, as that would look like showing off. Based on my own understanding, I divide them into two categories, creating by column and creating by row, and I will cover the most representative method in each category.

Both examples below create the DataFrame shown above.

Creating by column

import pandas as pd
# No row index is set, so the default (0, 1, 2, ...) is used
df = pd.DataFrame({'name':['Bob','Alice','Joe']*3,
               'sex':['male','female','male']*3,
               'course':['math','english','chinese','chinese','chinese','english','english','math','math'],
               'grade':[99,92,89,88,95,93,95,79,89]})
print(df)


Creating by row

data = [['Bob','male','math',99],
['Alice','female','english',92],
['Joe','male','chinese',89],
['Bob','male','chinese',88],
['Alice','female','chinese',95],
['Joe','male','english',93],
['Bob','male','english',95],
['Alice','female','math',79],
['Joe','male','math',89]]
columns = ['name','sex','course','grade']
df = pd.DataFrame(data=data,columns=columns)
print(df)

The printed result is the same as above.
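A quick way to confirm that the two creation methods really build the same thing is DataFrame.equals(), which compares values, dtypes and labels. This is a minimal sketch on a made-up two-column subset of the data, not the full nine-row example.

```python
import pandas as pd

# By column: a dict mapping column name -> column values
by_col = pd.DataFrame({'name': ['Bob', 'Alice', 'Joe'],
                       'grade': [99, 92, 89]})

# By row: a list of rows plus the list of column names
by_row = pd.DataFrame([['Bob', 99], ['Alice', 92], ['Joe', 89]],
                      columns=['name', 'grade'])

# pandas infers each column's dtype in both cases, so the frames match
print(by_col.equals(by_row))   # True
```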

Basic attributes and overall description of a DataFrame

Attribute    Meaning
df.shape     Number of rows and columns of df
df.index     Row index of df
df.columns   Column index (column names) of df
df.dtypes    Data type of each column of df
df.values    Values of df as a two-dimensional ndarray
print(df.shape,'\n')
print(df.index,'\n')
print(df.columns,'\n')
print(df.dtypes,'\n')
print(df.values,'\n')


Note the data type of each column: pandas infers the data types by itself, so grade is int64 rather than object.
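The inference works column by column, and a single value of a different type is enough to push a column back to the generic object dtype. A minimal sketch with made-up values; astype() is the standard way to convert explicitly once the data is clean (int64 is spelled out because plain int can map to int32 on some platforms).

```python
import pandas as pd

# All integers -> pandas infers int64
df = pd.DataFrame({'name': ['Bob', 'Alice'], 'grade': [99, 92]})
print(df['grade'].dtype)                 # int64

# One string among the numbers -> the column falls back to object
mixed = pd.Series([99, '92'])
print(mixed.dtype)                       # object

# Explicit conversion once every value can be read as an integer
print(mixed.astype('int64').tolist())    # [99, 92]
```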

Function        Effect
df.head(n)      Print the first n rows (default 5)
df.tail(n)      Print the last n rows (default 5)
df.info()       Print an overview: number of rows, the columns, and the non-null count and dtype of each column
df.describe()   Print summary statistics of the numeric columns: count, mean, standard deviation, minimum, quartiles, maximum
print(df.head(),'\n')
print(df.tail(3),'\n')
df.info()  # info() prints by itself and returns None, so don't wrap it in print()
print(df.describe(),'\n')
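Two details of these functions are easy to miss: info() writes its report directly (and returns None), and the std row of describe() is the sample standard deviation (ddof=1), not the variance. A minimal sketch using the grade column from the example; the buf= parameter of info() redirects the report into any writable text buffer.

```python
import io
import pandas as pd

grades = pd.DataFrame({'grade': [99, 92, 89, 88, 95, 93, 95, 79, 89]})

# Capture the info() report instead of printing it
buf = io.StringIO()
result = grades.info(buf=buf)
print(result)                        # None
print('9 non-null' in buf.getvalue())

stats = grades['grade'].describe()
print(stats['count'], stats['mean'], stats['min'], stats['max'])  # 9.0 91.0 79.0 99.0

# describe() reports the sample standard deviation (divides by n - 1)
print(stats['std'], grades['grade'].std(ddof=1))
```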


Querying a DataFrame

loc[] and iloc[]

Readers of pandas tutorial series (1): Starting from Series should already know that the i in iloc[] stands for integer: iloc[] queries only by integer position, whereas loc[] queries by row and column index labels. Besides querying, both can also be used to add and modify data.

To highlight the difference, we first change the row index from 0-8 to 1-9 (a closed interval at both ends here, whereas range() uses a half-open interval, so range(1,10) yields 1 to 9):

df.index = range(1,10)


Suppose we want to change Bob's math score to 100.

With loc[], this is done as follows:

df.loc[1,'grade'] = 100
print(df,'\n')


With iloc[], the corresponding code is:

df.iloc[0,3] = 100
print(df,'\n')

iloc[] works purely by position and has nothing to do with the row or column index labels. That is why I changed the row index beforehand: it makes the difference between the first arguments of iloc[] and loc[] easy to see.
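The closed versus half-open distinction mentioned above also shows up in slicing: a label slice in loc[] includes its end label, while a position slice in iloc[] excludes its end position, just like range(). A minimal sketch on a made-up three-row frame with the row labels starting at 1, as in the article.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Bob', 'Alice', 'Joe'],
                   'grade': [99, 92, 89]})
df.index = range(1, 4)   # row labels 1, 2, 3

# The same cell, addressed by label and by position
print(df.loc[1, 'grade'], df.iloc[0, 1])   # 99 99

# loc[] label slice: the end label 2 is included
print(df.loc[1:2, 'name'].tolist())        # ['Bob', 'Alice']
# iloc[] position slice: the end position 2 is excluded
print(df.iloc[1:2, 0].tolist())            # ['Alice']
```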

Both queries above touch a single cell. In fact, loc[] and iloc[] also support block queries, for example:

print(df.loc[[1,3,9],['name','grade']],'\n')
print(df.iloc[[0,2,8],[0,3]])
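If you have labels but need positions (or want to check that a loc[] query and an iloc[] query select the same block), Index.get_indexer() translates labels into integer positions. A minimal sketch on a made-up four-row frame with row labels 1 to 4.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Bob', 'Alice', 'Joe', 'Bob'],
                   'sex': ['male', 'female', 'male', 'male'],
                   'course': ['math', 'english', 'chinese', 'chinese'],
                   'grade': [99, 92, 89, 88]})
df.index = range(1, 5)

rows = df.index.get_indexer([1, 3])               # positions of row labels 1 and 3
cols = df.columns.get_indexer(['name', 'grade'])  # positions of the two columns
print(rows.tolist(), cols.tolist())               # [0, 2] [0, 3]

# The label-based and position-based block queries select the same block
print(df.loc[[1, 3], ['name', 'grade']].equals(df.iloc[rows, cols]))  # True
```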


Iterating over rows

for index,row in df.iterrows():
	print(index,': ',row['name'],row['sex'],row['course'],row['grade'])
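iterrows() yields each row as a Series, so a row mixing strings and numbers gets the common dtype object. itertuples() is an alternative that yields namedtuples, with the row label in the Index field; it is generally faster than iterrows(). A minimal sketch on made-up data.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Bob', 'Alice'], 'grade': [99, 92]})

# iterrows(): each row is a Series, so mixed columns share dtype object
first_label, first_row = next(df.iterrows())
print(first_row.dtype)   # object

# itertuples(): each row is a namedtuple; fields are accessed as attributes
lines = []
for row in df.itertuples():
    lines.append(f"{row.Index}: {row.name} {row.grade}")
print(lines)             # ['0: Bob 99', '1: Alice 92']
```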


Relationship with Series

You can create a DataFrame from Series:

names = pd.Series(['Bob','Alice','Joe']*3)
sexs = pd.Series(['male','female','male']*3)
courses = pd.Series(['math','english','chinese','chinese','chinese','english','english','math','math'])
grades = pd.Series([99,92,89,88,95,93,95,79,89])
df = pd.DataFrame({'name':names,'sex':sexs,'course':courses,'grade':grades})

Printing the result gives the DataFrame from the beginning of the article. This counts as creating by column, but it is less common than the column method shown earlier.
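One thing to keep in mind when building a DataFrame from Series: the Series are aligned on their index labels, not on their order. A label missing from one Series becomes NaN in that column, which also turns an integer column into float64. A minimal sketch with made-up, deliberately mismatched indexes.

```python
import pandas as pd

# Two Series whose indexes only partly overlap
names = pd.Series(['Bob', 'Alice', 'Joe'], index=[0, 1, 2])
grades = pd.Series([99, 92], index=[1, 2])

# The row index is the union of both indexes; gaps are filled with NaN
df = pd.DataFrame({'name': names, 'grade': grades})
print(df)
print(df['grade'].dtype)   # float64, because of the NaN
```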

A column of a DataFrame can be obtained as a Series with df[column_name]:

print(df['name'],type(df['name']),'\n')
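The bracket form matters here: a single column name returns a Series, while a list of column names (double brackets) returns a DataFrame, even when the list holds just one name. A minimal sketch on made-up data.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Bob', 'Alice'], 'grade': [99, 92]})

col = df['name']     # one label -> Series
sub = df[['name']]   # list of labels -> DataFrame with one column
print(type(col).__name__, type(sub).__name__)   # Series DataFrame
```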


Consequently, every Series operation works on df['name']:

print(df['name'].values,type(df['name'].values),'\n')
print(df['name'].unique(),type(df['name'].unique()),'\n')


Here I want to correct a mistake in my previous article: series.values and series.unique() do not return a list. Although the printed result looks like a list (because __str__() is overridden), it is actually an ndarray, a list-like array object; you can convert it to a list with .tolist().

print(df['name'].values.tolist(),type(df['name'].values.tolist()),'\n')
print(df['name'].unique().tolist(),type(df['name'].unique().tolist()),'\n')


Last time I left out an important Series operation: apply(), which processes the data in a column value by value. It accepts a lambda expression as its argument, or the name of an already defined function (without the parentheses). For example, let's subtract 10 points from everyone's score in every course, then add the 10 points back:

# A lambda expression suits simple processing
df['grade'] = df['grade'].apply(lambda x:x-10)
print(df,'\n')
# A defined function suits more complex processing; this one is only an example
def operate(x):
	return x+10
df['grade'] = df['grade'].apply(operate)
print(df)


Note that apply() returns a new Series, and the result must be assigned to df['grade'] rather than to df; otherwise df would be replaced by the grade column alone.
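For plain arithmetic like this, the column can also be processed as a whole with an ordinary operator, which gives the same result as apply() and is usually faster; apply() is worth it when the per-value logic is more complex. A minimal sketch on made-up data.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Bob', 'Alice'], 'grade': [99, 92]})

by_apply = df['grade'].apply(lambda x: x - 10)   # value by value
by_vector = df['grade'] - 10                     # whole column at once
print(by_apply.equals(by_vector))                # True

# Either way the result is a new Series: assign it to the column, not to df
df['grade'] = by_vector
print(df['grade'].tolist())                      # [89, 82]
```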

Adding and deleting rows or columns

There are quite a few ways to add or delete rows and columns; here I only cover a few commonly used ones.

Rows and columns are deleted with the drop() function:

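# The first argument of drop() is a list of row labels or column labels
# axis=0: delete rows
df.drop([0,7,8],axis=0,inplace=True) # delete everyone's math score
# axis=1: delete columns
df.drop(['sex'],axis=1,inplace=True) # delete everyone's sex information
print(df)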
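Without inplace=True, drop() leaves the original DataFrame untouched and returns a new one. drop() also accepts index= and columns= keywords, so rows and columns can be removed in one call without thinking about axis. A minimal sketch on made-up data.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Bob', 'Alice', 'Joe'],
                   'sex': ['male', 'female', 'male'],
                   'grade': [99, 92, 89]})

# Drop row label 0 and the 'sex' column; df itself is not modified
smaller = df.drop(index=[0], columns=['sex'])
print(smaller.shape, df.shape)   # (2, 2) (3, 3)
```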


As with Series, loc[], at[] and set_value() can be used to set rows: if the row index exists, the row is modified; otherwise a new row is added. Each of the following lines changes Alice's english score to 100:

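# The value need not be a list; any list-like (e.g. a tuple) works
df.loc[1] = ['Alice', 'english', 100]
# at[] is scalar access: it takes a (row, column) pair and sets one cell
df.at[1, 'grade'] = 100
# set_value() was deprecated in pandas 0.21 and removed in pandas 1.0,
# so this line only runs on older versions:
# df.set_value(1, df.columns, ['Alice', 'english', 100], takeable=False)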
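To see both behaviours of loc[] side by side (overwriting an existing row and adding a new one), here is a minimal sketch on a made-up frame with row labels 1 and 2; adding a row through a new label is what pandas calls "setting with enlargement".

```python
import pandas as pd

df = pd.DataFrame({'name': ['Bob', 'Alice'],
                   'course': ['math', 'english'],
                   'grade': [99, 92]}, index=[1, 2])

# Existing label 2: the row is overwritten
df.loc[2] = ['Alice', 'english', 100]
# New label 3: a row is appended
df.loc[3] = ['Joe', 'chinese', 89]
print(df.shape)   # (3, 3)

# at[] then changes a single cell of the new row
df.at[3, 'grade'] = 90
print(df)
```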

A new column can be added with df[column_name] = iterable or df.loc[:, column_name] = iterable. For example, to add a grade level column, where a score below 60 is a fail, 60-89 is good, and 90-100 is excellent:

level = []
for grade in df['grade'].values.tolist():
	if grade < 60:
		level.append('fail')
	elif grade < 90:
		level.append('good')
	else:
		level.append('excellent')
df['level'] = level
print(df)
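The same binning can be done without an explicit loop using pd.cut(), which assigns each value to an interval and labels it. This is an alternative technique, not the method used above; right=False makes each bin closed on the left, [-inf, 60), [60, 90), [90, inf), which matches the if/elif thresholds. The scores are made up to hit each boundary.

```python
import math
import pandas as pd

grades = pd.Series([55, 60, 89, 90, 100])

levels = pd.cut(grades,
                bins=[-math.inf, 60, 90, math.inf],
                right=False,   # intervals include their left edge
                labels=['fail', 'good', 'excellent'])
print(levels.astype(str).tolist())
# ['fail', 'good', 'good', 'excellent', 'excellent']
```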


With that, both basic pandas data structures have been covered. Next time I will talk about the various pandas functions for reading and writing files.



Source: blog.csdn.net/ygdxt/article/details/104214096