The pandas DataFrame data object

Creating a DataFrame

    A DataFrame is a two-dimensional, table-like data object. It can be viewed as a dictionary of Series that all share one common row index.

pandas can read a CSV file with the read_csv function.

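import pandas as pd
# Build a DataFrame from a dictionary; each key becomes a column
a = pd.DataFrame({"one":[1,2,3],"two":[4,5,6]})
print(a)
# Pass index= to set custom row labels
b = pd.DataFrame({"one":[1,2,3],"two":[4,5,6]}, index=['a','b','c'])
print(b)
# Read a CSV file into a DataFrame (example path; the file must exist)
c=pd.read_csv("d:/abc.csv")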
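
The "dictionary of Series" view described above can be sketched as follows. This is only an illustration: the variable names and the CSV text are assumptions, and an in-memory string buffer stands in for a file on disk so that the example runs anywhere.

```python
import io
import pandas as pd

# Two Series that share the same row index (illustrative data)
s_one = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s_two = pd.Series([4, 5, 6], index=['a', 'b', 'c'])

# Each dictionary key becomes a column; the shared index becomes the row index
df = pd.DataFrame({"one": s_one, "two": s_two})
print(df)

# read_csv also accepts a file-like object, so a string buffer replaces a file here
csv_text = "one,two\n1,4\n2,5\n3,6\n"
from_csv = pd.read_csv(io.StringIO(csv_text))
print(from_csv)
```

When no index column is given, read_csv assigns the default integer row index 0, 1, 2, and so on.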

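DataFrame attributes

    Commonly used attributes are index (the row labels), columns (the column labels), values (the data as a NumPy array) and T (the transpose). The describe() method returns summary statistics for each column.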

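a = pd.DataFrame({"one":[1,2,3],"two":[4,5,6]})
print(a)
b = pd.DataFrame({"one":[1,2,3],"two":[4,5,6]}, index=['a','b','c'])
print(b)
# index: the row labels
print(a.index)
print(b.index)
# values: the underlying data as a NumPy array
print(a.values)
print(b.values)
# T: the transpose, with rows and columns swapped
print(a.T)
# columns: the column labels
print(a.columns)
# describe is a method: without parentheses this prints the method object itself
print(a.describe)
# Calling describe() returns count, mean, std, min, quartiles and max per column
print(a.describe())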
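
As a small self-contained sketch of the same attributes, the following stores each result in a variable so that its shape and contents can be inspected. The variable names are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3], "two": [4, 5, 6]}, index=['a', 'b', 'c'])

row_labels = list(df.index)      # row labels
col_labels = list(df.columns)    # column labels
data = df.values                 # NumPy array with 3 rows and 2 columns
transposed = df.T                # rows and columns swapped
stats = df.describe()            # summary statistics, one column per original column

print(row_labels, col_labels)
print(data.shape, transposed.shape)
print(stats.loc["mean"])
```

After the transpose, the former column labels become the row labels and vice versa.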

DataFrame indexing and slicing

    When data is accessed with plain square brackets, the column is selected first and the row second. The recommended approach is to use loc and iloc. loc accesses data by label.

iloc accesses data by integer position. The row and column indexers can each be a regular (single) index, a slice, a Boolean index, or a fancy index (a list of labels or positions).

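a = pd.DataFrame({"one":[1,2,3],"two":[4,5,6],"three":[7,8,9]})
print(a)
# Plain brackets select the column first, then the row
print(a["one"][0])
# loc takes the row label first, then the column label
print(a.loc[0,"one"])
# A single row label returns that row as a Series
print(a.loc[0])
# A list of row labels with ":" selects those rows and all columns
print(a.loc[[0,2],:])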
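
The example above uses only loc. The sketch below, using illustrative string row labels so that labels and positions are easy to tell apart, shows iloc alongside loc with the four kinds of indexer mentioned earlier.

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3], "two": [4, 5, 6], "three": [7, 8, 9]},
                  index=['a', 'b', 'c'])

by_label = df.loc['b', 'two']           # label-based: row 'b', column 'two'
by_position = df.iloc[1, 1]             # position-based: second row, second column
label_slice = df.loc['a':'b', 'one']    # label slices include the end label
pos_slice = df.iloc[0:2, 0]             # position slices exclude the end position
filtered = df.loc[df['one'] > 1]        # Boolean index: rows where 'one' > 1
fancy = df.iloc[[0, 2], [0, 2]]         # fancy index: lists of row and column positions

print(by_label, by_position)
print(label_slice)
print(pos_slice)
print(filtered)
print(fancy)
```

Note the difference between the two slices: 'a':'b' with loc and 0:2 with iloc both return the first two rows, because label slices are inclusive while position slices are not.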

DataFrame missing data handling and data alignment

    When DataFrames are combined in an operation, they are aligned first: both the row labels and the column labels are matched.

Missing data can be filled with the fillna function, and rows that contain NaN can be deleted with dropna.

    c.dropna(how="all") deletes only the rows in which every value is NaN.
    c.dropna(axis=1) deletes the columns that contain NaN.

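import numpy as np
a = pd.DataFrame({"one":[1,2,3],"two":[4,5,3],"three":[7,8,2]},index=['a','b','c'])
b = pd.DataFrame({"two":[5,2,8],"one":[4,1,6],"three":[7,3,5]},index=['a','c','b'])
print(a)
print(b)
# Addition aligns on row and column labels, not on position;
# convert to float because integer columns cannot hold NaN
c=(a+b).astype(float)
print(c)
c.loc["a","three"] = np.nan
print(c)
# dropna returns a new DataFrame, so print the result to see it
print(c.dropna())            # drop rows that contain any NaN
print(c.dropna(how="all"))   # drop only rows in which every value is NaN
print(c.dropna(axis=1))      # drop columns that contain any NaN
# Fill the remaining NaN values with 0, modifying c in place
c.fillna(0, inplace=True)
print(c)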
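
In the example above both frames carry the same labels, so alignment alone produces no NaN. The following sketch, with made-up data, shows what happens when the labels only partly overlap, and how the dropna variants differ on the result.

```python
import pandas as pd

nan = float("nan")
x = pd.DataFrame({"one": [1, 2, 3], "two": [4, 5, 6]}, index=['a', 'b', 'c'])
y = pd.DataFrame({"one": [10, 20], "two": [30.0, nan]}, index=['a', 'b'])

# Row 'c' exists only in x, so every value in that row of the sum is NaN
total = x + y
print(total)

rows_any = total.dropna()                # keeps only rows without any NaN
rows_all = total.dropna(how="all")       # drops only the all-NaN row 'c'
cols_clean = rows_all.dropna(axis=1)     # then drops columns that still contain NaN
filled = total.fillna(0)                 # replaces every NaN with 0

print(rows_any)
print(rows_all)
print(cols_clean)
print(filled)
```

None of these calls modifies total; each returns a new DataFrame unless inplace=True is passed.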

Common DataFrame methods

  The mean() method computes the average. By default it averages each column; the axis parameter selects the direction, where axis=0 gives one result per column and axis=1 gives one result per row.

The sum() method computes the sum. sort_values() sorts by value, and sort_index() sorts by index.

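a = pd.DataFrame({"one":[1,2,3],"two":[4,5,3],"three":[7,8,2]},index=['a','b','c'])
b = pd.DataFrame({"two":[5,2,8],"one":[4,1,6],"three":[7,3,5]},index=['a','c','b'])
print(a)
print(b)
# Mean of each column (default), then mean of each row
print(a.mean())
print(a.mean(axis = 1))
# Sum of each column (default), then sum of each row
print(b.sum())
print(b.sum(axis = 1))
# Sort rows by the values in column 'two', ascending and then descending
print(a.sort_values(by='two'))
print(a.sort_values(by='two', ascending=False))
# Sort by row labels, then sort the columns by label in descending order
print(b.sort_index())
print(b.sort_index(axis=1, ascending=False))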
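
To make the effect of axis and of the two sort methods easier to check, here is a short sketch that keeps each result in a variable. The data and the deliberately unsorted row labels are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3], "two": [4, 5, 3], "three": [7, 8, 2]},
                  index=['c', 'a', 'b'])

col_means = df.mean()                  # axis=0 (default): one value per column
row_means = df.mean(axis=1)            # axis=1: one value per row
col_sums = df.sum()                    # column totals
by_value = df.sort_values(by="two")    # rows ordered by column 'two', ascending
by_index = df.sort_index()             # rows ordered by row label

print(col_means)
print(row_means)
print(col_sums)
print(by_value)
print(by_index)
```

Both sort methods return a new DataFrame and leave df unchanged.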

 

 


Origin blog.csdn.net/chenzhanhai/article/details/104620986