Summary of pandas index lookup, sorting, and deduplication

Because pandas indexing is fairly complex and easy to mix up in practice, this post summarizes how to look up, sort, and deduplicate data by index.

import pandas as pd
import numpy as np
# Define the DataFrame (named "data" to avoid shadowing the built-in dict)
data={'a':[1,2,3],'b':[4,5,6],'c':[7,8,9]}
df=pd.DataFrame(data,index=['one','two','three'])
df
       a  b  c
one    1  4  7
two    2  5  8
three  3  6  9

1. Looking up Series data by index

You can use either index labels or numeric positions.

s1=df['b']
s1['two']
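s1[['two','one']]  # list discrete labels in a list, enclosed in [ ]
s1['two':'three']  # label slice: the right endpoint is included
s1[0:2]            # positional slice: the right endpoint is excluded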
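The difference between the two slice styles can be checked with a short sketch. It rebuilds the b column as a standalone Series and uses .iloc for the positional slice, which is a more explicit alternative to s1[0:2]; the variable names by_label and by_position are only illustrative.

```python
import pandas as pd

# Same values as df['b'] in the example above, built directly as a Series.
s1 = pd.Series([4, 5, 6], index=['one', 'two', 'three'], name='b')

by_label = s1['two':'three']   # label slice: both endpoints are included
by_position = s1.iloc[0:2]     # positional slice: the end position is excluded

print(by_label.tolist())           # [5, 6]
print(by_position.index.tolist())  # ['one', 'two']
```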

2. Looking up DataFrame data by index

(1) Select columns directly by column label

df['b']
df[['b','c']]  # list discrete labels in a list, enclosed in [ ]
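One detail worth checking is the return type: a single label gives a Series, while a list of labels gives a DataFrame. The sketch below assumes the same df as above; the names single and multiple are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]},
                  index=['one', 'two', 'three'])

single = df['b']           # one label -> Series
multiple = df[['b', 'c']]  # list of labels -> DataFrame

print(type(single).__name__)    # Series
print(type(multiple).__name__)  # DataFrame
print(list(multiple.columns))   # ['b', 'c']
```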

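(2) .loc selects data by label

  • Write the row labels first, then the column labels
  • To select several discrete rows or columns, pass a list enclosed in []
  • A slice is written without the enclosing []. Note that a label slice is closed on the right, so the end label is included
df.loc[['two','one']]   # select multiple rows; row labels given as a list
df.loc['two':'three']   # select multiple rows; row labels given as a slice
df.loc[:,['b','a']]     # select multiple columns for all rows; column labels given as a list
df.loc[:,'a':'b']       # select multiple columns for all rows; the slice must follow the column order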
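The .loc rules can be tried out in a small sketch, assuming the same df as above. It also shows that a column slice written against the column order, such as 'b':'a', comes back with no columns rather than raising an error. The result names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]},
                  index=['one', 'two', 'three'])

# Lists keep the order you ask for: rows two, one and columns b, a.
picked = df.loc[['two', 'one'], ['b', 'a']]

# Label slices include both endpoints.
sliced = df.loc['two':'three', 'a':'b']

# A slice that runs against the column order selects nothing.
reversed_slice = df.loc[:, 'b':'a']

print(picked.values.tolist())  # [[5, 2], [4, 1]]
print(sliced.values.tolist())  # [[2, 5], [3, 6]]
print(reversed_slice.shape)    # (3, 0)
```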

(3) .iloc selects data by integer position

  • Write the row position first, then the column position
  • A single value or a slice can be used. Note: a positional slice is open on the right, so the end position is excluded
df.iloc[1:3,1:3]
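As a quick check of the positional rules, here is a minimal sketch assuming the same df as above; block and cell are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]},
                  index=['one', 'two', 'three'])

# Rows at positions 1 and 2, columns at positions 1 and 2 (position 3 is excluded).
block = df.iloc[1:3, 1:3]

# A single row position and a single column position return one scalar.
cell = df.iloc[0, 2]

print(block.index.tolist())   # ['two', 'three']
print(list(block.columns))    # ['b', 'c']
print(block.values.tolist())  # [[5, 8], [6, 9]]
print(cell)                   # 7
```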

3. Converting between the index and columns

(1) Column -> index

  • The inplace parameter controls whether the original data is overwritten
df1=df.set_index('a', inplace=False)  # inplace=False (the default) returns a new DataFrame; df is unchanged
df1
df.set_index('a', inplace=True)  # inplace=True modifies df directly and returns None
df
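The two inplace modes are easy to confuse, so a small sketch may help. It assumes the same starting df as above and runs the non-inplace call first, because once column a has been moved into the index a second set_index('a') would raise a KeyError. The names df1 and ret are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]},
                  index=['one', 'two', 'three'])

# Default (inplace=False): a new DataFrame is returned, df keeps column 'a'.
df1 = df.set_index('a')
columns_before = list(df.columns)

# inplace=True: df itself is changed and the call returns None.
ret = df.set_index('a', inplace=True)

print(columns_before)     # ['a', 'b', 'c']
print(df1.index.name)     # a
print(ret)                # None
print(list(df.columns))   # ['b', 'c']
```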

(2) Index -> column

  • The index is moved back into the DataFrame as its first column
df.reset_index(inplace=True) 
df
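A minimal sketch of the reverse direction, assuming a DataFrame whose index is named a. It also shows the drop=True option of reset_index, which discards the old index instead of turning it into a column; that option is not used in the original post and is included only for comparison.

```python
import pandas as pd

df = pd.DataFrame({'b': [4, 5, 6], 'c': [7, 8, 9]},
                  index=pd.Index([1, 2, 3], name='a'))

# The named index comes back as the first column; a default 0..n-1 index replaces it.
restored = df.reset_index()

# drop=True throws the old index away instead of keeping it as a column.
dropped = df.reset_index(drop=True)

print(list(restored.columns))   # ['a', 'b', 'c']
print(restored.index.tolist())  # [0, 1, 2]
print(list(dropped.columns))    # ['b', 'c']
```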

4. Sorting by index

  • Sort by index in ascending order without overwriting the original data; missing index values are placed first
df.sort_index(ascending=True, inplace=False, na_position='first')
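To see na_position at work, the index needs a missing value, which the df above does not have. The sketch below therefore assumes a small hypothetical frame with a float index containing NaN; the column name x is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a float index with one missing label.
df = pd.DataFrame({'x': [20, 0, 10]}, index=[2.0, np.nan, 1.0])

# Ascending sort, missing index values first; df itself is left untouched.
nan_first = df.sort_index(ascending=True, na_position='first')

# Descending sort, missing index values last.
nan_last = df.sort_index(ascending=False, na_position='last')

print(nan_first['x'].tolist())  # [0, 10, 20]
print(nan_last['x'].tolist())   # [20, 10, 0]
print(df['x'].tolist())         # [20, 0, 10]
```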

5. Deduplicating data by index

  • keep='first' or keep='last' chooses whether the first or the last occurrence of each duplicated index label is kept
df=df[~df.index.duplicated(keep='first')]
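The effect of keep is easiest to see on an index that actually repeats. The sketch below assumes a hypothetical frame with the label x appearing twice; the column name v is made up for illustration.

```python
import pandas as pd

# Hypothetical data: the label 'x' appears at positions 0 and 2.
df = pd.DataFrame({'v': [1, 2, 3, 4]}, index=['x', 'y', 'x', 'z'])

# duplicated() marks the occurrences to drop; ~ inverts the mask to keep the rest.
mask_first = df.index.duplicated(keep='first')
keep_first = df[~mask_first]
keep_last = df[~df.index.duplicated(keep='last')]

print(mask_first.tolist())       # [False, False, True, False]
print(keep_first['v'].tolist())  # [1, 2, 4]
print(keep_last['v'].tolist())   # [2, 3, 4]
```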

Origin www.cnblogs.com/laiyaling/p/11793938.html