Summary of Indexing Operations on a pandas DataFrame

For new pandas users, DataFrame indexing can be confusing. This post walks through each indexing form in detail and closes with a summary of what the exploration shows.

import pandas as pd

import numpy as np

df=pd.DataFrame(np.arange(16).reshape(4,4),index=['Ohio','Colorado','Utah','New York'],columns=['one','two','three','four']);df

	one	two	three	four
Ohio	0	1	2	3
Colorado	4	5	6	7
Utah	8	9	10	11
New York	12	13	14	15

(1) df[val]

When val is a single column label, df[val] selects that column and returns a Series.

df['one']

Ohio         0
Colorado     4
Utah         8
New York    12
Name: one, dtype: int32

When val is a list of column labels, df[val] selects those columns and returns a DataFrame.

df[['one','two']]

	one	two
Ohio	0	1
Colorado	4	5
Utah	8	9
New York	12	13
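The difference between a single label and a list can be checked directly. The following is a minimal sketch that rebuilds the same frame so it runs on its own; the variable names `single`, `as_frame` and `several` are only illustrative.

```python
import numpy as np
import pandas as pd

# Same frame as in the text, rebuilt so the snippet is self-contained.
df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

single = df["one"]            # one label -> Series
as_frame = df[["one"]]        # one-element list -> DataFrame with one column
several = df[["one", "two"]]  # list of labels -> DataFrame

print(type(single).__name__)    # Series
print(type(as_frame).__name__)  # DataFrame
print(several.shape)            # (4, 2)
```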

When val is a slice such as :num, df[val] selects rows instead of columns. This is a convenience: it is equivalent to df.iloc[:num], which is dedicated to selecting rows by position.

df[:2]

	one	two	three	four
Ohio	0	1	2	3
Colorado	4	5	6	7

df.iloc[:2] # same as above

	one	two	three	four
Ohio	0	1	2	3
Colorado	4	5	6	7

df[1:3]

	one	two	three	four
Colorado	4	5	6	7
Utah	8	9	10	11

df.iloc[1:3]

	one	two	three	four
Colorado	4	5	6	7
Utah	8	9	10	11
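The equivalence between the slice form and df.iloc can be confirmed with DataFrame.equals. The sketch below also shows a related pitfall: a bare integer is not a slice, so df[1] is looked up as a column label and fails on this frame. The names `by_getitem`, `by_iloc` and `bare_int_error` are illustrative.

```python
import numpy as np
import pandas as pd

# Same frame as in the text, rebuilt so the snippet is self-contained.
df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

by_getitem = df[1:3]     # integer slice on df[...] selects rows by position
by_iloc = df.iloc[1:3]   # explicit positional row selection
same = by_getitem.equals(by_iloc)
print(same)  # True

# A bare integer is treated as a column label, not as a row position.
try:
    df[1]
    bare_int_error = None
except KeyError as exc:
    bare_int_error = exc
print(type(bare_int_error).__name__)  # KeyError
```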

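When val is a boolean DataFrame, df[val] keeps the values where val is True and replaces the rest with NaN. Assigning to df[val] sets only the values where val is True.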

df<5

	one	two	three	four
Ohio	True	True	True	True
Colorado	True	False	False	False
Utah	False	False	False	False
New York	False	False	False	False

df[df<5]

	one	two	three	four
Ohio	0.0	1.0	2.0	3.0
Colorado	4.0	NaN	NaN	NaN
Utah	NaN	NaN	NaN	NaN
New York	NaN	NaN	NaN	NaN

df[df<5]=0;df

	one	two	three	four
Ohio	0	0	0	0
Colorado	0	5	6	7
Utah	8	9	10	11
New York	12	13	14	15
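As a sketch of what the boolean form does, the snippet below compares df[mask] with DataFrame.where and performs the assignment on a copy so the original frame stays intact. The last line uses a boolean Series, which is not covered in the text above and is added here only for contrast: it filters whole rows. All variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Same frame as in the text, rebuilt so the snippet is self-contained.
df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

mask = df < 5                  # boolean DataFrame with the same shape as df

selected = df[mask]            # values kept where True, NaN elsewhere
same_as_where = selected.equals(df.where(mask))
print(same_as_where)           # True

updated = df.copy()            # work on a copy so df is left untouched
updated[mask] = 0              # writes only where mask is True

big_rows = df[df["one"] > 4]   # boolean Series: keeps whole rows
print(list(big_rows.index))    # ['Utah', 'New York']
```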

(2) df.loc[val]

When val is a single index label, df.loc[val] selects the corresponding row and returns a Series. When val is a list of index labels, it selects the corresponding rows and returns a DataFrame.

df.loc['Colorado']

one      0
two      5
three    6
four     7
Name: Colorado, dtype: int32

df.loc[['Colorado','New York']]

	one	two	three	four
Colorado	0	5	6	7
New York	12	13	14	15
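A short sketch of what the returned objects look like: the row comes back as a Series whose index is the column labels and whose name is the row label. The frame is rebuilt from scratch here, so it does not include the earlier df[df<5]=0 assignment, and "Texas" is a made-up label used only to show the failure mode.

```python
import numpy as np
import pandas as pd

# Fresh copy of the frame from the text (without the earlier zeroing step).
df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

row = df.loc["Colorado"]                 # Series: one entry per column
rows = df.loc[["Colorado", "New York"]]  # DataFrame: rows in the order listed

print(row.name)         # Colorado
print(list(row.index))  # ['one', 'two', 'three', 'four']
print(rows.shape)       # (2, 4)

# A label that is not in the index raises KeyError.
try:
    df.loc["Texas"]     # hypothetical label, not present in df
    missing = None
except KeyError as exc:
    missing = exc
print(type(missing).__name__)  # KeyError
```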

(3) df.loc[:,val]

When val is a single column label, df.loc[:,val] selects the corresponding column and returns a Series. When val is a list of column labels, it selects the corresponding columns and returns a DataFrame.

df.loc[:,'two']

Ohio         0
Colorado     5
Utah         9
New York    13
Name: two, dtype: int32

df.loc[:,['two']] # Note: a list always returns a DataFrame, even if it contains just one element.

	two
Ohio	0
Colorado	5
Utah	9
New York	13

df.loc[:,['one','two']]

	one	two
Ohio	0	0
Colorado	0	5
Utah	8	9
New York	12	13

df[['one','two']] # same as df.loc[:,['one','two']] above

	one	two
Ohio	0	0
Colorado	0	5
Utah	8	9
New York	12	13
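That df.loc[:,val] and df[val] give the same result for column selection can be verified rather than compared by eye. This is a minimal sketch on a rebuilt frame; `same_series` and `same_frame` are illustrative names.

```python
import numpy as np
import pandas as pd

# Same frame as in the text, rebuilt so the snippet is self-contained.
df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# Single label: both forms return the same Series.
same_series = df.loc[:, "two"].equals(df["two"])

# List of labels: both forms return the same DataFrame.
same_frame = df.loc[:, ["one", "two"]].equals(df[["one", "two"]])

print(same_series, same_frame)  # True True
```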

(3) df.loc[val1,val2]

val1 may be a single index label or a list of index labels, and val2 may be a single column label or a list of column labels. df.loc[val1,val2] selects the data at the intersection of val1 and val2. Either val1 or val2 (or both) can also be : to select everything along that axis.

df.loc['Ohio','one']

df.loc[['Ohio','Utah'],'one']

Ohio    0
Utah    8
Name: one, dtype: int32

df.loc['Ohio',['one','two']]

one    0
two    0
Name: Ohio, dtype: int32

df.loc[['Ohio','Utah'],['one','two']]

	one	two
Ohio	0	0
Utah	8	9

df.loc[:,:]

	one	two	three	four
Ohio	0	0	0	0
Colorado	0	5	6	7
Utah	8	9	10	11
New York	12	13	14	15

df.loc['Ohio',:]

one      0
two      0
three    0
four     0
Name: Ohio, dtype: int32

df.loc[:,'two']

Ohio         0
Colorado     5
Utah         9
New York    13
Name: two, dtype: int32

df.loc[:,['one','two']]

	one	two
Ohio	0	0
Colorado	0	5
Utah	8	9
New York	12	13
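The return type depends on which of val1 and val2 is a single label. The sketch below tabulates this for the combinations shown above; `kind` is a small helper written for this example, not a pandas function.

```python
import numpy as np
import pandas as pd

# Same frame as in the text, rebuilt so the snippet is self-contained.
df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)


def kind(result):
    """Name the container a selection came back as (helper for this example)."""
    if isinstance(result, pd.DataFrame):
        return "DataFrame"
    if isinstance(result, pd.Series):
        return "Series"
    return "scalar"


kinds = {
    "df.loc['Ohio', 'one']": kind(df.loc["Ohio", "one"]),
    "df.loc[['Ohio', 'Utah'], 'one']": kind(df.loc[["Ohio", "Utah"], "one"]),
    "df.loc['Ohio', ['one', 'two']]": kind(df.loc["Ohio", ["one", "two"]]),
    "df.loc[['Ohio', 'Utah'], ['one', 'two']]": kind(
        df.loc[["Ohio", "Utah"], ["one", "two"]]
    ),
    "df.loc['Ohio', :]": kind(df.loc["Ohio", :]),
    "df.loc[:, 'two']": kind(df.loc[:, "two"]),
    "df.loc[:, :]": kind(df.loc[:, :]),
}

for expression, result_kind in kinds.items():
    print(f"{expression:42} -> {result_kind}")
```

Two single labels give a scalar, one single label gives a Series, and no single label gives a DataFrame.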

(4) df.iloc[val]

df.iloc[val] works like df.loc[val], except that val must be an integer or a list of integers giving row positions rather than row labels.

df.iloc[1]

one      0
two      5
three    6
four     7
Name: Colorado, dtype: int32

df.iloc[[1,3]]

	one	two	three	four
Colorado	0	5	6	7
New York	12	13	14	15
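Because df.iloc works on positions, it follows the usual Python conventions for integers. The sketch below, on a rebuilt frame, shows two details not listed above: negative positions count from the end, and a position past the last row raises IndexError. Variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Same frame as in the text, rebuilt so the snippet is self-contained.
df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

last_row = df.iloc[-1]             # negative positions count from the end
first_and_last = df.iloc[[0, -1]]  # lists may mix positive and negative positions

print(last_row.name)               # New York
print(list(first_and_last.index))  # ['Ohio', 'New York']

# Only positions 0..3 exist, so position 4 is out of bounds.
try:
    df.iloc[4]
    out_of_bounds = None
except IndexError as exc:
    out_of_bounds = exc
print(type(out_of_bounds).__name__)  # IndexError
```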

(5) df.iloc[:,val]

This works like df.loc[:,val], except that val must be an integer or a list of integers giving column positions.

df

	one	two	three	four
Ohio	0	0	0	0
Colorado	0	5	6	7
Utah	8	9	10	11
New York	12	13	14	15

df.iloc[:,1]

Ohio         0
Colorado     5
Utah         9
New York    13
Name: two, dtype: int32

df.iloc[:,[1,3]]

	two	four
Ohio	0	0
Colorado	5	7
Utah	9	11
New York	13	15

(6) df.iloc[val1,val2]

This works like df.loc[val1,val2], except that val1 and val2 must be integers or lists of integers.

df.iloc[1,2]

df.iloc[1,[1,2,3]]

two      5
three    6
four     7
Name: Colorado, dtype: int32

df.iloc[[1,2],2]

Colorado     6
Utah        10
Name: three, dtype: int32

df.iloc[[1,2],[1,2]]

	two	three
Colorado	5	6
Utah	9	10

df.iloc[:,[1,2]]

	two	three
Ohio	0	0
Colorado	5	6
Utah	9	10
New York	13	14

df.iloc[[1,2],:]

	one	two	three	four
Colorado	0	5	6	7
Utah	8	9	10	11
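To see how the label-based and position-based forms line up, labels can be translated to positions with Index.get_indexer and the two selections compared. This is a sketch on a rebuilt frame; `row_pos` and `col_pos` are illustrative names.

```python
import numpy as np
import pandas as pd

# Same frame as in the text, rebuilt so the snippet is self-contained.
df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# Translate labels into integer positions.
row_pos = df.index.get_indexer(["Colorado", "Utah"])  # positions 1 and 2
col_pos = df.columns.get_indexer(["two", "three"])    # positions 1 and 2

by_position = df.iloc[row_pos, col_pos]
by_label = df.loc[["Colorado", "Utah"], ["two", "three"]]

print(by_position.equals(by_label))  # True
```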

(7) df.at[val1,val2]

val1 must be a single index label and val2 must be a single column label.

df.at['Utah','one']

df.loc['Utah','one'] # same as above

df.at[['Utah','Colorado'],'one'] # raises an exception

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

D:\Anaconda\lib\site-packages\pandas\core\frame.py in _get_value(self, index, col, takeable)
   2538         try:
-> 2539             return engine.get_value(series._values, index)
   2540         except (TypeError, ValueError):


pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()


pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()


pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()


TypeError: '['Utah', 'Colorado']' is an invalid key


During handling of the above exception, another exception occurred:


TypeError                                 Traceback (most recent call last)

<ipython-input-77-c52a9db91739> in <module>()
----> 1 df.at[['Utah','Colorado'],'one']


D:\Anaconda\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
   2140 
   2141         key = self._convert_key(key)
-> 2142         return self.obj._get_value(*key, takeable=self._takeable)
   2143 
   2144     def __setitem__(self, key, value):


D:\Anaconda\lib\site-packages\pandas\core\frame.py in _get_value(self, index, col, takeable)
   2543             # use positional
   2544             col = self.columns.get_loc(col)
-> 2545             index = self.index.get_loc(index)
   2546             return self._get_value(index, col, takeable=True)
   2547     _get_value.__doc__ = get_value.__doc__


D:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3076                                  'backfill or nearest lookups')
   3077             try:
-> 3078                 return self._engine.get_loc(key)
   3079             except KeyError:
   3080                 return self._engine.get_loc(self._maybe_cast_indexer(key))


pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()


pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()


TypeError: '['Utah', 'Colorado']' is an invalid key
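The failure above can be reproduced in a controlled way. The traceback shows a TypeError for the pandas version used in this post; the exact exception class may differ in other versions, so this sketch catches Exception broadly and only records what was raised. For several rows, df.loc is the form to use. The frame is rebuilt from scratch, so it does not include the earlier df[df<5]=0 assignment.

```python
import numpy as np
import pandas as pd

# Fresh copy of the frame from the text (without the earlier zeroing step).
df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

value = df.at["Utah", "one"]  # scalar lookup by label
print(value)                  # 8

# A list is not a valid key for .at; the exception class depends on the
# pandas version, so it is caught broadly here.
try:
    df.at[["Utah", "Colorado"], "one"]
    at_error = None
except Exception as exc:
    at_error = exc
print(type(at_error).__name__)

# .loc accepts the list and returns a Series.
several = df.loc[["Utah", "Colorado"], "one"]
print(several.tolist())       # [8, 4]
```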

(8) df.iat[val1,val2]

This works like df.at, except that val1 and val2 must both be integer positions.

df.iat[2,2]

df

	one	two	three	four
Ohio	0	0	0	0
Colorado	0	5	6	7
Utah	8	9	10	11
New York	12	13	14	15
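The three ways of reaching a single cell can be compared side by side. The sketch below, on a rebuilt frame, also shows that the scalar accessors support assignment; the write is done on a copy, and all variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Same frame as in the text, rebuilt so the snippet is self-contained.
df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

by_iat = df.iat[2, 2]                      # positions: third row, third column
by_iloc = df.iloc[2, 2]                    # same cell through .iloc
by_at = df.at[df.index[2], df.columns[2]]  # same cell via labels 'Utah', 'three'
print(by_iat, by_iloc, by_at)              # 10 10 10

updated = df.copy()          # keep the original frame unchanged
updated.iat[0, 0] = 100      # scalar accessors can also set a value
print(updated.at["Ohio", "one"])  # 100
```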

Conclusion

In df[val], val can be a single column label or a list of column labels, which selects whole columns. As a special case, val can be a slice such as :num, which selects the corresponding rows. It can also be a boolean DataFrame, which selects values or, on assignment, sets them.
df.loc[val] is mainly used to select rows or a combination of rows and columns. val can be a single row label, a list of row labels, or a pair val1,val2, where each of val1 and val2 is a single label, a list of labels, or :. In the pair form it selects the cells where the index labels in val1 meet the column labels in val2.
df.iloc[val] works like df.loc[val], except that val must be given as integer positions, either a single integer or a list of integers.
df.at[val1,val2] accepts only single values, and the same applies to df.iat[val1,val2].
