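1. Characteristics of the two
- loc accepts labels (strings or integers) and boolean arrays as indexers; it is label-based indexing.
Note: an integer here is interpreted as a label of the index, not as a position along the index.
- iloc is position-based indexing, similar to list indexing: it takes integer positions (a boolean array is also accepted).
Note: an integer here is interpreted as a position along the index; slices are half-open (start included, stop excluded).
loc stands for location, and the i in iloc stands for integer.
In plain terms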
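- Indexing by index label or column label:
df.loc["Adam", "Age"] # returns the value in df where the index label is "Adam" and the column label is "Age";
df.loc["Adam"] # returns the whole row whose index label is "Adam" as a Series; the Series index is df's columns and its values are the row's values.
- Indexing by position in df:
df.iloc[2, 3] # returns the value at row position 2 and column position 3 (the third row, fourth column);
df.iloc[1:5, 3:6] # returns the rows at positions 1 to 4 and the columns at positions 3 to 5, as a DataFrame.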
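The label-versus-position distinction is easiest to see when the index labels are integers that do not match the row positions. The sketch below uses a small hypothetical frame (the column names and values are made up for illustration) and is not taken from the original post.

```python
import pandas as pd

# Hypothetical frame: integer index labels 10, 20, 30 sit at positions 0, 1, 2.
df = pd.DataFrame(
    {"Age": [31, 25, 47], "Height": [1.70, 1.82, 1.65]},
    index=[10, 20, 30],
)

by_label = df.loc[10, "Age"]     # 10 is an index label, "Age" a column label
by_position = df.iloc[0, 0]      # first row, first column

row_slice_label = df.loc[10:20]  # label slice: both endpoints are included
row_slice_pos = df.iloc[0:1]     # position slice: the stop is excluded

# There is no label 0 in the index, so loc raises KeyError,
# even though a row exists at position 0.
try:
    df.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False

print(by_label, by_position)
print(len(row_slice_label), len(row_slice_pos), label_zero_found)
```

Here `df.loc[10:20]` returns two rows while `df.iloc[0:1]` returns one, which is the inclusive-versus-half-open difference noted above.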
2. From the official documentation
Access a group of rows and columns by label(s) or a boolean array.
.loc[] is primarily label based, but may also be used with a boolean array. Allowed inputs are:
- A single label, e.g. 5 or 'a' (note that 5 is interpreted as a label of the index, and never as an integer position along the index).
- A list or array of labels, e.g. ['a', 'b', 'c'].
- A slice object with labels, e.g. 'a':'f'.
Warning: Note that contrary to usual python slices, both the start and the stop are included.
- A boolean array of the same length as the axis being sliced, e.g. [True, False, True].
- A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above).
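As a rough illustration of the input kinds listed for .loc[], the following sketch applies each of them to a small hypothetical Series (the data and variable names are assumptions made for this example).

```python
import pandas as pd

# Hypothetical Series with string labels 'a'..'f' and values 0..5.
s = pd.Series(range(6), index=list("abcdef"))

single = s.loc["c"]                    # a single label
several = s.loc[["a", "c"]]            # a list of labels
sliced = s.loc["b":"d"]                # label slice, 'd' is included
masked = s.loc[[True, False, True, False, False, False]]  # boolean array
called = s.loc[lambda x: x > 3]        # callable returning a boolean Series

print(single)
print(list(several.index), list(sliced.index))
print(list(masked.index), list(called.index))
```

The boolean list must have the same length as the axis (six entries here), and the callable receives the Series itself as its only argument.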
Purely integer-location based indexing for selection by position.
.iloc[] is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array. Allowed inputs are:
- An integer, e.g. 5.
- A list or array of integers, e.g. [4, 3, 0].
- A slice object with ints, e.g. 1:7.
- A boolean array.
- A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above). This is useful in method chains, when you don’t have a reference to the calling object, but would like to base your selection on some value.
.iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers which allow out-of-bounds indexing (this conforms with python/numpy slice semantics).
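The out-of-bounds rule for .iloc[] can be checked with a short sketch. The Series below is hypothetical and only serves to show that a scalar position past the end raises IndexError while a slice is silently clipped.

```python
import pandas as pd

# Hypothetical Series with four elements at positions 0..3.
s = pd.Series([10, 20, 30, 40], index=list("wxyz"))

one = s.iloc[1]             # a single integer position
picked = s.iloc[[3, 0]]     # a list of positions, returned in the given order
sliced = s.iloc[1:3]        # positions 1 and 2; the stop is excluded
oversized = s.iloc[2:100]   # slice past the end is clipped, no error

# A scalar position beyond length-1 is out-of-bounds.
try:
    s.iloc[100]
    raised = False
except IndexError:
    raised = True

print(one, picked.tolist(), sliced.tolist(), oversized.tolist(), raised)
```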
3. Examples: there is one for every case
- loc
Selecting values:
# Initialize df:
>>> import pandas as pd
>>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
... index=['cobra', 'viper', 'sidewinder'],
... columns=['max_speed', 'shield'])
>>> df
            max_speed  shield
cobra               1       2
viper               4       5
sidewinder          7       8
# Select one row of df: the row is returned as a Series
>>> df.loc['viper']
max_speed    4
shield       5
Name: viper, dtype: int64
# Select several rows of df: the result is returned as a DataFrame
>>> df.loc[['viper', 'sidewinder']] # note: pass a list of labels, i.e. use [[ ]]
            max_speed  shield
viper               4       5
sidewinder          7       8
# Select a single value of df:
>>> df.loc['cobra', 'shield']
2
# A list of booleans also works as a selector: True keeps the row, False drops it
>>> df.loc[[False, False, True]]
            max_speed  shield
sidewinder          7       8
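# A condition evaluates to a boolean Series, which can also be used for selection
# Filter the rows where 'shield' is greater than 6 and return all of their values
>>> df.loc[df['shield'] > 6]
            max_speed  shield
sidewinder          7       8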
# Filter the rows where 'shield' is greater than 6 and return only the ['max_speed'] column for them (e.g., the weights of people taller than 1.8 m)
>>> df.loc[df['shield'] > 6, ['max_speed']]
            max_speed
sidewinder          7
# Use a lambda as the condition: it returns a boolean Series, which drives the selection
>>> df.loc[lambda df: df['shield'] == 8]
            max_speed  shield
sidewinder          7       8
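Assigning values:
Works much like selecting values
# Fill the missing entries of the "GarageType" column with "No Garage"
all_data.loc[all_data["GarageType"].isnull(), ["GarageType"]] = "No Garage"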
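Since the post does not show how all_data is built, here is a minimal self-contained sketch of the same assignment pattern. The frame contents and the "GarageCars" column are invented for illustration; only the "GarageType" column name and the "No Garage" fill value come from the line above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for all_data; only "GarageType" appears in the original.
all_data = pd.DataFrame(
    {
        "GarageType": ["Attchd", np.nan, "Detchd", np.nan],
        "GarageCars": [2, 0, 1, 0],
    }
)

# Boolean mask of the rows whose GarageType is missing.
missing = all_data["GarageType"].isnull()

# Assign through loc: the mask selects the rows, the label selects the column.
all_data.loc[missing, "GarageType"] = "No Garage"

print(all_data)
```

Assigning through a single loc call, rather than chaining two selections such as all_data[missing]["GarageType"] = ..., writes to the original frame instead of a temporary copy.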