Why does assigning with [:] versus iloc[:] yield different results in pandas?

Tommy Yip :

I am so confused by the different indexing methods using iloc in pandas.

Let's say I am trying to convert a 1-d DataFrame to a 2-d DataFrame. First I have the following 1-d DataFrame:

import pandas as pd

a_array = [1,2,3,4,5,6,7,8]
a_df = pd.DataFrame(a_array).T

I am going to convert that into a 2-d DataFrame of size 2x4. I start by presetting the 2-d DataFrame as follows:

b_df = pd.DataFrame(columns=range(4),index=range(2))

Then I use a for-loop to convert a_df (1-d) to b_df (2-d) with the following code:

for i in range(2):
    b_df.iloc[i,:] = a_df.iloc[0,i*4:(i+1)*4]

It only gives me the following result:

     0    1    2    3
0    1    2    3    4
1  NaN  NaN  NaN  NaN

But when I change b_df.iloc[i,:] to b_df.iloc[i][:], the result is correct, as shown below, which is what I want:

   0  1  2  3
0  1  2  3  4
1  5  6  7  8

Could anyone explain to me what the difference between .iloc[i,:] and .iloc[i][:] is, and why .iloc[i][:] worked in my example above but not .iloc[i,:]?

cs95 :

There is a very, very big difference between series.iloc[:] and series[:], when assigning back. (i)loc always checks to make sure whatever you're assigning from matches the index of the assignee. Meanwhile, the [:] syntax assigns to the underlying NumPy array, bypassing index alignment.

s = pd.Series(index=[0, 1, 2, 3], dtype='float')  
s                                                                          

0   NaN
1   NaN
2   NaN
3   NaN
dtype: float64

# Let's get a reference to the underlying array with `copy=False`
arr = s.to_numpy(copy=False) 
arr 
# array([nan, nan, nan, nan])

# Reassign using slicing syntax
s[:] = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])                 
s                                                                          

0    1
1    2
2    3
3    4
dtype: int64

arr 
# array([1., 2., 3., 4.]) # underlying array has changed

# Now, reassign again with `iloc`
s.iloc[:] = pd.Series([5, 6, 7, 8], index=[3, 4, 5, 6]) 
s                                                                          

0    NaN
1    NaN
2    NaN
3    5.0
dtype: float64

arr 
# array([1., 2., 3., 4.])  # `iloc` created a new array for the series
                           # during reassignment leaving this unchanged

s.to_numpy(copy=False)     # the new underlying array, for reference                                                   
# array([nan, nan, nan,  5.]) 
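
For contrast, a quick sketch of my own (not part of the original demonstration): when the index of the right-hand side does match, the alignment that (i)loc performs succeeds and every position receives a value.

# my addition: with a matching index, iloc alignment assigns every position
s.iloc[:] = pd.Series([9, 10, 11, 12], index=[0, 1, 2, 3])
s.isna().any()
# False  -- every label lined up, so no NaNs this time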

Now that you understand the difference, let's look at what happens in your code. Just print out the RHS of your loops to see what you are assigning:

for i in range(2): 
    print(a_df.iloc[0, i*4:(i+1)*4]) 

# output - first row                                                                   
0    1
1    2
2    3
3    4
Name: 0, dtype: int64
# second row. Notice the index is different
4    5
5    6
6    7
7    8
Name: 0, dtype: int64   

When assigning to b_df.iloc[i, :] in the second iteration, the indexes are different so nothing is assigned and you only see NaNs. However, changing b_df.iloc[i, :] to b_df.iloc[i][:] will mean you assign to the underlying NumPy array, so indexing alignment is bypassed. This operation is better expressed as

for i in range(2):
    b_df.iloc[i, :] = a_df.iloc[0, i*4:(i+1)*4].to_numpy()

b_df                                                                       

   0  1  2  3
0  1  2  3  4
1  5  6  7  8
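
As an aside (this is my suggestion, not part of the original answer): since the loop is really just reshaping the data, you can also build b_df in one step from the reshaped NumPy array and sidestep the alignment question entirely.

# alternative sketch: reshape the values directly, no loop needed
b_df = pd.DataFrame(a_df.to_numpy().reshape(2, 4))
b_df
#    0  1  2  3
# 0  1  2  3  4
# 1  5  6  7  8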

It's also worth mentioning that b_df.iloc[i][:] is a form of chained assignment, which is discouraged and also makes your code harder to read and understand.
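
To see why chained assignment is fragile, here is a minimal sketch of mine (not from the original answer), assuming pandas 2.x with Copy-on-Write enabled: the intermediate object returned by b_df.iloc[i] is then treated as a copy, so the chained write never reaches b_df at all.

import pandas as pd

pd.set_option("mode.copy_on_write", True)   # slated to be the default in pandas 3.0

a_df = pd.DataFrame([1, 2, 3, 4, 5, 6, 7, 8]).T
b_df = pd.DataFrame(columns=range(4), index=range(2))

for i in range(2):
    # chained assignment: writes into a temporary copy, not into b_df
    b_df.iloc[i][:] = a_df.iloc[0, i*4:(i+1)*4]

b_df   # still all NaN under Copy-on-Write (recent versions also warn about this)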
