Dealing with multi-index pandas Series and DataFrames

Introduction

Whether we like it or not, in pandas we will come across a Series or DataFrame with a multi-index. A multi-index is often produced by methods such as .groupby() or .set_index(). We tend to call .reset_index() to turn it back into a normal Series/DataFrame. But in some situations, knowing how to deal with a multi-index is a benefit, and the methods we use on a multi-index give us a deeper understanding of DataFrame and Series.

In this article we will talk about:

1. What is a multi-index Series/ DataFrame?

2. How to select from a multi-index?

3. How to concatenate two multi-index DataFrames?

What is a multi-index Series/ DataFrame?

Visually, we can usually tell when a Series/DataFrame has a multi-index. We can also check the .index attribute: if the index is a multi-index, pandas shows a MultiIndex in the returned value.
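As a minimal sketch of that check, the example below builds a small DataFrame and gives it a multi-index with .set_index(). The column names (Symbol, Date, Close) and all the values are made up for illustration; they are not taken from any real dataset.

```python
import pandas as pd

# Hypothetical sample data: closing prices for two tickers on two days.
df = pd.DataFrame({
    "Symbol": ["AAPL", "AAPL", "MSFT", "MSFT"],
    "Date": ["2016-10-03", "2016-10-04", "2016-10-03", "2016-10-04"],
    "Close": [112.5, 113.0, 57.4, 57.2],
})

# Setting two columns as the index produces a multi-index.
indexed = df.set_index(["Symbol", "Date"])

print(indexed.index)  # the printed value starts with "MultiIndex("

# The same check done programmatically.
is_multi = isinstance(indexed.index, pd.MultiIndex)
n_levels = indexed.index.nlevels
```

Here is_multi is True and n_levels is 2, one level for each column passed to .set_index().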

On a deeper level, a multi-index Series/DataFrame is no more than a Series/DataFrame with an added dimension, which makes a multi-index Series act more like a normal DataFrame. We will talk about this again in the next section.

A multi-index can be unstacked with .unstack(). If we .unstack() a multi-index Series with two levels, we get a DataFrame with a normal index.
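A small sketch of this, assuming a two-level Series of made-up closing prices (the level names Symbol and Date are illustrative):

```python
import pandas as pd

# Hypothetical two-level Series: (Symbol, Date) -> closing price.
s = pd.Series(
    [112.5, 113.0, 57.4, 57.2],
    index=pd.MultiIndex.from_product(
        [["AAPL", "MSFT"], ["2016-10-03", "2016-10-04"]],
        names=["Symbol", "Date"],
    ),
    name="Close",
)

# By default .unstack() moves the innermost level (Date) into the columns.
wide = s.unstack()
print(wide)
```

The result wide has one row per Symbol and one column per Date, and its index is a plain single-level index.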

How to select from a multi-index?

1. Multi-index Series

Use .loc[], going from the outer level to the inner level. Use .loc[:, ] to skip the outer level.

If we look carefully, this .loc[] operation is exactly the same as selecting from a DataFrame: the first element is the rows, then a comma, and the second element is the columns. In fact, the DataFrame in question is the one we get by calling .unstack() on the original multi-index Series.
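The selections described above can be sketched as follows, again on a hypothetical two-level Series with made-up values:

```python
import pandas as pd

# Hypothetical two-level Series: (Symbol, Date) -> closing price.
s = pd.Series(
    [112.5, 113.0, 57.4, 57.2],
    index=pd.MultiIndex.from_product(
        [["AAPL", "MSFT"], ["2016-10-03", "2016-10-04"]],
        names=["Symbol", "Date"],
    ),
    name="Close",
)

# Outer level only: everything for one symbol.
one_symbol = s.loc["AAPL"]

# Outer level, then inner level: a single value.
one_value = s.loc["AAPL", "2016-10-03"]  # 112.5

# Skip the outer level with ":" and select on the inner level.
by_date = s.loc[:, "2016-10-03"]

# The same "rows, columns" lookup on the unstacked DataFrame.
same_value = s.unstack().loc["AAPL", "2016-10-03"]
```

Both one_value and same_value are 112.5, which illustrates why selecting from a two-level Series feels like selecting from its unstacked DataFrame.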

2. Multi-index DataFrame

Use .loc[], going from the outer level to the inner level. But unlike with a Series, because it is already a DataFrame, we cannot just use a comma to separate the levels. We wrap them in parentheses (a tuple) to tell pandas that they both refer to rows, then use a comma to choose columns.

 

Both the outer level and the inner level can be given as a list, to select several labels at once.
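A minimal sketch of both forms, using a hypothetical DataFrame with a (Symbol, Date) multi-index and made-up Open and Close columns:

```python
import pandas as pd

# Hypothetical multi-index DataFrame; all values are made up.
df = pd.DataFrame(
    {"Open": [112.0, 112.8, 57.0, 57.5], "Close": [112.5, 113.0, 57.4, 57.2]},
    index=pd.MultiIndex.from_product(
        [["AAPL", "MSFT"], ["2016-10-03", "2016-10-04"]],
        names=["Symbol", "Date"],
    ),
)

# The tuple holds the row labels (outer, inner); after the comma comes the column.
one_value = df.loc[("AAPL", "2016-10-03"), "Close"]  # 112.5

# Each level inside the tuple, and the columns, can also be a list.
subset = df.loc[(["AAPL", "MSFT"], ["2016-10-03"]), ["Open", "Close"]]
print(subset)
```

Here subset keeps both index levels and contains one row per symbol for the chosen date.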

Skipping the outer level is a little bit tricky. We may think of:

df.loc[(:, '2016-10-03'), 'Close']

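But this does not work: a bare colon is only valid directly inside the square brackets, so Python raises a SyntaxError when it appears inside a tuple. The correct way is to use slice(None) in its place.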
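A sketch of the slice(None) form, on a hypothetical DataFrame with made-up values. The pd.IndexSlice variant shown alongside is not mentioned in the text above; it is included here only as an equivalent spelling that pandas provides.

```python
import pandas as pd

# Hypothetical multi-index DataFrame; all values are made up.
df = pd.DataFrame(
    {"Open": [112.0, 112.8, 57.0, 57.5], "Close": [112.5, 113.0, 57.4, 57.2]},
    index=pd.MultiIndex.from_product(
        [["AAPL", "MSFT"], ["2016-10-03", "2016-10-04"]],
        names=["Symbol", "Date"],
    ),
)

# slice(None) means "everything on this level", like ":" would.
close_on_day = df.loc[(slice(None), "2016-10-03"), "Close"]

# pd.IndexSlice lets us write the colon directly.
idx = pd.IndexSlice
same = df.loc[idx[:, "2016-10-03"], "Close"]
print(close_on_day)
```

Both selections return the Close values of every symbol on the chosen date.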

How to concatenate two multi-index DataFrames?

If we have new columns to add, we can use pd.merge(), but we have to pass the arguments left_index=True and right_index=True so that the two DataFrames are joined on their indexes.
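As a sketch, assume two hypothetical DataFrames, prices and volumes, that share the same (Symbol, Date) multi-index; the names and values are made up for illustration.

```python
import pandas as pd

# A shared two-level index for both hypothetical frames.
index = pd.MultiIndex.from_product(
    [["AAPL", "MSFT"], ["2016-10-03", "2016-10-04"]],
    names=["Symbol", "Date"],
)
prices = pd.DataFrame({"Close": [112.5, 113.0, 57.4, 57.2]}, index=index)
volumes = pd.DataFrame({"Volume": [100, 120, 80, 90]}, index=index)

# Join on the index of both sides instead of on columns.
merged = pd.merge(prices, volumes, left_index=True, right_index=True)
print(merged)
```

The result keeps the multi-index and has both the Close and the Volume columns.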

Summary

To make a long story short: use the .loc operator to select from a multi-index Series or DataFrame, and use pd.merge() to concatenate two multi-index DataFrames.

Tidy data requires that each variable has its own column, each observation has its own row, and each value has its own cell. So multi-index data is not tidy data, because the variables held in the index levels are not columns. We can use .reset_index() to change it into tidy data.
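A brief sketch of that last step, on a hypothetical multi-index DataFrame with made-up values:

```python
import pandas as pd

# Hypothetical multi-index DataFrame; all values are made up.
prices = pd.DataFrame(
    {"Close": [112.5, 113.0, 57.4, 57.2]},
    index=pd.MultiIndex.from_product(
        [["AAPL", "MSFT"], ["2016-10-03", "2016-10-04"]],
        names=["Symbol", "Date"],
    ),
)

# Each index level becomes an ordinary column again.
tidy = prices.reset_index()
print(tidy)
```

After .reset_index(), Symbol and Date are regular columns next to Close, and the rows are numbered by a default integer index.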

 

Reposted from www.cnblogs.com/drvongoosewing/p/12031235.html