9 value_counts() tricks to improve the efficiency of Python data analysis

Data scientists typically spend most of their time exploring and preprocessing data. Pandas value_counts() is one of the most popular functions for data analysis and for understanding the structure of a dataset. The function returns a Series containing the counts of unique values. Parameters control whether the result is sorted in descending or ascending order and whether NA values are included.

In this article, we will explore different use cases for Pandas value_counts() and see how to use it to handle the following common tasks:

1. Default parameters

2. Sort the results in ascending order

3. Arrange the results in alphabetical order

4. Include NA in the result

5. Display the result as a percentage count

6. Divide continuous data into discrete intervals

7. Group and execute value_counts()

8. Convert the resulting series to a DataFrame

9. Apply to DataFrame

1. Default parameters

The Pandas value_counts() function returns a Series containing the counts of unique values. By default, the resulting Series is sorted in descending order and does not include NA values. For example, let's get the counts for the "Embarked" column of the Titanic dataset.

>>> df['Embarked'].value_counts() 
 
S    644 
C    168 
Q     77 
Name: Embarked, dtype: int64
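The default behaviour can be checked on a small Series. This is a minimal sketch that uses hypothetical toy values rather than the Titanic data, so the counts differ from the output above.

```python
import pandas as pd

# Hypothetical toy data standing in for the Titanic "Embarked" column.
embarked = pd.Series(["S", "C", "S", "Q", "S", None, "C"], name="Embarked")

counts = embarked.value_counts()

# Most frequent value first; the missing entry is not counted.
print(counts.index.tolist())   # ['S', 'C', 'Q']
print(int(counts.sum()))       # 6, although the Series has 7 rows
```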

2. Sort the results in ascending order

The Series returned by value_counts() is sorted in descending order by default. For ascending results, we can set the parameter ascending to True.

>>> df['Embarked'].value_counts(ascending=True) 
 
Q     77 
C    168 
S    644 
Name: Embarked, dtype: int64

3. Arrange the results in alphabetical order

We have already seen that the ascending parameter controls whether the result is sorted by count in ascending or descending order. In some cases it is better to display the results in alphabetical order of the values themselves. This can be done by calling sort_index(ascending=True) after value_counts(). Because sort_index() reorders the result, the sort order passed to value_counts() no longer matters:

>>> df['Embarked'].value_counts().sort_index(ascending=True) 
 
C    168 
Q     77 
S    644 
Name: Embarked, dtype: int64
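The three orderings discussed so far can be compared side by side. The sketch below assumes a hypothetical toy Series, not the Titanic data.

```python
import pandas as pd

# Hypothetical toy data, not the Titanic dataset.
embarked = pd.Series(["S", "C", "S", "Q", "S", "C"])

by_count_desc = embarked.value_counts()                # default: largest count first
by_count_asc = embarked.value_counts(ascending=True)   # smallest count first
by_label = embarked.value_counts().sort_index()        # alphabetical by value

print(by_count_desc.index.tolist())  # ['S', 'C', 'Q']
print(by_count_asc.index.tolist())   # ['Q', 'C', 'S']
print(by_label.index.tolist())       # ['C', 'Q', 'S']
```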

4. Include NA in the result

By default, NA values are left out of the result. The dropna parameter controls this behaviour: setting it to False adds the number of NA entries to the result.

>>> df['Embarked'].value_counts(dropna=False) 
S      644 
C      168 
Q       77 
NaN      2 
Name: Embarked, dtype: int64
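A quick way to see the effect of dropna is to compare the totals of both results with the length of the column. The values below are hypothetical toy data.

```python
import pandas as pd

# Hypothetical toy data with two missing entries.
embarked = pd.Series(["S", "C", "S", None, "S", None])

without_na = embarked.value_counts()             # NA entries dropped (default)
with_na = embarked.value_counts(dropna=False)    # NA entries counted as their own row

print(int(without_na.sum()))        # 4
print(int(with_na.sum()))           # 6, the full length of the Series
print(int(embarked.isna().sum()))   # 2, the size of the NA row
```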

5. Display the result as a percentage count

When doing exploratory data analysis, it is sometimes more useful to look at the percent count of unique values. This can be done by setting the parameter normalize to True, for example:

>>> df['Embarked'].value_counts(normalize=True) 
 
S    0.724409 
C    0.188976 
Q    0.086614 
Name: Embarked, dtype: float64

If we prefer to format the results with percent signs (%), we can set the Pandas display options as follows. The '{:.2%}' format multiplies each value by 100 before appending the percent sign:

>>> pd.set_option('display.float_format', '{:.2%}'.format) 
>>> df['Embarked'].value_counts(normalize=True) 
 
S   72.44% 
C   18.90% 
Q    8.66% 
Name: Embarked, dtype: float64
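Note that display.float_format affects every float printed in the session. As an alternative, a minimal sketch (on hypothetical toy data) can format only this one result by mapping a format string over the normalized counts:

```python
import pandas as pd

# Hypothetical toy data, not the Titanic dataset.
embarked = pd.Series(["S", "S", "S", "C"])

shares = embarked.value_counts(normalize=True)   # fractions that sum to 1
labels = shares.map("{:.1%}".format)             # strings such as '75.0%'

print(labels.to_dict())  # {'S': '75.0%', 'C': '25.0%'}
```

The result holds strings, so it is suitable for display only; keep the numeric shares for further calculations.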

6. Divide continuous data into discrete intervals

Pandas value_counts() can be used to divide continuous data into discrete intervals using the bins parameter. Similar to the Pandas cut() function, we can pass an integer or a list to bins.

When an integer is passed to bins, the function discretizes the continuous values into that many equal-width bins, for example:

>>> df['Fare'].value_counts(bins=3) 
(-0.513, 170.776]     871 
(170.776, 341.553]     17 
(341.553, 512.329]      3 
Name: Fare, dtype: int64

When a list is passed to bins, the function uses its entries as bin edges and divides the continuous values into custom groups, for example:

>>> df['Fare'].value_counts(bins=[-1, 20, 100, 550]) 
(-1.001, 20.0]    515 
(20.0, 100.0]     323 
(100.0, 550.0]     53 
Name: Fare, dtype: int64
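To illustrate the similarity with cut(), the sketch below bins a hypothetical list of fares both ways. It assumes that include_lowest=True is the matching setting for cut(), which is consistent with the slightly widened first interval shown above.

```python
import pandas as pd

# Hypothetical fares, not the Titanic "Fare" column.
fares = pd.Series([5.0, 7.5, 15.0, 30.0, 80.0, 250.0])
edges = [0, 20, 100, 550]

# Bin and count in one step, then order the rows by interval.
binned = fares.value_counts(bins=edges).sort_index()

# The same idea in two steps: cut into intervals, then count them.
via_cut = pd.cut(fares, bins=edges, include_lowest=True).value_counts().sort_index()

print(binned.tolist())   # [3, 2, 1]
print(via_cut.tolist())  # [3, 2, 1]
```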

7. Group and execute value_counts()

Pandas groupby() allows us to separate data into different groups to perform calculations for better analysis. A common use case is to group by a certain column and then get a count of unique values for another column. For example, let's group by the "Embarked" column and get a count of distinct "Sex" values.

>>> df.groupby('Embarked')['Sex'].value_counts() 
 
Embarked  Sex    
C         male       95 
          female     73 
Q         male       41 
          female     36 
S         male      441 
          female    203 
Name: Sex, dtype: int64
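The grouped result is a Series with a MultiIndex, so individual counts are looked up with a tuple. As a sketch on hypothetical toy data, unstack() can also pivot the inner level into columns to give a small cross-table:

```python
import pandas as pd

# Hypothetical toy data, not the Titanic dataset.
df = pd.DataFrame({
    "Embarked": ["S", "S", "S", "C", "C", "Q"],
    "Sex": ["male", "male", "female", "female", "male", "female"],
})

counts = df.groupby("Embarked")["Sex"].value_counts()

# Look up one (Embarked, Sex) combination.
print(int(counts.loc[("S", "male")]))   # 2

# Pivot the "Sex" level into columns; combinations that never occur become 0.
table = counts.unstack(fill_value=0)
print(int(table.loc["Q", "male"]))      # 0
```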

8. Convert the resulting series to a DataFrame

Pandas value_counts() returns a Series; in the previous example it is a Series with a MultiIndex. If we want the result as a DataFrame, we can call to_frame() after value_counts():

>>> df.groupby('Embarked')['Sex'].value_counts().to_frame()

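The name of the counts column produced by to_frame() differs between Pandas versions, so it can help to pass an explicit name. The sketch below uses hypothetical toy data and a hypothetical column name "n"; it also shows reset_index(), which flattens the MultiIndex into ordinary columns.

```python
import pandas as pd

# Hypothetical toy data, not the Titanic dataset.
df = pd.DataFrame({
    "Embarked": ["S", "S", "S", "C", "C", "Q"],
    "Sex": ["male", "male", "female", "female", "male", "female"],
})

counts = df.groupby("Embarked")["Sex"].value_counts()

framed = counts.to_frame(name="n")     # keeps (Embarked, Sex) as the index
flat = counts.reset_index(name="n")    # moves the index levels into columns

print(list(framed.columns))  # ['n']
print(list(flat.columns))    # ['Embarked', 'Sex', 'n']
```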

9. Apply to DataFrame

So far we have applied value_counts() to a Pandas Series. There is an equivalent method on the Pandas DataFrame: DataFrame.value_counts() returns a Series containing the counts of unique rows in the DataFrame.

Let's see an example to understand it better:

>>> df = pd.DataFrame({
...     'num_legs': [2, 4, 4, 6], 
...     'num_wings': [2, 0, 0, 0]}, 
...     index=['falcon', 'dog', 'cat', 'ant'] 
... ) 
>>> df.value_counts() 
 
num_legs  num_wings 
4         0            2 
6         0            1 
2         2            1 
dtype: int64

Calling value_counts() on df returns a Series with a MultiIndex built from num_legs and num_wings. From the result, we can see that there are 2 records with num_legs=4 and num_wings=0.

Similarly, we can call to_frame() to convert the result to a DataFrame:

>>> df.value_counts().to_frame()

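DataFrame.value_counts() also accepts a subset of columns and the normalize flag. The sketch below reuses the small animals table from this section to show both.

```python
import pandas as pd

df = pd.DataFrame(
    {"num_legs": [2, 4, 4, 6], "num_wings": [2, 0, 0, 0]},
    index=["falcon", "dog", "cat", "ant"],
)

row_counts = df.value_counts()                     # counts of unique rows
legs_only = df.value_counts(subset=["num_legs"])   # count on one column only
shares = df.value_counts(normalize=True)           # fractions instead of counts

print(int(row_counts.loc[(4, 0)]))   # 2
print(legs_only.tolist())            # [2, 1, 1]
print(float(shares.loc[(4, 0)]))     # 0.5
```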

Summary

In this article, we explored different use cases for Pandas value_counts(). I hope this article helps you save time learning Pandas. I suggest you look at the documentation for the value_counts() API and learn about other things you can do.


Origin blog.csdn.net/weixin_38037405/article/details/123699175