Data scientists typically spend most of their time exploring and preprocessing data. Pandas value_counts() is one of the most widely used functions for data analysis and for understanding the structure of a dataset. The function returns a Series containing the counts of unique values. The result can be sorted in descending or ascending order and can include or exclude NA values, depending on the parameters.
In this article, we will explore different use cases for Pandas value_counts() and see how to use it for the following common tasks:
1. Default parameters
2. Sort the results in ascending order
3. Arrange the results in alphabetical order
4. Include NA in the result
5. Display the result as a percentage count
6. Divide continuous data into discrete intervals
7. Group and call value_counts()
8. Convert the resulting series to a DataFrame
9. Apply to DataFrame
1. Default parameters
The Pandas value_counts() function returns a Series containing the counts of unique values. By default, the resulting Series is sorted in descending order and does not include NA values. For example, let's get the counts for the "Embarked" column of the Titanic dataset.
>>> df['Embarked'].value_counts()
S 644
C 168
Q 77
Name: Embarked, dtype: int64
2. Sort the results in ascending order
The Series returned by value_counts() is sorted in descending order by default. For ascending results, we can set the ascending parameter to True.
>>> df['Embarked'].value_counts(ascending=True)
Q 77
C 168
S 644
Name: Embarked, dtype: int64
3. Arrange the results in alphabetical order
We have seen that the ascending parameter sorts the result by count, in ascending or descending order. In some cases it is better to display the results in alphabetical order of the values instead. This can be done by calling sort_index(ascending=True) after value_counts(), e.g.
>>> df['Embarked'].value_counts(ascending=True).sort_index(ascending=True)
C 168
Q 77
S 644
Name: Embarked, dtype: int64
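The difference between sorting by count and sorting by label can be checked on a small self-contained example. This is only a sketch: the embarked Series below is made-up sample data standing in for the Titanic column.

```python
import pandas as pd

# Hypothetical sample data standing in for the "Embarked" column.
embarked = pd.Series(["S", "C", "S", "Q", "S", "C"])

# Sorted by count, smallest first.
by_count = embarked.value_counts(ascending=True)

# Sorted by label (alphabetically); the counts themselves are unchanged.
by_label = embarked.value_counts().sort_index()

print(by_count)
print(by_label)
```

Since sort_index() reorders the whole result, the ascending argument passed to value_counts() beforehand has no effect on the final order.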
4. Include NA in the result
By default, NA values are not counted in the result. The dropna parameter controls this behavior: setting it to False adds a count of the NA entries.
>>> df['Embarked'].value_counts(dropna=False)
S 644
C 168
Q 77
NaN 2
Name: Embarked, dtype: int64
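A quick way to see what dropna changes is to compare the totals of both results. The sketch below uses made-up sample data with two missing entries, not the Titanic dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with two missing entries.
embarked = pd.Series(["S", "C", "S", np.nan, "Q", "S", np.nan, "C"])

without_na = embarked.value_counts()            # NA entries are skipped
with_na = embarked.value_counts(dropna=False)   # NA entries get their own row

# The difference between the two totals is the number of missing entries.
n_missing = int(with_na.sum() - without_na.sum())
print(with_na)
print("missing:", n_missing)
```

With dropna=False the counts add up to the length of the Series, which is a handy sanity check during exploration.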
5. Display the result as a percentage count
When doing exploratory data analysis, it is sometimes more useful to look at the relative frequency of each unique value rather than the raw count. This can be done by setting the normalize parameter to True, for example:
>>> df['Embarked'].value_counts(normalize=True)
S 0.724409
C 0.188976
Q 0.086614
Name: Embarked, dtype: float64
If we prefer to format the results with percent signs (%), we can set the Pandas display options as follows. The '{:.2%}' format specifier multiplies each value by 100 before appending the percent sign:
>>> pd.set_option('display.float_format', '{:.2%}'.format)
>>> df['Embarked'].value_counts(normalize=True)
S 72.44%
C 18.90%
Q 8.66%
Name: Embarked, dtype: float64
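Note that display.float_format is a global option, so it affects every float printed in the session. If only this one result should be shown as percentages, one alternative is to format the values of the result itself. A minimal sketch, using made-up sample data:

```python
import pandas as pd

# Hypothetical sample data: 5 x "S", 3 x "C", 2 x "Q".
embarked = pd.Series(["S"] * 5 + ["C"] * 3 + ["Q"] * 2)

# Proportions between 0 and 1.
shares = embarked.value_counts(normalize=True)

# Format each proportion as a percentage string; global options stay untouched.
as_percent = shares.map("{:.1%}".format)
print(as_percent)
```

The result of map() holds strings, so it is meant for display only; keep the numeric shares Series for further calculations.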
6. Divide continuous data into discrete intervals
Pandas value_counts() can be used to divide continuous data into discrete intervals using the bins parameter. Similar to the Pandas cut() function, we can pass an integer or a list to bins.
When an integer is passed to bins, the function divides the range of the values into that many equal-width bins, for example:
>>> df['Fare'].value_counts(bins=3)
(-0.513, 170.776] 871
(170.776, 341.553] 17
(341.553, 512.329] 3
Name: Fare, dtype: int64
When a list is passed to bins, the function uses its entries as custom bin edges, for example:
>>> df['Fare'].value_counts(bins=[-1, 20, 100, 550])
(-1.001, 20.0] 515
(20.0, 100.0] 323
(100.0, 550.0] 53
Name: Fare, dtype: int64
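The pandas documentation describes bins as a convenience for pd.cut. The sketch below compares both routes on made-up fare values; the use of include_lowest=True in the pd.cut call is an assumption about how value_counts() bins internally, and it would explain why the first interval above is printed as (-1.001, 20.0].

```python
import pandas as pd

# Hypothetical fare values, not taken from the Titanic dataset.
fares = pd.Series([5.0, 7.25, 8.05, 15.5, 26.0, 71.28, 120.0, 263.0])
edges = [-1, 20, 100, 550]

# Route 1: let value_counts() do the binning.
binned_counts = fares.value_counts(bins=edges)

# Route 2: bin explicitly with pd.cut(), then count the resulting intervals.
# include_lowest=True is assumed here to mirror what value_counts() does.
cut_counts = pd.cut(fares, bins=edges, include_lowest=True).value_counts()

print(binned_counts)
print(cut_counts)
```

Calling pd.cut() directly is mainly useful when the interval labels are needed as a column for later steps, not just the counts.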
7. Group and call value_counts()
Pandas groupby() allows us to separate data into different groups to perform calculations for better analysis. A common use case is to group by a certain column and then get a count of unique values for another column. For example, let's group by the "Embarked" column and get a count of distinct "Sex" values.
>>> df.groupby('Embarked')['Sex'].value_counts()
Embarked  Sex
C         male       95
          female     73
Q         male       41
          female     36
S         male      441
          female    203
Name: Sex, dtype: int64
8. Convert the resulting series to a DataFrame
Pandas value_counts() returns a Series, and in the previous example that Series has a MultiIndex. If we want the result as a DataFrame, we can call to_frame() after value_counts().
>>> df.groupby('Embarked')['Sex'].value_counts().to_frame()
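As a self-contained sketch of this conversion, the example below uses a small made-up DataFrame in place of the Titanic data. The column name "count" is chosen here for illustration; passing it explicitly avoids depending on the default Series name, which differs between pandas versions.

```python
import pandas as pd

# Hypothetical sample data standing in for the Titanic columns.
df = pd.DataFrame({
    "Embarked": ["S", "S", "S", "C", "C", "Q"],
    "Sex": ["male", "male", "female", "female", "male", "male"],
})

counts = df.groupby("Embarked")["Sex"].value_counts()

# Keep the MultiIndex and turn the counts into a single named column.
as_frame = counts.to_frame(name="count")

# Or move the index levels into ordinary columns as well.
flat = counts.reset_index(name="count")

print(as_frame)
print(flat)
```

The flat version is usually easier to pass on to plotting or merging code, while to_frame() keeps the grouped layout.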
9. Apply to DataFrame
So far we have been applying value_counts() to a Pandas Series. There is an equivalent method on Pandas DataFrame: DataFrame.value_counts() returns a Series containing the counts of unique rows in the DataFrame.
Let's see an example to understand it better:
>>> df = pd.DataFrame({
...     'num_legs': [2, 4, 4, 6],
...     'num_wings': [2, 0, 0, 0]},
...     index=['falcon', 'dog', 'cat', 'ant']
... )
>>> df.value_counts()
num_legs  num_wings
4         0            2
6         0            1
2         2            1
dtype: int64
Calling value_counts() on df returns a Series with a MultiIndex built from num_legs and num_wings. From the result, we can see that there are 2 records with num_legs=4 and num_wings=0.
Similarly, we can call to_frame() to convert the result to a DataFrame:
>>> df.value_counts().to_frame()
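Because the result is indexed by the column values, the count for one particular combination can be looked up with a tuple. A minimal sketch, rebuilding the same small DataFrame under the name animals (a name chosen here for illustration):

```python
import pandas as pd

animals = pd.DataFrame(
    {"num_legs": [2, 4, 4, 6], "num_wings": [2, 0, 0, 0]},
    index=["falcon", "dog", "cat", "ant"],
)

row_counts = animals.value_counts()

# Look up how many rows have num_legs=4 and num_wings=0.
four_legs_no_wings = int(row_counts.loc[(4, 0)])

# Convert to a DataFrame with an explicitly named count column.
counts_frame = row_counts.to_frame(name="count")

print(four_legs_no_wings)
print(counts_frame)
```

DataFrame.value_counts() was added in pandas 1.1, so older installations need an upgrade before this runs.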
Summary
In this article, we explored different use cases for Pandas value_counts(). I hope this article helps you save time learning Pandas. I suggest you look at the documentation for the value_counts() API and learn about other things you can do.