Pandas provides powerful data manipulation and analysis functions and is an essential everyday tool for data science. This article introduces 15 commonly used Pandas code snippets that help simplify data analysis tasks and extract insights from datasets.
1. Filter data
Pandas provides various ways to filter data. The most common is boolean indexing, which keeps the rows where a condition is true.
import pandas as pd
# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40]}
df = pd.DataFrame(data)
# Filter rows where Age is greater than 30
filtered_df = df[df['Age'] > 30]
print(filtered_df)
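Boolean masks are only one of those ways. As a small sketch on the same sample data, the block below combines two conditions with &, matches against a list with isin(), and expresses a filter as a string with query(). The variable names between, selected, and older are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 35, 40]})

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
between = df[(df['Age'] >= 30) & (df['Age'] < 40)]

# Keep rows whose value appears in a list
selected = df[df['Name'].isin(['Alice', 'David'])]

# query() expresses the same kind of filter as a string
older = df.query('Age > 30')

print(between)
print(selected)
print(older)
```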
2. Group and aggregate data
# Group by a column and calculate the mean of a numeric column
# (selecting 'Age' avoids averaging the non-numeric 'Name' column)
grouped = df.groupby('Name')['Age'].mean()
print(grouped)
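Grouping is easier to see when the key actually repeats. The following is a minimal sketch using a hypothetical sales table (the Team and Score columns are invented for illustration). Selecting the numeric column before aggregating keeps non-numeric columns out of the calculation, and agg() can compute several statistics at once.

```python
import pandas as pd

# Hypothetical sample data with a repeated grouping key
sales = pd.DataFrame({'Team': ['A', 'A', 'B', 'B'],
                      'Score': [10, 20, 30, 50]})

# Mean of one numeric column per group
mean_by_team = sales.groupby('Team')['Score'].mean()

# Several aggregations in one call
summary = sales.groupby('Team')['Score'].agg(['mean', 'sum', 'count'])

print(mean_by_team)
print(summary)
```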
3. Handle missing values
# Check for missing values
missing_values = df.isnull().sum()
# Fill missing values with a specific value
# (assigning the result back is more reliable than inplace=True on a selected column)
df['Age'] = df['Age'].fillna(0)
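The sample DataFrame has no gaps, so here is a hedged sketch on a hypothetical people table that does contain a missing age. It counts the missing values, fills them with the column mean instead of a constant, and alternatively drops the incomplete rows. Which strategy is appropriate depends on the data.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing age
people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                       'Age': [25.0, np.nan, 35.0]})

# Number of missing values per column
missing_counts = people.isnull().sum()

# Option 1: fill the gaps with the column mean (returns a new DataFrame)
filled = people.assign(Age=people['Age'].fillna(people['Age'].mean()))

# Option 2: drop the rows where Age is missing
dropped = people.dropna(subset=['Age'])

print(missing_counts)
print(filled)
print(dropped)
```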
4. Apply functions to columns
The apply() function allows custom functions to be applied to the rows or columns of a DataFrame, enabling more complex data manipulation and transformation.
# Double every value in the Age column
df['Age'] = df['Age'].apply(lambda x: x * 2)
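To illustrate the difference between applying a function to one column and applying it across rows, here is a short sketch. The Age_Group and Label columns are hypothetical names chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Column-wise: the function receives each value of one column
df['Age_Group'] = df['Age'].apply(lambda age: 'under 30' if age < 30 else '30+')

# Row-wise: axis=1 passes each row to the function as a Series
df['Label'] = df.apply(lambda row: f"{row['Name']} ({row['Age']})", axis=1)

print(df)
```

For simple arithmetic, a vectorized expression such as df['Age'] * 2 is usually faster than apply(); apply() is most useful when the logic does not vectorize easily.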
5. Concatenate DataFrames
Concatenation here mainly means joining rows, that is, stacking two DataFrames that have the same column structure.
# Concatenate two DataFrames
df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})
result = pd.concat([df1, df2], ignore_index=True)
print(result)
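The ignore_index argument decides what happens to the row labels. As a small sketch using the same two frames, the block below compares the default behavior, which keeps each frame's original index, with ignore_index=True, which renumbers the rows.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})

# Default: original row labels are kept, so they repeat
kept = pd.concat([df1, df2])

# ignore_index=True: rows are renumbered from 0
renumbered = pd.concat([df1, df2], ignore_index=True)

print(kept.index.tolist())
print(renumbered.index.tolist())
```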
6. Merge DataFrames
Merging here means joining columns: rows from two DataFrames are matched on one or more shared key columns.
# Merge two DataFrames on 'key'; the overlapping 'value' columns become value_x and value_y
left = pd.DataFrame({'key': ['A', 'B', 'C'], 'value': [1, 2, 3]})
right = pd.DataFrame({'key': ['B', 'C', 'D'], 'value': [4, 5, 6]})
merged = pd.merge(left, right, on='key', how='inner')
print(merged)
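The how argument controls which keys survive the merge. The sketch below reuses the same left and right frames to compare an inner join with an outer join; the suffixes ('_left', '_right') are an illustrative choice to replace the default _x and _y.

```python
import pandas as pd

left = pd.DataFrame({'key': ['A', 'B', 'C'], 'value': [1, 2, 3]})
right = pd.DataFrame({'key': ['B', 'C', 'D'], 'value': [4, 5, 6]})

# Inner join: only keys present in both frames
inner = pd.merge(left, right, on='key', how='inner',
                 suffixes=('_left', '_right'))

# Outer join: all keys from either frame, with NaN where a side has no match
outer = pd.merge(left, right, on='key', how='outer',
                 suffixes=('_left', '_right'))

print(inner)
print(outer)
```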
7. Pivot table
pivot_table is one of the important functions for pivoting data. It rearranges and summarizes data based on the values of one or more columns, which makes the structure and relationships in the data easier to see.
# Create a pivot table (assumes df also has a numeric 'Value' column)
pivot = df.pivot_table(index='Name', columns='Age', values='Value')
print(pivot)
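Because the sample DataFrame used so far has no Value column, here is a self-contained sketch on a hypothetical long-format scores table. Each Name becomes a row, each Subject becomes a column, and the cells hold the mean Score (mean is also the default aggregation).

```python
import pandas as pd

# Hypothetical long-format data: one row per student and subject
scores = pd.DataFrame({
    'Name': ['Amy', 'Amy', 'Bob', 'Bob'],
    'Subject': ['Math', 'English', 'Math', 'English'],
    'Score': [90, 85, 78, 92],
})

# Rows: Name, columns: Subject, cells: mean Score
wide = scores.pivot_table(index='Name', columns='Subject',
                          values='Score', aggfunc='mean')

print(wide)
```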
8. Process date/time data
# Convert a column to datetime (assumes df has a 'Date' column of date strings)
df['Date'] = pd.to_datetime(df['Date'])
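Once a column has a datetime type, the .dt accessor exposes its components. The following sketch uses a hypothetical events table with ISO-formatted date strings; the derived column names are illustrative.

```python
import pandas as pd

# Hypothetical data: dates stored as strings
events = pd.DataFrame({'Date': ['2023-01-15', '2023-02-20', '2023-03-05']})

# Parse the strings into datetime64 values
events['Date'] = pd.to_datetime(events['Date'])

# Extract components through the .dt accessor
events['Year'] = events['Date'].dt.year
events['Month'] = events['Date'].dt.month
events['Weekday'] = events['Date'].dt.day_name()

print(events)
```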
9. Data reshaping
pandas.melt() converts a data table from wide format to long format. It is often used in data reshaping to make analysis and visualization easier.
pandas.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None)
Parameter Description:
frame: The data table (DataFrame) to be reshaped.
id_vars: Columns to preserve; they become identifier variables in the long format and are not "melted".
value_vars: Columns to "melt"; they are combined into a single column, and their names are recorded in a new column.
var_name: The name of the new column that stores the "melted" column names.
value_name: The name of the new column that stores the "melted" values.
col_level: If the columns are a multi-level index (MultiIndex), the level on which to apply the "melt" operation.
Here is an example showing how to use the melt() function to convert wide-format data to long format. Assume the following wide-format data table df:
   ID  Name  Math  English  History
0   1   Amy    90       85       88
1   2   Bob    78       92       76
2   3  John    88       79       90
To "melt" the Math, English, and History columns into a long-format data table, do this:
melted_df = pd.melt(df, id_vars=['ID', 'Name'], value_vars=['Math', 'English', 'History'], var_name='Subject', value_name='Score')
The converted long-format data table melted_df is as follows:
   ID  Name  Subject  Score
0   1   Amy     Math     90
1   2   Bob     Math     78
2   3  John     Math     88
3   1   Amy  English     85
4   2   Bob  English     92
5   3  John  English     79
6   1   Amy  History     88
7   2   Bob  History     76
8   3  John  History     90
In this way, multiple columns of a wide-format data table are combined into a single column for easier analysis, visualization, or other manipulation. The melt() function is very useful during the data cleaning and transformation phase. melt() can also be understood as roughly the reverse of the pivot_table operation above, or of unstack.
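The melt example can be reproduced end to end with the sketch below, which builds the wide table, melts it, and then pivots it back to illustrate the "reverse operation" idea. The round trip with pivot() is an illustration added here, not part of the original example, and passing a list as index assumes pandas 1.1 or later.

```python
import pandas as pd

# Wide-format table: one column per subject
df = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['Amy', 'Bob', 'John'],
    'Math': [90, 78, 88],
    'English': [85, 92, 79],
    'History': [88, 76, 90],
})

# Wide to long: one row per (student, subject) pair
melted_df = pd.melt(df, id_vars=['ID', 'Name'],
                    value_vars=['Math', 'English', 'History'],
                    var_name='Subject', value_name='Score')

# Long back to wide: pivot() undoes the melt
restored = melted_df.pivot(index=['ID', 'Name'], columns='Subject',
                           values='Score').reset_index()

print(melted_df)
print(restored)
```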
10. Categorical data
astype('category') converts a column's data type to the categorical (Category) type. Converting a column to a categorical type can help save memory and improve performance, especially when the column contains a limited number of distinct values.
# Encoding categorical variables
df['Category'] = df['Category'].astype('category')
# Replace each category with its integer code
df['Category'] = df['Category'].cat.codes
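Overwriting the column with its codes discards the original labels, so it can help to keep the mapping. Here is a minimal sketch on hypothetical data; the mapping dictionary is an illustrative addition. String categories are ordered alphabetically by default, which determines the codes.

```python
import pandas as pd

# Hypothetical data with three distinct labels
df = pd.DataFrame({'Category': ['low', 'high', 'medium', 'low', 'high']})

# Convert to the categorical dtype
as_category = df['Category'].astype('category')

# Integer code for each row
codes = as_category.cat.codes

# Lookup table from code back to label
mapping = dict(enumerate(as_category.cat.categories))

print(codes.tolist())
print(mapping)
```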
11. Data sampling
# Randomly sample rows from a DataFrame
sampled_df = df.sample(n=2)
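Random samples change on every run unless a seed is fixed. As a hedged sketch on the sample data, random_state makes the draw repeatable and frac samples a proportion of the rows instead of a fixed count. The seed values 42 and 0 are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 35, 40]})

# Fixed-size sample; the same random_state gives the same rows each time
sample_a = df.sample(n=2, random_state=42)
sample_b = df.sample(n=2, random_state=42)

# Fractional sample: half of the rows
half = df.sample(frac=0.5, random_state=0)

print(sample_a)
print(half)
```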
12. Calculate cumulative sum
# Calculate the cumulative sum (assumes df has a numeric 'Values' column)
df['Cumulative_Sum'] = df['Values'].cumsum()
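Here is a runnable sketch of a running total on a hypothetical table. The Group column and the per-group variant are illustrative additions; combining groupby() with cumsum() restarts the total within each group.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({'Group': ['x', 'x', 'y', 'y'],
                   'Values': [1, 2, 3, 4]})

# Running total over the whole column
df['Cumulative_Sum'] = df['Values'].cumsum()

# Running total that restarts for each group
df['Group_Cumulative_Sum'] = df.groupby('Group')['Values'].cumsum()

print(df)
```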
13. Delete duplicate data
# Remove rows that repeat the same Column1/Column2 pair, keeping the first occurrence
df.drop_duplicates(subset=['Column1', 'Column2'], keep='first', inplace=True)
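Before deleting anything, duplicated() can show which rows would be removed. The sketch below uses a hypothetical table whose Column1 and Column2 names match the snippet above; the Other column is invented to show that only the subset columns are compared.

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 share the same Column1/Column2 pair
df = pd.DataFrame({'Column1': [1, 1, 2, 2],
                   'Column2': ['a', 'a', 'b', 'c'],
                   'Other': [10, 20, 30, 40]})

# True for every row that repeats an earlier Column1/Column2 pair
dup_mask = df.duplicated(subset=['Column1', 'Column2'])

# Same rule, returning a new DataFrame without the repeats
deduped = df.drop_duplicates(subset=['Column1', 'Column2'], keep='first')

print(dup_mask.tolist())
print(deduped)
```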
14. Create dummy variables
pandas.get_dummies() is the Pandas function for one-hot encoding.
# Creating dummy variables for categorical data
dummy_df = pd.get_dummies(df, columns=['Category'])
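To make the result of one-hot encoding concrete, here is a small sketch on a hypothetical table. Each distinct value of Category becomes its own indicator column, named with the Category_ prefix. The indicator dtype differs between pandas versions (boolean in recent releases), so the sketch converts to int only for display.

```python
import pandas as pd

# Hypothetical data with one categorical column
df = pd.DataFrame({'Category': ['A', 'B', 'A'],
                   'Value': [1, 2, 3]})

# One indicator column per distinct category; 'Category' itself is replaced
dummy_df = pd.get_dummies(df, columns=['Category'])

print(list(dummy_df.columns))
print(dummy_df.astype(int))
```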
15. Data export
There are many to_* methods (for example to_csv, to_excel, and to_json) that export a DataFrame to different formats.
# Exporting DataFrame to CSV
df.to_csv('output.csv', index=False)
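As a sketch that avoids writing files, the block below exports to in-memory strings instead: to_csv() returns the CSV text when no path is given, and to_json() does the same for JSON. Reading the CSV text back with read_csv() is an illustrative check that the export round-trips.

```python
import io
import json

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# With no path argument, to_csv() returns the CSV text
csv_text = df.to_csv(index=False)

# orient='records' produces a JSON list with one object per row
json_text = df.to_json(orient='records')
records = json.loads(json_text)

# Read the CSV text back to confirm the round trip
restored = pd.read_csv(io.StringIO(csv_text))

print(csv_text)
print(records)
```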
Summary
The 15 Pandas code snippets above cover some of the most commonly used data manipulation and analysis operations in day-to-day work. Mastering them and incorporating them into your workflows can make processing and exploring datasets more efficient and effective.
https://avoid.overfit.cn/post/d5097a67e5c34a0ab42395d8c22091e1
Author: pythonfundamentals