15 basic and commonly used Pandas code snippets

Pandas provides powerful data manipulation and analysis functions and is an essential everyday tool for data science. This article introduces 15 commonly used Pandas code snippets that help simplify data analysis tasks and extract valuable insights from datasets.

1. Filter data

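Pandas provides several ways to filter data. The most common is boolean indexing: a condition on a column produces a True/False mask, and only the rows where the mask is True are kept.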

 import pandas as pd
 
 # Create a DataFrame
 data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
         'Age': [25, 30, 35, 40]}
 
 df = pd.DataFrame(data)
 
 # Filter rows where Age is greater than 30
 filtered_df = df[df['Age'] > 30]
 print(filtered_df)
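Boolean indexing extends to more complex filters. The sketch below reuses the same sample data to show conditions combined with & (and), membership tests with isin(), the equivalent query() string syntax, and .loc for selecting rows and a column at once. Each condition combined with & or | needs its own parentheses, because those operators bind more tightly than comparisons.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 35, 40]})

# Combine conditions with & (and) or | (or); the parentheses are required
between = df[(df['Age'] >= 30) & (df['Age'] < 40)]

# Keep rows whose Name appears in a given list
selected = df[df['Name'].isin(['Alice', 'David'])]

# The same kind of filter written as a query string
older = df.query('Age > 30')

# Filter rows and pick a single column in one step with .loc
names_over_30 = df.loc[df['Age'] > 30, 'Name']

print(between)
print(selected)
print(older)
print(names_over_30.tolist())
```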

2. Group and aggregate data

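groupby() splits the rows into groups based on the values of one column and then applies an aggregation, such as the mean, to a numeric column within each group.

 # Group by a column and calculate the mean of a numeric column
 sales = pd.DataFrame({'Region': ['East', 'West', 'East', 'West'],
                       'Sales': [100, 200, 150, 250]})
 grouped = sales.groupby('Region')['Sales'].mean()
 print(grouped)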
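When several statistics are needed per group, agg() can compute them in one pass. The sketch below uses named aggregation (available since pandas 0.25) on a small hypothetical sales table; each keyword becomes an output column, defined as a pair of (input column, aggregation function).

```python
import pandas as pd

# Hypothetical sample data
sales = pd.DataFrame({'Region': ['East', 'West', 'East', 'West'],
                      'Sales': [100, 200, 150, 250]})

# Named aggregation: output_name=(input column, aggregation function)
summary = sales.groupby('Region').agg(
    total=('Sales', 'sum'),
    average=('Sales', 'mean'),
    orders=('Sales', 'count'),
)
print(summary)
```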

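3. Handle missing values

 # Count the missing values in each column
 missing_values = df.isnull().sum()
 print(missing_values)
 
 # Fill missing values with a specific value
 # (assign the result back: an inplace call on a single column stops updating df under Copy-on-Write)
 df['Age'] = df['Age'].fillna(0)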
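Filling with a constant is only one option. The sketch below uses a hypothetical table with two missing ages to show two other common strategies: dropping incomplete rows with dropna(), and filling gaps with the column mean. Both return new DataFrames and leave the original untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing ages
people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                       'Age': [25, np.nan, 35, np.nan]})

# Count missing values per column
print(people.isnull().sum())

# Option 1: drop any row that contains a missing value
complete_rows = people.dropna()

# Option 2: fill missing ages with the mean of the known ages
filled = people.assign(Age=people['Age'].fillna(people['Age'].mean()))

print(complete_rows)
print(filled)
```

Which strategy fits depends on the data: dropping rows loses information, while mean filling can distort the distribution if many values are missing.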

4. Apply functions to columns

The apply() method runs a custom function on each element of a Series, or on each row or column of a DataFrame, enabling more complex data manipulation and transformation.

 # Double every value in the Age column
 df['Age'] = df['Age'].apply(lambda x: x * 2)
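apply() can also work row by row. With axis=1, each row is passed to the function as a Series, so the function can combine several columns. The sketch below builds a hypothetical Label column this way; for plain arithmetic like doubling, a vectorized expression gives the same result and is usually faster.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 35, 40]})

# axis=1 passes each row (as a Series) to the function
df['Label'] = df.apply(lambda row: f"{row['Name']} ({row['Age']})", axis=1)

# Vectorized alternative for simple arithmetic: no Python-level loop
df['Age_doubled'] = df['Age'] * 2

print(df)
```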

5. Concatenate DataFrames

Concatenation here mainly means stacking rows: appending one DataFrame below another that has the same column structure.

 # Concatenate two DataFrames
 df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
 df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})
 
 result = pd.concat([df1, df2], ignore_index=True)
 print(result)
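pd.concat() is not limited to identical columns. The sketch below, using small hypothetical frames, shows axis=1 for placing DataFrames side by side (rows are aligned on the index) and what happens when stacked frames have different columns: the result contains the union of columns, and missing cells become NaN.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df3 = pd.DataFrame({'C': ['C0', 'C1']})  # hypothetical extra column

# axis=1 joins side by side, aligning rows on the index
side_by_side = pd.concat([df1, df3], axis=1)

# Stacking frames with different columns: the union of columns, NaN where absent
df4 = pd.DataFrame({'A': ['A2'], 'C': ['C2']})
stacked = pd.concat([df1, df4], ignore_index=True)

print(side_by_side)
print(stacked)
```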

6. Merge DataFrames

Merging here combines columns: rows from two DataFrames are matched on the values of one or more shared key columns, much like a SQL join.

 # Merge two DataFrames
 left = pd.DataFrame({'key': ['A', 'B', 'C'], 'value': [1, 2, 3]})
 right = pd.DataFrame({'key': ['B', 'C', 'D'], 'value': [4, 5, 6]})
 
 merged = pd.merge(left, right, on='key', how='inner')
 print(merged)
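In the inner merge, only keys present in both frames (B and C) survive, and the overlapping value columns receive the default suffixes _x and _y. The sketch below uses the same frames to show a left join, custom suffixes, and indicator=True, which adds a _merge column recording whether each row matched.

```python
import pandas as pd

left = pd.DataFrame({'key': ['A', 'B', 'C'], 'value': [1, 2, 3]})
right = pd.DataFrame({'key': ['B', 'C', 'D'], 'value': [4, 5, 6]})

# how='left' keeps every row of `left`; unmatched right values become NaN.
# suffixes renames the overlapping 'value' columns (default: '_x', '_y').
# indicator=True adds a '_merge' column showing where each row came from.
merged_left = pd.merge(left, right, on='key', how='left',
                       suffixes=('_left', '_right'), indicator=True)
print(merged_left)
```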

7. Pivot table

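pivot_table is one of the key functions for pivoting data. It rearranges and summarizes data based on the values of one or more columns, making the structure and relationships in the data easier to see.

 # Create a pivot table: average Score per Name (rows) and Subject (columns)
 scores = pd.DataFrame({'Name': ['Amy', 'Amy', 'Bob', 'Bob'],
                        'Subject': ['Math', 'English', 'Math', 'English'],
                        'Score': [90, 85, 78, 92]})
 table = scores.pivot_table(index='Name', columns='Subject', values='Score', aggfunc='mean')
 print(table)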
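pivot_table also handles repeated and missing combinations, which a plain reshape cannot. The sketch below uses hypothetical sales records in which one (Region, Product) pair repeats and another never occurs: aggfunc='sum' adds up the repeats, fill_value=0 fills the empty cell, and margins=True appends an "All" row and column with totals.

```python
import pandas as pd

# Hypothetical records: East/Pen appears twice, West/Book never appears
sales = pd.DataFrame({'Region': ['East', 'East', 'West', 'East'],
                      'Product': ['Pen', 'Pen', 'Pen', 'Book'],
                      'Units': [10, 5, 7, 4]})

table = sales.pivot_table(index='Region', columns='Product', values='Units',
                          aggfunc='sum',   # add up repeated combinations
                          fill_value=0,    # 0 where a combination never occurs
                          margins=True)    # append 'All' totals row and column
print(table)
```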

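8. Work with date/time data

 # Convert a column of date strings to datetime values
 events = pd.DataFrame({'Date': ['2023-01-15', '2023-02-20', '2023-03-05']})
 events['Date'] = pd.to_datetime(events['Date'])
 print(events.dtypes)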
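Once a column is converted, the .dt accessor exposes date components and date arithmetic becomes possible. The sketch below uses hypothetical dates, including one invalid string: errors='coerce' turns it into NaT (missing) instead of raising an error, and subtracting datetimes yields Timedelta values.

```python
import pandas as pd

# Hypothetical data; the last entry cannot be parsed as a date
events = pd.DataFrame({'Date': ['2023-01-15', '2023-02-20', 'not a date']})

# errors='coerce' converts unparseable strings to NaT instead of raising
events['Date'] = pd.to_datetime(events['Date'], errors='coerce')

# The .dt accessor exposes components of a datetime column
events['Year'] = events['Date'].dt.year
events['Month'] = events['Date'].dt.month
events['Weekday'] = events['Date'].dt.day_name()

# Subtracting datetimes gives Timedeltas; .dt.days extracts whole days
events['Days_since_first'] = (events['Date'] - events['Date'].min()).dt.days

print(events)
```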

9. Data reshaping

pandas.melt() converts a data table from wide format to long format. It is often used in data reshaping operations to make analysis and visualization easier. Its signature is:

 pandas.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None)

Parameter description:

  • frame: The DataFrame to reshape.
  • id_vars: Columns to keep as identifier variables. They are repeated in the long format and are not "melted".
  • value_vars: Columns to "melt". Their values are stacked into a single column, and their original column names are recorded in a new column. If omitted, all columns not listed in id_vars are melted.
  • var_name: The name of the new column that stores the former column names.
  • value_name: The name of the new column that stores the melted values.
  • col_level: If the columns are a MultiIndex, the level at which to apply the melt.

Here is an example of using melt() to convert wide-format data to long format. Assume the following wide-format DataFrame df:

    ID  Name  Math  English  History
 0   1   Amy    90       85       88
 1   2   Bob    78       92       76
 2   3  John    88       79       90

To "melt" the Math, English, and History columns into a long-format table, do this:

 melted_df = pd.melt(df, id_vars=['ID', 'Name'], value_vars=['Math', 'English', 'History'],
                     var_name='Subject', value_name='Score')

The resulting long-format table melted_df looks like this:

    ID  Name  Subject  Score
 0   1   Amy     Math     90
 1   2   Bob     Math     78
 2   3  John     Math     88
 3   1   Amy  English     85
 4   2   Bob  English     92
 5   3  John  English     79
 6   1   Amy  History     88
 7   2   Bob  History     76
 8   3  John  History     90

In this way, you can combine multiple columns of data in a wide-format data table into a single column for easier analysis, visualization, or other manipulations.

The melt() function is very useful during the data cleaning and transformation phase. It can also be understood as the reverse of the pivot_table operation shown above, or of unstack.
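To illustrate the reverse direction, the sketch below rebuilds the wide table from melted_df by moving Subject into the index with set_index() and then turning it back into columns with unstack(). Because unstack sorts the new columns alphabetically, the original column order is restored explicitly at the end.

```python
import pandas as pd

# The wide-format table from the example above
df = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Amy', 'Bob', 'John'],
                   'Math': [90, 78, 88], 'English': [85, 92, 79],
                   'History': [88, 76, 90]})
melted_df = pd.melt(df, id_vars=['ID', 'Name'],
                    value_vars=['Math', 'English', 'History'],
                    var_name='Subject', value_name='Score')

# Move Subject into the index, then unstack it back into columns
wide = (melted_df.set_index(['ID', 'Name', 'Subject'])['Score']
        .unstack('Subject')
        .reset_index())
wide.columns.name = None  # drop the leftover 'Subject' label on the columns

# unstack sorts the new columns alphabetically, so restore the original order
wide = wide[['ID', 'Name', 'Math', 'English', 'History']]
print(wide)
```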

10. Categorical data

astype('category') converts a column to the categorical (category) dtype. This can save memory and improve performance, especially when the column contains only a small number of distinct values.

 # Convert a string column to the category dtype
 products = pd.DataFrame({'Category': ['A', 'B', 'A', 'C', 'B']})
 products['Category'] = products['Category'].astype('category')
 # Encode each category as an integer code (stored in a separate column)
 products['Category_Code'] = products['Category'].cat.codes
 print(products)
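The memory saving can be checked directly with memory_usage(deep=True). The sketch below uses a hypothetical column of 30,000 rows with only three distinct labels. A categorical column stores each distinct label once and keeps a small integer code per row, so it should report far fewer bytes; exact numbers vary by pandas version and platform, so none are shown here.

```python
import pandas as pd

# Hypothetical column: many rows, only three distinct values
labels = pd.Series(['low', 'medium', 'high'] * 10_000)
cats = labels.astype('category')

as_object = labels.memory_usage(deep=True)
as_category = cats.memory_usage(deep=True)

print(f"string column:   {as_object:,} bytes")
print(f"category column: {as_category:,} bytes")

# Distinct labels are stored once, in sorted order
print(cats.cat.categories.tolist())
```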

11. Data sampling

 # Randomly sample rows from a DataFrame
 sampled_df = df.sample(n=2)
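sample() returns different rows on each call unless a seed is set. The sketch below, on the same sample data, shows frac for sampling a proportion of rows, random_state for reproducible results, and replace=True for drawing with replacement (as in bootstrapping), which allows n to exceed the number of rows.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 35, 40]})

# frac samples a fraction of rows; random_state makes the result repeatable
half = df.sample(frac=0.5, random_state=42)
again = df.sample(frac=0.5, random_state=42)

# replace=True lets the same row be drawn more than once
boot = df.sample(n=6, replace=True, random_state=0)

print(half)
print(boot)
```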

12. Calculate cumulative sum

 # Calculate a running (cumulative) total
 ledger = pd.DataFrame({'Values': [10, 20, 30, 40]})
 ledger['Cumulative_Sum'] = ledger['Values'].cumsum()
 print(ledger)
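A running total often needs to restart for each group, for example per account. The sketch below uses hypothetical transactions to compare an overall cumsum() with groupby().cumsum(), and also shows cummax(), one of the other cumulative functions pandas provides.

```python
import pandas as pd

# Hypothetical transactions for two accounts
tx = pd.DataFrame({'Account': ['A', 'B', 'A', 'B', 'A'],
                   'Values': [10, 5, 20, 15, 30]})

# Running total over the whole column
tx['Cumulative_Sum'] = tx['Values'].cumsum()

# Running total that restarts for each account
tx['Account_Cumsum'] = tx.groupby('Account')['Values'].cumsum()

# Running maximum
tx['Running_Max'] = tx['Values'].cummax()

print(tx)
```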

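13. Remove duplicate rows

 # Remove rows duplicated in Column1 and Column2, keeping the first occurrence
 records = pd.DataFrame({'Column1': [1, 1, 2], 'Column2': ['a', 'a', 'b'], 'Column3': [10, 20, 30]})
 records = records.drop_duplicates(subset=['Column1', 'Column2'], keep='first')
 print(records)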
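Before deleting anything, it can help to inspect the duplicates. The sketch below uses hypothetical data to show duplicated(), which returns the boolean mask that drop_duplicates() acts on, keep=False to flag every member of a duplicate group, and keep='last' to retain the final occurrence instead of the first.

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 share Column1 and Column2
records = pd.DataFrame({'Column1': [1, 1, 2, 2],
                        'Column2': ['a', 'a', 'b', 'c'],
                        'Column3': [10, 20, 30, 40]})

# Flag rows that repeat an earlier row on the given columns
mask = records.duplicated(subset=['Column1', 'Column2'], keep='first')
print(records[mask])

# keep=False flags every member of a duplicate group
all_dups = records[records.duplicated(subset=['Column1', 'Column2'], keep=False)]

# keep='last' keeps the final occurrence instead of the first
last = records.drop_duplicates(subset=['Column1', 'Column2'], keep='last')

print(all_dups)
print(last)
```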

14. Create dummy variables

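pandas.get_dummies() performs one-hot encoding: each distinct value of a categorical column becomes its own indicator column.

 # Create dummy (one-hot) variables for a categorical column
 colors = pd.DataFrame({'Category': ['red', 'green', 'red'], 'Price': [3, 5, 4]})
 dummy_df = pd.get_dummies(colors, columns=['Category'])
 print(dummy_df)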
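Two get_dummies() options are often useful when the result feeds a model. The sketch below uses a small hypothetical table: drop_first=True drops the first category's indicator (here "green"), since it is implied when all the others are 0, which avoids perfectly collinear columns in linear models, and dtype=int produces 0/1 values instead of the booleans that recent pandas versions return by default.

```python
import pandas as pd

colors = pd.DataFrame({'Category': ['red', 'green', 'red'], 'Price': [3, 5, 4]})

# drop_first=True removes one indicator per encoded column;
# dtype=int gives 0/1 integers instead of True/False
dummies = pd.get_dummies(colors, columns=['Category'], drop_first=True, dtype=int)
print(dummies)
```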

15. Data export

Pandas provides many to_* methods for exporting a DataFrame to different formats, such as to_csv, to_excel, and to_json.

 # Exporting DataFrame to CSV
 df.to_csv('output.csv', index=False)
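A quick way to check an export is to read it back. The sketch below writes CSV into an in-memory io.StringIO buffer instead of a file (a file path works the same way), reloads it with read_csv, and also produces JSON with to_json. Formats such as Excel and Parquet follow the same to_* pattern but need optional packages (for example openpyxl or pyarrow), so they are not shown.

```python
import io

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Write CSV into an in-memory buffer instead of a file
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Read it back to confirm the round trip
restored = pd.read_csv(io.StringIO(csv_text))

# JSON export: one object per row with orient='records'
json_text = df.to_json(orient='records')

print(csv_text)
print(json_text)
```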

Summary

The 15 Pandas code snippets above cover data manipulation and analysis operations that come up in everyday work. Mastering them and incorporating them into your workflow can make processing and exploring datasets more efficient and effective.

https://avoid.overfit.cn/post/d5097a67e5c34a0ab42395d8c22091e1

Author: pythonfundamentals


Origin blog.csdn.net/m0_46510245/article/details/132646171