[python] pandas study notes (9): merging DataFrames

https://blog.csdn.net/weixin_37226516/article/details/64137043

on: The name of the column to join on. When you use this parameter, the key column must have the same name in both the left and right tables.

left_on: The key column(s) of the left table; either column names or arrays with the same length as the DataFrame.

right_on: The key column(s) of the right table; either column names or arrays with the same length as the DataFrame.

left_index / right_index: If True, use the index of the left / right table as the alignment key.

how: The type of join to perform: 'inner' (the default), 'left', 'right', or 'outer'.

sort: Whether to sort the merged DataFrame by the join keys in lexicographical order. In current pandas releases the default is False; leaving it as False can improve performance.
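
To make the parameters above more concrete, here is a minimal sketch on two small made-up tables. The names left, right, right_renamed, and right_indexed and all of their columns are hypothetical and are not part of the MovieLens data used below. It shows on, left_on / right_on, right_index, and two values of how.

```python
import pandas as pd

# Hypothetical toy tables, invented for illustration only.
left = pd.DataFrame({"key": ["a", "b", "c"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "rval": [20, 30, 40]})

# on: both tables share the column name "key".
# how="inner" keeps only the keys present in both tables.
inner = pd.merge(left, right, on="key", how="inner")

# how="outer" keeps the keys from both tables; missing values become NaN.
outer = pd.merge(left, right, on="key", how="outer")

# left_on / right_on: the key columns have different names in the two tables.
right_renamed = right.rename(columns={"key": "rkey"})
by_names = pd.merge(left, right_renamed, left_on="key", right_on="rkey", how="left")

# right_index=True: align the left column "key" with the index of the right table.
right_indexed = right.set_index("key")
by_index = pd.merge(left, right_indexed, left_on="key", right_index=True, how="inner")

print(inner)
print(outer)
print(by_names)
print(by_index)
```

Note that with left_on / right_on and different column names, both key columns ("key" and "rkey") are kept in the result.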

import pandas as pd

rating_path = "./ant-learn-pandas-master/datas/movielens-1m/ratings.dat"
users_path = "./ant-learn-pandas-master/datas/movielens-1m/users.dat"
movies_path = "./ant-learn-pandas-master/datas/movielens-1m/movies.dat"

ratings = pd.read_csv(rating_path, sep='::', engine='python', names="UserID::MovieID::Ratings::TimeStamp".split("::"))
users = pd.read_csv(users_path, sep='::', engine='python', names="UserID::Gender::Age::Occupation::Zip-Code".split("::"))
movies = pd.read_csv(movies_path, sep='::', engine='python', names="MovieID::Titles::Genres".split("::"))

Merging two tables with merge

ratings_user = pd.merge(
    ratings, users, left_on="UserID", right_on="UserID", how="inner"
)
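
The same pattern extends to the third table. The following is a minimal sketch, assuming tiny in-memory stand-ins for the ratings, users, and movies tables; the rows are invented so that the snippet runs without the MovieLens files. It uses on="UserID", which gives the same result as left_on="UserID", right_on="UserID" when both tables use the same column name.

```python
import pandas as pd

# Tiny invented stand-ins with the same column names as the tables above.
ratings = pd.DataFrame({
    "UserID": [1, 1, 2],
    "MovieID": [10, 20, 10],
    "Ratings": [5, 3, 4],
    "TimeStamp": [978300760, 978302109, 978301968],
})
users = pd.DataFrame({
    "UserID": [1, 2],
    "Gender": ["F", "M"],
    "Age": [1, 56],
    "Occupation": [10, 16],
    "Zip-Code": ["48067", "70072"],
})
movies = pd.DataFrame({
    "MovieID": [10, 20],
    "Titles": ["Movie A", "Movie B"],
    "Genres": ["Comedy", "Drama"],
})

# Step 1: attach the user attributes to every rating (many ratings per user).
ratings_user = pd.merge(ratings, users, on="UserID", how="inner")

# Step 2: attach the movie attributes (many ratings per movie).
ratings_user_movie = pd.merge(ratings_user, movies, on="MovieID", how="inner")

print(ratings_user_movie.head())
```

Because every rating has exactly one matching user and one matching movie in this toy data, the number of rows stays the same after each merge.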

If the two tables contain a column with the same name but different values, and you want to keep both columns in the merged result, use suffixes to append a suffix to the duplicated column names from each table.

result = pd.merge(left, right, on='k', suffixes=['_l', '_r'])
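
A runnable version of this call might look like the sketch below. The contents of left and right are hypothetical: both tables have a non-key column named "v" so that the effect of suffixes is visible.

```python
import pandas as pd

# Hypothetical tables: both have a non-key column "v" with different values.
left = pd.DataFrame({"k": ["K0", "K1", "K2"], "v": [1, 2, 3]})
right = pd.DataFrame({"k": ["K0", "K0", "K3"], "v": [4, 5, 6]})

# Without suffixes, pandas falls back to its defaults "_x" and "_y".
default_result = pd.merge(left, right, on="k")

# With suffixes, the overlapping column "v" becomes "v_l" and "v_r".
result = pd.merge(left, right, on="k", suffixes=("_l", "_r"))

print(default_result)
print(result)
```

Only the overlapping non-key column is renamed; the key column "k" keeps its name.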

Another way to combine DataFrames is concat.
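
Unlike merge, pd.concat does not match rows on key columns; it stacks objects along an axis and aligns them on the other axis. A minimal sketch, assuming two made-up frames df1 and df2 that are not part of the original notes:

```python
import pandas as pd

# Hypothetical frames with identical columns.
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"A": [5, 6], "B": [7, 8]})

# axis=0 (the default): stack the rows; ignore_index=True renumbers the index.
stacked = pd.concat([df1, df2], ignore_index=True)

# axis=1: place the frames side by side, aligned on the index.
# add_suffix renames the columns of df2 so the names stay unique.
side_by_side = pd.concat([df1, df2.add_suffix("_2")], axis=1)

print(stacked)
print(side_by_side)
```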


Origin blog.csdn.net/Sgmple/article/details/113076882