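1. After connecting to Databricks on AWS and reading the table into a Spark DataFrame (data_df), convert it to a pandas DataFrame with the toPandas() method. Note that toPandas() collects every row to the driver, so it is only suitable for tables that fit in driver memory: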
df = data_df.toPandas()
2. Get a single column (returned as a pandas Series):
test = df['test']  # get the entire 'test' column
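A small self-contained sketch of column selection in pandas. The DataFrame below is hypothetical sample data standing in for the table pulled from Databricks. It shows that a single label returns a Series, a list of labels returns a DataFrame, and DataFrame.get returns None instead of raising KeyError when the column is absent:

```python
import pandas as pd

# Hypothetical sample data standing in for the converted Databricks table
df = pd.DataFrame({"test": [1, 2, 3], "other": ["a", "b", "c"]})

test = df["test"]             # single label -> pandas Series
subset = df[["test"]]         # list of labels -> one-column DataFrame
missing = df.get("no_such")   # missing column -> None rather than KeyError

print(type(test).__name__)    # Series
print(type(subset).__name__)  # DataFrame
print(missing)                # None
```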
3. Iterate over the values of a specific column:
for i in test:
    print("Column value: %s" % i)
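If you also need the row index while iterating, Series.items() yields (index, value) pairs. For many tasks a Python loop is unnecessary: converting to a list or using a vectorized operation is simpler and faster. A minimal sketch, again using hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"test": [10, 20, 30]})

# Series.items() yields (index label, value) pairs
for idx, value in df["test"].items():
    print(f"row {idx}: {value}")

# Alternatives that avoid an explicit loop
values = df["test"].tolist()           # plain Python list of the column
doubled = (df["test"] * 2).tolist()    # vectorized arithmetic, then to list
```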
4. Compare the columns of two DataFrames (returns the column names that exist in df2 but not in df1):
def compare_diff_column(df1, df2):
    df1_cols = set(df1.columns)
    df2_cols = set(df2.columns)
    return list(df2_cols.difference(df1_cols))
compare_diff_column(df1, df2)
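The function above only reports one direction (columns in df2 missing from df1). A slightly extended sketch reports both directions plus the shared columns. The name column_diff and the sample DataFrames are hypothetical; results are sorted so the output order is deterministic (plain set iteration order is not guaranteed):

```python
import pandas as pd

def column_diff(df1: pd.DataFrame, df2: pd.DataFrame) -> dict:
    """Hypothetical helper: report column names unique to each DataFrame and shared ones."""
    cols1, cols2 = set(df1.columns), set(df2.columns)
    return {
        "only_in_df1": sorted(cols1 - cols2),
        "only_in_df2": sorted(cols2 - cols1),
        "common": sorted(cols1 & cols2),
    }

# Hypothetical sample DataFrames (columns only, no rows)
df1 = pd.DataFrame(columns=["id", "name", "age"])
df2 = pd.DataFrame(columns=["id", "name", "email"])
print(column_diff(df1, df2))
# {'only_in_df1': ['age'], 'only_in_df2': ['email'], 'common': ['id', 'name']}
```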