sparksql_monotonically_increasing_id: generating unique, auto-increasing IDs

# We found duplicate IDs, so we may need to assign each row a new unique ID to identify it.
# Add a new column
import pyspark.sql.functions as fn

df.withColumn('new_id', fn.monotonically_increasing_id()).show()
# withColumn adds a new column
# monotonically_increasing_id generates a unique, monotonically increasing ID
+---+------+------+---+------+-------------+
| id|weight|height|age|gender|       new_id|
+---+------+------+---+------+-------------+
|  5| 133.2|   5.7| 54|     F|  25769803776|
|  4| 144.5|   5.9| 33|     M| 171798691840|
|  2| 167.2|   5.4| 45|     M| 592705486848|
|  3| 124.1|   5.2| 23|     F|1236950581248|
|  5| 129.2|   5.3| 42|     M|1365799600128|
+---+------+------+---+------+-------------+
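For reference, below is a minimal self-contained sketch that reproduces the example above. The SparkSession setup and the DataFrame construction (column names and sample rows) are assumptions reconstructed from the output shown, not part of the original post.

# Minimal sketch, assuming the DataFrame is built from the rows shown in the output above.
from pyspark.sql import SparkSession
import pyspark.sql.functions as fn

spark = SparkSession.builder.appName("monotonic_id_demo").getOrCreate()

# Sample data with a duplicate id (5 appears twice)
df = spark.createDataFrame(
    [
        (5, 133.2, 5.7, 54, 'F'),
        (4, 144.5, 5.9, 33, 'M'),
        (2, 167.2, 5.4, 45, 'M'),
        (3, 124.1, 5.2, 23, 'F'),
        (5, 129.2, 5.3, 42, 'M'),
    ],
    ['id', 'weight', 'height', 'age', 'gender'],
)

# Assign every row a new unique ID; values are unique and increasing
# but not consecutive, since the partition ID is encoded in the upper bits.
df.withColumn('new_id', fn.monotonically_increasing_id()).show()

Note that the generated IDs are unique and monotonically increasing but not consecutive: the upper bits encode the partition ID, which is why the values in the output jump in large steps.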



Reposted from blog.csdn.net/wj1298250240/article/details/103944979