Spark error: The pivot column feature has more than 10000 distinct values

(Author: Chen Yujue, data-master)

When using PySpark to pivot a narrow (long) table into a wide table, the following error occurs:

pyspark.sql.utils.AnalysisException: 
u'The pivot column feature has more than 10000 distinct values, 
this could indicate an error. 
If this was intended, 
set spark.sql.pivotMaxValues to at least 
the number of distinct values of the pivot column.;'

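For context, here is a minimal sketch of the kind of pivot that can trigger this error. The table and column names (factor_narrow, stock_id, feature, value) are hypothetical stand-ins, not taken from the original job:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pivot-demo").getOrCreate()

# Narrow (long) table: one row per (stock_id, feature, value).
narrow = spark.table("factor_narrow")

# pivot() turns each distinct value of `feature` into its own column.
# Spark first computes the distinct values of the pivot column and
# aborts with the error above if there are more than
# spark.sql.pivotMaxValues (default 10000) of them.
wide = narrow.groupBy("stock_id").pivot("feature").agg(F.first("value"))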

That looks scary. Taken literally, it says the pivot column feature in my narrow table has more than 10,000 distinct values. These values are hand-built factors, so does that really mean there are more than 10,000 factors?

Scary or not, a problem has to be solved once you run into it, and the error message already gives the fix: set the spark.sql.pivotMaxValues parameter to at least the number of distinct values in the pivot column.
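A minimal sketch of how that parameter can be raised, assuming a PySpark session like the one above (the value 30000 anticipates the distinct count found below):

# Option 1: set it when building the SparkSession.
spark = (SparkSession.builder
         .appName("pivot-demo")
         .config("spark.sql.pivotMaxValues", 30000)
         .getOrCreate())

# Option 2: set it on an existing session, before calling pivot().
spark.conf.set("spark.sql.pivotMaxValues", 30000)

As an aside, if the set of output columns is known in advance, passing an explicit list to pivot("feature", values=[...]) skips the distinct-value scan, and with it this limit, entirely.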

I then checked directly in Hive how many distinct values the column actually had, and it turned out to be about 26,000. So I set spark.sql.pivotMaxValues to 30000, and the error message disappeared.
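For reference, the distinct-value count can also be checked from PySpark itself rather than Hive; again, factor_narrow and feature are the hypothetical names from the sketch above:

# Count the distinct values of the pivot column before pivoting.
n = spark.table("factor_narrow").select("feature").distinct().count()
print(n)  # in the author's case this came out to roughly 26000

# The same check via SQL:
spark.sql("SELECT COUNT(DISTINCT feature) FROM factor_narrow").show()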


Origin: blog.csdn.net/weixin_39750084/article/details/107618042