1. The schema parameter: AssertionError: dataType should be DataType
# AssertionError: dataType should be DataType
schema = StructType([
    StructField("col_1", StringType, True),   # True means the field is nullable
    StructField("col_2", StringType, True),
    StructField("col_3", StringType, True),
])
# Cause: StringType and the like are not followed by parentheses "()",
# so the class itself is passed instead of an instance.
# Fixed:
schema = StructType([
    StructField("col_1", StringType(), True),
    StructField("col_2", StringType(), True),
    StructField("col_3", StringType(), True),
])
2. The current PySpark data types are:
NullType, StringType, BinaryType, BooleanType, DateType, TimestampType, DecimalType, DoubleType, FloatType, ByteType, IntegerType, LongType, ShortType, ArrayType, MapType, StructType (StructField), etc. Choose the type that fits your data, and watch out for possible overflow with the fixed-width integer types.
The Python type corresponding to each Spark SQL type is summarized below:
NullType      | None
StringType    | str (`basestring` in Python 2)
BinaryType    | bytearray
BooleanType   | bool
DateType      | datetime.date
TimestampType | datetime.datetime
DecimalType   | decimal.Decimal
DoubleType    | float (double-precision)
FloatType     | float (single-precision)
ByteType      | int (a signed 8-bit integer)
IntegerType   | int (a signed 32-bit integer)
LongType      | int (a signed 64-bit integer; `long` in Python 2)
ShortType     | int (a signed 16-bit integer)
Reference: https://www.cnblogs.com/yurunmiao/p/4923694.html