Two PySpark Notes

1. The schema parameter: AssertionError: dataType should be DataType

# AssertionError: dataType should be DataType
schema = StructType([
        # True means the field is nullable
        StructField("col_1", StringType, True),
        StructField("col_2", StringType, True),
        StructField("col_3", StringType, True),
    ]
)
# Cause: StringType and the like are not followed by parentheses "()"
# Corrected:
schema = StructType([
        # True means the field is nullable
        StructField("col_1", StringType(), True),
        StructField("col_2", StringType(), True),
        StructField("col_3", StringType(), True),
    ]
)
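The assertion comes from StructField checking that its dataType argument is an *instance* of DataType. A minimal stand-in sketch (pure Python, no PySpark required; the DataType/StringType classes below only mimic pyspark.sql.types, and make_field is a hypothetical helper) shows why passing the class itself fails while passing an instance succeeds:

```python
class DataType:              # stand-in for pyspark.sql.types.DataType
    pass

class StringType(DataType):  # stand-in for pyspark.sql.types.StringType
    pass

def make_field(name, data_type):
    # Mirrors StructField's check:
    #   assert isinstance(dataType, DataType), "dataType should be DataType"
    assert isinstance(data_type, DataType), "dataType should be DataType"
    return (name, data_type)

# Passing the class object (not an instance) fails the isinstance check:
try:
    make_field("col_1", StringType)    # missing "()"
except AssertionError as e:
    print("failed:", e)                # failed: dataType should be DataType

# Passing an instance passes the check:
make_field("col_1", StringType())
```

The same reasoning explains the fix above: StringType() creates a DataType instance, while bare StringType is just the class object.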

2. PySpark's current data types:

NullType, StringType, BinaryType, BooleanType, DateType, TimestampType, DecimalType, DoubleType, FloatType, ByteType, IntegerType, LongType, ShortType, ArrayType, MapType, StructType (with StructField), etc. Choose the type that fits your data, and watch out for possible overflow problems.

The corresponding Python type for each is summarized below:

NullType        None
StringType      string
BinaryType      bytearray
BooleanType     bool
DateType        datetime.date
TimestampType   datetime.datetime
DecimalType     decimal.Decimal
DoubleType      float (double-precision floats)
FloatType       float (single-precision floats)
ByteType        int (a signed 8-bit integer)
IntegerType     int (a signed 32-bit integer)
LongType        long (a signed 64-bit integer)
ShortType       int (a signed 16-bit integer)
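To make the mapping concrete, here are Python values (standard library only, no Spark session needed) of the kind that would populate columns of each type; the field names are made up for the example:

```python
import datetime
import decimal

# Illustrative Python values matching each Spark SQL type
row = {
    "name":    "alice",                                 # StringType    -> string
    "raw":     bytearray(b"\x01\x02"),                  # BinaryType    -> bytearray
    "active":  True,                                    # BooleanType   -> bool
    "born":    datetime.date(1990, 1, 1),               # DateType      -> datetime.date
    "updated": datetime.datetime(2019, 10, 31, 12, 0),  # TimestampType -> datetime.datetime
    "balance": decimal.Decimal("19.99"),                # DecimalType   -> decimal.Decimal
    "score":   3.14,                                    # Double/FloatType -> float
    "age":     29,                                      # Byte/Short/IntegerType -> int
    "note":    None,                                    # NullType      -> None
}

for key, value in row.items():
    print(key, type(value).__name__)
```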

Reference: https://www.cnblogs.com/yurunmiao/p/4923694.html


Origin: www.cnblogs.com/qi-yuan-008/p/11770339.html