Create a DataFrame from an RDD

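Sometimes we want to convert a Spark DataFrame to a pandas DataFrame. If the data starts out as an RDD or a list of Row objects, we first need to build the Spark DataFrame. Here is one way, using spark.createDataFrame with an explicit schema: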

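from pyspark.sql import Row, SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Reuse the active SparkSession (in the pyspark shell it already exists as `spark`)
spark = SparkSession.builder.getOrCreate()

# Explicit schema: column names and their types
schema = StructType([StructField('name', StringType()), StructField('age', IntegerType())])
rows = [Row(name='Severin', age=33), Row(name='John', age=48)]
df = spark.createDataFrame(rows, schema)

df.printSchema()
df.show()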

Output:
root
 |-- name: string (nullable = true)
 |-- age: integer (nullable = true)

+-------+---+
|   name|age|
+-------+---+
|Severin| 33|
|   John| 48|
+-------+---+
Next, use the df.toPandas() method to convert the Spark DataFrame to a pandas DataFrame.
References:
https://stackoverflow.com/questions/44948465/creating-a-dataframe-from-row-results-in-infer-schema-issue
http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.SparkSession.createDataFrame