Spark SQL Notes (19): Spark SQL Summary (2), DataFrame vs. SQL

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/u012292754/article/details/84194170

1 DataFrame

  • DataFrame = RDD + Schema
  • In Scala, DataFrame is just a type alias for Dataset[Row]
  • Advantages of DataFrame over RDD: Catalyst optimization and schemas
  • DataFrame can handle: text, JSON, Parquet, …
  • Both SQL queries and DataFrame API functions are optimized by Catalyst
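
To make the points above concrete, here is a minimal sketch, assuming Spark 2.x running in local mode. The `Person` case class, the sample rows, and the `people` view name are made up for illustration and do not come from the original notes.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

object DataFrameVsSql {
  // Hypothetical record type, used only for this illustration.
  case class Person(name: String, age: Int)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DataFrameVsSql")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // An RDD of case-class objects; toDF() attaches a schema to it (RDD + Schema).
    val rdd = spark.sparkContext.parallelize(Seq(Person("Ann", 31), Person("Bob", 17)))
    val df: DataFrame = rdd.toDF()

    // DataFrame is a type alias for Dataset[Row], so this assignment needs no conversion.
    val rows: Dataset[Row] = df
    rows.printSchema()

    // The same query written with the DataFrame API and with SQL.
    val viaApi = df.filter($"age" > 18).select("name")
    df.createOrReplaceTempView("people")
    val viaSql = spark.sql("SELECT name FROM people WHERE age > 18")

    // Both forms are planned by Catalyst; compare the printed physical plans.
    viaApi.explain()
    viaSql.explain()

    spark.stop()
  }
}
```

Comparing the two `explain()` outputs is a quick way to check that choosing SQL or the API is mostly a matter of style rather than performance.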

2 Schema

https://spark.apache.org/docs/2.1.3/sql-programming-guide.html#interoperating-with-rdds

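  • Inferred: the schema is inferred by reflection, e.g. from a case class
  • Explicit: the schema is specified programmatically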
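
The two approaches can be sketched side by side as follows. This is only an illustration under assumptions: the `Person` case class, the column names, and the inline sample lines are hypothetical, and a real job would read the lines from a file.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object SchemaExamples {
  // Hypothetical record type used for reflection-based inference.
  case class Person(name: String, age: Int)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SchemaExamples")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Sample input standing in for a text file of "name,age" lines.
    val lines = spark.sparkContext.parallelize(Seq("Ann,31", "Bob,17"))

    // 1) Inferred: map each line to a case class; Spark derives the schema by reflection.
    val inferred = lines
      .map(_.split(","))
      .map(a => Person(a(0), a(1).trim.toInt))
      .toDF()
    inferred.printSchema()

    // 2) Explicit: build a StructType by hand and apply it to an RDD[Row].
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("age", IntegerType, nullable = true)
    ))
    val rowRdd = lines
      .map(_.split(","))
      .map(a => Row(a(0), a(1).trim.toInt))
    val explicit = spark.createDataFrame(rowRdd, schema)
    explicit.printSchema()

    spark.stop()
  }
}
```

The explicit form is the one to reach for when the columns are only known at runtime, so no case class can be written in advance.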

3 Loading & Saving Results

https://spark.apache.org/docs/2.1.3/sql-programming-guide.html#save-modes
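
As a rough sketch of how the save modes from the linked guide are used, assuming local mode and a writable scratch directory. The output path and the sample data are hypothetical.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object LoadSaveExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LoadSaveExample")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("Ann", 31), ("Bob", 17)).toDF("name", "age")

    // Hypothetical output location; change it to a directory you can write to.
    val out = "/tmp/people_parquet"

    // Overwrite: replace whatever already exists at the path.
    df.write.mode(SaveMode.Overwrite).parquet(out)

    // Append: add the rows to the existing data.
    df.write.mode(SaveMode.Append).parquet(out)

    // Ignore: do nothing, because data already exists at the path.
    df.write.mode(SaveMode.Ignore).parquet(out)

    // SaveMode.ErrorIfExists is the default and would throw here,
    // since the path already holds data.

    // Loading: the format-specific shortcut and the generic form are equivalent.
    val loaded = spark.read.parquet(out)
    val loadedGeneric = spark.read.format("parquet").load(out)
    loaded.show()
    loadedGeneric.printSchema()

    spark.stop()
  }
}
```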

4 SQL Function Coverage

SQL coverage:

  • SQL:2003 support
  • Runs all 99 TPC-DS benchmark queries
  • Subquery support
  • Vectorization
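
To illustrate the subquery support listed above, here is a small sketch assuming Spark 2.0 or later. The `people` and `vips` views and their contents are invented for the example.

```scala
import org.apache.spark.sql.SparkSession

object SubqueryExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SubqueryExample")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample tables registered as temporary views.
    Seq(("Ann", 31), ("Bob", 17), ("Cid", 45)).toDF("name", "age")
      .createOrReplaceTempView("people")
    Seq("Ann", "Cid").toDF("name")
      .createOrReplaceTempView("vips")

    // Scalar subquery: compare each row against an aggregate over the same table.
    spark.sql(
      """SELECT name, age
        |FROM people
        |WHERE age > (SELECT avg(age) FROM people)""".stripMargin
    ).show()

    // IN subquery: keep only the rows whose name appears in another table.
    spark.sql(
      """SELECT name, age
        |FROM people
        |WHERE name IN (SELECT name FROM vips)""".stripMargin
    ).show()

    spark.stop()
  }
}
```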

5 External Data Sources

https://spark-packages.org/

  • RDBMS: requires the JDBC driver JARs on the classpath
  • Parquet, Phoenix, CSV, Avro, …
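
A sketch of reading from two such sources is shown below. Everything specific in it is an assumption: the JDBC URL, table name, credentials, and CSV path are placeholders, and the JDBC part only works if a matching database is reachable and its driver JAR has been supplied (for example through `--jars` and `--driver-class-path` on `spark-submit`).

```scala
import org.apache.spark.sql.SparkSession

object ExternalSourcesExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ExternalSourcesExample")
      .master("local[*]")
      .getOrCreate()

    // Placeholder connection settings; replace them with real values.
    val jdbcUrl  = "jdbc:mysql://localhost:3306/testdb"
    val table    = "people"
    val user     = "spark_user"
    val password = "change_me"

    // RDBMS via the built-in "jdbc" source; the driver JAR must be on the classpath.
    val fromDb = spark.read
      .format("jdbc")
      .option("url", jdbcUrl)
      .option("dbtable", table)
      .option("user", user)
      .option("password", password)
      .load()
    fromDb.printSchema()

    // CSV has been a built-in source since Spark 2.0; the path is a placeholder.
    val fromCsv = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/tmp/people.csv")
    fromCsv.printSchema()

    spark.stop()
  }
}
```

Sources that are not bundled with Spark (Phoenix, and Avro in Spark 2.1) are added as extra packages, which is what the spark-packages.org link above is for.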
