Military big data: structured data analysis and processing

Level 1: Getting Started with Spark SQL

1. What is the entry point of a Spark SQL program?

C. SparkSession
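
For reference, a minimal sketch of SparkSession as the entry point (the app name and the trivial query are placeholders, not part of the exercise):

from pyspark.sql import SparkSession

# SparkSession unifies the older SQLContext/HiveContext entry points
spark = SparkSession.builder.appName("entry-point-demo").getOrCreate()
spark.sql("SELECT 1 AS one").show()  # SQL can be issued directly from the session
spark.stop()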

2. Which two programming abstractions does Spark SQL provide for processing structured data?

A. DataFrame

B. Dataset
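
In PySpark only the DataFrame abstraction is exposed (the typed Dataset API exists in Scala and Java). A small illustrative sketch with made-up rows:

from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.appName("dataframe-demo").getOrCreate()
# A DataFrame: a distributed collection of rows with a named schema
df = spark.createDataFrame([Row(name="fighter_a", speed=1000.0),
                            Row(name="fighter_b", speed=2000.0)])
df.printSchema()
df.filter(df.speed > 1500).show()
spark.stop()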

Level 2: Using Spark SQL to analyze fighter flight performance

Mission details

Rank the flight performance of fighter jets based on their maximum flight speed.

Related information

This level uses Spark SQL to analyze fighter flight performance.

Programming requirements

Complete the code between Begin and End below to find the world's top three fighter jets by maximum flight speed.

This exercise provides JSON data with indicator parameters for fighter jets worldwide (the data is at /root/jun.json).

One record looks like this:

{"发动机数量":"双发","武器装备":"(1)机炮:30 mm机炮 150发; (2)导弹:鹰击-62反舰巡航导弹,鹰击-83反舰导弹,鹰击-91反舰导弹,鹰击-9多用途导弹,雷电-10反辐射导弹,霹雳-8空空导弹,霹雳-11空空导弹,霹雳-12中程空空导弹; (3)炸弹:雷霆2-雷射导引弹,雷石6-滑翔炸弹,200A反机场炸弹,通用炸弹500千克,1500千克。","发动机":"AL-31F涡扇发动机","机长":"21.19米","名称":"歼-16战机","乘员":"2人","关注度":"(5分)","研发单位":"中国沈阳飞机公司","气动布局":"后掠翼","机高":"5.9米","最大飞行速度":"1,438千米每小时","翼展":"14.7米","最大航程":"4,288千米","飞行速度":"超音速","首飞时间":"2011年10月17日"}

Each JSON record may contain a different set of fields, and a field's value may be empty.

After computing the statistics, save the results to the /root/airspark directory in CSV format.
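
Because the records do not all share the same fields, it can help to inspect the inferred schema before writing the query: spark.read.json takes the union of all fields it encounters and fills missing ones with null. A small exploratory sketch, assuming the same /root/jun.json path:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("inspect-json").master("local").getOrCreate()
df = spark.read.json("/root/jun.json")
df.printSchema()                                          # union of all fields across records
df.select("名称", "最大飞行速度").show(5, truncate=False)  # absent fields show up as null
spark.stop()

With the schema known, the solution below extracts the numeric speed and ranks it.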

# coding=utf-8
from pyspark.sql import SparkSession
#**********Begin**********#
# Create the SparkSession (the entry point for Spark SQL)
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL") \
    .master("local") \
    .getOrCreate()
# Read the JSON data from /root/jun.json into a single partition
df = spark.read.json("/root/jun.json").coalesce(1)
# Register a temporary view so the data can be queried with SQL
df.createOrReplaceTempView("table1")
# Find the three fastest fighter jets worldwide: extract the digits from
# `最大飞行速度` (e.g. "1,438千米每小时"), strip the thousands separator,
# cast to float, and sort in descending order
out = spark.sql("select cast(regexp_replace(regexp_extract(`最大飞行速度`,'[\\\d,\\\.]+',0),'\\\,','') as float) as speed,`名称` from table1 order by cast(regexp_replace(regexp_extract(`最大飞行速度`,'[\\\d,\\\.]+',0),'\\\,','') as float) DESC limit 3")
# Save the result as CSV
out.write.mode("overwrite").format("csv").save("/root/airspark")
#**********End**********#
spark.stop()
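
The ranking can also be expressed with the DataFrame API instead of SQL. A sketch of what the out = spark.sql(...) line above could be replaced with (the `最大飞行速度` values look like "1,438千米每小时", so the digits are extracted, the thousands separator stripped, and the result cast to float):

from pyspark.sql import functions as F

# Numeric speed parsed out of the free-text `最大飞行速度` field
speed = F.regexp_replace(
    F.regexp_extract(F.col("最大飞行速度"), r"[\d,\.]+", 0), ",", ""
).cast("float")
out = (df.select(speed.alias("speed"), F.col("名称"))
         .orderBy(F.col("speed").desc())
         .limit(3))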

Level 3: Using Spark SQL to compute the proportion of fighter jets developed by each R&D unit

Mission details

Calculate the proportion of fighter jets developed by each research and development unit.

Related information

Use Spark SQL to compute the proportion of fighter jets developed by each research and development unit.

# coding=utf-8
from pyspark.sql import SparkSession
#**********Begin**********#
# Create the SparkSession (the entry point for Spark SQL)
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL") \
    .master("local") \
    .getOrCreate()
# Read the JSON data from /root/jun.json into a single partition
df = spark.read.json("/root/jun.json").coalesce(1)
# Register a temporary view so the data can be queried with SQL
df.createOrReplaceTempView("table1")
# For each R&D unit (`研发单位`), compute its share of all fighter jets:
# count per unit divided by the overall count (records with a null unit or
# name are excluded), rounded to two decimals and formatted as a percentage
out = spark.sql("select concat(round(count(`研发单位`)*100/(select count(`研发单位`) as num from table1 where `研发单位` is not null and `名称` is not null),2),'%') as ratio, `研发单位` from table1 where `研发单位` is not null and `名称` is not null group by `研发单位`")
# Save the result as CSV
out.write.mode("overwrite").format("csv").save("/root/airspark")
#**********End**********#
spark.stop()
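
For comparison, a sketch of the same per-unit share using the DataFrame API; this could stand in for the out = spark.sql(...) line inside the Begin/End block (column names follow the JSON fields, and the percentage formatting mirrors the SQL version):

from pyspark.sql import functions as F

# Keep only records where both the R&D unit and the aircraft name are present
valid = df.filter(F.col("研发单位").isNotNull() & F.col("名称").isNotNull())
total = valid.count()  # total number of usable records
out = (valid.groupBy("研发单位")
            .count()
            .select(F.concat(F.round(F.col("count") * 100 / total, 2).cast("string"),
                             F.lit("%")).alias("ratio"),
                    F.col("研发单位")))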
