【Offline Data Warehouse Project from 0】——New Energy Vehicle Data Warehouse Project Introduction

Table of contents

1. The concept of data warehouse

2. Project requirements and architecture design

3. Cluster resource planning and design

4. Vehicle log field description


1. The concept of data warehouse

A data warehouse is a system that supplies data to help an enterprise make decisions, improve business processes, and improve product quality. It can ingest many kinds of input data, such as business data, log data, and crawled web data. In this project, however, we only compute statistics on and analyze log data.

Specifically, we focus on one type of log data: sensor data recorded while the car is running, which captures the readings of each sensor during driving. This data is essential for improving vehicle performance, diagnosing problems, and analyzing driving behavior.

2. Project requirements and architecture design

  • Project requirements:

  •  Technology selection:

  • Core architecture:

Vehicle dimension information is fully synchronized with DataX on a fixed schedule: the data is first uploaded to HDFS, and a Hive table is created over it. Vehicle driving logs are loaded into the data warehouse with Flume and stored in the ODS layer, and the common (shared) computations are completed in the DWS layer. Finally, the ADS layer is exported to MySQL for machine learning.
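To make the layer flow concrete, here is a minimal in-memory Python sketch of the ODS → DWS → ADS path for the log data. It is not the project's implementation: the real pipeline uses Flume, Hive, and DataX, the sample records are invented, and the per-vehicle metrics (log count, maximum velocity) are hypothetical examples of shared DWS aggregates.

```python
import json
from collections import defaultdict

# Hypothetical raw log lines, as Flume might land them (one JSON object per line).
RAW_LINES = [
    '{"vin": "V001", "timestamp": 1000, "velocity": 60}',
    '{"vin": "V001", "timestamp": 31000, "velocity": 80}',
]


def ods_load(lines):
    """ODS: keep the raw records, only parsed from JSON."""
    return [json.loads(line) for line in lines if line.strip()]


def dws_aggregate(records):
    """DWS: per-vehicle aggregates shared by several reports (metrics are hypothetical)."""
    stats = defaultdict(lambda: {"log_count": 0, "max_velocity": 0})
    for r in records:
        s = stats[r["vin"]]
        s["log_count"] += 1
        s["max_velocity"] = max(s["max_velocity"], r["velocity"])
    return dict(stats)


def ads_export(stats):
    """ADS: flatten into rows that could be written to a MySQL table."""
    return [(vin, s["log_count"], s["max_velocity"]) for vin, s in sorted(stats.items())]


rows = ads_export(dws_aggregate(ods_load(RAW_LINES)))
print(rows)  # [('V001', 2, 80)]
```

Keeping each layer as its own step mirrors the warehouse design: the DWS result is computed once and can feed several ADS outputs.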

  • Framework version selection

Apache framework versions used in this project:

  • Server selection

  • Cluster size

 

3. Cluster resource planning and design

Enterprises usually build two clusters: a production cluster and a test cluster. The production cluster runs production jobs, and the test cluster is used to write and test code before it goes live.

  • Production cluster

The layout follows the deployment officially recommended for Tencent Cloud EMR:

  • Master node: the management node that keeps cluster scheduling running normally; it mainly runs NameNode, ResourceManager, HMaster, and similar processes. There is 1 in non-HA mode and there are 2 in HA mode.
  • Core node: a compute and storage node that holds all of your HDFS data. To keep that data safe, core nodes cannot be scaled in once they have been scaled out. It mainly runs DataNode, NodeManager, RegionServer, and similar processes. There are at least 2 in non-HA mode and at least 3 in HA mode.
  • Common node: provides data synchronization and highly available, fault-tolerant services for the Master nodes of an HA cluster; it mainly runs distributed coordination components such as ZooKeeper and JournalNode. There are 0 in non-HA mode and at least 3 in HA mode.
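The node-count rules in the list can be encoded as a small planning check. This is a minimal Python sketch; the function name `check_node_counts` is hypothetical and only restates the counts given for each node type in HA and non-HA mode.

```python
def check_node_counts(master: int, core: int, common: int, ha: bool) -> list[str]:
    """Check a planned node mix against the EMR node-count rules; [] means it passes."""
    problems = []
    if ha:
        # HA mode: exactly 2 Master, at least 3 Core, at least 3 Common nodes.
        if master != 2:
            problems.append(f"HA mode needs 2 master nodes, got {master}")
        if core < 3:
            problems.append(f"HA mode needs at least 3 core nodes, got {core}")
        if common < 3:
            problems.append(f"HA mode needs at least 3 common nodes, got {common}")
    else:
        # Non-HA mode: exactly 1 Master, at least 2 Core, no Common nodes.
        if master != 1:
            problems.append(f"non-HA mode needs 1 master node, got {master}")
        if core < 2:
            problems.append(f"non-HA mode needs at least 2 core nodes, got {core}")
        if common != 0:
            problems.append(f"non-HA mode needs 0 common nodes, got {common}")
    return problems


print(check_node_counts(2, 3, 3, ha=True))  # []
print(check_node_counts(2, 2, 3, ha=True))  # ['HA mode needs at least 3 core nodes, got 2']
```

The 2 Master / 3 Core / 3 Common production layout passes the HA check.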

Deploy memory-intensive services on separate servers.

Place services that exchange data heavily close together (e.g., Kafka and ClickHouse).

Concentrate client components on one or two servers where possible, to make external access easier.

Put services that depend on each other on the same server where possible (for example, the DolphinScheduler worker, Ds-worker, together with Hive/Spark).
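The last placement rule can be checked mechanically. In this minimal Python sketch, the dependency map only encodes the example given (Ds-worker needs Hive and Spark locally); the server names and the placement are hypothetical.

```python
# Hypothetical dependency map: the service on the left needs the services on the right locally.
DEPENDENCIES = {"Ds-worker": {"hive", "spark"}}


def missing_local_deps(placement, deps=DEPENDENCIES):
    """Return {server: {service: missing_deps}} for dependencies that are not co-located."""
    report = {}
    for server, services in placement.items():
        for svc in services:
            missing = deps.get(svc, set()) - services
            if missing:
                report.setdefault(server, {})[svc] = missing
    return report


placement = {
    "node1": {"Ds-worker", "hive", "spark"},
    "node2": {"Ds-worker", "hive"},
}
print(missing_local_deps(placement))  # {'node2': {'Ds-worker': {'spark'}}}
```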

Production cluster layout (8 nodes):

  • Master (2 nodes): nn (NameNode), rm (ResourceManager)
  • Core (3 nodes): dn (DataNode), nm (NodeManager)
  • Common (3 nodes): JournalNode, zk (ZooKeeper)
  • Other services, by number of instances: hive ×5, kafka ×3, spark ×5, datax ×5, Ds-master ×2, Ds-worker ×3, mysql ×2, flume ×3

  • Test cluster server planning

Servers: hadoop102, hadoop103, hadoop104. Services and their subservices:

  • HDFS: NameNode, DataNode, SecondaryNameNode
  • Yarn: NodeManager, ResourceManager
  • Zookeeper: Zookeeper Server
  • Flume (collecting logs): Flume
  • Kafka: Kafka
  • Flume (consuming Kafka logs): Flume
  • Hive
  • MySQL: MySQL
  • DataX
  • Spark
  • DolphinScheduler: ApiApplicationServer, AlertServer, MasterServer, WorkerServer, LoggerServer

Total number of services per server: hadoop102: 15, hadoop103: 11, hadoop104: 11.

4. Vehicle log field description

All the data processed in this project is vehicle log data, that is, the record of its own state that a vehicle sends every 30 seconds while driving. In addition to log data, we also need to process vehicle dimension data, which is stored in a database.
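Because records arrive every 30 seconds, one hour of driving should yield about 3600 / 30 = 120 records per vehicle, and a much larger interval between consecutive records points to a gap (or simply a stop). This minimal Python sketch flags such gaps; it assumes `timestamp` is in milliseconds, and the 5-second tolerance is a hypothetical choice.

```python
from collections import defaultdict

REPORT_INTERVAL_MS = 30_000  # one record every 30 s; timestamps assumed to be in milliseconds


def find_gaps(records, interval_ms=REPORT_INTERVAL_MS, tolerance_ms=5_000):
    """Return (vin, prev_ts, next_ts) for consecutive logs further apart than expected."""
    by_vin = defaultdict(list)
    for r in records:
        by_vin[r["vin"]].append(r["timestamp"])
    gaps = []
    for vin, stamps in sorted(by_vin.items()):
        stamps.sort()
        for prev, nxt in zip(stamps, stamps[1:]):
            if nxt - prev > interval_ms + tolerance_ms:
                gaps.append((vin, prev, nxt))
    return gaps


records = [
    {"vin": "V001", "timestamp": 0},
    {"vin": "V001", "timestamp": 30_000},
    {"vin": "V001", "timestamp": 120_000},
]
print(find_gaps(records))  # [('V001', 30000, 120000)]
```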

Vehicle log data is critical for analyzing and predicting vehicle performance and maintenance needs and for diagnosing problems. Vehicle dimension data, on the other hand, provides additional information about the vehicle, such as production date, make, and model, which helps us better understand the vehicle's performance and characteristics. This project processes both types of data.

Vehicle log data

The vehicle log data is a text file in JSON format. Each line is a complete JSON string, and the fields have the following meanings:

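Field name: description

  • vin: Vehicle unique code
  • timestamp: Log collection time
  • car_status: Vehicle status
  • charg_status: Charging status
  • execution_mode: Operating mode
  • velocity: Vehicle speed
  • mileage: Mileage
  • voltage: Total voltage
  • electric_current: Total current
  • soc: SOC (state of charge)
  • dc_status: DC-DC status
  • gear: Gear position
  • insulation_resistance: Insulation resistance
  • motor_count: Number of drive motors
  • motor_list: List of drive motors
  • fuel_cell_voltage: Fuel cell voltage
  • fuel_cell_current: Fuel cell current
  • fuel_cell_consume_rate: Fuel consumption rate (fuel cell)
  • fuel_cell_temperature_probe_count: Total number of fuel cell temperature probes
  • fuel_cell_temperature: Fuel cell temperature values
  • fuel_cell_max_temperature: Highest temperature in the hydrogen system
  • fuel_cell_max_temperature_probe_id: Probe ID of the highest temperature in the hydrogen system
  • fuel_cell_max_hydrogen_consistency: Highest hydrogen concentration
  • fuel_cell_max_hydrogen_consistency_probe_id: Sensor ID of the highest hydrogen concentration
  • fuel_cell_max_hydrogen_pressure: Highest hydrogen pressure
  • fuel_cell_max_hydrogen_pressure_probe_id: Sensor ID of the highest hydrogen pressure
  • fuel_cell_dc_status: High-voltage DC-DC status
  • engine_status: Engine status
  • crankshaft_speed: Crankshaft speed
  • fuel_consume_rate: Fuel consumption rate
  • max_voltage_battery_pack_id: Subsystem ID of the battery with the highest voltage
  • max_voltage_battery_id: ID of the battery cell with the highest voltage
  • max_voltage: Highest battery cell voltage
  • min_voltage_battery_pack_id: Subsystem ID of the battery with the lowest voltage
  • min_voltage_battery_id: ID of the battery cell with the lowest voltage
  • min_voltage: Lowest battery cell voltage
  • max_temperature_subsystem_id: Subsystem ID of the highest temperature
  • max_temperature_probe_id: Probe ID of the highest temperature
  • max_temperature: Highest temperature value
  • min_temperature_subsystem_id: Subsystem ID of the lowest temperature
  • min_temperature_probe_id: Probe ID of the lowest temperature
  • min_temperature: Lowest temperature value
  • alarm_level: Highest alarm level
  • alarm_sign: General alarm flag
  • custom_battery_alarm_count: Total number of rechargeable energy storage device faults (N1)
  • custom_battery_alarm_list: List of rechargeable energy storage device fault codes
  • custom_motor_alarm_count: Total number of drive motor faults (N2)
  • custom_motor_alarm_list: List of drive motor fault codes
  • custom_engine_alarm_count: Total number of engine faults (N3)
  • custom_engine_alarm_list: List of engine fault codes
  • other_alarm_count: Total number of other faults (N4)
  • other_alarm_list: List of other fault codes
  • battery_count: Total number of battery cells
  • battery_pack_count: Total number of battery packs
  • battery_voltages: List of battery cell voltages
  • battery_temperature_probe_count: Total number of battery cell temperature probes
  • battery_pack_temperature_count: Total number of battery packs
  • battery_temperatures: List of battery cell temperatures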
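Since each line is one JSON object, loading the file amounts to parsing line by line and setting aside lines that fail. This minimal Python sketch does that and also compares each count field with its paired list. Treating "count == length of list" as a rule, and assuming the list fields arrive as JSON arrays, are our assumptions rather than documented behavior; the sample values are invented.

```python
import json

# (count field, list field) pairs from the field table; the pairing rule is an assumption.
COUNT_LIST_PAIRS = [
    ("motor_count", "motor_list"),
    ("custom_battery_alarm_count", "custom_battery_alarm_list"),
    ("custom_motor_alarm_count", "custom_motor_alarm_list"),
    ("custom_engine_alarm_count", "custom_engine_alarm_list"),
    ("other_alarm_count", "other_alarm_list"),
]


def parse_log_lines(lines):
    """Parse one JSON object per line; return (records, numbers of lines that failed)."""
    records, bad = [], []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            bad.append(lineno)
            continue
        if not isinstance(obj, dict) or "vin" not in obj:
            bad.append(lineno)  # valid JSON but not a usable log record
            continue
        records.append(obj)
    return records, bad


def count_mismatches(record):
    """Return the count fields whose value differs from the length of the paired list."""
    mismatched = []
    for count_field, list_field in COUNT_LIST_PAIRS:
        if count_field in record and list_field in record:
            items = record[list_field] or []
            if record[count_field] != len(items):
                mismatched.append(count_field)
    return mismatched


sample = [
    json.dumps({"vin": "V001", "timestamp": 1000, "motor_count": 2,
                "motor_list": [{"id": 1}, {"id": 2}],
                "other_alarm_count": 1, "other_alarm_list": []}),
    '{"vin": "V001", "timestamp": 31000',  # truncated line
    "",
]
records, bad = parse_log_lines(sample)
print(len(records), bad)             # 1 [2]
print(count_mismatches(records[0]))  # ['other_alarm_count']
```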

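The motor list (motor_list) is a nested field, with the following sub-fields:

Field name: description

  • id: Drive motor sequence number
  • status: Drive motor status
  • controller_temperature: Drive motor controller temperature
  • rev: Drive motor speed
  • torque: Drive motor torque
  • temperature: Drive motor temperature
  • voltage: Motor controller input voltage
  • electric_current: Motor controller DC bus current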
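A nested list like motor_list is usually flattened into one row per motor before analysis; in Hive this is typically done with LATERAL VIEW explode. The minimal Python sketch below does the same in memory. The `motor_` prefix is our own naming choice, and the sample values are invented.

```python
def explode_motors(record):
    """Yield one flat row per entry in motor_list, keeping the parent vin and timestamp."""
    for motor in record.get("motor_list") or []:
        row = {"vin": record["vin"], "timestamp": record["timestamp"]}
        # Prefix the sub-fields: motor-level "voltage" and "electric_current" would
        # otherwise collide with the vehicle-level fields of the same name.
        row.update({f"motor_{key}": value for key, value in motor.items()})
        yield row


record = {
    "vin": "V001",
    "timestamp": 1000,
    "motor_list": [
        {"id": 1, "status": 1, "rev": 3000, "torque": 120, "voltage": 350},
        {"id": 2, "status": 1, "rev": 2980, "torque": 118, "voltage": 349},
    ],
}
rows = list(explode_motors(record))
print(len(rows), rows[1]["motor_id"], rows[0]["motor_voltage"])  # 2 2 350
```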

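Vehicle dimension data

Field name: description

  • id: Vehicle unique code
  • type_id: Vehicle model ID
  • type: Vehicle model
  • sale_type: Sales model
  • trademark: Brand
  • company: Manufacturer
  • seating_capacity: Rated passenger capacity
  • power_type: Vehicle power type
  • charge_type: Supported charging type
  • category: Vehicle category
  • weight_kg: Gross mass (kg)
  • warranty: Vehicle warranty period (years / 10,000 km)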
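The log field vin and the dimension field id both hold the vehicle unique code, so the two datasets can be combined with a join on vin = id. This minimal Python sketch performs a left join in memory, so logs for vehicles missing from the dimension table are kept with empty attributes. The choice of attributes copied over and the sample values are hypothetical.

```python
def enrich_with_dimensions(logs, vehicles):
    """Attach vehicle dimension attributes to each log record (left join on vin = id)."""
    dim_by_id = {v["id"]: v for v in vehicles}
    enriched = []
    for log in logs:
        dim = dim_by_id.get(log["vin"], {})  # unknown vehicles keep None attributes
        enriched.append({
            **log,
            "trademark": dim.get("trademark"),
            "power_type": dim.get("power_type"),
            "category": dim.get("category"),
        })
    return enriched


vehicles = [{"id": "V001", "trademark": "BrandA", "power_type": "BEV", "category": "car"}]
logs = [{"vin": "V001", "velocity": 60}, {"vin": "V999", "velocity": 40}]
result = enrich_with_dimensions(logs, vehicles)
print(result[0]["trademark"], result[1]["trademark"])  # BrandA None
```

Indexing the dimension rows by id first keeps the join linear in the number of log records, which matters because logs far outnumber vehicles.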

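This project is based on the Atguigu (尚硅谷) course:

【Atguigu Big Data Project: New Energy Vehicle Data Warehouse, Offline Data Warehouse Project in Practice】 https://www.bilibili.com/video/BV1uF411o74x/?p=7&share_source=copy_web&vd_source=2d7beee727c4b0510439779fd78c22f7

Appendix: A new-energy Tesla generated with Stable Diffusion.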


Source: blog.csdn.net/lxwssjszsdnr_/article/details/131586115