Data warehouse learning notes, part 1

Generally speaking, the big data platform consists of three parts:

  • Data-related tools, products, and technologies:

Batch data collection and transfer: Sqoop, Spark

Offline (batch) data processing: Hadoop, Hive, Spark

Real-time stream processing: Storm, Spark Streaming, Flink

•  Data assets:

Data produced and accumulated by the business itself

Data generated by the company's own operations (e.g. finance, administration)

Third-party data: purchased externally, obtained through exchange, or collected by crawlers

•  Data management: the tools and the data both need to be managed, so as to get the maximum value out of the data while keeping risk to a minimum

Data management techniques and concepts: data warehousing, data modeling, data quality, data standards, data security, and metadata management

Star schema

Dimension tables: attribute or dictionary tables, e.g. product information

Fact table: user behavior records
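Here is a minimal sketch of the star schema in plain Python. The tables `dim_product`, `dim_user`, and `fact_behavior` are hypothetical; none of these names come from the original notes. The fact table holds only foreign keys and measures, and each dimension is reached in a single lookup from the fact row:

```python
# Minimal star-schema sketch: one fact table surrounded by dimension tables.
# Table and column names are hypothetical, chosen for illustration only.

# Dimension table: product attributes keyed by product_id (a "dictionary" table)
dim_product = {
    101: {"name": "Phone", "category": "Electronics"},
    102: {"name": "Mug", "category": "Kitchen"},
}

# Dimension table: user attributes keyed by user_id
dim_user = {
    1: {"age": 25, "city": "Beijing"},
    2: {"age": 31, "city": "Shanghai"},
}

# Fact table: one row per user behavior event, holding foreign keys and measures
fact_behavior = [
    {"user_id": 1, "product_id": 101, "action": "buy", "amount": 2999.0},
    {"user_id": 2, "product_id": 102, "action": "buy", "amount": 39.0},
    {"user_id": 1, "product_id": 102, "action": "view", "amount": 0.0},
]


def enrich(facts, users, products):
    """Join each fact row directly to every dimension (one hop per dimension)."""
    rows = []
    for f in facts:
        row = dict(f)
        row.update(users[f["user_id"]])  # adds age and city
        product = products[f["product_id"]]
        row["product_name"] = product["name"]
        row["category"] = product["category"]
        rows.append(row)
    return rows


for r in enrich(fact_behavior, dim_user, dim_product):
    print(r)
```

In a real warehouse this would be a SQL join over Hive or Spark tables; the dictionaries only illustrate the shape of the model.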

Snowflake schema

Dimensions are normalized further. For example, the user table stores age and a gender id, and the gender id points to a separate table (id, name)
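Here is a sketch of the snowflake variant under similar assumptions. The hypothetical user dimension stores only a `gender_id`, and the label lives in a separate `dim_gender` sub-dimension, so reaching it takes two hops:

```python
# Snowflake-schema sketch: the user dimension is normalized further, so the
# gender label lives in its own sub-dimension table. Names are hypothetical.

dim_gender = {1: "male", 2: "female"}  # sub-dimension: id -> name

dim_user = {  # user dimension stores only gender_id, not the label
    10: {"age": 28, "gender_id": 1},
    11: {"age": 34, "gender_id": 2},
}

fact_orders = [
    {"user_id": 10, "amount": 120.0},
    {"user_id": 11, "amount": 80.0},
    {"user_id": 10, "amount": 45.5},
]


def revenue_by_gender(facts, users, genders):
    """Two-hop lookup: fact -> user dimension -> gender sub-dimension."""
    totals = {}
    for f in facts:
        gender = genders[users[f["user_id"]]["gender_id"]]
        totals[gender] = totals.get(gender, 0.0) + f["amount"]
    return totals


print(revenue_by_gender(fact_orders, dim_user, dim_gender))
```

The extra hop is the usual trade-off. The dimensions carry less redundancy, but every query needs one more join.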

Unified standards: for example, one business unit marks deleted rows with 1/0 while another uses Y/N; the warehouse should settle on a single convention
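One possible way to apply such a standard when loading data is sketched below in Python. The source names `unit_a`/`unit_b` and the function `standardize_delete_flag` are hypothetical. Only the 1/0 and Y/N encodings come from the example above:

```python
# Sketch of applying a unified standard to delete flags coming from two
# business units. Source names and the helper are hypothetical; the flag
# encodings (1/0 and Y/N) follow the example in the text.

# Per-source mapping from the raw flag value to the warehouse-standard boolean
DELETE_FLAG_MAPPINGS = {
    "unit_a": {"1": True, "0": False},
    "unit_b": {"Y": True, "N": False},
}


def standardize_delete_flag(source: str, raw_value) -> bool:
    """Map a source-specific delete flag onto one shared is_deleted boolean."""
    mapping = DELETE_FLAG_MAPPINGS[source]
    key = str(raw_value).strip().upper()
    if key not in mapping:
        raise ValueError(f"unexpected delete flag {raw_value!r} from {source}")
    return mapping[key]


rows = [("unit_a", 1), ("unit_a", "0"), ("unit_b", "Y"), ("unit_b", "n")]
print([standardize_delete_flag(src, val) for src, val in rows])
```

Rejecting unknown values, rather than silently defaulting them, keeps a new encoding from slipping into the warehouse unnoticed.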

The "caliber" (口径, the agreed definition of a metric) usually refers to the filter conditions in the WHERE clause
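The sketch below shows why the caliber matters. It uses Python's built-in `sqlite3` module and a hypothetical `orders` table. The same metric, an order count, gives different numbers under two different WHERE filters:

```python
import sqlite3

# Sketch: the same metric ("number of orders") under two different calibers,
# i.e. two different WHERE filters. Table, columns and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, status TEXT, is_test INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "paid", 0), (2, "paid", 1), (3, "cancelled", 0), (4, "paid", 0)],
)

# Caliber A: every order row counts
caliber_a = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# Caliber B: only paid, non-test orders count
caliber_b = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE status = 'paid' AND is_test = 0"
).fetchone()[0]

print(caliber_a, caliber_b)  # same metric name, different numbers
```

Writing the agreed filter down once and reusing it everywhere is what keeps two reports of the "same" metric from disagreeing.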

 

The modeling above covers a single business line.

Data warehouse for the whole big data department ------> data marts {

Pull the relevant fields to build a wide table ------> on top of the wide table, extract each business's fields into its own business table (for machine learning, data analysis) ------> statistical analysis (joins or staging tables)

}
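The flow above can be sketched in plain Python, assuming hypothetical `users` and `orders` warehouse tables. Each step mirrors one arrow: build the wide table, extract a per-business table, then run the statistics:

```python
from collections import defaultdict

# Sketch of the warehouse -> data mart flow. All table names, fields and
# values are hypothetical.

# Warehouse tables
users = {1: {"city": "Beijing"}, 2: {"city": "Shanghai"}}
orders = [
    {"order_id": 1, "user_id": 1, "amount": 100.0},
    {"order_id": 2, "user_id": 2, "amount": 50.0},
    {"order_id": 3, "user_id": 1, "amount": 25.0},
]

# Step 1: pull the relevant fields into a wide table (join, then flatten rows)
wide_table = [{**o, **users[o["user_id"]]} for o in orders]

# Step 2: one business extracts only the fields it needs into its own table
analysis_table = [{"city": r["city"], "amount": r["amount"]} for r in wide_table]

# Step 3: statistical analysis on that business table
revenue_by_city = defaultdict(float)
for r in analysis_table:
    revenue_by_city[r["city"]] += r["amount"]

print(dict(revenue_by_city))
```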

All types of data (event tracking data, employee data, business product data) live in the data warehouse ======> each department then builds the tables it needs on top of it

 

Modeling ------> the benefits of layering: decoupling; upstream changes have less impact on downstream tables; and the table dependencies make it easier to trace business issues

 

ODSOperational Data Store,操作数据存储):原始数据层,数据源头表通常会原封不动的存储一份。DW层(DWDDWS层):

DWD(data warehouse detail明细层

DWS(data warehouse service 汇总层

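The data warehouse detail layer (DWD) and summary layer (DWS) are the core of the data platform. They are generated from the ODS layer through ETL: cleansing, transformation, and loading.

They are built on dimensional modeling theory, using conformed dimensions and a data bus to keep dimensions consistent across subject areas. (Even if a table is dropped, it can be rerun and rebuilt from ODS.)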

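ADS (data mart layer, also called the application layer): this layer consists of the data marts (DM) that each business team or department builds on top of DWD and DWS; "data mart" is defined relative to the data warehouse. Application-layer data generally comes from the DW layer, and in principle the ODS layer must not be accessed directly. Compared with the DW layer, the application layer contains only the detail and summary data that the department or business team itself cares about. (Typically the required tables are joined into wide tables, so downstream business analysts can simply run select *.)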
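Below is a sketch of the layering, with each layer written as a function of the one below it. The field names and the cleaning rule (dropping rows whose amount cannot be parsed) are assumptions for illustration. Every layer is derived, so rerunning the chain from `ods_orders` rebuilds everything downstream. The ADS step reads only from DWS, never from ODS:

```python
# Sketch of ODS -> DWD -> DWS -> ADS layering. Each layer is a pure function
# of the layer below, so any downstream table can be rebuilt by rerunning
# from ODS. Field names, cleaning rules and data are hypothetical.

# ODS: raw source records, stored exactly as received
ods_orders = [
    {"order_id": "1", "user_id": "u1", "amount": " 100.0 ", "dt": "2019-07-24"},
    {"order_id": "2", "user_id": "u2", "amount": "bad", "dt": "2019-07-24"},
    {"order_id": "3", "user_id": "u1", "amount": "50", "dt": "2019-07-25"},
]


def build_dwd(ods):
    """DWD: cleaned, typed detail rows (drops rows whose amount is unparseable)."""
    detail = []
    for r in ods:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue
        detail.append({"order_id": int(r["order_id"]), "user_id": r["user_id"],
                       "amount": amount, "dt": r["dt"]})
    return detail


def build_dws(dwd):
    """DWS: light summary, daily revenue per user."""
    summary = {}
    for r in dwd:
        key = (r["user_id"], r["dt"])
        summary[key] = summary.get(key, 0.0) + r["amount"]
    return summary


def build_ads(dws):
    """ADS: one team's application table, total revenue per user (reads DWS only)."""
    totals = {}
    for (user_id, _dt), amount in dws.items():
        totals[user_id] = totals.get(user_id, 0.0) + amount
    return totals


ads = build_ads(build_dws(build_dwd(ods_orders)))
print(ads)
```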

 

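 Staging area: keep a backup copy of the raw data on HDFS

dw: the data warehouse, where data development and modeling happen

dm: data mart applications, i.e. the results of multi-table joins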

OLTPOLAP的区别:

OLTP(online transaction Processing) 联机事务处理过程:侧重于单条数据的查新,主要是在关系型数据库上

OLAP联机分析处理:专门的分析性数据库,侧重于批量的数据请求,更加试用于大数据查询处理
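The difference in access patterns can be sketched with Python's `sqlite3` and a hypothetical `orders` table. SQLite is itself a row-oriented, transactional engine, so this only illustrates the shape of each workload, not OLAP performance:

```python
import sqlite3

# Sketch contrasting the two query patterns on one hypothetical table.
# SQLite is row-oriented; it only shows the shape of each workload here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, city TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (order_id, city, amount) VALUES (?, ?, ?)",
    [(1, "Beijing", 100.0), (2, "Shanghai", 50.0), (3, "Beijing", 25.0)],
)

# OLTP style: update and read a single record by key, inside a transaction
with conn:  # commits on success, rolls back on exception
    conn.execute("UPDATE orders SET amount = amount + 10 WHERE order_id = ?", (2,))
row = conn.execute("SELECT * FROM orders WHERE order_id = ?", (2,)).fetchone()

# OLAP style: scan many rows and aggregate
report = conn.execute(
    "SELECT city, SUM(amount) FROM orders GROUP BY city ORDER BY city"
).fetchall()

print(row)     # a single record
print(report)  # an aggregate over all rows
```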

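Benefits of columnar storage:

For OLAP, queries touch only the relevant columns, so there is no need to read and process every field of the table

For OLTP, by contrast, inserts, deletes, updates, and lookups mostly operate on whole rows, which is why row-oriented storage suits it better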
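Here is a small sketch of the two layouts, with Python lists standing in for on-disk storage (the table and data are hypothetical). In the column store, summing one column reads a single list. In the row store, fetching a whole record is a single lookup:

```python
# Sketch of row-oriented vs column-oriented layouts for the same table.
# Python lists stand in for on-disk storage; data is hypothetical.

columns = ("order_id", "user_id", "city", "amount")

# Row store: each record's fields are stored together
row_store = [
    (1, "u1", "Beijing", 100.0),
    (2, "u2", "Shanghai", 50.0),
    (3, "u1", "Beijing", 25.0),
]

# Column store: each column's values are stored together
column_store = {name: [r[i] for r in row_store] for i, name in enumerate(columns)}

# OLAP-style aggregate: the column store only touches the "amount" column
total_from_columns = sum(column_store["amount"])

# The row store has to walk every full record to reach the same field
total_from_rows = sum(r[columns.index("amount")] for r in row_store)

# OLTP-style point access: a whole record is contiguous in the row store,
# while the column store must gather one value from every column
record_from_rows = row_store[1]
record_from_columns = tuple(column_store[name][1] for name in columns)

print(total_from_columns, total_from_rows)
print(record_from_rows == record_from_columns)
```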

 


Origin www.cnblogs.com/hejunhong/p/11241656.html