Druid Principles and Property Insurance Practice

This article is based on the talk "Druid Principles and Practice in Property Insurance", given by Li Kaibo and Guan Zhihua, big data development engineers at Ping An Property & Casualty, at the Ping An Property & Casualty & DataFunTalk Big Data Technology Salon.

The content is divided into two parts. The first part covers Druid itself: technology selection, principles, architecture, and tuning experience. The second part covers the usage scenarios of BDAS, a Druid-based report and analysis system.

Druid here is not Alibaba's open-source database connection pool of the same name. It is a MOLAP database with a distributed, multi-node architecture: an in-memory (MMDB-style), column-oriented store that applies pre-aggregation (PreAgg) at ingestion. It is a NoSQL database oriented toward time series, designed for records that are strongly correlated with time. The community also provides many plug-ins, such as Kafka, MySQL and HDFS plug-ins.

We started the technology selection in May of last year.

Spark is a widely used framework. It is schema-free: data can be stored and analyzed without defining its format first. It is efficient, since intermediate results are not written to disk, and its response time depends on the data volume. In the end Spark was not chosen because it could not handle our level of concurrency: business concurrency can reach the thousands, and at that level Spark easily drives cluster load too high.

Elasticsearch is also very popular in this space. It is commonly understood as a full-text search engine, but it has gained many analysis capabilities. It is likewise schema-free, and its architecture adapts to the incoming data format. Compared with Druid, its advantage is that it keeps the raw data, and it has a complete, mature technology stack (ELK), with search on top of analysis; which framework is better really depends on the specific scenario. It supports high-cardinality dimensions, but its weakness is that data volume does not scale well: it relies on inverted indexes, and the index ends up roughly as large as the raw data, so it was eventually dropped.

Druid requires dimensions and metrics to be defined in advance and supports pre-aggregation by time or dimension, so the raw data is discarded after ingestion. Query responses are sub-second and newly ingested data becomes visible within milliseconds, which basically meets our needs; its Lambda architecture also gives high scalability and fault tolerance.

SQL on Hadoop relies mainly on MPP (Massively Parallel Processing) and columnar storage; it offers large throughput but requires offline batch processing, and we currently run it in parallel with the real-time path. The commercial products offer enterprise-level features and good SQL support, but they need customized hardware, have a low ceiling (below the PB level), do not scale linearly, require downtime to expand, and, most importantly, are hard to extend with secondary development.

In the end the selection was Druid, positioned as an upper-layer SaaS service: data is available in real time, OLAP is supported on large-scale cold data, and multi-dimensional, high-cardinality queries get sub-second responses.

The following are some concepts on the path from raw data to storage. The raw data looks similar to a traditional database table, but columns such as publisher and advertiser are treated as dimensions, which must be defined at ingestion time. Another characteristic of Druid is that rows are partitioned by time: different rows may be cut into different segments, and columns are compressed. The segment is Druid's basic storage unit. Data is divided into blocks by timestamp, which avoids full scans at query time: a query only walks from the start time to the end time and touches the matching blocks, so time-bounded queries are fast. A segment is named by its data source plus its start and end times. Note that in very large scenarios, a few hours of data may reach the terabyte level; in that case it is recommended to shard each time block further.
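
As a hedged illustration of "dimensions and metrics defined in advance" plus time-based segmentation and pre-aggregation, the sketch below shows the rough shape of a Druid ingestion spec, written as a Python dict for readability. The data source and field names (ad_events, publisher, advertiser, clicks) are made up, and the exact nesting of the spec varies between Druid versions.

```python
# Illustrative only: a simplified Druid ingestion spec as a Python dict.
# Data source and field names are hypothetical; exact layout depends on the Druid version.
ingestion_spec = {
    "dataSchema": {
        "dataSource": "ad_events",
        "timestampSpec": {"column": "timestamp", "format": "iso"},
        "dimensionsSpec": {"dimensions": ["publisher", "advertiser", "city"]},
        "metricsSpec": [
            {"type": "count", "name": "row_count"},
            {"type": "longSum", "name": "clicks", "fieldName": "click"},
        ],
        "granularitySpec": {
            "segmentGranularity": "HOUR",   # one segment (time block) per hour of data
            "queryGranularity": "MINUTE",   # roll rows up to one-minute buckets at ingestion
            "rollup": True,
        },
    }
}
```

With rollup enabled, rows whose truncated timestamp and dimension values match are merged into a single pre-aggregated row, which is why the raw records are not kept after ingestion.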

Next, let's talk about the Druid data flow. There are several node types in the flow graph, each with its own responsibility, and a ZooKeeper in the middle to which every node is more or less connected. ZooKeeper handles synchronization, so the nodes do not do strongly coupled work with each other; each only needs to stay in sync through ZooKeeper. From left to right is the data-write path, covering both real-time and batch data.

The Broker is the query node. It exposes a REST interface, accepts queries from external clients, and forwards them to the Realtime and Historical nodes; it then takes the data returned by those nodes, merges it, and returns the result to the client. The Broker therefore plays a forwarding and merging role. The merge happens in memory, so it is recommended to give the Broker a relatively large heap.
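
As a minimal sketch of how a client uses that REST interface, the snippet below posts a native JSON query to the Broker; the host, port, data source and metric names are assumptions, not details from the talk.

```python
import requests  # any HTTP client works; requests is assumed here for brevity

# Sketch: POST a native timeseries query to the Broker, which fans it out to the
# Realtime/Historical nodes and returns the merged result. Names are placeholders.
BROKER_URL = "http://broker-host:8082/druid/v2/"

query = {
    "queryType": "timeseries",
    "dataSource": "ad_events",
    "granularity": "hour",
    "intervals": ["2018-05-01T00:00:00/2018-05-02T00:00:00"],
    "aggregations": [{"type": "longSum", "name": "clicks", "fieldName": "clicks"}],
}

resp = requests.post(BROKER_URL, json=query)
print(resp.json())  # one merged entry per hour in the queried interval
```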

The Historical node stores and serves non-real-time data and only responds to Broker requests. When a query arrives it first looks in its local data and, if necessary, pulls the segment from deep storage, then returns what it finds to the Broker, without involving any other node. It serves under the management of ZooKeeper and reacts to ZooKeeper signals to load or drop segments. This node is also very memory-hungry. There can be many Historical nodes, and multiple nodes are recommended; they do not talk to each other and stay decoupled by synchronizing through ZooKeeper.

The Coordinator plays the manager role. It is responsible for load-balancing data across the Historical node group, making sure segments are available, replicated, and in an "optimal" configuration. It decides which segments should be loaded in the cluster by reading segment metadata from MySQL, uses ZooKeeper to determine which Historical nodes exist, and creates ZooKeeper entries to tell Historical nodes to load or drop segments. There can be a single Coordinator, or several that elect a leader with the rest on standby; two nodes are generally sufficient.

The Realtime node ingests data in real time. It monitors the input stream and makes the data immediately queryable inside Druid. If you do not need real-time ingestion this node can be removed. It only responds to Broker requests and returns data to the Broker. If a Realtime node and a Historical node return the same data at the same time, the Broker treats the Historical node's data as authoritative, because once data has entered deep storage Druid assumes it is immutable. The Realtime node buffers data itself; once data is older than a configured period it is handed off to deep storage, from which the Historical nodes serve it.

MySQL, ZooKeeper, and deep storage are Druid's external dependencies. Deep storage can be HDFS, S3, or local disk and is used to hold "cold" data; it has two data sources, one from batch ingestion and the other from the Realtime nodes. The ZooKeeper cluster provides cluster service discovery and maintains the current data topology. The MySQL instance keeps the segment metadata the system needs, such as where to load each segment from and each segment's meta information.
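
For orientation, these three dependencies are wired up through a handful of properties in Druid's common configuration; the sketch below lists typical keys with placeholder values, expressed as a Python dict purely for readability.

```python
# Placeholder values for Druid's three external dependencies; the keys follow the
# naming used in Druid's common.runtime.properties.
druid_common_properties = {
    # ZooKeeper: service discovery and current data topology
    "druid.zk.service.host": "zk1:2181,zk2:2181,zk3:2181",
    # Metadata store (MySQL): where each segment lives and its meta information
    "druid.metadata.storage.type": "mysql",
    "druid.metadata.storage.connector.connectURI": "jdbc:mysql://mysql-host:3306/druid",
    # Deep storage: HDFS (or S3 / local disk) holding the "cold" segments
    "druid.storage.type": "hdfs",
    "druid.storage.storageDirectory": "/druid/segments",
}
```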

To summarize: the nodes have a clear division of labor and separated responsibilities, so losing one node does not affect the work of the others; the design is friendly to scaling out and highly fault tolerant. Hot and cold data are separated, and different data can be physically isolated on different hardware. Query concerns are separated from how data is distributed in the cluster: a user's query requests do not influence data placement, which avoids local hot spots that would hurt query performance. There is no absolute master. Druid is also more than an in-memory database, since it adds memory-mapping capability. With the Lambda architecture the data can be corrected in real time: if data arriving at the ingestion node is not consumed in time it is dropped, which would otherwise leave gaps in the data. The mature pattern in the community is to write incoming data to Kafka and consume it twice, once on the storage (ingestion) side and once into Hadoop; if the real-time data is incomplete, a batch pass on Hadoop backfills the missing data.

The above is a recommended architecture. The more Broker nodes the better; two Coordinator nodes, two Overlord nodes and two Realtime nodes are suggested, and for the other node types more is also better. Different hardware profiles can be mapped to different roles. On tuning: the Broker is a heavy memory consumer, and 20 GB to 30 GB of heap is recommended; Historical nodes consume disk as well as memory, and it is worth spending more memory to relieve disk IO; the Coordinator uses relatively little memory and just needs enough to do its job. At query time, push aggregation down as far as possible: aggregate at ingestion and use group-by sparingly. Keep Historical and Realtime nodes separate, keep Coordinator and Broker separate, and put Nginx in front of the Brokers for load balancing and high availability. For heterogeneous hardware, divide the Historical nodes into tiers so that different tiers load different time ranges of data.
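
As a hedged sketch of the tiering idea: each Historical declares its tier name (for example via druid.server.tier in its runtime properties), and the Coordinator is given load rules that say which tier should hold which time range. The coordinator host, data source name and periods below are placeholders.

```python
import requests

# Illustrative load rules: keep the most recent month on a "hot" tier with two replicas,
# and everything older on the default tier with one replica. Names/periods are placeholders.
COORDINATOR_RULES_URL = "http://coordinator-host:8081/druid/coordinator/v1/rules/ad_events"

rules = [
    {"type": "loadByPeriod", "period": "P1M", "tieredReplicants": {"hot": 2}},
    {"type": "loadForever", "tieredReplicants": {"_default_tier": 1}},
]

resp = requests.post(COORDINATOR_RULES_URL, json=rules)
print(resp.status_code)  # 200 when the rules are accepted
```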

Now for the concrete project. Property & Casualty originally used Cognos (on Oracle) for list reports, and it had been running in production for ten years. As data volumes and analytical demands grew, Cognos's limitation with oversized cubes became more and more apparent, and it could no longer meet production needs. The first thing the new system had to deliver was speed; the second was row-level control across all columns.

The BDAS system began moving to Druid in May of last year. In September the list-report function went live, querying data on Hive directly for business analysis; in December Druid was fully adopted and the multi-dimensional analysis function was delivered. There are dozens of data sources online; the largest has over a hundred dimensions, and a single dimension can have hundreds of thousands of distinct values. After aggregation a single table holds billions of rows, the largest single data source is tens of gigabytes, and daily visits number in the thousands. The system is mainly used for internal Property & Casualty analysis, with peak concurrency in the hundreds and an average response time under 2 seconds.

Next, the usage scenarios on top of the HDFS-backed data. The first is the pivot view: the user looks at a rough summary of the data under progressively narrowing conditions, generally served by topN queries with second-level responses. On the front end the user drags in one dimension at a time, and the back end caches the previous result, so in the end only a few dimensions need to be queried. The first topN query covers a single dimension; when a dimension is added, the previous result is taken from the Redis cache and only the next dimension is queried on top of it. With many dimensions the combinations grow exponentially and query speed drops noticeably. When we introduced multi-threading we considered two approaches. The first was to query the topN of each of the N dimensions in turn and then build M*N*P threads; this is very fast, roughly the cost of a single topN, but the ordering of results cannot be guaranteed. The second was recursive, with all work executed by one thread pool (threads spawning threads? no), plus finer-grained caching: instead of caching by "dimension A" and "dimension A + dimension B", cache by "dimension A + value A1", "dimension A + value A2", "dimension A + value A1 + dimension B + value B1", and so on. This makes full use of Druid's ascending/descending ordering; it may take somewhat longer, roughly the cost of N*M topN queries.
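
For reference, a single step of the pivot view corresponds to a topN query of roughly this shape, shown as a Python dict; the data source, dimension and metric names are invented for illustration.

```python
# Sketch of one pivot-view step: rank one dimension by one aggregated metric.
topn_query = {
    "queryType": "topN",
    "dataSource": "policy_metrics",   # hypothetical data source
    "dimension": "branch",            # the dimension the user just dragged in
    "metric": "premium",              # rank by this aggregate
    "threshold": 100,                 # return the top 100 values
    "granularity": "all",
    "intervals": ["2018-01-01/2018-07-01"],
    "aggregations": [{"type": "doubleSum", "name": "premium", "fieldName": "premium"}],
}
```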

The second scenario is the cross table, where the analyst needs to see the full data rather than a summary. Initially, however many dimensions were requested, they were all assembled into a single query, and beyond four or five dimensions this became very inefficient. The improvement again uses multiple threads: the leading dimensions are handled much as in the topN approach above, while the last two dimensions are kept for a group-by. A query such as A1+B1+C benefits from the caching strategy; because the cluster is small, block caching is used, which saves network transfer. So the two scenarios use topN and groupBy respectively. The difference is that topN may be inexact: a top 1000 only guarantees that roughly the first 900 are accurate.
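
A hedged sketch of that final group-by step, with invented names: the last two dimensions are grouped exactly, and a limitSpec keeps the result bounded and ordered, which is where Druid's ascending/descending support is used.

```python
# Sketch of the cross-table step: group by the final two dimensions exactly (no topN
# approximation), with an ordered, bounded result. Names are placeholders.
groupby_query = {
    "queryType": "groupBy",
    "dataSource": "policy_metrics",
    "dimensions": ["product_line", "channel"],
    "granularity": "all",
    "intervals": ["2018-01-01/2018-07-01"],
    "aggregations": [{"type": "doubleSum", "name": "premium", "fieldName": "premium"}],
    "limitSpec": {
        "type": "default",
        "limit": 10000,
        "columns": [{"dimension": "premium", "direction": "descending"}],
    },
}
```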

The third scenario is indicator (metric) calculation. The first approach is to compute the indicator beforehand, store it in Hive, and then load it into Druid, which is costly. The second approach is to compute it inside Druid, so each customized query gets its result faster.
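
A hedged sketch of the second approach, computing a derived indicator at query time with a post-aggregation; the loss-ratio example and field names are made up, not taken from the talk.

```python
# Sketch: derive an indicator (claim_amount / premium) inside Druid at query time
# instead of pre-computing it in Hive. Field names are hypothetical.
indicator_query = {
    "queryType": "timeseries",
    "dataSource": "policy_metrics",
    "granularity": "day",
    "intervals": ["2018-01-01/2018-02-01"],
    "aggregations": [
        {"type": "doubleSum", "name": "claim_amount", "fieldName": "claim_amount"},
        {"type": "doubleSum", "name": "premium", "fieldName": "premium"},
    ],
    "postAggregations": [
        {
            "type": "arithmetic",
            "name": "loss_ratio",
            "fn": "/",
            "fields": [
                {"type": "fieldAccess", "fieldName": "claim_amount"},
                {"type": "fieldAccess", "fieldName": "premium"},
            ],
        }
    ],
}
```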

Dimension merging and hiding: merging means the user wants several attribute values to be treated as one; hiding is there to reduce visual noise, and in effect the better way to think about it is as removing a dimension. The fourth point is row-level full control, which is implemented through the user's account: each user carries a department code, so four control columns are added to every data source, and filtering on them gives row-level control.
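
A rough sketch of how that row-level filtering could be wired in on the service side: before a user's query reaches Druid, a filter on the department-code column(s) tied to the account is appended. The column name and helper function are hypothetical.

```python
# Illustrative only: append a department-code filter to an outgoing Druid query dict.
def with_row_level_filter(query: dict, user_department_codes: list) -> dict:
    dept_filter = {
        "type": "in",
        "dimension": "department_code",   # hypothetical control column
        "values": user_department_codes,
    }
    existing = query.get("filter")
    # Combine with whatever filter the query already carries
    query["filter"] = {"type": "and", "fields": [existing, dept_filter]} if existing else dept_filter
    return query
```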

About the authors:

Li Kaibo and Guan Zhihua both have many years of experience in the big data field, with a deep understanding of the BDAS system, the Druid database and related systems, and focus on researching and applying big data technology in the financial sector.

——END——

