Road to Real-time DQC Big Data Quality Construction at Station B


Background

Data quality is one of the important prerequisites and guarantees for the effectiveness of applications built on big data. The rapid growth of Station B's business, and the vision of incubating deeper and more competitive applications on top of big data in the future, require our data platform to provide real-time, accurate, and reliable data that every business party can trust. Reliable data is, in a sense, the embodiment of a big data platform's core competitiveness. Therefore, in the construction of Station B's big data platform, the data quality platform has become an indispensable part: its mission is to safeguard the data quality of the whole platform.

 Quality platform

Platform components

[Figure: platform components]

 Introduction to DQC

Function diagram

[Figure: DQC function diagram]

  • DQC's main workflow is basically similar to that of common monitoring systems such as Prometheus: data collection, data checking, and alarm notification. It must minimize the impact on the normal operation of the monitored objects while alerting on their anomalies in a timely manner.

  • DQC provides both offline and real-time capabilities.

  • Offline DQC: mainly used to guarantee the data produced by offline tasks. The trigger is that the offline scheduling task notifies DQC to collect data after it finishes, after which the results are checked against thresholds and alarms are raised. When a user configures an inspection rule, it generally takes effect after the next task run completes.

  • Real-time DQC: mainly targets Kafka data sources. Kafka is a very important basic component of the big data platform and is used especially heavily in the real-time data warehouse. In general, the data within a window is collected and checked against the rules on a fixed inspection cycle. When a user configures a checking rule, it generally takes effect in the next data cycle.

Real-time DQC

 The first version of the plan

[Figure: first-version solution architecture]

Flink is the mainstream stream-computing framework within the company, so real-time DQC is also built on Flink.

Real-time DQC is much more complicated than offline DQC, because its data collection tasks are always running, and it is difficult to update rules or add collection objects to an already running task.

Therefore, in the first version of the solution, every time a user added a collection object or a rule, the DQC service generated a new Flink task to perform the collection and wrote the collection results into MySQL. Quality rule checks were triggered periodically, the check results were written into MySQL, and alarm notifications were decided based on those results.

This solution has a simple structure, but there are still the following disadvantages:

Low resource utilization:  since each inspection rule has its own data collection task, Flink tasks are idle most of the time, especially on low-traffic topics.

High network bandwidth consumption:  different rules on the same topic require multiple tasks to consume it, and if the topic traffic is heavy this puts great pressure on network bandwidth.

Poor stability:  modifying a rule requires restarting the data collection task, and when cluster resources are tight the restart may fail because YARN/K8s resources cannot be allocated.

The monitoring program itself consumes a lot of resources, which is out of step with today's focus on cost reduction and efficiency improvement. We therefore urgently needed to rework the real-time DQC solution, with three requirements on resource usage:

  1. Start once and never restart, to avoid resource competition and improve system availability.

  2. One task can be shared by multiple topics. In practice, some topics carry very little traffic, leaving their Flink tasks mostly idle; if one Flink task can consume several such topics, the same resources do more work and utilization is higher.

  3. Consume once, check multiple rules, to avoid several tasks repeatedly consuming the same topic and to reduce network bandwidth consumption.

The new solution

[Figure]

  • Design goals:

    • Accuracy of real-time DQC collection procedure;

    • Timeliness of rule checks;

    • Minimize the resource consumption of real-time DQC;

    • Minimize the impact on Kafka's normal operation;

Overall architecture

[Figure: overall architecture]

Currently there are 7000+ online topics. To simplify management, real-time DQC divides topics into large, medium, and small topics, mainly based on the QPS that Influxdb can accept and the utilization of the Flink tasks. Small topic: message rate < 1,000/s; medium topic: 1,000/s to 100,000/s; large topic: > 100,000/s.
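Purely as an illustration, a minimal sketch of this bucketing; the thresholds come from the paragraph above, while the enum and method are hypothetical and not part of the platform code:

```java
/** Hypothetical helper that buckets a topic by its observed message rate (messages per second). */
public enum TopicSize {
    SMALL, MEDIUM, LARGE;

    public static TopicSize classify(double messagesPerSecond) {
        if (messagesPerSecond < 1_000) {
            return SMALL;          // small topic: < 1,000 msg/s
        } else if (messagesPerSecond <= 100_000) {
            return MEDIUM;         // medium topic: 1,000 - 100,000 msg/s
        } else {
            return LARGE;          // large topic: > 100,000 msg/s
        }
    }
}
```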

Solution analysis:

  • Large, medium, and small topics are managed separately and collected along different paths. For small and medium topics, all data is imported into the full table (fields are filtered according to the inspection rules), and the Influxdb CQ (Continuous Query) capability then aggregates it into the CQ table to complete the collection. For large topics, the Flink task computes the final aggregated data directly and writes it into the CQ table;

  • Influxdb full table: used by small and medium topics; stores the full data of a topic (fields are filtered according to the inspection rules); the typical retention time is one hour;

  • Influxdb CQ table: used by all topics; stores the aggregated topic data used for rule checking; the typical retention time is two weeks;

  • Topic dynamics: monitored topics can be added or removed at runtime without restarting the Flink task;

  • Rule dynamics: topic monitoring rules can be added, deleted, and modified at runtime without restarting the Flink task;

  • DQC resource management: covers two resources, Flink tasks and Influxdb; both are used rationally, and topics and their rules are dynamically allocated and managed across them;

Small and Medium Topic Solution

[Figure: small and medium topic solution]

  • Import all the data into the full table (the fields will be filtered according to the inspection rules)

  • Both the topic list and the Mapper are dynamic:

  • The topic list is dynamic: it is maintained in the configuration center, and the KafkaConsumer perceives configuration changes, so topics can be added and removed dynamically;

  • The Mapper is dynamic: based on the consumed topic list, the data format inside each topic, and the corresponding DQC rules, the Mapper's processing logic is generated dynamically and pushed to the Mapper Wrapper;

Small and Medium Topic Solution — KafkaConsumer

[Figure: dynamic KafkaConsumer]

Solution: given limited development manpower and time, we did not build a completely new Kafka consumer with dynamic topic support. Instead, FlinkKafkaConsumer is extended, and KafkaFetcher and KafkaConsumerThread are handled by hacking.
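The hack itself lives inside Flink's KafkaFetcher and KafkaConsumerThread and is hard to reproduce here; as a rough sketch of the surrounding idea, the hypothetical listener below diffs the topic list held in the configuration center and tells the extended consumer which topics to add or remove. All names are assumptions, not the actual implementation:

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical config-center callback that turns topic-list changes into add/remove actions. */
public class DynamicTopicListener {

    /** Minimal abstraction over the hacked consumer; not a real Flink/Kafka API. */
    public interface TopicAwareConsumer {
        void addTopic(String topic);
        void removeTopic(String topic);
    }

    private Set<String> currentTopics = new HashSet<>();
    private final TopicAwareConsumer consumer;

    public DynamicTopicListener(TopicAwareConsumer consumer) {
        this.consumer = consumer;
    }

    /** Called whenever the configuration center pushes a new topic list. */
    public synchronized void onTopicListChanged(Set<String> newTopics) {
        for (String topic : newTopics) {
            if (!currentTopics.contains(topic)) {
                consumer.addTopic(topic);      // newly monitored topic
            }
        }
        for (String topic : currentTopics) {
            if (!newTopics.contains(topic)) {
                consumer.removeTopic(topic);   // topic no longer monitored
            }
        }
        currentTopics = new HashSet<>(newTopics);
    }
}
```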

Small and Medium Topic Solution — Mapper

[Figure: dynamic Mapper]

Solution: generate the Mapper bytecode from the topic list, the topic content, and the DQC rules, store it in the configuration center encoded in Base64, and let the Mapper Wrapper fetch it dynamically to build a new Mapper that replaces the old one.
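A minimal sketch of the loading side, assuming the configuration center already holds the Base64-encoded class bytes and that the generated class implements a known mapper interface; all names here are illustrative rather than the actual implementation:

```java
import java.util.Base64;

/** Hypothetical wrapper that swaps in a freshly generated Mapper at runtime. */
public class MapperWrapper<IN, OUT> {

    /** Interface the generated bytecode is assumed to implement. */
    public interface DynamicMapper<IN, OUT> {
        OUT map(IN value) throws Exception;
    }

    /** Tiny ClassLoader that turns raw bytecode into a Class object. */
    private static final class ByteArrayClassLoader extends ClassLoader {
        Class<?> define(byte[] bytecode) {
            return defineClass(null, bytecode, 0, bytecode.length);
        }
    }

    private volatile DynamicMapper<IN, OUT> delegate;

    /** Called when the configuration center pushes a new Base64-encoded Mapper. */
    @SuppressWarnings("unchecked")
    public void reload(String base64Bytecode) throws Exception {
        byte[] bytecode = Base64.getDecoder().decode(base64Bytecode);
        Class<?> clazz = new ByteArrayClassLoader().define(bytecode);
        delegate = (DynamicMapper<IN, OUT>) clazz.getDeclaredConstructor().newInstance();
    }

    public OUT map(IN value) throws Exception {
        return delegate.map(value);   // always routes through the latest Mapper
    }
}
```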

Big Topic Solution

Because large topics carry heavy traffic, using the same approach as for small and medium topics would put great pressure on both the network bandwidth and the underlying storage. We therefore chose another approach: single topic, single task, plus dynamic rules.

Each Flink task consumes only one topic. Within this task, we need to perceive changes to the topic's inspection rules dynamically, tag the consumed real-time data stream according to those rules, and aggregate the tagged records by window.

[Figure: big topic solution]

  • Dynamic rules: the rule metadata is maintained in the configuration center. The FlatMap operator perceives rule changes and emits the records hit by each rule downstream for aggregation.

Rule Dynamics

[Figure: rule dynamics]

To reduce the user's learning cost, this solution keeps the configuration consistent with offline DQC: the user can filter data with a custom SQL where clause to achieve finer-grained quality inspection. The rule parser performs lexical analysis on the where clause and generates the corresponding filter for data filtering. Records that match a rule are tagged with the rule ID and emitted downstream, where the aggregation operator performs local aggregation.
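The production rule parser does full lexical analysis of the where clause; as a hedged illustration of the idea of turning a clause into a reusable filter, the sketch below handles only a single equality predicate such as platform = 'ios':

```java
import java.util.Map;
import java.util.function.Predicate;

/** Illustrative-only filter builder: the real parser does full lexical analysis of the where clause. */
public class RuleFilterSketch {

    /** Builds a predicate for a single "field = 'value'" condition, e.g. platform = 'ios'. */
    public static Predicate<Map<String, Object>> fromEquality(String whereClause) {
        String[] parts = whereClause.split("=", 2);
        String field = parts[0].trim();
        String expected = parts[1].trim().replaceAll("^'|'$", "");   // strip surrounding quotes
        return record -> expected.equals(String.valueOf(record.get(field)));
    }

    public static void main(String[] args) {
        Predicate<Map<String, Object>> filter = fromEquality("platform = 'ios'");
        System.out.println(filter.test(Map.of("platform", "ios", "mid", 42)));      // true  -> tagged with the rule ID
        System.out.println(filter.test(Map.of("platform", "android", "mid", 7)));   // false -> not tagged
    }
}
```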

Data expansion

Data expansion may occur when tagged records are emitted downstream for window aggregation.

[Figure: data expansion after rule tagging]

As shown in the figure above, when a record enters the FlatMap, the FlatMap tags it with rule IDs according to the rule filters and sends it downstream, where the aggregation operator aggregates by rule. A record that hits no rule is not emitted at all. As the figure shows, one input record produces four output records, a fourfold expansion. This is because the downstream output format is <rule ID, Data>, with the rule ID as the grouping key: a record that hits multiple rules is emitted once per rule. Since this solution targets large topics whose traffic is already heavy, amplification by the rule filters puts severe pressure on bandwidth, which defeats the original design intent.

In response, we made two optimizations:

  1. The FlatMap output format is adjusted to <group key, ruleIdList, Data>: all rule IDs hit by a record are merged and emitted together, and after receiving the data the aggregation operator performs the business calculation for each rule in the ruleIdList (see the sketch after the figure below).

  2. A map -> reduce -> reduce architecture is used: the tagged output is first aggregated locally, and the local results are then summarized globally.

[Figure: map -> reduce -> reduce architecture]
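As a rough sketch of the first optimization, a Flink FlatMap that evaluates all rule filters once and emits a single record in the <group key, ruleIdList, Data> format. It assumes Map-based records and a simplified, equality-only rule type; it is not the production operator:

```java
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.util.Collector;

import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Sketch: tag a record with every rule it hits and emit it once as <group key, ruleIdList, data>. */
public class RuleTaggingFlatMap
        extends RichFlatMapFunction<Map<String, Object>, Tuple3<String, List<Long>, Map<String, Object>>> {

    /** Simplified rule: only equality conditions, so the whole operator stays serializable. */
    public static class DqcRule implements Serializable {
        final long id;
        final String field;
        final String expectedValue;

        public DqcRule(long id, String field, String expectedValue) {
            this.id = id;
            this.field = field;
            this.expectedValue = expectedValue;
        }

        boolean matches(Map<String, Object> record) {
            return expectedValue.equals(String.valueOf(record.get(field)));
        }
    }

    /** In the real job this list is refreshed from the configuration center; here it is fixed. */
    private final List<DqcRule> rules;

    public RuleTaggingFlatMap(List<DqcRule> rules) {
        this.rules = rules;
    }

    @Override
    public void flatMap(Map<String, Object> record,
                        Collector<Tuple3<String, List<Long>, Map<String, Object>>> out) {
        List<Long> hitRuleIds = new ArrayList<>();
        for (DqcRule rule : rules) {
            if (rule.matches(record)) {
                hitRuleIds.add(rule.id);
            }
        }
        if (!hitRuleIds.isEmpty()) {
            // group key when no Distinct rules are involved: a hash of the record
            String groupKey = Integer.toHexString(record.hashCode());
            out.collect(Tuple3.of(groupKey, hitRuleIds, record));   // one output per input, no expansion
        }
        // records that hit no rule are dropped, as in the original design
    }
}
```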

However, this design does not completely solve the problem.

Rule type | Processing logic | Group key
Table row count rules | Accumulate locally; when summarizing, add up the local results to obtain the global value | data hash
Aggregated value rules | Accumulate locally; when summarizing, add up the local results to obtain the global value | data hash
Maximum/minimum rules | Take the max/min locally; when summarizing, take the max/min of the local results to obtain the global max/min | data hash
Mean rules | Compute <sum of field values, record count> locally; when summarizing, merge the local results into the global <sum of field values, record count> | data hash
Field deduplication (Distinct) rules | Deduplicate the field value locally and emit it downstream; when summarizing, count the deduplicated values | field value

After analyzing the real-time DQC quality rules, we found that except for the field deduplication (Distinct) rules, no rule depends on the concrete value of the group key, because the business logic of local aggregation is identical to that of the final aggregation. The field deduplication (Distinct) rules, however, deduplicate locally and count the deduplicated records at the final summary, so during local aggregation their group key must be the field value itself. We therefore adjusted the first optimization as follows:

  1. When the topic has no field deduplication (Distinct) rules, all rules are merged and output together, with the format <data hash, ruleIdList, Data>.

  2. When the topic has one field deduplication (Distinct) rule, all rules are merged and output together, with the format <field value, ruleIdList, Data>.

  3. When the topic has multiple field deduplication (Distinct) rules on different fields, one of them is merged and output together with all the non-deduplication rules, and the remaining field deduplication (Distinct) rules are output separately.

The final shape is shown in the figure below:

[Figure: final output scheme]

Before optimization, the high data expansion rate of the FlatMap often caused back pressure on the aggregation operator and degraded consumption performance.

After optimization, the data expansion rate is greatly reduced but still exists; it depends on the number of field deduplication (Distinct) rules and is therefore predictable and controllable. At present, apart from a few special businesses, such rules are barely used on Station B's big data platform, but we still monitor their usage and add resources or take rules offline in time to keep the tasks stable.

Influxdb proxy solution

To better adapt to and handle read and write requests to Influxdb, we introduced the Influxdb proxy service.

[Figure: Influxdb proxy architecture]

The back-end Influxdb cluster contains multiple groups of instances. Data within a group is not sharded; the instances back each other up, and eventual consistency is guaranteed by the proxy's double-writing.

During double-writing, if a write fails, the Influxdb proxy records the failed request in a local file and retries until the write succeeds.
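A simplified sketch of this double-write-and-replay idea; the real proxy is a standalone service, and the backend interface and journal format below are placeholders:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/** Sketch of proxy-side double write: send to every backend, journal failures locally for later replay. */
public class DoubleWriteSketch {

    /** Placeholder for one Influxdb backend instance. */
    public interface InfluxBackend {
        void write(String lineProtocolBatch) throws IOException;
        String name();
    }

    private final List<InfluxBackend> backends;
    private final Path failureJournal;

    public DoubleWriteSketch(List<InfluxBackend> backends, Path failureJournal) {
        this.backends = backends;
        this.failureJournal = failureJournal;
    }

    public void write(String lineProtocolBatch) {
        for (InfluxBackend backend : backends) {
            try {
                backend.write(lineProtocolBatch);
            } catch (IOException e) {
                journal(backend.name(), lineProtocolBatch);   // keep the request so a retry loop can replay it
            }
        }
    }

    private void journal(String backendName, String batch) {
        try {
            Files.writeString(failureJournal, backendName + "\t" + batch + System.lineSeparator(),
                    StandardCharsets.UTF_8, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException ignored) {
            // in the real service this failure would itself be handled, not ignored
        }
    }
}
```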

Each Influxdb instance node contains the complete full table and CQ table data assigned to the instance.

Optimizing read requests

When querying, the proxy selects the optimal back-end Influxdb node (based on data integrity and node performance) to serve the query.

If all Influxdb nodes in the backend have problems, the query will be degraded.

Optimize write requests

Since real-time DQC writes a large amount of data very frequently, the Influxdb proxy uses gzip compression and batch writes to reduce network IO overhead and improve write performance.

After the Influxdb proxy went online, the large volume of real-time writes kept network IO at peak levels. Investigation showed that a single incoming request contained 5000+ complete Influxdb insert statements, most of them targeting the same db and the same measurement, and each insert statement carried many unused tags. We therefore optimized the write protocol: each request may only write to a single db, the common measurement is extracted, and the unused tags are removed from the insert statements.
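To illustrate the reworked protocol, a hedged sketch that groups incoming points by db (so each outgoing request targets one db) and drops tags that no inspection rule uses; the types and names are illustrative:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Sketch of the write-protocol optimization: one db per request, unused tags removed. */
public class WriteRequestRewriter {

    /** Simplified point: target db, measurement, tags and fields already parsed. */
    public record Point(String db, String measurement, Map<String, String> tags, Map<String, Object> fields) { }

    /** Tags that are actually referenced by some inspection rule; everything else is dropped. */
    private final Set<String> usefulTags;

    public WriteRequestRewriter(Set<String> usefulTags) {
        this.usefulTags = usefulTags;
    }

    /** Groups points by db and strips unused tags, so each outgoing request stays small and single-db. */
    public Map<String, List<Point>> rewrite(List<Point> incoming) {
        return incoming.stream()
                .map(p -> new Point(p.db(), p.measurement(), keepUseful(p.tags()), p.fields()))
                .collect(Collectors.groupingBy(Point::db));
    }

    private Map<String, String> keepUseful(Map<String, String> tags) {
        Map<String, String> kept = new HashMap<>();
        tags.forEach((k, v) -> { if (usefulTags.contains(k)) kept.put(k, v); });
        return kept;
    }
}
```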

After this transformation, network IO traffic dropped significantly.

Operation and Maintenance Guarantee

Prometheus monitoring is introduced in the Influxdb proxy to track, in real time, the QPS of read and write requests, the dbs being written, the number and distribution of measurements, and so on, so that the related resources can be used more effectively.

The proxy also supports back-end Influxdb cluster management, including node data synchronization, adding and removing nodes, node data recovery, and so on.

Influxdb scheme

Full table

For small and medium topics, we store all the data in the database, and each topic is stored as a separate Influxdb measurement. The structure of the measurement is as follows:

[Figure: full table measurement structure]

  • time: data time

  • subtask: business default tag; the index of the Flink subtask that consumed the record

  • sinknum: business default tag; the write sequence number within the subtask

  • Extended tags: topic fields used for indexing

  • record_num: business default field, always 1, used by the row count rules

  • Extended fields: other topic fields

Business default fields: in the design of the full table we added two extra tag fields, subtask and sinknum. Because of how Influxdb works, when two records have exactly the same time and tags, the later write overwrites the earlier one, which can lead to false alarms when computing table row count rules. To make sure every record is actually written, we added the subtask and sinknum fields: subtask is the subtask index from the Flink runtime context, and sinknum is produced by a circular counter with values 1–200. Together they make every record unique so that no overwriting occurs.
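A minimal sketch of how these two default tags could be produced in the sink, assuming access to the Flink runtime context; the 1–200 rotating range comes from the text, while the class itself is hypothetical:

```java
import org.apache.flink.api.common.functions.RuntimeContext;

/** Sketch: generate the subtask and sinknum tags that keep otherwise identical points from overwriting each other. */
public class DefaultTagGenerator {

    private static final int SINK_NUM_RANGE = 200;   // sinknum cycles through 1..200

    private final int subtask;   // index of the Flink subtask that consumed the record
    private int cursor = 0;

    public DefaultTagGenerator(RuntimeContext context) {
        this.subtask = context.getIndexOfThisSubtask();
    }

    /** Returns the next sinknum; combined with the subtask tag this makes every written point unique. */
    public int nextSinkNum() {
        cursor = (cursor % SINK_NUM_RANGE) + 1;   // 1, 2, ..., 200, 1, ...
        return cursor;
    }

    public int subtask() {
        return subtask;
    }
}
```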

TTL time: the full table exists mainly to feed the CQ table. Influxdb periodically reads the full table, aggregates the data according to the quality rules, and writes the aggregation results into the CQ table. Given how real-time DQC works, full-table data that has already been aggregated into the CQ table is no longer needed. Therefore, to keep the number of data points and series from growing without bound, we set the TTL of the full table to 1 hour, and records older than 1 hour are deleted by Influxdb.

The relationship between tag and field: in Influxdb a tag is indexed and usually serves as a query condition, while a field holds the concrete value of an event at a point in time. A query that selects no field is meaningless in Influxdb and returns no results, because only fields carry the information of that point in time. Therefore we reserve a record_num field by default, and the table row count rules use this field first.

What fields will be stored in Influxdb?

The number of fields in a topic varies; it may be a handful or hundreds. At first we did not filter these fields and wrote all of them to the database. During operation, calls to the Influxdb-proxy service repeatedly failed. While locating the problem we found that the Influxdb-proxy machines were consuming a lot of network bandwidth and CPU. The conclusion was that Influxdb-proxy compresses data before writing to Influxdb in order to reduce network bandwidth and improve write performance, but because the written data carried so much redundant information, compression caused a very high CPU load and the Influxdb-proxy machines could no longer serve requests.

Removing redundant data therefore became the point of optimization. Business analysis showed that most fields in a topic do not need to be stored. Take TopicA (40 fields) from the live-streaming business as an example, with the following quality rules computed over a time window:

Rule ID | Rule | Business SQL | Fields that need to be stored
1 | Number of rows | select count(1) from TopicA | none
2 | Number of records from the IOS platform | select count(1) from TopicA where platform = 'IOS' | platform
3 | Number of distinct mid values from the Android platform | select count(distinct mid) from TopicA where platform = 'android' | platform, mid

These rules show that the only fields actually used are platform and mid; the rest are redundant for the current business. Given how real-time quality rules are used today, removing redundant fields and storing only the necessary information cuts network bandwidth consumption by more than 90%, and Influxdb-proxy utilization has since stayed within a stable range. This kind of failure has never reappeared.

Which fields will be used as tags?

A tag is used as an index in Influxdb, and the set of all tags of a record forms a series. The number of series a machine can carry depends on its hardware and is limited; an explosion in the number of series causes Influxdb's read and write performance to drop sharply.

In the early days after launch, an incident occurred because the strict SOP was not followed when onboarding a topic. The topic contained an XxxId field, which the operations staff mistook for dictionary-like information with a limited value range and therefore onboarded directly as a tag. In fact the field had high cardinality, the number of Influxdb series quickly grew to 3 million, and Influxdb could barely serve requests. In the post-mortem, to prevent this from happening again, we formulated a topic onboarding SOP and now strictly review tag fields.

The choice of tag fields is critical to Influxdb's stability. So how do we decide which fields become tags?

[Figure: tag field selection]

We believe a tag field needs to satisfy the following two points:

  1. It is used as a filter condition in the where clause; for example, with where platform = 'ios', the platform field is selected for filtering;

  2. Its value range is enumerable and not high-cardinality; for example, the platform field takes values in ios, android, and web, which fits the characteristics of a tag field.

In the example above, platform is used as a tag, while mid is stored as a field because of its high cardinality. In practice, fields that are needed but cannot serve as tags are stored as fields.

How are fields extended when new rules are added?

Like unstructured databases such as MongoDB, a measurement in Influxdb does not require a predefined schema and is easy to extend. To add a new field, simply include the new field and its value in the write statement.
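As an illustration of this schema-less extension, a small sketch using the influxdb-java client; the measurement and field names are taken from the example above, and the rest is assumed:

```java
import org.influxdb.dto.Point;

import java.util.concurrent.TimeUnit;

public class FieldExtensionExample {
    public static void main(String[] args) {
        // Existing rules only needed the "platform" tag and the "record_num" field.
        // To support a new rule on "mid", simply include it in the write; no schema change is needed.
        Point point = Point.measurement("topic_a_full")
                .time(System.currentTimeMillis(), TimeUnit.MILLISECONDS)
                .tag("platform", "ios")
                .addField("record_num", 1)
                .addField("mid", 10086L)          // new field appears by just writing it
                .build();
        System.out.println(point.lineProtocol());  // inspect the resulting line protocol
    }
}
```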

CQ table

The CQ table stores the result of real-time data aggregation according to business rules. Its structure is simple and fixed, as shown in the following figure:

[Figure: CQ table structure]

  • time: calculation window start time

  • rule_id: quality rule ID

  • value: the calculation result of the quality rule in the time window

There are two sources of CQ table data:

  1. Scheduled aggregation of the full table: implemented with Influxdb's built-in Continuous Query capability. Relying on it, Influxdb automatically and periodically runs queries over the real-time data and writes the results into the specified CQ table. Currently the 99th-percentile latency of these Continuous Queries is under 30 ms (a sketch of such a CQ statement follows this list).

  2. Real-time aggregation by Flink tasks (large topics): the consumption task of a large topic performs the quality rule calculations within the time window and writes the results into the CQ table.
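For reference, a hedged sketch of what such a continuous query might look like when registered through the influxdb-java client; the database, measurement, and rule names are illustrative rather than the production definitions:

```java
import org.influxdb.InfluxDB;
import org.influxdb.InfluxDBFactory;
import org.influxdb.dto.Query;

public class ContinuousQueryExample {
    public static void main(String[] args) {
        InfluxDB influxDB = InfluxDBFactory.connect("http://influxdb-proxy:8086");

        // Every minute, count rows of the full table that match rule 2 (platform = 'IOS')
        // and write the per-minute result into a CQ measurement dedicated to this rule.
        String cq = "CREATE CONTINUOUS QUERY \"cq_rule_2\" ON \"dqc\" "
                + "BEGIN "
                + "  SELECT count(\"record_num\") AS \"value\" "
                + "  INTO \"cq_table_rule_2\" "
                + "  FROM \"topic_a_full\" "
                + "  WHERE \"platform\" = 'IOS' "
                + "  GROUP BY time(1m) "
                + "END";

        influxDB.query(new Query(cq, "dqc"));
        influxDB.close();
    }
}
```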

TTL time of the CQ table: the CQ table holds per-minute aggregation results, and its TTL is set according to the business. Because there are volatility rules such as day-over-day and week-over-week comparisons, the TTL of the CQ table is currently set to 14 days, to avoid false alarms caused by expired data being deleted.

Influxdb water level

Influxdb is the core of the whole solution, so keeping it stable is essential and monitoring it is the top priority. Tests on the machines we actually use show that the current single-machine write bottleneck is about 1.5 million points/s and that the series count peaks at around 2 million; exceeding these values may degrade write performance.

We have taken the following measures in this regard:

  1. Influxdb supports horizontal scaling. Based on topic metadata and Influxdb monitoring, if the topic traffic carried by an Influxdb instance or its real-time write volume reaches 80% of the bottleneck, newly onboarded topics are written to a new Influxdb instance.

  2. Series count growth is monitored. Because of the data TTL, the series count should change periodically and its peak should stay within a stable range. If a poorly chosen, high-cardinality tag field slips in, the series count grows rapidly; when this is detected, the related topics must be handled promptly to keep the service stable.

O&M and Abnormal Situations

To ensure the stability of the new real-time DQC architecture, we mainly considered the following abnormal situations.

Traffic surge, message accumulation

There are two main cases of sudden traffic increase and message accumulation:

Predictable growth on the business side: this is the more common case, for example traffic on the related topics surges during certain live-streaming events, such as the recent live broadcast of the League of Legends S12 finals.

Unpredictable growth on the business side: this is relatively rare, for example an uploader's video unexpectedly goes viral; such a phenomenon cannot be predicted.

Our assessment is that when messages accumulate, the alarms raised during that period may be inaccurate, so we try our best to avoid message accumulation.

For predictable growth, we add resources in advance to increase the consumption capacity of the tasks.

For unpredictable growth, or when messages still accumulate even after resources are added, we consider the task to be running in an abnormal state. We downgrade it and close the related alarms triggered by the task; for high-priority topics such as P0, a notification is sent to users to inform them of the affected scope.

[Figure: handling of message accumulation]

Program Crash

Flink's checkpoint mechanism is used to recover tasks after failures, but the following problems remain.

Small and medium Topic:

In this solution the data must be written into Influxdb. To improve throughput, the sink stage does not write every record to storage immediately; records are first put into a buffer, which is flushed to Influxdb when the data length or the waiting time reaches a threshold. If the program crashes, the latest data in the buffer may therefore be lost, which can lead to missed checks and false alarms.

At present the time threshold is 10 seconds and the record threshold is 1000. Considering the cost of persisting the buffered data to the file system at every checkpoint, we consider this loss acceptable, and it is explained to users in the product announcement.
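A minimal sketch of this buffering logic (the thresholds come from the text; the writer interface is a placeholder), which also shows why at most one buffer's worth of data can be lost on a crash:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the sink-side buffer: flush when 1000 records or 10 seconds are reached, whichever comes first. */
public class BufferedInfluxSink {

    public interface LineProtocolWriter {           // placeholder for the real Influxdb-proxy client
        void writeBatch(List<String> lines);
    }

    private static final int MAX_RECORDS = 1000;
    private static final long MAX_WAIT_MS = 10_000;

    private final LineProtocolWriter writer;
    private final List<String> buffer = new ArrayList<>();
    private long lastFlushMs = System.currentTimeMillis();

    public BufferedInfluxSink(LineProtocolWriter writer) {
        this.writer = writer;
    }

    public synchronized void add(String lineProtocolRecord) {
        buffer.add(lineProtocolRecord);
        if (buffer.size() >= MAX_RECORDS || System.currentTimeMillis() - lastFlushMs >= MAX_WAIT_MS) {
            flush();
        }
    }

    /** Records still sitting here when the process crashes are lost, which is the accepted trade-off. */
    public synchronized void flush() {
        if (!buffer.isEmpty()) {
            writer.writeBatch(new ArrayList<>(buffer));
            buffer.clear();
        }
        lastFlushMs = System.currentTimeMillis();
    }
}
```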

Big Topic:

Flink's checkpoint mechanism guarantees that the task can be restored from its last state. When writing to Influxdb, aggregated results are overwritten based on their time and tag information, so results are not written twice.

What needs attention is that the topics covered by this solution all carry high traffic; if recovery takes too long, data may accumulate, so a plan for handling accumulated data is also needed.

Repeated consumption

If multiple tasks consume the same topic repeatedly, more data than expected is written to Influxdb, which may cause false alarms.

Note: the group.id configured in the KafkaConsumer no longer behaves as a consumer group, because the consumer assigns topic partitions using the low-level API.

Solution:

A task is strongly bound to a topic. When a topic is assigned to a consumption task, the task and the topic are registered in the database; if the same topic is already registered, the startup fails and an alarm is raised.
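A hedged sketch of that registration step using plain JDBC, assuming a table with a unique index on the topic column; the table and column names are made up for illustration:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Sketch: register the task-topic binding; a unique index on `topic` makes a second consumer fail fast. */
public class TopicBindingRegistry {

    private final Connection connection;

    public TopicBindingRegistry(Connection connection) {
        this.connection = connection;
    }

    /** Returns normally if the binding is new; throws if the topic is already bound to another task. */
    public void register(String topic, String taskId) throws SQLException {
        String sql = "INSERT INTO dqc_topic_binding (topic, task_id) VALUES (?, ?)";
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            ps.setString(1, topic);
            ps.setString(2, taskId);
            ps.executeUpdate();   // duplicate topic violates the unique index -> SQLException -> startup fails, alarm raised
        }
    }
}
```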

In view of the abnormal situations above, we mainly took the following operation and maintenance measures to ensure the stability of the new architecture:

  1. Flink stream-interruption and backlog monitoring

  2. Influxdb cluster status monitoring

  3. Influxdb sequence number monitoring

These measures help us find and resolve problems in time.

Real-time DQC follow-up work

Engineering

[Figure]

Although the new solution has been launched, the new architecture is somewhat more complex than before, so it still falls short in engineering automation. The existing DQC rules were migrated manually by developers, and adding new rules still requires developer involvement.

On the other hand, tiered guarantees are another starting point of our engineering work. P0-level topics must be protected first under any circumstances, and P0 and P1 topics are never merged into one consumption task. If starting a P0 task fails, the system uses the failure information to pick a running P1-level task to stop, releases its resources, and uses them to start the P0 task first. When a P1 task is stopped, an alarm is sent to the user and to the operations staff, who then judge manually whether to add queue resources.

In the next stage, we will continue to make efforts in automatic engineering and hierarchical guarantees to present a better user experience to users.

 Flink task management

With cost reduction and efficiency improvement now the priority, the dynamic-topic and dynamic-rule designs already save a lot of cluster resources, but there is still room for optimization, namely finer-grained management across multiple Flink tasks. For example, in the dynamic-topic solution, should a new topic be added dynamically to an existing task, or should a new task be started to consume it? This must be decided based on the current load of the tasks.

There is also the single-machine bottleneck of Influxdb. In our solution Influxdb can be scaled horizontally; when a Flink task starts, it needs to combine the Influxdb cluster load with the current task information to select the optimal Influxdb node to write to.

Impact on normal tasks

Quality monitoring of topics must not affect normal online tasks. Currently the queue used by quality tasks is still shared with other tasks, which may cause resource competition, occupy machine resources, and affect other tasks on the same machines; this will be adjusted in future resource planning.




Source: blog.csdn.net/u013411339/article/details/131335778