Zhihu Case Study: A Deep Dive into the Architecture and Practice of User Portraits and a Real-time Data Warehouse

User portraits and real-time data analysis are at the data core of Internet companies. Building on Apache Doris and cloud services, Zhihu's data empowerment team has assembled a responsive, low-cost, stable, and flexible real-time data architecture that supports three core business flows: real-time business analysis, real-time algorithm features, and user portraits. It significantly improves how quickly time-sensitive hotspots and trends are perceived and acted on, sharply reduces the cost of crowd targeting in operations and marketing scenarios, and delivers measurable gains in real-time algorithm accuracy and core business metrics.

Keywords: data warehouse, Apache Doris, user portrait, real-time data


1 Introduction

As Zhihu's business lines have grown, demand for user portraits and real-time data keeps increasing. On the user portrait side, the business expects faster, more accurate, and more convenient crowd-screening tools, plus easy crowd analysis. On the real-time data side, it expects user behavior streams that respond in real time, while more and more scenarios, such as algorithm features, metric statistics, and business dashboards, depend on real-time data.

In August 2021, the Zhihu platform team established the data empowerment team. Facing unowned historical real-time data demands, an existing user portrait system that could not cover today's diverse crowd-targeting needs, and business requests for deeper crowd analysis, the team chose Apache Doris as the real-time data warehouse at the infrastructure layer, built real-time data integration, real-time data scheduling, and a real-time data quality center at the business tool layer, and built real-time data applications and user portrait applications at the application layer. This solution addresses the business pain points and meets the business demands.

The main difficulties in the current business lie in real-time data and user portraits, across the following three directions:

  • Real-time business data
    Provide real-time business indicators so the business can track hotspots and trends, assist production and consumption, and grow high-quality creation and content consumption.

    Provide real-time, user-facing indicators that require complex computation, improving user experience while eliminating costly, complex back-end script calculations on the business side, saving cost and engineering effort.

  • Real-time algorithm features
    Based on real-time data, provide a variety of real-time algorithm features, and work with the algorithm team to improve core indicators such as DAU, retention, and user payment.

  • User portraits
    User screening: multi-dimensional, multi-type targeted screening, integrated with marketing, advertising, and operations platforms to improve business efficiency and reduce personnel cost.

    User analysis: multi-angle analysis with targeted user analysis reports at zero marginal cost, helping business departments quickly grasp their core audience.

Based on the goals in the three directions above, this article introduces the Zhihu platform data empowerment team's practical experience around four questions:

  • How to drive business development through real-time data?
  • How to build a real-time data center from 0 -> 1?
  • How to build an efficient and fast user portrait system to solve various problems of the historical system?
  • How to quickly and efficiently develop business functions and ensure business quality?

1.1 Explanation of terms

Term / Abbreviation: Description
UBS: User Behavior System. Zhihu's real-time user behavior system, containing real-time user behavior streams and the associated fast-query storage.
DMP: Data Management Platform. Zhihu's user portrait system, including crowd screening, crowd analysis, and related functions.

1.2 Combination of real-time data, user portraits and various services

(figure omitted)

2. Challenges and pain points

The current business goals translate into the following specific requirements.

2.1 Business value

1) How do we surface business value through timeliness?

  • Build time-sensitive indicators, such as hotspot and trend rankings, that directly support business development.

2) How do we maximize the screening and analysis capabilities of user portraits?

  • Fully cover the many varieties of multi-dimensional user screening.
  • Cover user analysis from multiple angles and approaches.

2.2 Data timeliness

1) How can the system perceive the latest user behavior by the time a user's second refresh of the recommendation page renders its six top-screen items?

  • Improve timeliness through UBS construction (described below).

2) In recommendations, an algorithm fed with near-real-time features performs far better than one whose features update daily. How do we guarantee the algorithm sees feature changes within 10 minutes?

  • Through the real-time data system, built jointly on Apache Doris, feature updates land within 10 minutes (described below).

2.3 Interface responsiveness

For hotspot operation scenarios, the user portrait service is expected to screen out a large crowd within seconds; how do we support operational actions such as follow-up pushes to those users?

Crowd screening was accelerated by building the user portrait system jointly on Apache Doris (described below).

2.4 Complexity

1) Real-time requirements are almost never simple counts or sums; nearly every one involves complex deduplication and joint computation across multiple data sources.

Take playback volume as an example. Events such as playback start, pause, end, and heartbeat can fire at the same moment and must be deduplicated. On top of that, counts must be aggregated across the relationships between a video and its answer and between two co-creating authors, while filtering out playback events whose parent or child content is flagged abnormal.
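As an illustration of the deduplication problem (this is a toy sketch, not Zhihu's actual pipeline; event names are assumptions), multiple playback events from the same user on the same content collapse into a single counted play, with abnormal content filtered out:

```python
from collections import namedtuple

Event = namedtuple("Event", ["user_id", "content_id", "action", "ts"])

PLAY_ACTIONS = {"start", "heartbeat", "pause", "end"}

def count_plays(events, blocked_content=frozenset()):
    """Count one play per (user, content), ignoring events on abnormal content."""
    seen = set()
    for e in events:
        if e.action not in PLAY_ACTIONS:
            continue
        if e.content_id in blocked_content:  # e.g. parent content flagged abnormal
            continue
        seen.add((e.user_id, e.content_id))
    return len(seen)

events = [
    Event(1, "v1", "start", 0),
    Event(1, "v1", "heartbeat", 5),   # duplicate signal for the same play
    Event(1, "v1", "end", 30),
    Event(2, "v1", "start", 1),
    Event(2, "v2", "start", 2),
]
print(count_plays(events))                          # 3 distinct (user, content) plays
print(count_plays(events, blocked_content={"v2"}))  # 2 after filtering abnormal content
```

In the real system this deduplication runs inside Doris/Flink over streaming data; the set-based version only shows the semantics.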

2) For crowd analysis, the business expects crowd correlation computed across many angles and dimensions, plus TGI computed over all user features for the target crowd versus a comparison crowd, in order to surface salient features. How do we solve this?

Complex crowd analysis is handled by building the user portrait system jointly on Apache Doris (introduced below).

3) Business data includes inserts, deletes, and updates; how do we synchronize them in real time?

The real-time data integration system, built jointly on Apache Doris, handles insert/delete/update logic (described below).

4) Anomalies in detailed data are discovered with a lag; once found, the build logic must be corrected in a targeted way and historical data backfilled. How do we solve this?

Solved by choosing the Lambda architecture as the data architecture (described below).

3. Practice and experience sharing

3.1 Overall Business Architecture

Based on the current business, the architecture splits top-down into an application layer, a business model layer, a business tool layer, and an infrastructure layer:

  • Application layer: Responsible for our current business applications, directly provide tools for the business or provide certain modules of the business, share goals with the business, and empower the business.

  • Business model layer: supports application layer construction and certain real-time analysis capabilities, and is also used as a functional module of a certain business process to build external business and its own application layer, share goals with the business, and empower the business.

  • Business tool layer: supports the development of application layer and business model layer, provides common tools, aims to reduce the construction cost of application layer and business model layer, improve the engineering efficiency of overall construction, and ensure business stability and accurate data quality.

  • Infrastructure layer: the infrastructure and cloud services provided by the technology center, offering stable, available basic capabilities that keep the layers above dependable.

3.2 Data architecture selection for real-time data

The data architectures commonly used for these problems are the Lambda architecture and the Kappa architecture. Given the current business characteristics, complex computation and occasional anomalies that force large-volume backtracking, the real-time data platform adopts the Lambda architecture: Doris carries minute-level batch processing, while Flink carries second-level stream processing of simple logic. Details are as follows:
(figure omitted)
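In a Lambda setup of this kind, a serving layer typically prefers batch-layer results where available and falls back to the speed layer for the most recent windows. A simplified sketch of that merge (illustrative only; the bucket keys and values are assumptions):

```python
def merge_views(batch_view, speed_view):
    """Prefer the (corrected, complete) batch result per time bucket;
    fall back to the real-time result for buckets batch has not covered yet."""
    merged = dict(speed_view)
    merged.update(batch_view)  # batch overrides speed where both exist
    return merged

batch_view = {"2022-01-01T10": 120, "2022-01-01T11": 95}  # minute-level batch (Doris)
speed_view = {"2022-01-01T11": 90, "2022-01-01T12": 40}   # second-level stream (Flink)
view = merge_views(batch_view, speed_view)
print(view["2022-01-01T11"])  # 95: the batch result wins
print(view["2022-01-01T12"])  # 40: only the stream has this bucket so far
```

The override direction is the point: when an anomaly forces a backfill, rerunning the batch layer silently corrects every bucket it covers.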

3.3 Application layer construction experience sharing

3.3.1 Real-time data system

01 Business Scenario

The real-time data system mainly has two major directions: real-time business data and real-time algorithm features.

(1) Real-time business data.

  • Provide real-time business indicators so the business can track hotspots and trends, assist production and consumption, and grow high-quality creation and content consumption.

  • Provide real-time, user-facing indicators that require complex computation, improving user experience while eliminating costly back-end script calculations on the business side, saving cost and engineering effort.

(2) Real-time algorithm features.

  • Based on real-time data, it provides a variety of real-time algorithm features, and works with the recommendation algorithm team to improve core indicators such as DAU, retention, and user payment.

02 Difficulties faced

(1) Many upstream data sources and complicated calculation rules. Take our playback volume calculation as an example:

  • Multiple behavior events fire for one playback and must be deduplicated.

  • Filtering and summing rules are numerous and depend on results from several different data sources.

(2) High time sensitivity

  • Taking algorithm features as an example: after a user browses some content, the follow-on association computations must produce results within a bounded time window; if nothing is produced within 10 minutes, follow-up recommendation quality fluctuates and the feature's value drops to zero.

(3) High coordination cost in the scheduling process

  • The scheduling system must track both Kafka stream consumption progress and task completion.

  • Consumption progress across multiple dependencies must be strictly aligned; only when all reach a common point can the downstream computations fire.

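The alignment requirement can be sketched as: a downstream task fires only once every upstream Kafka partition has been consumed past its required offset, so effective progress is gated by the slowest source. A minimal illustration (topic and offset values are hypothetical):

```python
def ready_to_run(consumed_offsets, required_offsets):
    """True only when every upstream (topic, partition) has reached its target offset."""
    return all(
        consumed_offsets.get(tp, -1) >= need
        for tp, need in required_offsets.items()
    )

required = {("events", 0): 1000, ("events", 1): 1000, ("likes", 0): 500}
progress = {("events", 0): 1200, ("events", 1): 990, ("likes", 0): 600}
print(ready_to_run(progress, required))  # False: events[1] still lags behind

progress[("events", 1)] = 1005
print(ready_to_run(progress, required))  # True: all sources aligned, task may fire
```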

03 Solutions

(1) Build a real-time data base, build a corresponding data model, and reduce construction costs.

insert image description here
(2) For the problems of many upstream dependencies, complex calculation rules, and hard-to-guarantee quality, reduce the cost of solving them with purpose-built tools.

  • By building real-time data integration and real-time data scheduling capabilities, the speed of data access and data model construction is guaranteed, access time is reduced, and service access efficiency is improved (see below for details)

  • By building a real-time data quality center, the data quality is guaranteed, the time to discover data quality problems is reduced, the discovery efficiency is improved, and business delivery results are ensured (see below for details)

(3) For high time sensitivity: strengthen monitoring and, together with the Doris cluster, improve throughput and computation efficiency:

  • Set up monitoring of write delay and calculation delay to quickly find problems.

  • Tune Doris cluster parameters, adjusting batch write size, interval, and frequency.

    • Our loads are mainly Broker Load and Routine Load; Routine Load carries the latency-sensitive data, so its parameters were tuned specifically.
  • Doris added Runtime Filter support, improving join performance via BloomFilter.

    • The Doris cluster added the Runtime Filter in version 0.14, which markedly speeds up joins that filter on a large number of keys;
    • This change noticeably improved several of our scheduled business queries, cutting runtime from 40+ s to about 10 s.

3.3.2 User portrait system DMP

01 Business Scenario

The user portrait system has two main functions: user retrieval and user analysis.
(1) User retrieval.
The key point is completing crowd selection quickly, and, as the selection conditions change, quickly estimating which users would be selected.

(2) User analysis.
The focus is comparing crowd packages across many dimensions and surfacing each crowd's most distinctive user features (judged by TGI value) from the analysis.
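TGI here follows the standard definition: the share of a feature inside the target crowd, divided by its share in the comparison (or overall) crowd, times 100; values well above 100 mark salient features. A small sketch (tag names and counts are made up):

```python
def tgi(target_with_feature, target_total, base_with_feature, base_total):
    """TGI = (feature share in target crowd / feature share in base crowd) * 100."""
    target_share = target_with_feature / target_total
    base_share = base_with_feature / base_total
    return target_share / base_share * 100

# 40% of the target crowd carries the tag vs 20% of all users -> TGI 200
print(tgi(4_000, 10_000, 200_000, 1_000_000))  # 200.0

def salient_features(stats, threshold=120):
    """Keep only tags whose TGI exceeds the threshold."""
    return sorted(tag for tag, s in stats.items() if tgi(*s) > threshold)

stats = {
    "sci-fi reader": (4_000, 10_000, 200_000, 1_000_000),  # TGI 200
    "night owl":     (1_000, 10_000, 150_000, 1_000_000),  # TGI ~67
}
print(salient_features(stats))  # ['sci-fi reader']
```

In production this computation runs over 3 million+ tag values inside Doris; the formula is the same.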

02 Difficulties faced

(1) The data scale is large.
We currently have 200+ tags; each tag has its own set of enumeration values, for 3 million+ tag values in total. Tag-to-user assignments amount to 90 billion+ records, and since tags are refreshed daily, the import volume is very large.

(2) Screening response time requirements are high.

Simple screening must return results within seconds; for complex screening that selects a large crowd, generating the crowd package must finish within 20 seconds.
(3) Besides the long-typed user id, crowd packages must also expose various device ids and device-id MD5s as screening output.

(4) In the user analysis scenario, cross-crowd TGI computation over 3 million+ tag values must finish within 10 minutes.

03 Solutions

(1) DMP business structure
(figure omitted)
(2) DMP business process:
(figure omitted)

(3) Solve performance problems in a targeted way: improve large-scale import performance through divide and conquer.

  • Change the data model: split files.

    • Doris scatters storage across the cluster by Tablet; adjust the data model so data is evenly distributed and each file is as small as practical.
  • Change the import: split the load.

    • Since a single Broker Load has a performance ceiling, split the 90 billion+ rows into 1,000+ Broker Load tasks so each import stays small.
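The split can be pictured as partitioning the row space so each import task stays under a safe size; a schematic (the per-task row cap is illustrative, chosen so that 90 billion rows yield roughly the 1,000+ tasks mentioned above):

```python
def split_import_tasks(total_rows, max_rows_per_task):
    """Partition a huge import into row ranges, one Broker Load task per range."""
    tasks = []
    start = 0
    while start < total_rows:
        end = min(start + max_rows_per_task, total_rows)
        tasks.append((start, end))
        start = end
    return tasks

# e.g. 90 billion rows at ~90 million rows per task -> 1000 tasks
tasks = split_import_tasks(90_000_000_000, 90_000_000)
print(len(tasks))            # 1000
print(tasks[0], tasks[-1])   # (0, 90000000) ... (89910000000, 90000000000)
```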

(4) Speed up crowd screening and crowd analysis through divide and conquer.

  • Change the business logic: split the users.

    • Partition users into groups of 1 million ids (0-1M, 1M-2M, and so on).

    • The intersection/union/difference over all users equals the union of the per-group intersections/unions/differences.

    • The cardinality of an intersection/union/difference over all users equals the sum of the per-group cardinalities.

  • Change the data model: split files.

    • Set the bitmap table's grouping column and place the groups in a colocate group, so each group's intersection and difference computations complete on their own BE without a shuffle.

    • Split the bitmap table into more buckets, so more files are computed in parallel.

  • Change computation parameters: raise concurrency.

    • Because divide and conquer breaks the computation into many small tasks, raising the parallelism parameter parallel_fragment_exec_instance_num speeds things up further.
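The group-wise identity above, intersect per id bucket and then combine, can be checked with plain Python sets standing in for Doris bitmaps (bucket size shrunk from 1 million to 10 for readability):

```python
GROUP_SIZE = 10  # in production this would be e.g. 1,000,000 user ids per group

def by_group(user_ids):
    """Bucket user ids by id range, mimicking the colocate-group split."""
    groups = {}
    for uid in user_ids:
        groups.setdefault(uid // GROUP_SIZE, set()).add(uid)
    return groups

def grouped_intersection_count(a, b):
    """Each group intersects independently (no shuffle); counts are summed."""
    ga, gb = by_group(a), by_group(b)
    return sum(len(ga[g] & gb[g]) for g in ga.keys() & gb.keys())

crowd_a = {1, 5, 11, 25, 37}
crowd_b = {5, 11, 30, 37, 99}
print(grouped_intersection_count(crowd_a, crowd_b))  # 3
print(len(crowd_a & crowd_b))                        # 3: matches the direct answer
```

Because ids never cross buckets, the per-group results are disjoint, which is exactly why the per-group counts may simply be summed.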

04 Results

Since launch, multiple main Zhihu business scenarios have onboarded, relying on the system's crowd targeting and analysis capabilities and gaining direct improvements in metrics such as exposure and conversion rate.

Meanwhile, tool performance is as follows:

  • Import speed: 90 billion+ rows are imported daily, completing within 3 hours.
  • Crowd estimation: basically completes within 1 s; P95 is 985 ms.
  • Crowd selection: the selection computation completes within 5 seconds, and the end-to-end crowd selection takes about 2 minutes (an upgrade is planned).
  • Crowd analysis: completes within 5 minutes.

05 Areas for improvement

(1) Function extension

  • No custom crowd diffusion capability yet; different business scenarios have complex, varied requirements for expanding an existing crowd.

  • No crowd coloring (marking for attribution) yet, so user-effect tracking and follow-up analysis across channels cannot be completed.

(2) Performance improvement

  • Doris's row-to-column conversion is still under construction. In the portrait business, replacing user ids with device ids and crowd downsizing (shrinking a crowd package to a smaller one for follow-up operations) are done in business code, which hurts performance.
    • Once row-to-column conversion lands, device-id lookup will be done by joining a dimension table and crowd downsizing by ORDER BY rand() LIMIT, a clear performance win.
  • Doris's bitmap-read capability is still under construction. Business code cannot read bitmaps directly and must fetch them as text via bitmap_to_string, inflating transfer size and slowing selection.
    • Once bitmaps can be read directly, business logic will fetch them natively, greatly reducing transfer volume while enabling targeted caching in the business layer.
  • Crowd estimation currently nests two functions, bitmap_count(bitmap_and(...)); Doris will provide a combined bitmap_and_count, which should improve computation efficiency once adopted.

3.4 Experience Sharing in Tool Layer Construction

3.4.1 Data Integration

01 Business Scenario

As the saying goes, you cannot cook without rice: without data there is nothing downstream, so collection is the critical foundation. Doris's built-in import methods make warehousing convenient, but we still hit problems in practice. For example:
(1) Broker Load from the offline warehouse loses data lineage, so when upstream data is wrong, the blast radius cannot be assessed.
(2) Lengthy ETL code must be written, and even small changes require the full release process (at least 30 minutes); in addition, each deployment may hit assorted problems initializing MQ consumers.
(3) Running status is unmonitored, so anomalies go unnoticed for minutes or even hours.
(4) Built-in import supports only Kafka JSON; upstream Pulsar and Protobuf data needs hand-written forwarding code, so every new data source requires developing a converter plus the same full release process.
(5) The business expects Doris to simply mirror the source data without being aware of the pipeline; full-plus-incremental synchronization should be wrapped up without heavy configuration or custom code.

02 Solution

Building real-time data models depends on many businesses' data and requires constructing the model layer by layer. We therefore explored and built a real-time data integration system and a real-time scheduling system, and sank them into the tool layer.
(1) Real-time data integration: fast, configurable import capabilities for different data sources.
(2) On top of Doris's Broker Load and Routine Load, full-plus-incremental synchronization is assembled for the business.
(3) The integration capability is wrapped behind internal interfaces; the business layer only picks the database and table to synchronize, without understanding any intermediate step.
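The encapsulation can be pictured as: given only a source database and table, the integration service derives every intermediate step itself. A schematic plan generator (all names and steps are hypothetical, for illustration only, not the team's actual API):

```python
def build_sync_plan(db, table):
    """Derive the full + incremental sync steps the business no longer has to write."""
    topic = f"cdc.{db}.{table}"  # hypothetical CDC topic naming convention
    return [
        f"create Doris table model for {db}.{table}",
        f"full load: Broker Load snapshot of {db}.{table}",
        f"subscribe binlog/CDC stream -> Kafka topic {topic}",
        f"incremental load: Routine Load from {topic}",
        f"register monitoring and lag alerts for {db}.{table}",
    ]

plan = build_sync_plan("content", "answers")
print(len(plan))  # 5 steps, all generated from one line of configuration
print(plan[0])
```

The point is not the specific steps but the inversion: the business declares *what* to synchronize, and the tool owns *how*.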


03 Results

(1) Synchronous configuration
(figure omitted)

(2) Synchronous tasks
(figure omitted)

(3) Before launch

  • Early on, developing a real-time data flow on Doris meant, for each table, building the Doris data model, completing the full import, and developing the incremental ETL and Routine Load. Connecting one table to Doris with full-plus-incremental real-time sync cost 1 engineer about 1 day.

  • The pipeline had many hops and lacked alerting; building monitoring and alerts for the important links cost roughly another 0.5 day.

    • Full sync: source TiDB -> DataX -> Doris

    • Incremental sync: source TiDB -> TiCDC -> Canal binlog Kafka -> ETL (backfill fields) -> Kafka -> Routine Load -> Doris

(4) After launch

  • Configuration takes only 10 minutes, after which the model, the full import, the intermediate ETL, supplementary data backfill, and the Routine Load are all built automatically. The business layer no longer perceives the pipeline; it only declares which table to synchronize.

  • Nothing needs attention after launch: once the initial configuration is done, monitoring, alerting, and consistency are fully handled by the integration system.

3.4.2 Data Scheduling

01 Business Scenario

When we first built real-time data on Doris, we consumed Routine Load output, ran scheduled tasks for the follow-on computation, and exported results to serving stores such as Redis and Zetta (Zhihu's self-developed HBase-protocol store) to absorb external traffic. Along the way we hit these problems:

(1) Downstream tasks run before their dependencies are ready. For example, a last-24-hours exposure query over [15:00 yesterday, 15:00 today] runs at 15:05, but Routine Load has only imported data up to 14:50, so that run's result is wrong;

(2) Doris resources are limited, yet many tasks fire exactly on the hour or minute, and the burst of simultaneous computation can overwhelm the cluster;

(3) There is no alerting or feedback on whether a task succeeded, was delayed, or affected the business;

(4) The export procedure is generic yet re-implemented each time, costing 0.5 - 1 person-day per business interface.
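Problem (1) above, a window query firing before imports have covered the window, reduces to comparing the imported-through watermark with the window end. A sketch (times simplified to fractional hours; names are illustrative):

```python
def can_run_window_query(imported_through, window_end):
    """Run the rollup only once imports cover the whole query window."""
    return imported_through >= window_end

# Query at 15:05 over [yesterday 15:00, today 15:00), but data only loaded to 14:50:
print(can_run_window_query(imported_through=14.83, window_end=15.0))  # False: wait/retry
print(can_run_window_query(imported_through=15.02, window_end=15.0))  # True: safe to run
```

A scheduler built on this check simply retries the gate until it passes, instead of running on a fixed clock tick.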

02 Solution

(1) Architecture diagram
(figure omitted)

(2) Flowchart
(figure omitted)

03 Results

(1) Synchronous task
(figure omitted)

(2) Gains

  • A task dependency mechanism checks Kafka offsets and upstream table completion before a computation may run; data computation has never again started before its inputs were imported.
  • A backoff strategy watches current Doris metrics and avoids submitting SQL while load is high, shifting work from peaks to valleys to maximize resource use; this has largely eliminated the cluster being saturated by instantaneous bursts.
  • Full-link monitoring covers task execution status and delays; when a delay occurs, an alert fires and the issue is communicated and resolved promptly, in most cases within the business's acceptable recovery window.
  • Engineering work that used to take 1 day has dropped to zero: any SQL queryable in Doris can, after simple configuration, deliver business data and leaderboards on schedule.
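The backoff strategy in the second point can be sketched as: poll a cluster-load metric and delay SQL submission with exponentially growing waits until load drops below a threshold (the metric source, threshold, and delays are illustrative, not the team's actual values):

```python
def submit_with_backoff(get_load, submit, max_load=0.8, base_delay=1, max_tries=5):
    """Delay SQL submission while the cluster is hot; back off exponentially."""
    waited = []
    for attempt in range(max_tries):
        if get_load() <= max_load:
            submit()
            return waited  # the delays we would have slept (collected, not time.sleep'd)
        waited.append(base_delay * 2 ** attempt)
    raise RuntimeError("cluster stayed overloaded; giving up")

loads = iter([0.95, 0.9, 0.5])  # simulated successive load readings
done = []
delays = submit_with_backoff(lambda: next(loads), lambda: done.append("sql"))
print(delays)  # [1, 2]: two rounds of backoff before the load dropped
print(done)    # ['sql']
```

In a real scheduler the collected delays become actual sleeps, and `get_load` would read a Doris cluster metric.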

3.4.3 Data Quality

01 Business Scenario

Data has become a vital asset for Internet companies, and its quality directly determines the accuracy of information, and with it a company's viability and competitiveness. Michael Hammer (author of "Reengineering the Corporation") once remarked that seemingly trivial data quality problems are in fact important signals of broken business processes. Data quality management is a set of practices for measuring, improving, and verifying quality and for integrating organizational data. Big data's volume, velocity, and variety demand quality processing that differs from the quality management approach of traditional information governance programs.

Specific to Zhihu's businesses: the AI platform, the growth team, the content platform, and others have gradually migrated some or all of their workloads to the real-time computing platform. Alongside the benefits of fresher, faster data, data quality becomes even more important.

(1) Completeness:
Data completeness issues include incomplete model design (e.g., missing unique or referential constraints), incomplete records (lost or unavailable rows), and incomplete attributes (null attribute values). Incomplete data yields far less value and is the most basic and most common class of data quality problem;

(2) Consistency:
Multi-source data models may be inconsistent: inconsistent naming, structures, or constraint rules; inconsistent entities, such as differing encodings, names and meanings, classification levels, or life cycles; and, when multiple copies of the same data exist, conflicting content across copies;

(3) Accuracy:
Accuracy, also called reliability, identifies inaccurate or invalid data. Unreliable data causes serious problems, leading to flawed methods and poor decisions;

(4) Uniqueness:
Uniqueness identifies and measures duplicate and redundant data. Duplication is a major cause of uncoordinated business and untraceable processes, and the most basic problem data governance must solve;

(5) Relevance:
Data relevance problems are missing or incorrect relations between associated data, such as functional relations, correlation coefficients, primary/foreign key relations, and index relations. They directly distort analysis results and, in turn, management decisions;

(6) Authenticity:
Data must truly and accurately reflect the underlying entities and real business. Authentic, reliable source statistics are the soul of enterprise statistical work, the basis of all management, and indispensable first-hand material for sound business decisions;

(7) Timeliness:
Timeliness is whether data is available when needed. It is directly tied to a company's data processing speed and efficiency, and is a key indicator of business processing and management efficiency.
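The first few dimensions translate directly into machine-checkable rules; a minimal DQC-style checker (rule names, fields, and thresholds are illustrative):

```python
def check_completeness(rows, field, max_null_rate=0.01):
    """Completeness rule: the null rate of a field must stay under a threshold."""
    nulls = sum(1 for r in rows if r.get(field) is None)
    return nulls / len(rows) <= max_null_rate

def check_uniqueness(rows, key):
    """Uniqueness rule: the key must not repeat across rows."""
    keys = [r[key] for r in rows]
    return len(keys) == len(set(keys))

rows = [
    {"id": 1, "uid": 7},
    {"id": 2, "uid": None},  # completeness violation on "uid"
    {"id": 2, "uid": 9},     # uniqueness violation on "id"
]
report = {
    "uid completeness": check_completeness(rows, "uid"),
    "id uniqueness": check_uniqueness(rows, "id"),
}
print(report)  # both rules fail and would raise an alert
```

A DQC system is essentially a registry of such rules evaluated on schedule, with failures routed to alerting, as the monitoring examples below describe.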

02 Solution

(1) Full-process data links and quality assurance methods at all levels
(figure omitted)

(2) Business structure
(figure omitted)

(3) Business process
(figure omitted)

03 Results

(1) Health monitoring for a business
The health of a business is monitored through DQC. The business consists of multiple export tasks, intermediate computation tasks, and several data sources; at present everything is normal, and if any node becomes abnormal, it is discovered in time.

(2) Intermediate logic monitoring of a task

Some rules in the task's intermediate computation did not meet their thresholds, which caused the task to fail.

04 Gains

(1) Before launch

  • With no DQC-like safeguard in the early days, many problems were only found a day later or even after release; three such incidents caused rework and missed deliveries, with a huge impact on the business.
  • During early development, every detailed rule had to be compared and verified manually, layer by layer, at considerable and recurring cost.

(2) After going online

  • Within 1 month of launch, DQC rules caught 14 errors, each discovered within roughly 1 - 2 hours and fixed immediately, minimizing business impact.

  • Since launch, any anomaly in newly developed data triggers an alert during development itself, sparing the cost of manual discovery; because fixes land before downstream development starts, rework cost is minimized.

4. Summary and Outlook

4.1 Benefit Summary

4.1.1 Business Development

01 For real-time business data

  • Provides time-sensitive tracking of hotspots and trends, accelerating their use in production and consumption and thereby increasing high-quality creation and users' content consumption.
  • Provides user-facing indicators computed in real time with complex logic, improving user experience and retiring the business back end's script-based indicator calculations, which reduces complexity, saves cost, and improves efficiency.

02 For real-time algorithm features

  • Provides real-time algorithm features based on creators, content, and consumers. Together with the algorithm team in multiple projects, core indicators such as DAU, retention, and user payment have been significantly improved.

03 For user portraits

  • Improve and upgrade user screening to achieve multi-dimensional and multi-type targeted screening, and access to operating platforms, marketing platforms and other systems, which improves business efficiency and reduces the cost of business personnel for crowd targeting.
  • Build and improve user analysis, achieve multi-angle user analysis, target user analysis reports with zero cost, and help business departments quickly grasp the core customer market.

4.1.2 Tool Construction

  • Completed the layout of the real-time data domain and user domain, built relevant development and maintenance tools, and solved the previous problems of no infrastructure, no business tools, and high development costs in this regard.

  • Built integration, scheduling, and quality systems. The cost of business development and iteration is reduced by means of tools, allowing the business to develop rapidly, while ensuring the quality of delivery and improving the business baseline.

4.1.3 Personnel organization

  • Real-time data and user portrait capabilities are split top-down into application, business model, business tool, and infrastructure layers. This organizational division clarifies boundaries between levels and accelerates the achievement of business goals.

  • A multi-level team ladder has been built out. Members in different directions carry different OKR goals, isolating directions across levels while aligning direction and module goals within a level, so everyone works together on the overall real-time data and user portrait services.

4.2 Future Outlook

Since the team's founding in August 2021, we have kept asking: how do we provide better real-time data services? What applications can real-time data power to create business value? How do we run a good user portrait service, and how can its screening and analysis capabilities create greater business value? While crossing the river by feeling for the stones, we have kept exploring and building the related business capabilities and infrastructure. Next year we will push further in the following directions:

01 Based on real-time data

Strengthen the construction of the basic capability tool layer, and continue to reduce the construction and delivery costs based on real-time data.

Improve data quality tool coverage, providing quality assurance for business models and portrait quality assurance based on real-time data.

Some current business demands cannot be met even at 5-minute latency; we will further explore second-level real-time capability for complex scenarios and provide the supporting infrastructure.

02 Based on user portrait

Strengthen and further build on user portraits, user understanding, user insights & models, etc. By combining with specific businesses, build user understanding results and corresponding analysis capabilities that fit business scenarios, and find business retention points.

Further strengthen the construction of new tool capabilities, through the construction of user understanding tools and user analysis tools, reduce the cost of generating understanding and business analysis, improve business efficiency, and quickly discover business value.

Source: blog.csdn.net/qq_31557939/article/details/127816173