Research on User Accurate Perception System Based on Multimodal Fusion and Graph Neural Network

Abstract

In the 5G era, communication operators face challenges such as growing network complexity, service differentiation, and increasingly diverse user demands. Introducing artificial intelligence technology to perceive users accurately and to provide personalized, on-demand services has become a key direction of operators' digital transformation. This paper studies a construction scheme for a precise user perception system. The system can perform multimodal fusion of operator network data, business data, and voice, image, and text data, and can model and analyze the fused data with graph neural network algorithms to achieve precise perception of and insight into users. The paper also introduces an application scenario, the retention of high-risk churn users based on the precise user perception system, which offers a new approach to applying artificial intelligence technology in the digital transformation of operators.

Foreword

In recent years, artificial intelligence technology has affected every aspect of enterprise operations and personal life. Deep learning, a branch of artificial intelligence, has developed particularly rapidly and has achieved great success on text, voice, image, and other data. Fusing voice, image, text, and other types of data to serve specific business scenarios is now an important development direction of artificial intelligence.

In addition to data such as images, text, and voice, many fields also have graph-structured data, such as social networks in the social sciences, the relationships between products and users in e-commerce, and the topology of communication networks. Graph neural network technology was proposed to exploit the characteristics of graph-structured data and has shown great potential in many industries.

Communication operators hold a large amount of network and business data. The operator's network topology, human-computer interaction topology, and user social relationships can all be represented as graphs, which gives operators a natural advantage in applying graph neural network technology. Operators also store a large amount of voice, text, image, and other data in the course of business, which provides the basis for multimodal fusion. Graph neural network algorithms combined with multimodal data (text, voice, images, etc.) show better decision-making ability than traditional machine learning in complex scenarios, and applying new technologies such as graph neural networks and multimodal fusion to the business scenarios of communication operators has become a new research hotspot for artificial intelligence in the communications field.

01 Key technologies of the precise user perception system

1.1 Multimodal Technology

Each form in which information exists, or each source from which it comes, can be called a modality, and data composed of two or more modalities is called multimodal data. Multimodal data can cover different forms of data or different formats of the same form, and generally includes text, pictures, audio, video, and mixed data.

Natural phenomena have very rich characteristics, and a single modality can rarely provide complete information about a phenomenon. Multimodal data fusion integrates information from multiple modalities so that the strengths of each modality complement one another. For the same object, multimodal data describes it from different domains or perspectives; through multimodal fusion, the advantages of each domain or perspective can be exploited, and the value of each modality in the application scenario can be fully used.
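One simple way to picture fusion is feature-level (early) fusion, in which each modality is first turned into a numeric feature vector and the vectors are then normalized and concatenated. The sketch below is only an illustration under that assumption; the paper does not specify its fusion method, and the function names and sample vectors are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so no modality dominates by magnitude."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_modalities(features_by_modality, order):
    """Feature-level fusion: normalize each modality, then concatenate."""
    fused = []
    for name in order:
        fused.extend(l2_normalize(features_by_modality[name]))
    return fused


# Hypothetical per-user feature vectors, one per modality.
user_features = {
    "network": [3.0, 4.0],
    "text": [0.0, 2.0, 0.0],
    "voice": [1.0],
}
fused = fuse_modalities(user_features, order=["network", "text", "voice"])
print(fused)
```

Normalizing before concatenation is a design choice that keeps a modality with large raw values (for example traffic volume) from swamping the others; other schemes, such as fusing model outputs instead of features, are equally plausible.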

1.2 Graph Neural Network Technology

Graph data is another form of data representation alongside voice, text, images, videos, and tables. Graph data is modeled by converting entities in real scenes into nodes and the relationships between entities into edges. Graph-structured data therefore has a strong ability to express real scenes.

Although deep learning has achieved remarkable results on voice, image, and text data, applying it to graph data raises many challenges. Graph data has an irregular structure, the nodes of a graph are unordered, and each node has complex dependencies on other nodes. The common Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN), which extract image and text features, cannot be applied directly to graph data. Graph neural network algorithms were proposed to model the dependencies between graph nodes, so that nodes can be represented, graph features can be fully extracted, and the results can be applied to downstream tasks.

The basic idea of the graph neural network is to aggregate neighbors. To describe each node comprehensively, the attribute information of the node itself is not enough; structural information is also needed, so the information of neighbor nodes must be aggregated. The graph neural network model was originally derived from spectral methods: the convolution kernel is applied to the input signal in the spectral space, and the convolution theorem is used to realize graph convolution and thereby aggregate the information of neighbor nodes. The Graph Convolutional Network (GCN) algorithm shifted graph neural network research from spectral-domain convolution to spatial-domain convolution.
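The neighbor-aggregation idea can be illustrated with a minimal sketch in which each node's new representation is the mean of its own features and those of its neighbors. This is a deliberate simplification: an actual GCN layer also applies a learned weight matrix, a degree-based normalization, and a nonlinearity, all of which are omitted here. The graph and feature values are hypothetical.

```python
def aggregate_neighbors(features, adjacency):
    """One round of mean aggregation over each node and its neighbors."""
    updated = {}
    for node, own in features.items():
        # The node itself is included so its own attributes are not lost.
        group = [own] + [features[n] for n in adjacency.get(node, [])]
        dim = len(own)
        updated[node] = [
            sum(vec[i] for vec in group) / len(group) for i in range(dim)
        ]
    return updated


# Hypothetical three-node graph: "a" is connected to "b" and "c".
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
adjacency = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
hidden = aggregate_neighbors(features, adjacency)
print(hidden)
```

Applying the function repeatedly lets information travel further: after two rounds, node "b" is influenced by node "c" even though they are not directly connected.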

To adapt to real business scenarios, researchers designed different aggregation functions to combine the information of the central node and its neighbor nodes and proposed the GraphSAGE model; they also introduced the attention mechanism into graph algorithms and proposed the Graph Attention Network (GAT) model. As graph neural network technology has developed, more and more algorithm models have been proposed, and good results have been obtained in many application fields.
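The attention idea behind GAT can be sketched as follows: for a central node, a score is computed for each neighbor from the concatenated feature vectors, passed through a LeakyReLU, and normalized with a softmax so that the weights sum to one. The sketch assumes fixed, hypothetical values for the attention vector `a` and omits the learned linear transformation that a real GAT layer applies to the features first.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU activation applied to raw attention scores."""
    return x if x > 0 else slope * x


def attention_weights(h, node, neighbors, a):
    """Softmax-normalized attention of `node` over `neighbors`."""
    scores = [
        # h[node] + h[j] concatenates the two feature lists.
        leaky_relu(sum(w * x for w, x in zip(a, h[node] + h[j])))
        for j in neighbors
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical node features and attention vector (normally learned).
h = {"u": [1.0, 0.0], "v": [1.0, 0.0], "w": [0.0, 1.0]}
a = [0.5, 0.5, 1.0, -1.0]
weights = attention_weights(h, "u", ["u", "v", "w"], a)
print(weights)
```

With these values, node "u" attends equally to itself and to "v", whose features match its own, and assigns a smaller weight to "w". The aggregated representation would then be the weighted sum of the neighbor features.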

02 Functions of the precise user perception system

This paper studies a precise user perception system based on multimodal data fusion and graph neural networks. The functional architecture of the system is shown in Figure 1 and includes functional modules such as data access, database, operator library, graph technology capability opening center, and knowledge center. Based on these modules, at the data level the system can apply multimodal data fusion technology to integrate multi-source data, create knowledge centers, and build user relationship portraits; at the algorithm level, the system can apply artificial intelligence algorithms such as machine learning, deep learning, and graph neural networks for modeling and analysis to serve specific application scenarios.

Figure 1 Functional architecture of the precise user perception system based on graph neural network and multimodal fusion

2.1 Data access module

Raw data can be divided by operator business domain into B-domain, O-domain, and M-domain data, and by data type into structured, semi-structured, unstructured, and graph data. The raw data is collected by the data collection function and placed on the data bus of the data preprocessing layer. To ensure real-time data processing, the data bus generally uses the distributed message platform Kafka. Depending on the type of data, it is then stored in the distributed database HBase or the data warehouse Hive. The data access module of the system mainly includes the following three functions.

a) Operator network and service data access. The system supports docking with the operator's existing data warehouse to obtain the operator's network data and business data. Data is acquired through a direct connection to the existing data warehouse, and the synchronization period can be customized.

b) Voice, image, and text data access. The system supports access to multimodal data (voice, images, text, web pages, etc.). Supported access methods include local file upload and transmission via FTP, SFTP, and similar protocols.

c) Access to external knowledge centers. The system supports docking with existing knowledge centers, can synchronize data from the knowledge centers of other businesses, and can connect to graph data held in underlying stores such as Neo4j, JanusGraph, and HugeGraph.

2.2 Database module

Raw data is first preprocessed. The complete set of preprocessing steps includes data cleaning, data integration, data reduction, data transformation, data description, feature selection, feature combination, and feature extraction. Depending on the preprocessing target, data passes through all or some of these steps and then enters the database module for storage. The database module includes the following functions.

a) Graph data storage. The system can store graph data, is compatible with common graph databases such as Neo4j, JanusGraph, and HugeGraph, and supports the storage and querying of highly connected data.

b) Vector storage. The system supports vector retrieval for high-performance similarity search on large-scale deep learning vectors, and can handle the storage and retrieval of massive vector data.
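As an illustration of what similarity search over stored vectors means, the sketch below ranks stored vectors by cosine similarity to a query vector using an exhaustive scan. It is only a toy under stated assumptions: the index contents are hypothetical, and a production vector store would typically rely on approximate nearest-neighbor indexes rather than scanning every vector.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k(query, index, k=2):
    """Return the keys of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in index.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]


# Hypothetical embedding index keyed by user ID.
index = {
    "user_1": [1.0, 0.0],
    "user_2": [0.9, 0.1],
    "user_3": [0.0, 1.0],
}
nearest = top_k([1.0, 0.05], index, k=2)
print(nearest)
```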

c) Object storage. The system supports the storage of massive unstructured data such as photos, videos, audio, and documents.

d) Text storage. The system supports common text databases for storage of text data.

e) Distributed relational database. The system has the ability to store massive relational data, which is used to connect to the operator's data warehouse and store the operator's massive network data and user business data.

2.3 Operator library module

The system has a set of general operator libraries for multimodal data processing and graph mining, which give relevant personnel tools for developing algorithm models. The libraries cover graph representation learning, graph neural networks, data preprocessing, multimodal information fusion, speech recognition, semantic understanding, text processing, machine learning, emotion recognition, text generation, relation extraction, deep learning, and more. The operator library module includes the following six functions.

a) Data preprocessing algorithms: support the data operations required in actual business, such as null removal, deduplication, and missing-value filling.

b) Speech processing algorithm: operator tools such as speech recognition and sentiment analysis.

c) Text processing algorithms: operator tools such as named entity recognition, emotion recognition, and template language generation.

d) Traditional machine learning algorithms: commonly used operator tools such as LightGBM, XGBoost, and random forest.

e) Deep learning algorithms: support for deep learning frameworks such as TensorFlow and PyTorch.

f) Graph algorithm: operator tools such as graph representation, link prediction and graph mining, graph embedding, and graph computing.
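The data preprocessing operators in item a) can be pictured with a small sketch that removes duplicate records and fills missing values with the mean of the observed ones. The record layout and field names (`user_id`, `monthly_fee`) are hypothetical, and mean filling is only one of several reasonable strategies.

```python
def deduplicate(records, key):
    """Keep the first record seen for each value of `key`."""
    seen = set()
    unique = []
    for rec in records:
        if rec[key] not in seen:
            seen.add(rec[key])
            unique.append(rec)
    return unique


def fill_missing(records, field):
    """Replace None in `field` with the mean of the observed values."""
    present = [r[field] for r in records if r.get(field) is not None]
    default = sum(present) / len(present) if present else 0.0
    return [
        {**r, field: default if r.get(field) is None else r[field]}
        for r in records
    ]


# Hypothetical raw records with one duplicate and one missing value.
raw = [
    {"user_id": "u1", "monthly_fee": 50.0},
    {"user_id": "u1", "monthly_fee": 50.0},
    {"user_id": "u2", "monthly_fee": None},
    {"user_id": "u3", "monthly_fee": 70.0},
]
clean = fill_missing(deduplicate(raw, key="user_id"), field="monthly_fee")
print(clean)
```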

2.4 Graph Technology Capability Opening Center Module

The graph technology capability opening center module is oriented to the multi-modal graph mining scenarios of business applications, and provides resource interfaces such as computing power, data, and operators, so that relevant personnel can build a personalized development environment.

The system builds the graph technology capability opening center module, which opens computing power resources, data resources, and operator resources to upper-level applications in a controlled way through APIs, and provides supporting solutions to ensure that the opened capabilities remain manageable and controllable.

In the graph technology capability opening center module, business or algorithm personnel can upload data or select data from the database, divide it into training, validation, and test sets in proportion, and then either select a suitable algorithm from the operator library or write their own code for the actual business scenario to train the algorithm model.
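Splitting data into training, validation, and test sets in proportion can be sketched as below. The 70/15/15 ratio and the fixed random seed are assumptions chosen for illustration; the paper does not state which proportions the system uses.

```python
import random


def split_dataset(samples, ratios=(0.7, 0.15, 0.15), seed=42):
    """Shuffle the samples and split them into train/validation/test sets."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(samples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    # The test set takes whatever remains, so no sample is dropped.
    test = shuffled[n_train + n_val:]
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

For churn data, where positive samples are usually rare, a stratified split that preserves the class ratio in each subset may be preferable to the plain random split shown here.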

During model training, the system can plot the error curve, which makes it easier for researchers to tune parameters and optimize the model. Model results and intermediate feature data are an important basis for the product's back-end applications, so the feature data and model data of each training cycle are updated and imported online in real time. Features and models are managed by the platform's productization capability, and no additional operations are required from the user. Model-related data is automatically updated to the productization back end after the training task is completed.

2.5 Knowledge Center Module

The system knowledge center module has functions such as user portraits, family portraits, and relationship portraits to meet the needs of supporting algorithms and upper-layer applications.

2.5.1 User portrait

User portraits provide analysis and insight into operator user attributes, tags, and behaviors, and have the following functions.

a) Real-time collection of multi-source data. Based on the data access module, the system accesses multi-party customer data of different sources and types in real time, including event-tracking (buried-point) data, external platforms, and internal business systems. For event-tracking data and external platforms, there are two acquisition methods. If the other party provides an interface specification, data is acquired according to that specification; messages, text, database tables, and other methods are supported, and data access is based on Kafka. If the other party has no interface specification, crawlers can be used to obtain the data. After the data is obtained, it is preprocessed into a format that the system can recognize and is then imported into the system.

b) Flexible processing of customer labels. The system provides simple, easy-to-operate tools for label processing and customer grouping, supports a variety of custom label processing through a WYSIWYG visual interface, and responds quickly to individual needs. The system can group customers into communities based on the community search algorithm of the graph database, and can also apply a label propagation algorithm that spreads existing labels along the relationships between users, so that unlabeled users receive labels.

c) Precise selection of the target group. According to the operator's business scenario, users are selected by combining conditions on user attributes, behaviors, labels, and so on with intersection, union, and difference operations, so that the target group is delimited accurately.

d) Multi-dimensional insight analysis. Analysis is based on multi-dimensional data combined with a variety of algorithms, including user personality analysis, conversion analysis, behavior analysis, retention analysis, RFM, life cycle analysis, and other models provided by machine learning and deep learning algorithms, so that customers can be analyzed and understood from multiple dimensions.
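The label propagation mentioned in item b) can be sketched in a simplified form: users that already have labels keep them, and each unlabeled user takes the most common label among its labeled neighbors, round by round, until nothing changes. This is a minimal variant written for illustration, not the system's actual algorithm; the graph, user IDs, and label names are hypothetical.

```python
from collections import Counter


def propagate_labels(adjacency, seed_labels, max_rounds=10):
    """Spread seed labels to unlabeled nodes by neighbor majority vote."""
    labels = dict(seed_labels)
    for _ in range(max_rounds):
        updates = {}
        for node in sorted(adjacency):
            if node in labels:
                continue  # already labeled nodes stay fixed
            votes = Counter(
                labels[n] for n in adjacency[node] if n in labels
            )
            if votes:
                # Most frequent label wins; ties break alphabetically.
                updates[node] = min(votes, key=lambda lab: (-votes[lab], lab))
        if not updates:
            break
        labels.update(updates)  # apply the round's changes together
    return labels


# Hypothetical user relationship graph with three labeled users.
adjacency = {
    "u1": ["u2", "u3"],
    "u2": ["u1", "u3"],
    "u3": ["u1", "u2", "u4"],
    "u4": ["u3", "u5"],
    "u5": ["u4"],
}
seeds = {"u1": "gamer", "u2": "gamer", "u5": "traveler"}
labels = propagate_labels(adjacency, seeds)
print(labels)
```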

2.5.2 Family portrait

Family portraits are built from data such as basic family services, family members, voice features, network features, product features, geographic location, consumption value, and Internet behavior. They provide family-level feature analysis, including family category and member composition. The system provides a point-and-click interface for family portraits and can import external label data. It also provides low-threshold tools for label processing and customer grouping, supports a variety of custom label processing through a WYSIWYG visual interface, and responds quickly to individual business needs.

2.5.3 Relationship portrait

The user relationship portrait is built from data such as the operator's fixed-network services, call relationships, geographic location, and user preferences. Users are taken as nodes, user labels as node attributes, and the various relationships between users as edges; through data extraction and conversion, the data is imported into the graph data model to construct the user relationship portrait. Based on the established user relationship portrait, the system supports querying, analyzing, and exploring the relationships between users, returns computation results within seconds, and visualizes them in the form of a graph.
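A relationship portrait of this kind can be pictured as an adjacency structure built from relationship records, with a breadth-first search answering questions such as "which users are within two hops of a given user". The sketch below uses plain Python dictionaries instead of a graph database; the user IDs and relationship types are hypothetical.

```python
from collections import deque


def build_graph(relations):
    """Build an undirected graph from (user, user, relation_type) records."""
    graph = {}
    for src, dst, kind in relations:
        graph.setdefault(src, []).append((dst, kind))
        graph.setdefault(dst, []).append((src, kind))
    return graph


def within_hops(graph, start, max_hops):
    """Return users reachable from `start` in at most `max_hops` hops."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor, _kind in graph.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]
    return distance


# Hypothetical relationship records extracted from operator data.
relations = [
    ("u1", "u2", "call"),
    ("u2", "u3", "family"),
    ("u3", "u4", "call"),
]
graph = build_graph(relations)
reachable = within_hops(graph, "u1", max_hops=2)
print(reachable)
```

In a deployed system this query would more likely be expressed in the query language of the underlying graph database; the dictionary version only shows the logic.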

03 Retention of high-risk churn users based on the precise user perception system

Competition among communication operators is becoming increasingly fierce, and business development has entered the stage of operating the existing customer base. Because the number of users in the communication market is limited, operators face great competitive pressure, and it is becoming more and more difficult to acquire new users. Retaining high-value existing customers saves costs more effectively than acquiring new ones. Retaining existing customers, improving user satisfaction, and accurately matching services to user needs have therefore become key issues in the digital transformation of communication operators.

The precise user perception system based on multimodal data fusion and graph neural network technology can fully mine user characteristics and attributes through multimodal data fusion. Combined with network data from the communications industry and human-computer interaction data, it uses graph neural network technology to perceive and describe users precisely, supporting a closed loop of collaborative, precise operation in complex business scenarios.

This section introduces the prediction and retention of high-risk churn users based on the capabilities of the precise user perception system. The process is shown in Figure 2. Data is accessed through the system's data access capability; data processing and the construction of the various portraits are carried out through the operator capabilities, and the user perception graph is then generated. User churn prediction is modeled with graph neural network algorithms combined with machine learning algorithms. The voice and text algorithms in the system's operator library are used to build personalized retention strategies and generate intelligent scripts. Together these steps form a closed business loop of precise perception, prediction, and retention of high-risk churn users.

Figure 2 Prediction and retention process for high-risk churn users

3.1 Construction of User Perception Graph

Using the system's data access capability to access the operator's B-domain and O-domain data, the system builds user portrait labels through the operator capabilities, including user attribute labels such as product, business, finance, preference, behavior, and personality. User relationships are mined through graph algorithms and machine learning algorithms, including call relationships, family relationships, social relationships, work relationships, preference relationships, and product relationships.

Graph data is constructed from the user portraits and relationship portraits: the relevant attributes in the user portraits serve as user node features, and the relationships between users in the relationship portraits serve as graph edges, which together form the user perception graph. In addition to the original user attributes, the user perception graph contains the relationship information between users. Modeling on this basis can achieve more accurate perception and description of users.

3.2 User Churn Prediction Model Based on Graph Neural Network Algorithm

At present, prediction of high-risk churn users is mainly modeled with machine learning algorithms. Based on the user perception graph described above, this paper models the user churn problem with a graph neural network algorithm and compares the results with those of traditional machine learning methods. The process is as follows.

a) Churn label definition. Users in the current billing period are taken as the base population for prediction, and the task is to predict whether these users will churn within the next 3 months. Users who do not churn are labeled 0, and users who churn are labeled 1.

b) Modeling with machine learning methods. User attribute information such as points, voice, traffic, SMS usage, charges, and arrears is used as features. After feature combination and feature engineering, machine learning models such as logistic regression, decision trees, random forests, and LightGBM are trained, and the effect of the models is validated on the test set.

c) Graph algorithm modeling. With the user perception graph as the data basis, an embedding representation of each user node is learned through the graph neural network algorithm. The graph neural network model is combined with a machine learning method: the user node embeddings it outputs are used as the input to LightGBM, and the LightGBM model is trained. The model results are validated on the test set.
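The label definition in item a) and the pairing of node embeddings with labels in item c) can be sketched as follows. The sketch assumes that months are encoded as consecutive integers and that a churn month is known for users who left; these encodings, the embedding values, and the function names are hypothetical. The resulting rows and labels are what a classifier such as LightGBM would then be trained on, which is not shown here.

```python
def churn_label(churn_month, current_month, horizon=3):
    """Return 1 if the user churns within `horizon` months after now, else 0."""
    if churn_month is None:
        return 0  # no churn recorded for this user
    # Users who churned on or before the current month are assumed to have
    # been removed from the base population upstream.
    return 1 if current_month < churn_month <= current_month + horizon else 0


def build_training_set(embeddings, churn_months, current_month):
    """Pair each user's node embedding with its churn label."""
    rows, labels = [], []
    for user_id in sorted(embeddings):
        rows.append(embeddings[user_id])
        labels.append(churn_label(churn_months.get(user_id), current_month))
    return rows, labels


# Hypothetical node embeddings and churn months (integer month index).
embeddings = {"u1": [0.1, 0.9], "u2": [0.8, 0.2], "u3": [0.4, 0.4]}
churn_months = {"u1": 14, "u3": 20}
rows, labels = build_training_set(embeddings, churn_months, current_month=12)
print(rows, labels)
```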

The results of the machine learning model and the graph algorithm model on the test set are compared in Figure 3. Compared with the traditional machine learning method, the graph algorithm model improves precision, recall, and AUC: precision increases by about 3%, and recall and AUC each increase by more than 5%. Embedding user nodes through graph algorithms on the user perception graph can uncover latent features, improve model performance to a certain extent, and lay a good foundation for subsequent user retention.
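For readers less familiar with the three metrics, the sketch below shows how precision, recall, and AUC can be computed from true labels and predicted scores. The labels and scores are made-up illustration values, not results from the paper, and the 0.5 decision threshold is an assumption.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for the positive (churn) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # A tie between a positive and a negative score counts as half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up labels and model scores for five users.
y_true = [1, 0, 1, 0, 1]
scores = [0.9, 0.6, 0.7, 0.2, 0.4]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
precision, recall = precision_recall(y_true, y_pred)
area = auc(y_true, scores)
print(precision, recall, area)
```

Precision and recall depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the three are usually reported together.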

Figure 3 Comparison of results of different models

3.3 Personalized retention strategies for high-risk churn users

The user churn prediction model described above outputs a list of high-risk churn users, which can be sent to front-line business personnel through the system's model result distribution function. The system also has an intelligent script function that helps front-line business personnel retain high-risk churn users in a personalized way.

The personalized retention function that the system provides for high-risk churn users is shown in Figure 4 and mainly includes the following steps.

Figure 4 User personalized retention

a) Churn prediction model output list. Based on the prediction model described above, a list of high-risk churn users is output.

b) Construction of personalized retention strategies. Based on the user portraits, family portraits, relationship portraits, and package portraits constructed by the system knowledge center, a personalized user retention strategy is constructed through a strategy algorithm.

c) Intelligent scripts. Customer service voice data is accessed through the system's voice, image, and text data access capability, and user personality labels are enriched through technologies such as speech recognition and sentiment analysis. Based on the personalized retention strategy, natural language generation algorithms are used to generate intelligent retention scripts and improve the retention success rate.
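As a rough illustration of script generation, the sketch below fills a text template chosen by a user's label with the user's name and the offer from the retention strategy. Template filling is only the simplest stand-in for the natural language generation algorithms mentioned in item c); the templates, label names, and fields are all hypothetical.

```python
from string import Template

# Hypothetical script templates keyed by user label.
SCRIPT_TEMPLATES = {
    "price_sensitive": Template(
        "Hello $name, as a valued customer you qualify for $offer "
        "on your current plan."
    ),
    "heavy_data": Template(
        "Hello $name, we noticed you use a lot of data; "
        "$offer may suit you better."
    ),
}
DEFAULT_TEMPLATE = Template("Hello $name, we would like to offer you $offer.")


def generate_script(user, strategy):
    """Pick a template by the user's label and fill in the strategy offer."""
    template = SCRIPT_TEMPLATES.get(user.get("tag"), DEFAULT_TEMPLATE)
    return template.substitute(name=user["name"], offer=strategy["offer"])


# Hypothetical high-risk user and retention strategy.
user = {"name": "Ms. Li", "tag": "heavy_data"}
strategy = {"offer": "an upgrade to the 100 GB plan"}
script = generate_script(user, strategy)
print(script)
```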

Together, the user perception graph, the user churn model, and personalized retention form the complete business process of perceiving, predicting, and retaining high-risk churn users. Voice records and dialogue content from the retention process can be used to further optimize the churn model and the personalized strategies.

04 Conclusion

The application of artificial intelligence technology provides new ideas for business development in various industries. Operators hold a large amount of data in various modalities, which provides fertile ground for applying multimodal fusion technology. In addition, the operator's network and user relationships are naturally graph data, which creates the conditions for applying graph neural network technology.

Operators' business development in the 5G era is under great pressure. How to apply new technologies to improve user satisfaction, reduce costs, increase efficiency, and promote the digital transformation of operators is an important research topic. This paper studies the core technologies, system architecture and functions, and application scenarios of the precise user perception system, and provides theoretical and technical support for operators to apply new technologies such as graph neural networks and multimodal fusion to meet business needs.

About the article
Source: [1] Gao Wei, Wang Yue, Song Chuntao, et al. Research on User Accurate Perception System Based on Multimodal Fusion and Graph Neural Network [J]. Post and Telecommunications Design Technology, 2023, No. 568(06): 30-35.

Origin blog.csdn.net/MacWx/article/details/131703299