NetEase Cloud Music's recommendation algorithms for 600 million users

NetEase Cloud Music is a gathering place for music lovers. Its recommendation system is dedicated to putting AI algorithms into production and personalizing recommendations for every user ("a thousand faces for a thousand people"), so that each listener gets a different music experience.

This article shares how we apply AI algorithms in music recommendation in practice, along with the challenges we met and the solutions we adopted while putting those algorithms into production.

It is organized into two parts:

  • Application of AI algorithms in music recommendation
  • Thoughts on AI in the music scenario

Since its official launch in April 2013, the NetEase Cloud Music platform has continued to provide a music community, UGC (User Generated Content) playlists, and precise recommendation services, and it has incubated sections such as music programs, LOOK Live, and an anchor platform.

Cloud Music currently has 600 million registered users and has held first place in the music category of the app charts continuously.

Application of AI algorithms in music recommendation

In practical music recommendation scenarios, we use AI techniques to distribute songs and playlists. The most typical applications are Daily Recommended Songs and Private FM, which personalize track recommendations according to the scene.

The figure shows the logical architecture of our entire music recommendation system, including the various log streams, ETL, features, recall, ranking, and the final recommendations.

For a recommendation system, the most important thing is understanding the user's profile, that is, integrating front-end data to learn concretely what kind of music each user likes.

As shown in the figure:

  • In the data layer, we mainly use Hive, Hadoop, Flink, SparkSQL, and Mammut.
  • In the machine learning layer, we use SparkML, TensorFlow, Parameter Server, and Caffe.

The figure above shows how our data system is built. It covers both the algorithm systems and the user system, and together they form the environment in which Cloud Music applies its AI algorithms.

Our team is divided into:

  • Data Experience Team
  • Artificial intelligence algorithms team
  • Middle Office team
  • Business-related members

When it comes to applying artificial intelligence to recommendation, music differs from other goods in several ways. For example:

  • We can tell within moments whether we like a piece of clothing, so we can browse a dozen dresses in 10 seconds. Music, however, takes time to experience: we often listen to a song for 10 seconds or longer before realizing it is not to our taste. Music cannot be understood at a glance, so when building recommendation products, the user experience should be oriented toward genuinely appreciating the music itself.
  • A dress is usually consumed only once in a given period, but people can loop through playlists or use single-track repeat, listening to the same music again and again within a short time. Music listening is therefore repeat consumption, and our recommendations should account for this pattern.
  • Because the cost of consuming music is relatively high, we need to focus on the user experience, and a user's consumption is strongly correlated over time. Moreover, listening to 10, 30, or 60 seconds of a song means different things to a user. We therefore need to identify truly meaningful consumption so that effective behavior signals reflect this relevance.
  • How should we judge whether a music recommendation system is good or bad? By how long users stay on the platform? By the number of tracks in their favorites? In fact, we found that some users never click the heart to collect any songs. After talking with one such user, we learned that he simply blocked the songs he did not like whenever he encountered them. Clearly, it is difficult to measure a music recommendation system by a single objective.

Let us look at how the Cloud Music platform applies AI technology:

① The complexity of music

Given the complexity of music described above, how do we understand it? On our platform, different pieces of music come with a wealth of UGC and a wide range of high-quality user comments.

We can therefore take these comments, together with users' understanding of each song, and use a bidirectional LSTM (Bi-LSTM) to generate descriptive statements for the music.

Then, when new music comes in, we can generate a new descriptive interpretation for it from even a small amount of related text.

For example, as shown above, the song "Counter-current of the River" has many related playlist titles and descriptions beneath it.

We extract keywords, restore them into a variety of word tags, and produce descriptions of the song's relevant characteristics.

On this basis, we filter the results against a manually curated vocabulary, automatically generating phrases such as "Network of Chinese Girl" and "Hong Kong folk songs".
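The keyword-extraction and vocabulary-filtering steps can be approximated with plain TF-IDF scoring. This is a simplified stand-in for illustration, not the Bi-LSTM generator described above: it assumes comments are already tokenized, and the function names, the curated vocabulary, and the sample data are hypothetical.

```python
import math
from collections import Counter

def tfidf_keywords(song_comments, all_songs_comments, top_k=5):
    """Rank terms in one song's comments by TF-IDF against the whole catalog.

    song_comments: list of tokenized comments (lists of str) for one song.
    all_songs_comments: one such list per song in the catalog.
    """
    tf = Counter(tok for comment in song_comments for tok in comment)
    n_docs = len(all_songs_comments)
    # Document frequency: how many songs have the term in their comments.
    df = Counter()
    for comments in all_songs_comments:
        df.update({tok for comment in comments for tok in comment})
    scores = {
        tok: count * math.log((1 + n_docs) / (1 + df[tok]))
        for tok, count in tf.items()
    }
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]

def filter_by_vocabulary(keywords, curated_vocab):
    """Keep only keywords present in a manually curated tag vocabulary."""
    return [kw for kw in keywords if kw in curated_vocab]
```

Terms that appear under every song (such as "love" in a toy catalog) get an IDF of zero, so generic praise drops to the bottom and song-specific words surface.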

In this way, with the help of the NLP (natural language processing) system, we can display the final descriptive phrases for each song.

As a result, community users can get a general idea of what type a song belongs to without even opening it to listen.

Second, we can use relatively simple audio and image convolution techniques to understand music.

For example, for some of the more popular songs, we take the descriptions already generated and their existing associations, obtain the associated audio, and identify similarities between songs in loudness, rhythm, and style, to build a "portrait" of the music.

② The value of repeat consumption in music

This is mainly reflected in collaborative filtering (CF) for music recommendation. As shown above, through tracking we found that a user listened to song A 10 times, song B 9 times, and song C only once.

We can then interpret the similarity among A, B, and C as follows: the user prefers songs like A and B more, and the correlation between A and B is also higher.

Accordingly, based on how often users repeat-consume songs, we can place the songs in a space with X, Y, and Z axes, expressing both their positions and the differences in direction between them.
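The A/B/C example can be made concrete with item-based collaborative filtering: treat each song as a vector of per-user play counts and compare songs by cosine similarity. The play-count data below is invented; only user `u1` (A 10 times, B 9 times, C once) mirrors the example in the text, and a production system would typically compress these sparse vectors into dense embeddings like the X/Y/Z coordinates described above.

```python
import math

# Hypothetical play counts: user -> {song: times played}.
plays = {
    "u1": {"A": 10, "B": 9, "C": 1},
    "u2": {"A": 4, "B": 5},
    "u3": {"C": 6},
}

def song_vector(song):
    """Play counts for one song across all users (0 if never played)."""
    return [user_plays.get(song, 0) for user_plays in plays.values()]

def cosine(u, v):
    """Cosine similarity of two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

sim_ab = cosine(song_vector("A"), song_vector("B"))
sim_ac = cosine(song_vector("A"), song_vector("C"))
```

Because repeat plays are kept as counts rather than collapsed to 0/1, songs that the same users loop heavily end up close together.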

Clearly, such similarity calculations significantly improve the efficiency of recommending all kinds of music.

③ Music has a high consumption cost and strong correlation between consecutive plays, which places higher demands on the models used to express user needs

As shown above, our music recommendation models have iterated from linear models, to tree models, to large-scale FTRL, then to deep neural networks, and finally to deep sequential networks.

First, we started with the LR (logistic regression) model, which we chose for its strong interpretability. However, although it is interpretable and iterates quickly, its expressive power is very limited.

Later, we moved to tree models such as RF (random forest) and LGB (LightGBM). Their advantage is that they can handle nonlinearity while remaining interpretable; their drawback is that their fitting ability still needs strengthening.

We then launched FTRL, which suits large-scale feature expression. Its advantages are strong memorization of features, online training that keeps building on what was learned earlier, and the ability to capture associations among all features.

Its disadvantage is the relatively high feature dimensionality: different business needs require more samples, and the computation becomes more complex.
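For reference, the per-coordinate FTRL-Proximal update for logistic regression, as described by McMahan et al. (2013), fits in a few lines. This is a generic textbook sketch over sparse binary features (each feature is a string ID with value 1), not Cloud Music's production implementation; the hyperparameter defaults are placeholders.

```python
import math
from collections import defaultdict

class FTRLProximal:
    """Per-coordinate FTRL-Proximal logistic regression on sparse binary features."""

    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = defaultdict(float)  # accumulated adjusted gradients
        self.n = defaultdict(float)  # accumulated squared gradients

    def _weight(self, i):
        z, n = self.z[i], self.n[i]
        if abs(z) <= self.l1:
            return 0.0  # L1 keeps this coordinate exactly zero (sparsity)
        sign = 1.0 if z > 0 else -1.0
        return -(z - sign * self.l1) / ((self.beta + math.sqrt(n)) / self.alpha + self.l2)

    def predict(self, features):
        """Click/collect probability for a list of distinct active feature IDs."""
        wx = sum(self._weight(i) for i in features)
        wx = max(min(wx, 35.0), -35.0)  # clip to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-wx))

    def update(self, features, label):
        """One online step on a single sample; returns the pre-update prediction."""
        p = self.predict(features)
        g = p - label  # log-loss gradient for w_i when x_i = 1
        for i in features:
            w = self._weight(i)
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w
            self.n[i] += g * g
        return p
```

Only two scalars per feature are stored, and the L1 term leaves rarely useful features at exactly zero, which is what makes FTRL practical at very high feature dimensionality.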

To further improve the model's capability, we adopted deep neural networks, including DNN, DeepFM, and Wide & Deep.

Their advantage is strong expressive power. Their disadvantages are that, because of the complexity of the networks themselves, their interpretability is relatively poor, and they cannot learn the various implicit temporal relationships.

Finally, we adopted deep sequential networks, with models including LSTM, GRU, Transformer, DIN, and DIEN. Their advantage is the ability to learn temporal features, together with strong representation and generalization ability.

Of course, they share the drawbacks of the deep neural networks mentioned above: the networks are more complex and relatively hard to interpret.

Let us first look at LR and tree models. As mentioned earlier, linear and tree models memorize statistical and categorical features well, but their generalization ability is relatively poor.

In the song scenario, we can feed the model the wealth of data generated by user behavior directly related to songs.

Algorithmically, we need to abstract all kinds of music into index tags. However, although we have abundant resources and behavior samples, user action sequences are often nonlinear, so we ran into underfitting as well as feature "time travel" (feature leakage) issues.

We urgently needed online and offline features to be consistent, so that we could make effective use of behavioral data from different contexts and thereby improve the model's fitting ability.
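Feature "time travel" occurs when a training sample's features are computed from events that happened after the sample itself, information the online model never has. A common guard is to build every offline feature strictly from events before the sample's timestamp, which is the same view the online service has at request time. The sketch below assumes a simple in-memory event log; the class name, the `(timestamp, key)` schema, and the example feature (a user's play count for an artist) are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict

class PointInTimeCounter:
    """Counts past events per key, using only events strictly before a timestamp."""

    def __init__(self, events):
        # events: iterable of (timestamp, key), e.g. key = (user_id, artist_id)
        self._times = defaultdict(list)
        for ts, key in events:
            self._times[key].append(ts)
        for ts_list in self._times.values():
            ts_list.sort()

    def count_before(self, key, ts):
        """Number of events for `key` with timestamp < ts (no future leakage)."""
        return bisect_left(self._times.get(key, []), ts)
```

If online serving computes the same count from the log as of the request time, offline and online feature values agree by construction.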

To improve fitting ability, our first attempt was the DNN model. Through stacked layers with ReLU activations, a DNN captures both low-order and high-order feature combinations, but it also makes the overall number of parameters expand.

We then moved to DeepFM, which models low-order and high-order feature combinations simultaneously and can learn relationships between feature combinations of different orders. As shown above, we later also introduced DCN.
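The low-order part of DeepFM is a factorization machine (FM). Its pairwise term, the sum over i < j of <v_i, v_j> x_i x_j, can be computed in O(kn) time with the standard identity 0.5 × sum over f of [(sum_i v_if x_i)^2 − sum_i (v_if x_i)^2]. The sketch below implements both forms so the identity can be checked; the feature values and embeddings used to exercise it are arbitrary illustrative numbers.

```python
from itertools import combinations

def fm_pairwise_naive(x, v):
    """Sum over i<j of <v_i, v_j> * x_i * x_j, in O(k * n^2)."""
    return sum(
        sum(a * b for a, b in zip(v[i], v[j])) * x[i] * x[j]
        for i, j in combinations(range(len(x)), 2)
    )

def fm_pairwise_fast(x, v):
    """Same quantity in O(k * n): square of sum minus sum of squares, per factor."""
    k = len(v[0])
    total = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s2 = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        total += 0.5 * (s * s - s2)
    return total
```

Sharing one embedding per feature between this FM term and the deep part is what lets DeepFM learn low-order and high-order interactions without hand-built cross features.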

DCN explicitly learns high-order feature interactions, which lets us effectively capture highly nonlinear feature crosses.
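In the original DCN formulation, each cross layer computes x_{l+1} = x_0 (x_l^T w_l) + b_l + x_l, so every layer raises the degree of the explicit feature interactions by one while adding only 2d parameters. A minimal pure-Python sketch of that layer and a stack of them follows; the weights in any usage are placeholders.

```python
def cross_layer(x0, xl, w, b):
    """One DCN cross layer: x0 * (xl . w) + b + xl, all vectors of length d."""
    scale = sum(a * c for a, c in zip(xl, w))  # scalar xl^T w
    return [x0_i * scale + b_i + xl_i for x0_i, b_i, xl_i in zip(x0, b, xl)]

def cross_network(x0, weights, biases):
    """Stack cross layers; layer l uses weights[l] and biases[l]."""
    x = list(x0)
    for w, b in zip(weights, biases):
        x = cross_layer(x0, x, w, b)
    return x
```

The residual `+ xl` term means a layer with zero weights passes its input through unchanged, which keeps deep cross stacks easy to train.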

As with the DeepFM model, we can effectively control the growth of the embedding vectors and thus shrink the parameter space.

Earlier, we also mentioned the problem of expressing temporal association. For this, we adopted DIN (Deep Interest Network) for click-through rate prediction.

Among a user's diverse interests, DIN focuses on the historical behaviors that affect the current recommendation. However, DIN cannot capture how a user's musical interests change dynamically.
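DIN's core idea is to weight each historical behavior by its relevance to the candidate song instead of averaging the whole history. The sketch below is a deliberate simplification: real DIN scores relevance with a small MLP "activation unit" over the behavior and candidate embeddings and does not softmax-normalize the weights; here a plain dot product stands in for that MLP, and the function name and embeddings are hypothetical.

```python
def din_user_interest(history, candidate):
    """Candidate-aware weighted sum pooling of behavior embeddings (DIN-style, simplified).

    history: list of embedding vectors for songs the user played.
    candidate: embedding vector of the song being scored.
    """
    dim = len(candidate)
    pooled = [0.0] * dim
    for emb in history:
        # Stand-in for DIN's activation unit: plain dot-product relevance.
        weight = sum(e * c for e, c in zip(emb, candidate))
        for d in range(dim):
            pooled[d] += weight * emb[d]
    return pooled
```

The pooled interest changes with the candidate, but nothing here models the order of the history, which is why this kind of pooling cannot represent an interest that drifts over time.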

For example, a user used to like electronic music and later switched to loving ballads. DIN cannot capture this kind of "evolution".

Building on this, we switched to the Deep Interest Evolution Network (DIEN) model. Its main feature is that it focuses on how user interests evolve, with an interest extractor layer and an interest evolving layer in its design.

Its new network structure and modeling approach let it express changes in user interest, and the temporal process by which they evolve, more precisely.

For finer-grained control of changes in user interest, we also used the DSIN (Deep Session Interest Network) model. DSIN consists mainly of two parts: one handles sparse features, and the other processes the user's behavior sequence.

The model exploits the observation that the items a user browses within the same session are similar, while items browsed in different sessions differ; it then extracts the user's interests over time.
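DSIN first cuts the behavior sequence into sessions; the original DSIN paper splits wherever two consecutive behaviors are more than 30 minutes apart. That splitting step is easy to sketch. The 30-minute default comes from that paper and may not be the gap Cloud Music uses for listening sessions, and the `(timestamp, song_id)` event layout is an assumption.

```python
def split_sessions(events, max_gap_seconds=30 * 60):
    """Split (timestamp, song_id) events into sessions wherever the idle gap is too long."""
    sessions = []
    current = []
    last_ts = None
    for ts, song_id in sorted(events):
        if last_ts is not None and ts - last_ts > max_gap_seconds:
            sessions.append(current)  # gap too long: close the current session
            current = []
        current.append(song_id)
        last_ts = ts
    if current:
        sessions.append(current)
    return sessions
```

Each resulting session can then be encoded on its own, and the sequence of session-level interests fed to the model's temporal layer.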

④ Given the complexity of user needs in music consumption, it is difficult to judge a music recommendation system by a single objective

Although a recommendation system is a typical application of statistics, statistics can solve only about 95% of the problem; the remaining 5% has to do with personal preference.

In practice we often run into problems such as click-through rate (CTR) and consumption duration not rising together, and sometimes even moving in opposite directions. So how do we solve this multi-objective problem?

There are several solutions for multi-objective problems. As shown above, they include sample weighting, loss weighting, and partial network sharing. We adopted multi-objective joint training, which is simple to implement.

In the figure above, the tasks first share a shallow representation at the network layer, while each keeps its own output. Although the objectives differ somewhat during training, because we introduced task-specific network branches, both the collection rate and the consumption duration improved significantly.
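With a shared bottom, joint training usually comes down to summing the per-task losses from the task-specific heads, with the loss-weighting option above applied as per-task coefficients. The sketch below shows only that loss for one sample, assuming a binary "collected" label (log loss) and a play-duration target (squared error on log1p of seconds); the function name, target transform, and weight defaults are hypothetical tuning choices, not values from the text.

```python
import math

def joint_loss(p_collect, y_collect, pred_log_dur, dur_seconds, w_collect=1.0, w_dur=0.5):
    """Weighted sum of two task losses computed from heads on a shared representation."""
    eps = 1e-7
    p = min(max(p_collect, eps), 1 - eps)  # keep log() finite
    collect_loss = -(y_collect * math.log(p) + (1 - y_collect) * math.log(1 - p))
    # Squared error on log1p(duration) damps the influence of very long plays.
    dur_loss = (pred_log_dur - math.log1p(dur_seconds)) ** 2
    return w_collect * collect_loss + w_dur * dur_loss
```

Because gradients from both terms flow into the shared layers, each task acts as a regularizer for the other, which is the effect the advantages listed next describe.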

The advantages of joint training are:

  • Sharing a shallow representation across multiple tasks introduces noise from the other tasks, which reduces overfitting and improves generalization.
  • In multi-task learning, different tasks have local minima in different locations, and their interaction can help the model escape local minima.
  • Joint training across tasks pushes the model as close as possible to an optimal solution shared by all tasks.
  • Through "eavesdropping", signals learned for one task, such as tracking the user's song collections, help the model make better judgments about other operations.

Reviewing the differences between music recommendation and other kinds of recommendation described earlier, we arrived at the following point-by-point solutions:

  • Difference: given the complexity of music itself, how do we understand music resources? Solution: use NLP, audio, and image techniques to understand music better.
  • Difference: music is consumed repeatedly, unlike goods that are consumed only once. Solution: use the characteristics of music consumption to intelligently analyze correlations between songs.
  • Difference: music has a high consumption cost, consecutive plays are clearly correlated, and effective behaviors carry richer meaning. Solution: use sophisticated AI models to explore the relevance within a user's song sequence.
  • Difference: it is difficult to measure a music recommendation system with a single objective. Solution: use MTL (multi-task learning) techniques to address the diversity of user needs.

Thoughts on AI in the music scenario

So why does the music scenario need AI? Clearly, we are no longer in the era of buying a CD and listening to an album from beginning to end.

Our recommendation platform serves hundreds of millions of users. In their different moods, they face more than ten million pieces of music created by more than 100,000 musicians, and they need good music to put them in a good mood.

We can say without exaggeration: "In the Internet era, headphones are the oxygen tube, and music is the oxygen."

Therefore, we need to solve a complex matching problem in a four-dimensional space. This is where artificial intelligence comes in.

With an AI-based recommendation system, we can keep providing strong long-tail exploration and precise matching, improving the user experience while encouraging users to share songs voluntarily and discover more resources on the NetEase Cloud Music platform.

To achieve these goals, we built the system shown in the figure. It includes the following aspects:

  • User mental model system: behavior, cognition, and attitude.
  • User research system: surveys and questionnaires.
  • Case analysis system: analysis of users, user groups, and usage behavior.
  • Evaluation system: collection rate, skip rate, usage duration, and other metrics.
  • Data feedback system: positive and negative feedback, such as collecting or skipping songs.

Building on these qualitative and quantitative user-experience evaluation systems, we combine knowledge graphs, statistical learning, and reinforcement learning to construct the following three-tier model system:

  • Ranking system: ranking models, EE (exploration-exploitation) models, and trend models.
  • Matching system: behavior-based recommendation models and new-content discovery models.
  • Data system: behavioral data, user profiles, and content profiles.
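An EE (exploration-exploitation) model balances showing songs known to perform well against giving long-tail or new songs a chance. UCB1 is one classic way to make that trade-off; the sketch below is a generic illustration, not the model used on the platform, and it assumes each candidate song is a bandit arm with a 0/1 reward such as "collected".

```python
import math

class UCB1:
    """UCB1 bandit: pick the arm with the highest mean reward plus an exploration bonus."""

    def __init__(self, arms):
        self.counts = {arm: 0 for arm in arms}
        self.rewards = {arm: 0.0 for arm in arms}
        self.total = 0

    def select(self):
        # Try every arm once before relying on the confidence bound.
        for arm, n in self.counts.items():
            if n == 0:
                return arm

        def ucb(arm):
            mean = self.rewards[arm] / self.counts[arm]
            bonus = math.sqrt(2 * math.log(self.total) / self.counts[arm])
            return mean + bonus

        return max(self.counts, key=ucb)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.rewards[arm] += reward
        self.total += 1
```

The bonus shrinks as an arm is shown more often, so a new song keeps getting occasional exposure even while a proven hit receives most of the traffic.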

With these systems, we stay up to date with user data and expert knowledge, further improving the relevance of our recommendations and users' acceptance of the platform.

Origin blog.csdn.net/hellozhxy/article/details/104009328