Content Recommendation Algorithm Based on Machine Learning and Its Impact on Psychology and Sociology

Content recommendation algorithms based on machine learning are now widely used in content apps. In shopping, fashion, news, learning, and similar fields, they build user portraits and recommend content according to user preferences. Such algorithms can not only infer user characteristics such as age and gender with reasonable accuracy, but also estimate user preferences through long-term tracking. However, the potential psychological impact of overly precise recommendations on users has attracted growing attention from the scientific community. This article first introduces the basic principles of recommendation algorithms, and then discusses their psychological and sociological impact on users.

1. Introduction to Recommendation Algorithms

A user's browsing or purchasing behavior, taken at a certain granularity, forms a chain of events along the historical timeline. Recommendation scenarios can be roughly divided into two categories according to how users interact with content: simple interaction and complex interaction.

Category A, simple interaction: typically news and short videos. The time a user stays on a piece of content is measured in seconds or minutes. The user mainly browses, with simple responses such as bullet comments and likes. A single user can generate hundreds of browsing events in one day.

Category B, complex interaction: typically shopping and learning. Users focus on one type of content and stay on a single item for a long time, and more complicated transactions occur, such as returns, checkout, and reviews. A user generates only a small number of events in one day.

Although these two types of recommendation algorithms have quite different emphases on data models and training methods, they still have some things in common.

1.1 Content Model

To describe the attributes of a piece of content so that machine learning or simple pattern classification algorithms can process it, the content must be converted into a vector containing its attributes.

A piece of music, for example, may have many attributes. Some are scalars of enumerated type, such as genre, composer, singer, and album. Others are transform-domain vectors obtained by processing the waveform, which often reflect the dynamics, energy bands, and frequency combinations of the entire audio track.

A typical record for one work contains a scalar field of length 16 and a vector field of length 128, forming a feature vector of length 144. In the content model, this vector represents a particular song.

$$\vec{M}=\begin{bmatrix} \vec{M_c} & \vec{M_v} \end{bmatrix}$$

A user's browsing habit is a list of vectors $\vec{M}$, one per item, representing the user's $n$ historical browsing events:

$$\{\vec{M}_0,\vec{M}_1,\vec{M}_2,\dots,\vec{M}_{n-1}\}$$
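
As a minimal sketch of this content model, the snippet below builds the 144-element vector from a 16-element scalar field and a 128-element vector field, then stacks several of them into a browsing history. The function names and the random sample data are assumptions made for illustration; the original does not specify how categories are encoded.

```python
import numpy as np

N_SCALAR = 16   # length of the scalar field M_c, as stated in the text
N_VECTOR = 128  # length of the transform-domain field M_v, as stated in the text

def make_feature_vector(m_c, m_v):
    """Concatenate the scalar field and the vector field into one vector M."""
    m_c = np.asarray(m_c, dtype=np.float64)
    m_v = np.asarray(m_v, dtype=np.float64)
    if m_c.shape != (N_SCALAR,) or m_v.shape != (N_VECTOR,):
        raise ValueError("expected a 16-element M_c and a 128-element M_v")
    return np.concatenate([m_c, m_v])

def make_history(items):
    """Stack n feature vectors into an (n, 144) browsing history."""
    return np.stack([make_feature_vector(m_c, m_v) for m_c, m_v in items])

rng = np.random.default_rng(0)
# Hypothetical data: integer category ids and random spectral features.
items = [(rng.integers(0, 10, N_SCALAR), rng.normal(size=N_VECTOR)) for _ in range(5)]
history = make_history(items)
print(history.shape)  # (5, 144)
```

Storing category ids as plain numbers keeps the sketch short; a real model would more likely use one-hot or learned embeddings for the enumerated fields.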

1.2 Direct prediction without user portrait

For category A, because massive browsing chains are available, a simple and direct prediction algorithm can be used. It takes the K most recent browsing records as input and tries to predict the scalar field of the next item.

$$\{\vec{M}_{t-K},\vec{M}_{t-K+1},\dots,\vec{M}_{t-1}\} \Rightarrow \vec{M_{c,t}}$$

Once the predicted scalars are obtained, the albums, singers, and genres involved in the scalars can be recommended to the user.
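
The original does not say which predictor is used. As one simple stand-in, the sketch below predicts each scalar field of the next item by majority vote over the last K items. The function name and the three-field sample data are hypothetical.

```python
from collections import Counter

def predict_next_scalars(history, k):
    """Predict M_{c,t} from the last k scalar fields by per-field majority vote.

    history: list of equal-length tuples, one tuple of category ids per item viewed.
    """
    if k <= 0 or len(history) < k:
        raise ValueError("need at least k past items")
    window = history[-k:]
    prediction = []
    for field_values in zip(*window):
        # Pick the most frequent value in this field; ties go to the value seen first.
        prediction.append(Counter(field_values).most_common(1)[0][0])
    return tuple(prediction)

# Hypothetical 3-field scalars: (genre_id, singer_id, album_id)
history = [(1, 7, 3), (2, 7, 4), (2, 7, 5), (2, 8, 5)]
print(predict_next_scalars(history, k=3))  # (2, 7, 5)
```

A trained sequence model would replace the vote in practice, but the input and output shapes stay the same: K past vectors in, one scalar field out.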

1.3 Recommendation based on user model

A user model is a mathematical description of a content audience. It covers attributes such as the user's gender and age, as well as numerical preference data. There are many categories of such algorithms, and many open-source models. Interestingly, recommendation based on a user model does not require accurately recovering human-interpretable characteristics such as age and gender. One type of recommendation algorithm, for example, looks more like a generative algorithm that compresses and decompresses information.

This algorithm has two steps: user feature extraction (learning) and feature-based recommendation. The idea is to randomly draw K groups of feature sequences {M} from the user's habits and feed them into the model. They pass through the neural network in area A, which outputs the user portrait P, and area B then generates the content model {M'} from P. Training aims to limit the size of P while making the output content set match the user's historical data set as closely as possible.

[Figure: portrait and recommendation]
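
The compress-then-regenerate idea can be sketched with a linear autoencoder fitted by truncated SVD. This is a deliberately simplified stand-in for the neural networks in areas A and B, not the author's model. It also assumes one flattened 144-element sample per user rather than K sampled groups, and the synthetic data is hypothetical.

```python
import numpy as np

def fit_linear_autoencoder(histories, p_dim):
    """Fit a linear encoder/decoder pair by truncated SVD (equivalent to PCA).

    histories: (num_users, d) array, one flattened browsing sample per user.
    Returns (mean, components), where components has shape (p_dim, d).
    """
    mean = histories.mean(axis=0)
    _, _, vt = np.linalg.svd(histories - mean, full_matrices=False)
    return mean, vt[:p_dim]

def encode(x, mean, components):
    """Area A: compress a sample into the portrait P."""
    return (x - mean) @ components.T

def decode(p, mean, components):
    """Area B: regenerate a content vector M' from the portrait P."""
    return p @ components + mean

rng = np.random.default_rng(0)
# Hypothetical data: 200 users whose 144-dim samples lie near a 4-dim subspace.
latent = rng.normal(size=(200, 4))
mixing = rng.normal(size=(4, 144))
histories = latent @ mixing + 0.01 * rng.normal(size=(200, 144))

mean, comps = fit_linear_autoencoder(histories, p_dim=4)
portraits = encode(histories, mean, comps)
rebuilt = decode(portraits, mean, comps)
error = np.mean((histories - rebuilt) ** 2)
print(portraits.shape, error)
```

Here `p_dim` plays the role of the constraint on the size of P: a smaller portrait compresses more but reconstructs the history less faithfully.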

In this case, although P represents user features, the specific meaning of each element of the vector no longer matters. On a website with a large number of users, it is not necessary to train on the complete set of users. It is enough to collect a small set of categories of the vector P, and then look up recommended content in a table according to the category a new user falls into.
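
A minimal sketch of the table lookup, assuming the categories are represented by centroids in P space and a new user is assigned to the nearest one. The centroids, the two-element portrait, and the content ids are all hypothetical.

```python
import numpy as np

def nearest_category(p, centroids):
    """Return the index of the centroid closest to portrait p (Euclidean distance)."""
    distances = np.linalg.norm(centroids - p, axis=1)
    return int(np.argmin(distances))

# Hypothetical centroids of three portrait categories in a 2-dim P space.
centroids = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
# Hypothetical lookup table: category index -> precomputed content ids.
recommendations = {0: ["news_a", "news_b"], 1: ["jazz_a"], 2: ["rock_a", "rock_b"]}

new_user_p = np.array([4.2, 5.5])
category = nearest_category(new_user_p, centroids)
print(category, recommendations[category])  # 1 ['jazz_a']
```

The centroids could come from any clustering of previously computed portraits; the original does not name a method.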

2. Negative effects of accurate recommendation

Overly precise content recommendation can have unexpected psychological and sociological effects, typically the information cocoon and the fragmentation of groups.

2.1 Information Cocoon

A typical impact is the information cocoon. When a user browses a content website for the first time, the information shown covers a broad and random range of attributes, and the probability distribution of the content displayed on the homepage is smooth and uniform. This is the stage in which the algorithm collects the user's habits.

As the number of views increases, the recommendation algorithm can grasp the user's preferences more and more accurately, so that the content obtained by the user is concentrated on several points of interest, and the algorithm converges.
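
One way to make this convergence measurable, offered here as an illustration rather than something the original proposes, is the Shannon entropy of the distribution of recommended topics. A uniform homepage has maximal entropy, and a converged one has much less. The eight topics and the converged shares are hypothetical.

```python
import math

def shannon_entropy(probabilities):
    """Entropy in bits of a discrete distribution; zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

uniform = [0.125] * 8                       # first visit: 8 topics, equally likely
converged = [0.7, 0.2, 0.1, 0, 0, 0, 0, 0]  # hypothetical shares after many views

print(shannon_entropy(uniform))    # 3.0
print(shannon_entropy(converged))  # lower than 3.0
```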

[Figure: recommended preferences]
Users who do not understand recommendation algorithms are especially affected. They do not realize that seeing the same kind of content every morning when they wake up is the result of an algorithm catering to their tastes. The information these users receive is confined by the algorithm to a narrow set, and they miss potentially important information.

This is not a problem for learning and science websites, but it is a drawback for general-purpose content websites. Suppose a user is under great psychological pressure for a period of time and searches for negative content. The recommendation algorithm may then add fuel to the fire. For users who are prone to depression in particular, it may aggravate their condition.

2.2 Group fragmentation

Algorithms rely on user habits to profile users and push content precisely. Users, in turn, are influenced by the content pushed to them after profiling, which produces a group aggregation effect. At the population level, communities defined by various labels gather inside the information cocoons woven by algorithms, attract individuals with the same characteristics, and deepen the division of the group.
As a result, the whole population is continuously divided and the divisions are reinforced, forming many stable, self-contained classes. These classes cannot think from each other's perspective, because each one lives in a cocoon woven by algorithms. Events that are statistically rare are magnified inside each cocoon, while public problems that need attention cannot spread across cocoons or reach consensus among groups. Over time, the population as a whole becomes differentiated and loses its stability.

3. Response suggestions

From the perspective of algorithms, new inputs should be introduced in areas that touch on psychology and sociology. For example, once a user is found to be prone to depression, the algorithm can push healing-oriented content and increase the diversity of its recommendations.
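
One possible reading of "increasing diversity" is to reserve a share of the recommendation slots for items outside the user's personalized list. The sketch below does that with random sampling. The function name, the `explore_ratio` parameter, and the sample catalog are assumptions for illustration, not part of the original proposal.

```python
import random

def diversify(personalized, catalog, explore_ratio, seed=None):
    """Replace the tail of a personalized list with items sampled from outside it.

    explore_ratio is the fraction of slots reserved for exploratory items.
    """
    if not 0.0 <= explore_ratio <= 1.0:
        raise ValueError("explore_ratio must be in [0, 1]")
    rng = random.Random(seed)
    chosen = set(personalized)
    outside = [item for item in catalog if item not in chosen]
    n_explore = min(int(len(personalized) * explore_ratio), len(outside))
    keep = personalized[: len(personalized) - n_explore]
    return keep + rng.sample(outside, n_explore)

personalized = ["sad_1", "sad_2", "sad_3", "sad_4", "sad_5"]
catalog = personalized + ["science_1", "travel_1", "music_1", "sports_1"]
print(diversify(personalized, catalog, explore_ratio=0.4, seed=0))
```

With `explore_ratio=0.4`, three of the five slots keep their personalized items and two are filled from the rest of the catalog.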

Origin blog.csdn.net/goldenhawking/article/details/131024655