How does Douyin's algorithm trap you in an information cocoon?

(1) Why is Douyin so addictive?

Many people may have had this experience:

In different settings (at home, on the subway, at the office) and at different times of day (morning, noon, evening), the content you receive differs slightly, even within the same type of video.

For example, during the day you may see more humorous content, while at night you may see more suspenseful clips.

And whenever you open Douyin, it draws you in: you lose track of time, and an hour or two can slip by in what feels like an instant.

Douyin seems to understand you well, because almost everything it pushes to you is something you like to watch.

Beyond Douyin's product design, immersive viewing experience, and short, fast-paced content, this also comes down to its recommendation algorithm and its operational strategy.

Personalized content distribution can essentially be summed up in a single sentence:

Show users who like watching pretty girls content that features pretty girls.
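That one-sentence idea can be sketched as tag matching. You score each piece of content by how much the user likes the tags attached to it, then rank. This is a minimal illustration, not Douyin's actual method. The tag names, the weights, and the functions `match_score` and `rank` are all hypothetical.

```python
from typing import Dict, List, Set

def match_score(user_interests: Dict[str, float], content_tags: Set[str]) -> float:
    """Sum the user's interest weight for every tag on a piece of content."""
    return sum(user_interests.get(tag, 0.0) for tag in content_tags)

def rank(user_interests: Dict[str, float], catalog: Dict[str, Set[str]]) -> List[str]:
    """Return content ids ordered from best to worst match (hypothetical helper)."""
    return sorted(
        catalog,
        key=lambda content_id: match_score(user_interests, catalog[content_id]),
        reverse=True,
    )

# Hypothetical user who mostly watches dance videos and sometimes comedy.
user = {"dance": 0.9, "comedy": 0.4}
catalog = {
    "v1": {"cooking"},
    "v2": {"dance", "music"},
    "v3": {"comedy"},
}
print(rank(user, catalog))  # ['v2', 'v3', 'v1']
```

The ranking step itself is trivial. The hard part is getting reliable tags on both the content side and the user side.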

In practice, though, few companies on the Internet actually do this well.

So where does the difficulty lie?

(2) Labeling content is not as simple as it sounds

Defining labels is hard, and so is applying them to content.

Before labeling a piece of content, the first thing you need to do is to define the label.

That is, you must state clearly what counts as an apple and what counts as a pear, so that an apple is never called a pear.

A piece of content is usually described at several levels: a primary category, a secondary category, a tertiary category, and tags.

For example: Anime > Japanese anime > Naruto (the series) > Naruto (the character).

Categories and labels like these are widely agreed upon, so they are usually fairly easy to define.

But labels such as "funny" and "beautiful" vary from person to person.

Everyone finds different things funny, and everyone has different aesthetic standards.

What kind of content counts as funny, and how beautiful does something have to be to count as beautiful?

To each their own. Before labeling has even started, you are already stuck on the definitions.

Two concepts are involved here: entity tags and semantic tags.

1. Entity tags

Guangzhou is Guangzhou, Shanghai is Shanghai; Ma Yun is Ma Yun, Taobao is Taobao.

They are all well-defined entities and rarely cause ambiguity between different people.

2. Semantic tags

Words such as "shadiao" (silly, goofy), "beauties," and "qipa" (oddballs) do not refer to any definite object.

People perceive them differently, so the difficulty of labeling usually lies in defining semantic tags.

How well a company recommends content by semantic tags is a touchstone of its NLP (natural language processing) capability.
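Annotator agreement offers one way to make the entity/semantic distinction concrete. You look at how often human annotators agree that a tag applies. This is an illustrative assumption and not a method the article describes. Entity tags like "Shanghai" tend to get unanimous votes, while semantic tags like "funny" split. The 0.8 threshold and the function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical threshold: below this agreement a tag is treated as ambiguous.
AMBIGUITY_THRESHOLD = 0.8

def agreement(votes: List[bool]) -> float:
    """Fraction of annotators who said the tag applies to the content."""
    return sum(votes) / len(votes) if votes else 0.0

def review_tags(votes_by_tag: Dict[str, List[bool]]) -> Dict[str, str]:
    """Mark each tag 'clear' or 'ambiguous' according to annotator agreement."""
    result = {}
    for tag, votes in votes_by_tag.items():
        result[tag] = "clear" if agreement(votes) >= AMBIGUITY_THRESHOLD else "ambiguous"
    return result

votes = {
    "Shanghai": [True, True, True, True, True],  # entity tag: everyone agrees
    "funny": [True, False, True, False, True],   # semantic tag: opinions split
}
print(review_tags(votes))  # {'Shanghai': 'clear', 'funny': 'ambiguous'}
```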

Companies set different requirements for label granularity depending on their business capabilities and needs.

For example, some companies stop splitting at "Naruto" (the series) and treat it directly as their finest-grained label.
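A label hierarchy can be stored as a path from the broadest category to the finest tag. Under that representation, each company's chosen granularity is simply how deep it cuts the path. This is a minimal sketch. The depths assigned to "company A" and "company B" are hypothetical.

```python
from typing import Tuple

# A label path runs from the broadest category to the finest-grained tag.
LabelPath = Tuple[str, ...]

def truncate(path: LabelPath, max_depth: int) -> LabelPath:
    """Keep only the first max_depth levels of a label path."""
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    return path[:max_depth]

def finest_label(path: LabelPath, max_depth: int) -> str:
    """The smallest-granularity label a company uses at this depth."""
    return truncate(path, max_depth)[-1]

naruto_clip: LabelPath = ("Anime", "Japanese anime", "Naruto (series)", "Naruto (character)")

# Hypothetical company A labels down to individual characters.
print(finest_label(naruto_clip, 4))  # Naruto (character)
# Hypothetical company B stops at the series and treats it as its finest label.
print(finest_label(naruto_clip, 3))  # Naruto (series)
```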

(3) User tags: probably the most difficult part

1. User tastes are as hard to pin down as a difficult girlfriend

User tags are even harder than content tags.

Naruto is Naruto: once content is tagged as Naruto, it will not turn into One Piece.

Content can still be labeled through a combination of manual labeling and model training.

Users are different. A user may like watching Naruto this month, and the recommendation algorithm matches him with related content. But next month he may start watching One Piece on a friend's or colleague's recommendation.

If the algorithm has not caught up and keeps pushing Naruto content, that content is now useless to him, which hurts the efficiency of content distribution.

Put another way, a carrot is always a carrot, but users' tastes keep changing.

They want soup today and meat tomorrow.

This touches on the problem of "recommendation narrowing": the weaker the algorithm, the more likely its recommendations are to narrow.

If you happen to click on a few articles, the algorithm assumes you like that type of content and keeps pushing related items from then on, without adapting as your needs change.
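A deliberately naive profile makes the narrowing effect easy to reproduce. The profile counts every click forever, always shows the most-clicked topic, and never learns from what the user ignores. The topics and the `simulate` helper below are hypothetical. The point is only that such a loop cannot escape its first guess.

```python
from collections import Counter
from typing import Callable, List

def simulate(initial_clicks: List[str], rounds: int,
             clicks_on: Callable[[str], bool]) -> List[str]:
    """Naive recommender: no time decay, no penalty for ignored items."""
    profile = Counter(initial_clicks)
    shown = []
    for _ in range(rounds):
        topic = profile.most_common(1)[0][0]  # always the current top topic
        shown.append(topic)
        if clicks_on(topic):
            profile[topic] += 1               # only clicks ever change the profile
    return shown

# A few accidental "cars" clicks, but the user now only wants "travel".
history = ["cars", "cars", "cars", "travel"]
print(simulate(history, rounds=5, clicks_on=lambda t: t == "travel"))
# ['cars', 'cars', 'cars', 'cars', 'cars']
```

Ignored impressions carry no signal, and old clicks never fade. As a result, "travel" is never shown, even though it is the only topic the user still clicks.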

Today the "information cocoon" effect inevitably appears in every content product, but the actual experience still differs vastly between products built on mature NLP technology and those built by newcomers.

2. Collect basic user information

Before building user tags, you need basic information about the user, usually including gender, age, location, and interests.

1) Gender helps distribute content with an obvious gender skew, such as pushing sports to men and beauty and skincare to women;

2) Age works the same way: push anime, games, and similar content to young people, and health and wellness information to older people;

3) Location is used to push regional trending information: for example, Shanghai breaking news goes to users in Shanghai, whereas Beijing's driving restrictions matter little to users in Guangzhou.

These three attributes can usually be obtained from information users fill in themselves and from location permissions they grant, and they rarely change.
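The three basic attributes can be turned into simple rule-based boosts. This sketch only encodes the article's own examples. The multipliers, the age cut-offs, and the `UserProfile`/`Item` types are hypothetical and hard-coded purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class UserProfile:
    gender: Optional[str] = None  # self-reported
    age: Optional[int] = None     # self-reported
    city: Optional[str] = None    # from a granted location permission

@dataclass
class Item:
    category: str
    city: Optional[str] = None    # set only for region-specific news

# Hypothetical boosts that mirror the article's examples.
GENDER_BOOST = {("male", "sports"): 1.5, ("female", "beauty"): 1.5}
YOUNG_CATEGORIES = {"anime", "games"}
SENIOR_CATEGORIES = {"health"}

def demographic_boost(user: UserProfile, item: Item) -> float:
    """Multiply together the boosts from the gender, age, and location rules."""
    boost = 1.0
    if user.gender is not None:
        boost *= GENDER_BOOST.get((user.gender, item.category), 1.0)
    if user.age is not None:
        if user.age < 30 and item.category in YOUNG_CATEGORIES:
            boost *= 1.5
        elif user.age >= 60 and item.category in SENIOR_CATEGORIES:
            boost *= 1.5
    if item.city is not None:
        # Local news: promote it in its own city, damp it everywhere else.
        boost *= 2.0 if item.city == user.city else 0.5
    return boost

shanghai_user = UserProfile(gender="male", age=25, city="Shanghai")
print(demographic_boost(shanghai_user, Item("news", city="Shanghai")))  # 2.0
print(demographic_boost(shanghai_user, Item("news", city="Beijing")))   # 0.5
print(demographic_boost(shanghai_user, Item("anime")))                  # 1.5
```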

3. Determine user interests

As mentioned above, interests are the hard part of building user tags.

Interests are inferred by matching tags from the content the user has consumed. The following techniques are commonly used to pin them down:

1) Filter noise: if a user clicks on clickbait but leaves almost immediately, the short dwell time shows that the user is not interested in the tags attached to that content, so such clicks are filtered out;

2) Down-weight trending topics: for social hot topics and breaking news (such as a celebrity cheating scandal), a user who browses related stories for a short period is not necessarily especially interested in "entertainment" content, so the weight of that user's "entertainment" interest should be lowered;

3) Time decay: as mentioned above, users' interests shift, so the recommendation strategy should weight recent behavior more heavily;

4) Impression penalty: if an article shown to the user is not clicked, the weights of its related features (such as its category and tags) are reduced.
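The four techniques above can be combined into a single pass over a user's impression log. This is a minimal sketch under stated assumptions. The 5-second bounce threshold, the 0.3 trending weight, the 7-day half-life, the 0.2 penalty, and the `Event` fields are all hypothetical. Exponential decay is just one common choice for time decay.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

MIN_DWELL_SECONDS = 5.0  # hypothetical: shorter views count as a clickbait bounce
TRENDING_WEIGHT = 0.3    # hypothetical down-weight for hot-topic clicks
HALF_LIFE_DAYS = 7.0     # hypothetical: a signal loses half its weight per week
SKIP_PENALTY = 0.2       # hypothetical penalty for shown-but-not-clicked

@dataclass
class Event:
    tags: Set[str]
    day: float                  # when the impression happened
    clicked: bool
    dwell_seconds: float = 0.0
    trending: bool = False

def interest_profile(events: List[Event], now: float) -> Dict[str, float]:
    """Accumulate decayed, filtered interest signals per tag."""
    profile: Dict[str, float] = {}
    for e in events:
        decay = 0.5 ** ((now - e.day) / HALF_LIFE_DAYS)  # 3) time decay
        if e.clicked:
            if e.dwell_seconds < MIN_DWELL_SECONDS:
                continue                                  # 1) ignore clickbait bounces
            signal = 1.0
            if e.trending:
                signal *= TRENDING_WEIGHT                 # 2) down-weight hot topics
        else:
            signal = -SKIP_PENALTY                        # 4) impression penalty
        for tag in e.tags:
            profile[tag] = profile.get(tag, 0.0) + signal * decay
    return profile

events = [
    Event({"Naruto"}, day=0, clicked=True, dwell_seconds=120),     # old interest
    Event({"One Piece"}, day=14, clicked=True, dwell_seconds=90),  # new interest
    Event({"entertainment"}, day=14, clicked=True, dwell_seconds=60, trending=True),
    Event({"clickbait"}, day=14, clicked=True, dwell_seconds=2),   # quick bounce
    Event({"cooking"}, day=14, clicked=False),                     # shown, ignored
]
print(interest_profile(events, now=14))
# {'Naruto': 0.25, 'One Piece': 1.0, 'entertainment': 0.3, 'cooking': -0.2}
```

Two weeks after the last Naruto click, the newer One Piece interest outweighs it. The bounced clickbait click leaves no trace.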

4. Recommendation weights for different content types

General-purpose platforms usually carry more than one type of content. Toutiao, for example, currently includes image-and-text articles, mini-videos, short videos, Q&A, and Wei Toutiao posts.

Even for the same tag, such as "beauty," should different content types carry the same recommendation weight? This is another question a recommendation algorithm has to address.
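A lookup keyed by (tag, content type) that scales the tag score is one simple way to express "same tag, different weight per content type." The table values and type names below are hypothetical. The article leaves open what the right weights are.

```python
from typing import Dict, List, Tuple

# Hypothetical multipliers keyed by (tag, content type); unlisted pairs default to 1.0.
TAG_TYPE_WEIGHTS: Dict[Tuple[str, str], float] = {
    ("beauty", "short_video"): 1.3,
    ("beauty", "article"): 1.0,
    ("beauty", "micro_post"): 0.9,
    ("beauty", "qa"): 0.7,
}

def typed_score(tag: str, tag_score: float, content_type: str) -> float:
    """Scale a user's tag score by how well that tag works in this content type."""
    return tag_score * TAG_TYPE_WEIGHTS.get((tag, content_type), 1.0)

def rank_types(tag: str, tag_score: float, content_types: List[str]) -> List[str]:
    """Order content types for one tag, best first."""
    return sorted(content_types,
                  key=lambda ct: typed_score(tag, tag_score, ct),
                  reverse=True)

print(rank_types("beauty", 0.5, ["qa", "article", "short_video", "micro_post"]))
# ['short_video', 'article', 'micro_post', 'qa']
```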


Source: blog.csdn.net/liuliangpuzi/article/details/113566462