1000 things to do in Beijing after graduation, No. 22: understand recommendation algorithms

Many of the apps on our phones today, such as Toutiao and Douyin, are built around recommendation algorithms. The fundamental goal of a recommendation algorithm is personalization, "a thousand faces for a thousand people": everyone sees different content when they open the app, because the system recommends content that matches the tags it has attached to you.

When it comes to recommendation algorithms, we shouldn't stay at the level of casual onlookers. One way to deepen your understanding is to look at how Twitter engineers describe theirs. Gary Lam, a software engineer at Twitter, gave a talk on personalized feeds at QCon London. He presented Twitter's personalization and recommendation algorithms and explained how they operate at scale, given Twitter's large data volume and its bimodal user base (a few accounts with millions of followers, many with only a handful).

Personalized fanout means sending notifications only to interested users. For example, when Musk publishes a tweet about electric vehicles, not all of his followers receive a notification, only those who are interested in electric vehicles.

Lam explained that the personalization algorithm works by tracking two metrics. The first is recently engaged objects: the likes, replies, and other interactions a user has had with specific objects such as topics or other users. Lam stressed that this data needs to be kept up to date, since users tend to be interested only in what they have interacted with most recently. The second is the focus object. Although a user may follow hundreds of other users, only a few of them are focus objects, whose content the user finds most interesting.

The algorithm first extracts the individual objects from a tweet. Then, for each follower of the author, it checks whether the follower has recently interacted with any of those objects, and whether the tweet comes from one of the follower's focus objects. If both conditions are met, the follower is inferred to be interested in the tweet and receives the notification.
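The two checks described above can be sketched in a few lines of Python. This is only a minimal illustration, not Twitter's implementation: the `Follower` class, its field names, and the keyword-matching `extract_objects` helper are all hypothetical stand-ins for whatever the real system uses.

```python
from dataclasses import dataclass, field


@dataclass
class Follower:
    """Hypothetical per-follower state; the field names are illustrative."""
    name: str
    # Topics or users this follower has liked, replied to, etc. recently.
    recently_engaged: set = field(default_factory=set)
    # The few followed accounts this follower cares about most.
    focus_authors: set = field(default_factory=set)


def extract_objects(text, known_topics):
    """Toy object extraction: keep the known topics that appear in the text."""
    lowered = text.lower()
    return {topic for topic in known_topics if topic in lowered}


def should_notify(follower, author, tweet_objects):
    """Both conditions must hold: a recent interaction and a focus author."""
    engaged_recently = bool(tweet_objects & follower.recently_engaged)
    from_focus_author = author in follower.focus_authors
    return engaged_recently and from_focus_author


def personalized_fanout(author, text, followers, known_topics):
    """Return the names of the followers who should get a notification."""
    tweet_objects = extract_objects(text, known_topics)
    return [f.name for f in followers if should_notify(f, author, tweet_objects)]


followers = [
    Follower("alice", {"electric vehicles"}, {"musk"}),
    Follower("bob", {"football"}, {"musk"}),
    Follower("carol", {"electric vehicles"}, {"someone_else"}),
]
topics = {"electric vehicles", "football"}
notified = personalized_fanout(
    "musk", "New electric vehicles are shipping", followers, topics
)
print(notified)
```

In this example only alice satisfies both conditions: bob has not engaged with the topic recently, and the author is not one of carol's focus objects.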

Lam explained that the main problem with the personalization algorithm is asymmetry. Some users are followed by millions, so every time they post a tweet the algorithm has to run the computation for each of those followers, while other users have only a few followers. Lam described how they use data co-location to solve this problem. Users are first partitioned into shards, and each user's recent interactions and focus objects are stored on the same shard as the user. This means that no network access is needed while the algorithm runs, which greatly reduces latency. Lam pointed out that because recent interaction data is time-sensitive, it does not need to be retained for long and can therefore be kept in memory.

Because the data is sharded, rebuilding it is optimized in advance so that users still receive their notifications. This is done by replaying all of the previous day's tweets from a queue, slimming down the messages and removing redundant data before passing them to the shards. This process is called the "slim firehose". Meanwhile, the focus objects are computed offline by machine learning algorithms from the historical interactions between users. Since these are precomputed, they can be copied at startup to the disk of the machine hosting the shard and lazily loaded when needed.
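To make the co-location idea concrete, here is a rough sketch under stated assumptions. The `Shard` class, the CRC32-based `shard_for` routing, and the one-hour expiry are all hypothetical choices made for illustration; the talk summary does not say how Twitter assigns users to shards or how long interactions are kept. The point is only that each shard holds both pieces of state for its own users, so a tweet can be evaluated locally.

```python
import zlib


class Shard:
    """One shard: both pieces of per-user state live together, so the
    interest check for a local user needs no network call."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self.recent = {}  # user -> {object: last interaction time}, memory only
        self.focus = {}   # user -> set of focus authors, precomputed offline

    def record_engagement(self, user, obj, now):
        self.recent.setdefault(user, {})[obj] = now

    def load_focus(self, user, authors):
        # Stand-in for lazily loading the precomputed data from local disk.
        self.focus[user] = set(authors)

    def local_fanout(self, author, objects, now):
        """Evaluate one tweet against every user stored on this shard."""
        notified = []
        for user, authors in self.focus.items():
            if author not in authors:
                continue
            seen = self.recent.get(user, {})
            # Only interactions newer than the TTL count as "recent".
            if any(now - seen[o] <= self.ttl_seconds for o in objects if o in seen):
                notified.append(user)
        return notified


def shard_for(user, num_shards):
    """Hypothetical routing: a stable hash of the user name picks the shard."""
    return zlib.crc32(user.encode("utf-8")) % num_shards


shards = [Shard(ttl_seconds=3600) for _ in range(4)]
for user, engaged_at in [("alice", 4000), ("bob", 0)]:
    shard = shards[shard_for(user, len(shards))]
    shard.load_focus(user, {"musk"})
    shard.record_engagement(user, "electric vehicles", now=engaged_at)

# The tweet is sent to each shard once; each shard checks its own users.
notified = sorted(
    user
    for shard in shards
    for user in shard.local_fanout("musk", {"electric vehicles"}, now=5000)
)
print(notified)
```

Here bob's interaction is older than the assumed one-hour window, so only alice is notified. A real system would also evict expired entries from memory rather than just ignoring them at read time.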

At large companies the recommendation algorithm is updated constantly, so we cannot work out the exact rules from the outside, but the basic principles stay the same. Content from the people you pay the most attention to, and content related to what you have interacted with recently, tends to be recommended first. Across the Internet, controversial topics, highly active accounts, and emotionally charged copy tend to spread more easily.

Origin blog.csdn.net/u010261924/article/details/131016187