Interpretation of Twitter Recommendation Algorithm

Twitter recently open-sourced its most valuable asset: the recommendation algorithm!

Every day, people post more than 500 million tweets on Twitter, and Twitter serves more than 150 billion tweets to users. Out of all of these, the recommendation algorithm surfaces only a small number of relevant, engaging tweets for each user. As on other UGC platforms such as Douyin, a good recommendation algorithm is key to Twitter's success. This article walks through how Twitter recommends content.

Recommendation Algorithm Composition

The recommendation algorithm is not a single model; it is a collection of models, features, and services. The following diagram shows how these components work together:

The main components of Twitter's recommendation algorithm

All components together attempt to answer two important questions:

  • How likely are you to interact with other users in the future?
  • What communities are on Twitter and what are the top tweets in them?

This is what a community looks like...

An illustration of Twitter communities

Twitter currently has 145,000 communities, some with millions of members. The communities are updated every three weeks.

Demystifying the Recommendation Algorithm

The recommendation algorithm consists of three stages, which are connected in series by a pipeline:

  1. Candidate Tweet Collection
  2. Tweet ranking
  3. Tweet filtering
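
The three stages above can be sketched as a simple function pipeline. This is only an illustration of the control flow: the `Tweet` type, `run_pipeline`, and the three callables are hypothetical names, not Twitter's code, and the cap of 1500 candidates is the figure quoted in this article.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Tweet:
    # Hypothetical minimal tweet record used only for this sketch.
    tweet_id: int
    author_id: int
    text: str


def run_pipeline(
    all_tweets: Iterable[Tweet],
    is_candidate: Callable[[Tweet], bool],
    score: Callable[[Tweet], float],
    keep: Callable[[Tweet], bool],
    max_candidates: int = 1500,
) -> List[Tweet]:
    # Stage 1: candidate collection, capped at max_candidates.
    candidates = [t for t in all_tweets if is_candidate(t)][:max_candidates]
    # Stage 2: ranking, highest score first.
    ranked = sorted(candidates, key=score, reverse=True)
    # Stage 3: filtering, preserving the ranked order.
    return [t for t in ranked if keep(t)]
```

Each stage only sees the output of the previous one, which is what "connected in series" means here.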

1. Candidate Tweet Collection

First, the best 1500 candidate tweets relevant to the user are extracted from hundreds of millions of tweets.

Candidate tweets come from two main sources: people you follow and people you don't follow. Tweets come from both sources in a 50-50 split.
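
As a rough sketch of the 50-50 split, the function below takes already-ordered candidate IDs from each source and fills half of the pool from each. The function name, parameters, and the choice not to backfill when one source runs short are assumptions made for illustration.

```python
from typing import List, Sequence


def mix_candidates(
    in_network: Sequence[int],
    out_of_network: Sequence[int],
    total: int = 1500,
    in_network_share: float = 0.5,
) -> List[int]:
    # Number of slots reserved for tweets from followed accounts.
    n_in = min(len(in_network), int(total * in_network_share))
    # Remaining slots go to tweets from accounts the user does not follow.
    n_out = min(len(out_of_network), total - n_in)
    return list(in_network[:n_in]) + list(out_of_network[:n_out])
```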

Candidate sourcing relies on three technologies: Real Graph, a model that predicts how likely two users are to engage with each other; GraphJet, a graph processing engine; and SimClusters, an embedding technique built on a custom matrix factorization algorithm.

Briefly, the components in the Candidate Tweet Collection System attempt to answer these questions:

  • How likely is engagement between two users?
  • How can we tell if a tweet is relevant to you if you don't follow the author?
  • What Tweets have people I follow interacted with recently?
  • Who likes tweets similar to mine and what else have they liked lately?
  • Which Tweets and users are similar to my interests?
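
To make one of these questions concrete, here is a minimal sketch of "What Tweets have people I follow interacted with recently?" as a counting problem over an engagement log. It is not how GraphJet works internally; the data shapes and the function name are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def tweets_engaged_by_follows(
    user: str,
    follows: Dict[str, Set[str]],
    engagements: Iterable[Tuple[str, int]],
    top_k: int = 3,
) -> List[Tuple[int, int]]:
    # Accounts the given user follows.
    followed = follows.get(user, set())
    # Count engagements per tweet, considering only followed accounts.
    counts = Counter(
        tweet_id for engager, tweet_id in engagements if engager in followed
    )
    # Return (tweet_id, engagement_count) pairs, most engaged first.
    return counts.most_common(top_k)
```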

2. Tweet ranking

This step uses a neural network called Heavy Ranker to score the relevance of each candidate tweet. The network has about 48M parameters and considers thousands of features when scoring each tweet.
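
One simple way to turn model outputs into a single relevance score is a weighted sum. The sketch below assumes the network outputs one probability per engagement type, which this article does not spell out; the engagement names and weights are invented for illustration and are not Twitter's values.

```python
from typing import Dict

# Hypothetical weights, chosen only for illustration.
EXAMPLE_WEIGHTS: Dict[str, float] = {"like": 0.5, "reply": 2.0, "retweet": 1.0}


def combine_score(
    predicted: Dict[str, float],
    weights: Dict[str, float] = EXAMPLE_WEIGHTS,
) -> float:
    # Weighted sum of predicted engagement probabilities;
    # engagement types missing from the prediction count as 0.
    return sum(w * predicted.get(name, 0.0) for name, w in weights.items())
```

Candidates would then be sorted by this score in descending order before filtering.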

The main feature groups fed into the Heavy Ranker model are described below.

Aggregate features

Aggregate features make up most of the model's features. They are generated by maintaining rolling aggregates of feature values over specific time windows. Twitter computes both long-term (50-day) and short-term ("real-time": up to 3 days, typically 30-minute) aggregations.
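
A windowed aggregate can be sketched as a sum over events whose timestamps fall inside a trailing window. The function and constant names below are assumptions; a production system would update these aggregates incrementally rather than rescanning the events.

```python
from typing import Iterable, Tuple

THIRTY_MINUTES = 30 * 60           # short-term window, in seconds
FIFTY_DAYS = 50 * 24 * 60 * 60     # long-term window, in seconds


def windowed_sum(
    events: Iterable[Tuple[float, float]],
    now: float,
    window_seconds: float,
) -> float:
    # Sum the values of events with a timestamp in (now - window, now].
    return sum(
        value for ts, value in events if now - window_seconds < ts <= now
    )
```

For example, calling it twice on the same stream of (timestamp, value) engagement events, once with `THIRTY_MINUTES` and once with `FIFTY_DAYS`, yields a short-term and a long-term feature for the same entity.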

The aggregate feature groups are:

  • author_aggregate
  • author_topic_aggregate
  • list_aggregate
  • user_aggregate
  • user_author_aggregate
  • user_engager_aggregate
  • user_inferred_topic_aggregate
  • user_media_annotation_aggregate
  • user_mention_aggregate
  • user_request_context_aggregate
  • user_topic_aggregate
  • topic_aggregate
  • tweet_aggregate

Non-aggregate features

Twitter also has a number of independent features for capturing information about the user, tweet, author, and tweet context.

  • two_hop
  • realgraph
  • authors.realgraph
  • recap.tweetfeature, recap.searchfeature, etc.
  • tweetsource
  • in_reply_to_tweet
  • timelines.earlybird
  • realtime_interaction_graph
  • user_tweet.recommendations
  • other

Embedding features

TwHIN is a large graph embedding trained on Twitter data. The ranker uses three 200-dimensional embeddings from the TwHIN algorithm.

  • Twhin Follow Embeddings
  • Twhin Engagement Embeddings
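
This article does not say how the ranker consumes these embeddings. As general background, a common way to compare two embedding vectors of the same dimension (200 here) is cosine similarity, sketched below in plain Python.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat an all-zero vector as having no similarity to anything.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Vectors pointing in the same direction score close to 1, and orthogonal vectors score 0.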

⚠ Note that due to user settings or other constraints, not all features are available for every request, so the "For You" ranking may vary somewhat depending on which features are present.

3. Tweet filtering

This stage creates a balanced and diverse feed by filtering the ranked candidates on various factors, such as:

  • Frozen accounts
  • Duplicate tweets
  • Author diversity (avoiding too many tweets in a row from the same author)
  • Edited tweets
  • NSFW content, and more
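
A minimal sketch of a few such filters applied to an already-ranked list: excluded authors, duplicate tweet IDs, and a cap on consecutive tweets from one author. The function name, the `(tweet_id, author)` tuple shape, and the cap of 2 are assumptions for illustration only.

```python
from typing import Iterable, List, Optional, Set, Tuple


def filter_feed(
    ranked: Iterable[Tuple[int, str]],
    excluded_authors: Set[str],
    max_consecutive_per_author: int = 2,
) -> List[Tuple[int, str]]:
    seen_ids: Set[int] = set()
    feed: List[Tuple[int, str]] = []
    run_author: Optional[str] = None  # author of the current run in the feed
    run_length = 0                    # how many tweets in a row from run_author
    for tweet_id, author in ranked:
        # Drop tweets from excluded accounts and repeated tweets.
        if author in excluded_authors or tweet_id in seen_ids:
            continue
        # Enforce author diversity: limit consecutive tweets per author.
        new_run = run_length + 1 if author == run_author else 1
        if new_run > max_consecutive_per_author:
            continue
        seen_ids.add(tweet_id)
        feed.append((tweet_id, author))
        run_author, run_length = author, new_run
    return feed
```

Because the input is already sorted by score, each filter only removes items and never reorders them.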

Serving tweets with Home Mixer

After the above three stages are completed, the selected tweets can be pushed to the user.

Twitter has a service called Home Mixer that builds the For You timeline. Home Mixer is written mainly in Scala and connects all the recommendation stages together. It is also responsible for mixing tweets with non-tweet content, such as ads, follow recommendations, and onboarding prompts.

The entire pipeline we discussed above runs about 5 billion times per day, with an average completion time of less than 1.5 seconds.
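
To put these figures in perspective, the short calculation below converts them to a per-second rate and, using Little's law (requests in flight = arrival rate × time in system), to an upper bound on the average number of concurrent pipeline runs. It assumes load is spread evenly over the day, which real traffic is not.

```python
RUNS_PER_DAY = 5_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60
MAX_AVG_LATENCY_SECONDS = 1.5  # "less than 1.5 seconds", so an upper bound

# Average number of pipeline executions started per second.
runs_per_second = RUNS_PER_DAY / SECONDS_PER_DAY

# Little's law: average executions in flight = rate * average latency.
max_in_flight = runs_per_second * MAX_AVG_LATENCY_SECONDS

print(f"{runs_per_second:,.0f} runs/s, at most {max_in_flight:,.0f} in flight")
```

That works out to roughly 58,000 executions per second and, on average, at most about 87,000 running at any moment.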

Summary

Although this article does not go into the technical details of the algorithm, Twitter has open-sourced all of the code on GitHub. In later articles, I will explore the implementation details module by module. It is great that Twitter is willing to open-source its most valuable core algorithm. As Elon Musk said, he really is trying to free the blue bird and make it more transparent to users.


Origin blog.csdn.net/jarodyv/article/details/130372050