TensorFlow Recommenders is now open source, taking recommender systems to the next level!

By Maciej Kula and James Chen, Google Brain

Recommender systems are a major application of machine learning. They surface relevant content based on user preferences, whether that means recommending movies or restaurants, coordinating fashion accessories, or filtering a media feed.

Over the past few years, Google has been exploring new deep learning techniques to provide better recommendations, combining multi-task learning, reinforcement learning, better user representations, and fairness indicators. These efforts, along with other advances, have greatly improved our recommendations.

Today, we are proud to introduce TensorFlow Recommenders (TFRS), an open source TensorFlow package that simplifies building, evaluating, and serving sophisticated recommender models.

  • TensorFlow Recommenders (TFRS)
    https://tensorflow.google.cn/recommenders

Built with TensorFlow 2.x, TFRS helps you:

  • Build and evaluate flexible candidate nomination models;

  • Freely incorporate item, user, and contextual information into recommendation models;

  • Train multi-task models that jointly optimize multiple recommendation objectives;

  • Efficiently serve the resulting models using TensorFlow Serving.

TFRS is based on TensorFlow 2.x and Keras, which makes it easy to pick up. It is modular by design (you can customize individual layers and metrics), yet it still forms a cohesive whole (the individual components work well together). Throughout the design of TFRS, we have emphasized flexibility and ease of use: default settings should be sensible, common tasks should be intuitive and straightforward to implement, and more complex or custom recommendation tasks should be possible.

TensorFlow Recommenders is now open source on GitHub. Our goal is to make it an evolving platform, flexible enough for academic research and highly scalable for building web-scale recommender systems. We also plan to expand its capabilities for multi-task learning, feature cross modeling, self-supervised learning, and state-of-the-art (SOTA) approximate nearest neighbor computation.

  • GitHub
    https://github.com/tensorflow/recommenders

Example: Building a movie recommendation tool

Let's use a simple example to show how to use TensorFlow Recommenders. First, install TFRS using pip:

!pip install tensorflow_recommenders

Then, we can use the MovieLens dataset to train a simple movie recommendation model. The dataset records which movies each user watched and the rating the user gave each movie.

We will use this dataset to build a model that predicts which movies a user watched and which they did not. A common and effective pattern for this kind of task is the two-tower model: a neural network with two sub-models that learn representations for queries and candidates separately. The score of a given query-candidate pair is simply the dot product of the outputs of the two towers.
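The scoring rule described above can be pictured with a few lines of plain Python. This is only an illustrative sketch, not TFRS code: the embedding values, the user and movie names, and the `score` helper are all made up for the example, whereas in a real model the vectors would be produced by the two trained towers.

```python
# Illustrative two-tower scoring in plain Python (not TFRS code).
# All names and embedding values below are hypothetical.

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Pretend outputs of the query tower (one vector per user).
query_embeddings = {
    "user_a": [1.0, 0.0],
    "user_b": [0.0, 1.0],
}

# Pretend outputs of the candidate tower (one vector per movie).
candidate_embeddings = {
    "movie_x": [0.5, 0.25],
    "movie_y": [0.25, 2.0],
}

def score(user, movie):
    """Affinity of a query-candidate pair: dot product of the tower outputs."""
    return dot(query_embeddings[user], candidate_embeddings[movie])

print(score("user_a", "movie_x"))  # 0.5
print(score("user_b", "movie_y"))  # 2.0
```

Because the two towers only meet at the final dot product, candidate vectors can be computed once and reused for every query.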

The model architecture is quite flexible. The inputs to the query tower can be user IDs, search keywords, or timestamps; on the candidate side, they can be movie titles, descriptions, synopses, or cast lists.

In this example, we only use the user ID in the query tower and only the movie title in the candidate tower.

Let's prepare the data first. Data can be obtained from TensorFlow Datasets.

import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_recommenders as tfrs

# Ratings data.
ratings = tfds.load("movie_lens/100k-ratings", split="train")
# Features of all the available movies.
movies = tfds.load("movie_lens/100k-movies", split="train")

Of all the features available in the dataset, the most useful are the user ID and the movie title. Although TFRS can make use of many other features, for simplicity we use only these two.

ratings = ratings.map(lambda x: {
    "movie_title": x["movie_title"],
    "user_id": x["user_id"],
})
movies = movies.map(lambda x: x["movie_title"])

With only user IDs and movie titles as features, our simple two-tower model is very similar to a typical matrix factorization model. To build it, we need the following:

  • A user tower that converts user IDs into user embedding vectors (high-dimensional vector representations).

  • A movie tower that converts movie titles into movie embedding vectors.

  • A loss function that maximizes the predicted user-movie affinity for watches we observed, and minimizes it for watches that did not happen.

TFRS and Keras provide many of the building blocks needed to achieve this. We can start by creating a model class. In the __init__ method, we set some hyperparameters as well as the main components of the model.

class TwoTowerMovielensModel(tfrs.Model):
  """MovieLens prediction model."""

  def __init__(self):
    # The `__init__` method sets up the model architecture.
    super().__init__()

    # How large the representation vectors are for inputs: larger vectors make
    # for a more expressive model but may cause over-fitting.
    embedding_dim = 32
    num_unique_users = 1000
    num_unique_movies = 1700
    eval_batch_size = 128

The first major component is the user model: a set of layers that describe how raw user features should be transformed into numerical user representations. Here, we use a Keras preprocessing layer to turn user IDs into integer indices, then map those into learned embedding vectors:

    # Set up user and movie representations.
    self.user_model = tf.keras.Sequential([
      # We first turn the raw user ids into contiguous integers by looking them
      # up in a vocabulary.
      tf.keras.layers.experimental.preprocessing.StringLookup(
          max_tokens=num_unique_users),
      # We then map the result into embedding vectors.
      tf.keras.layers.Embedding(num_unique_users, embedding_dim)
    ])
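To make the "look up, then embed" step concrete, here is a plain-Python picture of what the two layers do conceptually. It is a sketch under stated assumptions rather than the Keras implementation: the helper names (`build_vocabulary`, `lookup`, `embed`), the sample IDs, and the choice of index 0 for unknown IDs are all illustrative, and the embedding values are random placeholders where a real model would learn them. Note that a real lookup layer also needs a vocabulary, either supplied directly or learned from the data.

```python
# Plain-Python picture of "string lookup, then embedding" (not the Keras API).
import random

def build_vocabulary(raw_ids):
    """Map each distinct raw ID to a contiguous integer index.

    Index 0 is reserved (an assumption of this sketch) for IDs that were
    never seen, i.e. out-of-vocabulary IDs.
    """
    vocab = {}
    for raw_id in raw_ids:
        if raw_id not in vocab:
            vocab[raw_id] = len(vocab) + 1
    return vocab

def lookup(vocab, raw_id):
    """Return the integer index for a raw ID, or 0 if it is unknown."""
    return vocab.get(raw_id, 0)

# Hypothetical raw user IDs as they might appear in the ratings data.
user_vocab = build_vocabulary(["138", "92", "138", "301"])

# One row per index, including the out-of-vocabulary row. A real model
# learns these values during training; here they are random placeholders.
embedding_dim = 4
rng = random.Random(0)
embedding_table = [
    [rng.uniform(-0.05, 0.05) for _ in range(embedding_dim)]
    for _ in range(len(user_vocab) + 1)
]

def embed(raw_id):
    """Raw ID -> integer index -> embedding vector."""
    return embedding_table[lookup(user_vocab, raw_id)]

print(lookup(user_vocab, "92"))   # 2
print(lookup(user_vocab, "999"))  # 0 (unknown ID)
print(len(embed("92")))           # 4
```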

The movie model looks very similar and can convert movie titles into embedding vectors:

    self.movie_model = tf.keras.Sequential([
      tf.keras.layers.experimental.preprocessing.StringLookup(
          max_tokens=num_unique_movies),
      tf.keras.layers.Embedding(num_unique_movies, embedding_dim)
    ])

Once we have the user and movie models, we need to define our objective and its evaluation metrics. In TFRS, we can do this via the Retrieval task (which uses the in-batch softmax loss):

    # The `Task` object has two purposes: (1) it computes the loss and (2)
    # keeps track of metrics.
    self.task = tfrs.tasks.Retrieval(
        # In this case, our metrics are top-k metrics: given a user and a known
        # watched movie, how highly would the model rank the true movie out of
        # all possible movies?
        metrics=tfrs.metrics.FactorizedTopK(
            candidates=movies.batch(eval_batch_size).map(self.movie_model)
        )
    )
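The in-batch softmax idea can be sketched in plain Python as follows. This is an illustration of the general technique, not the TFRS implementation: each user in a batch is paired with the movie they watched, and the other movies in the same batch serve as negatives. The function name, the mean reduction, and the embedding values are assumptions of this sketch, and details of the real task may differ.

```python
# Sketch of an in-batch softmax loss in plain Python (not TFRS code).
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def in_batch_softmax_loss(user_embeddings, movie_embeddings):
    """Mean cross-entropy over a batch of (user, watched movie) pairs.

    Row i's positive is movie i; every other movie in the batch acts as a
    negative for that user.
    """
    losses = []
    for i, user in enumerate(user_embeddings):
        logits = [dot(user, movie) for movie in movie_embeddings]
        # Log-sum-exp with the max subtracted for numerical stability.
        largest = max(logits)
        log_norm = largest + math.log(sum(math.exp(l - largest) for l in logits))
        # Negative log-probability of the true movie.
        losses.append(log_norm - logits[i])
    return sum(losses) / len(losses)

# Made-up embeddings: in `aligned`, each user points towards their own movie;
# in `swapped`, each user points towards the other user's movie.
users = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[3.0, 0.0], [0.0, 3.0]]
swapped = [[0.0, 3.0], [3.0, 0.0]]

print(in_batch_softmax_loss(users, aligned) < in_batch_softmax_loss(users, swapped))  # True
```

The loss is low when each user's embedding scores their own movie above the other movies in the batch, which is the behaviour the training objective rewards.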

We use the compute_loss method to describe how the model should be trained:

  def compute_loss(self, features, training=False):
    # The `compute_loss` method determines how loss is computed.

    # Compute user and item embeddings.
    user_embeddings = self.user_model(features["user_id"])
    movie_embeddings = self.movie_model(features["movie_title"])

    # Pass them into the task to get the resulting loss. The lower the loss is, the
    # better the model is at telling apart true watches from watches that did
    # not happen in the training data.
    return self.task(user_embeddings, movie_embeddings)
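The top-k metrics tracked by the task ask how highly the true movie ranks among all candidates for a given user. A minimal plain-Python sketch of that idea is shown here; the function name `top_k_accuracy`, the movie titles, and the embedding values are hypothetical, and the real metric is computed by TFRS rather than by code like this.

```python
# Sketch of a top-k accuracy metric in plain Python (not TFRS code).

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def top_k_accuracy(user_embeddings, true_titles, candidates, k):
    """Fraction of (user, watched movie) pairs whose true movie is among
    the k highest-scoring candidates for that user.

    `candidates` is a list of (title, embedding) pairs.
    """
    hits = 0
    for user, true_title in zip(user_embeddings, true_titles):
        ranked = sorted(candidates, key=lambda c: dot(user, c[1]), reverse=True)
        top_titles = [title for title, _ in ranked[:k]]
        if true_title in top_titles:
            hits += 1
    return hits / len(true_titles)

# Made-up candidate embeddings, user embeddings, and watched movies.
candidates = [
    ("Movie A", [1.0, 0.0]),
    ("Movie B", [0.0, 1.0]),
    ("Movie C", [0.6, 0.6]),
]
users = [[1.0, 0.2], [0.1, 1.0]]
watched = ["Movie A", "Movie C"]

print(top_k_accuracy(users, watched, candidates, k=1))  # 0.5
print(top_k_accuracy(users, watched, candidates, k=2))  # 1.0
```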

We can fit this model using the standard Keras fit call:

model = TwoTowerMovielensModel()
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

model.fit(ratings.batch(4096), verbose=False)

To sanity-check the model's recommendations, we can use the TFRS BruteForce layer. The BruteForce layer is indexed with precomputed candidate representations. For a given query, it computes the query-candidate score for every possible candidate and returns the highest-ranked movies:

index = tfrs.layers.ann.BruteForce(model.user_model)
index.index(movies.batch(100).map(model.movie_model), movies)

# Get recommendations.
_, titles = index(tf.constant(["42"]))
print(f"Recommendations for user 42: {titles[0, :3]}")
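Conceptually, brute-force retrieval scores every candidate against the query and keeps the best few. The sketch below shows that idea in plain Python; it is not the TFRS layer, and the function name `brute_force_top_k`, the movie titles, and the embedding values are made up for illustration.

```python
# Sketch of brute-force top-k retrieval in plain Python (not TFRS code).

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def brute_force_top_k(query_embedding, candidates, k=3):
    """Score every candidate against the query and return the k best titles.

    `candidates` is a list of (title, embedding) pairs computed ahead of time.
    """
    scored = [(dot(query_embedding, emb), title) for title, emb in candidates]
    # Sort by score, highest first. Every candidate is scored, so the work
    # grows linearly with the size of the catalogue.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:k]]

# Made-up precomputed candidate embeddings and a made-up query embedding.
candidates = [
    ("Movie A", [0.9, 0.1]),
    ("Movie B", [0.1, 0.9]),
    ("Movie C", [0.5, 0.5]),
    ("Movie D", [-0.5, 0.2]),
]

print(brute_force_top_k([1.0, 0.0], candidates, k=2))  # ['Movie A', 'Movie C']
```

Scoring the whole catalogue for every query is exact but scales poorly, which is why approximate nearest neighbor indexes are preferred for large candidate sets.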

Of course, the BruteForce layer is only suitable for very small datasets. For an example of using TFRS with Annoy, an approximate nearest neighbor library, see our complete tutorial.

  • Complete tutorial
    https://tensorflow.google.cn/recommenders/examples/basic_retrieval#building_a_candidate_ann_index

We hope this gives you a feel for what TensorFlow Recommenders offers. To learn more, check out our tutorial or the API reference. If you would like to get involved in shaping the future of TensorFlow recommender systems, please consider contributing! We will also shortly be announcing a TensorFlow Recommendations Special Interest Group (SIG), and we welcome collaboration and contributions on topics such as embedding learning and distributed training and serving. Stay tuned!

  • Tutorial
    https://tensorflow.google.cn/recommenders/examples/quickstart

  • API reference
    https://tensorflow.google.cn/recommenders/api_docs/python/tfrs/all_symbols

  • Contribute
    https://github.com/tensorflow/recommenders/

Acknowledgments

TensorFlow Recommenders is the result of the joint efforts of Google and other organizations. We would like to thank Tiansheng Yao, Xinyang Yi, and Ji Yang for their core contributions to the library, and thank Lichan Hong and Ed Chi for their leadership and guidance. We would also like to thank Zhe Zhao, Derek Cheng, Sagar Jain, Alexandre Passos, Francois Chollet, Sandeep Gupta, Eric Ni and others for their suggestions and support to the project.

To learn more about the topics mentioned in this article, see the following resources, which cover many of them in depth:

  • Reinforcement learning
    https://research.google/pubs/pub47647/

  • Better user representations
    https://research.google/pubs/pub47954/

  • Fairness indicators
    https://research.google/pubs/pub48107/

  • Candidate Nomination Model
    https://research.google/pubs/pub48840/

  • Item, user, and contextual information
    https://tensorflow.google.cn/recommenders/examples/featurization

  • Multi-task model
    https://tensorflow.google.cn/recommenders/examples/multitask

  • TensorFlow Serving
    https://tensorflow.google.cn/tfx/guide/serving

  • MovieLens dataset
    https://grouplens.org/datasets/movielens/

  • Two-tower model
    https://research.google/pubs/pub48840/

  • TensorFlow Datasets 
    https://tensorflow.google.cn/datasets/catalog/movielens

  • Keras preprocessing layer
    https://keras.io/guides/preprocessing_layers/

  • in-batch softmax loss
    https://research.google/pubs/pub48840/
