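An embedding is a map f: X (high-dimensional) -> Y (low-dimensional). It reduces the dimensionality of the data, which makes computation easier and helps improve accuracy.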
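To make the map f: X -> Y concrete, an embedding layer can be viewed as a lookup table. The NumPy sketch below uses made-up sizes and random weights (they are not from the Kaggle course). It shows that looking up a row of the embedding matrix W gives the same result as multiplying a one-hot vector by W. In this way an id drawn from a space of num_ids one-hot dimensions is compressed into embedding_dim dense dimensions.

```python
import numpy as np

# Hypothetical sizes: 5 distinct ids (the high-dimensional one-hot space X)
# mapped to 3-dimensional dense vectors (the low-dimensional space Y).
num_ids, embedding_dim = 5, 3

rng = np.random.default_rng(0)
# Embedding matrix W: row i is the vector for id i (random here, learned in practice).
W = rng.normal(size=(num_ids, embedding_dim))

def embed(ids):
    """f: X -> Y implemented as a table lookup (row indexing)."""
    return W[np.asarray(ids)]

def embed_one_hot(ids):
    """The same map written as one-hot vectors times W."""
    one_hot = np.eye(num_ids)[np.asarray(ids)]  # shape (n, num_ids)
    return one_hot @ W                          # shape (n, embedding_dim)

ids = [0, 3, 3, 4]
print(embed(ids).shape)                              # (4, 3)
print(np.allclose(embed(ids), embed_one_hot(ids)))   # True
```

The lookup form is what embedding layers use in practice, because the one-hot product performs num_ids multiplications per id only to select a single row.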
See Kaggle Learn: https://www.kaggle.com/learn/embeddings
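Official DNN example:

```python
from tensorflow import keras

# Assumes df (the ratings DataFrame) and the two embedding sizes are already defined
user_id_input = keras.Input(shape=(1,), name='user_id')
movie_id_input = keras.Input(shape=(1,), name='movie_id')
user_embedded = keras.layers.Embedding(df.userId.max()+1, user_embedding_size,
                                       input_length=1, name='user_embedding')(user_id_input)
movie_embedded = keras.layers.Embedding(df.movieId.max()+1, movie_embedding_size,
                                        input_length=1, name='movie_embedding')(movie_id_input)
# ... (the rest of the DNN is not included here)
```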
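Because the DNN snippet stops after the embedding layers, the following is only a hypothetical NumPy sketch of one common way to continue it, not the course's code. The sketch concatenates the user and movie embeddings and passes them through dense ReLU layers to a single linear output. The table sizes, the hidden_units = (32, 4) widths and the random weights are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical sizes chosen for illustration only.
n_users, n_movies = 10, 20
user_embedding_size = movie_embedding_size = 8
hidden_units = (32, 4)  # assumed hidden-layer widths

rng = np.random.default_rng(0)
user_table = rng.normal(size=(n_users, user_embedding_size))
movie_table = rng.normal(size=(n_movies, movie_embedding_size))

# Randomly initialised dense layers: concatenated input -> hidden layers -> 1 output.
sizes = [user_embedding_size + movie_embedding_size, *hidden_units, 1]
layers = [(rng.normal(size=(m, n)) * 0.1, np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

def predict(user_ids, movie_ids):
    # Look up and concatenate the two embeddings: shape (batch, 16).
    x = np.concatenate([user_table[user_ids], movie_table[movie_ids]], axis=1)
    # ReLU hidden layers.
    for w, b in layers[:-1]:
        x = np.maximum(x @ w + b, 0.0)
    # Linear output layer: one predicted rating per (user, movie) pair.
    w, b = layers[-1]
    return (x @ w + b).ravel()

print(predict(np.array([0, 1, 2]), np.array([5, 6, 7])).shape)  # (3,)
```

Unlike a plain dot product, the dense layers can learn nonlinear interactions between the user and movie vectors, at the cost of extra parameters.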
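Dot-product model:

```python
from tensorflow import keras

movie_embedding_size = user_embedding_size = 8

# Each instance consists of two inputs: a single user id, and a single movie id
user_id_input = keras.Input(shape=(1,), name='user_id')
movie_id_input = keras.Input(shape=(1,), name='movie_id')
# df is the ratings DataFrame with userId and movieId columns (not defined here)
user_embedded = keras.layers.Embedding(df.userId.max()+1, user_embedding_size,
                                       input_length=1, name='user_embedding')(user_id_input)
movie_embedded = keras.layers.Embedding(df.movieId.max()+1, movie_embedding_size,
                                        input_length=1, name='movie_embedding')(movie_id_input)
# Dot product along the embedding axis: (batch, 1, 8) x (batch, 1, 8) -> (batch, 1, 1)
dotted = keras.layers.Dot(2)([user_embedded, movie_embedded])
# Flatten to (batch, 1): one predicted rating per (user, movie) pair
out = keras.layers.Flatten()(dotted)

model = keras.Model(inputs=[user_id_input, movie_id_input], outputs=out)
```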
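The dot-product model is a form of matrix factorization. Each predicted rating is the inner product of a user vector and a movie vector, so together all predictions form the low-rank matrix U V^T. The NumPy sketch below reproduces the shapes produced by Embedding, Dot(2) and Flatten and checks the result against U @ V.T. The sizes and the random tables U and V are hypothetical stand-ins for the trained embedding weights.

```python
import numpy as np

n_users, n_movies, k = 4, 6, 8  # hypothetical sizes; k = embedding size
rng = np.random.default_rng(0)
U = rng.normal(size=(n_users, k))   # user embedding table
V = rng.normal(size=(n_movies, k))  # movie embedding table

user_ids = np.array([0, 2, 3])
movie_ids = np.array([1, 1, 5])

# Embedding outputs have shape (batch, 1, k), as with input_length=1 in Keras.
u = U[user_ids][:, None, :]
v = V[movie_ids][:, None, :]

# Dot(axes=2): contract the embedding axis -> shape (batch, 1, 1).
dotted = np.einsum("bik,bjk->bij", u, v)
# Flatten -> shape (batch, 1): one predicted rating per pair.
out = dotted.reshape(len(user_ids), -1)

# Same numbers as reading entries of the full predicted matrix R_hat = U @ V.T.
R_hat = U @ V.T
print(out.shape)                                            # (3, 1)
print(np.allclose(out[:, 0], R_hat[user_ids, movie_ids]))   # True
```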
The two model types are compared below. The simpler model (blue) also performs quite well, and both models show clear overfitting.