Hello everyone,
Since the DeepCTR v0.7.0 release at the end of last November, personal matters kept me from following up in time on questions raised in the GitHub issue area, the DeepCTR discussion group, and by email. I apologize for that, and I hope it has not affected anyone's study or work.
After lying flat at home for a few days over the holiday, I finally got in the mood to open the computer and tinker with the code again. This article introduces the main changes in the new v0.7.1 release.
Main features and improvements
The weight_normalization parameter of WeightedSequenceLayer was not exposed in the public API
- Problem description
- Problem analysis
This problem was introduced in v0.6.3, where we added support for sequence features with weights.
For weighted sequence features, the implementation supports normalizing the weight scores, but users could not discover this option in actual use: the model does not normalize by default, and enabling weight score normalization required editing the source code to set weight_normalization=True in WeightedSequenceLayer.
- Solution
A weight_norm parameter was added to VarLenSparseFeat. When defining a weighted sequence feature, the user can now pass weight_norm=True or weight_norm=False to control whether the weight scores are normalized; the default is True.
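The effect of weight score normalization can be illustrated with a minimal pure-Python sketch (an intentionally simplified assumption: the real WeightedSequenceLayer operates on batched, masked Keras tensors):

```python
import math


def weighted_pool(embeddings, weights, weight_norm=True):
    """Weighted sum-pooling of a sequence of embedding vectors.

    Pure-Python sketch of the idea behind WeightedSequenceLayer's
    weight_normalization; the real layer works on Keras tensors.
    """
    if weight_norm:
        # Softmax-normalize the weight scores over the sequence.
        exps = [math.exp(w) for w in weights]
        total = sum(exps)
        weights = [e / total for e in exps]
    dim = len(embeddings[0])
    return [sum(w * emb[d] for w, emb in zip(weights, embeddings))
            for d in range(dim)]
```

With weight_norm=True, equal raw scores become equal softmax weights that sum to 1; with weight_norm=False, the raw scores scale the embeddings directly.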
Bug fixes
Sparse (discrete) features in the linear feature columns whose embedding dimension is not 1 cause the model to lose memorization
- Problem description
- Problem analysis
This problem was introduced in v0.7.0, when we added support for using different embedding dimensions for different feature groups.
In versions before v0.7.0, the model took linear_feature_columns and dnn_feature_columns, representing the memorization of the wide side and the generalization of the deep side, respectively. For SparseFeat on the wide side, the model automatically set the embedding dimension to 1 to simulate one-hot memorization; for SparseFeat on the deep side, a single embedding dimension shared by all feature groups was controlled through the model's embedding_size parameter.
In v0.7.0, the embedding dimension of a feature group is set when defining feature columns, via the embedding_dim parameter of SparseFeat, which defaults to 4.
If the user does not explicitly set embedding_dim=1 on the SparseFeat objects passed into linear_feature_columns, the wide side of the model loses its memorization ability.
- Solution
In get_linear_logit, the method that computes the wide-side logit, the embedding_dim of every SparseFeat is now forcibly overridden to 1. In other words, the embedding_dim of SparseFeat objects passed into linear_feature_columns is ignored; the model forces it to 1.
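Conceptually, the fix works like the following sketch (the stripped-down SparseFeat stand-in and the helper name force_linear_embedding_dim are hypothetical; the real override happens inside deepctr's get_linear_logit):

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for deepctr.inputs.SparseFeat
# (the real class has more fields).
SparseFeat = namedtuple('SparseFeat',
                        ['name', 'vocabulary_size', 'embedding_dim'])


def force_linear_embedding_dim(linear_feature_columns):
    """Mimic the fix: override embedding_dim to 1 on the wide side.

    Each category then gets a single scalar weight, which is what gives
    the linear part its one-hot-style memorization.
    """
    return [fc._replace(embedding_dim=1) if isinstance(fc, SparseFeat) else fc
            for fc in linear_feature_columns]
```

Whatever embedding_dim the user sets on wide-side columns, the logit computation sees dimension 1.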
Exceptions thrown by the version check were hard for users to understand and affected subsequent use
- Problem description
- Problem analysis
On machines that cannot reach the network, or whose pip configuration has been modified, DeepCTR's version check throws long exceptions that are hard for users to understand.
- Solution
When the version check fails, it now simply prompts the user to visit the DeepCTR website to check the version manually, instead of printing the error and exception details.
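The new fallback behavior can be sketched as follows (a simplified assumption of the logic, not deepctr's actual implementation, which may query PyPI differently):

```python
import json
import urllib.request


def check_version(current, timeout=3):
    """Print an update hint, or a friendly message if the check fails."""
    try:
        # PyPI's JSON API exposes the latest released version.
        with urllib.request.urlopen(
                'https://pypi.org/pypi/deepctr/json', timeout=timeout) as resp:
            latest = json.load(resp)['info']['version']
        if latest != current:
            print('DeepCTR %s is available. Run: pip install -U deepctr'
                  % latest)
    except Exception:
        # Any failure (offline machine, modified pip configuration, ...)
        # no longer dumps a traceback; just point at the project page.
        print('Version check failed. Please visit the DeepCTR homepage '
              'to check for new versions manually.')
```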
API changes
deepctr.layers.sequence.WeightedSequenceLayer
The default value of weight_normalization in WeightedSequenceLayer is now True.
- Old:
deepctr.layers.sequence.WeightedSequenceLayer(weight_normalization=False, supports_masking=False)
- New:
deepctr.layers.sequence.WeightedSequenceLayer(weight_normalization=True, supports_masking=False)
deepctr.inputs.VarLenSparseFeat
Since VarLenSparseFeat and SparseFeat share many parameters, and in many scenarios the shared parameters also take the same values (for example, a user's historical item-click sequence and the candidate item), the initialization parameters of VarLenSparseFeat have been changed to a SparseFeat instance plus the remaining sequence-related parameters.
For users, understanding the parameters of SparseFeat plus a few sequence-related parameters is now enough to use VarLenSparseFeat.
- Old:
VarLenSparseFeat(name, maxlen, vocabulary_size, embedding_dim=4, combiner="mean", use_hash=False, dtype="float32", length_name=None, weight_name=None, embedding_name=None, group_name=DEFAULT_GROUP_NAME)
- New:
VarLenSparseFeat(sparsefeat, maxlen, combiner="mean", length_name=None, weight_name=None, weight_norm=True)
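A migration sketch, using hypothetical namedtuple stand-ins for the real classes in deepctr.inputs (the field lists and the feature names are abbreviated assumptions for illustration):

```python
from collections import namedtuple

# Hypothetical, simplified stand-ins for the real classes in
# deepctr.inputs (field lists are abbreviated).
SparseFeat = namedtuple('SparseFeat',
                        ['name', 'vocabulary_size', 'embedding_dim'])
VarLenSparseFeat = namedtuple('VarLenSparseFeat',
                              ['sparsefeat', 'maxlen', 'combiner',
                               'length_name', 'weight_name', 'weight_norm'])

# Old style (before v0.7.1) passed everything flat, e.g.:
#   VarLenSparseFeat('hist_item_id', 50, 10000, embedding_dim=8, ...)
# New style: wrap a SparseFeat carrying the shared parameters, then add
# only the sequence-specific ones.
hist_item = VarLenSparseFeat(
    SparseFeat('hist_item_id', vocabulary_size=10000, embedding_dim=8),
    maxlen=50, combiner='mean', length_name='seq_length',
    weight_name='hist_weight', weight_norm=True)
```

The sequence feature and the corresponding candidate-item SparseFeat can now share one SparseFeat definition, so their embedding parameters cannot drift apart.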
That is everything in this update. Upgrade now with pip install -U deepctr! I hope you will keep supporting the project and sending in suggestions. Thank you!