2018 Oct: Matrix Factorization
KBMF2K - M. Gonen, 2012
Model
model flow chart
(drug-specific)
K: Nd*Nd kernel matrix
Λ: Nd*R prior matrix
A: Nd*R projection matrix
G: R*Nd projected matrix
F: Nd*Nt score matrix
Y: Nd*Nt interaction matrix
parameter design
update algorithm
α: shape parameter of the gamma distribution
β: scale parameter of the gamma distribution
ν: margin parameter
Notation
R: {5, 10, 15, 20, 25}; ν: {0, 1}; α = 1; β = 1; σ = 0.1; Kd: SIMCOMP score; Kt: Smith-Waterman score
Derivation
to be continued...
Validation
Point estimates are reported instead of interval estimates because the variances of the estimates are very small.
Prediction performance
Five-fold cross-validation repeated five times -- average method: macro
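The "macro" averaging used throughout these notes computes the metric once per fold and then averages across all folds. A minimal numpy-only sketch with AUPR (average precision) as the per-fold metric; this is an illustration, not any paper's actual evaluation code:

```python
import numpy as np

def average_precision(y_true, y_score):
    """AUPR computed as average precision over the ranked predictions."""
    order = np.argsort(-y_score)          # rank by descending score
    y = y_true[order]
    tp = np.cumsum(y)                     # true positives at each cutoff
    precision = tp / np.arange(1, len(y) + 1)
    return float(np.sum(precision * y) / max(y.sum(), 1))

def repeated_cv_macro_aupr(y_true, y_score, n_splits=5, n_repeats=5, seed=0):
    """Macro averaging: compute AUPR per fold, then average over all folds."""
    rng = np.random.default_rng(seed)
    per_fold = []
    for _ in range(n_repeats):
        idx = rng.permutation(len(y_true))
        for fold in np.array_split(idx, n_splits):
            if y_true[fold].sum() > 0:    # AUPR is undefined without positives
                per_fold.append(average_precision(y_true[fold], y_score[fold]))
    return float(np.mean(per_fold))
```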
Notes
1. marginal likelihood
2. conditional distribution
3. Kullback-Leibler divergence
For details, see: How to understand K-L divergence (relative entropy)
In statistics, a simpler approximate distribution q is often used in place of a complex distribution p. This raises three questions: (1) how to define "closeness" between two distributions; (2) how to choose a suitable model family; (3) once chosen, how to fit its parameters (not discussed here).
⚪First, we need the tool of information entropy. Entropy is closely tied to information theory and is a fundamental measure of information. It can be understood as the minimum number of bits needed to encode information, or as the amount of information carried by an event.
Entropy: H(X) = -∑_i p(x_i) log₂ p(x_i)
For an event xi, view it as a combination of n independent binary events aj (j = 1, 2, ..., n), i.e. p(xi) = p(a1)p(a2)...p(an). Intuitively, under this assumption, taking the base-2 logarithm of this expression yields n, the number of binary digits needed to encode p(xi). It also explains why a less probable event carries more information. Weighting each event's code length by its probability and summing then gives the information content (average code length) of X.
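The coding interpretation above can be checked numerically; a minimal sketch:

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy H(p) = -sum_i p_i * log2(p_i), in bits.

    Interpretable as the minimum average number of binary digits needed
    to encode events drawn from distribution p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # convention: 0 * log 0 = 0
    return float(-np.sum(p * np.log2(p)))
```

A fair coin needs 1 bit per toss, a fair 8-sided die 3 bits per roll, and a certain event carries no information.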
⚪With this measure, K-L divergence quantifies the information lost when p is replaced by q: DKL(p||q) = H(p, q) - H(p), i.e. the cross-entropy of p and q minus the entropy of p.
H(p, q) = -∑_i p(x_i) log q(x_i)
DKL(p||q) = H(p, q) - H(p) = ∑_i p(x_i) log(p(x_i) / q(x_i))
DKL(p||q) = E_p[log p(x) - log q(x)]
K-L divergence (relative entropy) can be seen as the expectation of the information difference.
Note: this expectation is taken with respect to p(xi), because the true distribution of the events is p(xi), not q(xi). Therefore DKL(p||q) = DKL(q||p) does not hold in general, and K-L divergence cannot simply be treated as a distance.
⚪Also attached: a smoothing method for DKL.
▲A rule of thumb (with p a bimodal Gaussian mixture and q a unimodal Gaussian): minimizing KL(q||p) pushes q(x) to 0 wherever p(x) is 0, since otherwise q(x)/p(x) blows up; conversely, minimizing KL(p||q) must avoid fitting q(x) = 0 where p(x) > 0, i.e. q(x) should be nonzero wherever p(x) is.
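A small numerical check of the asymmetry discussed above; the bimodal p and unimodal q below are illustrative toy distributions:

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i); expectation under p."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0                      # 0 * log 0 = 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = np.array([0.48, 0.02, 0.02, 0.48])   # "bimodal": mass at the two ends
q = np.array([0.05, 0.45, 0.45, 0.05])   # "unimodal": mass in the middle
# kl_divergence(p, q) and kl_divergence(q, p) differ: KL is not symmetric
```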
4. variational approximation using Gibbs sampling [Gelfand and Smith, 1990] versus deterministic variational approximation [Beal, 2003]
5. Jensen's inequality
6. Automatic relevance determination [Neal, 1996]
7. Similarity score of compounds
⚪Smith-Waterman score(for genomic similarity)
Smith-Waterman score
genomic similarity kernel
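A minimal Smith-Waterman local-alignment score with the usual normalized kernel SW(a,b)/√(SW(a,a)·SW(b,b)); the match/mismatch/gap scores here are illustrative defaults, not the substitution scheme used in the papers:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment score (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # local alignment: scores are floored at 0
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

def normalized_sw(a, b):
    """Normalized kernel: SW(a, b) / sqrt(SW(a, a) * SW(b, b))."""
    return smith_waterman(a, b) / (smith_waterman(a, a) * smith_waterman(b, b)) ** 0.5
```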
⚪SIMCOMP score (for bio-chemical compound similarity)
1. classify atoms into 68 atom types according to their chemical environments
2. find the maximum clique (MCL) first, then find the maximal simply connected common subgraph (SCCS) by maximizing the association graph (AG)
weighting method (c=0.5)
v∈MCL(AG)
3. Improve the clique-finding algorithm:
(1) suspend the clique-finding procedure after at most Rmax (15000) recursion steps,
(2) eliminate any small SCCS whose cardinality is lower than Smin (2), and
(3) extend the other SCCSs greedily while any candidate exists.
4. Normalized score
Jaccard coefficient of MCS
|X| is the cardinality of graph X, and q ranges from 0 to 1, representing the extent to which common substructure is absent.
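Given the size of the common subgraph, the normalized score is just a Jaccard coefficient. A sketch in which the graph sizes are assumed already known (the real SIMCOMP obtains the common-subgraph size from the clique search above):

```python
def simcomp_like_score(size_x, size_y, size_common):
    """Jaccard-style normalized score |common| / (|X| + |Y| - |common|):
    1 when the two graphs are identical, 0 when nothing is shared."""
    return size_common / (size_x + size_y - size_common)
```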
KBMF - M. Gonen, 2013
Modeling
flowchart
Main parts:
(a) kernel-based nonlinear dimensionality reduction
Scholkopf, B. and Smola, A. J. Learning with Kernels: Support Vector Machines, Regularization, Optimization,and Beyond. MIT Press, Cambridge, MA, 2002.
(b) multiple kernel learning
(c) matrix factorization
(d) binary classification
Notation
Same as KBMF2K above,
except: ηx is the prior vector for the weight vector ex; (αη, βη) can tune the kernel-level sparsity; σg, σh are the standard deviations of G, H
(αη, βη, αλ, βλ) = (1, 1, 1, 1); (σg, σh) = (0.1, 0.1); R = {5, 10, ..., 40}; σy = 1; ν = 1
kx,m and kz,m are the similarity kernels for drugs and targets
parameter design
prediction
Derivation
1. The conjugacy of the model means that the approximate posterior distribution of each factor follows the same distribution family as the corresponding prior.
2. The normality of the kernel weights ensures a fully conjugate probabilistic model, which helps to derive the variational approximation. [Gonen, 2012b]
to be continued...
Validation
5 replications of five-fold cross-validation -- average method: macro
CMF - Xiaodong Zheng, 2013 √
Modeling
Y ≈ AB'; Sd ≈ AA'; St ≈ BB'
Algorithm Deduction
The first line is the WLRA (Weighted Low-Rank Approximation) term that tries to find the latent feature matrices A and B that reconstruct Y . The second line is the Tikhonov regularization term. The third and fourth lines are regularization terms that require latent feature vectors of similar drugs/targets to be similar and latent feature vectors of dissimilar drugs/targets to be dissimilar, respectively.
Note:
W∘(Y-AB') is a Hadamard product, and the second term of the loss function is Tikhonov regularization regarding A and B
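The four terms described above can be written down directly. A sketch of the objective value; the parameter names `lam_l`, `lam_d`, `lam_t` are my own labels for the three regularization weights:

```python
import numpy as np

def cmf_loss(Y, W, A, B, Sd, St, lam_l, lam_d, lam_t):
    """CMF objective: weighted low-rank reconstruction + Tikhonov term
    + drug-similarity and target-similarity regularization terms."""
    recon = np.linalg.norm(W * (Y - A @ B.T), "fro") ** 2       # WLRA term
    tikhonov = lam_l * (np.linalg.norm(A, "fro") ** 2
                        + np.linalg.norm(B, "fro") ** 2)        # ridge on A, B
    sim_d = lam_d * np.linalg.norm(Sd - A @ A.T, "fro") ** 2    # Sd ≈ AA'
    sim_t = lam_t * np.linalg.norm(St - B @ B.T, "fro") ** 2    # St ≈ BB'
    return recon + tikhonov + sim_d + sim_t
```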
Algorithm procedure
Notation: K: {50, 100}; λl: {0.25, 0.5, ..., 2}; λω: {2, 4, ..., 1024}; λd, λt: {0.125, 0.25, ..., 32}
Derivation
Alternating Least Squares Algorithm
take partial derivatives of L with respect to each factor in turn to update the parameters.
to be continued...
Validation
10-fold cross-validation repeated 5 times -- average method: macro
NRLMF √
Modeling
estimating the global structure by logistic matrix factorization (LMF)
Neighborhood Regularization objective of drug
Note: the adjacency matrices A and B are not symmetric
total objective function
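A sketch of the two ingredients: the logistic MF interaction probability p_ij = σ(u_i·v_j) and a graph-regularization term built from the symmetrized neighborhood adjacency. This follows the general form of the objective, not the paper's exact weighting:

```python
import numpy as np

def lmf_probability(U, V):
    """Logistic MF: P(y_ij = 1) = sigmoid(u_i . v_j)."""
    return 1.0 / (1.0 + np.exp(-(U @ V.T)))

def graph_regularizer(U, A):
    """tr(U' L U) with L the Laplacian of the symmetrized adjacency,
    proportional to sum_ij (A + A')_ij * ||u_i - u_j||^2."""
    S = A + A.T                       # symmetrize: A itself is not symmetric
    L = np.diag(S.sum(axis=1)) - S
    return float(np.trace(U.T @ L @ U))
```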
Algorithm
Notation:
U, V: latent matrices of drugs and targets; c: weight of positive samples; K1, K2: numbers of nearest neighbors; r: latent dimension; α, β: neighborhood regularization parameters; γ: descent step size
λd, λt: {2^(-5), ..., 2}; α: {2^(-5), ..., 4}; β: {2^(-5), ..., 1}; γ: {2^(-3), ..., 1}; K1 = K2 = 5; c = 5
▲Prediction regularization
regularizing the effect of c in prediction
Derivation
Alternating gradient descent procedure
AdaGrad algorithm
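NRLMF optimizes U and V with AdaGrad inside the alternating procedure. A generic AdaGrad step on a toy quadratic, not the paper's exact updates:

```python
import numpy as np

def adagrad_update(param, grad, accum, lr=1.0, eps=1e-8):
    """One AdaGrad step: the accumulated squared gradients shrink the
    effective per-coordinate learning rate over time."""
    accum = accum + grad ** 2
    param = param - lr * grad / (np.sqrt(accum) + eps)
    return param, accum

# usage sketch: minimize f(x) = x^2, whose gradient is 2x
x, accum = np.array([5.0]), np.zeros(1)
for _ in range(200):
    x, accum = adagrad_update(x, 2 * x, accum)
```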
to be continued...
Validation
five trials of 10-fold cross-validation -- average method: macro
GRMF - Ali Ezzat, 2016
Modeling
two steps:
⚪GRMF
(i) WKNKN (weighted K nearest known neighbors), a preprocessing step that transforms the binary values in the given drug-target matrix, Y , into interaction likelihood values;
K nearest known neighbors
η≤1 is a decay term
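A one-sided sketch of WKNKN for a single drug row; the paper applies the same idea on both the drug and target sides and combines the results, and the function and argument names here are my own:

```python
import numpy as np

def wknkn_row(y_row, sims, Y, K=5, eta=0.7):
    """WKNKN sketch (drug side): estimate a drug's interaction profile
    from its K nearest 'known' neighbors (rows of Y with at least one
    interaction), weighting the i-th nearest neighbor by eta**i."""
    known = np.where(Y.sum(axis=1) > 0)[0]         # drugs with known interactions
    nearest = known[np.argsort(-sims[known])][:K]  # K most similar known drugs
    decay = eta ** np.arange(len(nearest))
    est = (decay * sims[nearest]) @ Y[nearest] / max(sims[nearest].sum(), 1e-12)
    return np.maximum(y_row, est)                  # keep known 1s, fill the 0s
```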
(ii) GRMF (graph regularized matrix factorization), a matrix factorization technique for predicting drug-target interactions. A variant of GRMF called WGRMF (weighted GRMF) is also proposed.
sparsification of similarity matrices
objective function with LRA and Tikhonov term
objective function with normalized graph Laplacian
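The normalized graph Laplacian used in the regularization terms can be built as follows; a sketch assuming a symmetric, non-negative similarity matrix:

```python
import numpy as np

def normalized_laplacian(S):
    """Normalized graph Laplacian L = D^{-1/2} (D - S) D^{-1/2}
    for a symmetric, non-negative similarity matrix S."""
    d = S.sum(axis=1)
    # guard isolated nodes (degree 0) against division by zero
    inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(np.where(d > 0, d, 1.0)), 0.0)
    L = np.diag(d) - S
    return inv_sqrt[:, None] * L * inv_sqrt[None, :]
```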
⚪WGRMF
weight matrix W identical to that used in CMF
parameter setting
K (number of NKN): 5; η: 0.7; k (dimension of latent vectors): min(n, m), capped at 100, i.e. ∈ {50, 100}; λl ∈ {2^(-2), ..., 2}; λd, λt ∈ {10^(-4), ..., 10^(-1)}; p = 5
Derivation
⚪GRMF
matrix-wise ALS update rule
⚪WGRMF
row-wise ALS update rule
Hence, GRMF runs faster than WGRMF.
Validation
5 repetitions of 10-fold cross-validation (CV) -- average method: macro
Supplement
⚪min-max normalization
suitable for additive terms
⚪λd is important under CVd, while λt is important under CVt: the AUPR of the corresponding CV setting is quite low if that λ is 0.
⚪The drug-to-target ratio affects the outcome under the different CV scenarios.
⚪ TRPV6 and Hexobarbital are considered difficult cases. Specifically, the similarity of TRPV6 to its nearest neighboring target (according to St ) is as low as 0.05, while the similarity of Hexobarbital to its nearest neighboring drug (according to Sd ) is 0.35.
DNILMF - Ming Hao, 2017
Modeling
Flowchart
1. Profile inference and kernel construction
2. Similarity diffusion
[15] Improved prediction of drug-target interactions using regularized least squares integrating with kernel fusion technique
[16] Similarity network fusion for aggregating data types on a genomic scale
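A heavily simplified sketch of the cross-diffusion idea from [16]: each view's similarity is propagated through the other view, then the diffused matrices are fused. The real SNF restricts propagation to k-nearest-neighbor local kernels, which this toy version skips:

```python
import numpy as np

def cross_diffuse(S1, S2, iterations=2):
    """Two-view similarity diffusion sketch (SNF-style): alternately
    propagate each row-normalized similarity through the other view,
    then average the two diffused matrices."""
    def row_norm(P):
        return P / P.sum(axis=1, keepdims=True)
    P1, P2 = row_norm(S1), row_norm(S2)
    for _ in range(iterations):
        # both updates use the matrices from the previous iteration
        P1, P2 = row_norm(P1 @ P2 @ P1.T), row_norm(P2 @ P1 @ P2.T)
    return (P1 + P2) / 2.0
```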
3. Dual-network integrated LMF (like KBMF)
Interaction prediction based on d-t interaction and similarity network
Note: ∘ denotes the Hadamard product
4. Smoothing the new DTI scores by incorporating neighbor information
drug/target latent matrix update
Notation:
R (number of latent variables): {30, 40, ..., 100}
K (number of neighbors) = 5
diffusion iterations = 2
α (smoothing coefficient) = 0.5, grid {0, 0.1, ..., 1}
β = γ = 0.25, with α + β + γ = 1
c (augmentation weight for known DTI pairs) = 5, grid {3, 4, ..., 10}
λu = 5, grid {1, 2, ..., 10}
λv = 1, grid {1, 2, ..., 10}
k′ (number of neighbors for smoothing the prediction) = 5, grid {1, 2, ..., 10}
to be continued...
Derivation
similar to KBMF
Validation
5 trials of 10-fold cross-validation --average method: --
Supplement
⚪Two combination methods are often used to obtain the final learned matrices. One is supervised multiple kernel learning; the other is unsupervised learning, which is flexible since it can be performed before the model-building step.
⚪ Logistic matrix factorization is suitable for binary variables.
VB-MK-LMF - Bence Bolgar, 2017
Modeling
Probabilistic model
to be continued...
Notation:
αu, αv = 0.1; au, av = 1; bu, bv = 1000; c = 10; L = 10 (NR), 15 (others); D (iterations): 20-50
Derivation
Variational approximation
The distribution q*(U) is non-conjugate due to the form of p(R | U, V), and therefore the integral is intractable. Hence, a Taylor approximation of the symmetrized logistic function is used.
Taylor approximation of the symmetrized logistic function
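The quadratic bound commonly used for the logistic likelihood in variational LMF is the Jaakkola-Jordan lower bound; a sketch, under the assumption that this is the approximation meant here:

```python
import numpy as np

def jj_lambda(xi):
    """Jaakkola-Jordan coefficient lambda(xi) = tanh(xi / 2) / (4 * xi)."""
    return np.tanh(xi / 2.0) / (4.0 * xi)

def log_sigmoid_lower_bound(x, xi):
    """Quadratic lower bound on log sigma(x):
    log sigma(xi) + (x - xi)/2 - lambda(xi) * (x^2 - xi^2),
    tight (exact) at x = +/- xi for any variational parameter xi != 0."""
    return -np.log1p(np.exp(-xi)) + (x - xi) / 2.0 - jj_lambda(xi) * (x ** 2 - xi ** 2)
```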
to be continued...
Validation
5 trials of 10-fold cross validation --average method: unknown
Supplement
⚪Term
1. Normalization
On one hand, normalization rescales values into the range [0, 1]. This can be useful when all parameters need to share the same positive scale. However, information about outliers in the data set is lost.
On the other hand, in linear algebra, normalization refers to dividing a vector by its length.
2. Standardization
Standardization rescales data to have a mean (μ) of 0 and standard deviation (σ) of 1 (unit variance).
For most applications, standardization is recommended.
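The two rescalings side by side; a minimal sketch:

```python
import numpy as np

def min_max_normalize(x):
    """Rescale to [0, 1]; the extremes are set by the outliers."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def standardize(x):
    """Rescale to zero mean and unit standard deviation (z-scores)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()
```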
3. Regularization
Regularization is the process of introducing additional information in order to solve an ill-posed problem or to prevent overfitting.
2018 Nov: Ensemble Clustering
MVEC - 2018
Supplement
⚪LRR
Conclusion
The papers read in 2018 were mainly recommender-system-style models built on MF methods. I encountered many classic matrix factorization models (NMF, LMF, BMF) and many optimization techniques (graph regularization, kernel methods, regularization, and so on). More importantly, I gradually learned how to read papers: how to quickly and efficiently extract a paper's novel ideas and classic methods of argument.
In addition, over the semester I also discovered quite a few of my own weaknesses: insufficient skill in implementing algorithms, shaky mathematical foundations for deriving models, and too little accumulated practice in analyzing algorithms' characteristics and making horizontal comparisons. All of these need attention and improvement in the coming year's work.