Cross-Validation with PhyloBayes

Principle
Cross-validation (CV) is a general method for evaluating the fit of alternative models. The rationale is as follows: the dataset is randomly split into two (possibly unequal) parts, the training (or learning) set and the test set. The parameters of the model are estimated on the learning set (i.e. the model is 'trained' on this subset of empirical observations), and these parameter values are then used to compute the likelihood of the test set (which measures how well the test set is 'predicted' by the model). The overall procedure has to be repeated (and the resulting log-likelihood scores averaged) over several random splits.
In short, CV is used here to find the best-fitting substitution model: the dataset is split into a training set and a test set, the model parameters are estimated on the training set, and those parameters are then used to compute the likelihood of the test set. The procedure is repeated several times, and the resulting log-likelihood scores are averaged.
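
As a hedged formalization of the paragraph above: the symbols below are assumed notation, not taken from the original text. Write D_1^{(r)} and D_2^{(r)} for the learning and test sets of replicate r, M for the model, and \theta for its parameters. In a Bayesian setting, "estimating the parameters" is assumed here to mean drawing K samples \theta_k from the posterior given the learning set, so the score of one replicate can be approximated by averaging the test-set likelihood over those samples, and the final score averages over the R replicates:

```latex
\ell_r(M)
  = \ln p\!\left(D_2^{(r)} \mid D_1^{(r)}, M\right)
  = \ln \int p\!\left(D_2^{(r)} \mid \theta, M\right)\,
             p\!\left(\theta \mid D_1^{(r)}, M\right)\, d\theta
  \approx \ln \frac{1}{K} \sum_{k=1}^{K} p\!\left(D_2^{(r)} \mid \theta_k, M\right),
\qquad
\overline{\ell}(M) = \frac{1}{R} \sum_{r=1}^{R} \ell_r(M).
```

Under this reading, a model with a higher average score predicts the held-out sites better.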

Typically, 10-fold cross-validation (such that D2 represents 10% and D1 90% of the original dataset) has been used (e.g. Philippe et al., 2011), and ten replicates have been run (although ideally, 100 replicates would certainly be more adequate). However, alternative schemes are possible.
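The user manual recommends a 10-fold scheme, in which the training set (D1) holds 90% of the data and the test set (D2) the remaining 10%, repeated over 10 replicates.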

Workflow

  1. cvrep: prepare the replicates
  2. pb: run each model under each replicated learning set
  3. readcv: compute the cross-validation scores on each replicate
  4. sumcv: pool the cv-scores and combine them into a global scoring of the models

Step I:

cvrep -nrep 10 -nfold 10 -d 13PCG123.phy pcg

This generates 10 pairs of learn and test files, one pair per replicate.
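
Before launching the long pb runs, it can help to confirm that every replicate file is in place. The sketch below is only a convenience check, not part of PhyloBayes. It assumes the files are named <name><rep>_learn.ali and <name><rep>_test.ali with replicates numbered from 0; the _test.ali name and the helper name missing_cv_files are assumptions made for illustration.

```shell
# Print every expected replicate file that is missing from the current directory.
# Assumed naming: <name><rep>_learn.ali and <name><rep>_test.ali, with rep starting at 0.
missing_cv_files() {
    name=$1
    nrep=$2
    rep=0
    while [ "$rep" -lt "$nrep" ]; do
        for part in learn test; do
            f="${name}${rep}_${part}.ali"
            # Report the file only when it does not exist.
            [ -f "$f" ] || printf '%s\n' "$f"
        done
        rep=$((rep + 1))
    done
}

# Check the 10 replicates created with the base name "pcg".
missing_cv_files pcg 10
```

No output means all 20 files were found.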

Step II:

pb -d pcg0_learn.ali -T tree.nwk -x 10 11000 CATpcg0_learn.ali
pb -d pcg0_learn.ali -T tree.nwk -x 1 1100 -wag WAGpcg0_learn.ali

Repeat this for all 10 pcg*_learn.ali files, under each model.
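
Typing 20 commands by hand is error-prone, so the two commands above can be generated for every replicate. The sketch below only prints the commands and does not run pb itself. The helper name pb_commands is hypothetical, and the learning files are assumed to carry the lowercase base name passed to cvrep in Step I (pcg0_learn.ali through pcg9_learn.ali).

```shell
# Print (do not run) the pb commands for every replicate under both models.
# Chain names follow the <MODEL><name><rep>_learn.ali pattern used in Step II.
pb_commands() {
    name=$1
    nrep=$2
    tree=$3
    rep=0
    while [ "$rep" -lt "$nrep" ]; do
        learn="${name}${rep}_learn.ali"
        # Same options as the two example commands in Step II.
        printf 'pb -d %s -T %s -x 10 11000 CAT%s\n' "$learn" "$tree" "$learn"
        printf 'pb -d %s -T %s -x 1 1100 -wag WAG%s\n' "$learn" "$tree" "$learn"
        rep=$((rep + 1))
    done
}

# List the 20 commands for 10 replicates.
pb_commands pcg 10 tree.nwk
```

Piping the output to sh runs the chains one after another. Since each chain is independent, each printed line can instead be submitted as a separate job.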

Step III: Calculate cross-validated likelihoods

readcv -nrep 10 -x 100 1 CAT pcg
readcv -nrep 10 -x 100 1 WAG pcg

Note that, when used with the -nrep option such as above, readcv will process each replicate successively, which may take a very long time. Alternatively readcv can be called on individual replicates. For instance:
readcv -rep 2 -x 100 10 CAT pcg
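
Because the per-replicate calls are independent, they can be listed once and then spread over several cores or cluster jobs. The sketch below prints one readcv command per replicate and model without running anything. The helper name readcv_commands is hypothetical, the -x 100 1 setting is copied from the -nrep commands above, and -rep is assumed to take the same 0-based index that appears in the replicate file names.

```shell
# Print (do not run) one readcv command per replicate and per model.
# Usage: readcv_commands <name> <nrep> <model> [<model> ...]
readcv_commands() {
    name=$1
    nrep=$2
    shift 2   # the remaining arguments are the model prefixes
    rep=0
    while [ "$rep" -lt "$nrep" ]; do
        for model in "$@"; do
            printf 'readcv -rep %s -x 100 1 %s %s\n' "$rep" "$model" "$name"
        done
        rep=$((rep + 1))
    done
}

# List the commands for 10 replicates of the CAT and WAG runs.
readcv_commands pcg 10 CAT WAG
```

Each printed line can be run in its own terminal or submitted as its own job. Step IV should only be started once all of them have finished.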

Step IV: Average the cv-log-likelihood scores over replicates

sumcv -nrep 10 WAG CAT pcg
sumcv -nrep 10 WAG CAT GTR pcg

The first model in the list (here WAG) is used as the reference against which the other models are compared.

Reposted from blog.csdn.net/weixin_40099163/article/details/84140938