Interpretability Paper Reading Notes 1: Tree Regularization

An AAAI 2018 paper on the interpretability of deep learning models. Its main contribution is tree regularization, which yields an interpretable decision tree as a by-product of training the deep network itself.

Beyond Sparsity: Tree Regularization of Deep Models for Interpretability

link: https://arxiv.org/pdf/1711.06178.pdf

1. Importance of model interpretability

(1) Wider deployment of deep models in fields such as finance and medicine requires model interpretability

(2) Models that imitate human decision-making, such as decision trees, are easier to understand and apply

2. Approaches to model interpretability

There are two broad approaches to model interpretability: one finds an interpretation for a model that has already been trained; the other trains a more interpretable model from the start

(1) Interpreting an already-trained model: extracting a decision-tree representation of a neural network [1], sensitivity analysis via input-output gradients [2], explaining individual predictions with local surrogate models [3], and extracting a rule-set representation of the model [4]

(2) Training a more interpretable model directly: penalizing reliance on irrelevant features to obtain sparse explanations [5], or extracting the salient spans of a text input that drive the prediction [6]

3. Related work

Various types of regularization have been used to simplify deep models: L1 regularization [7], binary networks [8], and edge/node regularization via group lasso [9]
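For contrast with the tree-based penalty introduced below, here is a minimal sketch of the classic L1 penalty mentioned above (the function name and toy weights are mine, purely for illustration):

```python
import numpy as np

def l1_penalty(weights, lam=0.01):
    """Standard L1 regularization term: lam * sum(|w|).
    Encourages sparse weights, but says nothing about how
    human-simulable the resulting model is."""
    return lam * np.abs(weights).sum()

w = np.array([0.5, -2.0, 0.0, 1.5])
print(l1_penalty(w))  # 0.01 * 4.0 = 0.04
```

Tree regularization replaces this magnitude-based penalty with a measure of how complex a decision tree must be to mimic the network.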

4. Detailed description of the method

Key idea: while training the deep model, simultaneously obtain a decision tree with high fidelity and low complexity, and use the tree's complexity as the regularization term.

Complexity of the decision tree: measured by APL (average path length), the average number of nodes a training sample passes through before reaching a decision. This APL is the tree-regularization term. Because fitting a tree and computing APL is not differentiable with respect to the network weights, a surrogate network is trained to predict APL from the weight vector.
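As a concrete illustration of APL, the sketch below fits a scikit-learn decision tree and averages the number of nodes each sample visits (a toy stand-in, not the paper's implementation; the data and tree settings are made up):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def average_path_length(tree, X):
    """Mean number of decision nodes each sample traverses."""
    # decision_path returns a sparse indicator matrix: one row per
    # sample, one column per tree node; an entry is 1 if the sample
    # visits that node on its way to a leaf.
    paths = tree.decision_path(X)
    return float(paths.sum(axis=1).mean())

# toy data standing in for (input, deep-model prediction) pairs
rng = np.random.RandomState(0)
X = rng.rand(200, 5)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
apl = average_path_length(tree, X)
print(apl)
```

In the paper, the tree is fit to the deep model's own predictions, so a low APL means the network's behavior can be simulated by a shallow, human-readable tree.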


5. Experiments and results

Speech recognition task


Sepsis Critical Care


HIV Therapy Outcome


6. Code walkthrough

tree-regularization-public

link: https://github.com/dtak/tree-regularization-public

Training process: the deep network and the surrogate APL predictor are trained in alternation:

def train(self, X_train, F_train, y_train, iters_retrain=25, num_iters=1000,
          batch_size=32, lr=1e-3, param_scale=0.01, log_every=10):
    npr.seed(42)  # npr is autograd.numpy.random
    num_retrains = num_iters // iters_retrain
    for i in range(num_retrains):  # xrange in the original Python 2 code
        self.gru.objective = self.objective
        # carry over weights from the previous round of training
        init_weights = self.gru.weights if i > 0 else None
        print('training deep net... [%d/%d], learning rate: %.4f' % (i + 1, num_retrains, lr))
        self.gru.train(X_train, F_train, y_train, num_iters=iters_retrain,
                       batch_size=batch_size, lr=lr, param_scale=param_scale,
                       log_every=log_every, init_weights=init_weights)
        # build the surrogate dataset: (weight vector, true APL) pairs
        print('building surrogate dataset...')
        W_train = deepcopy(self.gru.saved_weights.T)
        APL_train = self.average_path_length_batch(W_train, X_train, F_train, y_train)
        # retrain the surrogate MLP that predicts APL from the weight vector
        print('training surrogate net... [%d/%d]' % (i + 1, num_retrains))
        self.mlp.train(W_train[:self.gru.num_weights, :], APL_train, num_iters=3000,
                       lr=1e-3, param_scale=0.1, log_every=250)

    self.pred_fun = self.gru.pred_fun
    self.weights = self.gru.weights
    # fit and save the final decision tree
    self.tree = self.gru.fit_tree(self.weights, X_train, F_train, y_train)

    return self.weights
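The surrogate step exists because fitting a tree and computing APL is not differentiable in the network weights; a small MLP trained on (weight vector, APL) pairs gives a differentiable stand-in that can be added to the loss. A minimal sketch of that idea (the shapes, parameter names, and toy values below are mine, not the repo's):

```python
import numpy as np

def surrogate_penalty(w, W_mlp, b_mlp, v, c):
    """One-hidden-layer MLP mapping a flattened weight vector w
    to a scalar APL estimate -- differentiable in w."""
    h = np.maximum(0.0, W_mlp @ w + b_mlp)  # ReLU hidden layer
    return float(v @ h + c)                  # scalar APL prediction

def regularized_loss(pred_loss, w, params, lam=1.0):
    """Total objective: prediction loss + lam * surrogate APL."""
    W_mlp, b_mlp, v, c = params
    return pred_loss + lam * surrogate_penalty(w, W_mlp, b_mlp, v, c)

# toy demonstration with made-up parameters
rng = np.random.RandomState(0)
w = rng.rand(10)  # flattened deep-model weights (toy)
params = (rng.rand(4, 10), np.zeros(4), rng.rand(4), 0.0)
loss = regularized_loss(1.0, w, params, lam=0.5)
print(loss)
```

In the actual repo the surrogate MLP's parameters are refit every `iters_retrain` iterations (as in the loop above), so the penalty tracks the current weight region.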


References

  1. Craven, M. W., and Shavlik, J. W. 1996. Extracting tree-structured representations of trained networks. In NIPS.

  2. Adler, P.; Falk, C.; Friedler, S. A.; Rybeck, G.; Scheidegger, C.; Smith, B.; and Venkatasubramanian, S. 2016. Auditing black-box models for indirect influence. In ICDM.

  3. Ribeiro, M. T.; Singh, S.; and Guestrin, C. 2016. "Why should I trust you?": Explaining the predictions of any classifier. In KDD.

  4. Lakkaraju, H.; Bach, S. H.; and Leskovec, J. 2016. Interpretable decision sets: A joint framework for description and prediction. In KDD.

  5. Ross, A.; Hughes, M. C.; and Doshi-Velez, F. 2017. Right for the right reasons: Training differentiable models by constraining their explanations. In IJCAI.

  6. Ross, A.; Hughes, M. C.; and Doshi-Velez, F. 2017. Right for the right reasons: Training differentiable models by constraining their explanations. In IJCAI.

  7. Zhang, Y.; Lee, J. D.; and Jordan, M. I. 2016. L1-regularized neural networks are improperly learnable in polynomial time. In ICML.

  8. Tang, W.; Hua, G.; and Wang, L. 2017. How to train a compact binary neural network with high accuracy? In AAAI.

  9. Ochiai, T.; Matsuda, S.; Watanabe, H.; and Katagiri, S. 2017. Automatic node selection for deep neural networks using group lasso regularization. In ICASSP.




Published on the AINLP public account with the author's authorization.


Original article: https://zhuanlan.zhihu.com/p/99384386





