1 Multi-task learning
Keras-mmoe/census_income_demo.py at master Drawbridge/Keras-mmoe GitHub
Recommendation System - (16) Multi-task Learning: Google MMOE Principles and Practice-Knowledge
1.1 Problem description
In recent years, deep neural networks have been applied more and more widely, for example in recommender systems. A recommender system usually has to optimize several objectives at once. In movie recommendation, for instance, it must predict not only whether a user will buy a movie but also how the user will rate it. Multi-task learning models have therefore become an active research topic.
1.2 Dataset
- Example demo of running the model with the census-income dataset from UCI
- This dataset is the same one in Section 6.3 of the paper
1.3 Network structure
1.4 Results
1.5 Multitasking code
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

# Two inputs (order features, text features) feed three output heads.
model = Model(inputs=[inputOrdInfo, inputTextInfo],
              outputs=[output, outputTextInfo, outputOrdInfo])
# Alternatives tried: Adam(learning_rate=0.005), Adam with an ExponentialDecay schedule, plain SGD.
rmsprop = RMSprop(learning_rate=0.005)  # `lr` is the deprecated spelling of this argument
# One binary cross-entropy loss per head, combined with fixed weights
# (0.2, 0.8, 0.001 was also tried).
model.compile(optimizer=rmsprop, loss='binary_crossentropy', metrics=None,
              loss_weights=[0.19, 0.8, 0.01])
# Both callbacks monitor the validation loss of the head named 'output'.
# Note: checkpoint is defined but not passed to fit() below, so no weights are saved.
checkpoint = ModelCheckpoint('./bestModel.h5', monitor='val_output_loss', verbose=0,
                             save_best_only=True, mode='min', save_weights_only=True)
earlystopping = EarlyStopping(monitor='val_output_loss', min_delta=0, patience=2,
                              verbose=0, mode='min', baseline=None)
# All three heads are trained against the same labels and the same sample weights.
model.fit([X_train_order, X_train_text], [y_train, y_train, y_train],
          batch_size=64, epochs=15,
          validation_data=([X_val_ord_info, X_val_text], [y_val, y_val, y_val]),
          shuffle=True, callbacks=[earlystopping],
          sample_weight=[W_train, W_train, W_train])
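To make the role of loss_weights concrete, here is a minimal NumPy sketch of how per-head binary cross-entropy losses combine into one weighted objective. The helper names and the prediction arrays are made up for illustration; Keras performs an equivalent reduction internally, and sample weighting is left out here.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy for one output head."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(y_pred)
                           + (1.0 - y_true) * np.log(1.0 - y_pred))))

def weighted_multitask_loss(y_true, head_preds, loss_weights):
    """Weighted sum of per-head losses; every head shares the same labels here."""
    per_head = [binary_crossentropy(y_true, p) for p in head_preds]
    total = sum(w * l for w, l in zip(loss_weights, per_head))
    return total, per_head

# Hypothetical labels and predictions from three heads.
y = np.array([1.0, 0.0, 1.0, 0.0])
preds = [np.array([0.9, 0.2, 0.7, 0.1]),
         np.array([0.6, 0.4, 0.6, 0.4]),
         np.array([0.5, 0.5, 0.5, 0.5])]
total, per_head = weighted_multitask_loss(y, preds, loss_weights=[0.19, 0.8, 0.01])
print(per_head, total)
```

With weights such as 0.19 / 0.8 / 0.01, the second head dominates the gradient while the third contributes almost nothing, so the weights effectively decide which task the shared layers serve.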
2 Causal inference using a multi-task approach
This section applies multi-task learning to learning causal relationships, drawing in particular on the multi-task architectures studied for recommendation systems, with supplementary notes where needed.
2.1 DRNet
Learning Counterfactual Representations for Estimating Individual Dose-Response Curves
- The L1 base layers are trained on all samples, while the L2 treatment layers are trained only on the samples that received the corresponding treatment.
- It can handle more complex intervention scenarios that mix discrete and continuous interventions: a separate head network is learned for each combination of interventions.
- An easy-to-understand case: we want to test the effect of different drugs on different patients. t = 0 to k-1 index the patient groups: t = 0 is the normal group, and t = 1 to k-1 are the diabetic group, the hypertensive group, and other patient groups. The drug dosage level m takes the values a, b, c for low, medium, and high dose, and a separate head network is learned for each combination of t and m. Each treatment layer is further subdivided into E head layers (only the set of E = 3 head layers for the t = 0 treatment is shown in the figure).
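The routing described in the bullets above can be sketched as a lookup from (treatment t, dosage) to a head index. This is only an illustration: the function name select_head, the normalized dosage range [0, 1], and the equal-width strata are assumptions made here, with E = 3 matching the low/medium/high example.

```python
def select_head(t, dosage, num_treatments, num_strata=3, low=0.0, high=1.0):
    """Return (treatment index, head index) for one sample.

    The dosage range [low, high] is split into `num_strata` equal-width
    intervals; each interval owns one head under treatment branch t.
    """
    if not 0 <= t < num_treatments:
        raise ValueError("unknown treatment index")
    if not low <= dosage <= high:
        raise ValueError("dosage outside the supported range")
    width = (high - low) / num_strata
    # The upper boundary of the range belongs to the last stratum.
    head = min(int((dosage - low) / width), num_strata - 1)
    return t, head

labels = ["low", "medium", "high"]
for t, dose in [(0, 0.10), (0, 0.50), (2, 1.00)]:
    branch, head = select_head(t, dose, num_treatments=4)
    print(f"t={branch}, dosage={dose:.2f} -> head {head} ({labels[head]})")
```

During training, only the selected head (plus its treatment layers and the shared base layers) would receive gradients for that sample.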
2.2 Dragonnet
Adapting Neural Networks for the Estimation of Treatment Effects
- Dragonnet (learns non-linear relationships): a two-stage method that first learns a representation model and then learns the inference model.
If the propensity-score head is removed, the architecture is the same as TARNet, and the paper later compares against that variant. The propensity term in the loss leads the network to automatically down-weight features that are weakly related to g(x), which helps feature selection. The paper then introduces targeted regularization to improve the loss further.
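As a rough illustration of the loss discussed above, the sketch below combines a squared-error term on the observed outcome with a cross-entropy term for the propensity head g(x). The function name, the weight alpha, and all array values are placeholders chosen here, and targeted regularization is not included. Setting alpha = 0 drops the propensity term, which corresponds to a TARNet-style objective.

```python
import numpy as np

def dragonnet_style_loss(y, t, q0, q1, g, alpha=1.0, eps=1e-7):
    """Outcome loss on the factual head plus alpha * propensity cross-entropy.

    y: observed outcomes, t: binary treatment indicators,
    q0/q1: predicted outcomes under control/treatment, g: predicted P(T=1 | x).
    """
    y, t, q0, q1, g = (np.asarray(a, dtype=float) for a in (y, t, q0, q1, g))
    # Only the head matching the received treatment is compared with y.
    q_factual = t * q1 + (1.0 - t) * q0
    outcome_loss = np.mean((q_factual - y) ** 2)
    g = np.clip(g, eps, 1.0 - eps)
    propensity_loss = np.mean(-(t * np.log(g) + (1.0 - t) * np.log(1.0 - g)))
    return float(outcome_loss + alpha * propensity_loss)

# Hypothetical batch of two samples: one treated, one control.
y, t = [1.0, 0.0], [1, 0]
q0, q1, g = [0.5, 0.2], [0.8, 0.9], [0.5, 0.5]
with_propensity = dragonnet_style_loss(y, t, q0, q1, g, alpha=1.0)
outcome_only = dragonnet_style_loss(y, t, q0, q1, g, alpha=0.0)
print(with_propensity, outcome_only)
```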
2.3 Deep counterfactual networks with propensity-dropout
Abstract: We propose a new method for inferring individualized causal effects of treatments (interventions) from observational data. Our approach conceptualizes causal inference as a multitask learning problem: we use a deep multitask network with a set of layers shared between the factual and counterfactual outcomes, and a set of outcome-specific layers. The effect of selection bias in the observational data is mitigated by a propensity-dropout regularization scheme, in which the network is thinned for each training example with a dropout probability that depends on the associated propensity score. The network is trained in alternating phases; in each phase we use the training examples of one of the two potential outcomes (treated and control populations) to update the weights of the shared layers and the respective outcome-specific layers. Experiments based on data from a real-world observational study show that our algorithm outperforms state-of-the-art methods.
- The model follows a multi-objective design and puts the Treatment and Control samples in the same model, which reduces model redundancy.
- The left part is the multi-objective framework: Treatment and Control samples pass through shared layers and then through their own independent layers, which learn the Treatment model and the Control model.
- The Propensity Network on the right mainly controls the complexity of the left model. If the two groups are easy to separate, it generates a propensity-dropout rate that makes the left model simpler; if they are hard to separate, the left model is allowed to be more complex.
- During training, the Treatment and Control samples are trained separately: on odd-numbered iterations the Treatment samples are used, and on even-numbered iterations the Control samples are used.
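The link between propensity and model complexity can be sketched as a function from the propensity score to a dropout probability. The sketch assumes the dropout probability has the form 1 - gamma/2 - H(p)/2, where H is the Shannon entropy of the propensity score p; the base-2 logarithm and gamma = 1 are assumptions made here so that the result stays in [0, 0.5].

```python
import math

def shannon_entropy(p):
    """Binary Shannon entropy in bits; taken as 0 at p = 0 or p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def propensity_dropout(p, gamma=1.0):
    """Dropout probability: high when treatment assignment is nearly deterministic."""
    return 1.0 - gamma / 2.0 - shannon_entropy(p) / 2.0

for p in (0.5, 0.9, 0.99):
    print(f"propensity={p:.2f} -> dropout={propensity_dropout(p):.3f}")
```

A sample whose group membership is obvious from its covariates gets a heavily thinned (simpler) network, while a sample with propensity near 0.5 is trained with little or no dropout.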
network:
If a parameter has requires_grad=False but is still registered with the optimizer, it receives no gradient, so the optimizer skips it, and no error is raised. For example:
network.hidden1_Y1.weight.requires_grad = False
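The note above can be checked with a small, self-contained PyTorch experiment. The three nn.Linear layers are toy stand-ins invented here (they are not the DCN class used below), and SGD is chosen only to keep the update easy to reason about.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Toy stand-ins for a shared layer and two outcome-specific heads.
shared = nn.Linear(4, 3)
head_y1 = nn.Linear(3, 1)
head_y0 = nn.Linear(3, 1)

# Every parameter is registered with the optimizer, frozen or not.
params = [p for m in (shared, head_y1, head_y0) for p in m.parameters()]
optimizer = optim.SGD(params, lr=0.1)

# Freeze the control head, as in a "treated" training phase.
for p in head_y0.parameters():
    p.requires_grad = False

y0_before = head_y0.weight.detach().clone()
y1_before = head_y1.weight.detach().clone()

x = torch.randn(8, 4)
target = torch.randn(8, 1)
h = torch.tanh(shared(x))
# Both heads appear in the loss, but only head_y1 and shared receive gradients.
loss = nn.functional.mse_loss(head_y1(h) - head_y0(h), target)

optimizer.zero_grad()
loss.backward()
optimizer.step()  # no error; parameters without a gradient are skipped

frozen_unchanged = torch.equal(head_y0.weight, y0_before)
trained_changed = not torch.equal(head_y1.weight, y1_before)
print(frozen_unchanged, trained_changed)
```

One caveat worth checking in your own setup: optimizers skip a parameter only when its .grad is None. If a frozen parameter still holds a zero-filled gradient from an earlier phase (for example after zero_grad(set_to_none=False)), stateful optimizers such as Adam can keep moving it from their running averages.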
import torch
import torch.nn as nn
import torch.optim as optim

from DCN import DCN


class DCN_network:
    def train(self, train_parameters, device):
        epochs = train_parameters["epochs"]
        treated_batch_size = train_parameters["treated_batch_size"]
        control_batch_size = train_parameters["control_batch_size"]
        lr = train_parameters["lr"]
        shuffle = train_parameters["shuffle"]
        model_save_path = train_parameters["model_save_path"].format(epochs, lr)
        treated_set = train_parameters["treated_set"]
        control_set = train_parameters["control_set"]

        print("Saved model path: {0}".format(model_save_path))

        treated_data_loader = torch.utils.data.DataLoader(treated_set,
                                                          batch_size=treated_batch_size,
                                                          shuffle=shuffle,
                                                          num_workers=1)
        control_data_loader = torch.utils.data.DataLoader(control_set,
                                                          batch_size=control_batch_size,
                                                          shuffle=shuffle,
                                                          num_workers=1)

        network = DCN(training_flag=True).to(device)
        optimizer = optim.Adam(network.parameters(), lr=lr)
        lossF = nn.MSELoss()
        min_loss = 100000.0
        dataset_loss = 0.0
        print(".. Training started ..")
        print(device)
        for epoch in range(epochs):
            network.train()
            total_loss = 0
            train_set_size = 0

            if epoch % 2 == 0:
                dataset_loss = 0
                # train treated: unfreeze the Y1 head, freeze the Y0 head
                network.hidden1_Y1.weight.requires_grad = True
                network.hidden1_Y1.bias.requires_grad = True
                network.hidden2_Y1.weight.requires_grad = True
                network.hidden2_Y1.bias.requires_grad = True
                network.out_Y1.weight.requires_grad = True
                network.out_Y1.bias.requires_grad = True

                network.hidden1_Y0.weight.requires_grad = False
                network.hidden1_Y0.bias.requires_grad = False
                network.hidden2_Y0.weight.requires_grad = False
                network.hidden2_Y0.bias.requires_grad = False
                network.out_Y0.weight.requires_grad = False
                network.out_Y0.bias.requires_grad = False

                for batch in treated_data_loader:
                    covariates_X, ps_score, y_f, y_cf = batch
                    covariates_X = covariates_X.to(device)
                    ps_score = ps_score.squeeze().to(device)

                    train_set_size += covariates_X.size(0)
                    treatment_pred = network(covariates_X, ps_score)
                    # treatment_pred[0] -> y1
                    # treatment_pred[1] -> y0
                    predicted_ITE = treatment_pred[0] - treatment_pred[1]
                    # treated samples: the factual outcome is y1, the counterfactual is y0
                    true_ITE = y_f - y_cf
                    # predictions already live on `device`; move the target there too
                    loss = lossF(predicted_ITE.float(),
                                 true_ITE.float().to(device))
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()
                    total_loss += loss.item()
                dataset_loss = total_loss
            elif epoch % 2 == 1:
                # train controlled: freeze the Y1 head, unfreeze the Y0 head
                network.hidden1_Y1.weight.requires_grad = False
                network.hidden1_Y1.bias.requires_grad = False
                network.hidden2_Y1.weight.requires_grad = False
                network.hidden2_Y1.bias.requires_grad = False
                network.out_Y1.weight.requires_grad = False
                network.out_Y1.bias.requires_grad = False

                network.hidden1_Y0.weight.requires_grad = True
                network.hidden1_Y0.bias.requires_grad = True
                network.hidden2_Y0.weight.requires_grad = True
                network.hidden2_Y0.bias.requires_grad = True
                network.out_Y0.weight.requires_grad = True
                network.out_Y0.bias.requires_grad = True

                for batch in control_data_loader:
                    covariates_X, ps_score, y_f, y_cf = batch
                    covariates_X = covariates_X.to(device)
                    ps_score = ps_score.squeeze().to(device)

                    train_set_size += covariates_X.size(0)
                    treatment_pred = network(covariates_X, ps_score)
                    # treatment_pred[0] -> y1
                    # treatment_pred[1] -> y0
                    predicted_ITE = treatment_pred[0] - treatment_pred[1]
                    # control samples: the factual outcome is y0, the counterfactual is y1
                    true_ITE = y_cf - y_f
                    # predictions already live on `device`; move the target there too
                    loss = lossF(predicted_ITE.float(),
                                 true_ITE.float().to(device))
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()
                    total_loss += loss.item()
                dataset_loss = dataset_loss + total_loss

            print("epoch: {0}, train_set_size: {1} loss: {2}".
                  format(epoch, train_set_size, total_loss))
            if epoch % 2 == 1:
                print("Treated + Control loss: {0}".format(dataset_loss))
                # if dataset_loss < min_loss:
                #     print("Current loss: {0}, over previous: {1}, Saving model".
                #           format(dataset_loss, min_loss))
                #     min_loss = dataset_loss
                #     torch.save(network.state_dict(), model_save_path)

        torch.save(network.state_dict(), model_save_path)
    @staticmethod
    def eval(eval_parameters, device):
        print(".. Evaluation started ..")
        treated_set = eval_parameters["treated_set"]
        control_set = eval_parameters["control_set"]
        model_path = eval_parameters["model_save_path"]
        network = DCN(training_flag=False).to(device)
        network.load_state_dict(torch.load(model_path, map_location=device))
        network.eval()

        # Default batch_size=1, so diff.item() below is valid for each batch.
        treated_data_loader = torch.utils.data.DataLoader(treated_set,
                                                          shuffle=False, num_workers=1)
        control_data_loader = torch.utils.data.DataLoader(control_set,
                                                          shuffle=False, num_workers=1)

        err_treated_list = []
        err_control_list = []

        for batch in treated_data_loader:
            covariates_X, ps_score, y_f, y_cf = batch
            covariates_X = covariates_X.to(device)
            ps_score = ps_score.squeeze().to(device)
            treatment_pred = network(covariates_X, ps_score)

            predicted_ITE = treatment_pred[0] - treatment_pred[1]
            true_ITE = y_f - y_cf
            # predictions already live on `device`; move the target there too
            diff = true_ITE.float().to(device) - predicted_ITE.float()
            err_treated_list.append(diff.item())

        for batch in control_data_loader:
            covariates_X, ps_score, y_f, y_cf = batch
            covariates_X = covariates_X.to(device)
            ps_score = ps_score.squeeze().to(device)
            treatment_pred = network(covariates_X, ps_score)

            predicted_ITE = treatment_pred[0] - treatment_pred[1]
            true_ITE = y_cf - y_f
            diff = true_ITE.float().to(device) - predicted_ITE.float()
            err_control_list.append(diff.item())

        # print(err_treated_list)
        # print(err_control_list)
        return {
            "treated_err": err_treated_list,
            "control_err": err_control_list,
        }
We refer to our potential outcomes model as a Deep Counterfactual Network (DCN), and we use the acronym DCN-pd to refer to a DCN with propensity-dropout regularization. Since our model captures both the propensity scores and the outcomes, it is a doubly-robust model.
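The per-sample errors returned by eval can be summarized into a single number. The sketch below computes the root of the mean squared ITE error over both groups (often reported as the square root of PEHE); the function name root_pehe and the sample values are illustrative only.

```python
import math

def root_pehe(results):
    """sqrt(mean((true ITE - predicted ITE)^2)) over treated and control samples."""
    errors = list(results["treated_err"]) + list(results["control_err"])
    if not errors:
        raise ValueError("no errors to summarize")
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical output of eval(): signed ITE errors per sample.
example = {"treated_err": [0.5, -0.5], "control_err": [1.0, -1.0]}
print(root_pehe(example))
```

Note that this metric needs the true ITE, so it is only available on (semi-)synthetic benchmarks where counterfactual outcomes are known.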
2.4 VCNet
@article{LizhenNie2021VCNetAF,
  title={VCNet and Functional Targeted Regularization For Learning Causal Effects of Continuous Treatments},
  author={Lizhen Nie and Mao Ye and Qiang Liu and Dan L. Nicolae},
  journal={arXiv: Learning},
  year={2021}
}
reference:
- DCN (Deep Cross Network) trilogy - Zhihu
- Causal reasoning in practice (1) - learning task rules from teaching with the help of causal relationship-Knowledge
- Popular interpretation of causal reasoning causal inference bzdww
- AB experiment high-end gameplay series 1 - Let's take a look
- Collection | Talking about Multi-task Learning (Multi-task Learning) bzdww
- Application Exploration and Case Sharing of Multi-task Learning in Risk Control Scenarios-Knowledge
- Keras-mmoe/census_income_demo.py at master Drawbridge/Keras-mmoe GitHub
- Multi-objective modeling (1) bzdww
- Recommender System (8) - Summary of Multi-objective Optimization Application_1 - Deep Machine Learning - Cnblogs
- Multi-task Learning Applied to Causal Modeling - Programmer Sought
- Deep learning [22] Mxnet multi-task (multi-task) training_DCD_Lin's blog-CSDN blog_Multi-task training
- What are some good practices for causal inference in multi-task optimization scenarios? - Zhihu
- https://huaweicloud.csdn.net/63802f23dacf622b8df8639e.html
- When neural network training multi-task learning (MTL), how to assign weights to multiple losses (with code)_Neural network multi-task training_Ciao112's blog-CSDN blog
question:
1. In multi-target training with unique X_T_Y samples, the model expects x with [y1, y2], but a row only provides x with [y1, null]. How should it be trained?
Answer: Train so that the parameters tied to the missing label are not updated.
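One way to read the answer above: when a row has no label for one target, mask that target out of the loss so the missing label contributes nothing to the gradient of its head. The NumPy sketch below marks a missing label with NaN; the function name and the data are hypothetical.

```python
import numpy as np

def masked_mse_per_task(y_true, y_pred):
    """Per-task MSE that ignores entries whose label is NaN (missing)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = ~np.isnan(y_true)                      # True where a label exists
    sq_err = np.where(mask, (y_pred - np.nan_to_num(y_true)) ** 2, 0.0)
    counts = np.maximum(mask.sum(axis=0), 1)      # avoid division by zero
    return sq_err.sum(axis=0) / counts

# Columns are targets y1, y2; the second row has no y2 label.
y_true = [[1.0, 0.0],
          [0.0, np.nan]]
y_pred = [[0.5, 0.5],
          [0.5, 0.9]]
losses = masked_mse_per_task(y_true, y_pred)
print(losses)
```

In Keras, passing a per-output sample_weight of 0 for such rows has a similar effect.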
3 Thoughts
1. Use causal inference to correct the correlation model. What is the relationship between causal inference and machine learning?
2. What problems do causal inference and machine learning solve?
Answer: Machine learning solves prediction problems without needing to know the cause; causal inference starts from the cause and predicts the result of acting on it.
3. What problems cannot be solved by machine learning?
Answer: Machine learning does not intervene and does not explore causes, so it cannot answer questions of cause and effect.
4. What problems cannot be solved by causal inference?
5. Is the non-intervention problem still a causal problem?
Answer: I don't understand.
6. How to solve the deviation?