Install environment
1. GPU environment
CUDA 10.1
2. Create a new conda environment
conda create -n imagemol python=3.7.3
conda activate imagemol
3. Install the required packages
conda install -c rdkit rdkit
Windows:
- pip install https://download.pytorch.org/whl/cu101/torch-1.4.0-cp37-cp37m-win_amd64.whl
- pip install https://download.pytorch.org/whl/cu101/torchvision-0.5.0-cp37-cp37m-win_amd64.whl
Linux:
- pip install https://download.pytorch.org/whl/cu101/torch-1.4.0-cp37-cp37m-linux_x86_64.whl
- pip install https://download.pytorch.org/whl/cu101/torchvision-0.5.0-cp37-cp37m-linux_x86_64.whl
pip install torch-cluster torch-scatter torch-sparse torch-spline-conv -f https://pytorch-geometric.com/whl/torch-1.4.0%2Bcu101.html
pip install -r requirements.txt
source activate imagemol
Pretraining
1. Prepare the dataset
Download the pretraining data and place it in ./datasets/pretraining/data/
Preprocess the dataset (this step takes a long time):
python ./data_process/smiles2img_pretrain.py --dataroot ./datasets/pretraining/ --dataset data
Note: You can find the toy dataset in ./datasets/toy/pretraining/
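The preprocessing step renders each SMILES string as a 2D molecule image. A minimal sketch of that idea using RDKit (the 224x224 size and default drawing options are assumptions here; see smiles2img_pretrain.py for the exact settings used):

```python
from rdkit import Chem
from rdkit.Chem import Draw

# Parse one SMILES string and render it to an RGB image.
# The 224x224 size is an assumption, not necessarily what ImageMol uses.
smiles = "CC(=O)Oc1ccccc1C(=O)O"  # aspirin
mol = Chem.MolFromSmiles(smiles)
img = Draw.MolToImage(mol, size=(224, 224))
```

The real script applies this conversion to every SMILES in the pretraining file, which is why preprocessing is slow on large datasets.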
2. Start pretraining
Usage:
usage: pretrain.py [-h] [--lr LR] [--wd WD] [--workers WORKERS] [--val_workers VAL_WORKERS] [--epochs EPOCHS] [--start_epoch START_EPOCH] [--batch BATCH] [--momentum MOMENTUM] [--checkpoints CHECKPOINTS] [--seed SEED] [--dataroot DATAROOT] [--dataset DATASET] [--ckpt_dir CKPT_DIR] [--modelname {ResNet18}] [--verbose] [--ngpu NGPU] [--gpu GPU] [--nc NC] [--ndf NDF] [--imageSize IMAGESIZE] [--Jigsaw_lambda JIGSAW_LAMBDA] [--cluster_lambda CLUSTER_LAMBDA] [--constractive_lambda CONSTRACTIVE_LAMBDA] [--matcher_lambda MATCHER_LAMBDA] [--is_recover_training IS_RECOVER_TRAINING] [--cl_mask_type {random_mask,rectangle_mask,mix_mask}] [--cl_mask_shape_h CL_MASK_SHAPE_H] [--cl_mask_shape_w CL_MASK_SHAPE_W] [--cl_mask_ratio CL_MASK_RATIO]
Code to pretrain:
python pretrain.py --ckpt_dir ./ckpts/pretraining/ \
                   --checkpoints 1 \
                   --Jigsaw_lambda 1 \
                   --cluster_lambda 1 \
                   --constractive_lambda 1 \
                   --matcher_lambda 1 \
                   --is_recover_training 1 \
                   --batch 256 \
                   --dataroot ./datasets/pretraining/ \
                   --dataset data \
                   --gpu 0,1,2,3 \
                   --ngpu 4
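--Jigsaw_lambda weights a jigsaw pretext loss, in which patches of the molecule image are permuted and the model must predict the permutation. A minimal NumPy sketch of the patch-shuffling idea (the grid size and permutation handling are illustrative assumptions, not ImageMol's exact implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def jigsaw_shuffle(img, grid=2):
    """Split a square image into grid x grid patches and permute them.
    Returns the shuffled image and the permutation (the pretext label)."""
    h, w, c = img.shape
    ph, pw = h // grid, w // grid
    patches = [img[i*ph:(i+1)*ph, j*pw:(j+1)*pw]
               for i in range(grid) for j in range(grid)]
    perm = rng.permutation(grid * grid)
    shuffled = [patches[k] for k in perm]
    # Reassemble the permuted patches row by row.
    rows = [np.concatenate(shuffled[r*grid:(r+1)*grid], axis=1)
            for r in range(grid)]
    return np.concatenate(rows, axis=0), perm

img = rng.random((224, 224, 3)).astype(np.float32)
shuffled, perm = jigsaw_shuffle(img)
```

During pretraining, `perm` serves as the classification target for the jigsaw head, so the loss can be weighted by --Jigsaw_lambda.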
For testing, you can pre-train ImageMol on the toy dataset using a single GPU:
python pretrain.py --ckpt_dir ./ckpts/pretraining-toy/ \
                   --checkpoints 1 \
                   --Jigsaw_lambda 1 \
                   --cluster_lambda 1 \
                   --constractive_lambda 1 \
                   --matcher_lambda 1 \
                   --is_recover_training 1 \
                   --batch 16 \
                   --dataroot ./datasets/toy/pretraining/ \
                   --dataset data \
                   --gpu 0 \
                   --ngpu 1
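The --cl_mask_type, --cl_mask_shape_h, --cl_mask_shape_w, and --cl_mask_ratio options control how image regions are masked for the contrastive branch. A minimal NumPy sketch of the rectangle_mask variant (the random placement and zero fill value are assumptions made for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def rectangle_mask(img, mask_h=16, mask_w=16):
    """Zero out one randomly placed mask_h x mask_w rectangle,
    analogous to the rectangle_mask option for the contrastive view."""
    out = img.copy()
    h, w = img.shape[:2]
    top = rng.integers(0, h - mask_h + 1)
    left = rng.integers(0, w - mask_w + 1)
    out[top:top+mask_h, left:left+mask_w] = 0.0
    return out

img = np.ones((224, 224, 3), dtype=np.float32)
masked = rectangle_mask(img)
```

random_mask and mix_mask would vary the masked region's shape; the masked image and the original then form the two views for the contrastive loss weighted by --constractive_lambda.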