This tutorial introduces the Atomic Convolutional model. We will look at the structure of the Atomic Conv model and write some code to run Atomic Convolutions.
Structure
ACNNs directly exploit the local three-dimensional structure of molecules to hierarchically learn more complex chemical features, optimizing both the model and the featurization process simultaneously in an end-to-end fashion.
The atom type convolution uses a neighbor-listed distance matrix to extract features encoding local chemical environments from an input representation (Cartesian atomic coordinates), without depending on absolute spatial position. The following methods are used to build the ACNN architecture:
Distance matrix. The distance matrix R is constructed from the Cartesian atomic coordinates X. It computes distances from the distance tensor D. The distance matrix construction accepts as input an (N, 3) coordinate matrix C, which is "neighbor listed" into an (N, M) matrix R.
import tensorflow as tf

def distance_matrix(D):
    R = tf.reduce_sum(tf.multiply(D, D), 3)  # D: distance tensor
    R = tf.sqrt(R)  # R: distance matrix
    return R
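The TensorFlow snippet above reduces over axis 3, which suggests D carries a leading batch dimension. As an illustrative NumPy sketch (the toy coordinates and neighbor list below are made up, and the batch dimension is dropped), the same construction starting from an (N, 3) coordinate matrix C looks like:

```python
import numpy as np

# Toy (N, 3) coordinate matrix C for 3 atoms; here each atom treats
# the other two atoms as its M = 2 neighbors.
C = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])
neighbors = np.array([[1, 2], [0, 2], [0, 1]])  # (N, M) neighbor list

# D: (N, M, 3) distance tensor of displacement vectors to each neighbor
D = C[neighbors] - C[:, None, :]

# Same reduction as the TensorFlow snippet, over the last axis here
R = np.sqrt(np.sum(D * D, axis=2))  # (N, M) distance matrix
```

For the toy coordinates, R[0] gives atom 0's distances to its two neighbors, 1.0 and 2.0.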
Atom type convolution. The atom type convolution is constructed from the distance matrix R and the atomic number matrix Z. The matrix R is fed into a (1x1) filter with stride 1 and depth Na, where Na is the number of unique atom types present in the molecular system. The atom type convolution kernel is a step function that operates on the neighbor distance matrix R.
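Read as a step function, the kernel copies a neighbor distance into channel a when that neighbor's atomic number matches atom type a, and writes zero otherwise. A minimal NumPy sketch (not DeepChem's implementation; the function name and toy data are illustrative):

```python
import numpy as np

def atom_type_convolution(R, Z, atom_types):
    """Step-function atom type convolution.
    R: (N, M) neighbor distance matrix.
    Z: (N, M) atomic numbers of each atom's neighbors.
    Returns E of shape (N, M, Na) where
    E[i, j, a] = R[i, j] if Z[i, j] == atom_types[a], else 0."""
    Na = len(atom_types)
    E = np.zeros(R.shape + (Na,))
    for a, t in enumerate(atom_types):
        E[:, :, a] = R * (Z == t)  # gate distances by atom type match
    return E

# Toy example: 2 atoms, M = 2 neighbors plus padding, types H (1) and C (6)
R = np.array([[1.0, 1.5, 2.0], [1.2, 1.8, 2.5]])
Z = np.array([[1, 6, 1], [6, 6, 1]])
E = atom_type_convolution(R, Z, atom_types=[1, 6])
```

Each of the Na output channels now holds only the distances to neighbors of one atom type, which is what the radial pooling layer consumes next.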
Radial pooling layer. Radial pooling is a dimensionality reduction process that down-samples the output of the atom type convolution. The reduction prevents overfitting by providing an abstracted representation through feature binning, and by reducing the number of learned parameters. Mathematically, the radial pooling layer pools over tensor slices (receptive fields) of size (1xMx1) with stride 1 and depth Nr, where Nr is the number of desired radial filters.
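One common choice of radial filter, in the spirit of the symmetry functions used by the ACNN paper, is a Gaussian centered at a learnable radius r_s with a cosine cutoff. The sketch below is illustrative, not DeepChem's code; the filter form, parameter names, and toy values are assumptions:

```python
import numpy as np

def radial_pool(E, r_s, sigma_s, cutoff):
    """Pool the (N, M, Na) atom-type convolution output over the M neighbors.
    r_s, sigma_s: per-filter center and width, each of length Nr.
    Returns P of shape (N, Na, Nr)."""
    N, M, Na = E.shape
    Nr = len(r_s)
    P = np.zeros((N, Na, Nr))
    for s in range(Nr):
        # Cosine cutoff: smoothly zero outside [0, cutoff); zeros stay zero
        fc = np.where((E > 0) & (E < cutoff),
                      0.5 * np.cos(np.pi * E / cutoff) + 0.5, 0.0)
        fs = np.exp(-((E - r_s[s]) ** 2) / sigma_s[s] ** 2) * fc
        P[:, :, s] = fs.sum(axis=1)  # pool (sum) over the M neighbors
    return P

# Toy input: 2 atoms, 3 neighbors, 2 atom types, all distances 1.0
E = np.ones((2, 3, 2))
P = radial_pool(E, r_s=[1.0], sigma_s=[1.0], cutoff=4.0)
```

Because the filter responds only to distances near r_s, each of the Nr output channels bins neighbors into a radial shell, which is the "feature binning" the text refers to.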
Atomistic fully connected network. Atomic convolution layers are stacked by feeding the flattened output of the radial pooling layer into the atom type convolution operation. Finally, we feed the tensor row-wise (one row per atom) into a fully connected network. The same fully connected weights and biases are used for every atom in a given molecule.
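The per-atom weight sharing can be sketched in NumPy as follows (all sizes and the random weights are hypothetical; a real model would learn them):

```python
import numpy as np

rng = np.random.default_rng(0)
N, Na, Nr, hidden = 4, 2, 3, 8
P = rng.random((N, Na, Nr))        # radial pooling output, one slice per atom
X = P.reshape(N, Na * Nr)          # flatten: one feature row per atom

# The SAME weights and biases are applied to every atom (weight sharing)
W1 = 0.1 * rng.standard_normal((Na * Nr, hidden))
b1 = np.zeros(hidden)
W2 = 0.1 * rng.standard_normal((hidden, 1))
b2 = np.zeros(1)

h = np.maximum(X @ W1 + b1, 0.0)   # shared hidden layer with ReLU
atomic_out = h @ W2 + b2           # one scalar contribution per atom
molecular_out = atomic_out.sum()   # molecule-level prediction: sum over atoms
```

Summing per-atom outputs makes the prediction invariant to atom ordering, which is why the same network can be applied row-wise to every atom.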
Now that we have seen the structure of ACNNs, we will dive into the model, see how to train it, and see what output to expect.
For training purposes, we use the publicly available PDBbind dataset. In this example, every row reflects a protein-ligand complex, with the following columns: a unique complex identifier; the SMILES string of the ligand; the binding affinity (Ki) of the ligand to the protein in the complex; a list of all lines in the protein's PDB file; and a list of all lines in the ligand's PDB file.
In [1]:
!curl -Lo conda_installer.py https://raw.githubusercontent.com/deepchem/deepchem/master/scripts/colab_install.py
import conda_installer
conda_installer.install()
!/root/miniconda/bin/conda info -e
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 3489 100 3489 0 0 27046 0 --:--:-- --:--:-- --:--:-- 27046
add /root/miniconda/lib/python3.6/site-packages to PYTHONPATH
python version: 3.6.9
fetching installer from https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
done
installing miniconda to /root/miniconda
done
installing rdkit, openmm, pdbfixer
added omnia to channels
added conda-forge to channels
done
conda packages installation finished!
# conda environments:
#
base * /root/miniconda
In [2]:
!pip install --pre deepchem
import deepchem
deepchem.__version__
Collecting deepchem
Downloading https://files.pythonhosted.org/packages/b5/d7/3ba15ec6f676ef4d93855d01e40cba75e231339e7d9ea403a2f53cabbab0/deepchem-2.4.0rc1.dev20200805054153.tar.gz (351kB)
|████████████████████████████████| 358kB 2.8MB/s
Requirement already satisfied: joblib in /usr/local/lib/python3.6/dist-packages (from deepchem) (0.16.0)
Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from deepchem) (1.18.5)
Requirement already satisfied: pandas in /usr/local/lib/python3.6/dist-packages (from deepchem) (1.0.5)
Requirement already satisfied: scikit-learn in /usr/local/lib/python3.6/dist-packages (from deepchem) (0.22.2.post1)
Requirement already satisfied: scipy in /usr/local/lib/python3.6/dist-packages (from deepchem) (1.4.1)
Requirement already satisfied: python-dateutil>=2.6.1 in /usr/local/lib/python3.6/dist-packages (from pandas->deepchem) (2.8.1)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.6/dist-packages (from pandas->deepchem) (2018.9)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.6/dist-packages (from python-dateutil>=2.6.1->pandas->deepchem) (1.15.0)
Building wheels for collected packages: deepchem
Building wheel for deepchem (setup.py) ... done
Created wheel for deepchem: filename=deepchem-2.4.0rc1.dev20200805144642-cp36-none-any.whl size=438624 sha256=7e5b9b5d387726c10af3665c3fabc3cf8955c98122717ba2e3ccdb016174e99e
Stored in directory: /root/.cache/pip/wheels/41/0f/fe/5f2659dc8e26624863654100f689d8f36cae7c872d2b310394
Successfully built deepchem
Installing collected packages: deepchem
Successfully installed deepchem-2.4.0rc1.dev20200805144642
Out[2]:
'2.4.0-rc1.dev'
In [3]:
import deepchem as dc
import os
from deepchem.utils import download_url
In [4]:
download_url("https://s3-us-west-1.amazonaws.com/deepchem.io/datasets/pdbbind_core_df.csv.gz")
data_dir = dc.utils.get_data_dir()
dataset_file = os.path.join(data_dir, "pdbbind_core_df.csv.gz")
raw_dataset = dc.utils.load_from_disk(dataset_file)
In [5]:
print("Type of dataset is: %s" % str(type(raw_dataset)))
print(raw_dataset[:5])
#print("Shape of dataset is: %s" % str(raw_dataset.shape))
Type of dataset is: <class 'pandas.core.frame.DataFrame'>
pdb_id ... label
0 2d3u ... 6.92
1 3cyx ... 8.00
2 3uo4 ... 6.52
3 1p1q ... 4.89
4 3ag9 ... 8.05
[5 rows x 7 columns]
Training the Model
Now that we have seen what our dataset looks like, let's go ahead and write some code to process it.
In [6]:
import numpy as np
import tensorflow as tf
TODO(rbharath): This tutorial still needs to be fleshed out.