Quantum-Lazy-Learning
Background
In recent years, quantum machine learning algorithms have emerged in large numbers. Most of them use tensor networks as a bridge and can tackle problems in computer vision, pattern recognition, natural language processing, and related fields [1-8]. One large family of these algorithms represents the probability distribution of the samples in a quantum state space in order to carry out generative or classification tasks [6]. Quantum lazy learning is one of them. It is not a popular quantum classification algorithm; it tends to appear in papers in various guises as a baseline that is then set aside, but its virtue is a simple form that is easy to understand. Section 23 of the Bilibili video course 《张量网络基础课程》 (Fundamentals of Tensor Networks) explains lazy learning in detail; it reaches a classification accuracy as high as 97% on MNIST. Interested readers can also consult the GTNC paper [6] mentioned in the video for further background. Its content is introduced in detail below.
Content
Lazy learning is based on the equal-probability assumption: every training sample appears with the same probability in the quantum state space. After the samples are mapped into the state space, they naturally form clusters. Taking MNIST as an example, the mapped samples form ten clusters in the Hilbert space. To classify a new image, we only need to compute the distance between the quantum state representing the sample and each digit cluster; an argmax then yields the predicted class (i.e., which digit cluster the sample belongs to). Instead of the usual Euclidean distance, the fidelity is used here to measure the similarity of two quantum states. For how different distance measures affect classification, see the GTNC paper [6] mentioned above.
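In formulas, the decision rule just described is simply (using the lazy states $\left|\varphi_k^{\mathrm{lazy}}\right\rangle$ defined below, one per cluster $k$):

$$\hat{k}(Y) = \underset{k}{\arg\max} \left| \left\langle Y \middle| \varphi_k^{\mathrm{lazy}} \right\rangle \right|^2$$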
For a set of images with $L$ pixels each (or a set of samples with $L$ features each), we assume that the joint probability distribution is described by a many-body state of $L$ qubits, denoted $|\psi\rangle$, which satisfies
$$\mathrm{P}\left(y_{1}, \ldots, y_{L}\right)=\left(\prod_{\otimes l=1}^{L}\left|\left\langle y_{l} \middle| \psi\right\rangle\right|\right)^{2}$$
Here $P(y_1, \ldots, y_L)$ is the probability, under this distribution, that the sample $Y=(y_1, \ldots, y_L)$ appears.
From this it is clear that, given only the training set, the state $\left|\varphi^{\mathrm{lazy}}\right\rangle$ can be computed directly through the feature map, without any training or update procedure, and $\left|\varphi^{\mathrm{lazy}}\right\rangle$ contains no variational parameters. Supervised classification done this way is therefore called quantum lazy learning. The only hyperparameter is the choice of the feature map, which can be $(x, 1-x)$, $(\sqrt{x}, \sqrt{1-x})$, or $\left(\sin(\frac{\pi}{2}x_l), \cos(\frac{\pi}{2}x_l)\right)$ [10], among others.
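As a concrete illustration, here is a minimal sketch of these pixel-wise feature maps, assuming grayscale pixels normalized to $[0, 1]$; the function name and the `kind` labels are ours, not from the original post:

```python
import numpy as np

def feature_map(x, kind='trig'):
    """Map pixels x in [0, 1] to 2-dimensional qubit amplitudes (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    if kind == 'linear':   # (x, 1 - x)
        return np.stack([x, 1.0 - x], axis=-1)
    if kind == 'sqrt':     # (sqrt(x), sqrt(1 - x)); l2-normalized for every x
        return np.stack([np.sqrt(x), np.sqrt(1.0 - x)], axis=-1)
    if kind == 'trig':     # (sin(pi/2 x), cos(pi/2 x)) [10]; used in the code below
        return np.stack([np.sin(0.5 * np.pi * x), np.cos(0.5 * np.pi * x)], axis=-1)
    raise ValueError("unknown feature map: " + kind)
```

Note that only the latter two maps produce a normalized single-qubit state for every pixel value; $(x, 1-x)$ is normalized only at $x \in \{0, 1\}$.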
Taking MNIST as an example, ten lazy states can be defined, one for each digit:
$$\left|\varphi_{k}^{\mathrm{lazy}}\right\rangle=\frac{1}{\sqrt{|\mathbb{X}_k|}} \sum_{X \in \mathbb{X}_{k}} \prod_{\otimes l=1}^{L}\left|x_{l}\right\rangle$$

where $\mathbb{X}_k$ is the set of training samples labeled $k$ and $|x_l\rangle$ is the mapped state of the $l$-th pixel of sample $X$.
Such a lazy state also satisfies the probability normalization condition, at least approximately, because distinct product states in this high-dimensional Hilbert space are almost orthogonal:
$$\left\langle\varphi_k^{\mathrm{lazy}} \middle| \varphi_k^{\mathrm{lazy}}\right\rangle=\frac{1}{|\mathbb{X}_k|} \sum_{X, X^{\prime} \in \mathbb{X}_k}\left\langle X \middle| X^{\prime}\right\rangle \approx \frac{1}{|\mathbb{X}_k|} \sum_{X, X^{\prime} \in \mathbb{X}_k} \delta_{X, X^{\prime}}=1$$
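This near-orthogonality is easy to check numerically for small $L$, where the full $2^L$-dimensional state still fits in memory. The sketch below is our own illustration: it builds a lazy state from distinct random binary toy "images" under the trigonometric map, for which different images give (numerically) orthogonal product states, so the norm comes out 1; for grayscale pixels the overlaps are only approximately zero.

```python
import numpy as np

rng = np.random.default_rng(0)
L, n = 16, 100                                     # small L: 2**L amplitudes fit in memory
X = np.unique(rng.integers(0, 2, (n, L)), axis=0)  # distinct binary toy "images"

def product_state(x):
    """|x> = |x_1> (x) ... (x) |x_L| under the trigonometric feature map."""
    state = np.ones(1)
    for p in x:
        state = np.kron(state, [np.sin(0.5 * np.pi * p), np.cos(0.5 * np.pi * p)])
    return state

lazy = sum(product_state(x) for x in X) / np.sqrt(len(X))
print(lazy @ lazy)  # ~1.0: distinct binary product states are (numerically) orthogonal
```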
Of course, because of the tensor product in these formulas, the representation of the quantum state is exponentially large: for MNIST one would need $2^{784}$ amplitudes (with a feature-map dimension of 2), which is clearly impossible on a classical computer. In practice, however, the inner product between a sample and the lazy state can be reduced to polynomial complexity, which is discussed further below.
Supplementary
Recall the expression of the lazy state given above:
$$\left|\varphi_{k}^{\mathrm{lazy}}\right\rangle=\frac{1}{\sqrt{|\mathbb{X}_k|}} \sum_{X \in \mathbb{X}_{k}} \prod_{\otimes l=1}^{L}\left|x_{l}\right\rangle$$
For a sample $|Y\rangle=\prod_{\otimes l=1}^{L}\left|y_{l}\right\rangle$, the probability (fidelity) of the sample with respect to the lazy state is
$$P_{k}(Y)=\left|\left\langle Y \middle| \varphi_{k}^{\mathrm{lazy}}\right\rangle\right|^{2}=\frac{1}{|\mathbb{X}_k|}\left|\sum_{X \in \mathbb{X}_{k}} \prod_{l=1}^{L}\left\langle y_{l} \middle| x_{l}\right\rangle\right|^{2}$$
Seen from this result, the rewriting indeed avoids the exponential cost $O(d^L)$ of representing the lazy state directly and reduces the computation to polynomial complexity $O(NLd)$. But it has a consequence: once the sample $Y$ and a training sample $X$ differ in even a single (binary) pixel, the product of overlaps is exactly 0. Even for grayscale images, multiplying hundreds of factors that are each at most 1 makes the result exponentially small, so the computed probability becomes numerically meaningless.
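A quick numerical check (our own illustration, using the trigonometric map on two random grayscale "images") shows how hopeless the naive product is at MNIST scale:

```python
import numpy as np

rng = np.random.default_rng(0)
L = 784                                    # MNIST-sized pixel count
y, x = rng.random(L), rng.random(L)        # two random grayscale "images" in [0, 1]

overlaps = np.cos(0.5 * np.pi * (y - x))   # per-pixel <y_l|x_l> under the trig map
print(np.prod(overlaps))                   # exponentially small in L (order 1e-70 here)
print(np.log10(overlaps + 1e-10).sum())    # the log-fidelity of [9] stays well-behaved
```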
This issue was resolved after consulting Prof. Shi-Ju Ran of Capital Normal University. In his recent work [9], the problem of exponentially small sample overlaps is analyzed afresh. The approach is to use the logarithmic fidelity, which turns the product into a sum, together with a small bias term $\epsilon$ that keeps the model numerically stable. The score above can then be replaced by
$$\tilde{P}_{k}(Y)=\frac{1}{|\mathbb{X}_k|} \sum_{X \in \mathbb{X}_{k}} \sum_{l=1}^{L} \log_{10}\left(\left\langle y_{l} \middle| x_{l}\right\rangle+\epsilon\right)$$
Here $\epsilon$ is a small number close to 0, introduced to avoid $\log 0$. Because the logarithm is monotonic, ranking the classes by this log-fidelity gives the same argmax while avoiding the exponential decay of the raw product, so the problem of exponentially vanishing probabilities is resolved. The next section shows the core code of lazy learning for reference; it follows the formula above.
Code
Below is an example that applies lazy learning to image classification on the MNIST dataset. Only the core code is given, with some comments added for reference.
```python
import numpy as np


def lazy_learning(train_images, test_images, mode='mapped', epsilon=1e-10):
    '''
    Params:
        train_images: (np.ndarray) 3rd- or 4th-order tensor of shape
            (n_class, n_samples, pixels) or (n_class, n_samples, pixels, map_dim),
            corresponding to the 'unmapped' and 'mapped' modes respectively.
        test_images: (np.ndarray) tensor of shape (n_class, n_test_samples, pixels)
            or (n_class, n_test_samples, pixels, map_dim).
        mode: (str) 'mapped' if the feature map has already been applied, else 'unmapped'.
        epsilon: (float) small bias keeping log10 finite when an overlap is 0; see [9].
    '''
    if mode == 'mapped':
        n_class, n_samples, pixels, _ = train_images.shape
    else:
        n_class, n_samples, pixels = train_images.shape
    n_test_samples = test_images.shape[1]

    for lb in range(n_class):  # traverse the test images of each true class
        predict = []
        for i in range(n_test_samples):
            sample = test_images[lb, i]  # one test image from the test set
            fidelity = []
            for j in range(n_class):  # score the sample against every lazy state
                contracted = 0.0
                for t in range(n_samples):  # accumulate log-fidelities over the training set
                    if mode == 'mapped':
                        # per-pixel overlaps <y_l|x_l> between mapped feature vectors
                        inner_res = np.einsum('pd,pd->p', sample, train_images[j, t])
                    else:
                        # with the trig map, <y_l|x_l> = |cos(pi/2 (y_l - x_l))|;
                        # see arXiv:2107.00195 for details
                        inner_res = np.abs(np.cos((np.pi / 2) * (sample - train_images[j, t])))
                    contracted += np.log10(inner_res + epsilon).sum()
                f_c = contracted / float(n_samples)  # average log-fidelity w.r.t. class j
                fidelity.append(f_c)  # the (log-)probability of the sample per class
            label = int(np.argmax(fidelity))
            predict.append(label)
        predict = np.array(predict)
        print("For digit {0}: {1} test samples, {2} predicted correctly.".format(
            lb, n_test_samples, int(np.sum(predict == lb))))
```
Reference
- [1] Zhaoyu Han, Jun Wang, Heng Fan, Lei Wang, and Pan Zhang. Unsupervised generative modeling using matrix product states. Physical Review X, 8(3):031012, 2018.
- [2] Song Cheng, Lei Wang, Tao Xiang, and Pan Zhang. Tree tensor networks for generative modeling. Physical Review B, 99(15):1–10, 2019.
- [3] Yaliang Zhao, Laurence T. Yang, and Ronghao Zhang. Tensor-based multiple clustering approaches for cyber-physical-social applications. IEEE Transactions on Emerging Topics in Computing, 8(1):69–81, 2020.
- [4] Xingwei Cao, Xuyang Zhao, and Qibin Zhao. Tensorizing generative adversarial nets. 2018 IEEE International Conference on Consumer Electronics - Asia (ICCE-Asia), pages 206–212, 2018.
- [5] Maria Schuld and Nathan Killoran. Quantum machine learning in feature Hilbert spaces. Physical Review Letters, 122(4), 2019.
- [6] Zhengzhi Sun, Cheng Peng, Ding Liu, Shi-Ju Ran, and Gang Su. Generative tensor network classification model for supervised machine learning. Physical Review B, 101(7):1–6, 2020.
- [7] Song Cheng, Lei Wang, and Pan Zhang. Supervised learning with projected entangled pair states. Physical Review B, 103(12):1–7, 2021.
- [8] Raghavendra Selvan, Silas Ørting, and Erik B. Dam. Locally orderless tensor networks for classifying two- and three-dimensional medical images. arXiv preprint arXiv:2009.12280, pages 1–21, 2020.
- [9] Wei-Ming Li and Shi-Ju Ran. Non-parametric active learning and rate reduction in many-body Hilbert space with rescaled logarithmic fidelity. arXiv preprint arXiv:2107.00195, 2021.
- [10] Philip Blagoveschensky and Anh Huy Phan. Deep convolutional tensor network. arXiv preprint arXiv:2005.14506, 2020.