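For a project I needed to collect a batch of labeled images, but there was not enough manpower to classify them by hand. So the plan was to let the machine sort the images automatically with an unsupervised method first, build up a batch of data that way, and then train a model on it. That led me to the k-means algorithm. Images need features extracted before they can be clustered, so I chose SIFT for feature extraction, using the OpenCV library to do it. For how to install OpenCV, see this link.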
Enough talk, here is the code:

# -*- coding: utf-8 -*-

__date__ = '17/04/21'

'''
Interpolation flags for cv2.resize:

CV_INTER_NN     - nearest-neighbor interpolation
CV_INTER_LINEAR - bilinear interpolation (the default)
CV_INTER_AREA   - resampling using the pixel area relation. It avoids moire
                  patterns when shrinking an image; when enlarging, it
                  behaves like CV_INTER_NN.
CV_INTER_CUBIC  - bicubic interpolation
'''

import os, codecs

import cv2
import numpy as np
from sklearn.cluster import KMeans


def get_file_name(path):
    '''
    Args: path to list; Returns: path with filenames
    '''
    filenames = os.listdir(path)
    path_filenames = []

    for file in filenames:
        # Skip hidden files such as .DS_Store
        if not file.startswith('.'):
            path_filenames.append(os.path.join(path, file))

    return path_filenames


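def knn_detect(file_list, cluster_nums, randomState=None):
    '''
    Cluster images with k-means on SIFT features (despite the name, this is
    k-means, not k-NN).

    Files that cannot be read, or that yield no SIFT descriptors, are removed
    from file_list in place so the list stays aligned with the returned labels.
    '''
    features = []
    # cv2.SIFT() exists only in OpenCV 2.4; OpenCV 4.4+ uses cv2.SIFT_create().
    sift = cv2.SIFT_create()

    # Iterate over a copy, because file_list is modified inside the loop.
    for file in list(file_list):
        print(file)
        img = cv2.imread(file)
        if img is None:  # not an image, or unreadable
            file_list.remove(file)
            continue
        img = cv2.resize(img, (32, 32), interpolation=cv2.INTER_CUBIC)

        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        print(gray.dtype)
        _, des = sift.detectAndCompute(gray, None)

        if des is None:
            file_list.remove(file)
            continue

        # des has shape (n_keypoints, 128). The original des.reshape(-1, 1)[0]
        # kept only a single number per image. Averaging the descriptors is
        # one simple way (an assumption, not the only option) to get a
        # fixed-length 128-dim vector per image.
        features.append(des.mean(axis=0).tolist())

    input_x = np.array(features)

    kmeans = KMeans(n_clusters=cluster_nums, random_state=randomState).fit(input_x)

    return kmeans.labels_, kmeans.cluster_centers_

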
def res_fit(filenames, labels):
    # Map each bare file name (directory part stripped) to its cluster label.
    files = [os.path.basename(file) for file in filenames]

    return dict(zip(files, labels))


def save(path, filename, data):
    file = os.path.join(path, filename)
    # Write one "filename<TAB>label" pair per line.
    with codecs.open(file, 'w', encoding='utf-8') as fw:
        for f, l in data.items():
            fw.write("{}\t{}\n".format(f, l))


def main():
    path_filenames = get_file_name("./picture/")

    # Second argument: the number of clusters.
    labels, cluster_centers = knn_detect(path_filenames, 2)

    res_dict = res_fit(path_filenames, labels)
    save('./', 'knn_res.txt', res_dict)


if __name__ == "__main__":
    main()

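To use it, pass the path of the picture folder and the number of clusters you want; the program then writes the clustering result to a file. You could of course also copy each image into a folder for its cluster based on the result, but I did not bother with that. You can also set the random state used for initialization (the randomState argument, which is passed to KMeans as random_state). That just needs to be set to match the dimensions used earlier, and I am leaving it for later.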
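The step skipped above, copying each image into a folder for its cluster, could be sketched roughly as follows. This is only a minimal sketch: the helper name copy_into_cluster_dirs, the cluster_<label> folder naming, and the src_dir/dst_dir parameters are assumptions for illustration and are not part of the original script. It expects a dict shaped like the one res_fit returns (file name mapped to label).

```python
import os
import shutil


def copy_into_cluster_dirs(res_dict, src_dir, dst_dir):
    """Copy every file in res_dict into dst_dir/cluster_<label>/.

    Hypothetical helper: res_dict maps a bare file name to its cluster label,
    src_dir is the folder the images were read from.
    """
    copied = []
    for filename, label in res_dict.items():
        # int() also handles NumPy integer labels as returned by KMeans.
        cluster_dir = os.path.join(dst_dir, "cluster_{}".format(int(label)))
        os.makedirs(cluster_dir, exist_ok=True)
        target = os.path.join(cluster_dir, filename)
        shutil.copy2(os.path.join(src_dir, filename), target)
        copied.append(target)
    return copied
```

In main() this could be called as copy_into_cluster_dirs(res_dict, "./picture/", "./clustered/") right after res_fit, where "./clustered/" is just an example output folder.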
-----------------------EOF--------------------------
References:
http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
http://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_feature2d/py_sift_intro/py_sift_intro.html