Classifying Images with SIFT Feature Extraction and K-Means

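Our project needed a batch of labeled images, but we did not have enough people to sort them by hand. The plan was therefore to let the machine group the images automatically with an unsupervised method, build up an initial pool of data that way, and train a model on it afterwards. K-means was the obvious candidate, but it needs a feature vector for each image, so SIFT is used to extract the features. The extraction relies on the OpenCV library; for how to install OpenCV, see the linked installation guide.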

Without further ado, here is the code:


  
  
#-*- encoding:utf-8 -*-
__date__ = '17/04/21'
'''
CV_INTER_NN - nearest-neighbour interpolation
CV_INTER_LINEAR - bilinear interpolation (the default)
CV_INTER_AREA - resampling using the pixel area relation. Avoids moire patterns when an image is shrunk; behaves like CV_INTER_NN when an image is enlarged.
CV_INTER_CUBIC - bicubic interpolation
'''
import os, codecs
import cv2
import numpy as np
from sklearn.cluster import KMeans


def get_file_name(path):
    '''
    Args: path to list; Returns: path with filenames
    '''
    filenames = os.listdir(path)
    path_filenames = []
    for file in filenames:
        # Skip hidden files such as .DS_Store
        if not file.startswith('.'):
            path_filenames.append(os.path.join(path, file))
    return path_filenames


def knn_detect(file_list, cluster_nums, randomState=None):
    # Despite the name, this clusters with k-means, not k-nearest neighbours.
    features = []
    # cv2.SIFT_create() is the constructor in OpenCV >= 4.4; older builds
    # expose cv2.SIFT() (2.4) or cv2.xfeatures2d.SIFT_create() (3.x contrib).
    sift = cv2.SIFT_create()
    # Iterate over a copy because unusable files are removed from file_list,
    # which keeps file_list aligned with the returned labels.
    for file in list(file_list):
        print(file)
        img = cv2.imread(file)
        if img is None:  # not a readable image
            file_list.remove(file)
            continue
        img = cv2.resize(img, (32, 32), interpolation=cv2.INTER_CUBIC)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        _, des = sift.detectAndCompute(gray, None)
        if des is None:  # no keypoints found
            file_list.remove(file)
            continue
        # Average the (n_keypoints, 128) descriptors into one fixed-length vector.
        features.append(des.mean(axis=0).tolist())
    input_x = np.array(features)
    kmeans = KMeans(n_clusters=cluster_nums, random_state=randomState).fit(input_x)
    return kmeans.labels_, kmeans.cluster_centers_


def res_fit(filenames, labels):
    files = [os.path.basename(file) for file in filenames]
    return dict(zip(files, labels))


def save(path, filename, data):
    file = os.path.join(path, filename)
    with codecs.open(file, 'w', encoding='utf-8') as fw:
        for f, l in data.items():
            fw.write("{}\t{}\n".format(f, l))


def main():
    path_filenames = get_file_name("./picture/")
    labels, cluster_centers = knn_detect(path_filenames, 2)
    res_dict = res_fit(path_filenames, labels)
    save('./', 'knn_res.txt', res_dict)


if __name__ == "__main__":
    main()
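
KMeans needs every image to be represented by a vector of the same length, whereas SIFT yields one 128-dimensional descriptor per keypoint and the number of keypoints varies from image to image. Below is a minimal sketch of one way to bridge that gap. It assumes mean pooling, which is an illustrative choice rather than the only option, and pool_descriptors is a hypothetical helper name. Synthetic arrays stand in for real SIFT output so the snippet runs without any image files.

```python
import numpy as np
from sklearn.cluster import KMeans


def pool_descriptors(descriptors):
    """Collapse an (n_keypoints, 128) descriptor array into one 128-dim vector."""
    return np.asarray(descriptors, dtype=np.float32).mean(axis=0)


# Synthetic stand-ins for SIFT output (assumption: real descriptors would come
# from sift.detectAndCompute). Each "image" has a different keypoint count.
rng = np.random.RandomState(0)
group_a = [rng.normal(20.0, 1.0, size=(rng.randint(3, 8), 128)) for _ in range(5)]
group_b = [rng.normal(80.0, 1.0, size=(rng.randint(3, 8), 128)) for _ in range(5)]

# One fixed-length row per image, regardless of how many keypoints it had.
features = np.array([pool_descriptors(d) for d in group_a + group_b])
print(features.shape)  # (10, 128)

# Passing the same integer random_state makes the clustering repeatable.
labels_1 = KMeans(n_clusters=2, n_init=10, random_state=42).fit(features).labels_
labels_2 = KMeans(n_clusters=2, n_init=10, random_state=42).fit(features).labels_
print((labels_1 == labels_2).all())
```

Mean pooling discards how the descriptors are distributed within an image, so it is best treated as a quick baseline for gathering a first batch of data.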

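To use it, pass the path of the picture folder and the number of clusters you want; the program then processes the images and writes the resulting labels to a file. You could also use the results to copy each image into a folder for its cluster, but I did not bother with that here. It is also possible to set random_state for the k-means initialization (an integer makes the clustering reproducible); that is left for later.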
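
Sorting the images into per-cluster folders could look roughly like the sketch below. It assumes the tab-separated "filename, label" format of the results file written by the script; the function name sort_into_folders, the cluster_<label> folder naming, and the ./sorted/ output directory are all hypothetical choices made for illustration.

```python
import os
import shutil


def sort_into_folders(result_file, image_dir, output_dir):
    """Copy each image into output_dir/cluster_<label>/ based on a results file.

    Each line of result_file is expected to be "<filename>\t<label>".
    Returns the list of (filename, label) pairs that were actually copied.
    """
    copied = []
    with open(result_file, encoding="utf-8") as fr:
        for line in fr:
            line = line.rstrip("\n")
            if not line:
                continue
            # Split on the last tab so the label is always the final field.
            filename, label = line.rsplit("\t", 1)
            target_dir = os.path.join(output_dir, "cluster_{}".format(label))
            os.makedirs(target_dir, exist_ok=True)
            src = os.path.join(image_dir, filename)
            # Skip entries whose image no longer exists on disk.
            if os.path.isfile(src):
                shutil.copy2(src, target_dir)
                copied.append((filename, label))
    return copied


# Example call (paths are assumptions matching the script's defaults):
# sort_into_folders("./knn_res.txt", "./picture/", "./sorted/")
```

Copying rather than moving leaves the original picture folder untouched, so the clustering can be rerun with a different number of clusters.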


-----------------------EOF--------------------------


References:

http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html

http://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_feature2d/py_sift_intro/py_sift_intro.html


Reposted from blog.csdn.net/zhonglongshen/article/details/87929141