Two-channel music classification system files

Personal notes organized from a course by teacher X. I did not fully work out the code covered in class, and these notes only scratch the surface; the main goal is to pick up some new ideas and ways of expressing them.

First, supplementary basics
1. How try works: when a try statement starts, Python marks the current position in the program's context so that it can return there if an exception occurs. The try clause runs first; what happens next depends on whether an exception is raised during execution.

When reading a file, you usually want it closed whether or not an exception occurs. A finally block does exactly that. Note that a single try statement can carry both except clauses and a finally block; nesting one try inside another was only required in Python 2.4 and earlier.
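The file-closing case above can be sketched as follows. This is a minimal illustration, not code from the course: `read_first_line` and the `log` list are hypothetical names, and the files live in a temporary directory created only for the demo.

```python
import os
import tempfile


def read_first_line(path, log):
    """Return the first line of a text file, or None if it cannot be opened."""
    f = None
    try:
        f = open(path, encoding="utf-8")
        return f.readline().rstrip("\n")
    except OSError as err:
        # Runs only when opening or reading fails.
        log.append("error: %s" % type(err).__name__)
        return None
    finally:
        # Runs on both paths, even though each path returns first.
        if f is not None:
            f.close()
        log.append("finally ran")


log = []
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "notes.txt")
    with open(good, "w", encoding="utf-8") as out:
        out.write("hello\nworld\n")
    first = read_first_line(good, log)
    missing = read_first_line(os.path.join(tmp, "missing.txt"), log)

print(first, missing, log)
```

The `finally` entry appears in `log` for both calls, which is the property the note relies on for closing files.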

2. glob: the main function of the glob module is glob.glob, which returns a list of all file paths that match a pattern. It takes one argument, a path pattern string, which may be absolute or relative.
The returned paths cover only files in the matched directory itself, not files inside its subfolders (unless the pattern uses ** together with recursive=True).
https://blog.csdn.net/qq_40196164/article/details/83067846
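A small sketch of that behaviour, assuming a throwaway directory layout (the file names `a.mp3`, `b.wav` and `sub/c.mp3` are made up for the demo). It contrasts a plain pattern with the recursive form.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: two files at the top level, one MP3 in a subfolder.
    os.mkdir(os.path.join(root, "sub"))
    for rel in ("a.mp3", "b.wav", os.path.join("sub", "c.mp3")):
        with open(os.path.join(root, rel), "w") as f:
            f.write("")

    # Plain pattern: only the directory itself is searched.
    top_level = sorted(glob.glob(os.path.join(root, "*.mp3")))
    # "**" with recursive=True also descends into subfolders.
    recursive = sorted(glob.glob(os.path.join(root, "**", "*.mp3"), recursive=True))

    top_names = [os.path.relpath(p, root) for p in top_level]
    all_names = [os.path.relpath(p, root) for p in recursive]

print(top_names)
print(all_names)
```

glob does not promise any ordering of its results, which is presumably why the feature script sorts the list before looping over it.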

3. sys and os: system-related parameters and settings
https://blog.csdn.net/zengxiantao1994/article/details/58188527

4. The Python lower() method converts all uppercase characters in a string to lowercase. Syntax: str.lower(). Parameters: none.
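As a quick illustration, here is lower() used the way the feature script uses it, to normalise a file extension. The helper name `file_extension` is hypothetical.

```python
def file_extension(path):
    """Return the lowercase extension of a path, without the dot."""
    return path.split(".")[-1].lower()


print(file_extension("./cccc/abc.MP3"))
print("Hello World".lower())
```

Lowercasing first means "MP3" and "mp3" are treated as the same format in later comparisons.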

5. A set is another of Python's built-in data structures. Its elements are unique, so converting data to a set automatically removes duplicates. A set is unordered: it does not sort its elements and does not guarantee any iteration order.
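A minimal sketch with made-up tag values: the set drops duplicates, and an explicit sorted() call is needed whenever a stable order matters.

```python
tags = ["rock", "pop", "rock", "jazz", "pop"]

unique_tags = set(tags)             # duplicates removed, order not guaranteed
ordered_tags = sorted(unique_tags)  # sort explicitly to get a stable order

print(len(unique_tags))
print(ordered_tags)
```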

6. Python zip
https://www.cnblogs.com/wdz1226/p/10181354.html
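For reference, a short sketch of zip building a label-to-index mapping, similar in spirit to what the feature script does later. The label values are made up.

```python
labels = ["jazz", "pop", "rock"]

# Pair each label with a running integer index.
label_index = dict(zip(labels, range(len(labels))))

# zip stops at the shortest input.
pairs = list(zip("abc", [1, 2]))

print(label_index)
print(pairs)
```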

7. glob.glob
As noted in item 2, glob.glob is the main function of the glob module: it takes a path pattern string (absolute or relative) and returns a list of the matching file paths in that directory, not in its subfolders.
https://blog.csdn.net/qq_17753903/article/details/82180227

8. The map function
https://www.runoob.com/python/python-func-map.html
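A brief sketch of map with made-up rows. In Python 3, map returns a lazy iterator, so it is wrapped in dict() or list() to materialise the result.

```python
rows = [["songA", "rock"], ["songB", "pop"]]

# Turn each [name, tag] row into a (key, value) pair and build a dict.
name_label = dict(map(lambda t: (t[0], t[1]), rows))

# map with a built-in function.
lengths = list(map(len, ["a", "bb", "ccc"]))

print(name_label)
print(lengths)
```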

9. NumPy supplement: np.vstack() and np.hstack()
https://www.jianshu.com/p/2469e0e2a1cf
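A minimal sketch of the two functions on toy arrays: vstack stacks inputs as rows, and hstack joins them end to end.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

rows = np.vstack([a, b])  # two rows, shape (2, 3)
flat = np.hstack([a, b])  # one long vector, shape (6,)

print(rows.shape, flat.shape)
```

Stacking rows this way is how one feature vector per song can be accumulated into a 2-D table.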

10. A plain-language explanation of OvO (one-vs-one) and OvR (one-vs-rest)
https://blog.csdn.net/alw_123/article/details/98869193

11. isinstance
https://www.runoob.com/python/python-func-isinstance.html
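A small sketch of the kind of type check the training script performs on its sampling fraction. The helper name `pick_fraction` and its default are assumptions for illustration.

```python
def pick_fraction(value, default=0.7):
    """Use value only if it is a float strictly between 0 and 1."""
    if isinstance(value, float) and 0 < value < 1:
        return value
    return default


print(pick_fraction(0.5))
print(pick_fraction(1))      # an int is not a float, so the default is used
print(pick_fraction("0.5"))  # a str is not a float either
```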

12. SVM stands for Support Vector Machine and can be used for both classification and regression. SVC is the SVM variant used for classification; SVR is the variant used for regression.
https://blog.csdn.net/u012331016/article/details/45223135

13. The Python dictionary get() method
https://www.runoob.com/python/att-dictionary-get.html
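A short sketch with a made-up dictionary: get() returns None, or a supplied default, instead of raising KeyError for a missing key.

```python
name_label = {"songA": "rock"}

found = name_label.get("songA")
absent = name_label.get("songZ")                  # None, no KeyError
with_default = name_label.get("songZ", "unknown")

print(found, absent, with_default)
```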

14. When training a model, especially when cross-validating on the training set, you typically want to save the trained model and then evaluate it on an independent test set. This item covers saving and reusing a trained model in Python.
scikit-learn models can be persisted with joblib.

https://blog.csdn.net/helloxiaozhe/article/details/80658438
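The save-and-reload round trip might look like the sketch below. It assumes the standalone `joblib` package and scikit-learn are installed; the toy data and the file name `toy_model.pkl` are made up for the demo.

```python
import os
import tempfile

import joblib
from sklearn import svm

# Toy, linearly separable data: two clusters with labels 0 and 1.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [4, 4], [4, 5], [5, 4], [5, 5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

clf = svm.SVC(kernel="linear", random_state=0)
clf.fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "toy_model.pkl")
    joblib.dump(clf, model_path)       # persist the fitted model
    restored = joblib.load(model_path)  # load it back as a new object

# The restored model should predict exactly like the original.
same = list(restored.predict(X)) == list(clf.predict(X))
print(same)
```

A model file should only be loaded from a trusted source, since joblib relies on pickle under the hood.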

Second, the code

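'''
Warm-up code.
Install these first if you do not have them:
pip install pydub
pip install python_speech_features
'''


from pydub.audio_segment import AudioSegment  # audio conversion and slicing
from scipy.io import wavfile  # MP3 is lossy-compressed and drops many features, so work on WAV
from python_speech_features.base import mfcc
import pandas as pd
import numpy as np


song = AudioSegment.from_file("./cccc/abc.MP3", format="mp3")
song.export("./cccc/abc.wav", format="wav")
rate, data = wavfile.read("./cccc/abc.wav")
# print(data)
# print(rate)


# mfcc() applies a Fourier transform and computes the Mel-frequency cepstral coefficients
mf_feat = mfcc(data, rate, numcep=13, nfft=2048)
# numcep=13 is the number of coefficients per frame; nfft is the FFT size
# print(mf_feat)  # one 13-dimensional vector per frame

mm = np.mean(mf_feat, axis=0)  # average over all frames: 13 values
mf = np.transpose(mf_feat)  # rows become coefficients, columns become frames
mc = np.cov(mf)  # we want the relation between columns, not rows, hence the transpose

result = mm

for k in range(len(mm)):
    result = np.append(result, np.diag(mc, k))  # append the k-th diagonal of the covariance matrix
print(result)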
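To see what shape this feature vector ends up with, here is a sketch that replaces the real MFCC output with random numbers. The frame count of 200 is an arbitrary assumption; only the 13 coefficients per frame come from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 13))  # stand-in for mfcc output: 200 frames x 13 coefficients

coeffs = frames.T               # shape (13, 200): one row per coefficient
mean_vec = coeffs.mean(axis=1)  # shape (13,): mean of each coefficient
cov = np.cov(coeffs)            # shape (13, 13): covariance between coefficients

feature = mean_vec
for k in range(coeffs.shape[0]):
    # The k-th diagonal of a 13x13 matrix has 13 - k entries.
    feature = np.append(feature, np.diag(cov, k))

print(feature.shape)
```

The vector holds the 13 means plus the upper triangle of the covariance matrix (13 + 12 + ... + 1 = 91 values), so every song is described by 104 numbers regardless of its length.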

features

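# feature module. The teacher used Python 3, which is why names can be written in Chinese.
import pandas as pd
import numpy as np
import glob
from pydub.audio_segment import AudioSegment
from scipy.io import wavfile
from python_speech_features.base import mfcc
import os
import sys
import time


# def 获取歌单(): the Chinese name was hard on my eyes, so I renamed it
def getMusicMenu():
    data = pd.read_csv("./ccc")  # "./ccc" is the placeholder path left in these notes
    data = data[["name", "tag"]]  # the CSV already carries the tags, e.g. "fresh" or "rock"
    return data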

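def getMusicFeatures(file):
    items = file.split(".")
    file_format = items[-1].lower()
    file_name = file[:-(len(file_format) + 1)]  # strip the extension and its dot
    if file_format != "wav":
        # convert the MP3 file to WAV and save it in the original folder
        song = AudioSegment.from_file(file, format="mp3")
        file = file_name + ".wav"  # this line looks off to me; I may have noted it down wrong
        song.export(file, format="wav")
    # extract features from the WAV file
    try:
        rate, data = wavfile.read(file)
        mfcc_feas = mfcc(data, rate, numcep=13, nfft=2048)  # 13 MFCCs per frame
        mm = np.transpose(mfcc_feas)  # rows are coefficients, columns are frames
        mf = np.mean(mm, axis=1)  # mean of each coefficient over all frames
        mc = np.cov(mm)  # 13x13 covariance between coefficients
        result = mf
        for i in range(mm.shape[0]):
            result = np.append(result, np.diag(mc, i))  # append the i-th diagonal
        return result

    except Exception as msg:  # print the error so one bad file does not stop the run
        print(msg)
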
	   
def featureExtraction():
    df = getMusicMenu()
    name_label_list = np.array(df).tolist()
    name_label_dict = dict(map(lambda t: (t[0], t[1]), name_label_list))
    labels = set(name_label_dict.values())  # do not forget .values()
    label_index_dict = dict(zip(labels, np.arange(len(labels))))
    all_music_files = glob.glob(歌曲源路径)
    all_music_files.sort()
    loop_count = 0
    flag = True

    all_mfcc = np.array([])
    for file_name in all_music_files:
        print("Processing " + file_name.replace("\xa0", ""))
        # on \xa0 see https://blog.csdn.net/clovejava/article/details/89511172
        # the files sit under music\, so split on the backslash to get the song name
        music_name = file_name.split("\\")[-1].split(".")[-2].split("-")[-1]
        music_name = music_name.strip()
        if music_name in name_label_dict:
            label_index = label_index_dict[name_label_dict[music_name]]
            ff = getMusicFeatures(file_name)
            ff = np.append(ff, label_index)

            if flag:
                all_mfcc = ff
                flag = False
            else:
                all_mfcc = np.vstack([all_mfcc, ff])  # stack each further song as a new row
        else:
            print("Cannot process " + file_name.replace("\xa0", "") + ": no matching label found")
        print("looping----%d" % loop_count)
        print("all_mfcc.shape:", end="")
        print(all_mfcc.shape)
        loop_count += 1

    label_index_list = []
    for k in label_index_dict:
        label_index_list.append([k, label_index_dict[k]])
    pd.DataFrame(label_index_list).to_csv(数值化标签路径, header=False,
                                          index=False, encoding="utf-8")

    pd.DataFrame(all_mfcc).to_csv(歌曲特征文件存放路径, header=False,
                                  index=False, encoding="utf-8")
    return all_mfcc


if __name__ == "__main__":
    歌曲路径 = "./data/music_info.csv"
    歌曲源路径 = "./data/music/*.mp3"
    数值化标签路径 = "./data/music_index_label.csv"
    歌曲特征文件存放路径 = "./data/music_features.csv"
    start = time.time()
    featureExtraction()
    end = time.time()

    print("Total time: %.2f s" % (end - start))


```python
# acc module, written by the teacher: the percentage of predictions that match the true labels
def get(res, tes):
    n = len(res)
    truth = (res == tes)
    pre = 0
    for flag in truth:
        if flag:
            pre += 1
    return (pre * 100) / n
```

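from sklearn import svm
from sklearn.utils import shuffle  # shuffle the rows before each training round
from sklearn.model_selection import GridSearchCV, train_test_split
# grid search with cross-validation tunes hyperparameters; plain cross-validation evaluates a model
import joblib  # sklearn.externals.joblib has been removed from recent scikit-learn releases
import pandas as pd
import numpy as np
import acc  # the hand-written accuracy module
import sys
import time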

# Pick the best kernel. rbf is the Gaussian kernel; linear treats the features independently;
# poly adds cross-product terms between features. A valid kernel matrix is positive semi-definite.
def internal_cross_validation(X, Y):
    parameters = {
        "kernel": ("linear", "rbf", "poly"),
        "C": [0.1, 1],  # penalty on the slack variables; affects generalization
        "probability": [True, False],
        "decision_function_shape": ["ovo", "ovr"],
    }

    clf = GridSearchCV(svm.SVC(random_state=0), param_grid=parameters, cv=5)
    print("beginning...")
    clf.fit(X, Y)
    print("best parameter= ", end="")
    print(clf.best_params_)
    print("best accuracy= ", end="")
    print(clf.best_score_)

def cross_validation(music_csv_file_path=None, data_percentage=0.7):
    if not music_csv_file_path:
        music_csv_file_path = 歌曲特征文件存放路径
    print("beginning to read data: " + music_csv_file_path)

    data = pd.read_csv(music_csv_file_path, sep=",", header=None, encoding="utf-8")
    sample_fact = 0.7  # I suspect the teacher got this part wrong

    if isinstance(data_percentage, float) and 0 < data_percentage < 1:
        sample_fact = data_percentage

    data = data.sample(frac=sample_fact).T
    X = data[:-1].T  # every row but the last (I had forgotten that a slice excludes its right bound)
    Y = np.array(data[-1:])[0]  # the last row holds the labels
    # print(X)
    # print(Y)
    internal_cross_validation(X, Y)


def poly_model(X, Y):

    # train the model and measure how well its predictions match the labels on the training set

    clf = svm.SVC(kernel="poly", C=0.1, probability=True, decision_function_shape="ovo", random_state=0)
    clf.fit(X, Y)
    res = clf.predict(X)
    restrain = acc.get(res, Y)

    return clf, restrain
		 
def trainModels(train_percentage=0.7, fold=1, music_csv_file_path=None, model_out_f=None):

    if not music_csv_file_path:
        music_csv_file_path = 歌曲特征文件存放路径
    data = pd.read_csv(music_csv_file_path, sep=",", header=None, encoding="utf-8")

    max_train_source = None
    max_test_source = None
    max_source = None
    best_clf = None
    flag = True

    for index in range(1, int(fold) + 1):
        print(index)
        shuffle_data = shuffle(data)
        X = shuffle_data.T[:-1].T  # all columns except the last are features
        Y = np.array(shuffle_data.T[-1:])[0]  # the last column holds the labels
        X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=train_percentage, random_state=0)
        (clf, train_source) = poly_model(X_train, Y_train)
        y_predict = clf.predict(X_test)
        test_source = acc.get(y_predict, Y_test)  # accuracy on the test set
        source = 0.35 * train_source + 0.65 * test_source  # weighted overall accuracy
        if flag:
            max_source = source
            max_train_source = train_source
            max_test_source = test_source
            best_clf = clf
            flag = False

        else:
            if max_source < source:
                max_source = source
                max_train_source = train_source
                max_test_source = test_source
                best_clf = clf
        print("Round %d: training accuracy %.2f, test accuracy %.2f, "
              "weighted accuracy %.2f" % (index, train_source, test_source, source))

    print("Best model: training accuracy %.2f, test accuracy %.2f, "
          "weighted accuracy %.2f" % (max_train_source, max_test_source, max_source))

    print("The best model is:")
    print(best_clf)
    if not model_out_f:
        model_out_f = 模型保存路径
    joblib.dump(best_clf, model_out_f)

		 
if __name__ == "__main__":
    歌曲特征文件存放路径 = "./data/music_features.csv"  # feature CSV written by the feature script
    模型保存路径 = "./data/music_model.pkl"  # where the trained model is saved

    print("=" * 30 + " beginning searching the most suitable model... " + "+" * 30)
    # prints 30 equals signs
    start = time.time()
    cross_validation(music_csv_file_path=None, data_percentage=0.7)
    end = time.time()
    print("cost time%.2f" % (end - start))
    # sys.exit(0)
    print("=" * 30 + " beginning searching the most suitable model... " + "+" * 30)
    start = time.time()
    trainModels(train_percentage=0.7, fold=1000, music_csv_file_path=None, model_out_f=None)
    end = time.time()
    print("cost time%.2f" % (end - start))
	
# svm main
import feature
import pandas as pd
import numpy as np
import joblib  # sklearn.externals.joblib has been removed from recent scikit-learn releases
import sys
import time

数值化路径 = "./data/xx.csv"
def load_model(model_f=None):
    if not model_f:
        model_f = 模型保存路径
    clf = joblib.load(model_f)
    return clf


def 歌曲标签数值化():  # fetch_index_label
    # read the index-to-label mapping from the file and return it as a dict
    data = pd.read_csv(数值化路径, header=None, encoding="utf-8")
    name_label_list = np.array(data).tolist()
    index_label_dict = dict(map(lambda t: (t[1], t[0]), name_label_list))
    return index_label_dict
index_label_dict = 歌曲标签数值化()


def predict_labels(clf, X):
    # predict expects a 2-D array, hence the brackets (without them it would not run on Windows 7)
    label_index = clf.predict([X])
    label = index_label_dict[label_index[0]]
    return label

if __name__ == "__main__":
    数值化标签路径 = "./data/xx.csv"
    模型保存路径 = "./data/music_model.pkl"
    clf = load_model()
    path = ...  # left blank in the notes: set this to the audio file to classify
    music_feature = feature.getMusicFeatures(path)
    label = predict_labels(clf, music_feature)
    print("Predicted label: %s" % label)
  

Origin blog.csdn.net/qq_40647378/article/details/103791721