Some Python Programming Tips

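1. First, name things consistently, using a single convention such as underscores (snake_case) or camelCase, so the code is pleasant to read for both you and others. Python's style guide (PEP 8) recommends lowercase names with underscores for variables and functions.
2. If the code gets too long, move classes and other functions into separate .py files and import them.
3. Renaming a variable in bulk: open the file in Sublime or another editor, then use the editor's Ctrl + Alt + F3 shortcut to change all occurrences at once.
4. When tuning a model offline, you can split the code into cells with #%% markers and run only the current cell with Ctrl + Enter. For example, to compare validation-set performance under different parameter values, you can do this: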

from xgboost.sklearn import XGBClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import KFold  # sklearn.cross_validation was removed in newer releases
import pandas as pd
import numpy as np
# Read the data and build the (shuffled) training set
file_name = 'dotData.xlsx'
excel = pd.read_excel(file_name, sheet_name=None, skiprows=500)  # sheet_name=None returns a dict of sheets
train_set = excel['data'].sample(frac=1, random_state=1).values
# Split into inputs and output
X = train_set[:, 1:-1]
Y = train_set[:, -1]
#%% The #%% marker here starts a new code cell
if 0:
    clf = GradientBoostingClassifier(max_depth=3)
    #clf = RandomForestClassifier(max_depth=3)
else:
    clf = XGBClassifier(max_depth=3)
accuracies = []
# Run 8-fold cross-validation
for train_index, test_index in KFold(n_splits=8).split(X):
    clf.fit(X[train_index], Y[train_index])
    Y_hat = clf.predict(X[test_index])
    accuracies.append(np.mean(Y[test_index] == Y_hat))
# Collect the per-fold accuracies in one DataFrame (DataFrame.append no longer exists in pandas 2.x)
score = pd.DataFrame({'accuracy': accuracies})
print(score)
print("Total number of samples:\n", X.shape[0])
print("Validation accuracy in cross-validation:\n", score.mean())

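With the code split into cells like this, the data only needs to be read once while tuning parameters; afterwards you re-run just the tuning cell and skip the loading step.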
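The same cell layout can be tried without the Excel file or any third-party library. The following is only a minimal sketch: load_data is a hypothetical stand-in for the slow reading step, and threshold is a made-up "parameter" that plays the role of the model settings being tuned. It assumes an editor that recognizes #%% cell markers, such as Spyder.

```python
#%% Cell 1: slow setup, meant to be run once per session
import random

def load_data(n_samples=200, seed=1):
    """Hypothetical stand-in for a slow step such as reading a large file."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
    # Label is 1 when the input is above 0.1, otherwise 0
    ys = [1 if x > 0.1 else 0 for x in xs]
    return xs, ys

X, Y = load_data()

#%% Cell 2: cheap experiment, re-run after every parameter change
threshold = 0.0  # the value being tuned (illustrative only)
Y_hat = [1 if x > threshold else 0 for x in X]
accuracy = sum(y == y_hat for y, y_hat in zip(Y, Y_hat)) / len(Y)
print("threshold =", threshold, "accuracy =", accuracy)
```

After changing threshold, only the second cell has to be executed again; X and Y stay in memory from the first run.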

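5. Use comments to keep code that you want to preserve but do not need to run for now.
6. You can also wrap code in if 0: or in a check on a flag variable to get the same effect as commenting it out. A flag variable has the extra benefit of marking several parts of the code as "run" or "do not run" at the same time.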
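The flag-variable idea from tip 6 might look like the sketch below. The names RUN_BASELINE and SHOW_DETAILS and the strings collected in results are purely illustrative and do not come from the script above; the point is only that one assignment at the top switches several separate blocks on or off.

```python
# Hypothetical flags: change True/False here to enable or disable groups of steps
RUN_BASELINE = False
SHOW_DETAILS = True

results = []

if RUN_BASELINE:
    # Skipped while RUN_BASELINE is False, just as if it were commented out
    results.append("baseline model evaluated")

results.append("main model evaluated")

if SHOW_DETAILS:
    results.append("details printed")

if RUN_BASELINE:
    # A second block controlled by the same flag
    results.append("baseline compared with main model")

print(results)
```

Compared with if 0:, a named flag documents why a block is disabled, and flipping it re-enables every related block in one place.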
