Since the material I've been reading lately is rather scattered, I'm using a single document to record some of the machine learning projects I come across.
NNI
https://nni.readthedocs.io/en/latest/FeatureEngineering/Overview.html
In NNI's feature engineering, TreeBasedClassifier refers to ExtraTrees.
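For intuition, here is a minimal sketch of tree-based feature selection built from scikit-learn's ExtraTreesClassifier and SelectFromModel. This is my own illustration of the idea, not NNI's actual TreeBasedClassifier code; the dataset is synthetic.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)

# Fit an ExtraTrees forest, then keep features whose importance exceeds the mean
forest = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)
selector = SelectFromModel(forest, prefit=True)
X_selected = selector.transform(X)
print(X_selected.shape)  # fewer columns than the original X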
SISSO: I don't know much about it yet — it looks like machine learning for physics/materials science, written in Fortran?
https://arxiv.org/pdf/1710.03319.pdf
https://github.com/rouyang2017/SISSO
BorutaPy can estimate the n_estimators parameter from the depth of the trees (its _get_tree_num method):
def _get_tree_num(self, n_feat):
    depth = None
    try:
        depth = self.estimator.get_params()['max_depth']
    except KeyError:
        warnings.warn(
            "The estimator does not have a max_depth property, as a result "
            " the number of trees to use cannot be estimated automatically."
        )
    if depth is None:
        depth = 10
    # how many times a feature should be considered on average
    f_repr = 100
    # n_feat * 2 because the training matrix is extended with n shadow features
    multi = ((n_feat * 2) / (np.sqrt(n_feat * 2) * depth))
    n_estimators = int(multi * f_repr)
    return n_estimators
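As a sanity check of the formula above (my own addition, not part of BorutaPy), the same computation written as a standalone function; the name estimate_n_trees is mine:

import numpy as np

def estimate_n_trees(n_feat, depth=10, f_repr=100):
    # Same formula as BorutaPy's _get_tree_num: each feature (plus its
    # shadow copy) should be considered about f_repr times on average.
    multi = (n_feat * 2) / (np.sqrt(n_feat * 2) * depth)
    return int(multi * f_repr)

print(estimate_n_trees(50))   # 100 columns incl. shadows, depth 10 -> 100 trees
print(estimate_n_trees(200))  # more features -> proportionally more trees (200)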
List of sklearn evaluation metrics
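To enumerate the built-in scorer names programmatically (assuming scikit-learn >= 1.0, which provides get_scorer_names):

from sklearn.metrics import get_scorer_names

# Print every scorer string accepted by cross_val_score / GridSearchCV
for name in get_scorer_names():
    print(name)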
Implementing KL divergence
import numpy as np

def KL(a, b):
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Elementwise a * log(a/b); terms where a == 0 contribute 0
    return np.sum(np.where(a != 0, a * np.log(a / b), 0))

values1 = [1.346112, 1.337432, 1.246655]
values2 = [1.033836, 1.082015, 1.117323]
print(KL(values1, values2))
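As a cross-check (my own addition): scipy.stats.entropy computes the same KL divergence once both inputs sum to 1, since it normalizes internally, while the hand-rolled version above does not; the two only agree on normalized distributions.

import numpy as np
from scipy.stats import entropy

p = np.asarray(values1) / np.sum(values1)  # normalize to probability distributions
q = np.asarray(values2) / np.sum(values2)

print(KL(p, q))       # hand-rolled version on normalized inputs
print(entropy(p, q))  # scipy's KL divergence, should agree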