Before optimization
Functions whose cumulative time in the current cProfile run exceeds 1 second:
15013142 function calls (14601930 primitive calls) in 27.217 seconds
ncalls tottime percall cumtime percall filename:lineno(function)
43 0.047 0.001 27.217 0.633 F:\topo\semi_supervised_cluster.py:13(semi_supervised_cluster)
42 0.072 0.002 13.944 0.332 F:\topo\cop_kmeans.py:13(cop_kmeans)
84 0.002 0.000 11.483 0.137 D:\py\py3.10\lib\site-packages\pandas\core\frame.py:9486(corr)
84 11.357 0.135 11.357 0.135 {pandas._libs.algos.nancorr}
23039 0.595 0.000 6.043 0.000 {built-in method builtins.sum}
42 1.213 0.029 5.205 0.124 F:\topo\cop_kmeans.py:197(get_ml_info)
410285/93131 0.486 0.000 4.970 0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
24722 0.052 0.000 4.327 0.000 F:\topo\cop_kmeans.py:79(closest_clusters)
24722 0.597 0.000 4.215 0.000 F:\topo\cop_kmeans.py:80(<listcomp>)
7888 0.056 0.000 4.037 0.001 F:\topo\metrics.py:42(pairwise_correlation)
42 0.011 0.000 3.867 0.092 F:\topo\cop_kmeans.py:217(<listcomp>)
14942 0.015 0.000 3.843 0.000 F:\topo\cop_kmeans.py:217(<genexpr>)
74166 0.058 0.000 3.618 0.000 <__array_function__ internals>:177(median)
74166 0.078 0.000 3.499 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:3711(median)
74166 0.163 0.000 3.421 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:3651(_ureduce)
74166 0.459 0.000 3.229 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:3801(_median)
7888 0.030 0.000 2.283 0.000 D:\py\py3.10\lib\site-packages\pandas\core\series.py:2508(corr)
42 0.000 0.000 2.204 0.052 F:\topo\cop_kmeans.py:71(tolerance)
7888 0.049 0.000 1.772 0.000 D:\py\py3.10\lib\site-packages\pandas\core\nanops.py:83(_f)
16272 0.168 0.000 1.750 0.000 D:\py\py3.10\lib\site-packages\pandas\core\series.py:323(__init__)
139 1.709 0.012 1.713 0.012 F:\topo\cop_kmeans.py:153(compute_centers)
7888 0.041 0.000 1.546 0.000 D:\py\py3.10\lib\site-packages\pandas\core\nanops.py:1524(nancorr)
42 0.009 0.000 1.427 0.034 F:\topo\cop_kmeans.py:75(<listcomp>)
7888 0.014 0.000 1.264 0.000 D:\py\py3.10\lib\site-packages\pandas\core\nanops.py:1566(func)
7888 0.009 0.000 1.250 0.000 <__array_function__ internals>:177(corrcoef)
7888 0.097 0.000 1.229 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:2689(corrcoef)
1570172 1.119 0.000 1.119 0.000 F:\topo\cop_kmeans.py:75(<genexpr>)
74166 0.069 0.000 1.094 0.000 <__array_function__ internals>:177(mean)
82340 0.377 0.000 1.038 0.000 D:\py\py3.10\lib\site-packages\numpy\core\_methods.py:162(_mean)
30775 0.139 0.000 1.009 0.000 D:\py\py3.10\lib\site-packages\pandas\core\construction.py:470(sanitize_array)
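The listings in this post have the layout that pstats prints for a cProfile run sorted by cumulative time. The following is a minimal sketch of how such a listing can be produced. The workload functions `slow_sum` and `workload` are hypothetical stand-ins, and how the listings here were trimmed to entries above a time threshold is not shown in the post, so the sketch simply prints the top entries.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Toy workload; a hypothetical stand-in for the real clustering code."""
    return sum(i * i for i in range(n))


def workload():
    results = []
    for _ in range(50):
        results.append(slow_sum(20_000))
    return results


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
# Sort by cumulative time and print only the first 10 entries.
stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(10)
report = buffer.getvalue()
print(report)
```

Sorting by cumulative time puts the outermost callers first, which is why semi_supervised_cluster heads every listing.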
1st optimization: use numpy.corrcoef instead of pd.Series.corr
Location of the change:
metrics.py > pairwise_correlation()
Before optimization:
def pairwise_correlation(x, y):
    return 1 - pd.Series(x).corr(pd.Series(y))
Optimized:
def pairwise_correlation(x, y):
    if not isinstance(x, np.ndarray):
        x = np.array(x)
    if not isinstance(y, np.ndarray):
        y = np.array(y)
    nan_idx = np.logical_or(np.isnan(x), np.isnan(y))
    return 1 - np.corrcoef(x[~nan_idx], y[~nan_idx])[0][1]
At this point the argument passed as x is a list and the argument passed as y is a numpy.ndarray, so the isinstance checks are needed. A later change at the call site removes the need for them.
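The key detail in this change is that pandas drops NaN pairs on its own, while np.corrcoef does not, so the NumPy version has to mask them explicitly. The sketch below checks the masked computation against a textbook Pearson formula on hand-cleaned data. It uses np.asarray in place of the isinstance checks, which is a substitution made here for brevity and not the author's code; `pearson_reference` and the sample values are hypothetical.

```python
import numpy as np


def pairwise_correlation(x, y):
    """Correlation distance 1 - r, skipping positions where either input is NaN."""
    # np.asarray converts lists and passes float ndarrays through without copying.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = ~(np.isnan(x) | np.isnan(y))
    return 1 - np.corrcoef(x[keep], y[keep])[0, 1]


def pearson_reference(x, y):
    """Textbook Pearson r on already-clean data, used only as a cross-check."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))


x = [1.0, 2.0, np.nan, 4.0, 5.0]
y = np.array([2.0, 1.0, 3.0, np.nan, 6.0])
distance = pairwise_correlation(x, y)

# Positions 2 and 3 contain a NaN in one of the inputs, so they are dropped.
clean_x = np.array([1.0, 2.0, 5.0])
clean_y = np.array([2.0, 1.0, 6.0])
expected = 1 - pearson_reference(clean_x, clean_y)
print(distance, expected)
```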
Functions whose cumulative time in the current cProfile run exceeds 1 second:
11925832 function calls (11554060 primitive calls) in 24.903 seconds
ncalls tottime percall cumtime percall filename:lineno(function)
43 0.050 0.001 24.903 0.579 F:\topo\semi_supervised_cluster.py:13(semi_supervised_cluster)
84 0.002 0.000 11.962 0.142 D:\py\py3.10\lib\site-packages\pandas\core\frame.py:9486(corr)
84 11.822 0.141 11.822 0.141 {pandas._libs.algos.nancorr}
42 0.074 0.002 11.136 0.265 F:\topo\cop_kmeans.py:13(cop_kmeans)
402814/85660 0.457 0.000 4.824 0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
24722 0.054 0.000 4.419 0.000 F:\topo\cop_kmeans.py:79(closest_clusters)
24722 0.605 0.000 4.304 0.000 F:\topo\cop_kmeans.py:80(<listcomp>)
74166 0.058 0.000 3.699 0.000 <__array_function__ internals>:177(median)
74166 0.079 0.000 3.578 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:3711(median)
74166 0.164 0.000 3.500 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:3651(_ureduce)
23039 0.576 0.000 3.360 0.000 {built-in method builtins.sum}
74166 0.468 0.000 3.306 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:3801(_median)
42 1.198 0.029 2.503 0.060 F:\topo\cop_kmeans.py:197(get_ml_info)
42 0.000 0.000 2.211 0.053 F:\topo\cop_kmeans.py:71(tolerance)
139 1.778 0.013 1.784 0.013 F:\topo\cop_kmeans.py:153(compute_centers)
42 0.012 0.000 1.456 0.035 F:\topo\cop_kmeans.py:75(<listcomp>)
7888 0.086 0.000 1.228 0.000 F:\topo\metrics.py:42(pairwise_correlation)
42 0.010 0.000 1.179 0.028 F:\topo\cop_kmeans.py:219(<listcomp>)
14942 0.012 0.000 1.159 0.000 F:\topo\cop_kmeans.py:219(<genexpr>)
1570172 1.149 0.000 1.149 0.000 F:\topo\cop_kmeans.py:75(<genexpr>)
74166 0.071 0.000 1.121 0.000 <__array_function__ internals>:177(mean)
7888 0.008 0.000 1.060 0.000 <__array_function__ internals>:177(corrcoef)
7888 0.078 0.000 1.042 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:2689(corrcoef)
82340 0.368 0.000 1.031 0.000 D:\py\py3.10\lib\site-packages\numpy\core\_methods.py:162(_mean)
The second optimization: merge the repeated np.median calls inside the loop into batched calls
Location of the change:
cop_kmeans.py > cop_kmeans()
Before optimization:
def closest_clusters(clusters, data_index, distance):
    """Rank the clusters by their distance to the given point."""
    distances = [np.median(distance[data_index, cluster]) for cluster in clusters]
    return sorted(range(len(distances)), key=lambda x: distances[x]), distances

for i, d in enumerate(dataset):
    indices, clusters_distances = closest_clusters(pre_clusters, i, dataset)
    pass  # the following logic does not use clusters_distances and does not modify pre_clusters or dataset
Optimized:
all_distances = [np.median(dataset[:, cluster], axis=1) for cluster in pre_clusters]
for i, d in enumerate(dataset):
    distances = [all_distances[j][i] for j in range(len(pre_clusters))]
    indices = sorted(range(len(distances)), key=lambda x: distances[x])
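The batching works because np.median with axis=1 computes one median per row, so a single call per cluster covers every point. In the earlier listing, median is called 74166 times for 24722 calls to closest_clusters, three per call; after the change the count depends only on the number of clusters. The sketch below checks that both forms give the same numbers. The matrix `distance` and the index lists in `pre_clusters` are made-up sample data.

```python
import numpy as np

rng = np.random.default_rng(0)
distance = rng.random((6, 6))            # hypothetical pairwise distance matrix
pre_clusters = [[0, 1, 2], [3, 4], [5]]  # hypothetical lists of column indices

# Per-row version: one np.median call for every (row, cluster) pair.
per_row = np.array([
    [np.median(distance[i, cluster]) for cluster in pre_clusters]
    for i in range(distance.shape[0])
])

# Batched version: one np.median call per cluster, covering all rows at once.
all_distances = [np.median(distance[:, cluster], axis=1) for cluster in pre_clusters]
batched = np.stack(all_distances, axis=1)  # shape (n_rows, n_clusters)

print(np.allclose(per_row, batched))
```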
Functions whose cumulative time in the current cProfile run exceeds 1 second:
8019608 function calls (7869083 primitive calls) in 18.432 seconds
ncalls tottime percall cumtime percall filename:lineno(function)
43 0.044 0.001 18.432 0.429 F:\topo\semi_supervised_cluster.py:13(semi_supervised_cluster)
84 0.002 0.000 10.561 0.126 D:\py\py3.10\lib\site-packages\pandas\core\frame.py:9486(corr)
84 10.441 0.124 10.441 0.124 {pandas._libs.algos.nancorr}
42 0.059 0.001 6.309 0.150 F:\topo\cop_kmeans.py:13(cop_kmeans)
23039 0.538 0.000 3.175 0.000 {built-in method builtins.sum}
42 1.014 0.024 2.240 0.053 F:\topo\cop_kmeans.py:217(get_ml_info)
42 0.000 0.000 2.085 0.050 F:\topo\cop_kmeans.py:91(tolerance)
139 1.679 0.012 1.684 0.012 F:\topo\cop_kmeans.py:173(compute_centers)
42 0.009 0.000 1.359 0.032 F:\topo\cop_kmeans.py:95(<listcomp>)
107818/11911 0.240 0.000 1.186 0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
7888 0.077 0.000 1.154 0.000 F:\topo\metrics.py:42(pairwise_correlation)
42 0.009 0.000 1.115 0.027 F:\topo\cop_kmeans.py:237(<listcomp>)
14942 0.011 0.000 1.096 0.000 F:\topo\cop_kmeans.py:237(<genexpr>)
1570172 1.076 0.000 1.076 0.000 F:\topo\cop_kmeans.py:95(<genexpr>)
7888 0.007 0.000 1.000 0.000 <__array_function__ internals>:177(corrcoef)
Optimization No. 3: Use matrix operations instead of Python list comprehensions
Location of the change:
cop_kmeans.py > tolerance()
Before optimization:
# taken from scikit-learn (https://goo.gl/1RYPP5)
def tolerance(tol, dataset):
    n = len(dataset)
    dim = len(dataset[0])
    averages = [sum(dataset[i][d] for i in range(n)) / float(n) for d in range(dim)]
    variances = [sum((dataset[i][d] - averages[d]) ** 2 for i in range(n)) / float(n) for d in range(dim)]
    return tol * sum(variances) / dim
Optimized:
def tolerance(tol, dataset):
    return tol * sum(np.var(dataset, axis=0)) / dataset.shape[1]
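The rewrite is valid because np.var defaults to ddof=0, the population variance, which matches the division by n in the loop version. The sketch below compares the two on random data. The names `tolerance_loops` and `tolerance_numpy` are labels introduced here, and the NumPy variant uses `.mean()` over the per-column variances, a small variation on the built-in sum in the snippet above that stays inside NumPy.

```python
import numpy as np


def tolerance_loops(tol, dataset):
    """Original pure-Python version: population variance per column, averaged."""
    n = len(dataset)
    dim = len(dataset[0])
    averages = [sum(dataset[i][d] for i in range(n)) / float(n) for d in range(dim)]
    variances = [
        sum((dataset[i][d] - averages[d]) ** 2 for i in range(n)) / float(n)
        for d in range(dim)
    ]
    return tol * sum(variances) / dim


def tolerance_numpy(tol, dataset):
    """Vectorized version; np.var uses ddof=0 by default, i.e. division by n."""
    return tol * np.var(dataset, axis=0).mean()


rng = np.random.default_rng(1)
data = rng.random((50, 4))  # made-up sample data
slow = tolerance_loops(1e-4, data)
fast = tolerance_numpy(1e-4, data)
print(slow, fast)
```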
Functions whose cumulative time in the current cProfile run exceeds 0.5 seconds:
4864701 function calls (4714176 primitive calls) in 16.199 seconds
ncalls tottime percall cumtime percall filename:lineno(function)
43 0.043 0.001 16.199 0.377 F:\topo\semi_supervised_cluster.py:13(semi_supervised_cluster)
84 0.002 0.000 10.434 0.124 D:\py\py3.10\lib\site-packages\pandas\core\frame.py:9486(corr)
84 10.316 0.123 10.316 0.123 {pandas._libs.algos.nancorr}
42 0.060 0.001 4.211 0.100 F:\topo\cop_kmeans.py:13(cop_kmeans)
42 1.010 0.024 2.177 0.052 F:\topo\cop_kmeans.py:213(get_ml_info)
139 1.720 0.012 1.724 0.012 F:\topo\cop_kmeans.py:169(compute_centers)
107860/11953 0.227 0.000 1.132 0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
7888 0.073 0.000 1.093 0.000 F:\topo\metrics.py:42(pairwise_correlation)
42 0.008 0.000 1.051 0.025 F:\topo\cop_kmeans.py:233(<listcomp>)
8097 0.011 0.000 1.045 0.000 {built-in method builtins.sum}
14942 0.010 0.000 1.034 0.000 F:\topo\cop_kmeans.py:233(<genexpr>)
7888 0.007 0.000 0.947 0.000 <__array_function__ internals>:177(corrcoef)
7888 0.070 0.000 0.931 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:2689(corrcoef)
7888 0.008 0.000 0.577 0.000 <__array_function__ internals>:177(cov)
7888 0.149 0.000 0.559 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:2462(cov)
2422 0.008 0.000 0.525 0.000 D:\py\py3.10\lib\site-packages\pandas\core\frame.py:587(__init__)
Detailed timings of the callees of cop_kmeans and get_ml_info:
F:\topo\cop_kmeans.py:13(cop_kmeans) -> 40 0.001 0.001 F:\topo\cop_kmeans.py:41(<listcomp>)
139 0.011 0.082 F:\topo\cop_kmeans.py:45(<listcomp>)
24722 0.014 0.014 F:\topo\cop_kmeans.py:47(<listcomp>)
139 0.001 0.077 F:\topo\cop_kmeans.py:66(<listcomp>)
42 0.000 0.008 F:\topo\cop_kmeans.py:91(tolerance)
42 0.001 0.011 F:\topo\cop_kmeans.py:100(initialize_centers)
25346 0.012 0.012 F:\topo\cop_kmeans.py:157(violate_constraints)
139 1.720 1.724 F:\topo\cop_kmeans.py:169(compute_centers)
42 1.010 2.177 F:\topo\cop_kmeans.py:213(get_ml_info)
42 0.006 0.010 F:\topo\cop_kmeans.py:239(transitive_closure)
74971 0.006 0.006 {built-in method builtins.len}
24722 0.021 0.029 {built-in method builtins.sorted}
139 0.000 0.000 {built-in method builtins.sum}
F:\topo\cop_kmeans.py:213(get_ml_info) -> 42 0.005 0.005 F:\topo\cop_kmeans.py:225(<listcomp>)
42 0.008 1.051 F:\topo\cop_kmeans.py:233(<listcomp>)
1562953 0.110 0.110 {built-in method builtins.len}
7471 0.001 0.001 {method 'append' of 'list' objects}
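Per-caller breakdowns with the `->` arrow, like the one above, are the layout that pstats prints from print_callees. The following is a minimal sketch on a toy pair of functions; `inner` and `outer` are hypothetical names, and the string passed to print_callees is a regular expression that restricts the report to matching callers.

```python
import cProfile
import io
import pstats


def inner(n):
    return sum(range(n))


def outer():
    total = 0
    for _ in range(100):
        total += inner(10_000)
    return total


profiler = cProfile.Profile()
profiler.runcall(outer)

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
# Report only callers whose printed name matches the regex "outer".
stats.print_callees("outer")
callee_report = buffer.getvalue()
print(callee_report)
```

In this layout the columns after the arrow are the call count, the time spent in the callee itself, and the cumulative time of the callee when called from that caller.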
The fourth optimization: hoist a computation that was repeated inside the loop so that it runs once outside it
Location of the change:
cop_kmeans.py > get_ml_info()
Before optimization:
for j, group in enumerate(groups):
    for d in range(dim):
        for i in group:
            centroids[j][d] += dataset[i][d]
        centroids[j][d] /= float(len(group))
Optimized:
for j, group in enumerate(groups):
    n_group = float(len(group))
    for d in range(dim):
        for i in group:
            centroids[j][d] += dataset[i][d]
        centroids[j][d] /= n_group
Functions whose cumulative time in the current cProfile run exceeds 0.5 seconds:
3309815 function calls (3159204 primitive calls) in 15.745 seconds
ncalls tottime percall cumtime percall filename:lineno(function)
43 0.042 0.001 15.745 0.366 F:\topo\semi_supervised_cluster.py:13(semi_supervised_cluster)
84 0.002 0.000 10.381 0.124 D:\py\py3.10\lib\site-packages\pandas\core\frame.py:9486(corr)
84 10.262 0.122 10.262 0.122 {pandas._libs.algos.nancorr}
42 0.056 0.001 3.764 0.090 F:\topo\cop_kmeans.py:13(cop_kmeans)
42 0.778 0.019 1.838 0.044 F:\topo\cop_kmeans.py:213(get_ml_info)
139 1.626 0.012 1.631 0.012 F:\topo\cop_kmeans.py:169(compute_centers)
107860/11953 0.229 0.000 1.139 0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
7888 0.073 0.000 1.093 0.000 F:\topo\metrics.py:42(pairwise_correlation)
42 0.008 0.000 1.053 0.025 F:\topo\cop_kmeans.py:234(<listcomp>)
8097 0.010 0.000 1.047 0.000 {built-in method builtins.sum}
14942 0.010 0.000 1.036 0.000 F:\topo\cop_kmeans.py:234(<genexpr>)
7888 0.007 0.000 0.947 0.000 <__array_function__ internals>:177(corrcoef)
7888 0.071 0.000 0.931 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:2689(corrcoef)
7888 0.008 0.000 0.577 0.000 <__array_function__ internals>:177(cov)
7888 0.148 0.000 0.559 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:2462(cov)
2422 0.008 0.000 0.520 0.000 D:\py\py3.10\lib\site-packages\pandas\core\frame.py:587(__init__)
Detailed timings of the callees of cop_kmeans and get_ml_info:
F:\topo\cop_kmeans.py:13(cop_kmeans) -> 40 0.001 0.001 F:\topo\cop_kmeans.py:41(<listcomp>)
139 0.011 0.079 F:\topo\cop_kmeans.py:45(<listcomp>)
24722 0.013 0.013 F:\topo\cop_kmeans.py:47(<listcomp>)
139 0.001 0.074 F:\topo\cop_kmeans.py:66(<listcomp>)
42 0.000 0.007 F:\topo\cop_kmeans.py:91(tolerance)
42 0.001 0.012 F:\topo\cop_kmeans.py:100(initialize_centers)
25346 0.008 0.008 F:\topo\cop_kmeans.py:157(violate_constraints)
139 1.626 1.631 F:\topo\cop_kmeans.py:169(compute_centers)
42 0.778 1.838 F:\topo\cop_kmeans.py:213(get_ml_info)
42 0.007 0.011 F:\topo\cop_kmeans.py:240(transitive_closure)
74971 0.006 0.006 {built-in method builtins.len}
24722 0.018 0.027 {built-in method builtins.sorted}
139 0.000 0.000 {built-in method builtins.sum}
F:\topo\cop_kmeans.py:213(get_ml_info) -> 42 0.005 0.005 F:\topo\cop_kmeans.py:225(<listcomp>)
42 0.008 1.053 F:\topo\cop_kmeans.py:234(<listcomp>)
7723 0.001 0.001 {built-in method builtins.len}
7471 0.001 0.001 {method 'append' of 'list' objects}
5th optimization: use set union instead of a loop that marks a boolean array (about 10% faster when tested in isolation, but no significant gain in the application)
Location of the change:
cop_kmeans.py > get_ml_info()
Before optimization:
flags = [True] * n_dataset
groups = []
for i in range(n_dataset):
    if not flags[i]:
        continue
    group = list(ml[i] | {i})
    groups.append(group)
    for j in group:
        flags[j] = False
Optimized:
visited = set()
groups = []
for i in range(n_dataset):
    if i in visited:
        continue
    temp = ml[i] | {i}
    groups.append(list(temp))
    visited |= temp
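Both versions visit each must-link group once and skip every index that already belongs to a group; the set version replaces the inner marking loop with one in-place union. The sketch below wraps the two snippets in functions and checks that they produce the same groups. The function names and the sample `ml` mapping are hypothetical; `ml` is assumed to map each index to the set of indices it must link to, already transitively closed.

```python
def groups_with_flags(ml, n_dataset):
    """Boolean-flag version: mark every member of a group as handled."""
    flags = [True] * n_dataset
    groups = []
    for i in range(n_dataset):
        if not flags[i]:
            continue
        group = list(ml[i] | {i})
        groups.append(group)
        for j in group:
            flags[j] = False
    return groups


def groups_with_set(ml, n_dataset):
    """Set version: one in-place union replaces the marking loop."""
    visited = set()
    groups = []
    for i in range(n_dataset):
        if i in visited:
            continue
        temp = ml[i] | {i}
        groups.append(list(temp))
        visited |= temp
    return groups


# Hypothetical must-link sets after transitive closure: {0, 1, 2}, {3}, {4, 5}.
ml = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: set(), 4: {5}, 5: {4}}
by_flags = [sorted(g) for g in groups_with_flags(ml, 6)]
by_set = [sorted(g) for g in groups_with_set(ml, 6)]
print(by_flags)  # [[0, 1, 2], [3], [4, 5]]
```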
Functions whose cumulative time in the current cProfile run exceeds 0.5 seconds:
3309387 function calls (3158862 primitive calls) in 16.126 seconds
ncalls tottime percall cumtime percall filename:lineno(function)
43 0.043 0.001 16.126 0.375 F:\topo\semi_supervised_cluster.py:13(semi_supervised_cluster)
84 0.002 0.000 10.693 0.127 D:\py\py3.10\lib\site-packages\pandas\core\frame.py:9486(corr)
84 10.574 0.126 10.574 0.126 {pandas._libs.algos.nancorr}
42 0.059 0.001 3.830 0.091 F:\topo\cop_kmeans.py:13(cop_kmeans)
42 0.745 0.018 1.814 0.043 F:\topo\cop_kmeans.py:213(get_ml_info)
139 1.707 0.012 1.712 0.012 F:\topo\cop_kmeans.py:169(compute_centers)
107860/11953 0.230 0.000 1.142 0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
7888 0.074 0.000 1.103 0.000 F:\topo\metrics.py:42(pairwise_correlation)
42 0.008 0.000 1.063 0.025 F:\topo\cop_kmeans.py:233(<listcomp>)
8097 0.011 0.000 1.057 0.000 {built-in method builtins.sum}
14942 0.010 0.000 1.046 0.000 F:\topo\cop_kmeans.py:233(<genexpr>)
7888 0.007 0.000 0.953 0.000 <__array_function__ internals>:177(corrcoef)
7888 0.071 0.000 0.937 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:2689(corrcoef)
7888 0.008 0.000 0.579 0.000 <__array_function__ internals>:177(cov)
7888 0.149 0.000 0.561 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:2462(cov)
2422 0.008 0.000 0.558 0.000 D:\py\py3.10\lib\site-packages\pandas\core\frame.py:587(__init__)
164 0.003 0.000 0.519 0.003 D:\py\py3.10\lib\site-packages\pandas\core\internals\construction.py:425(dict_to_mgr)
Detailed timings of the callees of cop_kmeans and get_ml_info:
F:\topo\cop_kmeans.py:13(cop_kmeans) -> 40 0.001 0.001 F:\topo\cop_kmeans.py:41(<listcomp>)
139 0.011 0.081 F:\topo\cop_kmeans.py:45(<listcomp>)
24722 0.014 0.014 F:\topo\cop_kmeans.py:47(<listcomp>)
139 0.001 0.076 F:\topo\cop_kmeans.py:66(<listcomp>)
42 0.000 0.008 F:\topo\cop_kmeans.py:91(tolerance)
42 0.001 0.012 F:\topo\cop_kmeans.py:100(initialize_centers)
25346 0.009 0.009 F:\topo\cop_kmeans.py:157(violate_constraints)
139 1.707 1.712 F:\topo\cop_kmeans.py:169(compute_centers)
42 0.745 1.814 F:\topo\cop_kmeans.py:213(get_ml_info)
42 0.006 0.010 F:\topo\cop_kmeans.py:239(transitive_closure)
74971 0.006 0.006 {built-in method builtins.len}
24722 0.020 0.028 {built-in method builtins.sorted}
139 0.000 0.000 {built-in method builtins.sum}
F:\topo\cop_kmeans.py:213(get_ml_info) -> 42 0.005 0.005 F:\topo\cop_kmeans.py:224(<listcomp>)
42 0.008 1.063 F:\topo\cop_kmeans.py:233(<listcomp>)
7639 0.001 0.001 {built-in method builtins.len}
7471 0.001 0.001 {method 'append' of 'list' objects}
The sixth optimization: use NumPy array operations instead of Python loops
This continues the move to array operations and also removes the need to convert lists to arrays later.
Location of the change:
cop_kmeans.py > get_ml_info()
Before optimization:
dim = len(dataset[0])
centroids = [[0.0] * dim for i in range(len(groups))]
for j, group in enumerate(groups):
    n_group = float(len(group))
    for d in range(dim):
        for i in group:
            centroids[j][d] += dataset[i][d]
        centroids[j][d] /= n_group
Optimized:
dim = len(dataset[0])
centroids = np.zeros((len(groups), dim), dtype=np.float64)
for j, group in enumerate(groups):
    for i in group:
        centroids[j, :] += dataset[i, :]
    centroids[j, :] /= len(group)
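The row-wise accumulation still loops over every member of every group. A possible further step, not taken in this post, is to select all rows of a group with one index operation and average them. The sketch below checks that this gives the same centroids; the sample `dataset` and `groups` are made up, and `dataset` is assumed to be a 2-D NumPy array.

```python
import numpy as np

rng = np.random.default_rng(2)
dataset = rng.random((7, 3))          # hypothetical data, one row per point
groups = [[0, 3, 4], [1], [2, 5, 6]]  # hypothetical must-link groups

# Row-accumulation version: add each member row, then divide by the group size.
centroids = np.zeros((len(groups), dataset.shape[1]), dtype=np.float64)
for j, group in enumerate(groups):
    for i in group:
        centroids[j, :] += dataset[i, :]
    centroids[j, :] /= len(group)

# Possible further step: index all rows of a group at once and average them.
centroids_indexed = np.array([dataset[group].mean(axis=0) for group in groups])

print(np.allclose(centroids, centroids_indexed))
```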
Functions whose cumulative time in the current cProfile run exceeds 0.5 seconds:
3301916 function calls (3151391 primitive calls) in 15.182 seconds
ncalls tottime percall cumtime percall filename:lineno(function)
43 0.032 0.001 15.182 0.353 F:\topo\semi_supervised_cluster.py:13(semi_supervised_cluster)
84 0.002 0.000 10.734 0.128 D:\py\py3.10\lib\site-packages\pandas\core\frame.py:9486(corr)
84 10.618 0.126 10.618 0.126 {pandas._libs.algos.nancorr}
42 0.057 0.001 2.924 0.070 F:\topo\cop_kmeans.py:13(cop_kmeans)
139 1.617 0.012 1.622 0.012 F:\topo\cop_kmeans.py:169(compute_centers)
107860/11953 0.226 0.000 1.125 0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
7888 0.070 0.000 1.014 0.000 F:\topo\metrics.py:42(pairwise_correlation)
42 0.034 0.001 1.011 0.024 F:\topo\cop_kmeans.py:213(get_ml_info)
42 0.008 0.000 0.975 0.023 F:\topo\cop_kmeans.py:230(<listcomp>)
8097 0.010 0.000 0.969 0.000 {built-in method builtins.sum}
14942 0.010 0.000 0.958 0.000 F:\topo\cop_kmeans.py:230(<genexpr>)
7888 0.007 0.000 0.942 0.000 <__array_function__ internals>:177(corrcoef)
7888 0.071 0.000 0.927 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:2689(corrcoef)
7888 0.007 0.000 0.572 0.000 <__array_function__ internals>:177(cov)
7888 0.149 0.000 0.554 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:2462(cov)
2422 0.007 0.000 0.520 0.000 D:\py\py3.10\lib\site-packages\pandas\core\frame.py:587(__init__)
Detailed timings of the callees of cop_kmeans, get_ml_info, and compute_centers:
F:\topo\cop_kmeans.py:13(cop_kmeans) -> 40 0.000 0.000 F:\topo\cop_kmeans.py:41(<listcomp>)
139 0.010 0.078 F:\topo\cop_kmeans.py:45(<listcomp>)
24722 0.013 0.013 F:\topo\cop_kmeans.py:47(<listcomp>)
139 0.001 0.074 F:\topo\cop_kmeans.py:66(<listcomp>)
42 0.000 0.008 F:\topo\cop_kmeans.py:91(tolerance)
42 0.001 0.011 F:\topo\cop_kmeans.py:100(initialize_centers)
25346 0.008 0.008 F:\topo\cop_kmeans.py:157(violate_constraints)
139 1.617 1.622 F:\topo\cop_kmeans.py:169(compute_centers)
42 0.034 1.011 F:\topo\cop_kmeans.py:213(get_ml_info)
42 0.006 0.010 F:\topo\cop_kmeans.py:236(transitive_closure)
74971 0.006 0.006 {built-in method builtins.len}
24722 0.019 0.027 {built-in method builtins.sorted}
139 0.000 0.000 {built-in method builtins.sum}
F:\topo\cop_kmeans.py:169(compute_centers) -> 139 0.002 0.002 F:\topo\cop_kmeans.py:173(<listcomp>)
139 0.000 0.000 F:\topo\cop_kmeans.py:176(<listcomp>)
139 0.000 0.000 F:\topo\cop_kmeans.py:203(<listcomp>)
417 0.000 0.000 {built-in method builtins.len}
24722 0.002 0.002 {method 'append' of 'list' objects}
F:\topo\cop_kmeans.py:213(get_ml_info) -> 42 0.008 0.975 F:\topo\cop_kmeans.py:230(<listcomp>)
7639 0.001 0.001 {built-in method builtins.len}
42 0.001 0.001 {built-in method numpy.zeros}
7471 0.001 0.001 {method 'append' of 'list' objects}
The seventh optimization: use NumPy array operations and collections.Counter instead of Python loops
This continues the move to array operations and also removes the need to convert lists to arrays later.
Location of the change:
cop_kmeans.py > compute_centers()
Before optimization:
dim = len(dataset[0])
centers = [[0.0] * dim for i in range(k)]
counts = [0] * k_new  # counts is used only in the division below
for j, c in enumerate(clusters):
    for i in range(dim):
        centers[c][i] += dataset[j][i]
    counts[c] += 1
for j in range(k_new):
    for i in range(dim):
        centers[j][i] = centers[j][i] / float(counts[j])
Optimized:
dim = len(dataset[0])
centers = np.zeros((k, dim), dtype=np.float64)
counts = Counter(clusters)
for j, c in enumerate(clusters):
    centers[c, :] += dataset[j, :]
for j in range(k_new):
    centers[j, :] /= counts[j]
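The remaining loop adds one row per data point. As a hedged alternative that this post does not use, the accumulation can be written without a Python loop using np.add.at (an unbuffered scatter-add) and np.bincount for the cluster sizes. The sketch assumes every label in `clusters` occurs at least once, so no division by zero happens; the sample data is made up.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(3)
dataset = rng.random((8, 2))         # hypothetical data, one row per point
clusters = [0, 2, 1, 0, 2, 2, 1, 0]  # hypothetical cluster label per row
k = 3

# Loop version, following the optimized snippet.
centers_loop = np.zeros((k, dataset.shape[1]), dtype=np.float64)
counts = Counter(clusters)
for j, c in enumerate(clusters):
    centers_loop[c, :] += dataset[j, :]
for j in range(k):
    centers_loop[j, :] /= counts[j]

# Loop-free variant: scatter-add each row into its cluster, then divide by sizes.
labels = np.asarray(clusters)
centers_vec = np.zeros((k, dataset.shape[1]), dtype=np.float64)
np.add.at(centers_vec, labels, dataset)
sizes = np.bincount(labels, minlength=k)
centers_vec /= sizes[:, None]

print(np.allclose(centers_loop, centers_vec))
```

A plain `centers_vec[labels] += dataset` would not work here, because repeated indices in a fancy-indexed in-place add are applied only once; np.add.at accumulates every occurrence.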
Functions whose cumulative time in the current cProfile run exceeds 0.5 seconds:
3303306 function calls (3152781 primitive calls) in 13.225 seconds
ncalls tottime percall cumtime percall filename:lineno(function)
43 0.031 0.001 13.225 0.308 F:\topo\semi_supervised_cluster.py:13(semi_supervised_cluster)
84 0.001 0.000 10.428 0.124 D:\py\py3.10\lib\site-packages\pandas\core\frame.py:9486(corr)
84 10.314 0.123 10.314 0.123 {pandas._libs.algos.nancorr}
42 0.053 0.001 1.303 0.031 F:\topo\cop_kmeans.py:14(cop_kmeans)
107860/11953 0.224 0.000 1.079 0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
42 0.035 0.001 0.968 0.023 F:\topo\cop_kmeans.py:214(get_ml_info)
7888 0.066 0.000 0.962 0.000 F:\topo\metrics.py:42(pairwise_correlation)
42 0.008 0.000 0.930 0.022 F:\topo\cop_kmeans.py:231(<listcomp>)
8097 0.010 0.000 0.924 0.000 {built-in method builtins.sum}
14942 0.010 0.000 0.914 0.000 F:\topo\cop_kmeans.py:231(<genexpr>)
7888 0.007 0.000 0.894 0.000 <__array_function__ internals>:177(corrcoef)
7888 0.065 0.000 0.880 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:2689(corrcoef)
7888 0.007 0.000 0.543 0.000 <__array_function__ internals>:177(cov)
7888 0.139 0.000 0.526 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:2462(cov)
2422 0.007 0.000 0.510 0.000 D:\py\py3.10\lib\site-packages\pandas\core\frame.py:587(__init__)
Detailed timings of the callees of cop_kmeans, get_ml_info, and compute_centers:
F:\topo\cop_kmeans.py:14(cop_kmeans) -> 40 0.000 0.000 F:\topo\cop_kmeans.py:42(<listcomp>)
139 0.009 0.076 F:\topo\cop_kmeans.py:46(<listcomp>)
24722 0.013 0.013 F:\topo\cop_kmeans.py:48(<listcomp>)
139 0.001 0.059 F:\topo\cop_kmeans.py:67(<listcomp>)
42 0.000 0.008 F:\topo\cop_kmeans.py:92(tolerance)
42 0.001 0.010 F:\topo\cop_kmeans.py:101(initialize_centers)
25346 0.008 0.008 F:\topo\cop_kmeans.py:158(violate_constraints)
139 0.051 0.065 F:\topo\cop_kmeans.py:170(compute_centers)
42 0.035 0.968 F:\topo\cop_kmeans.py:214(get_ml_info)
42 0.007 0.010 F:\topo\cop_kmeans.py:237(transitive_closure)
74971 0.006 0.006 {built-in method builtins.len}
24722 0.019 0.026 {built-in method builtins.sorted}
139 0.000 0.000 {built-in method builtins.sum}
F:\topo\cop_kmeans.py:214(get_ml_info) -> 42 0.008 0.930 F:\topo\cop_kmeans.py:231(<listcomp>)
7639 0.001 0.001 {built-in method builtins.len}
42 0.001 0.001 {built-in method numpy.zeros}
7471 0.001 0.001 {method 'append' of 'list' objects}
F:\topo\cop_kmeans.py:170(compute_centers) -> 139 0.000 0.002 D:\py\py3.10\lib\collections\__init__.py:565(__init__)
139 0.001 0.001 F:\topo\cop_kmeans.py:177(<listcomp>)
139 0.000 0.000 F:\topo\cop_kmeans.py:204(<listcomp>)
556 0.000 0.000 {built-in method builtins.len}
417 0.008 0.008 {built-in method builtins.print}
139 0.000 0.000 {built-in method numpy.zeros}
24722 0.002 0.002 {method 'append' of 'list' objects}
The eighth optimization: hoist another computation that was repeated inside the loop so that it runs once outside it
Location of the change:
cop_kmeans.py > cop_kmeans()
Before optimization:
all_distances = [np.median(dataset[:, cluster], axis=1) for cluster in pre_clusters]
for i, d in enumerate(dataset):
    distances = [all_distances[j][i] for j in range(len(pre_clusters))]
    indices = sorted(range(len(distances)), key=lambda x: distances[x])
    counter = 0
    if clusters_[i] == -1:
        found_cluster = False
        while (not found_cluster) and counter < len(indices):
            pass  # this logic does not modify pre_clusters
        pass  # this logic does not modify pre_clusters
Optimized:
all_distances = [np.median(dataset[:, cluster], axis=1) for cluster in pre_clusters]
n_cluster = len(pre_clusters)
for i, d in enumerate(dataset):
    distances = [all_distances[j][i] for j in range(n_cluster)]
    indices = sorted(range(n_cluster), key=lambda x: distances[x])
    counter = 0
    if clusters_[i] == -1:
        found_cluster = False
        while (not found_cluster) and counter < n_cluster:
            pass
        pass
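The per-row ranking itself can also be moved out of the loop. As a sketch of one option that this post does not apply, the per-cluster distance vectors can be stacked into a matrix and ranked with a single stable np.argsort, which matches sorted(range(n), key=...) as long as the distances contain no NaN. The sample matrix and cluster lists are made up.

```python
import numpy as np

rng = np.random.default_rng(4)
dataset = rng.random((6, 6))             # hypothetical pairwise distance matrix
pre_clusters = [[0, 1], [2, 3, 4], [5]]  # hypothetical lists of column indices

all_distances = [np.median(dataset[:, cluster], axis=1) for cluster in pre_clusters]
n_cluster = len(pre_clusters)

# Per-row ranking, following the optimized snippet.
rank_loop = []
for i in range(dataset.shape[0]):
    distances = [all_distances[j][i] for j in range(n_cluster)]
    rank_loop.append(sorted(range(n_cluster), key=lambda x: distances[x]))

# One stable argsort over the stacked (n_rows, n_clusters) matrix.
distance_matrix = np.stack(all_distances, axis=1)
rank_matrix = np.argsort(distance_matrix, axis=1, kind="stable")

print(rank_matrix.tolist() == rank_loop)
```

Inside the loop, `indices` would then be `rank_matrix[i]`.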
Functions whose cumulative time in the current cProfile run exceeds 0.1 seconds:
3228099 function calls (3077574 primitive calls) in 13.317 seconds
ncalls tottime percall cumtime percall filename:lineno(function)
43 0.032 0.001 13.317 0.310 F:\topo\semi_supervised_cluster.py:13(semi_supervised_cluster)
84 0.002 0.000 10.442 0.124 D:\py\py3.10\lib\site-packages\pandas\core\frame.py:9486(corr)
84 10.317 0.123 10.317 0.123 {pandas._libs.algos.nancorr}
42 0.044 0.001 1.309 0.031 F:\topo\cop_kmeans.py:14(cop_kmeans)
107860/11953 0.231 0.000 1.108 0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
42 0.037 0.001 0.990 0.024 F:\topo\cop_kmeans.py:212(get_ml_info)
7888 0.068 0.000 0.985 0.000 F:\topo\metrics.py:42(pairwise_correlation)
42 0.008 0.000 0.951 0.023 F:\topo\cop_kmeans.py:229(<listcomp>)
8097 0.010 0.000 0.945 0.000 {built-in method builtins.sum}
14942 0.010 0.000 0.934 0.000 F:\topo\cop_kmeans.py:229(<genexpr>)
7888 0.007 0.000 0.915 0.000 <__array_function__ internals>:177(corrcoef)
7888 0.066 0.000 0.901 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:2689(corrcoef)
7888 0.007 0.000 0.555 0.000 <__array_function__ internals>:177(cov)
7888 0.141 0.000 0.538 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:2462(cov)
2422 0.008 0.000 0.528 0.000 D:\py\py3.10\lib\site-packages\pandas\core\frame.py:587(__init__)
164 0.003 0.000 0.489 0.003 D:\py\py3.10\lib\site-packages\pandas\core\internals\construction.py:425(dict_to_mgr)
164 0.001 0.000 0.377 0.002 D:\py\py3.10\lib\site-packages\pandas\core\internals\construction.py:102(arrays_to_mgr)
164 0.022 0.000 0.313 0.002 D:\py\py3.10\lib\site-packages\pandas\core\internals\construction.py:596(_homogenize)
7888 0.007 0.000 0.293 0.000 <__array_function__ internals>:177(average)
684 0.005 0.000 0.289 0.000 D:\py\py3.10\lib\site-packages\pandas\core\internals\managers.py:253(apply)
14999 0.035 0.000 0.280 0.000 D:\py\py3.10\lib\site-packages\pandas\core\construction.py:470(sanitize_array)
7888 0.029 0.000 0.279 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:395(average)
86 0.000 0.000 0.253 0.003 D:\py\py3.10\lib\site-packages\pandas\core\frame.py:3630(__setitem__)
86 0.001 0.000 0.253 0.003 D:\py\py3.10\lib\site-packages\pandas\core\frame.py:3749(_setitem_frame)
86 0.006 0.000 0.236 0.003 D:\py\py3.10\lib\site-packages\pandas\core\generic.py:9027(_where)
7888 0.007 0.000 0.202 0.000 <__array_function__ internals>:177(clip)
7888 0.008 0.000 0.188 0.000 D:\py\py3.10\lib\site-packages\numpy\core\fromnumeric.py:2083(clip)
8094 0.008 0.000 0.181 0.000 D:\py\py3.10\lib\site-packages\numpy\core\fromnumeric.py:51(_wrapfunc)
669066 0.116 0.000 0.174 0.000 {built-in method builtins.isinstance}
43 0.004 0.000 0.171 0.004 F:\topo\cop_kmeans.py:78(make_constraint)
7888 0.006 0.000 0.170 0.000 {method 'clip' of 'numpy.ndarray' objects}
14052 0.035 0.000 0.168 0.000 D:\py\py3.10\lib\site-packages\pandas\core\dtypes\cast.py:1873(construct_1d_arraylike_from_scalar)
7888 0.020 0.000 0.165 0.000 D:\py\py3.10\lib\site-packages\numpy\core\_methods.py:125(_clip)
240 0.002 0.000 0.159 0.001 D:\py\py3.10\lib\site-packages\pandas\core\generic.py:5048(filter)
8591 0.066 0.000 0.143 0.000 D:\py\py3.10\lib\site-packages\numpy\core\_methods.py:162(_mean)
8174 0.006 0.000 0.141 0.000 {method 'mean' of 'numpy.ndarray' objects}
86 0.000 0.000 0.141 0.002 D:\py\py3.10\lib\site-packages\pandas\core\internals\managers.py:339(putmask)
86 0.001 0.000 0.125 0.001 D:\py\py3.10\lib\site-packages\pandas\core\internals\blocks.py:963(putmask)
86 0.001 0.000 0.120 0.001 D:\py\py3.10\lib\site-packages\pandas\core\array_algos\putmask.py:116(putmask_without_repeat)
376 0.039 0.000 0.110 0.000 D:\py\py3.10\lib\site-packages\pandas\core\internals\managers.py:1541(as_array)
8093 0.006 0.000 0.109 0.000 <__array_function__ internals>:177(broadcast_to)
84 0.000 0.000 0.109 0.001 D:\py\py3.10\lib\site-packages\pandas\core\frame.py:1693(to_numpy)
86 0.000 0.000 0.107 0.001 <__array_function__ internals>:177(putmask)
Detailed timings of the callees of cop_kmeans:
F:\topo\cop_kmeans.py:14(cop_kmeans) -> 40 0.001 0.001 F:\topo\cop_kmeans.py:42(<listcomp>)
139 0.008 0.076 F:\topo\cop_kmeans.py:46(<listcomp>)
24722 0.013 0.013 F:\topo\cop_kmeans.py:49(<listcomp>)
139 0.001 0.062 F:\topo\cop_kmeans.py:68(<listcomp>)
42 0.000 0.008 F:\topo\cop_kmeans.py:93(tolerance)
42 0.001 0.010 F:\topo\cop_kmeans.py:102(initialize_centers)
25346 0.008 0.008 F:\topo\cop_kmeans.py:159(violate_constraints)
139 0.054 0.060 F:\topo\cop_kmeans.py:171(compute_centers)
42 0.037 0.990 F:\topo\cop_kmeans.py:212(get_ml_info)
42 0.007 0.011 F:\topo\cop_kmeans.py:235(transitive_closure)
320 0.000 0.000 {built-in method builtins.len}
24722 0.019 0.027 {built-in method builtins.sorted}
139 0.000 0.000 {built-in method builtins.sum}
The 9th optimization: remove the isinstance checks added in the 1st optimization (they account for only 15776 calls; the remaining 653290 isinstance calls presumably come from other functions)
Since Optimization 6, both arguments are guaranteed to be np.ndarray.
Location of the change:
metrics.py > pairwise_correlation()
Before optimization:
def pairwise_correlation(x, y):
    """
    Use pandas to ignore nan
    """
    if not isinstance(x, np.ndarray):
        x = np.array(x)
    if not isinstance(y, np.ndarray):
        y = np.array(y)
    nan_idx = np.logical_or(np.isnan(x), np.isnan(y))
    return 1 - np.corrcoef(x[~nan_idx], y[~nan_idx])[0][1]
Optimized:
def pairwise_correlation(x, y):
    """
    Use pandas to ignore nan
    """
    nan_idx = np.logical_or(np.isnan(x), np.isnan(y))
    return 1 - np.corrcoef(x[~nan_idx], y[~nan_idx])[0][1]
Functions whose cumulative time in the current cProfile run exceeds 0.1 seconds:
3212323 function calls (3061798 primitive calls) in 13.013 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
43 0.030 0.001 13.013 0.303 F:\topo\semi_supervised_cluster.py:13(semi_supervised_cluster)
84 0.001 0.000 10.240 0.122 D:\py\py3.10\lib\site-packages\pandas\core\frame.py:9486(corr)
84 10.127 0.121 10.127 0.121 {pandas._libs.algos.nancorr}
42 0.043 0.001 1.287 0.031 F:\topo\cop_kmeans.py:14(cop_kmeans)
107860/11953 0.222 0.000 1.091 0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
42 0.034 0.001 0.973 0.023 F:\topo\cop_kmeans.py:213(get_ml_info)
7888 0.061 0.000 0.970 0.000 F:\topo\metrics.py:42(pairwise_correlation)
42 0.008 0.000 0.937 0.022 F:\topo\cop_kmeans.py:230(<listcomp>)
8097 0.010 0.000 0.931 0.000 {built-in method builtins.sum}
14942 0.010 0.000 0.920 0.000 F:\topo\cop_kmeans.py:230(<genexpr>)
7888 0.007 0.000 0.908 0.000 <__array_function__ internals>:177(corrcoef)
7888 0.066 0.000 0.894 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:2689(corrcoef)
7888 0.007 0.000 0.550 0.000 <__array_function__ internals>:177(cov)
7888 0.140 0.000 0.533 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:2462(cov)
2422 0.007 0.000 0.507 0.000 D:\py\py3.10\lib\site-packages\pandas\core\frame.py:587(__init__)
164 0.003 0.000 0.470 0.003 D:\py\py3.10\lib\site-packages\pandas\core\internals\construction.py:425(dict_to_mgr)
164 0.001 0.000 0.364 0.002 D:\py\py3.10\lib\site-packages\pandas\core\internals\construction.py:102(arrays_to_mgr)
164 0.022 0.000 0.305 0.002 D:\py\py3.10\lib\site-packages\pandas\core\internals\construction.py:596(_homogenize)
7888 0.007 0.000 0.291 0.000 <__array_function__ internals>:177(average)
7888 0.029 0.000 0.277 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:395(average)
14999 0.034 0.000 0.273 0.000 D:\py\py3.10\lib\site-packages\pandas\core\construction.py:470(sanitize_array)
684 0.005 0.000 0.269 0.000 D:\py\py3.10\lib\site-packages\pandas\core\internals\managers.py:253(apply)
86 0.001 0.000 0.244 0.003 D:\py\py3.10\lib\site-packages\pandas\core\frame.py:3630(__setitem__)
86 0.001 0.000 0.243 0.003 D:\py\py3.10\lib\site-packages\pandas\core\frame.py:3749(_setitem_frame)
86 0.006 0.000 0.226 0.003 D:\py\py3.10\lib\site-packages\pandas\core\generic.py:9027(_where)
7888 0.007 0.000 0.200 0.000 <__array_function__ internals>:177(clip)
7888 0.008 0.000 0.185 0.000 D:\py\py3.10\lib\site-packages\numpy\core\fromnumeric.py:2083(clip)
8094 0.007 0.000 0.179 0.000 D:\py\py3.10\lib\site-packages\numpy\core\fromnumeric.py:51(_wrapfunc)
7888 0.006 0.000 0.168 0.000 {method 'clip' of 'numpy.ndarray' objects}
653290 0.111 0.000 0.168 0.000 {built-in method builtins.isinstance}
43 0.003 0.000 0.166 0.004 F:\topo\cop_kmeans.py:78(make_constraint)
14052 0.033 0.000 0.163 0.000 D:\py\py3.10\lib\site-packages\pandas\core\dtypes\cast.py:1873(construct_1d_arraylike_from_scalar)
7888 0.019 0.000 0.162 0.000 D:\py\py3.10\lib\site-packages\numpy\core\_methods.py:125(_clip)
240 0.002 0.000 0.154 0.001 D:\py\py3.10\lib\site-packages\pandas\core\generic.py:5048(filter)
8591 0.065 0.000 0.141 0.000 D:\py\py3.10\lib\site-packages\numpy\core\_methods.py:162(_mean)
8174 0.006 0.000 0.140 0.000 {method 'mean' of 'numpy.ndarray' objects}
86 0.000 0.000 0.131 0.002 D:\py\py3.10\lib\site-packages\pandas\core\internals\managers.py:339(putmask)
86 0.001 0.000 0.116 0.001 D:\py\py3.10\lib\site-packages\pandas\core\internals\blocks.py:963(putmask)
86 0.001 0.000 0.111 0.001 D:\py\py3.10\lib\site-packages\pandas\core\array_algos\putmask.py:116(putmask_without_repeat)
8093 0.006 0.000 0.108 0.000 <__array_function__ internals>:177(broadcast_to)
cop_kmeans
Time breakdown by callee:
ncalls tottime cumtime
F:\topo\cop_kmeans.py:14(cop_kmeans) -> 40 0.001 0.001 F:\topo\cop_kmeans.py:42(<listcomp>)
139 0.008 0.076 F:\topo\cop_kmeans.py:46(<listcomp>)
24722 0.013 0.013 F:\topo\cop_kmeans.py:49(<listcomp>)
139 0.001 0.062 F:\topo\cop_kmeans.py:68(<listcomp>)
42 0.000 0.008 F:\topo\cop_kmeans.py:93(tolerance)
42 0.001 0.010 F:\topo\cop_kmeans.py:102(initialize_centers)
25346 0.008 0.008 F:\topo\cop_kmeans.py:159(violate_constraints)
139 0.054 0.060 F:\topo\cop_kmeans.py:171(compute_centers)
42 0.037 0.990 F:\topo\cop_kmeans.py:212(get_ml_info)
42 0.007 0.011 F:\topo\cop_kmeans.py:235(transitive_closure)
320 0.000 0.000 {built-in method builtins.len}
24722 0.019 0.027 {built-in method builtins.sorted}
139 0.000 0.000 {built-in method builtins.sum}
Optimization 10: simplify complex logic (no significant gain)
Location modified:
cop_kmeans.py > compute_centers()
Before optimization:
new_clusters = [[] for _ in range(k)]
for i in range(len(clusters)):
    for j in range(k):
        if clusters[i] == j:
            new_clusters[j].append(i)
            break
Optimized:
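new_clusters = [[] for _ in range(k)]
for i in range(len(clusters)):
    # Index the target list directly. The lower bound keeps negative labels
    # from wrapping around; the original loop skipped them as well.
    if 0 <= clusters[i] < k:
        new_clusters[clusters[i]].append(i)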
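The equivalence of the two versions can be checked with a small sketch. The function names `group_nested` and `group_direct` and the sample labels are illustrative assumptions, not part of cop_kmeans.py, and the `0 <=` lower bound is added here so that negative labels are skipped the same way the nested loop skips them.

```python
def group_nested(clusters, k):
    """Original approach: scan every cluster id for each point, O(n * k)."""
    new_clusters = [[] for _ in range(k)]
    for i in range(len(clusters)):
        for j in range(k):
            if clusters[i] == j:
                new_clusters[j].append(i)
                break
    return new_clusters


def group_direct(clusters, k):
    """Simplified approach: use the label as the list index, O(n)."""
    new_clusters = [[] for _ in range(k)]
    for i, label in enumerate(clusters):
        # Assumption: labels are integers; out-of-range labels are skipped.
        if 0 <= label < k:
            new_clusters[label].append(i)
    return new_clusters


# Hypothetical labels: the last two values fall outside range(k).
sample_labels = [0, 2, 1, 0, 3, -1]
nested_result = group_nested(sample_labels, 3)
direct_result = group_direct(sample_labels, 3)
print(nested_result == direct_result)
```

With only a few clusters the inner loop is short, which is consistent with the gain being reported as not significant.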
Functions whose cumulative time in the current cProfile run exceeds 0.1 seconds:
3212323 function calls (3061798 primitive calls) in 12.903 seconds
ncalls tottime percall cumtime percall filename:lineno(function)
43 0.030 0.001 12.903 0.300 F:\topo\semi_supervised_cluster.py:13(semi_supervised_cluster)
84 0.001 0.000 10.209 0.122 D:\py\py3.10\lib\site-packages\pandas\core\frame.py:9486(corr)
84 10.098 0.120 10.098 0.120 {pandas._libs.algos.nancorr}
42 0.039 0.001 1.251 0.030 F:\topo\cop_kmeans.py:14(cop_kmeans)
107860/11953 0.216 0.000 1.067 0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
42 0.035 0.001 0.961 0.023 F:\topo\cop_kmeans.py:207(get_ml_info)
7888 0.062 0.000 0.954 0.000 F:\topo\metrics.py:42(pairwise_correlation)
42 0.008 0.000 0.924 0.022 F:\topo\cop_kmeans.py:224(<listcomp>)
8097 0.010 0.000 0.918 0.000 {built-in method builtins.sum}
14942 0.010 0.000 0.908 0.000 F:\topo\cop_kmeans.py:224(<genexpr>)
7888 0.007 0.000 0.892 0.000 <__array_function__ internals>:177(corrcoef)
7888 0.065 0.000 0.877 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:2689(corrcoef)
7888 0.007 0.000 0.544 0.000 <__array_function__ internals>:177(cov)
7888 0.141 0.000 0.527 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:2462(cov)
2422 0.007 0.000 0.485 0.000 D:\py\py3.10\lib\site-packages\pandas\core\frame.py:587(__init__)
164 0.003 0.000 0.450 0.003 D:\py\py3.10\lib\site-packages\pandas\core\internals\construction.py:425(dict_to_mgr)
164 0.001 0.000 0.350 0.002 D:\py\py3.10\lib\site-packages\pandas\core\internals\construction.py:102(arrays_to_mgr)
164 0.021 0.000 0.292 0.002 D:\py\py3.10\lib\site-packages\pandas\core\internals\construction.py:596(_homogenize)
7888 0.007 0.000 0.285 0.000 <__array_function__ internals>:177(average)
7888 0.028 0.000 0.272 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:395(average)
684 0.005 0.000 0.262 0.000 D:\py\py3.10\lib\site-packages\pandas\core\internals\managers.py:253(apply)
14999 0.032 0.000 0.261 0.000 D:\py\py3.10\lib\site-packages\pandas\core\construction.py:470(sanitize_array)
86 0.000 0.000 0.233 0.003 D:\py\py3.10\lib\site-packages\pandas\core\frame.py:3630(__setitem__)
86 0.001 0.000 0.232 0.003 D:\py\py3.10\lib\site-packages\pandas\core\frame.py:3749(_setitem_frame)
86 0.006 0.000 0.216 0.003 D:\py\py3.10\lib\site-packages\pandas\core\generic.py:9027(_where)
7888 0.007 0.000 0.194 0.000 <__array_function__ internals>:177(clip)
7888 0.007 0.000 0.180 0.000 D:\py\py3.10\lib\site-packages\numpy\core\fromnumeric.py:2083(clip)
8094 0.008 0.000 0.174 0.000 D:\py\py3.10\lib\site-packages\numpy\core\fromnumeric.py:51(_wrapfunc)
43 0.003 0.000 0.166 0.004 F:\topo\cop_kmeans.py:75(make_constraint)
7888 0.006 0.000 0.163 0.000 {method 'clip' of 'numpy.ndarray' objects}
653290 0.106 0.000 0.160 0.000 {built-in method builtins.isinstance}
14052 0.031 0.000 0.158 0.000 D:\py\py3.10\lib\site-packages\pandas\core\dtypes\cast.py:1873(construct_1d_arraylike_from_scalar)
7888 0.019 0.000 0.157 0.000 D:\py\py3.10\lib\site-packages\numpy\core\_methods.py:125(_clip)
240 0.001 0.000 0.155 0.001 D:\py\py3.10\lib\site-packages\pandas\core\generic.py:5048(filter)
8591 0.063 0.000 0.138 0.000 D:\py\py3.10\lib\site-packages\numpy\core\_methods.py:162(_mean)
8174 0.006 0.000 0.137 0.000 {method 'mean' of 'numpy.ndarray' objects}
86 0.000 0.000 0.128 0.001 D:\py\py3.10\lib\site-packages\pandas\core\internals\managers.py:339(putmask)
86 0.001 0.000 0.113 0.001 D:\py\py3.10\lib\site-packages\pandas\core\internals\blocks.py:963(putmask)
86 0.000 0.000 0.109 0.001 D:\py\py3.10\lib\site-packages\pandas\core\array_algos\putmask.py:116(putmask_without_repeat)
8093 0.006 0.000 0.106 0.000 <__array_function__ internals>:177(broadcast_to)
cop_kmeans
Time breakdown by callee:
ncalls tottime cumtime
F:\topo\cop_kmeans.py:14(cop_kmeans) -> 40 0.000 0.000 F:\topo\cop_kmeans.py:39(<listcomp>)
139 0.008 0.072 F:\topo\cop_kmeans.py:43(<listcomp>)
24722 0.012 0.012 F:\topo\cop_kmeans.py:46(<listcomp>)
139 0.001 0.057 F:\topo\cop_kmeans.py:65(<listcomp>)
42 0.000 0.007 F:\topo\cop_kmeans.py:90(tolerance)
42 0.001 0.010 F:\topo\cop_kmeans.py:99(initialize_centers)
25346 0.007 0.007 F:\topo\cop_kmeans.py:156(violate_constraints)
139 0.045 0.050 F:\topo\cop_kmeans.py:168(compute_centers)
42 0.035 0.961 F:\topo\cop_kmeans.py:207(get_ml_info)
42 0.006 0.010 F:\topo\cop_kmeans.py:230(transitive_closure)
320 0.000 0.000 {built-in method builtins.len}
24722 0.017 0.025 {built-in method builtins.sorted}
139 0.000 0.000 {built-in method builtins.sum}
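A per-callee breakdown like the one above can be produced with the standard library's `pstats.Stats.print_callees`. The sketch below is a minimal example under assumptions: `inner` and `outer` are placeholder functions standing in for the real workload, and the restriction string `"outer"` plays the role that `cop_kmeans` plays in this log.

```python
import cProfile
import io
import pstats


def inner(n):
    # Placeholder for a helper called from the profiled function.
    return sum(i * i for i in range(n))


def outer():
    # Placeholder for the function whose callees we want to list.
    return [inner(1000) for _ in range(50)]


profiler = cProfile.Profile()
profiler.enable()
outer()
profiler.disable()

# Collect the report in a string so it can be saved or pasted into notes.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
# The argument is a regular expression matched against function names.
stats.sort_stats("cumulative").print_callees("outer")
report = buffer.getvalue()
print(report)
```

The exact callee rows depend on the Python version, since newer interpreters inline list comprehensions instead of reporting them as separate `<listcomp>` entries.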
After the first round of optimization
All functions with a cumulative running time of more than 0.5 seconds:
3212323 function calls (3061798 primitive calls) in 12.903 seconds
ncalls tottime percall cumtime percall filename:lineno(function)
43 0.030 0.001 12.903 0.300 F:\topo\semi_supervised_cluster.py:13(semi_supervised_cluster)
84 0.001 0.000 10.209 0.122 D:\py\py3.10\lib\site-packages\pandas\core\frame.py:9486(corr)
84 10.098 0.120 10.098 0.120 {pandas._libs.algos.nancorr}
42 0.039 0.001 1.251 0.030 F:\topo\cop_kmeans.py:14(cop_kmeans)
107860/11953 0.216 0.000 1.067 0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
42 0.035 0.001 0.961 0.023 F:\topo\cop_kmeans.py:207(get_ml_info)
7888 0.062 0.000 0.954 0.000 F:\topo\metrics.py:42(pairwise_correlation)
42 0.008 0.000 0.924 0.022 F:\topo\cop_kmeans.py:224(<listcomp>)
8097 0.010 0.000 0.918 0.000 {built-in method builtins.sum}
14942 0.010 0.000 0.908 0.000 F:\topo\cop_kmeans.py:224(<genexpr>)
7888 0.007 0.000 0.892 0.000 <__array_function__ internals>:177(corrcoef)
7888 0.065 0.000 0.877 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:2689(corrcoef)
7888 0.007 0.000 0.544 0.000 <__array_function__ internals>:177(cov)
7888 0.141 0.000 0.527 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:2462(cov)
Functions whose own time (tottime) is currently 0.02 seconds or more:
3212323 function calls (3061798 primitive calls) in 12.903 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
84 10.098 0.120 10.098 0.120 {pandas._libs.algos.nancorr}
107860/11953 0.216 0.000 1.067 0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
7888 0.141 0.000 0.527 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:2462(cov)
653290 0.106 0.000 0.160 0.000 {built-in method builtins.isinstance}
7888 0.065 0.000 0.877 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:2689(corrcoef)
8767 0.065 0.000 0.065 0.000 {method 'copy' of 'numpy.ndarray' objects}
11745 0.064 0.000 0.064 0.000 {method 'reduce' of 'numpy.ufunc' objects}
8591 0.063 0.000 0.138 0.000 D:\py\py3.10\lib\site-packages\numpy\core\_methods.py:162(_mean)
7888 0.062 0.000 0.954 0.000 F:\topo\metrics.py:42(pairwise_correlation)
8093 0.057 0.000 0.083 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\stride_tricks.py:339(_broadcast_to)
43 0.047 0.001 0.049 0.001 D:\py\py3.10\lib\site-packages\pandas\core\algorithms.py:1551(diff)
139 0.045 0.000 0.050 0.000 F:\topo\cop_kmeans.py:168(compute_centers)
15776 0.042 0.000 0.092 0.000 D:\py\py3.10\lib\site-packages\numpy\core\_methods.py:91(_clip_dep_is_scalar_nan)
42 0.039 0.001 1.251 0.030 F:\topo\cop_kmeans.py:14(cop_kmeans)
7888 0.036 0.000 0.036 0.000 D:\py\py3.10\lib\site-packages\numpy\core\_methods.py:106(_clip_dep_invoke_with_casting)
42 0.035 0.001 0.961 0.023 F:\topo\cop_kmeans.py:207(get_ml_info)
134352 0.034 0.000 0.048 0.000 D:\py\py3.10\lib\site-packages\pandas\core\dtypes\generic.py:43(_check)
376 0.034 0.000 0.098 0.000 D:\py\py3.10\lib\site-packages\pandas\core\internals\managers.py:1541(as_array)
14052 0.033 0.000 0.052 0.000 D:\py\py3.10\lib\site-packages\pandas\core\dtypes\cast.py:691(infer_dtype_from_scalar)
14999 0.032 0.000 0.261 0.000 D:\py\py3.10\lib\site-packages\pandas\core\construction.py:470(sanitize_array)
417 0.032 0.000 0.032 0.000 {method 'partition' of 'numpy.ndarray' objects}
14052 0.031 0.000 0.158 0.000 D:\py\py3.10\lib\site-packages\pandas\core\dtypes\cast.py:1873(construct_1d_arraylike_from_scalar)
469 0.031 0.000 0.063 0.000 D:\py\py3.10\lib\site-packages\pandas\core\internals\blocks.py:396(apply)
39787 0.030 0.000 0.030 0.000 {built-in method numpy.array}
43 0.030 0.001 12.903 0.300 F:\topo\semi_supervised_cluster.py:13(semi_supervised_cluster)
7888 0.028 0.000 0.272 0.000 D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:395(average)
8633 0.027 0.000 0.031 0.000 D:\py\py3.10\lib\site-packages\numpy\core\_methods.py:66(_count_reduce_items)
159598/120277 0.026 0.000 0.038 0.000 {built-in method builtins.len}
247 0.024 0.000 0.077 0.000 D:\py\py3.10\lib\site-packages\scipy\cluster\hierarchy.py:1400(to_tree)
1259 0.022 0.000 0.030 0.000 {pandas._libs.lib.maybe_convert_objects}
16022 0.022 0.000 0.031 0.000 D:\py\py3.10\lib\site-packages\numpy\core\fromnumeric.py:3164(ndim)
164 0.021 0.000 0.292 0.002 D:\py\py3.10\lib\site-packages\pandas\core\internals\construction.py:596(_homogenize)
7477 0.020 0.000 0.020 0.000 {method 'count' of 'list' objects}
Among these:
- {pandas._libs.algos.nancorr} is pandas' nancorr computation.
- {built-in method numpy.core._multiarray_umath.implement_array_function} is the dispatch logic underlying every NumPy function call.
- D:\py\py3.10\lib\site-packages\numpy\lib\function_base.py:2462(cov) is a substep of NumPy's corr calculation.
- {built-in method builtins.isinstance} is a check that pandas calls in large numbers during every operation.
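A listing ordered by internal time with a cutoff, like the one above, can be approximated with `pstats.Stats.get_stats_profile()` (available since Python 3.9). This is only a sketch: `busy`, `workload`, and `functions_over` are hypothetical names, and the threshold value is an assumption chosen for the toy workload. Note that `func_profiles` is keyed by function name alone, so same-named functions from different files overwrite each other.

```python
import cProfile
import pstats


def busy(n):
    # Placeholder hot loop so that something accumulates internal time.
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload():
    return [busy(20000) for _ in range(20)]


def functions_over(stats, threshold):
    """Return (tottime, cumtime, name) rows with tottime >= threshold,
    ordered by internal time, largest first."""
    profile = stats.get_stats_profile()
    rows = [
        (fp.tottime, fp.cumtime, name)
        for name, fp in profile.func_profiles.items()
        if fp.tottime >= threshold
    ]
    return sorted(rows, reverse=True)


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

stats = pstats.Stats(profiler)
# Assumed cutoff of 0.0 keeps every function; the log above uses 0.02.
for tottime, cumtime, name in functions_over(stats, 0.0):
    print(f"{tottime:8.3f} {cumtime:8.3f} {name}")
```

Sorting by tottime rather than cumtime separates functions that are slow themselves from those that only wait on their callees, which is the distinction the notes above rely on.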
Suggestions for further optimization
- Remove all non-essential pandas logic. This is expected to cut the running time of everything other than {pandas._libs.algos.nancorr} by more than 50%.
- Use more matrix computation. F:\topo\metrics.py:42(pairwise_correlation), F:\topo\cop_kmeans.py:168(compute_centers), F:\topo\cop_kmeans.py:14(cop_kmeans), and F:\topo\cop_kmeans.py:207(get_ml_info) still have room for optimization, but only through relatively large revisions.
- Consider whether the multiple corr calculations can be reused.
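The matrix-computation and reuse suggestions can be sketched together: compute every pairwise correlation distance in one pass, then index the result instead of calling `pairwise_correlation` per pair. This is only an illustration under assumptions. `correlation_distance_matrix` is a hypothetical helper, not part of metrics.py; the random data is made up; and every pair of columns is assumed to share at least two non-NaN rows with nonzero variance.

```python
import numpy as np


def pairwise_correlation(x, y):
    """Per-pair reference: 1 - Pearson r over rows where both are non-NaN."""
    nan_idx = np.logical_or(np.isnan(x), np.isnan(y))
    return 1 - np.corrcoef(x[~nan_idx], y[~nan_idx])[0, 1]


def correlation_distance_matrix(data):
    """Hypothetical helper: distances between all column pairs at once,
    using only rows where both columns of a pair are non-NaN."""
    valid = ~np.isnan(data)
    mask = valid.astype(float)
    filled = np.where(valid, data, 0.0)
    squared = filled * filled

    n = mask.T @ mask          # shared valid rows for each pair (i, j)
    sum_i = filled.T @ mask    # sum of column i over the shared rows
    sum_j = sum_i.T            # sum of column j over the shared rows
    sum_ii = squared.T @ mask  # sum of squares of column i
    sum_jj = sum_ii.T          # sum of squares of column j
    sum_ij = filled.T @ filled # sum of products

    cov = sum_ij - sum_i * sum_j / n
    var_i = sum_ii - sum_i ** 2 / n
    var_j = sum_jj - sum_j ** 2 / n
    return 1 - cov / np.sqrt(var_i * var_j)


# Made-up data: 50 observations of 4 series with a few missing values.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 4))
data[3, 0] = np.nan
data[10, 2] = np.nan

distances = correlation_distance_matrix(data)  # computed once, reused below
looped = pairwise_correlation(data[:, 0], data[:, 2])
print(np.isclose(distances[0, 2], looped))
```

The sum-of-products formula is less numerically stable than a two-pass computation, so results should be compared against the per-pair version on real data before it replaces anything.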