How to Dynamically Debug Python's Third-Party Libraries

Note: The method in this article only works for libraries that are installed together with their .py source code, such as sklearn.

Introduction

I used sklearn's sklearn.feature_extraction.text.TfidfTransformer to get TF features, but found that sklearn's results did not match my manual calculation. The sklearn source code can be found on GitHub, but reading it is not enough: unless you can debug it dynamically, you cannot see the intermediate results directly.

So the question is: how can we dynamically debug a Python third-party library such as sklearn, and see the intermediate results while its source code runs?

Suppose my code is as follows:

# Raw corpus: 3 texts
strs_train = [
    'God is love',
    'OpenGL on the GPU is fast',
    'Doctor David is PHD']

from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.feature_extraction.text import CountVectorizer
# First extract bag-of-words features
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(strs_train)
# Then transform the bag-of-words features into TF features
tf_transformer = TfidfTransformer(use_idf=False).fit(X_train_counts)
X_train_tf = tf_transformer.transform(X_train_counts)
print(X_train_tf.todense())

How can I see the intermediate results computed inside sklearn.feature_extraction.text.TfidfTransformer.transform()?

Python Debugging Basics

Python comes with pdb, a module for debugging code. It supports setting breakpoints, single-stepping, stepping into functions, viewing source snippets, inspecting variable values, and changing variable values on the fly.

The following two lines of code can add a breakpoint to the program:

import pdb
pdb.set_trace()

After adding a breakpoint, run the program. When it pauses at the breakpoint, you can use the following commands in the shell to debug the code.

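Command  Meaning
c        Continue execution until the next breakpoint
n        Execute the next line, stepping over function calls
r        Continue until the current function returns
s        Step into the function being called
b        Set a breakpoint (with no argument, list all breakpoints)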
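
As a minimal sketch of these commands in action, the snippet below drives pdb non-interactively by feeding it a scripted list of commands instead of keyboard input. The function term_frequency and its input are hypothetical, made up for illustration; in a real session you would type the same commands at the (Pdb) prompt.

```python
import io
import pdb


def term_frequency(counts):
    """Hypothetical function to debug: turn raw counts into frequencies."""
    total = sum(counts)
    return [count / total for count in counts]


# Commands that would normally be typed at the (Pdb) prompt.
scripted_commands = io.StringIO("n\np total\nc\n")
session_log = io.StringIO()

# readrc=False ignores any personal .pdbrc file; nosigint=True leaves the
# Ctrl-C handler untouched. Both keep the sketch reproducible.
debugger = pdb.Pdb(stdin=scripted_commands, stdout=session_log,
                   readrc=False, nosigint=True)

# runcall() pauses at the first line of the function, then reads commands:
#   n        executes "total = sum(counts)"
#   p total  prints the value that was just assigned
#   c        resumes normal execution until the function returns
result = debugger.runcall(term_frequency, [1, 2, 1])

print(session_log.getvalue())  # the recorded debugger session
print(result)
```

Scripting the session like this is only a convenience for demonstration; for real debugging the interactive prompt is usually more practical.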

Debugging Python third-party libraries

With pdb we can set breakpoints inside a third-party library and debug it. The steps below use sklearn.feature_extraction.text.TfidfTransformer from sklearn as an example.

  • (1) Find the location of the third-party library

First, use the following Python code to find where the sklearn source code is installed. On my machine it is C:\\Users\\biny\\Anaconda3\\lib\\site-packages\\sklearn.

import os
import sklearn
path = os.path.dirname(sklearn.__file__)
print(path)  # directory that contains the installed sklearn source
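
To go one step further and find the exact file and line of the function you want to debug, the standard inspect module can help. The sketch below uses json.dumps from the standard library as a stand-in so that it runs anywhere, and locate_source is a hypothetical helper name. For this article's case you would presumably pass TfidfTransformer.transform instead, which only works because that function is written in Python.

```python
import inspect
import json


def locate_source(func):
    """Return (file path, first line number) of a pure-Python function."""
    source_file = inspect.getsourcefile(func)
    _, first_line = inspect.getsourcelines(func)
    return source_file, first_line


# json.dumps is only a stand-in for the library function of interest.
source_file, first_line = locate_source(json.dumps)
print(f"{source_file}:{first_line}")
```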
  • (2) Delete Python's precompiled bytecode

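To speed things up, when a Python program runs, the interpreter first compiles the .py code into bytecode, and the Python virtual machine then executes that bytecode.

The next time the same program runs, if the .py code has not changed, the compilation step is skipped and the previously compiled bytecode is executed directly.

This bytecode is stored as .pyc files under the __pycache__ folder. In principle this step is not required, because Python recompiles a .py file once it has been modified. However, if you delete the bytecode, run your own program, and no new bytecode files appear, you have found the wrong location for the third-party library. This makes the mistake easy to spot.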
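
A minimal sketch of this deletion step follows. The helper name remove_bytecode is hypothetical, and the demo deliberately runs against a throwaway temporary folder rather than a real site-packages directory; to apply it to sklearn you would pass the path found in step (1).

```python
import pathlib
import py_compile
import shutil
import tempfile


def remove_bytecode(package_dir):
    """Delete every __pycache__ folder under package_dir; return their paths."""
    removed = []
    # sorted() collects all matches first, so deleting does not disturb the walk.
    for cache_dir in sorted(pathlib.Path(package_dir).rglob("__pycache__")):
        shutil.rmtree(cache_dir)
        removed.append(cache_dir)
    return removed


# Demo: compile a tiny module in a temporary folder, then clean it up.
with tempfile.TemporaryDirectory() as tmp:
    module = pathlib.Path(tmp) / "demo_module.py"
    module.write_text("VALUE = 1\n", encoding="utf-8")
    pyc_file = pathlib.Path(py_compile.compile(str(module), doraise=True))
    existed_before = pyc_file.exists()
    removed = remove_bytecode(tmp)
    exists_after = pyc_file.exists()

print(existed_before, exists_after, len(removed))
```

After running it on the real library, importing the library once should recreate the __pycache__ folders; if it does not, the path is probably wrong.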

  • (3) Add a breakpoint to the third-party library source

Using the library location, find the .py file that contains sklearn.feature_extraction.text.TfidfTransformer.transform(), then add a pdb breakpoint at the start of the function, as shown below.

def transform(self, X, copy=True):
    import pdb
    pdb.set_trace()

    if hasattr(X, 'dtype') and np.issubdtype(X.dtype, np.float):
        # preserve float family dtype
        X = sp.csr_matrix(X, copy=copy)
    else:
        # convert counts or binary occurrences to floats
        X = sp.csr_matrix(X, dtype=np.float64, copy=copy)

  • (4) Run your own program

Run my code. It stops inside the third-party library, where pdb commands can be used to debug the third-party code.

  • The code has now run into the third-party library and stopped at the breakpoint:
    C:\mine\tmp\debug_py_3rd_lib>python main.py

    c:\users\biny\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py(1018)transform() 
    -> if hasattr(X, 'dtype') and np.issubdtype(X.dtype, np.float):
    (Pdb)

  • Use the n (next) command to single-step to the point of interest:

    c:\users\biny\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py(1042)transform() 
    -> if self.norm: 
    (Pdb) n

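  • Type the name of the intermediate variable you want to inspect (X.data). The line shown is the one about to be executed, so we see the variable's value before it runs: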

    c:\users\biny\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py(1043)transform() 
    -> X = normalize(X, norm=self.norm, copy=False) 
    (Pdb) X.data 
    array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

  • Continue with the n command, and the intermediate variable's value changes. We can also see that the change comes from the normalize call.
    (Pdb) n

    c:\users\biny\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py(1045)transform() 
    -> return X 
    (Pdb) X.data 
    array([ 0.57735027, 0.57735027, 0.57735027, 0.40824829, 0.40824829, 
    0.40824829, 0.40824829, 0.40824829, 0.40824829, 0.5 , 
    0.5 , 0.5 , 0.5 ])
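
The values before and after this step are consistent with L2 normalization of each row, which I assume is in effect here because TfidfTransformer uses norm='l2' by default. A minimal sketch of that arithmetic in plain Python follows; l2_normalize is a hypothetical helper and not sklearn's own implementation.

```python
import math


def l2_normalize(row):
    """Divide each entry by the row's Euclidean (L2) length."""
    length = math.sqrt(sum(value * value for value in row))
    return [value / length for value in row]


# One row of term counts per text: 3, 6, and 4 distinct words, each seen once.
rows = [[1.0] * 3, [1.0] * 6, [1.0] * 4]
for row in rows:
    print([round(value, 8) for value in l2_normalize(row)])
```

Each entry becomes 1/sqrt(n) for a row with n words, which is where 0.57735027, 0.40824829, and 0.5 come from.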

Remember: once you finish debugging, be sure to delete the two pdb breakpoint lines from the third-party source code!
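
Forgotten breakpoints are easy to miss, so a small scan can help. The sketch below is one possible approach, not a standard tool: find_leftover_breakpoints is a hypothetical helper that lists every line containing pdb.set_trace() under a folder. The demo uses a throwaway file; in practice you would pass the library path found in step (1).

```python
import pathlib
import tempfile


def find_leftover_breakpoints(package_dir):
    """Return (file, line number) pairs for lines that call pdb.set_trace()."""
    hits = []
    for source_file in sorted(pathlib.Path(package_dir).rglob("*.py")):
        text = source_file.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if "pdb.set_trace()" in line:
                hits.append((source_file, number))
    return hits


# Demo on a throwaway file rather than a real installed library.
with tempfile.TemporaryDirectory() as tmp:
    patched = pathlib.Path(tmp) / "patched.py"
    patched.write_text(
        "def transform(x):\n"
        "    import pdb\n"
        "    pdb.set_trace()\n"
        "    return x\n",
        encoding="utf-8",
    )
    leftovers = [(path.name, number)
                 for path, number in find_leftover_breakpoints(tmp)]

print(leftovers)
```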
