Foreword
Jupyter Notebook is a powerful tool for data analysis and modeling. You can view output line by line or cell by cell, which makes complex problems easier to break down.
Every variable stays cached in the kernel. When a workflow is complicated, you can restart from a given step instead of from the beginning, which greatly improves efficiency.
Jupyter Notebook has many advantages, but the cache also has a downside: the original data, or the value of a variable, may be changed without you noticing. This article records a different problem: too much output makes the ipynb file so large that it can no longer be opened.
There are several ways to solve this problem.
Method One
If the file can still be opened, clear the output directly from the menu:
Kernel --> Restart & Clear Output
If the file can be opened, you can delete the output manually or use Method One. The tricky question is what to do when the file can no longer be opened.
Method Two
Copy the source code into a new file.
Install the jupyter_contrib_nbextensions package (the clearing itself is done by nbconvert's ClearOutputPreprocessor). Enter in cmd:
pip install jupyter_contrib_nbextensions
Go to the directory of the file to be copied and enter the following command, changing test.ipynb to the file to be copied:
jupyter nbconvert --ClearOutputPreprocessor.enabled=True --to notebook --output=NotebookNoOut test.ipynb
In this example, the newly generated NotebookNoOut.ipynb is only 2.04 kB and keeps only the source code.
Method Three
As before, the jupyter_contrib_nbextensions package needs to be installed first.
Delete the output directly in the ipynb file:
jupyter nbconvert --ClearOutputPreprocessor.enabled=True --inplace test.ipynb
Similarly, change test.ipynb to your own ipynb file
Method Four
Open the ipynb file with Notepad (or any other text editor).
The content after "cells" holds all the cells of the notebook. Inside each cell, "outputs" holds the displayed output, and the "text" field inside it is the output content, stored as a list [].
Delete the content of that list and the corresponding output becomes empty. For a file that cannot be opened, you only need to find the "text" entries under "outputs" and delete their contents.
This is the more recommended method: it does not require installing anything, and it lets you delete output selectively.
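To make the layout described above concrete, here is a minimal sketch of what a notebook with a single code cell looks like once loaded as JSON. The cell content and the helper name blank_text_outputs are made up for illustration, and real notebooks carry more metadata than shown here.

```python
import json

# A hypothetical, minimal notebook: one code cell with one stream output.
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["print('hello')"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["hello\n"]}
            ],
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}


def blank_text_outputs(nb):
    """Empty the "text" list of every output, as done by hand in the editor."""
    for cell in nb["cells"]:
        # Markdown cells have no "outputs" key, hence .get()
        for output in cell.get("outputs", []):
            if "text" in output:
                output["text"] = []
    return nb


cleaned = blank_text_outputs(notebook)
print(json.dumps(cleaned["cells"][0]["outputs"], indent=2))
```

The "source" list is left alone, so the code of the cell survives while its printed output is emptied.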
Method Five: Extract by code
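import json

with open('test.ipynb', 'r', encoding='utf8') as f:
    json_data = json.load(f)
for i in range(len(json_data['cells'])):
    tmp = json_data['cells'][i]['source']
    # Markdown cells have no execution_count, so use .get() to avoid a KeyError
    print("cell", json_data["cells"][i].get("execution_count"))
    # Print the source lines of the cell without extra line breaks
    for j in tmp:
        print(j, end='')
    print('\n')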
What is extracted here is the code of every cell, but the ipynb structure is lost, which makes the result inconvenient to use.
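If rescuing the code is all that is needed, the extraction can also be written to a .py file instead of printed. The following is only a sketch under that assumption; the function names notebook_to_script and export_script and the demo notebook are hypothetical.

```python
import json


def notebook_to_script(nb):
    """Join the source of every code cell into one script string."""
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", [])
        # "source" may be stored as a list of lines or as one string
        text = source if isinstance(source, str) else "".join(source)
        chunks.append(f"# cell {cell.get('execution_count')}\n{text}\n")
    return "\n".join(chunks)


def export_script(ipynb_path, py_path):
    """Read a notebook file and write its code cells to a .py file."""
    with open(ipynb_path, "r", encoding="utf-8") as f:
        nb = json.load(f)
    with open(py_path, "w", encoding="utf-8") as f:
        f.write(notebook_to_script(nb))


# Example with an in-memory notebook (hypothetical content):
demo = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "execution_count": 3,
         "source": ["x = 1\n", "print(x)"], "outputs": []},
    ]
}
print(notebook_to_script(demo))
```

For a real file, a call such as export_script('test.ipynb', 'test.py') would do the same. Cells that contain IPython magics (lines starting with %) will not be valid Python in the resulting script and need manual cleanup.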
Going a step further, how can both the code and the notebook format be preserved?
Since ipynb is stored in a fixed format, it is enough to keep that format and set the "text" field of each output to [].
Code:
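import json

with open('test.ipynb', 'r', encoding='utf8') as f:
    json_data = json.load(f)
for i in range(len(json_data['cells'])):
    tmp = json_data['cells'][i]
    # Markdown cells have no "outputs" key, so use .get() to avoid a KeyError
    for output in tmp.get('outputs', []):
        # Empty the text of every output that has one; keep the structure
        if 'text' in output:
            output['text'] = []
with open('test-copy.ipynb', 'w', encoding='utf-8') as file:
    file.write(json.dumps(json_data, indent=2, ensure_ascii=False))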
Running this produces a file named test-copy.ipynb, which can be opened with Jupyter Notebook.
The effect is much the same as with the commands above: the text output is removed, so the file becomes very small and can be opened.
This is also a recommended method: it does not require installing anything, and you can add more conditions on the output content, such as its length.
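As one possible way to add a condition on output length, the sketch below empties only the "text" fields that exceed a size threshold and leaves short outputs in place. The function name strip_long_outputs, the max_chars parameter, and its default of 1000 are assumptions chosen for illustration.

```python
import json


def strip_long_outputs(nb, max_chars=1000):
    """Empty the "text" of any output longer than max_chars; keep short ones."""
    cleared = 0
    for cell in nb.get("cells", []):
        for output in cell.get("outputs", []):
            text = output.get("text")
            if text is None:
                continue  # e.g. images or rich results, left untouched here
            # "text" may be one string or a list of strings
            length = len(text) if isinstance(text, str) else sum(len(s) for s in text)
            if length > max_chars:
                output["text"] = []
                cleared += 1
    return cleared


# Hypothetical notebook: one short output and one very long output.
demo = {
    "cells": [
        {"cell_type": "code", "source": ["print('ok')"],
         "outputs": [{"output_type": "stream", "name": "stdout", "text": ["ok\n"]}]},
        {"cell_type": "code", "source": ["print('x' * 5000)"],
         "outputs": [{"output_type": "stream", "name": "stdout",
                      "text": ["x" * 5000 + "\n"]}]},
    ]
}
print(strip_long_outputs(demo, max_chars=1000))
```

To apply it to a file, load the notebook with json.load, call the function, and write the result back with json.dumps as in the script above. Outputs without a "text" field, such as embedded images, are not touched by this sketch and would need their own condition.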