How to fix a Jupyter notebook ipynb file that cannot be opened because it contains too much output

Foreword

Jupyter notebook is a powerful tool for data analysis and modeling. You can view results line by line or cell by cell, which breaks complex problems into manageable steps.

Variables stay in memory between cells. When a workflow is complicated, you can resume from a particular step instead of starting from the beginning, which greatly improves efficiency.

Jupyter notebook has many advantages, but this caching also has a downside: the original data, or the value of a variable, may be changed without you noticing. This article, however, records a different problem: too much output makes the ipynb file so large that it can no longer be opened.

There are several ways to solve this problem.

Method One

If the file can still be opened, clear the output directly from the menu:
Kernel --> Restart & Clear Output


If the file can still be opened, you can delete the output manually or use Method One. The tricky case is when the file can no longer be opened. What then?

Method Two

Create a new copy of the notebook that contains only the source code.

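Install the required package from the command line (cmd). The command below installs jupyter_contrib_nbextensions; the ClearOutputPreprocessor option used afterwards is provided by nbconvert itself, which a standard Jupyter installation already includes.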

pip install jupyter_contrib_nbextensions

Go to the directory that contains the notebook and run the following command to create a cleaned copy. Replace test.ipynb with the name of the file you want to copy.

jupyter nbconvert --ClearOutputPreprocessor.enabled=True --to notebook --output=NotebookNoOut test.ipynb

The newly generated NotebookNoOut.ipynb is only 2.04 kB and keeps only the source code.

Method Three

As in Method Two, install the jupyter_contrib_nbextensions package first.

Then clear the output in place, directly in the ipynb file:

jupyter nbconvert --ClearOutputPreprocessor.enabled=True --inplace test.ipynb

As before, replace test.ipynb with the name of your own ipynb file.
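
Because --inplace overwrites the original file, it may be worth keeping a copy before running the command. The following is only a minimal sketch: the helper name backup_notebook and the ".bak" suffix are choices made for this example, not part of nbconvert, and the demonstration runs on a throwaway file.

```python
import shutil
import tempfile
from pathlib import Path


def backup_notebook(path):
    """Copy the notebook to '<name>.bak' next to it and return the backup path."""
    source = Path(path)
    backup = source.with_name(source.name + ".bak")
    shutil.copy2(source, backup)  # copies the file contents and metadata
    return backup


# Demonstration on a throwaway file so that nothing real is touched
with tempfile.TemporaryDirectory() as folder:
    notebook_path = Path(folder) / "test.ipynb"
    notebook_path.write_text('{"cells": []}', encoding="utf-8")
    backup_path = backup_notebook(notebook_path)
    print(backup_path.name)
```

For a real notebook, call backup_notebook('test.ipynb') once and then run the nbconvert command above.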

Method Four

Open the current ipynb file with Notepad

The content after "cells" holds all the cells of the notebook. Within each cell, "outputs" holds the display information, and the "text" entry inside "outputs" holds the output content. Delete the content inside the [] after "text" and that output becomes empty. For a file that cannot be opened, just find the "text" entries under "outputs" and delete their contents.

This is one of the more recommended methods: nothing needs to be installed, and you can delete output selectively.
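
To decide which "text" entries are worth deleting by hand, it can help to see how much output each cell holds. The sketch below is a rough illustration rather than part of the original method: the sample notebook is hand-written, and the helper name output_sizes is made up for this example. It measures the serialized size of each cell's "outputs".

```python
import json

# A tiny hand-written notebook in the same JSON layout as an ipynb file.
# The contents are made up for illustration.
sample_notebook = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["print('hello')\n"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["hello\n"]}
            ],
        },
        {"cell_type": "markdown", "metadata": {}, "source": ["# Notes\n"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}


def output_sizes(notebook):
    """Return (cell index, characters of serialized output) for each cell."""
    sizes = []
    for index, cell in enumerate(notebook["cells"]):
        # Markdown cells have no "outputs" key, so fall back to an empty list
        outputs = cell.get("outputs", [])
        sizes.append((index, len(json.dumps(outputs))))
    return sizes


for index, size in output_sizes(sample_notebook):
    print("cell", index, "output characters:", size)
```

Cells with the largest counts are the ones whose "text" is worth emptying first.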

Method Five: Extract by code

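import json

# An ipynb file is plain JSON, so it can be loaded without Jupyter
with open('test.ipynb', 'r', encoding='utf8') as f:
    json_data = json.load(f)

# Print the source of every cell
for cell in json_data['cells']:
    # Markdown cells have no "execution_count", so use .get() to avoid a KeyError
    print("cell", cell.get("execution_count"))
    # "source" is normally a list of lines that keep their own newlines
    for line in cell['source']:
        print(line, end='')
    print('\n')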

This extracts the code of every cell, but the ipynb structure is lost, which makes the result inconvenient to use.
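
One way to make the extracted code easier to reuse is to collect only the code cells into a single script. This is a minimal sketch under assumptions: the helper name notebook_to_script and the "# cell N" separator comments are invented here, and the sample notebook is hand-written.

```python
def notebook_to_script(notebook):
    """Join the source of all code cells into one Python script string."""
    parts = []
    for cell in notebook["cells"]:
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell["source"]
        # "source" may be a list of lines or a single string
        if isinstance(source, list):
            source = "".join(source)
        parts.append("# cell {}\n{}".format(cell.get("execution_count"), source))
    return "\n\n".join(parts) + "\n"


# Hand-written sample data for illustration
sample_notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title\n"]},
        {"cell_type": "code", "execution_count": 1, "metadata": {},
         "outputs": [], "source": ["a = 1\n", "print(a)"]},
        {"cell_type": "code", "execution_count": 2, "metadata": {},
         "outputs": [], "source": "b = a + 1"},
    ]
}

script = notebook_to_script(sample_notebook)
print(script)
```

For a real file, load it with json.load as in the code above and write the returned string to a .py file. Cells that use IPython magics such as %matplotlib will still need manual cleanup before the script runs as plain Python.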

Going a step further, how can we keep both the code and the notebook format?

Since an ipynb file is stored in a fixed format, it is enough to keep that format and set the "text" of the outputs to [].

Code:

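import json

# An ipynb file is plain JSON, so it can be loaded without Jupyter
with open('test.ipynb', 'r', encoding='utf8') as f:
    json_data = json.load(f)

for cell in json_data['cells']:
    # Markdown cells have no "outputs" key, so default to an empty list
    for output in cell.get('outputs', []):
        # Not every output has a "text" field; empty it only where it exists
        if 'text' in output:
            output['text'] = []

# Write the result to a new file so the original stays untouched
with open('test-copy.ipynb', 'w', encoding='utf-8') as file:
    json.dump(json_data, file, indent=2, ensure_ascii=False)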

Running the code produces a file named test-copy.ipynb, which can be opened with Jupyter notebook.
The result matches that of the nbconvert commands above: the output text is removed, so the file becomes small enough to open.

This is also one of the more recommended methods: nothing needs to be installed, and you can add more checks on the output, such as its length.
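
As one possible version of such a length check, the sketch below empties only those cells whose serialized outputs exceed a limit. The helper name clear_large_outputs, the max_chars parameter, and its default of 1000 are assumptions made for this example, and the sample notebook is hand-written.

```python
import json


def clear_large_outputs(notebook, max_chars=1000):
    """Empty the outputs of cells whose serialized outputs exceed max_chars.

    Returns the number of cells that were cleared.
    """
    cleared = 0
    for cell in notebook["cells"]:
        outputs = cell.get("outputs")
        if not outputs:
            continue  # markdown cells, or code cells without output
        if len(json.dumps(outputs)) > max_chars:
            cell["outputs"] = []
            cleared += 1
    return cleared


# Hand-written sample: one small output and one very long output
sample_notebook = {
    "cells": [
        {"cell_type": "code", "execution_count": 1, "metadata": {},
         "source": ["print('ok')"],
         "outputs": [{"output_type": "stream", "name": "stdout",
                      "text": ["ok\n"]}]},
        {"cell_type": "code", "execution_count": 2, "metadata": {},
         "source": ["print('x' * 5000)"],
         "outputs": [{"output_type": "stream", "name": "stdout",
                      "text": ["x" * 5000 + "\n"]}]},
    ]
}

print(clear_large_outputs(sample_notebook, max_chars=1000))
```

With a real file, load it with json.load, call the function, and write the result to a new file as in the code above. Short outputs stay in place and only the oversized ones are dropped.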


Origin blog.csdn.net/Zeus_daifu/article/details/128066691