Things to note when installing conda and using jupyterlab


1. conda installation

Miniconda official website, Miniconda official documentation

1.1 conda installation

I downloaded and installed conda 23.5.2 from the Miniconda official website; the bundled Python version is 3.11.4. I checked the option to add conda to PATH during installation, which writes the conda directories into the user-level PATH environment variable.

1.2 Common commands

The following are common conda commands:

| conda package management commands | Description |
| --- | --- |
| conda create --name myenv python=3.8 | Create a virtual environment named myenv with Python 3.8 |
| conda activate myenv | Activate a virtual environment (works on Windows, macOS and Linux with current conda versions) |
| source activate myenv | Legacy activation syntax used by old conda versions on macOS and Linux |
| conda install package_name | Install a package into the activated virtual environment |
| conda list | List the packages installed in the current virtual environment |
| conda deactivate | Deactivate the current virtual environment |
| conda env export > environment.yml | Export the current virtual environment configuration to a YAML file |
| conda env create -f environment.yml | Create a virtual environment from a YAML file |
| conda remove --name myenv --all | Delete the named virtual environment and all of its packages |
| conda search package_name | Search for packages available for installation |
| conda update --all | Upgrade all packages in the current virtual environment |

| conda virtual environment management commands | Description |
| --- | --- |
| conda update conda | Upgrade conda itself |
| conda config --show | Display conda configuration information |
| conda env list or conda info --envs | List all created virtual environments |
| conda info --all | Show all conda information |
| conda info | Display details of the conda installation, including the active environment |
| conda config --set auto_activate_base false | Disable automatic activation of the base environment (it is activated automatically by default) |
| conda activate your_env_name (placed in the shell startup file) | Activate your_env_name by default in new shells; auto_activate_base itself only accepts true or false |

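  By default, conda automatically activates the base environment in every new shell. As far as I can tell, conda 23.5.2 has no setting that names a different environment to activate by default (auto_activate_base only accepts true or false). To start in another environment, disable the automatic activation of base and activate your environment from the shell startup file (for example ~/.bashrc on Linux, or the PowerShell profile on Windows):

conda config --set auto_activate_base false    # disable automatic activation of the base environment
conda activate your_env_name                   # add this line to the shell startup file, after the conda initialization block

Newer conda releases may offer a dedicated setting for this; run conda config --describe to see which keys your version supports.

To restore automatic activation of the base environment, run:

conda config --set auto_activate_base true     # restore automatic activation of the base environment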

  The first time you run a conda config --set command, a conda configuration file named .condarc is created in the user folder, and the settings added by the command are written to it. Run conda info to see the location of this file (it is listed as the user config file).
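As a rough cross-check of that location, the sketch below lists the user-level configuration files that actually exist. It assumes only two places, the file named by the CONDARC environment variable and .condarc in the home folder; conda searches more locations than these, and the helper name existing_condarc_files is made up for this example.

```python
import os
from pathlib import Path


def existing_condarc_files(home=None, environ=None):
    """Return the user-level .condarc candidates that exist on disk."""
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)

    candidates = []
    # A file named by the CONDARC environment variable, if one is set.
    if environ.get("CONDARC"):
        candidates.append(Path(environ["CONDARC"]))
    # The per-user file that `conda config --set` writes to by default.
    candidates.append(home / ".condarc")

    return [path for path in candidates if path.is_file()]


if __name__ == "__main__":
    for path in existing_condarc_files():
        print(path)
```

The output of conda info remains the authoritative answer; this only shows whether the file has been created yet.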

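  The default installation source of conda is the Anaconda repository. Before changing it, check the currently configured sources and back up the existing configuration file. conda config has no backup option as far as I know, so simply copy .condarc (if it already exists) to another name, for example .condarc.bak:

conda config --show-sources    # show the currently configured sources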

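Next, add a mirror in mainland China so that downloads are faster. Note that conda channels and pip indexes are configured separately: the Tsinghua URL below is a conda channel, while the Aliyun URL is a PyPI index and has to be set through pip rather than conda:

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/   # Tsinghua mirror (conda channel)
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/                # Aliyun mirror (pip index)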

Or write the settings directly in the .condarc file:

# In the configuration file, comments start with the # symbol; put each comment on its own line rather than inline
channels:
  - defaults
show_channel_urls: true
default_channels:
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
custom_channels:
  conda-forge: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  msys2: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  bioconda: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  menpo: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  pytorch: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  simpleitk: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
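To make the effect of default_channels and custom_channels concrete, here is a minimal sketch of how a channel name maps to a URL under the configuration above. It is a simplified model, not conda's own resolver: it ignores platform subdirectories, the function name resolve_channel is made up, and the fallback to https://conda.anaconda.org reflects conda's default channel_alias when a name is not listed under custom_channels.

```python
MIRROR = "https://mirrors.tuna.tsinghua.edu.cn/anaconda"

# The same settings as the .condarc above, written as a Python dict.
CONDARC = {
    "default_channels": [
        f"{MIRROR}/pkgs/main",
        f"{MIRROR}/pkgs/free",
        f"{MIRROR}/pkgs/r",
    ],
    "custom_channels": {
        name: f"{MIRROR}/cloud"
        for name in ("conda-forge", "msys2", "bioconda", "menpo", "pytorch", "simpleitk")
    },
}

DEFAULT_CHANNEL_ALIAS = "https://conda.anaconda.org"


def resolve_channel(name, condarc=CONDARC):
    """Return the base URL(s) a channel name stands for (simplified model)."""
    if name == "defaults":
        # "defaults" expands to every entry of default_channels.
        return list(condarc["default_channels"])
    # A custom channel is served from "<configured base>/<channel name>".
    base = condarc["custom_channels"].get(name, DEFAULT_CHANNEL_ALIAS)
    return [f"{base.rstrip('/')}/{name}"]


if __name__ == "__main__":
    print(resolve_channel("conda-forge"))
    print(resolve_channel("defaults"))
```

With this configuration, installing from conda-forge therefore still goes through the Tsinghua mirror rather than the upstream server.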

  The above configuration file uses the Tsinghua mirror for installation by default. To install from a different channel, use the -c option to name the channel explicitly.

# conda-forge is a channel name from the configuration file, and package_name is the name of the package to install
conda install -c conda-forge package_name

  Each channel has its own specific purpose and set of packages, and you can choose to use one or more of them to install the relevant packages based on your needs.

  1. conda-forge: a community-driven conda channel covering many fields, including scientific computing, data analysis, machine learning and computer vision. It contains a large number of commonly used packages and is updated frequently.
  2. msys2: useful if you need to build and run packages on Windows that require Unix/Linux tools.
  3. bioconda: a conda channel dedicated to bioinformatics and biological data analysis.
  4. menpo: associated with the Menpo project, a computer vision and machine learning library. The channel contains packages and tools related to that project.
  5. pytorch: contains packages and tools related to the PyTorch deep learning framework.
  6. simpleitk: contains packages and tools related to SimpleITK (a simplified medical image processing toolkit).

1.3 FAQ

  1. Anaconda Powershell Prompt error message
    When opening Anaconda Powershell Prompt, the following error message appears (translated from the Chinese-language Windows output):
The term 'E:\miniconda\Scripts\conda.exe' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At C:\Users\LS\Documents\WindowsPowerShell\profile.ps1:4 char:4
+ (& "E:\miniconda\Scripts\conda.exe" "shell.powershell" "hook") | Out- ...
+    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (E:\miniconda\Scripts\conda.exe:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

This happens because a previously installed conda left its initialization line in the PowerShell profile profile.ps1: (& "E:\miniconda\Scripts\conda.exe" "shell.powershell" "hook") | Out- .... The path only needs to be changed to the current installation:

(& "D:\Miniconda\Scripts\conda.exe" "shell.powershell" "hook") | Out-String | Invoke-Expression

conda was previously installed on drive E, and uninstalling it did not clean up this line in the PowerShell profile. Pointing it at the installation directory on drive D fixes the error.

  2. The shortcuts become invalid after the system is reinstalled.
    After conda is installed, two shortcuts (the Anaconda prompts) are generated automatically in the Start menu. They stop working after the system is reinstalled. Reinstalling conda generates a new pair, so just delete the old, invalid ones.
  3. Uninstallation problem
    After conda has been installed and started, a .conda folder and a .condarc file are generated in the user folder. To uninstall conda completely, you need to clean these up as well.
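The stale hook line from the first FAQ item can also be detected with a short script. The sketch below is only an illustration: it assumes the hook line has the same shape as the one in the error message, the function names are made up, and the profile path in the usage part is taken from the error message above (Documents\WindowsPowerShell\profile.ps1 under the user folder).

```python
import re
from pathlib import Path

# Matches: & "<path>\conda.exe" "shell.powershell" "hook"
HOOK_PATTERN = re.compile(
    r'&\s*"([^"]*conda\.exe)"\s+"shell\.powershell"\s+"hook"',
    re.IGNORECASE,
)


def find_conda_hooks(profile_text):
    """Return every conda.exe path referenced by a hook line in the profile."""
    return HOOK_PATTERN.findall(profile_text)


def stale_hooks(profile_text, exists=lambda p: Path(p).exists()):
    """Return the hook paths that no longer exist on disk."""
    return [path for path in find_conda_hooks(profile_text) if not exists(path)]


if __name__ == "__main__":
    profile = Path.home() / "Documents" / "WindowsPowerShell" / "profile.ps1"
    if profile.exists():
        for path in stale_hooks(profile.read_text(encoding="utf-8", errors="replace")):
            print(f"Stale conda hook: {path}")
```

Any path it reports points at a conda installation that is gone, so that line in profile.ps1 needs to be updated or removed.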

2. jupyterlab

2.1 jupyterlab installation and uninstallation

  When installing with the conda install command, the newest jupyterlab available was only 3.6.3, so I installed jupyterlab 4.0.6 directly with pip install. Then use the following command to install the Chinese language pack for JupyterLab:

pip install jupyterlab-language-pack-zh-CN

I also installed the requirements.txt files under E:\nlp\ChatGLM2-6B-main and E:\nlp\alpaca-lora-main, as well as sentence-transformers, faiss-cpu and blingfire.

To uninstall jupyterlab completely, run the following commands:

pip uninstall jupyterlab    # if it was installed with pip
conda uninstall jupyterlab  # if it was installed with conda
# JupyterLab creates a configuration folder in the user's home directory, which also needs to be deleted
rm -r ~/.jupyter

You also need to remove the JupyterLab extensions and kernels:

# List the installed extensions and kernels
jupyter labextension list
jupyter kernelspec list
# Uninstall extensions and kernels
jupyter labextension uninstall extension_name
jupyter kernelspec uninstall kernel_name
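When there are many kernels to review, the listing can be read programmatically. The following sketch assumes that jupyter kernelspec list accepts the --json flag and prints an object with a "kernelspecs" key whose entries carry a "resource_dir" field; the function names are made up for this example.

```python
import json
import shutil
import subprocess


def kernel_dirs(kernelspec_json):
    """Map each kernel name to its resource directory."""
    specs = json.loads(kernelspec_json).get("kernelspecs", {})
    return {name: info.get("resource_dir", "") for name, info in specs.items()}


def list_kernels():
    """Run `jupyter kernelspec list --json`; return {} if that is not possible."""
    jupyter = shutil.which("jupyter")
    if jupyter is None:
        return {}
    result = subprocess.run(
        [jupyter, "kernelspec", "list", "--json"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return {}
    try:
        return kernel_dirs(result.stdout)
    except json.JSONDecodeError:
        # Unexpected extra output mixed into stdout; give up quietly.
        return {}


if __name__ == "__main__":
    for name, path in list_kernels().items():
        print(f"{name}: {path}")
```

Each reported directory is the one that jupyter kernelspec uninstall would remove for that kernel name.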

2.2 Common mistakes

2.2.1 Version conflict, jupyterlab cannot start

After jupyterlab has been installed successfully, check the version from cmd with the following command:

jupyter-lab --version
4.0.6

Then run jupyter-lab to start JupyterLab, or type jupyter lab into the File Explorer address bar of a folder to start JupyterLab in that directory (for example, a folder on drive E).

  On one occasion, however, neither method worked: running jupyter-lab failed with an error saying that some packages could not be imported. I suspect the cause was a version conflict, since I had installed jupyterlab 3.6.3 with conda and jupyterlab 4.0.6 with pip. Installing jupyterlab also installs many dependencies, and although I had uninstalled the earlier version, its dependencies were left behind and conflicted with the new version at startup.
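To see which Jupyter-related packages are still present in an environment after such an uninstall, a sketch like the following may help. It uses only importlib.metadata from the standard library; the function name and the list of name prefixes are assumptions for illustration, and it only reports what the current Python environment can see, so it cannot tell by itself whether conda or pip installed a package.

```python
from importlib.metadata import distributions


def installed_jupyter_packages(prefixes=("jupyter", "notebook", "nbclassic")):
    """Return {package name: version} for installed packages matching the prefixes."""
    found = {}
    for dist in distributions():
        metadata = dist.metadata
        name = metadata["Name"] if metadata is not None else None
        if name and name.lower().startswith(tuple(prefixes)):
            found[name] = dist.version
    return dict(sorted(found.items()))


if __name__ == "__main__":
    for name, version in installed_jupyter_packages().items():
        print(f"{name}=={version}")
```

Comparing this list before and after uninstalling jupyterlab shows which dependencies were left behind.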

2.2.2 Plug-in version conflict

JupyterLab 4 integrates the debugger directly. I didn't know this at first, and when I couldn't find it in the plug-in manager I installed it from the command line:

jupyter labextension install @jupyterlab/debugger

As a result, an error was reported every time I started JupyterLab (it still started, but the error was annoying). Removing the extension again with jupyter labextension uninstall @jupyterlab/debugger (see section 2.1) should clear it.

2.3 Commonly used plug-ins

2.3.1 debugger

debugger documentation

JupyterLab 2 or 3 can install jupyterlab/debugger directly from the plug-in manager.

  JupyterLab 4 integrates the debugger directly. Click the debug button in the upper right corner of the notebook; when it turns red, debugging mode is on. Then, just as in PyCharm, set breakpoints on the lines you want to inspect. Click View > Debugger, or the debug button in the right sidebar, to open the panel that displays the debugging information.
  In this example, a breakpoint is set inside a custom DataCollatorForMultipleChoice class, written for multiple-choice question answering with the transformers library. Running the following code stops at the breakpoint and displays the variable information:

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train_ds,
    eval_dataset=tokenized_train_ds,
    tokenizer=tokenizer,
    data_collator=DataCollatorForMultipleChoice(tokenizer=tokenizer),
)
trainer.train()


  • The variable area has two display modes: list and tree. Long values are truncated in the display, but they can be copied out and viewed in full. The variable window shows four lines by default; drag its lower edge down if that is not enough.
  • In the middle are the debugging control buttons, which can be clicked or triggered with the corresponding shortcut keys.
  • Both the source file area below and the editor on the left show where execution has stopped.

For example, the debugger shows the variables in the following format:

label_name = "label" if 'label' in features[0].keys() else 'labels'
# Original features (4 samples)
[{'input_ids': [...], 'token_type_ids': [...], 'attention_mask': [...], 'label': 0},
 {'input_ids': [...], 'token_type_ids': [...], 'attention_mask': [...], 'label': 0},
 {'input_ids': [...], 'token_type_ids': [...], 'attention_mask': [...], 'label': 1},
 {'input_ids': [...], 'token_type_ids': [...], 'attention_mask': [...], 'label': 0}]
# For each sample (feature, a dict), pop removes the key-value pair whose key is label and returns the removed value,
# so each feature loses its label entry, and labels becomes the list of the four sample labels [0, 0, 1, 0]
labels = [feature.pop(label_name) for feature in features]

# A feature (one sample) after its label has been removed
{'input_ids': [[...], [...], [...], [...], [...]],
 'token_type_ids': [[...], [...], [...], [...], [...]],
 'attention_mask': [[...], [...], [...], [...], [...]]}
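
The pop step can be tried on its own, without the debugger or the transformers library. The sketch below uses made-up toy token ids in place of real tokenizer output, and the function name split_labels is an assumption for this example; only the label_name and pop lines come from the code above.

```python
def split_labels(features):
    """Remove the label from every feature dict and return the labels as a list."""
    label_name = "label" if "label" in features[0].keys() else "labels"
    # dict.pop deletes the key and returns its value, so the features are modified in place.
    return [feature.pop(label_name) for feature in features]


if __name__ == "__main__":
    # Toy samples; the ids are placeholders, not real tokenizer output.
    features = [
        {"input_ids": [101, 7], "attention_mask": [1, 1], "label": 0},
        {"input_ids": [101, 8], "attention_mask": [1, 1], "label": 0},
        {"input_ids": [101, 9], "attention_mask": [1, 1], "label": 1},
        {"input_ids": [101, 5], "attention_mask": [1, 1], "label": 0},
    ]
    labels = split_labels(features)
    print(labels)               # [0, 0, 1, 0]
    print(sorted(features[0]))  # ['attention_mask', 'input_ids']
```

Because pop mutates each dictionary, the original features list no longer contains labels after the call, which is exactly what the debugger view shows.
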
2.3.2 jupyterlab_code_formatter

github repository

jupyterlab_code_formatter is mainly used to format code and supports multiple languages.

2.4 jupyter tips

Refer to "JupyterLab's 10 Extremely Powerful Secret Techniques"

  1. Multi-line selection
  2. Add a virtual environment
    Use the following command to add a virtual environment as a kernel to Jupyter Lab so that it appears as an option in the upper right corner of the Launcher or kernel list:
 $ pip install ipykernel  
 $ ipython kernel install --user --name=new_or_existing_env_name

Note: run these commands inside the virtual environment you want to add, not in the environment where JupyterLab is installed.

  3. Run notebooks with the jupyter run command
    With jupyter run you can execute each notebook cell sequentially, like a Python script. The command returns the output of each cell as JSON, so the output may lag if there is a lot of text. We can save different hyperparameters into a single notebook and run it this way, which also keeps a record of the run.
 jupyter run path_to_notebook.ipynb
  4. Split the editor window
    JupyterLab's windows are displayed as tabs. We can open several editing windows at once and drag a tab to split the editor window.

  5. View documentation at any time
    There are three ways to find documentation for almost any function or magic command directly from the editor.

    1. Use the Shift+Tab keyboard shortcut (the default), which displays a popup with documentation for the function or class under the cursor.
    2. Contextual help: if you don't like the popup disappearing when you click elsewhere, open contextual help from the Help menu or with Ctrl + I. It displays live documentation for the function or class under the cursor.
    3. Add a question mark (without parentheses) to the end of the function or class name.
  6. Use the exclamation point (!) to run terminal commands

# Show the current directory
!pwd

  Here's a more practical example. Suppose you have a data folder that contains images used for model training, sorted into directories by class. Now we need a quick way to count the directories inside data/raw/train and store the result in number_of_classes:

 number_of_classes = !ls -1 data/raw/train | wc -l

 >>> print(int(number_of_classes[0]))
 43

The ! assignment captures the command output as a list of lines (strings), so the count is the first element converted to int.

A single shell command can solve the problem, so we don’t need to write python directory traversal code.
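The shell pipeline relies on ls and wc, which are not available in the Windows cmd shell. Where that matters, a short pathlib version does the same job. This is a sketch with a made-up function name; unlike ls -1 | wc -l, it counts only subdirectories and ignores stray files.

```python
from pathlib import Path


def count_class_dirs(root):
    """Count the immediate subdirectories of root (one per class)."""
    return sum(1 for entry in Path(root).iterdir() if entry.is_dir())


if __name__ == "__main__":
    train_dir = Path("data/raw/train")
    if train_dir.is_dir():
        print(count_class_dirs(train_dir))
```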

  7. Notification when execution finishes, using winsound
    winsound is a module in the Python standard library for playing simple sounds on Windows. It is mainly used for audio reminders, warnings, or playing simple sound files in command-line scripts and small tools.
    The winsound module provides these main functions:

    1. Beep(frequency, duration): makes a beep. frequency is specified in hertz and duration in milliseconds.

    2. PlaySound(sound, flags): plays a sound file in .wav format (compressed formats such as mp3 are not supported). sound is the file name or path of the sound file; flags controls the playback behaviour, such as looping or asynchronous playback.

    3. MessageBeep(type): plays a system-defined alert sound. type selects which alert sound is played.

Here is a simple example that uses the winsound module to beep once training has finished:

 import winsound
 # Train the model (model and trainer setup omitted)
 trainer.train()
 # Notify when training has finished
 duration = 5000   # milliseconds
 frequency = 440   # hertz
 winsound.Beep(frequency, duration)
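
Since winsound exists only on Windows, a notebook that may also run on Linux or macOS needs a fallback. The following is a minimal sketch with a made-up helper name, notify_done; off Windows it writes the terminal bell character, which may be silent depending on the terminal or notebook front end.

```python
import sys


def notify_done(frequency=440, duration_ms=1000):
    """Beep with winsound on Windows, otherwise emit the terminal bell character.

    Returns the name of the backend that was used.
    """
    try:
        import winsound  # available on Windows only
    except ImportError:
        winsound = None

    if winsound is not None:
        try:
            winsound.Beep(frequency, duration_ms)
            return "winsound"
        except RuntimeError:
            # Beep can fail when no sound device is available; fall through to the bell.
            pass

    sys.stdout.write("\a")
    sys.stdout.flush()
    return "bell"


if __name__ == "__main__":
    print(notify_done(duration_ms=200))
```

Call notify_done() on the line after trainer.train() in place of the direct winsound.Beep call.
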
  8. Automatic reloading and highlighting of scripts

  If we update an imported script, Jupyter does not pick up the change until the kernel is restarted, which can cause a lot of problems. The autoreload extension avoids this:

 %load_ext autoreload
 %autoreload 2

With mode 2, autoreload reloads all imported modules before each cell runs, so edits to a script take effect the next time a cell is executed. The argument is a mode, not an interval in seconds: %autoreload 1 reloads only the modules registered with %aimport. Only Python modules are reloaded; changes to other kinds of files are not detected.
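What autoreload automates can be reproduced by hand with importlib.reload. The sketch below is an illustration only, not how the extension is implemented: it writes a throwaway module named demo_settings (a made-up name) to a temporary folder, imports it, edits the file, and reloads it.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_reload():
    """Import a temporary module, edit its source, reload it, and return both values."""
    old_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # keep stale bytecode caches out of the demo
    with tempfile.TemporaryDirectory() as tmp:
        module_path = Path(tmp) / "demo_settings.py"
        module_path.write_text("LEARNING_RATE = 1\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("demo_settings")
            before = module.LEARNING_RATE

            # Edit the script on disk; the imported module still holds the old value.
            module_path.write_text("LEARNING_RATE = 200\n")
            assert module.LEARNING_RATE == before

            # reload re-executes the source; autoreload does this before each cell.
            importlib.reload(module)
            after = module.LEARNING_RATE
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_settings", None)
            sys.dont_write_bytecode = old_flag
    return before, after


if __name__ == "__main__":
    print(demo_reload())  # (1, 200)
```

Without the reload call, the notebook would keep using the value from the first import until the kernel is restarted.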

  In addition, for Python scripts we can use the pycat command to display the content of a script with syntax highlighting. For other file formats, use the cat command.


Origin blog.csdn.net/qq_56591814/article/details/133522959