1 Functions
1.1 dir() function
Purpose: list the contents of a package (its sub-modules, functions and attributes).
Enter in the Python Console of PyCharm:
In[1]: import torch
In[2]: dir(torch)
Out[2]: ......
In[3]: dir(torch.cuda)
Out[3]: ......
In[4]: dir(torch.cuda.is_available)
Out[4]:
['__annotations__',
'__call__',
'__class__',
'__closure__',
'__code__',
'__defaults__',
'__delattr__',
'__dict__',
'__dir__',
'__doc__',
'__eq__',
'__format__',
'__ge__',
'__get__',
'__getattribute__',
'__globals__',
'__gt__',
'__hash__',
'__init__',
'__init_subclass__',
'__kwdefaults__',
'__le__',
'__lt__',
'__module__',
'__name__',
'__ne__',
'__new__',
'__qualname__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__setattr__',
'__sizeof__',
'__str__',
'__subclasshook__']
help(torch.cuda.is_available)
Help on function is_available in module torch.cuda:
is_available() -> bool
Returns a bool indicating if CUDA is currently available.
Each call produces a different listing.
Names wrapped in double underscores, such as __init__, are special attributes that by convention should not be modified.
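As a small illustration of the note on double-underscore names, the sketch below splits a dir() listing into special names and ordinary ones. It uses the standard-library math module instead of torch so that it runs without PyTorch installed; the helper name split_names is made up for this example.

```python
import math


def split_names(obj):
    """Split dir(obj) into special (dunder) names and ordinary public names."""
    names = dir(obj)
    dunder = [n for n in names if n.startswith("__") and n.endswith("__")]
    public = [n for n in names if not n.startswith("_")]
    return dunder, public


dunder, public = split_names(math)
print(dunder)      # special attributes such as __name__ and __doc__
print(public[:5])  # first few ordinary names in the module
```

The same helper should work on torch or torch.cuda once PyTorch is imported.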
1.2 help() function
Purpose: print the documentation of a function.
Enter in the Python Console of PyCharm:
In[5]: help(torch.cuda.is_available)
Out[5]:
Help on function is_available in module torch.cuda:
is_available() -> bool
Returns a bool indicating if CUDA is currently available.
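help() only prints its text. When the same information is needed as a string, the standard-library inspect module can return it. A minimal sketch, assuming a made-up function is_ready in place of torch.cuda.is_available:

```python
import inspect


def is_ready() -> bool:
    """Return a bool indicating whether the (hypothetical) device is ready."""
    return True


# help() prints the signature and docstring to the console.
help(is_ready)

# inspect returns the same pieces as strings instead of printing them.
signature = str(inspect.signature(is_ready))
doc = inspect.getdoc(is_ready)
print(signature)
print(doc)
```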
2 Printing Hello world
2.1 New file in PyCharm
Create a new .py file, enter print("hello world"), then right-click and choose Run.
2.2 Python Console
Enter print("hello world") and press Enter.
2.3 Jupyter
Open the Conda Prompt and type:
(base) C:\Users\win10>conda activate pytorch
(pytorch) C:\Users\win10>jupyter notebook
In Jupyter, enter print("hello world"), then click Run or press Shift+Enter.
2.4 Differences between the three
1. A Python file is run as a single block containing every line, and each run starts from the beginning. Pros: universal, easy to share, suitable for large projects. Cons: can only be run from the start.
2. The Python Console treats each line as its own block, so after an error execution resumes from the line that failed. Pros: shows the value of every variable, which helps with debugging. Cons: the code is hard to read and modify.
3. Jupyter runs code in blocks (cells) of any number of lines. A cell runs as a whole up to the point of an error, and after the error is fixed the cell is run again as a whole. Pros: the code is easy to read and modify. Cons: the environment has to be configured.
3 Loading data with PyTorch
3.1 Dataset class
Role: provides a way to get the data and their labels.
(1) How to get each sample and its label.
(2) How many samples there are in total.
3.2 DataLoader
Role: delivers the data to the network in different forms.
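The two roles can be sketched without PyTorch. The toy class below follows the protocol a map-style dataset uses (__getitem__ and __len__), and iter_batches is a simplified stand-in for one thing a data loader does, namely grouping samples into batches. ToyDataset, iter_batches and the sample data are all made up for illustration; the real torch.utils.data.DataLoader also handles shuffling, collating samples into tensors and parallel loading.

```python
class ToyDataset:
    """Plain-Python stand-in for a map-style dataset (not a torch Dataset)."""

    def __init__(self, samples, labels):
        self.samples = samples
        self.labels = labels

    def __getitem__(self, idx):
        # (1) how to get one sample and its label
        return self.samples[idx], self.labels[idx]

    def __len__(self):
        # (2) how many samples there are in total
        return len(self.samples)


def iter_batches(dataset, batch_size):
    """Yield lists of (sample, label) pairs, batch_size at a time."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        yield [dataset[i] for i in range(start, stop)]


dataset = ToyDataset(["a.jpg", "b.jpg", "c.jpg"], ["ant", "bee", "ant"])
batches = list(iter_batches(dataset, batch_size=2))
print(len(dataset))
print(batches)
```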
3.3 Download the dataset
The dataset is used to tell images of ants and bees apart. Download link: https://download.pytorch.org/tutorial/hymenoptera_data.zip
File structure:
- dataset
  - train
    - ants
    - bees
  - val
    - ants
    - bees
The file structure needs to be modified.
New file structure:
- dataset_ants_bees
  - train
    - ants_image (renamed from ants)
    - ants_label (new)
    - bees_image (renamed from bees)
    - bees_label (new)
  - val
    - ants
    - bees
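The renaming can also be scripted. A minimal sketch, assuming the top-level folder has already been renamed to dataset_ants_bees by hand; the function name restructure_train is made up, and the demo runs on a temporary directory rather than on the real download.

```python
import tempfile
from pathlib import Path


def restructure_train(dataset_root):
    """Rename train/<cls> to train/<cls>_image and create train/<cls>_label."""
    train_dir = Path(dataset_root) / "train"
    for cls in ("ants", "bees"):
        old_dir = train_dir / cls
        image_dir = train_dir / f"{cls}_image"
        label_dir = train_dir / f"{cls}_label"
        # Rename only if it has not been done already.
        if old_dir.is_dir() and not image_dir.exists():
            old_dir.rename(image_dir)
        label_dir.mkdir(parents=True, exist_ok=True)


# Demo on a throwaway directory that mimics the downloaded layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "dataset_ants_bees"
    for cls in ("ants", "bees"):
        (root / "train" / cls).mkdir(parents=True)
    restructure_train(root)
    result = sorted(p.name for p in (root / "train").iterdir())
    print(result)
```

The val folder is left untouched, matching the structure listed above.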
3.4 Using the Dataset class
Enter in Jupyter:
from torch.utils.data import Dataset
help(Dataset)
Help on class Dataset in module torch.utils.data.dataset:
class Dataset(typing.Generic)
| An abstract class representing a :class:`Dataset`.
|
| All datasets that represent a map from keys to data samples should subclass
| it. All subclasses should overwrite :meth:`__getitem__`, supporting fetching a
| data sample for a given key. Subclasses could also optionally overwrite
| :meth:`__len__`, which is expected to return the size of the dataset by many
| :class:`~torch.utils.data.Sampler` implementations and the default options
| of :class:`~torch.utils.data.DataLoader`.
|
| .. note::
| :class:`~torch.utils.data.DataLoader` by default constructs a index
| sampler that yields integral indices. To make it work with a map-style
| dataset with non-integral indices/keys, a custom sampler must be provided.
|
| Method resolution order:
| Dataset
| typing.Generic
| builtins.object
|
| Methods defined here:
|
| __add__(self, other: 'Dataset[T_co]') -> 'ConcatDataset[T_co]'
|
| __getitem__(self, index) -> +T_co
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| __orig_bases__ = (typing.Generic[+T_co],)
|
| __parameters__ = (+T_co,)
|
| ----------------------------------------------------------------------
| Class methods inherited from typing.Generic:
|
| __class_getitem__(params) from builtins.type
|
| __init_subclass__(*args, **kwargs) from builtins.type
| This method is called when a class is subclassed.
|
| The default implementation does nothing. It may be
| overridden to extend subclasses.
3.5 Read the dataset and display the image
Write a Python script that defines a MyData class inheriting from Dataset, implements __init__, __getitem__ and __len__, and then uses the class to display images.
from torch.utils.data import Dataset
from PIL import Image
import os


class MyData(Dataset):
    def __init__(self, root_dir, label_dir):
        self.root_dir = root_dir    # root directory
        self.label_dir = label_dir  # sub-directory, also used as the label
        self.path = os.path.join(self.root_dir, self.label_dir)  # full path
        print("path: ", self.path)
        self.img_path = os.listdir(self.path)  # names of all image files
        print("img_path: ", self.img_path)

    def __getitem__(self, idx):
        img_name = self.img_path[idx]
        img_item_path = os.path.join(self.root_dir, self.label_dir, img_name)
        label = self.label_dir
        img = Image.open(img_item_path)
        return img, label

    def __len__(self):
        return len(self.img_path)


root_dir = "G:\\Anaconda\\pycharm_pytorch\\learning_project\\dataset_ants_bees\\train"

# Ants dataset
ants_label_dir = "ants_image"
ants_dataset = MyData(root_dir, ants_label_dir)
img_ants, label_ants = ants_dataset[0]
img_ants.show()

# Bees dataset
bees_label_dir = "bees_image"
bees_dataset = MyData(root_dir, bees_label_dir)
img_bees, label_bees = bees_dataset[0]
img_bees.show()

# Combined dataset
train_dataset = ants_dataset + bees_dataset
print(len(train_dataset))
print(len(ants_dataset))
print(len(bees_dataset))
img_train, label = train_dataset[200]
img_train.show()
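Adding two datasets gives a ConcatDataset (see the __add__ signature in the help output of 3.4), so an index such as 200 has to be translated into "which dataset" and "which sample inside it". The plain-Python sketch below shows the idea with cumulative sizes and a binary search. It illustrates the principle and is not the library's own code; the function name locate and the sizes are made up.

```python
import bisect
from itertools import accumulate


def locate(index, sizes):
    """Map an index into concatenated datasets to (dataset_idx, sample_idx)."""
    cumulative = list(accumulate(sizes))  # [3, 5] for sizes [3, 2]
    if not 0 <= index < cumulative[-1]:
        raise IndexError("index out of range")
    dataset_idx = bisect.bisect_right(cumulative, index)
    offset = cumulative[dataset_idx - 1] if dataset_idx > 0 else 0
    return dataset_idx, index - offset


sizes = [3, 2]  # hypothetical: 3 ant images followed by 2 bee images
print(locate(0, sizes))  # (0, 0)
print(locate(2, sizes))  # (0, 2)
print(locate(3, sizes))  # (1, 0)
print(locate(4, sizes))  # (1, 1)
```

In other words, every index below len(ants_dataset) falls in the ants dataset, and every later index falls in the bees dataset.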
3.6 Add labels
The downloaded dataset contains only images, with no label file for each image, so the following Python script generates the labels automatically:
# Purpose: for every image in train/XXXX_image, create a .txt file with the
# same base name in train/XXXX_label and write the label into it.
import os

root_dir = "G:\\Anaconda\\pycharm_pytorch\\learning_project\\dataset_ants_bees\\train"
image_dir = ["bees_image", "ants_image"]  # image directories
label_dir = ["bees_label", "ants_label"]  # label directories
label = ["bee", "ant"]

for i in range(2):
    path_image = os.path.join(root_dir, image_dir[i])  # full path to the images
    path_label = os.path.join(root_dir, label_dir[i])  # full path to the labels
    os.makedirs(path_label, exist_ok=True)  # create the label directory if missing
    img_path = os.listdir(path_image)  # names of all image files
    for img_name in img_path:
        # splitext handles extensions of any length, unlike slicing off 4 characters
        file_name = os.path.splitext(img_name)[0] + ".txt"
        file_path = os.path.join(path_label, file_name)
        print(file_path)
        with open(file_path, "w", encoding="utf-8") as file:
            file.write(label[i])
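Once the label files exist, they can be read back and paired with the images. A minimal sketch, assuming each image has a .txt file with the same base name; the function name read_labels and the file names in the demo are made up, and the demo uses a temporary directory with empty stand-in image files.

```python
import os
import tempfile


def read_labels(image_dir, label_dir):
    """Return (image_name, label) pairs using the same-named .txt files."""
    pairs = []
    for image_name in sorted(os.listdir(image_dir)):
        stem, _ = os.path.splitext(image_name)
        label_path = os.path.join(label_dir, stem + ".txt")
        with open(label_path, encoding="utf-8") as f:
            pairs.append((image_name, f.read().strip()))
    return pairs


# Demo on a throwaway directory with two empty stand-in image files.
with tempfile.TemporaryDirectory() as tmp:
    image_dir = os.path.join(tmp, "ants_image")
    label_dir = os.path.join(tmp, "ants_label")
    os.makedirs(image_dir)
    os.makedirs(label_dir)
    for stem in ("0001", "0002"):
        open(os.path.join(image_dir, stem + ".jpg"), "w").close()
        with open(os.path.join(label_dir, stem + ".txt"), "w", encoding="utf-8") as f:
            f.write("ant")
    pairs = read_labels(image_dir, label_dir)
    print(pairs)
```

A __getitem__ that reads the label from the file in this way, instead of using the folder name, would be one possible next step for MyData.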