Hugging Face model
Taking the waifu-diffusion model as an example, the published implementation is built on the diffusers library. The sample code is as follows:
import torch
from torch import autocast
from diffusers import StableDiffusionPipeline

# Download the pretrained pipeline (or reuse the cached copy) and move it to the GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    'hakurei/waifu-diffusion',
    torch_dtype=torch.float32
).to('cuda')

prompt = "1girl, aqua eyes, baseball cap, blonde hair, closed mouth, earrings, green background, hat, hoop earrings, jewelry, looking at viewer, shirt, short hair, simple background, solo, upper body, yellow shirt"
with autocast("cuda"):
    # The pipeline output exposes the generated PIL images as `.images`;
    # only very old diffusers releases used the ["sample"] key.
    image = pipe(prompt, guidance_scale=6).images[0]

image.save("test.png")
This code downloads the pretrained model over the network and loads it directly. The model is in fact cached locally, but the cache is not easy to read: because the model is large, it is downloaded as a number of smaller files. As shown later, the model is actually composed of several sub-models, so the few relatively large files presumably correspond to sub-models such as unet and vae.
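To see which files take up the space, the cache directory can be walked and sorted by size. This is a minimal sketch using only the standard library. The cache location is an assumption based on the Hugging Face default and may differ on your machine, and largest_files is a made-up helper name.

```python
import os


def largest_files(root, count=5):
    """Return the `count` largest files under root as (size in bytes, path)."""
    sizes = []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            # Skip dangling links; getsize follows valid links to the real file.
            if os.path.exists(path):
                sizes.append((os.path.getsize(path), path))
    return sorted(sizes, reverse=True)[:count]


# Assumed default cache location; adjust if your cache lives elsewhere.
cache_dir = os.path.expanduser("~/.cache/huggingface")
for size, path in largest_files(cache_dir):
    print(f"{size / 1e6:10.1f} MB  {path}")
```

If the cache stores each file once and links to it from a snapshot folder, the same file may be listed twice under two different paths.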
After downloading, print(pipe) shows:
StableDiffusionPipeline {
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.11.0",
  "feature_extractor": [
    "transformers",
    "CLIPImageProcessor"
  ],
  "requires_safety_checker": true,
  "safety_checker": [
    "stable_diffusion",
    "StableDiffusionSafetyChecker"
  ],
  "scheduler": [
    "diffusers",
    "PNDMScheduler"
  ],
  "text_encoder": [
    "transformers",
    "CLIPTextModel"
  ],
  "tokenizer": [
    "transformers",
    "CLIPTokenizer"
  ],
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
Sure enough, it is a collection of sub-models plus a few minor parameters. The pipeline can be saved directly as a .pth file and read back with torch.load("pipe.pth"), but converting it with torch.jit.trace fails with:
Traceback (most recent call last):
  File "/home/gaoyi/example-app/test.py", line 59, in <module>
    traced_script_module = torch.jit.trace(model, example)
  File "/home/gaoyi/anaconda3/lib/python3.9/site-packages/torch/jit/_trace.py", line 803, in trace
    name = _qualified_name(func)
  File "/home/gaoyi/anaconda3/lib/python3.9/site-packages/torch/_jit_internal.py", line 1125, in _qualified_name
    raise RuntimeError("Could not get name of python class object")
RuntimeError: Could not get name of python class object
This is because the pipeline as a whole is not a model class (it is a container of sub-models, not a torch.nn.Module), so it cannot be passed directly to torch.jit.trace. Let's change approach and convert the sub-models instead.
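Which attributes are worth converting can be read off the printed configuration: every entry whose value is a [library, class] pair is a component of the pipeline. This is a minimal sketch using only the standard library, with the configuration abbreviated from the print(pipe) output above. The helper name list_components is made up for illustration.

```python
import json

# Abbreviated copy of the print(pipe) output shown above.
PIPELINE_CONFIG = """
{
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.11.0",
  "requires_safety_checker": true,
  "scheduler": ["diffusers", "PNDMScheduler"],
  "text_encoder": ["transformers", "CLIPTextModel"],
  "unet": ["diffusers", "UNet2DConditionModel"],
  "vae": ["diffusers", "AutoencoderKL"]
}
"""


def list_components(config_text):
    """Return {attribute name: (library, class name)} for each component."""
    config = json.loads(config_text)
    return {
        name: tuple(value)
        for name, value in config.items()
        # Components are stored as two-element [library, class] lists;
        # plain settings such as "_class_name" are strings or booleans.
        if isinstance(value, list) and len(value) == 2
    }


components = list_components(PIPELINE_CONFIG)
for name, (library, class_name) in components.items():
    print(f"pipe.{name}: {class_name} (from {library})")
```

Each name is available as an attribute of the pipeline, for example pipe.unet. The network components (unet, vae, text_encoder) are the candidates for torch.jit.trace, while the scheduler is not a neural network.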
Model conversion
Running print(pipe.unet) shows that unet is an ordinary network built from a stack of familiar layers:
UNet2DConditionModel(
  (conv_in): Conv2d(4, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (time_proj): Timesteps()
  (time_embedding): TimestepEmbedding(
    (linear_1): Linear(in_features=320, out_features=1280, bias=True)
    (act): SiLU()
    (linear_2): Linear(in_features=1280, out_features=1280, bias=True)
  )
  (down_blocks): ModuleList(
    (0): CrossAttnDownBlock2D(
      (attentions): ModuleList(
        (0): Transformer2DModel(
          (norm): GroupNorm(32, 320, eps=1e-06, affine=True)
          (proj_in): Linear(in_features=320, out_features=320, bias=True)
          (transformer_blocks): ModuleList(
            (0): BasicTransformerBlock(
              (attn1): CrossAttention(
                (to_q): Linear(in_features=320, out_features=320, bias=False)
                (to_k): Linear(in_features=320, out_features=320, bias=False)
                (to_v): Linear(in_features=320, out_features=320, bias=False)
                (to_out): ModuleList(
                  (0): Linear(in_features=320, out_features=320, bias=True)
                  (1): Dropout(p=0.0, inplace=False)
                )
              )
              (ff): FeedForward(
                (net): ModuleList(
                  (0): GEGLU(
                    (proj): Linear(in_features=320, out_features=2560, bias=True)
                  )
                  (1): Dropout(p=0.0, inplace=False)
                  (2): Linear(in_features=1280, out_features=320, bias=True)
                )
              )
              (attn2): CrossAttention(
                (to_q): Linear(in_features=320, out_features=320, bias=False)
                (to_k): Linear(in_features=1024, out_features=320, bias=False)
                (to_v): Linear(in_features=1024, out_features=320, bias=False)
                (to_out): ModuleList(
                  (0): Linear(in_features=320, out_features=320, bias=True)
                  (1): Dropout(p=0.0, inplace=False)
                )
              )
              (norm1): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              (norm2): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              (norm3): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
            )
          )
          (proj_out): Linear(in_features=320, out_features=320, bias=True)
        )
        (1): Transformer2DModel(
...
... (omitted)
...
  (conv_norm_out): GroupNorm(32, 320, eps=1e-05, affine=True)
  (conv_act): SiLU()
  (conv_out): Conv2d(320, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
OK, so we can convert this sub-model into the LibTorch model we need, but we do not yet know what input it requires. The printed information gives the model's class name, UNet2DConditionModel, so we can look it up in the official Hugging Face documentation: UNet2DConditionModel
The documentation shows that the model takes a sample, a timestep, and encoder_hidden_states as input, but the concrete shapes are still unknown. At this point print(model.config) helps:
FrozenDict([('sample_size', 64), ('in_channels', 4), ('out_channels', 4), ('center_input_sample', False),
('flip_sin_to_cos', True), ('freq_shift', 0), ('down_block_types', ['CrossAttnDownBlock2D',
'CrossAttnDownBlock2D', 'CrossAttnDownBlock2D', 'DownBlock2D']), ('mid_block_type',
'UNetMidBlock2DCrossAttn'), ('up_block_types', ['UpBlock2D', 'CrossAttnUpBlock2D',
'CrossAttnUpBlock2D', 'CrossAttnUpBlock2D']), ('only_cross_attention', False),
('block_out_channels', [320, 640, 1280, 1280]), ('layers_per_block', 2), ('downsample_padding', 1),
('mid_block_scale_factor', 1), ('act_fn', 'silu'), ('norm_num_groups', 32), ('norm_eps', 1e-05),
('cross_attention_dim', 1024), ('attention_head_dim', [5, 10, 20, 20]), ('dual_cross_attention', False),
('use_linear_projection', True), ('class_embed_type', None), ('num_class_embeds', None),
('upcast_attention', False), ('resnet_time_scale_shift', 'default'), ('_class_name', 'UNet2DConditionModel'),
('_diffusers_version', '0.10.2'), ('_name_or_path',
'/home/gaoyi/.cache/huggingface/diffusers/models--hakurei--waifu-diffusion/snapshots/55fd50bfae0dd8bcc4bd3a6f25cb167580b972a0/unet')])
It is a large dictionary. The entries we need are ('sample_size', 64), ('in_channels', 4), and ('out_channels', 4), which determine the shape of the example input used for tracing. At this point our .py file is as follows:
import torch

model = torch.load("pipe-unet.pth")
# print(model.config)
# print(model)
example = torch.rand(1, 4, 64, 64)  # (batch, in_channels, sample_size, sample_size)
timestep = torch.rand(1)
encoder_hidden_states = torch.rand(1, 4, 64, 64)  # first guess: same shape as example
traced_script_module = torch.jit.trace(model, (example, timestep, encoder_hidden_states))
traced_script_module.save("pipe-unet.pt")
But an error is reported: mat1 cannot be multiplied with mat2, shapes 256x64 and 1024x320. That is roughly the problem; the full message is not pasted here. Since the matrix shape is wrong, the shape has to change. I had assumed that encoder_hidden_states should have the same shape as example, but that is evidently wrong. After changing it I ran into a new problem: the attention computation received a tensor with too many dimensions, and it accepts only three. So I simply tested with a three-dimensional encoder_hidden_states whose last dimension is 1024:
encoder_hidden_states = torch.rand(1, 4, 1024)
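The numbers in the error message can be reproduced by hand. A Linear layer multiplies over the last dimension of its input and treats all leading dimensions as rows, so the (1, 4, 64, 64) tensor becomes a 256x64 matrix, which cannot be multiplied with the 1024x320 weight of to_k in attn2. This is a small sketch in plain Python. The helper names as_matrix and can_multiply are made up for illustration, and the value 1024 is cross_attention_dim from the config above.

```python
from math import prod


def as_matrix(shape):
    """Shape of the 2-D matrix a Linear layer sees: (all leading dims, last dim)."""
    return (prod(shape[:-1]), shape[-1])


def can_multiply(input_shape, in_features):
    """True if a tensor of input_shape can be fed to Linear(in_features, ...)."""
    return as_matrix(input_shape)[1] == in_features


CROSS_ATTENTION_DIM = 1024  # ('cross_attention_dim', 1024) in model.config

wrong = (1, 4, 64, 64)  # same shape as `example`
fixed = (1, 4, 1024)    # (batch, sequence length, cross_attention_dim)

print(as_matrix(wrong), can_multiply(wrong, CROSS_ATTENTION_DIM))  # (256, 64) False
print(as_matrix(fixed), can_multiply(fixed, CROSS_ATTENTION_DIM))  # (4, 1024) True
```

The middle dimension of 4 is simply the value used in the test above. In the real pipeline it is the token sequence length of the text encoder output.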
The next problem concerns the container type that the model returns during tracing (a dict at the output), as follows:
RuntimeError: Encountering a dict at the output of the tracer might cause the trace to be incorrect,
this is only valid if the container structure does not change based on the module's inputs.
Consider using a constant container instead (e.g. for `list`, use a `tuple` instead. for `dict`,
use a `NamedTuple` instead). If you absolutely need this and know the side effects,
pass strict=False to trace() to allow this behavior.
As the message suggests, strict=False needs to be passed during conversion. The adjusted code is as follows:
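import torch

model = torch.load("pipe-unet.pth")
# print(model.config)
# print(model)
example = torch.rand(1, 4, 64, 64)  # (batch, in_channels, sample_size, sample_size)
timestep = torch.rand(1)
encoder_hidden_states = torch.rand(1, 4, 1024)  # last dimension matches cross_attention_dim
# strict=False lets the tracer accept the dict-like output of the model.
traced_script_module = torch.jit.trace(model, (example, timestep, encoder_hidden_states), strict=False)
traced_script_module.save("pipe-unet.pt")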
Saved successfully!
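An alternative to strict=False, if a dict-like output is unwanted on the C++ side, is to wrap the model so that it returns a plain tensor. The sketch below uses a toy module in place of the UNet, then reloads the saved file as a quick check before moving on to C++. DictModel and TensorOnly are made-up names, and the "sample" key is only an assumption about the output layout.

```python
import os
import tempfile

import torch
from torch import nn


class DictModel(nn.Module):
    """Toy stand-in for a network whose forward returns a dict."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 4)

    def forward(self, x):
        return {"sample": self.linear(x)}


class TensorOnly(nn.Module):
    """Wrapper that unpacks the dict so the tracer sees a plain tensor."""

    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        return self.model(x)["sample"]


model = DictModel().eval()
example = torch.rand(1, 8)

# Option 1: keep the dict output and relax the tracer, as in the text.
traced_dict = torch.jit.trace(model, example, strict=False)

# Option 2: trace the wrapper; strict=False is not needed here.
traced_tensor = torch.jit.trace(TensorOnly(model), example)

# Round trip through a file, the same way the .pt file is loaded later.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.pt")
    traced_tensor.save(path)
    reloaded = torch.jit.load(path)

same = torch.allclose(reloaded(example), model(example)["sample"])
print(same)
```

Whether the wrapper is worth the extra class depends on how the output is consumed later: with a plain tensor output, the caller does not have to unpack a dictionary.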
Model testing
Following the test tutorial on the official PyTorch website, write the corresponding C++ file, build it with CMake to produce the example-app executable, and run:
./example-app ../pipe-unet.pt
The program prints ok: the model was converted successfully!