Stable Diffusion - SD v1.6+ causes a RuntimeError in the BLIP Interrogate CLIP (image-to-prompt) function

Follow me on CSDN: https://spike.blog.csdn.net/
Article address: https://spike.blog.csdn.net/article/details/132994678

Image from the 麦橘写实_MajicMIX_Realistic_v6 model

After upgrading to SD v1.6, the Interrogate CLIP function (which derives a prompt from an image) no longer works.

Reference: Image-to-prompt (Interrogate) algorithms (BLIP and DeepBooru)

Error log:

# ...
  File "stable_diffusion_webui/repositories/BLIP/models/med.py", line 277, in forward
    self_outputs = self.self(
  File "stable_diffusion_webui/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "stable_diffusion_webui/repositories/BLIP/models/med.py", line 178, in forward
    attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: The size of tensor a (2) must match the size of tensor b (4) at non-singleton dimension 0
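The message says the two operands of torch.matmul disagree in dimension 0, the batch dimension. As a rough illustration of the broadcasting rule applied to leading dimensions, here is a minimal pure-Python sketch. The helper name and every shape value other than the 2 and 4 from the log are hypothetical. The sketch does not use PyTorch and does not explain why the two batch sizes diverged.

```python
from itertools import zip_longest


def broadcast_batch_shape(a_batch, b_batch):
    """Return the broadcast of two batch shapes, or raise ValueError.

    Shapes are aligned from the right; two sizes are compatible when
    they are equal or when one of them is 1.
    """
    result = []
    pairs = zip_longest(reversed(a_batch), reversed(b_batch), fillvalue=1)
    for a, b in pairs:
        if a != b and a != 1 and b != 1:
            raise ValueError(f"size {a} must match size {b}")
        # When one side is 1 it stretches to the other side's size.
        result.append(b if a == 1 else a)
    return tuple(reversed(result))


if __name__ == "__main__":
    # Hypothetical batch shapes: (batch, heads). Only 2 and 4 come from the log.
    query_batch = (2, 12)
    key_batch = (4, 12)
    try:
        broadcast_batch_shape(query_batch, key_batch)
    except ValueError as err:
        print("incompatible batch dimensions:", err)
```

A batch size of 1 would broadcast against 4, but 2 cannot, which is the kind of mismatch the log reports.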

Solution: SD's Interrogate CLIP function calls the GitHub project salesforce/BLIP, which was last updated in September 2022. Its code depends on an older release of the transformers library and currently only supports version 4.26.1, that is:

pip install transformers==4.26.1
pip install tokenizers==0.11.1
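To confirm which versions are actually installed in the WebUI's virtual environment, a check like the following may help. It is a minimal sketch using only the standard library; the function names are hypothetical, and the expected versions are simply the two pins quoted above.

```python
from importlib import metadata

# Pins quoted in this article; adjust if your setup differs.
EXPECTED = {"transformers": "4.26.1", "tokenizers": "0.11.1"}


def installed_versions(names):
    """Look up the installed version of each package (None if missing)."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(expected, installed):
    """Return {package: (expected, installed)} for every pin that is not met."""
    return {
        name: (want, installed.get(name))
        for name, want in expected.items()
        if installed.get(name) != want
    }


if __name__ == "__main__":
    mismatches = find_mismatches(EXPECTED, installed_versions(EXPECTED))
    for name, (want, have) in mismatches.items():
        print(f"{name}: expected {want}, found {have}")
```

Run it with the Python interpreter inside the WebUI's venv, since a system-wide interpreter may report different packages.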

However, SD v1.6 pins transformers to 4.30.2, which causes the conflict. See requirements.txt and requirements_versions.txt:

transformers==4.30.2

Therefore, change this pin to transformers==4.26.1 to make the function usable again. BLIP is currently unmaintained, so the transformers version has to follow what BLIP supports.
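The pin can be edited by hand. For a scripted change, the following is a minimal sketch, assuming the file uses plain `package==version` lines as in the excerpt above. The helper names are hypothetical and are not part of the WebUI.

```python
import re
from pathlib import Path


def repin(requirements_text, package, version):
    """Replace the `package==...` pin in a requirements file's text.

    Only lines that start with the exact package name are touched, so
    e.g. `sentence-transformers==...` is left alone.
    """
    pattern = re.compile(rf"^{re.escape(package)}==[^\s#]+", flags=re.MULTILINE)
    return pattern.sub(f"{package}=={version}", requirements_text)


def repin_file(path, package, version):
    """Apply repin() to a file in place (hypothetical convenience wrapper)."""
    file_path = Path(path)
    original = file_path.read_text(encoding="utf-8")
    file_path.write_text(repin(original, package, version), encoding="utf-8")
```

For example, `repin_file("requirements_versions.txt", "transformers", "4.26.1")` would rewrite the pin shown above; keep a backup of the file before doing so.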

In addition, editing the script stable-diffusion-webui/modules/launch_utils.py to route downloads through the GitHub proxy https://ghproxy.com/ can speed up the preparation step when the WebUI project starts. If you need to update a version, update it from the corresponding project address. The relevant locations are:

  • The BLIP project is located at stable-diffusion-webui/stable_diffusion_webui/repositories/BLIP
  • The BLIP model is located at stable_diffusion_webui/models/BLIP/model_base_capfilt_large.pth

The modified script:

def prepare_environment():
    # ...
    clip_package = os.environ.get('CLIP_PACKAGE', "https://ghproxy.com/https://github.com/openai/CLIP/archive/d50d76daa670286dd6cacf3bcd80b5e4823fc8e1.zip")
    openclip_package = os.environ.get('OPENCLIP_PACKAGE', "https://ghproxy.com/https://github.com/mlfoundations/open_clip/archive/bb6e834e9c70d9c27d0dc3ecedeebeaeb1ffad6b.zip")

    stable_diffusion_repo = os.environ.get('STABLE_DIFFUSION_REPO', "https://ghproxy.com/https://github.com/Stability-AI/stablediffusion.git")
    stable_diffusion_xl_repo = os.environ.get('STABLE_DIFFUSION_XL_REPO', "https://ghproxy.com/https://github.com/Stability-AI/generative-models.git")
    k_diffusion_repo = os.environ.get('K_DIFFUSION_REPO', 'https://ghproxy.com/https://github.com/crowsonkb/k-diffusion.git')
    codeformer_repo = os.environ.get('CODEFORMER_REPO', 'https://ghproxy.com/https://github.com/sczhou/CodeFormer.git')
    blip_repo = os.environ.get('BLIP_REPO', 'https://ghproxy.com/https://github.com/salesforce/BLIP.git')
    # ...
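Each URL in the excerpt is read through os.environ.get, so the same effect can probably be achieved without editing launch_utils.py: set the corresponding environment variables before launching. The following is a minimal sketch of that idea. The helper names are hypothetical, the variable names and GitHub URLs are taken from the excerpt, and the proxy's availability is an assumption.

```python
import os

PROXY = "https://ghproxy.com/"

# Variable names and upstream URLs as they appear in prepare_environment().
GITHUB_SOURCES = {
    "STABLE_DIFFUSION_REPO": "https://github.com/Stability-AI/stablediffusion.git",
    "STABLE_DIFFUSION_XL_REPO": "https://github.com/Stability-AI/generative-models.git",
    "K_DIFFUSION_REPO": "https://github.com/crowsonkb/k-diffusion.git",
    "CODEFORMER_REPO": "https://github.com/sczhou/CodeFormer.git",
    "BLIP_REPO": "https://github.com/salesforce/BLIP.git",
}


def with_proxy(url, proxy=PROXY):
    """Prefix a URL with the proxy, unless it is already prefixed."""
    if url.startswith(proxy):
        return url
    return proxy + url


def proxied_environment(sources, proxy=PROXY, base=None):
    """Return a copy of the environment with proxied defaults filled in.

    Variables that are already set in `base` are kept untouched.
    """
    env = dict(os.environ if base is None else base)
    for name, url in sources.items():
        env.setdefault(name, with_proxy(url, proxy))
    return env


if __name__ == "__main__":
    for name in GITHUB_SOURCES:
        print(f"{name}={proxied_environment(GITHUB_SOURCES)[name]}")
```

The returned dictionary could be passed as `env=` to `subprocess.run` when starting the WebUI, which keeps the repository checkout unmodified.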

Note: The model at the official address https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_capfilt_large.pth is larger than model_base_caption_capfilt_large.pth, the model SD uses by default: about 2.0 GB versus 800 MB.

         files = modelloader.load_models(
             model_path=os.path.join(paths.models_path, "BLIP"),
-            model_url='https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_caption_capfilt_large.pth',
+            model_url='https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_capfilt_large.pth',
             ext_filter=[".pth"],
-            download_name='model_base_caption_capfilt_large.pth',
+            download_name='model_base_capfilt_large.pth',
         )
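To see which BLIP checkpoint is actually present after a change like this, the files can be listed along with their sizes. This is a minimal sketch, assuming the checkpoints are .pth files in a BLIP folder under the models directory, as the model_path and ext_filter arguments above suggest. The function name is hypothetical.

```python
from pathlib import Path


def find_blip_checkpoints(models_dir):
    """Return {filename: size in MiB} for each .pth file in models_dir/BLIP."""
    blip_dir = Path(models_dir) / "BLIP"
    if not blip_dir.is_dir():
        return {}
    return {
        path.name: path.stat().st_size / 2**20
        for path in sorted(blip_dir.glob("*.pth"))
    }


if __name__ == "__main__":
    # "models" is a placeholder; point it at your WebUI's models directory.
    for name, size_mib in find_blip_checkpoints("models").items():
        print(f"{name}: {size_mib:.0f} MiB")
```

A file of roughly 2 GB versus roughly 800 MB distinguishes the two checkpoints mentioned in the note above.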

Image description, via New Bing:

a picture of a person sitting on a chair in a luxurious room,
wearing a black and white zebra print dress and black high heels,
chair is a light beige color with a curved back and armrests,
room has a large window with white curtains and a gold-framed mirror on the wall,
floor is made of light-colored wood,
shows a contrast between the bold and striking pattern of the dress and the soft and elegant colors of the room,
seems to be relaxed and comfortable as they are leaning back on the chair and crossing their legs,
picture might be taken for a fashion magazine or a personal blog as it showcases the style and taste of the person,
It might also be taken for a hotel advertisement or a travel diary as it shows the beauty and luxury of the room,
The picture creates an impression of sophistication and glamour as well as curiosity and interest,

Complete prompt:

(masterpiece, best quality:1.2),highly detailed,extremely detailed,real photo,
looking at viewer,body facing viewer,240D wrap hip very thick pantyhose,
a picture of a person sitting on a chair in a luxurious room,
wearing a black and white zebra print dress and black high heels,
chair is a light beige color with a curved back and armrests,
room has a large window with white curtains and a gold-framed mirror on the wall,
floor is made of light-colored wood,
shows a contrast between the bold and striking pattern of the dress and the soft and elegant colors of the room,
seems to be relaxed and comfortable as they are leaning back on the chair and crossing their legs,
The picture might be taken for a fashion magazine or a personal blog as it showcases the style and taste of the person,
It might also be taken for a hotel advertisement or a travel diary as it shows the beauty and luxury of the room,
picture creates an impression of sophistication and glamour as well as curiosity and interest,
(pair shoes,pair legs:1.2),nice hand,nice figure,
(photorealistic,realistic:1.2),
<lora:more_details:0.4>,<lora:clothing_adjuster_v2:-0.8>,
Negative prompt: (ng_deepnegative_v1_75t:1.3),(negative_hand),(badhandv4),
(negative_feet_v2:0.5),
cleavage,buttocks,
missing arm,missing leg,extra arms,extra legs,mutated legs,extra limbs,malformed limbs,floating limbs,disconnected limbs,
bad anatomy,bad proportions,disfigured,long neck,long leg,
worst quality,bad quality,jpeg artifacts,lowres,normal quality,low quality,
EasyNegative,
Steps: 30, Sampler: DPM++ 2M SDE Karras, CFG scale: 7, Seed: 2386674497, Size: 512x768, Model hash: e4a30e4607, Model: 麦橘写实_MajicMIX_Realistic_v6, Denoising strength: 0.3, ADetailer model: face_yolov8n.pt, ADetailer prompt: “asian face,beatiful face,”, ADetailer confidence: 0.3, ADetailer dilate/erode: 4, ADetailer mask blur: 4, ADetailer denoising strength: 0.4, ADetailer inpaint only masked: True, ADetailer inpaint padding: 32, ADetailer version: 23.9.2, Hires upscale: 2, Hires steps: 5, Hires upscaler: 4x-UltraSharp, Lora hashes: “more_details: 3b8aa1d351ef, clothing_adjuster_v2: f038e3a5b67b”, TI hashes: “ng_deepnegative_v1_75t: 54e7e4826d53, negative_hand: 73b524a2da12, badhandv4: 5e40d722fc3d, negative_feet_v2: df90b1ff666d, EasyNegative: 66a7279a88dd”, Version: v1.6.0

Origin blog.csdn.net/u012515223/article/details/132994678