Stable Diffusion installation pitfalls (Windows & Mac & iOS)

Apple officially added support for this library today, so I downloaded it and tried it out. The results are quite good; this is the first time the M-series chips have given me a pleasant surprise in deep learning.
https://machinelearning.apple.com/research/stable-diffusion-coreml-apple-silicon

1. Windows usage

Install

Sharing my experience installing an interesting library.
I only recently discovered that someone has open-sourced this generative model. I had been following large models like DALL·E and was amazed by the wild imagination of AI, but most of them only expose API interfaces; now those of us on a budget can own one too. <smile>

Download the library

Go to the github website to download -> https://github.com/CompVis/stable-diffusion

git clone https://github.com/CompVis/stable-diffusion.git

Configure the environment and files

For the simple route, run the following two commands directly inside the downloaded folder:

conda env create -f environment.yaml
conda activate ldm

That completes the setup.
Or, if it throws errors like it did for me, install the packages one by one.
My environment already had pytorch and torchvision:

pip install transformers==4.19.2 diffusers invisible-watermark
pip install -e .
pip install OmegaConf einops taming-transformers pytorch-lightning clip kornia

Problems

The first problem is explained on the official site, but the download link is buried and hard to find.
I searched for a while before finding https://huggingface.co/CompVis/stable-diffusion-v-1-4-original and downloaded the sd-v1-4.ckpt file (any of the listed checkpoints will do), which is about 4 GB.

Place the downloaded model file in the folder the official instructions point to, and rename it model.ckpt.
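As a minimal sketch, assuming the models/ldm/stable-diffusion-v1/ path from the repo README and that the checkpoint sits in your Downloads folder, the copy step looks like this:

# place_ckpt.py - copy the downloaded checkpoint to the path the repo expects
from pathlib import Path
import shutil

src = Path.home() / "Downloads" / "sd-v1-4.ckpt"         # adjust to wherever you saved it
dst = Path("models/ldm/stable-diffusion-v1/model.ckpt")   # run this from the repo root
dst.parent.mkdir(parents=True, exist_ok=True)             # create the folder if it does not exist
shutil.copy(src, dst)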
Now run the following command; it will almost certainly throw an error.

python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms


The problem seems to be that the upstream author changed this dependency. The fix is to replace your local quantize.py (the error message contains the file's absolute path) with the file at this URL:
https://github.com/CompVis/taming-transformers/blob/master/taming/modules/vqvae/quantize.py
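A rough sketch of that replacement in Python (the target path is a placeholder; use the absolute path printed in your own error message):

# overwrite the local quantize.py with the upstream copy
import urllib.request

RAW_URL = ("https://raw.githubusercontent.com/CompVis/taming-transformers/"
           "master/taming/modules/vqvae/quantize.py")
target = "/path/from/your/error/message/quantize.py"  # placeholder: use your own path
urllib.request.urlretrieve(RAW_URL, target)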

Run it again and another error appears.
I only have a 12 GB RTX 3060; apparently that is still not enough -_-
!!!!!!!!!!!!!!!!
There is now a simple fix: thanks to a suggestion from a friend in the math department, you can free up video memory by simply reducing the precision.
!!!!!!!!!!!!!!!!
Modify line 244 of the txt2img.py file to load the model in half precision.
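A sketch of that kind of edit (I am assuming the usual half-precision trick here; the exact line number varies between versions of the script):

# scripts/txt2img.py, in main(), right after the model is loaded
model = load_model_from_config(config, f"{opt.ckpt}")
model = model.half()  # cast the weights to fp16 to roughly halve VRAM usage
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model = model.to(device)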

Or try the more involved method below.
I am not sure exactly how much VRAM is needed. One approach found online is to use an optimized fork; another online suggestion is to comment out the safety checks, which I tried and it made no difference.
After downloading https://github.com/basujindal/stable-diffusion , some extra dependencies need to be installed for the new repo. Run the following in the new folder:

pip install -e .

The optimized code lives in the optimizedSD folder; the original source code is kept alongside it, so be careful not to mix them up.
Reinstall the environment for this optimized fork and put the ckpt file in the corresponding location.

python optimizedSD/optimized_txt2img.py --prompt "Cyberpunk style image of a Tesla car reflection in rain" --H 512 --W 512 --seed 27 --n_iter 2 --n_samples 5 --ddim_steps 50

Running it produces an error.

After some digging, it turns out the author of the optimized fork also swapped out a dependency (see https://github.com/basujindal/stable-diffusion/issues/175 ), which can be solved as follows.

pip install git+https://github.com/crowsonkb/k-diffusion.git

Then open optimizedSD/ddpm.py and replace the old from samples ... import with the three from k_diffusion ... imports given in the linked issue.
After that, machines with weaker graphics cards can run it too, and there is no need to save up for a 24 GB card.
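For reference, a hypothetical sketch of what that edit looks like (the exact import list comes from the linked issue; verify the names against your copy of the fork):

# optimizedSD/ddpm.py -- replace the old "from samples ..." import line
# with imports from the k_diffusion package, roughly:
from k_diffusion.external import CompVisDenoiser            # wraps the CompVis model for k-diffusion samplers
from k_diffusion.sampling import get_ancestral_step, to_d   # sampler helper functions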

2. macOS usage

Install

https://github.com/apple/ml-stable-diffusion

git clone https://github.com/apple/ml-stable-diffusion
pip install -e . # run this inside the downloaded folder


Log in to Hugging Face

If you do not have a Hugging Face account ( https://huggingface.co ), register one. After registering, generate a token at https://huggingface.co/settings/tokens and copy it.

Next, run the following command and paste the token when prompted:

huggingface-cli login


Run the conversion

Run this inside the ml-stable-diffusion folder; create a folder to hold the Core ML models first and pass it to -o:

python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --convert-text-encoder --convert-vae-decoder --convert-safety-checker -o ./output_ml

The official repo also documents a few useful options:

--model-version runwayml/stable-diffusion-v1-5 # use a different diffusion model; the default is CompVis/stable-diffusion-v1-4
--bundle-resources-for-swift-cli  # bundle the ml files into a Swift resource package; not needed for Python generation
--chunk-unet # required for iOS and iPadOS deployment; I want to try the last two options on a real device later
--attention-implementation # implementation targeting the NPU (Neural Engine) on Apple silicon

If you want to deploy to a phone or tablet, you can refer to the following:

python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --convert-text-encoder --convert-vae-decoder --convert-safety-checker -o ./sd2_ml --chunk-unet --model-version stabilityai/stable-diffusion-2-1-base --bundle-resources-for-swift-cli

After about 20 minutes, the converted Core ML model files are generated in the output folder.

Then, still inside the ml-stable-diffusion folder, create a folder for the output images and run:

python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i ./output_ml -o ./output_image --compute-unit ALL --seed 93

With Swift, use the following:

swift run StableDiffusionSample "A photo of a little girl walking on the beach with the Jenny Turtle" --resource-path ./sd2_ml/Resources/ --seed 93 --output-path ./output_image


--model-version # if you changed this earlier, change it here too
--num-inference-steps # defaults to 50 inference steps; use this to set a custom count

I am on an M2 MacBook Air. Following the official benchmark guide, I set --compute-unit to CPU_AND_NE. An inference takes about 21 seconds in Python, which is quite fast; Swift takes about 2 seconds per step, which is much faster.

From the official benchmark table, the plain M chips (no Pro/Max suffix) have relatively few GPU cores, which I guess is why --compute-unit is recommended as CPU_AND_NE for them, ALL for the Pro-series chips, and CPU_AND_GPU for the chips above that.
For --attention-implementation, go by the GPU core count: with 16 cores or fewer, SPLIT_EINSUM is the default and no flag is needed; with more than 16, use ORIGINAL. My guess is that because the M-series Neural Engine has 16 cores, chips whose GPU core count is no larger than that are better off on the NPU, while chips with far more GPU cores get better efficiency from the GPU.

3. iPhone & iPad deployment

Open Xcode, then add the StableDiffusion package and the Core ML files generated above to the project.

import SwiftUI
import StableDiffusion
import CoreML
import os // os_proc_available_memory() lives in the os module

struct ContentView: View {

    @State var prompt: String = "a photo of an astronaut riding a horse on mars"
    @State var step = 10
    @State var seed = 100
    @State var image: CGImage?
    @State var progress = 0.0
    @State var generating = false
    @State var booting = true

    @State var pipeline: StableDiffusionPipeline?

    private let disableSafety = false

    var body: some View {
        VStack {
            if booting {
                Text("Initializing...")
            } else {
                if let image {
                    Image(uiImage: UIImage(cgImage: image))
                        .resizable()
                        .scaledToFit()
                }
                if generating {
                    ProgressView(value: progress)
                }
                if !generating {
                    TextField("Prompt", text: $prompt)
                    Stepper(value: $step, in: 1...100) {
                        Text("steps: \(step)")
                    }
                    Stepper(value: $seed, in: 0...10000) {
                        Text("Seed: \(seed)")
                    }
                    Button("Generate") {
                        progress = 0.0
                        image = nil
                        generating = true
                        Task.detached(priority: .high) {
                            var images: [CGImage?]?
                            do {
                                print("generate")
                                images = try pipeline?.generateImages(prompt: prompt, stepCount: step, seed: seed, disableSafety: disableSafety, progressHandler: { progress in
                                    // Update the progress bar and preview the intermediate image.
                                    print("test")
                                    self.progress = Double(progress.step) / Double(step)
                                    if let image = progress.currentImages.first {
                                        self.image = image
                                    }
                                    return true
                                })
                            } catch let error {
                                print(error.localizedDescription)
                            }
                            print("finish")
                            if let image = images?.first {
                                self.image = image
                            }
                            generating = false
                        }
                    }
                }
            }
        }
        .padding()
        .onAppear {
            Task.detached(priority: .high) {
                do {
                    print(os_proc_available_memory())
                    // The converted Core ML models are bundled in a "CoreMLModels" folder.
                    guard let path = Bundle.main.path(forResource: "CoreMLModels", ofType: nil, inDirectory: nil) else {
                        fatalError("Fatal error: failed to find the CoreML models.")
                    }
                    let resourceURL = URL(fileURLWithPath: path)
                    let config = MLModelConfiguration()
                    config.computeUnits = .cpuAndNeuralEngine
                    pipeline = try StableDiffusionPipeline(resourcesAt: resourceURL, configuration: config, reduceMemory: true)
                    try pipeline?.loadResources()
                    print("initialized pipeline")
                } catch let error {
                    print("error initializing pipeline")
                    print(error.localizedDescription)
                }
                booting = false
            }
        }
    }
}

struct ContentView_Previews: PreviewProvider {
    static var previews: some View {
        ContentView()
    }
}

On iPad and Mac, config.computeUnits = .cpuAndNeuralEngine is recommended. To go on and deploy on iPhone, change it to config.computeUnits = .cpuAndGPU, then go to the Signing & Capabilities tab, click + Capability, and add Increased Memory Limit. With that, it runs on a real iPhone. The project needs a bit more than 3 GB of memory on the device; on my iPhone 14 Pro the default memory available to an app is also only a bit more than 3 GB, so the Increased Memory Limit entitlement is needed to raise the limit to roughly 4 GB before it will run. Even with the increased limit, the Neural Engine still hits memory errors, so on the iPhone it can only run on the GPU; the iPad Air 5 does not have this problem and can run on either. The GPU is a bit slower than the Neural Engine, but it is still cool that a phone can run diffusion locally.
