Apple officially added support for this today, so I downloaded it and tried it out. The results are quite good; the M chip has delivered its first pleasant surprise in the field of deep learning.
https://machinelearning.apple.com/research/stable-diffusion-coreml-apple-silicon
1. Windows usage
Install
Sharing my experience installing an interesting library.
I only recently discovered that this generative model has been open-sourced. I had been following large models like DALL·E before and was amazed by the wild imagination of AI, but most of them are only available through API interfaces. Now poor people like us can own one too. :)
Download the library
Go to the github website to download -> https://github.com/CompVis/stable-diffusion
git clone https://github.com/CompVis/stable-diffusion.git
Configure the environment and files
In the simplest case, just run the following two commands in the downloaded folder:
conda env create -f environment.yaml
conda activate ldm
and the environment is ready.
Or, like me, hit errors and install the packages one by one.
My base environment already had pytorch and torchvision:
pip install transformers==4.19.2 diffusers invisible-watermark
pip install -e .
pip install omegaconf einops taming-transformers pytorch-lightning clip kornia
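When installing piecemeal like this, a quick stdlib-only check helps confirm which packages still fail to import. This is my own helper sketch, not part of the repo; note that import names can differ from pip names (e.g. OmegaConf installs as omegaconf):

```python
import importlib.util

def missing_packages(names):
    """Return the subset of `names` that cannot be imported in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Anything printed here still needs to be installed (or has a different import name).
print(missing_packages(["torch", "transformers", "omegaconf", "einops", "kornia"]))
```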
Problems
The first problem is explained on the official site, but the download link is buried deep and hard to find.
I searched for a while before finding https://huggingface.co/CompVis/stable-diffusion-v-1-4-original , and downloaded the sd-v1-4.ckpt file (any one will do); it's about 4 GB.
Place the downloaded model file in the folder the README specifies (models/ldm/stable-diffusion-v1/) and rename it model.ckpt.
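The placement step might look like this, assuming the checkpoint layout from the CompVis README and a download location of ~/Downloads (check your version's README for the exact path):

```shell
# Run from the root of the cloned stable-diffusion repo.
mkdir -p models/ldm/stable-diffusion-v1
mv ~/Downloads/sd-v1-4.ckpt models/ldm/stable-diffusion-v1/model.ckpt
```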
Run the following command; in all likelihood it will throw an error.
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
The problem seems to be that the author later modified this dependency. Replace your quantize.py (the error message contains the file's absolute path) with the file at this URL:
https://github.com/CompVis/taming-transformers/blob/master/taming/modules/vqvae/quantize.py
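One way to apply the replacement is to download the raw file directly over the old one. The target path below is a placeholder; use the absolute path from your own error message:

```shell
# Replace /path/from/error/... with the quantize.py path shown in your traceback.
curl -L -o /path/from/error/taming/modules/vqvae/quantize.py \
  https://raw.githubusercontent.com/CompVis/taming-transformers/master/taming/modules/vqvae/quantize.py
```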
Run it again and another error appears: the GPU runs out of memory.
I only have a 12 GB RTX 3060. Apparently this is not something ordinary people can afford -_-
! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !
Now there is a simple way: thanks to a suggestion from a friend in the math department, video memory can be freed by simply reducing the model's precision.
! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !
Modify line 244 of the txt2img.py file to cast the model to half precision.
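The original post showed the change as a screenshot; a sketch of what the usual half-precision trick looks like at the model-loading step is below. The surrounding code and exact line number depend on your copy of txt2img.py:

```python
# scripts/txt2img.py, around line 244 -- where the model is loaded:
model = load_model_from_config(config, f"{opt.ckpt}")
model = model.half()      # cast weights to fp16, roughly halving VRAM usage
model = model.to(device)
```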
The result:
Or read on for the more involved method.
I don't know exactly how much memory is needed. The method I found online is to use an optimized fork. Another suggestion floating around is to comment out the safety checker; I tried that and it made no difference.
The optimized fork is https://github.com/basujindal/stable-diffusion
After downloading it, some packages need to be installed for the new repo. Run the following installation command in the new folder:
pip install -e .
The optimized code lives in the optimizedSD folder, and the original source code is also kept alongside it, so don't mix them up.
Reinstall the environment for this optimized repo and put the ckpt file in the corresponding location.
python optimizedSD/optimized_txt2img.py --prompt "Cyberpunk style image of a Tesla car reflection in rain" --H 512 --W 512 --seed 27 --n_iter 2 --n_samples 5 --ddim_steps 50
Running this reported an error. After checking, it seems a library the optimized fork depends on was also recently changed:
https://github.com/basujindal/stable-diffusion/issues/175
It can be solved as follows:
pip install git+https://github.com/crowsonkb/k-diffusion.git
Then open optimizedSD/ddpm.py for editing, and replace the old "from samples..." import line with the three k_diffusion imports shown in the linked issue.
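The original post showed the replacement as a screenshot; see the linked issue for the exact lines. The shape of the fix is to swap the broken imports for ones from the k-diffusion package, roughly like this (names may differ in your version, so verify against the issue):

```python
# optimizedSD/ddpm.py -- replace the old sampler imports with k_diffusion ones:
from k_diffusion.sampling import get_ancestral_step, to_d
from k_diffusion.external import CompVisDenoiser
```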
With that, machines with weaker graphics cards can run it too; no need to haul bricks to save up for a 24 GB card.
Measured result:
2. Mac usage
Install
https://github.com/apple/ml-stable-diffusion
git clone https://github.com/apple/ml-stable-diffusion
pip install -e . # run this inside the downloaded folder
Log in to Hugging Face
If you don't have an account on https://huggingface.co , register one first. After registering, generate a token at https://huggingface.co/settings/tokens and copy it.
Next, enter the following command on the command line and paste the token when prompted:
huggingface-cli login
Run the conversion
Run this inside the folder; while you're at it, create a folder to hold the ML models and point the -o option at it:
python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --convert-text-encoder --convert-vae-decoder --convert-safety-checker -o ./output_ml
The official README also documents these options:
--model-version runwayml/stable-diffusion-v1-5 # specify a different Stable Diffusion version; the default is CompVis/stable-diffusion-v1-4
--bundle-resources-for-swift-cli # packages the ML files into a single Swift resource bundle; not needed when generating with Python
--chunk-unet # required for iOS and iPadOS deployment; I'd like to try deploying the last two options on a real device later
--attention-implementation # attention implementation for the Neural Engine on Apple silicon
If you want to deploy to phones and tablets, you can refer to the following:
python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --convert-text-encoder --convert-vae-decoder --convert-safety-checker -o ./sd2_ml --chunk-unet --model-version stabilityai/stable-diffusion-2-1-base --bundle-resources-for-swift-cli
After about 20 minutes, the output files are generated.
Then, still inside the ml-stable-diffusion folder, create a folder for the generated images and run:
python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i ./output_ml -o ./output_image --compute-unit ALL --seed 93
With Swift, use the following:
swift run StableDiffusionSample "A photo of a little girl walking on the beach with the Jenny Turtle" --resource-path ./sd2_ml/Resources/ --seed 93 --output-path ./output_image
--model-version # if you changed the model version above, change it here too
--num-inference-steps # defaults to 50 inference steps; use this to customize the count
I'm on an M2 MacBook Air. Following the official benchmark guide, I chose --compute-unit CPU_AND_NE. An inference takes about 21 seconds in Python, which is decent; Swift takes about 2 seconds per step, which is much faster.
From the benchmark table it looks like the chips without a suffix have fewer GPU cores, so my guesses for --compute-unit are: plain M chips, CPU_AND_NE; the Pro series, ALL; Max and above, CPU_AND_GPU.
For --attention-implementation, look at the GPU core count: with 16 or fewer cores, SPLIT_EINSUM is the default and the flag can be omitted; with more than 16, use ORIGINAL. My guess is that because the Apple Neural Engine has 16 cores, a GPU with fewer cores than the ANE should defer to it, while a GPU with many more cores is more efficient on its own.
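To summarize the guesses above, this hypothetical helper (my own sketch, not part of ml-stable-diffusion) encodes the rule of thumb:

```python
def pick_flags(chip: str, gpu_cores: int) -> tuple[str, str]:
    """Guessed rule of thumb for --compute-unit and --attention-implementation."""
    if chip in ("M1", "M2"):        # no suffix: few GPU cores, lean on the ANE
        compute_unit = "CPU_AND_NE"
    elif chip.endswith("Pro"):      # Pro series
        compute_unit = "ALL"
    else:                           # Max / Ultra: many GPU cores
        compute_unit = "CPU_AND_GPU"
    # SPLIT_EINSUM is the default and suits <= 16 GPU cores; above that, ORIGINAL.
    attention = "SPLIT_EINSUM" if gpu_cores <= 16 else "ORIGINAL"
    return compute_unit, attention

print(pick_flags("M2", 10))      # base M2 MacBook Air
print(pick_flags("M1 Max", 32))
```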
3. iPhone & iPad deployment
Open Xcode, import the library and the ML file generated above
import SwiftUI
import StableDiffusion
import CoreML
struct ContentView: View {
    // UI state for the prompt, sampler settings, and generation progress.
    @State var prompt: String = "a photo of an astronaut riding a horse on mars"
    @State var step = 10
    @State var seed = 100
    @State var image: CGImage?
    @State var progress = 0.0
    @State var generating = false
    @State var booting = true
    @State var pipeline: StableDiffusionPipeline?
    private let disableSafety = false

    var body: some View {
        VStack {
            if booting {
                Text("Initializing...")
            } else {
                if let image {
                    Image(uiImage: UIImage(cgImage: image))
                        .resizable()
                        .scaledToFit()
                }
                if generating {
                    ProgressView(value: progress)
                }
                if !generating {
                    TextField("Prompt", text: $prompt)
                    Stepper(value: $step, in: 1...100) {
                        Text("steps: \(step)")
                    }
                    Stepper(value: $seed, in: 0...10000) {
                        Text("Seed: \(seed)")
                    }
                    Button("Generate") {
                        progress = 0.0
                        image = nil
                        generating = true
                        Task.detached(priority: .high) {
                            var images: [CGImage?]?
                            do {
                                print("generate")
                                images = try pipeline?.generateImages(
                                    prompt: prompt,
                                    stepCount: step,
                                    seed: seed,
                                    disableSafety: disableSafety,
                                    progressHandler: { progress in
                                        print("test")
                                        self.progress = Double(progress.step) / Double(step)
                                        // Show the intermediate image at each step.
                                        if let image = progress.currentImages.first {
                                            self.image = image
                                        }
                                        return true
                                    })
                            } catch let error {
                                print(error.localizedDescription)
                            }
                            print("finish")
                            if let image = images?.first {
                                self.image = image
                            }
                            generating = false
                        }
                    }
                }
            }
        }
        .padding()
        .onAppear {
            // Load the Core ML pipeline once at startup.
            Task.detached(priority: .high) {
                do {
                    print(os_proc_available_memory())
                    guard let path = Bundle.main.path(forResource: "CoreMLModels", ofType: nil, inDirectory: nil) else {
                        fatalError("Fatal error: failed to find the CoreML models.")
                    }
                    let resourceURL = URL(fileURLWithPath: path)
                    let config = MLModelConfiguration()
                    config.computeUnits = .cpuAndNeuralEngine
                    pipeline = try StableDiffusionPipeline(resourcesAt: resourceURL,
                                                           configuration: config,
                                                           reduceMemory: true)
                    try pipeline?.loadResources()
                    print("initialized pipeline")
                } catch let error {
                    print("error initializing pipeline")
                    print(error.localizedDescription)
                }
                booting = false
            }
        }
    }
}

struct ContentView_Previews: PreviewProvider {
    static var previews: some View {
        ContentView()
    }
}
On iPad and Mac, config.computeUnits = .cpuAndNeuralEngine is recommended. If you want to continue on to iPhone, change it to config.computeUnits = .cpuAndGPU, then go to the Signing & Capabilities tab, click + Capability, and add Increased Memory Limit. With that, it runs on a real iPhone.
This project needs a bit more than 3 GB of memory on device. On my iPhone 14 Pro, the default memory available to an app is also just over 3 GB, so the Increased Memory Limit entitlement, which raises the limit to around 4 GB, is required before it will run. Even with the raised limit, the Neural Engine still hits memory errors on iPhone, so only the GPU works there; the iPad Air 5 has no such problem and both work. The GPU is somewhat slower than the Neural Engine, but it's cool that a phone can run Stable Diffusion locally.