Tips for running Core ML Stable Diffusion on a Mac

I originally wanted to write an article on how to run Core ML Stable Diffusion on an Apple Silicon Mac, but the official documentation is already very clear, so rather than repeat it, I will share my experience and some tips.

The most important point first: although Apple states on GitHub that an M1 Ultra with a 48-core GPU (the same specification as my machine) can produce a result in 13 seconds, that figure covers only the "drawing" step. Before drawing, the model must be loaded, and steps such as sampling also take time; the loading alone can take longer than the drawing itself. So in this article I ran my own tests and collected some tips to make execution faster, which should also save you some detours.

Test environment:

- macOS 13.1 Beta 4 (22C5059b)
- Mac Studio (M1 Ultra, 20-core CPU, 48-core GPU, 64 GB RAM)
- Stable Diffusion v1.4
- Prompt fixed as "a high quality photo of an astronaut riding a horse in space"
- Seed fixed at 13

Run on macOS 13.1 and above

Although macOS 13.0.1 can also run it, Apple optimized Stable Diffusion in macOS 13.1. More importantly, on macOS 13.0.1 the GPU cannot be used: images generated on it come out entirely gray, so you are limited to the CPU and the Apple Neural Engine (ANE), and performance is much worse.

Use converted checkpoints

Don't waste time converting the models yourself following the instructions on GitHub, because conversion takes a long time and a lot of memory (on my M1 Ultra, converting Stable Diffusion 2 Base took more than half an hour and used 40 GB of RAM). Instead, download checkpoints that have already been converted. Pre-converted checkpoints are available for:

- Stable Diffusion v1.4
- Stable Diffusion v1.5
- Stable Diffusion v2 base

Downloading them requires git-lfs, which can be installed with Homebrew.
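As a concrete sketch of the download step, assuming the checkpoint lives in Apple's Hugging Face repository for the converted v1.4 model (the repository name below is an assumption; use the matching repository for v1.5 or v2 base):

```shell
# Install git-lfs via Homebrew (once per machine),
# then register it with git
brew install git-lfs
git lfs install

# Clone a pre-converted checkpoint from Hugging Face.
# Assumed repo name for Stable Diffusion v1.4; the download
# is several GB, so expect it to take a while.
git clone https://huggingface.co/apple/coreml-stable-diffusion-v1-4
```

The clone pulls the large .mlpackage files through git-lfs, which is why the plain `git` client alone is not enough.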
After the download is complete, you will see two versions: original and split_einsum. original is for the CPU and GPU, while split_einsum can also be used with the ANE. Each folder is further divided into packages and compiled: packages is for Python, and compiled is for Swift.

Some models ship three unet files: unet.mlpackage, unet_chunk1.mlpackage, and unet_chunk2.mlpackage. The chunked files exist to reduce resource consumption on constrained devices; on a Mac we don't need them, so you can simply delete unet_chunk1 and unet_chunk2 to save disk space.

Even if you use a converted checkpoint, a number of additional files will still be downloaded the first time you run it; this is normal.

The Swift version is usually faster

Even though the raw drawing speed in Python can look fine on a Mac, the problem remains the same: that figure does not include model load time. In practice, loading in Python is very inefficient, because every time the model is loaded it has to be converted again at runtime. The Swift version runs faster because it involves fewer of these conversion steps.
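As noted above, the chunked unet files are unnecessary on a Mac and can be deleted to reclaim disk space. A minimal sketch of automating that cleanup (the helper name and the checkpoint-directory argument are hypothetical; it only relies on the unet_chunk1/unet_chunk2 file names described above):

```python
import shutil
from pathlib import Path

def trim_chunked_unet(checkpoint_dir: str) -> list[str]:
    """Delete unet_chunk1/unet_chunk2 .mlpackage bundles, which are
    only needed on memory-constrained devices, not on a Mac.
    Returns the paths that were removed."""
    removed = []
    root = Path(checkpoint_dir)
    for name in ("unet_chunk1.mlpackage", "unet_chunk2.mlpackage"):
        # .mlpackage bundles are directories, so match them anywhere
        # under the checkpoint and remove each whole bundle.
        # Materialize the matches first so deletion does not disturb
        # the directory walk.
        for pkg in list(root.rglob(name)):
            shutil.rmtree(pkg)
            removed.append(str(pkg))
    return removed
```

Keep unet.mlpackage itself; only the _chunk1/_chunk2 variants are redundant on macOS.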