Adds "Upcast cross attention layer to float32" option in Stable Diffusion settings. This allows for generating images using SD 2.1 models without --no-half or xFormers.
To make upcasting possible in the cross attention layer optimizations, several sections of code in sd_hijack_optimizations.py need to be indented so that a context manager can be used to disable autocast around the attention computation. Also, even though Stable Diffusion (and Diffusers) only upcast q and k, I found that most of the cross attention layer optimizations could not function correctly unless v was upcast as well.
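As a rough illustration of the idea (not the exact code in sd_hijack_optimizations.py; the function name and arguments here are made up), the attention math is wrapped in a context manager that disables autocast, and q, k, and v are cast to float32 before the matmuls:

```python
import torch

def upcast_attention(q, k, v, scale):
    # Sketch only: disable autocast so the float32 casts below are not
    # silently re-cast to float16 inside the block.
    with torch.autocast(device_type="cuda", enabled=False):
        # SD/Diffusers upcast only q and k; most webui optimizations need v upcast too.
        q, k, v = q.float(), k.float(), v.float()
        sim = torch.einsum('b i d, b j d -> b i j', q, k) * scale
        attn = sim.softmax(dim=-1)
        return torch.einsum('b i j, b j d -> b i d', attn, v)
```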
This also handles type casting so that ROCm and MPS torch devices work correctly without --no-half. One cast is required for deepbooru in deepbooru_model.py, and some explicit casting is required for img2img and inpainting. The depth model can't be converted to float16 or it won't work correctly on some systems (it's known to have issues on MPS), so in sd_models.py model.depth_model is removed for the model.half() call so that it is not converted.
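A minimal sketch of that depth model handling, assuming the model exposes a depth_model attribute (attribute handling here is illustrative, not the exact sd_models.py code):

```python
# Keep the depth model out of the float16 conversion.
depth_model = getattr(model, 'depth_model', None)
if depth_model is not None:
    model.depth_model = None          # remove it so model.half() doesn't convert it
model.half()                          # convert the rest of the model to float16
if depth_model is not None:
    model.depth_model = depth_model   # put it back, still in float32
```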
Recently, the option to do latent upscale was added to the txt2img highres fix. This feature works by scaling the latent sample of the image and then running a second pass of img2img.
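In practice, "scaling the latent sample" can be as simple as interpolating the latent tensor directly (a sketch, not the webui upscaler code; the helper name is made up):

```python
import torch.nn.functional as F

def latent_upscale(latent, target_width, target_height, mode="bilinear"):
    # SD latents have shape (batch, 4, height // 8, width // 8), so the target
    # size is given in pixels and divided by 8 here.
    return F.interpolate(latent, size=(target_height // 8, target_width // 8), mode=mode)
```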
But in this form of highres fix, the image and parameters cannot be changed between the first pass and the second pass. We might want to do a fixup in img2img before the second pass, or run the second pass at a different resolution.
This change adds the option for img2img to perform its upscale in latent space rather than image space, giving very similar results to highres fix with latent upscale. The result is not exactly the same because there is an additional latent -> decoder -> image -> encoder -> latent round trip that doesn't happen in highres fix, but this conversion has relatively small losses.
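To illustrate the difference (a schematic sketch; vae.decode/vae.encode and the latent_upscale helper above are stand-ins, not the actual webui calls):

```python
# highres fix with latent upscale: the first-pass latent is reused directly.
hires_latent = latent_upscale(first_pass_latent, 1024, 1024)

# img2img with latent-space upscale: the first-pass result was decoded to an
# image, so it has to be re-encoded before upscaling; this is the extra
# latent -> decoder -> image -> encoder -> latent round trip.
image = vae.decode(first_pass_latent)
reencoded_latent = vae.encode(image)
img2img_latent = latent_upscale(reencoded_latent, 1024, 1024)
```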