I am going to check that, thank you for the feedback!
Yes, I think this kind of “exploratory evaluation” would not be possible in automatic1111.
From what I recall, it does not really give the end user much control over the generation pipeline.
Admittedly, it has been a while since I last used it, and I have no idea how good or flexible the SDXL integration is.
From what I understand, both models could be trained.
Yes, that is also my understanding.
Compared to the original SD1.5 it has so much potential for further extension, and I am confident many of these issues can be ironed out by the community.
And out of the box, base SDXL is much better than base SD1.5, I am quite positive about that 😄.
No, I never used the bot on discord.
As for the prompts, I still need to really understand how to best write them.
So far, I mostly used the same style I adopted in SD1.5 (without the Danbooru tags since they are clearly not supported).
I tried to be a bit more “expressive” but I have not really seen much of an improvement.
And concepts are still “bleeding”: asking for red eyes will often also produce extremely red lips or red clothes.
I have two ways to do that:
The final image is actually quite cool! But yes, I think we could use a LoRA to direct the granularity of the generated details, progressively scaling it down as we upscale to normalize the generation a bit. That would let us keep higher denoise ratios while avoiding the “fractal” generation.
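To sketch what I mean in diffusers (a rough interpretation, not a tested workflow: the LoRA path, the prompt, and the strength/weight schedule below are all made-up placeholders):

```python
# Rough sketch: upscale in several img2img passes, lowering both the
# denoise strength and the weight of a hypothetical "detail" LoRA each
# time, so later passes add progressively less new structure.
import torch
from PIL import Image
from diffusers import StableDiffusionXLImg2ImgPipeline

pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")
# Placeholder path: any detail-enhancing LoRA would go here.
pipe.load_lora_weights("path/to/detail-lora.safetensors")

image = Image.open("first_pass.png").convert("RGB")  # the base generation
prompt = "same prompt as the first pass"  # placeholder

# (upscale factor vs. previous pass, denoise strength, LoRA weight)
schedule = [(1.5, 0.55, 1.0), (1.5, 0.40, 0.6), (1.5, 0.25, 0.3)]
for factor, strength, lora_w in schedule:
    image = image.resize(
        (int(image.width * factor), int(image.height * factor)),
        Image.LANCZOS,
    )
    image = pipe(
        prompt=prompt,
        image=image,
        strength=strength,
        # Scales the LoRA's influence at inference time.
        cross_attention_kwargs={"scale": lora_w},
    ).images[0]

image.save("upscaled.png")
```

The idea behind the schedule is that each pass contributes less new structure than the previous one, which should keep the detail density roughly constant across scales instead of compounding into the “fractal” look.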
Thank you for the feedback! :)
Yes, I had to tune it down as well.
I actually ended up with a different workflow from the one that was suggested, as I think that one is a bit too wasteful. Instead of generating the full image and using latent2latent to introduce new noise from the final version, I stop the generation at an intermediate step and finish it with the refiner model. I did this in the past to combine different SD1.5 checkpoints, and it works here as well, since the latent space is shared between the two models.
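For anyone more comfortable with diffusers than with the ComfyUI graph, here is a minimal sketch of the same idea using the standard SDXL base/refiner checkpoints; the prompt, the step count, and the 0.8 handoff point are placeholders to tune:

```python
# Minimal sketch: run the base model for the first ~80% of the denoising
# steps, then hand the *latent* (not a decoded image) to the refiner,
# which finishes the remaining ~20%. This works because the two models
# share the same latent space.
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=base.text_encoder_2,  # share weights to save VRAM
    vae=base.vae,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

prompt = "your prompt here"  # placeholder
handoff = 0.8  # fraction of the schedule handled by the base model

latents = base(
    prompt=prompt,
    num_inference_steps=30,
    denoising_end=handoff,
    output_type="latent",  # stop early and keep the latent
).images

image = refiner(
    prompt=prompt,
    num_inference_steps=30,
    denoising_start=handoff,  # pick up where the base left off
    image=latents,
).images[0]
image.save("refined.png")
```

Compared to decoding a full image and re-noising it, this skips the wasted tail of the base model's schedule entirely.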
I added an image with the alternative workflow in case someone wants to try it (hopefully the metadata is preserved).