
ControlNet devs are already talking about bringing similar features to Automatic1111.

https://github.com/Mikubill/sd-webui-controlnet/discussions/...

So far it seems that the OSS diffusion models + tooling that we can run locally keep being state of the art. It makes me so happy.

(Unlike say LLMs where GPT-4 is clearly dominating for now.)



Thanks. I was going to ask how this is different from the existing inpaint/outpaint pipelines, and this explains perfectly.

Since we have capable local hardware, I'll propose this as an alternative once we get an estimate of our Firefly costs.


> (Unlike say LLMs where GPT-4 is clearly dominating for now.)

A lot of this comes from people comparing GPT-4 to e.g. LLaMA-7B, because that's the thing that fits in memory on their laptop. You can run LLaMA-65B instead, and it's dramatically better, but it uses about 128GB of RAM and the hardware needed to run it fast is expensive.

And GPT-4 has even more parameters than that, but that's not a matter of the tooling, it's that someone needs to release a public model with more parameters.
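
For a rough sense of where the "about 128GB" figure comes from, here is a back-of-the-envelope sketch. It assumes round parameter counts (7e9 and 65e9), counts only the weights, and ignores activations, context cache and runtime overhead, so real usage will be somewhat higher. The function name and the bit widths shown are illustrative choices, not anything from a particular tool.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in decimal GB.

    Assumption: every parameter is stored at the same precision.
    """
    return n_params * bits_per_param / 8 / 1e9


# Assumed round parameter counts for illustration.
models = {"LLaMA-7B": 7e9, "LLaMA-65B": 65e9}

for name, n_params in models.items():
    for bits in (16, 8, 4):
        gb = weight_memory_gb(n_params, bits)
        print(f"{name} at {bits}-bit: ~{gb:.1f} GB")
```

At 16 bits per weight the 65B model lands at roughly 130 GB, which is in line with the number above, while the 7B model at the same precision is around 14 GB and fits on far more modest hardware.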


That's part of the point though. I get better results from Stable Diffusion on my PC than from DALL-E 2. (I still have some credits there, but little reason to use them.)

I can't do that with LLaMA-65B. (Although to be fair 128 GB RAM is not that much.) But I suspect it's still far less capable than GPT-4, is it not?


It depends what you're trying to get it to do. There are some prompts where the expected output is a piece of code or a paragraph containing some information, and once you reach the threshold where the code works or the information is correct, there isn't a lot of "better" left to get.

Then there are ones where more parameters matter.

Conversely, LLaMA doesn't say "I'm sorry Dave, I'm afraid I can't do that."


What is Automatic1111? Google didn't really help that much; the results seemed like a whole lot of inside baseball.


It's a popular GUI for Stable Diffusion models with many extensions. As the sibling comment points out, everyone calls it that because that's the handle of the original maintainer. (As in: Which web ui? Auto11's.)




It would be cool to see that become a GIMP plugin next. I think that would be a more direct alternative to the generative fill workflow in Photoshop.



