Flux, is it better than Midjourney?

Once again, open source catches up to closed source

Aug 17, 2024

This month a new player entered the image generation arena: Black Forest Labs released an open model called Flux.1. From early reports, it is better than the other players such as Stable Diffusion, Midjourney, and Dall-E. In fact, xAI/Twitter is already using Flux as its image generation model behind the scenes to show off how uncensored it can be. Many of the new AI videos also use Flux for the starter image before animating it to some wild results (maybe this is next month’s post?). This is awesome: with a new player in the arena, the AI image generation space is getting more competitive than ever.

Flux can be downloaded and used with ComfyUI, and comes in three flavors:

FLUX.1-pro, which is their best model and is gated behind paid access
FLUX.1-dev, which is an open-weight model (non-commercial use only) and supposedly more efficient (yet somehow the same size)
FLUX.1-schnell, which is the faster model for local use and is open under Apache 2.0

Strangely, despite all these performance enhancements, the models are all the same size (roughly 24GB). For the purposes of playing with this, I’ll try out Dev, which is supposedly the better of the two downloadable models.

To download Flux Dev, check out the HuggingFace repo. This video pointed me to the right files: you need the model, the encoders, and the VAE. This will take up a lot of space.

This works with ComfyUI, which you can download here

The model files go into the models/unet folder, the encoders go into the models/clip folder, and the VAE goes into the models/vae folder
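Before launching ComfyUI, it can save a restart or two to confirm everything landed in the right folder. Here is a minimal sketch of such a check. The three folder names come straight from the layout above, but the file names are only my assumptions about what a typical Flux Dev download looks like, so swap in whatever you actually downloaded. The function name `missing_flux_files` is made up for this example.

```python
from pathlib import Path

# Folder names follow the layout described in the post.
# The file names are ASSUMPTIONS for illustration; replace them
# with the names of the files you actually downloaded.
EXPECTED_FILES = {
    "models/unet": ["flux1-dev.safetensors"],
    "models/clip": ["clip_l.safetensors", "t5xxl_fp16.safetensors"],
    "models/vae": ["ae.safetensors"],
}


def missing_flux_files(comfy_root, expected=EXPECTED_FILES):
    """Return the relative paths of expected files that are not on disk."""
    root = Path(comfy_root)
    missing = []
    for folder, names in expected.items():
        for name in names:
            if not (root / folder / name).is_file():
                missing.append(f"{folder}/{name}")
    return missing
```

Calling `missing_flux_files("ComfyUI")` from the directory that contains your ComfyUI checkout returns an empty list when everything is in place, and otherwise lists the files that still need to be moved.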

After you have everything in place and have opened ComfyUI in your browser, download this JSON file and drag it into the window. Here you can select the right model, encoder, and VAE:

You can also set the BasicScheduler to sgm_uniform:

Now there’s one major red flag here: the 24GB model will take up an entire 4090’s 24GB of VRAM, which means you have to literally abandon all other usage of the graphics card (for example, by moving your display over to the Intel chip’s video output). The 5090 cannot come out soon enough… but there are some other options here: you can select the schnell model, or you can select fp8 to make it go faster. It takes me about 24 seconds per picture on a 4090.
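The 24GB figure, and why fp8 helps, falls out of some quick arithmetic. This is a rough sketch, assuming the roughly 12 billion parameter count that Black Forest Labs lists for Flux.1 (that number is from their model card, not from anything I measured). It only counts the weights of the main model, so the encoders, the VAE, and working memory during generation all come on top of it.

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def weight_size_gb(num_params, precision):
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


# ASSUMPTION: about 12 billion parameters, per the Flux.1 model card.
FLUX_PARAMS = 12e9

for precision in ("fp16", "fp8"):
    size = weight_size_gb(FLUX_PARAMS, precision)
    print(f"{precision}: ~{size:.0f} GB of weights")
```

At 16-bit precision the weights alone come to about 24GB, which matches the download size and fills a 4090. At fp8 that drops to about 12GB, which leaves room on the card for everything else.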

Now let’s get testing on some photos. A few things to really push here are text, hands, and uncensored celebrities.

“Retro 80s movie poster with the words on top "Codeium: Coming to an IDE near you" with a bunch of cute robots all giving thumbs up”:

Notice it got the text exactly right, which is awesome. I might as well make a few other covers for some other projects, such as the upcoming Floating Points podcast from Codeium: “A modern futuristic album cover with the words titled "Floating Points: A Podcast by Codeium", with a teal green theme”:

Okay, so if copyright is not an issue, here is a prompt with two well-known characters: “Mickey Mouse and Snoopy the dog getting into a boxing match, cartoon style”. Notice it isn’t quite there with the boxing gloves, maybe because Mickey Mouse already has gloves on:

It doesn’t seem to do as well with celebrities. Here’s a prompt: “A press conference where Mark Zuckerberg and Elon Musk announce they are running for president”. Notice the celebrity faces aren’t that detailed yet (though there is a resemblance):

It does seem like, with some extra settings, you can get celebrities to look closer to their real-life appearance, and you can use extensions like this one to set them

So for me, Flux would be the perfect model to create posters for different projects:

I can’t wait to be able to plug this model into some of the other plug-ins and extensions (such as ControlNet) and see what the open source community does for Flux. Right now, having Flux vs. Dall-E vs. Stable Diffusion vs. Midjourney is really showing how fast things are moving, but the worry is that it could also become a race to the bottom if everything gets so good that people do not need to pay for any of them.

For now, here are a few more fun ones to show the detail and the actual “lack of AI cues”:

AI Relevance - Jeff's Substack
