Innovate Futures @ Benji

[New Diffusion Model] Sana By NVIDIA

Added 2024-10-25 18:55:14 +0000 UTC

Hello Everyone!

I don't have much time to make another video about this new AI model, and the code has not been released yet.

But I see this one has some potential.

Sana-0.6 https://nvlabs.github.io/Sana/

They are testing it in A1111!

Here are some highlights of this model:

1 - Created by Nvidia

Who understands GPUs better than the maker itself?

The model uses linear attention with DiT, which sets it apart from other models such as Flux and SD 3.5.
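To see roughly why linear attention matters at high resolution, here is a minimal sketch that only counts multiplications. Standard softmax attention builds an N x N score matrix, so its cost grows with the square of the token count, while linear attention reorders the product so the cost grows linearly. The function names and the head dimension of 32 are my own assumptions for illustration, not numbers from the Sana page.

```python
def softmax_attention_mults(n_tokens: int, head_dim: int) -> int:
    # Q @ K^T is (n x d)(d x n), then scores @ V is (n x n)(n x d).
    return 2 * n_tokens * n_tokens * head_dim


def linear_attention_mults(n_tokens: int, head_dim: int) -> int:
    # K^T @ V is (d x n)(n x d), then Q @ (K^T V) is (n x d)(d x d).
    return 2 * n_tokens * head_dim * head_dim


HEAD_DIM = 32  # assumed value, for illustration only

for n in (1024, 4096, 16384):
    quad = softmax_attention_mults(n, HEAD_DIM)
    lin = linear_attention_mults(n, HEAD_DIM)
    print(f"{n} tokens: softmax ~{quad:,} mults, linear ~{lin:,} mults, ratio {quad // lin}x")
```

In this simplified count the ratio is just the token count divided by the head dimension, so the gap keeps widening as images get bigger. Real speedups depend on the kernels and the rest of the network.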

2 - Text Encoder

It "replaced T5 with modern decoder-only small LLM as the text encoder."

Basically, that lets it handle natural-language text prompts better.

3 - Image Size

It can generate images up to 4096 px.

And Sana comes in only 0.6B and 1.6B sizes, yet it can generate up to 4K images.

4 - Deep Compression Autoencoder

From the page:

"Deep Compression Autoencoder: We introduce a new Deep Compression Autoencoder (DC-AE) that aggressively increases the scaling factor to 32. Compared with AE-F8, our AE-F32 outputs 16× fewer latent tokens, which is crucial for efficient training and generating ultra-high-resolution images, such as 4K resolution."

From here, the most important point I see is that it uses fewer latent tokens compared with other DiT image models. That means more efficient processing.
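The "16× fewer" figure can be checked with simple arithmetic: going from a downsampling factor of 8 to 32 shrinks each side of the latent grid by 4, so the grid has 4 x 4 = 16 times fewer positions. A small sketch, assuming square images and one token per latent position (the helper name is mine, and any extra patching step inside the DiT is ignored):

```python
def latent_tokens(image_px: int, downsample: int) -> int:
    """Latent positions for a square image, assuming one token per position."""
    side = image_px // downsample
    return side * side


for px in (1024, 4096):
    f8 = latent_tokens(px, 8)    # AE-F8 style autoencoder
    f32 = latent_tokens(px, 32)  # AE-F32 style autoencoder
    print(f"{px}px: F8={f8:,} tokens, F32={f32:,} tokens, ratio={f8 // f32}x")
```

For a 4096 px image that is a 512 x 512 grid with F8 versus a 128 x 128 grid with F32.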

MIT hosts a demo page for testing this model: https://sana-gen.mit.edu/

You can check it out.

Very exciting, and I'm looking forward to the model weights and code.

Have Fun :)

Ben