[New Diffusion Model] Sana By NVIDIA
Added 2024-10-25 18:55:14 +0000 UTC
Hello Everyone!
I don't have much time to make another video about this new AI model, and the code hasn't been released yet.
But I think this one has some potential.
Sana-0.6B: https://nvlabs.github.io/Sana/
They are testing it in A1111!
Here are some highlights of this model:
1 - Created by Nvidia
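Who understands GPUs better than the company that makes them?
The model uses linear attention in its DiT (Diffusion Transformer), which sets it apart from other models such as Flux and SD 3.5. Standard attention compares every token with every other token, so its cost grows quadratically with the number of tokens; linear attention grows linearly, which matters a lot at high resolutions.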
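To make the linear-attention idea concrete, here is a small pure-Python sketch. It assumes a ReLU feature map (phi(x) = max(0, x)), a common choice for linear attention; it is an illustration of the general trick, not Sana's actual code. Both functions compute the same result: the first builds every query-key score (N x N work, like standard attention), the second first summarizes all keys and values into one small d x dv matrix and then reuses it for every query, so the cost grows linearly with the number of tokens.

```python
# Minimal sketch: kernelized attention computed two ways.
# Assumption: ReLU feature map phi(x) = max(0, x); illustrative only.

def relu(x):
    """Feature map phi applied element-wise to one vector."""
    return [max(0.0, v) for v in x]


def pairwise_kernel_attention(Q, K, V, eps=1e-6):
    """Builds every query-key score explicitly: N x N scores in total."""
    out = []
    for q in Q:
        pq = relu(q)
        # One score per key for this query -> N * N scores over all queries.
        w = [sum(a * b for a, b in zip(pq, relu(k))) for k in K]
        denom = sum(w) + eps
        out.append([sum(wj * v[c] for wj, v in zip(w, V)) / denom
                    for c in range(len(V[0]))])
    return out


def linear_attention(Q, K, V, eps=1e-6):
    """Same result, but keys/values are summarized once into a d x dv matrix."""
    d, dv = len(K[0]), len(V[0])
    kv = [[0.0] * dv for _ in range(d)]  # sum_j phi(k_j)^T v_j
    k_sum = [0.0] * d                    # sum_j phi(k_j)
    for k, v in zip(K, V):
        pk = relu(k)
        for a in range(d):
            k_sum[a] += pk[a]
            for c in range(dv):
                kv[a][c] += pk[a] * v[c]
    out = []
    for q in Q:
        pq = relu(q)
        # Each query only touches the d x dv summary, never the other tokens.
        denom = sum(pq[a] * k_sum[a] for a in range(d)) + eps
        out.append([sum(pq[a] * kv[a][c] for a in range(d)) / denom
                    for c in range(dv)])
    return out


Q = [[1.0, -0.5], [0.3, 0.8], [-0.2, 0.4]]
K = [[0.5, 0.1], [-1.0, 0.7], [0.9, 0.2]]
V = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]

print(pairwise_kernel_attention(Q, K, V))
print(linear_attention(Q, K, V))  # same values, up to float rounding
```

The trick is just reordering the multiplications: (phi(Q) phi(K)^T) V equals phi(Q) (phi(K)^T V), and the second grouping never forms the N x N score matrix. Softmax attention can't be regrouped this way, which is why it stays quadratic.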
2 - Text Encoder
It "replaced T5 with modern decoder-only small LLM as the text encoder."
Basically, that means it can handle natural-language prompts better.
3 - Image Size
It can generate images up to 4096 × 4096 px.
And Sana is small: only 1.6B and 0.6B parameters, yet it can still generate 4K images.
4 - Deep Compression Autoencoder
From the project page:
"Deep Compression Autoencoder: We introduce a new Deep Compression Autoencoder (DC-AE) that aggressively increases the scaling factor to 32. Compared with AE-F8, our AE-F32 outputs 16× fewer latent tokens, which is crucial for efficient training and generating ultra-high-resolution images, such as 4K resolution."
To me, the most important point here is the number of latent tokens: far fewer than in other DiT image models that use an F8 autoencoder. Fewer tokens means less work for the transformer, so processing is more efficient.
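Where does "16× fewer" come from? A quick back-of-the-envelope sketch, assuming square images and the same patch size for both autoencoders (one DiT token per patch x patch latent cell). `latent_tokens` is a hypothetical helper for illustration, not part of any Sana release.

```python
# Rough token-count arithmetic for an F8 vs. F32 autoencoder.
# Assumptions: square images, same patch size for both autoencoders,
# one DiT token per patch x patch latent cell (illustrative only).

def latent_tokens(size_px, downsample, patch=1):
    """Tokens for a size_px x size_px image: (size / (downsample * patch))^2."""
    step = downsample * patch
    if size_px % step:
        raise ValueError("image size must be divisible by downsample * patch")
    side = size_px // step
    return side * side


for size in (1024, 4096):
    f8 = latent_tokens(size, 8)
    f32 = latent_tokens(size, 32)
    print(f"{size}px: F8={f8} tokens, F32={f32} tokens, "
          f"ratio={f8 // f32}x, pairwise-score ratio={(f8 * f8) // (f32 * f32)}x")
```

The side length shrinks by 32 / 8 = 4, so the token count shrinks by 4² = 16, matching the quote. For any attention that builds all pairwise scores, the saving is 16² = 256×, which is why the autoencoder and linear attention work well together at 4K.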
MIT hosts a demo page for testing this model: https://sana-gen.mit.edu/
You can check it out.
Very exciting! I'm looking forward to the model weights and code.
Have Fun :)
Ben
Comments
Haven't used it in a long time. How dead is it now?
Benjamin Law
2024-10-29 02:47:28 +0000 UTC
Or do they actually mean ForgeUI? Because A1111 will be dead soon.
Peter Gašparík
2024-10-29 02:08:22 +0000 UTC