Nvidia shrinks AI image generation method to the size of a WhatsApp message

Perfusion, Nvidia's solution for high storage demands of AI image generation

Nvidia researchers have developed a new AI image generation technique that enables highly customized text-to-image models with minimal storage requirements.

According to a paper published on arXiv, the proposed method, called "Perfusion," can add new visual concepts to existing models, using only 100KB of parameters per concept.

Source: Nvidia Research

As the paper's authors describe, Perfusion works by "making small updates to the internal representation of the text-to-image model."

More specifically, it makes carefully calculated changes to the part of the model that connects textual descriptions to the generated visual features. Applying small parametric edits to the cross-attention layer allows Perfusion to modify the way textual input is converted to images.

In other words, Perfusion does not retrain the text-to-image model from scratch. Instead, it slightly adjusts the mathematical transformations that turn text into images. This lets it teach the model new visual concepts without requiring much computing power or full model retraining.
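To make the idea of a "small parametric edit" concrete, here is a minimal sketch of a generic rank-one update to a single projection matrix. It is not the authors' code and leaves out the details of their method; the matrix sizes, the random vectors, and the function name `rank_one_edit` are all assumptions chosen for illustration. It only shows that one matrix can be nudged so that a chosen text embedding maps to a chosen visual feature, while inputs unrelated to that embedding are left alone.

```python
import numpy as np

def rank_one_edit(W, key, target):
    """Return W' with W' @ key == target, changing W only along `key`."""
    residual = target - W @ key              # what the layer currently gets "wrong" for this key
    return W + np.outer(residual, key) / (key @ key)

rng = np.random.default_rng(0)
d_text, d_feat = 768, 320                    # illustrative sizes, not taken from the paper
W = rng.standard_normal((d_feat, d_text))    # stand-in for one cross-attention projection
concept_key = rng.standard_normal(d_text)    # stand-in for the new concept's text embedding
concept_value = rng.standard_normal(d_feat)  # stand-in for the desired visual feature

W_edited = rank_one_edit(W, concept_key, concept_value)

# Build an input that is orthogonal to the concept key.
other = rng.standard_normal(d_text)
other -= (other @ concept_key) / (concept_key @ concept_key) * concept_key

print(np.allclose(W_edited @ concept_key, concept_value))  # the concept now maps to its target
print(np.allclose(W_edited @ other, W @ other))            # unrelated input is unchanged
print(d_text + d_feat, "numbers stored vs", W.size, "in the full matrix")
```

The last line hints at where the storage savings come from in an edit of this kind: only the two vectors need to be kept, not a new copy of the matrix.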

The Perfusion method requires only 100KB.

Perfusion achieves these results with two to five orders of magnitude fewer parameters than competing techniques.

While other methods can require hundreds of megabytes to gigabytes of storage per concept, Perfusion requires only 100KB, comparable to a small image, text, or WhatsApp message.
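The size comparison can be checked with a few lines of arithmetic. The sketch below assumes decimal units and three illustrative baseline sizes (100 MB, 1 GB, 5 GB) picked from the "hundreds of megabytes to gigabytes" range; none of them is a measured figure for a specific competing method.

```python
import math

KB, MB, GB = 10**3, 10**6, 10**9    # decimal units, an assumption for this sketch

perfusion_bytes = 100 * KB          # per-concept size quoted in the article
baseline_bytes = {                  # illustrative values only, not measurements
    "100 MB": 100 * MB,
    "1 GB": 1 * GB,
    "5 GB": 5 * GB,
}

def orders_of_magnitude(larger, smaller):
    """Base-10 logarithm of the size ratio."""
    return math.log10(larger / smaller)

for label, size in baseline_bytes.items():
    ratio = size // perfusion_bytes
    gap = orders_of_magnitude(size, perfusion_bytes)
    print(f"{label}: {ratio}x larger, about {gap:.1f} orders of magnitude")

# How many 100KB concepts fit in the space of a single 1 GB checkpoint.
concepts_per_gb = GB // perfusion_bytes
print(concepts_per_gb, "concepts per GB")
```

Under these assumed sizes the storage gap is roughly three to five orders of magnitude. A baseline in the tens of megabytes would correspond to the lower end of the range quoted above.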

This drastic reduction could make it more feasible to deploy highly customized AI art models.

According to co-author Gal Chechik,

"Perfusion not only enables more accurate personalization at a fraction of the model size, but also enables the use of more complex prompts and the combination of individually learned concepts at inference time."

The method can use the individually learned notions of "teddy bear" and "teapot" to generate creative images such as "a teddy bear sailing in a teapot".
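One way to picture combining separately learned concepts is as a single low-rank correction that satisfies several key-to-feature mappings at once. The sketch below is generic linear algebra, not the mechanism described in the paper; the function name `joint_edit`, the dimensions, and the random stand-ins for "teddy bear" and "teapot" are hypothetical.

```python
import numpy as np

def joint_edit(W, keys, targets):
    """Return W' with W' @ keys == targets.

    keys:    (d_text, n) array, one column per concept embedding
    targets: (d_feat, n) array, one column per desired visual feature
    Assumes the key columns are linearly independent.
    """
    residual = targets - W @ keys
    return W + residual @ np.linalg.pinv(keys)   # rank-n correction

rng = np.random.default_rng(1)
d_text, d_feat = 64, 32                          # small illustrative sizes
W = rng.standard_normal((d_feat, d_text))
keys = rng.standard_normal((d_text, 2))          # stand-ins for "teddy bear" and "teapot"
targets = rng.standard_normal((d_feat, 2))

W_both = joint_edit(W, keys, targets)
print(np.allclose(W_both @ keys, targets))       # both concepts map to their targets
print(np.linalg.matrix_rank(W_both - W))         # rank of the correction
```

Because the correction is built from the stored concept vectors when the model is used, the base weights can stay shared and each concept can still be learned on its own.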

Source: Nvidia Research

Possibility of efficient personalization

Perfusion's ability to personalize AI models using only 100KB per concept opens up a range of potential applications.

This approach paves the way for individuals to easily customize text-to-image models with new objects, scenes, or styles, without the need for costly retraining. Because each concept adds only about 100KB of parameters, models customized with the technique could run on consumer devices, enabling on-device image creation.

One of the most compelling aspects of this technology is the potential it offers for sharing and collaboration around AI models. Users can share their personalized concepts as small add-on files instead of passing around bulky model checkpoints.
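As a rough illustration of what such a small file might look like, the sketch below packs a set of per-layer concept vectors into an in-memory `.npz` archive and checks its size. The layout (16 layers, one key and one value vector per layer, float16 storage) and the array names are hypothetical and do not describe any released Perfusion file format.

```python
import io
import numpy as np

rng = np.random.default_rng(2)
n_layers, d_text, d_feat = 16, 768, 320      # hypothetical sizes for illustration

# One key vector and one value vector per layer, stored in half precision.
concept = {}
for i in range(n_layers):
    concept[f"layer{i:02d}_key"] = rng.standard_normal(d_text).astype(np.float16)
    concept[f"layer{i:02d}_value"] = rng.standard_normal(d_feat).astype(np.float16)

# Write the archive to memory instead of disk so the sketch has no side effects.
buffer = io.BytesIO()
np.savez(buffer, **concept)
size_bytes = len(buffer.getvalue())
print(size_bytes, "bytes")

# Round-trip check: the recipient recovers exactly what was saved.
buffer.seek(0)
restored = np.load(buffer)
print(np.array_equal(restored["layer00_key"], concept["layer00_key"]))
```

With these assumed dimensions the archive stays well under 100KB, which is the scale that makes sending a concept feel closer to sending a message attachment than a model.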

In terms of distribution, models tailored to specific organizations can be more easily disseminated or deployed at the edge. As the practice of text-to-image generation continues to become more mainstream, the ability to achieve such dramatic size reductions without sacrificing functionality will be critical.

It's worth noting, however, that Perfusion primarily provides model personalization rather than full generative capabilities itself.

Limitations and release

While promising, the technique does have some limitations. The authors point out that key choices during training can sometimes overgeneralize a concept. More research is also needed to seamlessly combine multiple personalized concepts into a single image.

The authors note that Perfusion's code will be available on their project page, indicating an intention to release the method publicly in the future, possibly pending peer review and formal publication. However, the work is currently published only on arXiv, a platform where researchers can upload papers before formal peer review and publication in journals or conferences, so the exact details of public availability remain unclear.

While Perfusion's code has yet to be released, the authors' stated plans mean that such highly efficient, personalized AI systems could, in due course, find their way into the hands of developers, industry, and creators.

With the rise of AI art platforms such as Midjourney, DALL-E 2, and Stable Diffusion, techniques that allow for greater user control could be critical for real-world deployment. With efficiency improvements like Perfusion, Nvidia seems determined to maintain its edge in a rapidly evolving field.
