The Joy and Fear of Limitless AI Image Generators

Image generators like Stable Diffusion can create what look like real photographs or hand-drawn illustrations depicting just about anything a person can imagine. They are powered by algorithms trained on large collections of images scraped from the web and from image databases, learning to associate visual features with the text labels attached to those images. The trained model then renders a new image to match a text prompt through a process of repeatedly adding random noise to an image and removing it.
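In rough terms, training works by corrupting an image with noise and teaching a network to undo the damage. The sketch below shows the DDPM-style forward "noising" step in PyTorch; all names are illustrative, not Stable Diffusion's actual code:

```python
import torch

def add_noise(x0: torch.Tensor, t: int, alpha_bars: torch.Tensor):
    """Blend a clean image x0 with Gaussian noise according to timestep t.

    alpha_bars[t] is the cumulative product of (1 - beta) noise-schedule
    terms; as t grows, the image is drowned in progressively more noise.
    """
    noise = torch.randn_like(x0)
    a = alpha_bars[t]
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise
    return x_t, noise  # the network is trained to predict `noise` from x_t

# Linear noise schedule over 1,000 steps, as in the original DDPM paper.
betas = torch.linspace(1e-4, 0.02, 1000)
alpha_bars = torch.cumprod(1 - betas, dim=0)

x0 = torch.rand(3, 64, 64)  # stand-in for a training image
x_t, target = add_noise(x0, t=500, alpha_bars=alpha_bars)
# A neural network learns to recover `target` (the noise) from x_t,
# conditioned on the text prompt. Generation runs this in reverse:
# start from pure noise and denoise step by step toward an image.
```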

Because tools like Stable Diffusion use images scraped from the web, their training data often includes pornographic images, making the software capable of generating new sexually explicit pictures. Another concern is that such tools could be used to create images that appear to show a real person doing something compromising, and so could be used to spread disinformation.

The quality of AI-generated imagery has skyrocketed over the past year and a half, beginning with the January 2021 announcement of a system called DALL-E by AI research company OpenAI. It popularized the approach of generating images from text prompts and was followed in April 2022 by a more powerful successor, DALL-E 2, which is now available as a commercial service.

From the start, OpenAI has limited access to its image generators, offering access only through an interface that filters what can be requested. The same is true of a competing service called Midjourney, launched in July of this year, which has helped popularize AI-made art through its wide accessibility.

Stable Diffusion is not the first open source AI art generator. Shortly after the original DALL-E was released, a developer built a clone called DALL-E Mini that was available to anyone and quickly became a meme-making phenomenon. DALL-E Mini, since renamed Craiyon, still includes guardrails similar to those in the official DALL-E versions. Clément Delangue, CEO of Hugging Face, a company that hosts many open source AI projects, including Stable Diffusion and Craiyon, says it would be problematic if the technology were controlled by only a few large companies.

“If you look at the long-term development of the technology, making it more open, more collaborative, and more inclusive is actually better from a safety perspective,” he says. Closed technology is more difficult for outside experts and the public to understand, he says, and it is better if outsiders can assess models for problems such as race, gender, or age bias; nor can others build on closed technology. On balance, he says, the benefits of open sourcing the technology outweigh the risks.

Delangue points out that social media companies could use Stable Diffusion to build their own tools for spotting AI-generated images used to spread disinformation. He says developers have also contributed a system for adding invisible watermarks to images made with Stable Diffusion so that they are easier to trace, and have built a tool for finding particular images in the model's training data so that problematic ones can be removed.
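For a sense of how such watermarking can work, here is a minimal sketch using the open source invisible-watermark package (the approach used in Stable Diffusion's reference scripts), which hides a byte payload in an image's frequency domain; the file names and payload string are illustrative:

```python
import cv2
from imwatermark import WatermarkEncoder, WatermarkDecoder

payload = "SDV2"  # illustrative payload; real deployments choose their own

# Embed the watermark via a DWT+DCT transform, which survives casual
# resaving better than tags written directly into pixel values.
encoder = WatermarkEncoder()
encoder.set_watermark('bytes', payload.encode('utf-8'))
img = cv2.imread('generated.png')  # OpenCV loads images as BGR arrays
img_wm = encoder.encode(img, 'dwtDct')
cv2.imwrite('generated_wm.png', img_wm)

# Later, anyone with the decoder can check whether an image carries the tag.
decoder = WatermarkDecoder('bytes', len(payload) * 8)  # length in bits
recovered = decoder.decode(cv2.imread('generated_wm.png'), 'dwtDct')
print(recovered.decode('utf-8'))  # -> "SDV2" if the watermark survived
```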

After becoming interested in Unstable Diffusion, Simpson-Edin became a moderator on the Unstable Diffusion Discord. The server forbids people from posting certain kinds of content, including images that could be construed as underage pornography. “We can’t moderate what people do on their own machines, but we’re extremely strict with what’s posted,” she says. In the near term, containing the disruptive effects of AI art generation may depend more on humans than on machines.
