Picture Lee Unkrich, one of Pixar’s most distinguished animators, as a seventh grader. He is staring at the image of a locomotive on the screen of his school’s first computer. Wow, he thinks. Some of the magic wears off, however, when Lee learns the image didn’t appear simply by asking for “a picture of a train.” Instead, it had to be painstakingly coded and rendered by hard-working humans.
Now imagine Lee 43 years later, stumbling across DALL-E, an artificial intelligence that generates original artwork from human-provided prompts that can literally be as simple as “a picture of a train.” As he types words to create image after image, the wow is back. Except this time, it’s not going away. “It’s like a miracle,” he said. “When the results came out I gasped and tears came to my eyes. It’s magic.”
Our machines have crossed a threshold. All our lives we have been reassured that computers were incapable of being truly creative. Yet all of a sudden, millions of people are using a new generation of AI to generate stunning, never-before-seen images. Most of these users aren’t, like Lee Unkrich, professional artists, and that’s the point: they don’t have to be. Not everyone can write, direct and edit an Oscar winner like Toy Story 3, but anyone can launch an AI image generator and type in an idea. What appears on screen is astoundingly realistic and rich in detail. The universal response: Wow. On just four services – Midjourney, Stable Diffusion, Artbreeder and DALL-E – humans working with AI are now co-creating more than 20 million images every day. With a paintbrush in hand, artificial intelligence has become an engine of wow.
Because these surprise-generating AIs have learned their craft from billions of human-made images, their output hovers around what we expect images to look like. But because they are alien AIs, fundamentally mysterious even to their creators, they restructure the new images in ways no human is likely to think of, filling in details most of us wouldn’t have the imagination to conceive, let alone the skill to execute. They can also be tasked with generating more variations of something we like, in whatever style we want, in seconds. This, ultimately, is their most powerful advantage: they can create new things that are comprehensible and recognizable but, at the same time, completely unexpected.
In fact, these new AI-generated images are so unexpected that, in the silent awe that immediately follows the wow, another thought occurs to just about everyone who has encountered them: human-made art must now be finished. Who can match the speed, cheapness, scale and, yes, wild creativity of these machines? Is art another human pursuit we must yield to robots? And the next obvious question: if computers can be creative, what else can they do that we’ve been told they can’t?
I’ve spent the past six months using AIs to create thousands of stunning images, often losing a night’s sleep in the endless quest to find just one more beauty hidden in the code. And after interviewing creators, power users, and other early adopters of these generators, I can make a very clear prediction: generative AI will change the way we design just about everything. Oh, and not a single human artist will lose their job because of this new technology.
It’s no exaggeration to call images generated with AI co-creations. The secret to this awesome new power is that its best applications are the result not of typing a single prompt but of very long conversations between humans and machines. Each image emerges from many, many iterations, back-and-forths, detours, and hours, sometimes days, of teamwork, all made possible by years of advances in machine learning.
AI image generators were born from the marriage of two distinct technologies. One was a deep learning neural network that could generate consistently realistic images, and the other was a natural language model that could interface with the image engine. The two were combined into a language-driven image generator. Researchers scraped the Internet for all images with adjacent text, such as captions, and used billions of these examples to connect visual forms to words, and words to forms. With this new combination, human users could enter a string of words – the prompt – describing the image they were looking for, and the AI would generate an image based on those words.
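The core of that pairing step can be illustrated with a toy sketch: captions and images are mapped into the same vector space, and a prompt is matched to imagery by finding the nearest embedding. Everything here is invented for illustration – real systems like DALL-E learn high-dimensional embeddings from billions of caption-image pairs rather than using hand-written vectors.

```python
import numpy as np

# Hypothetical embeddings: each caption and each image is a point in a
# shared 3-dimensional space (real models use hundreds of dimensions,
# learned from billions of image-text pairs scraped from the web).
text_embeddings = {
    "a picture of a train": np.array([0.9, 0.1, 0.0]),
    "a portrait of a cat":  np.array([0.0, 0.8, 0.2]),
}
image_embeddings = {
    "train.png": np.array([0.85, 0.15, 0.05]),
    "cat.png":   np.array([0.05, 0.75, 0.25]),
}

def cosine(a, b):
    """Cosine similarity: close to 1.0 when two vectors point the same way."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_match(prompt):
    """Return the image whose embedding lies closest to the prompt's."""
    t = text_embeddings[prompt]
    return max(image_embeddings, key=lambda name: cosine(t, image_embeddings[name]))

print(best_match("a picture of a train"))  # → train.png
```

In a full generator, the matching step runs in reverse: instead of retrieving an existing image, the model synthesizes a new one whose embedding lands near the prompt’s.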