OpenAI has extended the capabilities of its text-to-image software from two dimensions to three with the release of Point•E, an open source project that produces 3D images from text prompts.
The artificial intelligence research company has drawn attention for its DALL•E software, which, like the competing projects Stable Diffusion and Midjourney, can generate realistic or fantastical images from descriptive text.
While Point•E shares the interpunct (•) used in OpenAI’s DALL•E branding, it relies on a different machine learning model called GLIDE. And currently, it’s not as capable. Given a text prompt like “a traffic cone”, Point•E produces a low-resolution point cloud (a set of points in space) that looks like a traffic cone.
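The open source release includes the sampling pipeline, so the text-to-point-cloud step can be tried directly. Here is a minimal sketch along the lines of the repo’s example notebook; the model names (“base40M-textvec”, “upsample”) and the PointCloudSampler API reflect the repository at the time of writing and may change:

```python
# Text-to-point-cloud sketch based on the point_e repo's example notebook
# (pip install git+https://github.com/openai/point-e).
import torch

from point_e.diffusion.configs import DIFFUSION_CONFIGS, diffusion_from_config
from point_e.diffusion.sampler import PointCloudSampler
from point_e.models.configs import MODEL_CONFIGS, model_from_config
from point_e.models.download import load_checkpoint

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Base model: text-conditioned, produces a coarse 1,024-point cloud.
base_model = model_from_config(MODEL_CONFIGS['base40M-textvec'], device)
base_model.eval()
base_model.load_state_dict(load_checkpoint('base40M-textvec', device))

# Upsampler model: refines the coarse cloud to 4,096 points.
upsampler = model_from_config(MODEL_CONFIGS['upsample'], device)
upsampler.eval()
upsampler.load_state_dict(load_checkpoint('upsample', device))

sampler = PointCloudSampler(
    device=device,
    models=[base_model, upsampler],
    diffusions=[diffusion_from_config(DIFFUSION_CONFIGS['base40M-textvec']),
                diffusion_from_config(DIFFUSION_CONFIGS['upsample'])],
    num_points=[1024, 4096 - 1024],
    aux_channels=['R', 'G', 'B'],
    guidance_scale=[3.0, 0.0],
    model_kwargs_key_filter=('texts', ''),  # only the base model sees the prompt
)

# Run the progressive sampler and keep the final denoising step.
samples = None
for x in sampler.sample_batch_progressive(
        batch_size=1, model_kwargs=dict(texts=['a traffic cone'])):
    samples = x

pc = sampler.output_to_point_clouds(samples)[0]  # a colored point cloud
```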
The result is far from the quality of a commercial 3D rendering in a film or a video game. But it’s not meant to be. Point clouds represent an intermediate step – once imported into a 3D application like Blender, they can be converted into textured meshes that look more like familiar 3D graphics.
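The repo also ships a small signed-distance-function (SDF) model for that conversion step. A sketch following its pointcloud2mesh example (the 'sdf' model name and the marching_cubes_mesh helper are as published at the time of writing; the output filename is illustrative):

```python
# Point-cloud-to-mesh sketch based on the repo's pointcloud2mesh example.
import torch

from point_e.models.configs import MODEL_CONFIGS, model_from_config
from point_e.models.download import load_checkpoint
from point_e.util.pc_to_mesh import marching_cubes_mesh

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# The SDF model predicts occupancy around the sampled points.
sdf_model = model_from_config(MODEL_CONFIGS['sdf'], device)
sdf_model.eval()
sdf_model.load_state_dict(load_checkpoint('sdf', device))

# 'pc' is the point cloud produced in the previous sketch.
# grid_size trades mesh detail for conversion speed.
mesh = marching_cubes_mesh(pc=pc, model=sdf_model, batch_size=4096,
                           grid_size=32, progress=True)

# Write a PLY file that Blender (or any other 3D tool) can import.
with open('traffic_cone.ply', 'wb') as f:
    mesh.write_ply(f)
```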
“While our method is still not state-of-the-art in terms of sample quality, it is one to two orders of magnitude faster to sample, providing a practical compromise for some use cases,” explain OpenAI researchers Alex Nichol, Heewoo Jun, Prafulla Dhariwal, Pamela Mishkin, and Mark Chen in a paper [PDF] describing the project.
The point of Point•E is that it “efficiently generates point clouds” – that’s where the “E” comes from in this case. It can produce 3D models using just one to two minutes of GPU time, compared to state-of-the-art methods that require several GPU hours to create a finished render. That makes it significantly faster than Google’s DreamFusion text-to-3D model – 600x faster, by one estimate.
But Point•E is not a commercial project. It is fundamental research that could eventually lead to the rapid creation of 3D models on demand. With further work, it could make virtual world creation easier and more accessible to those without professional 3D graphics skills. Or perhaps it will help simplify the creation of 3D-printed objects – Point•E supports the generation of point clouds for use in fabricating real-world products.
“This has implications both when the models are used to create plans for hazardous objects and when the plans are trusted to be safe despite the lack of empirical validation,” observe the authors.
There are other potential issues that need to be addressed. For example, like DALL•E, Point•E is expected to contain biases inherited from its training data set.
And this dataset – several million 3D models and associated metadata of unspecified provenance – is provided without any guarantee that the source models have been used with permission or in accordance with applicable license terms. This could turn out to be a big headache, legally.
There is already an issue posted on the Point•E GitHub repository requesting more information about the dataset. Doyup Lee, a South Korean AI developer, observes, “I think many researchers are also curious about the details of the training data and the data collection process.”
The AI community’s cavalier attitude towards training machine learning models on other people’s work without explicit permission has already fueled a copyright infringement lawsuit over GitHub Copilot, a service that suggests programming code to developers using OpenAI’s Codex model. Text-to-image models may be challenged in a similar way as they become commercialized. ®