The next breakthrough to take the AI world by storm could be 3D model generators. This week, OpenAI released Point-E, a machine learning system that creates a 3D object from a text prompt. According to a paper published alongside the code base, Point-E can produce 3D models in one to two minutes on a single Nvidia V100 GPU.
Point-E does not create 3D objects in the traditional sense. Instead, it generates point clouds, or discrete sets of data points in space that represent a 3D shape – hence the cheeky abbreviation. (The “E” in Point-E is short for “efficiency,” as it is apparently faster than previous 3D object generation approaches.) Point clouds are computationally easier to synthesize, but they don’t capture an object’s fine-grained shape or texture – a key limitation of Point-E currently.
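In practice, a point cloud is little more than a list of 3D coordinates, often paired with per-point colors. A minimal sketch in NumPy (illustrative only; these are not Point-E’s actual data structures):

```python
import numpy as np

# A point cloud: N points, each an (x, y, z) position, optionally with an RGB color.
rng = np.random.default_rng(0)
num_points = 1024
positions = rng.uniform(-1.0, 1.0, size=(num_points, 3))           # N x 3 coordinates
colors = rng.integers(0, 256, size=(num_points, 3), dtype=np.uint8)  # N x 3 RGB values

# The cloud approximates a surface only as densely as it is sampled,
# which is why fine-grained shape and texture detail gets lost.
print(positions.shape, colors.shape)  # (1024, 3) (1024, 3)
```

Because there is no connectivity between points, such a representation is cheap to generate but ambiguous about the surface it describes.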
To work around this limitation, the Point-E team trained an additional AI system to convert Point-E’s point clouds into meshes. (Meshes — the collections of vertices, edges, and faces that define an object — are commonly used in 3D modeling and design.) But they note in the paper that the model can sometimes miss parts of objects, resulting in blocky or distorted shapes.
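A mesh, by contrast, describes a surface explicitly: faces index into a shared vertex list, and edges follow from the faces. A toy illustration (a single tetrahedron, unrelated to Point-E’s actual mesh format):

```python
import numpy as np

# Four vertices of a tetrahedron, as (x, y, z) coordinates.
vertices = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
# Each face is a triangle given by three indices into the vertex array.
faces = np.array([
    [0, 1, 2],
    [0, 1, 3],
    [0, 2, 3],
    [1, 2, 3],
])
# Edges can be derived from the faces (each sorted pair counted once).
edges = {tuple(sorted((f[i], f[(i + 1) % 3]))) for f in faces for i in range(3)}
print(len(vertices), len(faces), len(edges))  # 4 4 6
```

Converting a point cloud into such a structure requires inferring connectivity that the cloud never stored, which is where gaps in the cloud turn into missing or distorted mesh regions.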
Apart from the mesh generation model, which stands alone, Point-E consists of two models: a text-to-image model and an image-to-3D model. The text-to-image model, similar to generative art systems like OpenAI’s DALL-E 2 and Stable Diffusion, was trained on labeled images to understand associations between words and visual concepts. The image-to-3D model, on the other hand, was fed a set of images paired with 3D objects so that it learned to translate effectively between the two.
When given a text prompt – for example, “a 3D printable gear, a single gear 3 inches in diameter and half an inch thick” – Point-E’s text-to-image model generates a synthetic rendered object, which is passed to the image-to-3D model, which in turn generates a point cloud.
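The two-stage pipeline can be sketched as follows. The function names and bodies here are hypothetical stand-ins, not Point-E’s actual API; the real stages are diffusion models, and the stubs below only mimic the shapes of the data flowing between them:

```python
import numpy as np

def text_to_image(prompt: str) -> np.ndarray:
    """Stand-in for stage 1: produce a single synthetic rendered view of the prompt.
    (Hypothetical stub; the real model is a text-conditioned diffusion model.)"""
    rng = np.random.default_rng(0)
    return rng.random((64, 64, 3))  # H x W x RGB

def image_to_point_cloud(image: np.ndarray, num_points: int = 1024) -> np.ndarray:
    """Stand-in for stage 2: condition on the rendered view and emit an (N, 3) cloud.
    (Hypothetical stub.)"""
    rng = np.random.default_rng(1)
    return rng.uniform(-1.0, 1.0, size=(num_points, 3))

prompt = "a 3D printable gear, 3 inches in diameter and half an inch thick"
rendered_view = text_to_image(prompt)              # stage 1: text -> image
point_cloud = image_to_point_cloud(rendered_view)  # stage 2: image -> point cloud
print(rendered_view.shape, point_cloud.shape)  # (64, 64, 3) (1024, 3)
```

The design choice the paper describes is visible even in this sketch: the 3D stage never sees the text directly, only the rendered image, which is why a misleading intermediate image can yield a shape that doesn’t match the prompt.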
After training the models on a dataset of “several million” 3D objects and associated metadata, Point-E could produce colored point clouds that frequently matched text prompts, according to OpenAI researchers. It’s not perfect – Point-E’s image-to-3D model sometimes fails to interpret the image from the text-to-image model, resulting in a shape that doesn’t match the text prompt. Still, it’s orders of magnitude faster than the previous state of the art – at least according to the OpenAI team.
“Although our method performs worse on this assessment than state-of-the-art techniques, it produces samples in a small fraction of the time,” they wrote in the paper. “That might make it more practical for some applications, or might allow discovery of higher quality 3D objects.”
What are the applications, exactly? Well, the OpenAI researchers point out that point clouds from Point-E could be used to make real-world objects, for example through 3D printing. With the additional mesh conversion model, the system could – once it is a bit more refined – also find its way into game and animation development workflows.
OpenAI may be the latest company to jump into the 3D object generator fray, but – as mentioned earlier – it’s certainly not the first. Earlier this year, Google released DreamFusion, an expanded version of Dream Fields, a generative 3D system that the company unveiled in 2021. Unlike Dream Fields, DreamFusion requires no prior training, which means it can generate 3D representations of objects without 3D data.
With all eyes on 2D art generators right now, model synthesis AI could be the next big industry disruptor. 3D models are widely used in film and television, interior design, architecture and various scientific fields. Architectural firms use them to demonstrate proposed buildings and landscapes, for example, while engineers use models to design new devices, vehicles and structures.
However, 3D models usually take a long time to create – anywhere from several hours to several days. An AI like Point-E could change that if its issues are ever ironed out, and turn OpenAI a respectable profit in the process.
The question is what type of intellectual property disputes might arise over time. There is a large market for 3D models, with several online marketplaces, including CGStudio and CreativeMarket, allowing artists to sell the content they have created. If Point-E catches on and its models hit the markets, model artists might protest, pointing to evidence that modern generative AI borrows heavily from its training data — existing 3D models, in the case of Point-E. Like DALL-E 2, Point-E does not credit or quote any of the artists who may have influenced its generations.
But OpenAI leaves that problem for another day. Neither the Point-E document nor the GitHub page makes any mention of copyright.
To their credit, the researchers do mention that they expect Point-E to suffer from other issues, such as biases inherited from its training data and a lack of safeguards around models that could be used to create “dangerous objects.” Perhaps that’s why they’re careful to characterize Point-E as a “starting point” that they hope will inspire “further work” in the field of text-to-3D synthesis.