Future video games, movies, mixed reality, telepresence, and the “metaverse” will rely heavily on human avatars. To create realistic, personalized avatars at scale, we need to accurately reconstruct detailed 3D humans from in-the-wild color images. This problem remains unsolved because it is genuinely hard: people dress differently, accessorize differently, and adopt diverse and often unusual poses. A good reconstruction method should capture all of this accurately while remaining robust to novel outfits and poses. Implicit-function (IF) methods are flexible enough to do so, but they lack a precise prior on human body anatomy and therefore tend to overfit the poses seen in their training data.
As a result, such methods frequently produce distorted shapes or disembodied limbs for images with unseen poses; see the second row of Figure 1. The third and fourth rows of Figure 1 show how recent work addresses these artifacts by regularizing the implicit surface with a shape prior provided by an explicit body model; however, this can limit generalization to novel garments while smoothing out shape details. In other words, robustness, generality, and detail end up being traded off against one another. What we want is the robustness of explicit anthropomorphic body models combined with the flexibility of IFs to capture varied topology.
In light of this, we note two key observations: (1) While inferring detailed 2D normal maps from color photographs is relatively straightforward for modern networks, inferring 3D geometry with equally accurate detail is still difficult; such “geometry-aware” 2D maps can, however, be lifted into 3D. (2) A body model can be thought of as a low-frequency “canvas” that “guides” the stitching of finely detailed surface parts. With these considerations in mind, we create ECON, a novel method for “Explicit Clothed humans Optimized via Normal integration”. ECON takes as input an RGB image and an inferred SMPL-X body. It then outputs a 3D person in free-form clothing with state-of-the-art (SOTA) detail and robustness.
Step 1: Front and back normal-map prediction. Using a standard image-to-image translation network, we predict normal maps for the front and back of the clothed human from the input RGB image, conditioned on the body estimate.
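Such networks operate on normal maps, which encode per-pixel unit surface normals as RGB colors. As a minimal, self-contained sketch of that representation (illustrative helper names, not ECON's actual code), the encoding and its inverse look like:

```python
import numpy as np

def encode_normals(normals):
    """Map unit normals in [-1, 1]^3 to 8-bit RGB in [0, 255]."""
    return np.round((normals * 0.5 + 0.5) * 255.0).astype(np.uint8)

def decode_normals(rgb):
    """Invert the encoding and re-normalize to unit length."""
    n = rgb.astype(np.float64) / 255.0 * 2.0 - 1.0
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

# A pixel whose surface directly faces the camera (normal = +z):
front_facing = np.array([[[0.0, 0.0, 1.0]]])
rgb = encode_normals(front_facing)  # -> [[[128, 128, 255]]]
```

This is why flat, camera-facing regions of a normal map appear in the characteristic light-blue color.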
Step 2: Front and back surface reconstruction. To create accurate and coherent two-sided 3D surfaces, MF and MB, we use the previously predicted normal maps and the corresponding depth maps rendered from the SMPL-X mesh. To do so, we extend the recently published BiNI method into a depth-aware variant (d-BiNI) and devise a new optimization scheme with three goals for the resulting surfaces:
- Their high-frequency components agree with the predicted clothed-human normal maps.
- Their low-frequency components and discontinuities match those of the SMPL-X body.
- The depth values along their silhouettes are consistent with each other and with the SMPL-X-rendered depth maps.
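In spirit, these goals combine into one weighted energy over the front and back depth maps. The following toy sketch (illustrative names and weights, not the paper's actual d-BiNI formulation, and omitting the normal-integration term that supplies the high-frequency detail) shows how the body-prior and silhouette-consistency penalties might be combined:

```python
import numpy as np

def dbini_style_energy(d_front, d_back, d_body_front, d_body_back,
                       boundary_mask, w_prior=1e-4, w_sil=1e-3):
    """Toy d-BiNI-style energy over front/back depth maps.

    - The normal-integration (high-frequency) term is omitted here; in
      the real method it ties the surfaces to the predicted normal maps.
    - Body-prior term: keep depths close to the SMPL-X-rendered depths.
    - Silhouette term: front and back depths should meet at the contour.
    """
    prior = np.mean((d_front - d_body_front) ** 2 +
                    (d_back - d_body_back) ** 2)
    sil = np.mean(((d_front - d_back) ** 2)[boundary_mask])
    return w_prior * prior + w_sil * sil
```

An optimizer would drive this energy toward zero; it vanishes exactly when both surfaces coincide with the body depths and close up along the silhouette.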
The two output surfaces, MF and MB, are detailed but incomplete: their occluded and side-facing (“profile”) regions lack geometry.
Step 3: Full 3D shape completion. This module takes two inputs: the SMPL-X mesh and the two d-BiNI surfaces, MF and MB. The goal is to “inpaint” the missing geometry. Existing solutions struggle here. On the one hand, Poisson reconstruction naively “fills” the gaps without exploiting any prior over shape distributions, resulting in “blobby” shapes.
On the other hand, data-driven methods struggle with missing parts caused by (self-)occlusion and discard the information in the provided high-quality surfaces, resulting in degenerate geometries. We overcome these limitations in two steps: (1) We extend and retrain IF-Nets to be conditioned on the SMPL-X body, so that SMPL-X regularizes the shape “inpainting”. Triangles of the resulting mesh that lie close to MF and MB are discarded, while the remaining ones are kept as “infilling patches”. (2) Using Poisson reconstruction, we stitch the front and back surfaces together with the infilling patches; note that the remaining gaps are small enough for a general-purpose technique.
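The patch selection in step (1) boils down to a distance test: keep only those triangles of the completed mesh that are far from the detailed d-BiNI surfaces. A minimal numpy sketch under that assumption (brute-force distances and hypothetical names; a real implementation would use a spatial index over dense meshes):

```python
import numpy as np

def select_infill_patches(tri_centroids, detailed_points, radius=0.02):
    """Return a boolean mask over triangles: True where the triangle's
    centroid is farther than `radius` from every sample point of the
    detailed front/back surfaces, i.e. where geometry is missing."""
    # Pairwise distances, shape (num_triangles, num_surface_points).
    d = np.linalg.norm(
        tri_centroids[:, None, :] - detailed_points[None, :, :], axis=-1)
    return d.min(axis=1) > radius

# Toy usage: two triangles near the detailed surface, one far from it.
centroids = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.01], [1.0, 0.0, 0.0]])
surface = np.array([[0.0, 0.0, 0.0]])
mask = select_infill_patches(centroids, surface)  # -> [False, False, True]
```

Only the third triangle survives as an infilling patch; the first two are dropped because the detailed surfaces already cover that region.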
ECON combines the best features of explicit and implicit surfaces to produce robust and detailed 3D reconstructions of clothed people. As seen at the bottom of Figure 1, the result is a complete 3D shape of a clothed person. We evaluate ECON on real-world photos and well-known benchmarks (CAPE, Renderpeople). A quantitative study shows that ECON outperforms the SOTA. Qualitative results show that ECON generalizes better than the SOTA to a wide range of poses and clothing, even when the topology is very loose or complex. A perceptual study backs this up: ECON is strongly preferred over competitors for challenging poses and loose clothing, and is on par with PIFuHD for fashion images. The code and models are available on GitHub.
Check out the Paper, Code, and Project page. All credit for this research goes to the researchers on this project. Also, don't forget to join our Reddit page and Discord channel, where we share the latest AI research news, cool AI projects, and more.
Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He enjoys connecting with people and collaborating on interesting projects.