
Pix2Vox: Context-aware 3D reconstruction from single and multi-view images

  • Harbin Institute of Technology
  • SenseTime Group Limited
  • Peng Cheng Laboratory

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Recovering the 3D representation of an object from single-view or multi-view RGB images with deep neural networks has attracted increasing attention in the past few years. Several mainstream works (e.g., 3D-R2N2) use recurrent neural networks (RNNs) to sequentially fuse multiple feature maps extracted from the input images. However, when given the same set of input images in different orders, RNN-based approaches are unable to produce consistent reconstruction results. Moreover, due to long-term memory loss, RNNs cannot fully exploit the input images to refine reconstruction results. To solve these problems, we propose a novel framework for single-view and multi-view 3D reconstruction, named Pix2Vox. Using a well-designed encoder-decoder, it generates a coarse 3D volume from each input image. Then, a context-aware fusion module is introduced to adaptively select high-quality reconstructions for each part (e.g., table legs) from the different coarse 3D volumes to obtain a fused 3D volume. Finally, a refiner further refines the fused 3D volume to generate the final output. Experimental results on the ShapeNet and Pix3D benchmarks indicate that the proposed Pix2Vox outperforms state-of-the-art methods by a large margin. Furthermore, the proposed method is 24 times faster than 3D-R2N2 in terms of backward inference time. Experiments on ShapeNet 3D categories unseen during training demonstrate the superior generalization ability of our method.
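The fusion step described in the abstract can be illustrated with a small sketch. It assumes, as one plausible reading, that each view contributes a per-voxel quality score, that these scores are softmax-normalized across views, and that the fused volume is the score-weighted average of the coarse volumes. The names `softmax` and `context_aware_fuse` are hypothetical; the learned encoder-decoder, the network that would produce the scores, and the refiner are all omitted, and the scores are supplied by hand. This is not the authors' implementation.

```python
import math
from typing import List

# Simplification: a coarse volume is flattened to a list of per-voxel occupancy probabilities.
Volume = List[float]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def context_aware_fuse(coarse: List[Volume], scores: List[List[float]]) -> Volume:
    """Fuse per-view coarse volumes voxel by voxel (hypothetical, simplified sketch).

    coarse[v][i] is view v's occupancy estimate for voxel i.
    scores[v][i] is an assumed quality score for that estimate; a real model
    would predict it with a learned network, here it is simply given.
    """
    if not coarse or len(coarse) != len(scores):
        raise ValueError("need one score map per coarse volume")
    n_voxels = len(coarse[0])
    if any(len(vol) != n_voxels for vol in coarse) or any(len(s) != n_voxels for s in scores):
        raise ValueError("all volumes and score maps must have the same size")
    fused: Volume = []
    for i in range(n_voxels):
        # Normalize voxel i's scores across views so the weights sum to 1.
        weights = softmax([s[i] for s in scores])
        # Weighted average of the views' estimates: high-scoring views dominate.
        fused.append(sum(w * vol[i] for w, vol in zip(weights, coarse)))
    return fused


# Two views of a 2-voxel object: view 0 scores higher on voxel 0, view 1 on voxel 1.
views = [[0.9, 0.1], [0.2, 0.8]]
scores = [[2.0, 0.0], [0.0, 2.0]]
forward = context_aware_fuse(views, scores)
backward = context_aware_fuse(views[::-1], scores[::-1])
print([round(x, 3) for x in forward])  # [0.817, 0.717]
print(all(math.isclose(a, b) for a, b in zip(forward, backward)))  # True
```

Because the weighted sum treats the views as an unordered set, reversing the input order leaves the fused volume unchanged, which is the property the abstract says RNN-based sequential fusion lacks.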

Original language: English
Title of host publication: Proceedings - 2019 International Conference on Computer Vision, ICCV 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2690-2698
Number of pages: 9
ISBN (Electronic): 9781728148038
State: Published - Oct 2019
Externally published: Yes
Event: 17th IEEE/CVF International Conference on Computer Vision, ICCV 2019 - Seoul, Korea, Republic of
Duration: 27 Oct 2019 → 2 Nov 2019

Publication series

Name: Proceedings of the IEEE International Conference on Computer Vision
ISSN (Print): 1550-5499

Conference

Conference: 17th IEEE/CVF International Conference on Computer Vision, ICCV 2019
Country/Territory: Korea, Republic of
City: Seoul
Period: 27/10/19 → 2/11/19

