Scene Recomposition by Learning-based ICP

Hamid Izadinia, Steven M. Seitz

Given an RGBD sequence from a moving camera, we produce a 3D CAD recomposition of the scene. While a fused reconstruction (top) contains holes and noisy geometry, our recomposition (bottom) models the scene as a set of high-quality 3D shapes from CAD databases.


By moving a depth sensor around a room, we compute a 3D CAD model of the environment, capturing the room shape and contents such as chairs, desks, sofas, and tables. Rather than reconstructing geometry, we match each scanned object against thousands of CAD models, then place and align the selected model in the scene. In addition to the end-to-end system, the key technical contribution is a novel approach for aligning CAD models to 3D scans, based on deep reinforcement learning. This approach, which we call Learning-based ICP (LICP), outperforms prior ICP methods in the literature by learning the best points to match and by conditioning on object viewpoint. LICP learns to align using only synthetic data and does not require ground-truth annotation of object pose or keypoint pair matching in real scene scans. Despite this, it outperforms both learned local deep feature matching and geometry-based alignment methods on real scenes. The proposed method is evaluated on the publicly available real-scene datasets SceneNN and ScanNet, as well as on synthetic scenes from SUNCG. High-quality results are demonstrated on a range of real-world scenes, with robustness to clutter, viewpoint, and occlusion.
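For readers unfamiliar with ICP, the sketch below shows a generic point-to-point ICP loop in which each source point carries a weight that scales its contribution to the rigid fit. It is not the LICP method itself: here the weights are simply passed in (uniform by default), whereas the abstract describes learning which points to match. The function names `weighted_rigid_fit` and `weighted_icp`, the brute-force nearest-neighbor search, and the fixed iteration count are all illustrative assumptions.

```python
import numpy as np

def weighted_rigid_fit(src, dst, w):
    """Rotation R and translation t minimizing sum_i w_i * ||R @ src_i + t - dst_i||^2.

    src, dst: (N, 3) arrays of corresponding points; w: (N,) non-negative weights.
    Uses the weighted Kabsch (SVD) solution.
    """
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    mu_s = w @ src                      # weighted centroid of the source points
    mu_d = w @ dst                      # weighted centroid of the target points
    # Weighted cross-covariance between the centered point sets (3 x 3).
    H = (src - mu_s).T @ (w[:, None] * (dst - mu_d))
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that det(R) = +1.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t

def weighted_icp(src, dst, weights=None, iters=30):
    """Align src to dst; returns (R, t) such that src @ R.T + t approximates dst.

    weights: optional per-source-point weights (hypothetical stand-in for
    learned weights); uniform weights give classic point-to-point ICP.
    """
    if weights is None:
        weights = np.ones(len(src))
    R_total, t_total = np.eye(3), np.zeros(3)
    cur = np.array(src, dtype=float)
    for _ in range(iters):
        # Brute-force nearest neighbor in dst for every current source point.
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=-1)
        nn = d2.argmin(axis=1)
        R, t = weighted_rigid_fit(cur, dst[nn], weights)
        cur = cur @ R.T + t
        # Compose the incremental update with the accumulated transform.
        R_total = R @ R_total
        t_total = R @ t_total + t
    return R_total, t_total
```

Raising the weight of a point makes its residual count more in the fit, which is one plausible way for a learned, viewpoint-dependent weighting to steer an ICP-style alignment. Like any ICP variant, this loop only converges to the right pose when the initial misalignment is small enough for nearest-neighbor matches to be mostly correct.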

Keywords: 3D scene recomposition, learning-based ICP (LICP), Deep reinforcement learning (DeepRL), 3D geometry learning, 3D CAD models, 3D shapes, room layout estimation, 3D geometry network, Iterative Closest Point (ICP), scene reconstruction, noisy scan.

Real Scene Shape Alignment Results

Qualitative examples of recomposed CAD models of scenes. Each example shows a camera view of the scanned scene on the left and the recomposed CAD model from the same view on the right. Our method successfully recomposes cluttered scenes with many distractor objects (first row) and heavily occluded scenes populated with many pieces of furniture in confined spaces (second and third rows). Less accurate CAD recomposition can occur when the extent of a scanned mesh is ambiguous because of nearby objects (bottom row, right), or when some views lack discriminative shape features (cabinet in bottom row, middle).

Top Retrieved 3D CAD Models

Top retrieved CAD models, using each segmented object instance as the query. Each query point cloud is color-coded by surface normal.
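The caption does not say how surface normals are mapped to colors. A common convention, assumed here, maps each component of the unit normal from [-1, 1] to [0, 1] and uses the result as an RGB triple. The helper name `normals_to_rgb` is hypothetical.

```python
import numpy as np

def normals_to_rgb(normals):
    """Map (N, 3) surface normals to RGB colors in [0, 1].

    Assumed convention (not stated in the source): normalize each normal to
    unit length, then rescale every component from [-1, 1] to [0, 1].
    """
    n = np.asarray(normals, dtype=float)
    length = np.linalg.norm(n, axis=1, keepdims=True)
    n = n / np.clip(length, 1e-12, None)   # avoid division by zero
    return 0.5 * (n + 1.0)
```

Under this mapping, points whose normals face the same direction share a color, so planar regions such as a tabletop or chair seat show up as uniformly colored patches.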

Learned Weight Surface Point Visualization

Visualization of the learned weights (right) for different samples and various query scan viewpoints (left). The learned weights are shown from four different views of the reference CAD model. Weight values are color-coded from low (blue) to high (red). The learned weights are conditioned on the viewpoint of the query scan and reflect the contribution of each surface point during inference. The first two rows show that surface points of the same reference CAD model are assigned different weights depending on the viewpoint of the query scan.

Real Scene Recomposition Results

Scene recomposition using our proposed end-to-end, fully automatic method, shown for three different scenes. For each scene, the top row shows a top-down view, and the middle and bottom rows show two close-up views. Camera location and pose are color-coded in the top-down view.


      title={Scene Recomposition by Learning-based ICP},
      author={Izadinia, Hamid and Seitz, Steven M},