ACCV 2016 Tutorial: Large-scale 3D Reconstruction from Images

Tianwei Shen, Jinglu Wang, Tian Fang, Long Quan

Hong Kong University of Science and Technology (HKUST)



Course Description

Modeling the world from 2D images has long been an active topic in computer vision research. With ubiquitous mobile computing and unmanned aerial vehicles (UAVs), capturing images has never been easier, and the scale of 3D reconstruction has grown accordingly in the era of big data. This tutorial covers a wide range of topics in large-scale 3D reconstruction and is organized around two main parts. The first part focuses on the large-scale Structure-from-Motion (SfM) problem. We first give a brief overview of the prerequisite knowledge in multi-view geometry, then discuss recent trends in match graph optimization and robust optimization methods for large-scale SfM. The second part addresses the issues in large-scale multi-view stereo (MVS), with a focus on a state-of-the-art large-scale 3D reconstruction pipeline. Finally, we introduce state-of-the-art methods for 3D semantic segmentation and remodeling. Attendees can expect to learn about state-of-the-art SfM and MVS techniques as well as large-scale system design.


Time and Location

9:00 am - 12:30 pm (3 hours), Nov 24, 2016

Room 101A, Taipei International Convention Center (TICC)


Target Audience

This tutorial is relevant for the following audience:

      university graduate students, researchers, and faculty interested in exploring the problems of large-scale 3D reconstruction;

      industry engineers who work on computer vision, computational photography, or image processing.

The expected audience should have basic knowledge of probability and linear algebra. Some knowledge of camera models and multi-view geometry would be helpful but not necessary since this course is self-contained.


Lecture Materials

The tutorial is divided into the following parts:

Part 0 The Fundamentals of 3D Computer Vision Revisited

Part 1 Large-Scale Structure-from-Motion: A Modern Synthesis

Part 2 High-quality Textured Surface Reconstruction from Registered Images: State-of-the-art methods

Part 3 Urban Scene Segmentation, Recognition and Remodeling


Lecturer Biographies

Long Quan, Full Professor, HKUST

Long Quan received the Ph.D. degree in Computer Science from INPL, France, in 1989. Before joining the Department of Computer Science at the Hong Kong University of Science and Technology (HKUST) in 2001, he was a French CNRS senior research scientist at INRIA in Grenoble. His research interests focus on 3D reconstruction, structure from motion, vision geometry, and image-based modeling. He has served as an editor for journals such as PAMI and IJCV, and as general chair or program chair for CVPR, ICCV, and ECCV. He is the Director of the HKUST Center for Visual Computing and Image Science, and an IEEE Fellow of the Computer Society.


Tianwei Shen, Ph.D. Candidate, HKUST

Tianwei Shen is a Ph.D. candidate in the Department of Computer Science & Engineering, Hong Kong University of Science and Technology, advised by Prof. Long Quan. Before coming to HKUST, he obtained his bachelor's degree from Peking University in 2014, with a double major in machine intelligence (EECS) and psychology. His research interests include large-scale Structure-from-Motion, graph analysis in 3D reconstruction, and related optimization problems.


Jinglu Wang, Post-doc Researcher, HKUST

Jinglu Wang is currently a post-doctoral researcher in Computer Science and Engineering at the Hong Kong University of Science and Technology, where she received her Ph.D. degree in 2016 under the supervision of Prof. Long Quan. Before that, she received her bachelor's degree in Computer Science from Fudan University in 2011. Her research interests include 3D reconstruction, image-based modeling, and scene parsing.


Tian Fang, Research Assistant Professor, HKUST (Dr. Tian Fang is unable to attend for personal reasons.)

Tian Fang received his bachelor's and master's degrees in computer science and engineering from the South China University of Technology, China, in 2003 and 2006, respectively, and his Ph.D. degree in computer science and engineering from the Hong Kong University of Science and Technology (HKUST) in 2011. He is currently a research assistant professor at HKUST. His research interests include large-scale image-based modeling, mesh vectorization, image segmentation, recognition, and photo-realistic rendering.


Topics and Schedules


Covered topics

Related References

The Fundamentals of 3D Computer Vision Revisited

      Basic projective geometry

      Fundamental/essential matrix

      Epipolar geometry

[18, 19]

Large-Scale Structure-from-Motion: A Modern Synthesis

      Track organization

      Camera resection

      Robust bundle adjustment

      Global/incremental methods

[1 - 7, 17]

see Part1 for details

High-quality Textured Surface Reconstruction

      Multi-view Stereo

      Surface generation

      Surface refinement

      Texture mapping

[8 - 15]

see Part2 for details

Urban Scene Segmentation, Recognition and Remodeling

      Semantic segmentation

      Façade recognition

      Object Remodeling

[16, 20 - 23]

see Part3 for details



Relation to previous tutorials in CVPR/ICCV/ECCV/ACCV

3D reconstruction is a broad topic that has received much attention over the years. Our tutorial partially overlaps with, or is related to, the following recent tutorials:

      Robust Optimization Techniques in Computer Vision (ECCV 2014)

      Open Source Structure-from-Motion (CVPR 2015)

      State of the art 3D reconstruction techniques: Very large scale 3D reconstruction and the role of priors (CVPR 2014)

      Dense Image Correspondences for Computer Vision (CVPR 2014)


Selected Related Publications


[1] Snavely, N., Seitz, S. M., & Szeliski, R. (2008). Modeling the world from internet photo collections. IJCV, 80(2), 189-210.

[2] Moulon, P., Monasse, P., & Marlet, R. (2013). Global fusion of relative motions for robust, accurate and scalable structure from motion. ICCV.

[3] Fang, T., & Quan, L. (2010). Resampling structure from motion. ECCV 2010, 1-14.

[4] Zhu, S., Fang, T., Zhang, R., & Quan, L. (2014). Multi-view geometry compression. ACCV 2014 (pp. 3-18).

[5] Zach, C., Klopschitz, M., & Pollefeys, M. (2010). Disambiguating visual relations using loop constraints. CVPR 2010.

[6] Wilson, K., & Snavely, N. (2013). Network principles for sfm: Disambiguating repeated structures with local context. ICCV 2013 (pp. 513-520).

[7] Tianwei Shen, Siyu Zhu, Tian Fang, Runze Zhang, Long Quan. Graph-Based Consistent Matching for Structure-from-Motion. ECCV 2016.


Dense Reconstruction:

[8] Lhuillier, M., & Quan, L. (2005). A quasi-dense approach to surface reconstruction from uncalibrated images. PAMI, 27(3), 418-433.

[9] Furukawa, Y., & Ponce, J. (2010). Accurate, dense, and robust multiview stereopsis. PAMI, 32(8), 1362-1376.


Surface Reconstruction and Modeling:

[10] Zeng, G., Paris, S., Quan, L., & Sillion, F. (2005). Progressive surface reconstruction from images using a local prior. ICCV 2005 (Vol. 2, pp. 1230-1237).

[11] Quan, L. (2010). Image-based modeling. Springer Science & Business Media.

[12] Quan, L., Tan, P., Zeng, G., Yuan, L., Wang, J., & Kang, S. B. (2006). Image-based plant modeling. ACM Transactions on Graphics (TOG), 25(3), 599-604.

[13] Shiwei Li, Sing Yu Siu, Tian Fang, Long Quan. Efficient Multi-view Surface Refinement with Adaptive Resolution Control. ECCV 2016.



[14] Liu, J., Wang, J., Fang, T., Tai, C. L., & Quan, L. (2015). Higher-Order CRF Structural Segmentation of 3D Reconstructed Surfaces. ICCV (pp. 2093-2101).

[15] Zhang, R., Li, S., Fang, T., Zhu, S., & Quan, L. (2015). Joint Camera Clustering and Surface Segmentation for Large-Scale Multi-View Stereo. ICCV (pp. 2084-2092).

[16] Wang, J., Fang, T., Su, Q., Zhu, S., Liu, J., Cai, S., ... & Quan, L. Image-based Building Regularization Using Structural Linear Features.

[17] Tianwei Shen, Jinglu Wang, Tian Fang, Siyu Zhu, Long Quan. Color Correction for Image-Based Modelling in the Large. ACCV 2016.

[18] Hartley, R., & Zisserman, A. (2003). Multiple view geometry in computer vision. Cambridge University Press.

[19] Quan, L. (2010). Image-based modeling. Springer Science & Business Media.

[20] Zhang, H., Wang, J., Fang, T., & Quan, L. (2014). Joint segmentation of images and scanned point cloud in large-scale street scenes with low-annotation cost. IEEE TIP, 23(11), 4763-4772.

[21] Zhang, H., Wang, J., Tan, P., Wang, J., & Quan, L. (2013). Learning CRFs for image parsing with adaptive subgradient descent. ICCV.

[22] Wang, J., Liu, C., Shen, T., Quan, L. Structure-driven Facade Parsing With Irregular Patterns. ACPR 2015. (Oral)

[23] Wang, J., Li, S., Zhang, H., Quan, L. Semantic Segmentation of Large-Scale Urban 3D Data with Low Annotation Cost. CVPR Workshop 2015.



We would like to thank Shiwei Li for the course material of Part 2.

This website is built and maintained by Tianwei Shen.

