Unsupervised-Learning-Based Continuous Depth and Motion Estimation with Monocular Endoscopy for Virtual Reality Minimally Invasive Surgery

Ling Li, Xiaojian Li, Shanlin Yang, Shuai Ding, Alireza Jolfaei, Xi Zheng

Research output: Contribution to journal › Article › peer-review

23 Citations (Scopus)


Three-dimensional display and virtual reality technology have been applied in minimally invasive surgery to provide doctors with a more immersive surgical experience. One of the most popular systems based on this technology is the Da Vinci surgical robot system. The key to building an in vivo 3-D virtual reality model with a monocular endoscope is accurate estimation of depth and motion. In this article, a fully unsupervised learning method for depth and motion estimation from continuous monocular endoscopic video is proposed. After highlighted regions are detected, EndoMotionNet and EndoDepthNet are designed to estimate ego-motion and depth, respectively. EndoMotionNet uses a long short-term memory layer to take the timing information between consecutive frames into account, which enhances the accuracy of ego-motion estimation. EndoDepthNet uses the estimated depth of the previous frame to estimate the depth of the next frame through a multimode fusion mechanism. A custom loss function is defined to improve the robustness and accuracy of the proposed unsupervised-learning-based method. Experiments on public datasets verify that the proposed continuous depth and motion estimation method effectively improves the accuracy of depth and motion estimation, especially when continuous frames are processed.
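The highlight-detection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the authors detect highlighted regions, so the approach below (flagging pixels that are both very bright and nearly colorless) is a common heuristic swapped in for illustration only. The function name `highlight_mask` and the thresholds `min_value` and `max_saturation` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # 8-bit RGB values (assumed input format)


def highlight_mask(
    image: List[List[Pixel]],
    min_value: int = 230,          # assumed brightness threshold, not from the paper
    max_saturation: float = 0.15,  # assumed saturation threshold, not from the paper
) -> List[List[bool]]:
    """Return True for pixels that look like specular highlights.

    A pixel is flagged when its HSV-style value (max channel) is high and
    its saturation ((max - min) / max) is low, i.e. it is bright and whitish.
    """
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            value = max(r, g, b)
            lowest = min(r, g, b)
            saturation = 0.0 if value == 0 else (value - lowest) / value
            mask_row.append(value >= min_value and saturation <= max_saturation)
        mask.append(mask_row)
    return mask


# Tiny 2x2 example: only the near-white pixel is flagged.
example = [
    [(255, 255, 250), (120, 40, 30)],
    [(240, 100, 90), (0, 0, 0)],
]
print(highlight_mask(example))  # [[True, False], [False, False]]
```

In a pipeline of the kind the abstract describes, a mask like this would typically be used to exclude the flagged pixels from later processing. Whether the authors do so is not stated here.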

Original language: English
Pages (from-to): 3920-3928
Number of pages: 9
Journal: IEEE Transactions on Industrial Informatics
Issue number: 6
Publication status: Published - Jun 2021
Externally published: Yes


  • Deep learning
  • depth estimation
  • ego-motion
  • endoscope
  • minimally invasive surgery (MIS)
  • odometry
  • unsupervised learning

