Three-dimensional display and virtual reality technologies have been applied in minimally invasive surgery to provide doctors with a more immersive surgical experience. One of the most popular systems based on these technologies is the Da Vinci surgical robot system. The key to building an in vivo 3-D virtual reality model with a monocular endoscope is accurate estimation of depth and motion. In this article, a fully unsupervised learning method for depth and motion estimation from continuous monocular endoscopic video is proposed. After highlighted regions are detected, EndoMotionNet and EndoDepthNet estimate ego-motion and depth, respectively. EndoMotionNet uses a long short-term memory (LSTM) layer to capture the temporal information between consecutive frames, which enhances the accuracy of ego-motion estimation. EndoDepthNet uses the estimated depth of the previous frame to estimate the depth of the next frame through a multimode fusion mechanism. A custom loss function is defined to improve the robustness and accuracy of the proposed method. Experiments on public datasets verify that the proposed unsupervised continuous depth and motion estimation method effectively improves the accuracy of depth and motion estimation, especially when consecutive frames are processed.
- Deep learning
- depth estimation
- minimally invasive surgery (MIS)
- unsupervised learning
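
The abstract states that highlighted regions are detected before depth and motion are estimated, but it does not describe the detector. As a minimal sketch only, and not the method used in the article, one common heuristic treats pixels that are very bright and nearly colorless as specular highlights and excludes them from a photometric error. The function names `highlight_mask` and `masked_l1`, and the thresholds `value_thresh` and `sat_thresh`, are hypothetical choices made for this illustration.

```python
import numpy as np


def highlight_mask(rgb, value_thresh=0.9, sat_thresh=0.2):
    """Return a boolean H x W mask, True where a pixel looks like a highlight.

    rgb is an H x W x 3 float array with values in [0, 1].
    The thresholds are illustrative assumptions, not values from the article.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    v = rgb.max(axis=-1)   # HSV value (brightness)
    mn = rgb.min(axis=-1)
    # HSV saturation: (max - min) / max, defined as 0 for black pixels.
    s = np.where(v > 0, (v - mn) / np.maximum(v, 1e-12), 0.0)
    # A highlight is bright and close to white (low saturation).
    return (v >= value_thresh) & (s <= sat_thresh)


def masked_l1(pred, target, mask):
    """Mean absolute error over pixels that are NOT flagged in mask."""
    valid = ~mask
    if valid.sum() == 0:
        return 0.0
    # Average the absolute difference over the channel axis, then over valid pixels.
    diff = np.abs(np.asarray(pred) - np.asarray(target)).mean(axis=-1)
    return float(diff[valid].mean())


# Tiny 2 x 2 example frame: white, pure red, mid-gray, and black pixels.
frame = np.zeros((2, 2, 3))
frame[0, 0] = [1.0, 1.0, 1.0]   # bright and colorless: flagged as highlight
frame[0, 1] = [1.0, 0.0, 0.0]   # bright but saturated: kept
frame[1, 0] = [0.5, 0.5, 0.5]   # colorless but not bright: kept
mask = highlight_mask(frame)
error = masked_l1(frame, np.zeros_like(frame), mask)
```

In an unsupervised pipeline of the kind the abstract describes, a mask like this would typically be applied to the image reconstruction error so that saturated reflections do not dominate training; how the article actually uses its detected regions is not stated in the abstract.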