Self-supervised learning for depth completion 

Depth maps represent the distance of objects around a sensor and are essential for robots to build accurate maps of a space and plan paths for safe navigation. Knowing whether an object is close, and precisely how close, is crucial to avoiding collisions. Humans and animals do this naturally – from our earliest moments we learn, through trial and error, to judge how far away things are. For robots, supervised AI training achieves a similar result by showing a network millions of images, each precisely labelled with ‘ground truth’, so that it learns to recognize objects at different distances. But this is expensive, and suitable datasets are scarce. Our new depth completion algorithms use a specialist neural network that works in conjunction with our stereo vision SLAM system to greatly improve the accuracy and quality of depth maps.

Trading accuracy for cost

Developers looking for accurate depth maps face a range of challenges. Different sensors, such as LIDAR, ‘time of flight’ cameras or stereoscopic cameras with active structured IR patterns, all have different strengths and weaknesses. Their range, accuracy and resolution vary from scenario to scenario, and even where, and at what angle, they are mounted on an autonomous machine has a big impact on the quality of depth mapping. Bright lights or shiny surfaces such as glass can disrupt active sensors. 2D LIDAR is accurate, but ‘sees’ only a slice of the environment and so can miss key elements of a scene. 3D LIDAR provides more coverage, but is expensive and bulky while still very limited in resolution. Balancing the competing needs of cost and capability creates difficult trade-offs, with the result that most current depth mapping solutions suit only a narrow range of use-cases, and none is ideal across the range of scenarios a modern robot will encounter.

Noisy and incomplete data

In addition, depth data from many sensors can be noisy, with erroneous or missing measurements. Significant processing power is required to clean up and integrate this data to make it usable; many depth cameras include an integrated processor just for these calculations. Developers often face a three-way trade-off between cost, accuracy and speed, forced to choose between one sensor set-up that provides reasonable accuracy in a limited set of circumstances and another that meets a different set of requirements. For commercial robots working in dynamic and variable scenarios, these compromises are limiting. For example, a warehouse robot needs depth maps that are accurate in its immediate vicinity to prevent collisions, in the middle distance to allow effective path planning, and at long range so that complete floor plans of the space can be built quickly. Until now, this has been impossible to achieve.

The capability to create fast, accurate and complete depth maps without expensive, bespoke hardware would be a significant benefit to robot designers, increasing the choice of sensors and reducing design constraints. It would also make the deployment and operation of robots in these spaces faster and more reliable.

Adding SLAM to depth completion AI

With this in mind, we’ve developed the Slamcore Active Depth Completion Network (ACDC-Net) – a neural network that combines the active depth map with the output of our SLAM calculations to produce complete, accurate depth maps. This approach greatly improves the quality of the depth camera’s output, particularly when measuring distances outside its tightly defined performance envelope. It runs on standard GPUs and so can be cost-effectively integrated into autonomous machines to provide depth mapping at the edge.
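To make the fusion concrete, here is a minimal PyTorch sketch of how such inputs could be combined. Slamcore has not published the ACDC-Net architecture in this post, so the class name, layer sizes and channel layout below are purely illustrative assumptions; the only point is that RGB, the active depth map and a sparse SLAM-derived depth channel can be stacked channel-wise and regressed to a dense depth map.

```python
import torch
import torch.nn as nn

class DepthCompletionSketch(nn.Module):
    """Illustrative stand-in, NOT the ACDC-Net architecture.
    Shows only the input fusion: RGB (3 channels), the noisy active
    depth map (1 channel) and a sparse SLAM-derived depth channel
    (1 channel) are concatenated and regressed to dense depth."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(5, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, rgb, active_depth, sparse_slam_depth):
        # All inputs are (B, C, H, W); zeros mark missing depth.
        x = torch.cat([rgb, active_depth, sparse_slam_depth], dim=1)
        return self.net(x)  # dense completed depth, (B, 1, H, W)
```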

Our visual-inertial SLAM algorithms provide a highly accurate but sparse map of landmarks that allows a robot or autonomous mobile system to estimate its pose in real time. Combining these sparse but precisely known landmarks with the original depth map allows our neural network to output a greatly enhanced depth map that is far more accurate and complete than the depth sensor can produce on its own. These videos show how our depth completion AI significantly improves the completeness and sharpness (accuracy) of depth maps.
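One plausible way to hand sparse SLAM landmarks to such a network is to rasterise them into a depth image aligned with the camera. The sketch below does this with a standard pinhole projection; the function name `landmarks_to_sparse_depth` and the zeros-for-missing convention are our assumptions, not a description of Slamcore’s actual pipeline.

```python
import numpy as np

def landmarks_to_sparse_depth(landmarks_cam, K, height, width):
    """Project SLAM landmarks (3D points in the camera frame, metres)
    into a sparse depth image aligned with the RGB/depth frames.

    landmarks_cam: (N, 3) array of points in camera coordinates.
    K: (3, 3) pinhole intrinsics matrix.
    Pixels with no landmark stay at 0, marking them as missing.
    """
    sparse = np.zeros((height, width), dtype=np.float32)
    # Keep only points in front of the camera (positive depth).
    pts = landmarks_cam[landmarks_cam[:, 2] > 0]
    # Pinhole projection: u = fx*x/z + cx, v = fy*y/z + cy.
    u = (K[0, 0] * pts[:, 0] / pts[:, 2] + K[0, 2]).round().astype(int)
    v = (K[1, 1] * pts[:, 1] / pts[:, 2] + K[1, 2]).round().astype(int)
    # Discard landmarks that project outside the image.
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    sparse[v[inside], u[inside]] = pts[inside, 2]  # store metric depth z
    return sparse
```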

Adding SLAM data to depth completion brings two clear advantages. First, SLAM provides information about forces and movement relative to the real world. Using this, we can compare two views of the same landmark to triangulate its distance accurately. Combining these precise distances at sparse points with the sensor’s noisy depth+RGB data from a wider field of view gives our AI unique information with which to produce state-of-the-art depth maps. Second, the accuracy of the ACDC-Net enables self-supervised training, reducing costs and accelerating the deployment of robots in real-world situations where keeping the cost of edge hardware down is critical.
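The training signal commonly used for this kind of self-supervision is photometric reprojection, sketched below under that assumption (the post does not spell out Slamcore’s exact loss). Each pixel of the target frame is back-projected with the predicted depth, moved into a second view using the SLAM-estimated relative pose, and the second image is sampled at the resulting locations; if the predicted depth is correct, the warped image matches the target, so no hand-labelled ground truth is needed.

```python
import torch
import torch.nn.functional as F

def photometric_loss(pred_depth, target_rgb, source_rgb, K, T_src_tgt):
    """Self-supervised reprojection loss, a sketch of the standard
    recipe rather than Slamcore's published formulation.

    pred_depth: (B, 1, H, W) depth predicted for the target frame.
    target_rgb, source_rgb: (B, 3, H, W) image pair.
    K: (B, 3, 3) camera intrinsics.
    T_src_tgt: (B, 4, 4) target-to-source pose from SLAM.
    """
    B, _, H, W = pred_depth.shape
    device = pred_depth.device
    # Back-project every target pixel to 3D using its predicted depth.
    ys, xs = torch.meshgrid(
        torch.arange(H, device=device, dtype=torch.float32),
        torch.arange(W, device=device, dtype=torch.float32),
        indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).view(1, 3, -1)
    cam = torch.inverse(K) @ pix * pred_depth.view(B, 1, -1)   # (B,3,HW)
    # Move the 3D points into the source frame with the SLAM pose.
    cam_h = torch.cat([cam, torch.ones(B, 1, H * W, device=device)], dim=1)
    src = (T_src_tgt @ cam_h)[:, :3]                           # (B,3,HW)
    # Project into the source image and sample it at those locations.
    proj = K @ src
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)
    grid = torch.stack([uv[:, 0] / (W - 1) * 2 - 1,            # x in [-1,1]
                        uv[:, 1] / (H - 1) * 2 - 1],           # y in [-1,1]
                       dim=-1).view(B, H, W, 2)
    warped = F.grid_sample(source_rgb, grid, align_corners=True)
    # Correct depth makes the warped source reproduce the target image.
    return (warped - target_rgb).abs().mean()
```

Minimising this difference over many frame pairs trains the network without any ground-truth depth labels.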

Slamcore’s depth completion AI outperforms other approaches to depth map creation. It provides maps that are complete and accurate enough to be used for real-time obstacle avoidance and path planning in dynamic environments. The ACDC-Net, combined with SLAM data, is highly computationally efficient, delivering exceptionally fast depth completion, especially compared with learned approaches that do not build on an underlying SLAM system.

Depth completion has been turbo-charged by the addition of SLAM. It can now be integrated into a wide range of robot designs to provide a cost-effective solution for path planning, obstacle avoidance and safe operation in a variety of environments. To find out more, please get in touch or read our paper on self-supervised depth completion for active stereo.