The ability of a robot to map its surroundings and accurately locate itself using that map is fundamental to any autonomous scenario. But it is only half of the solution. For robots to work together consistently and safely, there must be a shared, persistent map that works flawlessly across many different Simultaneous Localization and Mapping (SLAM) instances. Most robots don’t even store the maps they make, and so repeat the mapping process every time. Sharing accurate maps between robots is even harder. Creating a single, reliable and persistent map that can be used, and improved upon, across different SLAM sessions (multisession) is therefore a significant step forward for robotics developers.
Accurate but not robust
Most current solutions rely on 2D-LIDAR to create accurate maps, but there are drawbacks. The lasers used by 2D-LIDAR create very accurate maps, but only in two dimensions, like a thin slice through an environment. This ‘slice’ is usually a few centimetres above the floor of the warehouse or mapped area and is easily disrupted. As a result, LIDAR maps are not robust, and frequent remapping is required. Some users report that they need to remap every week or even every day. This must be done at a slow walking pace with the entire facility shut down, so it is time-consuming and disruptive even in relatively small industrial environments. 2D-LIDAR rigs are also expensive (and 3D versions even more so). Because of their sensitivity, the same laser must be used in each robot utilizing the map.
Advances in vision-based systems for SLAM have raised the possibility of using cameras to create multisession maps for robots. Cameras can capture far more information, in three dimensions and in colour. As such, they can create detailed maps that have the added potential to provide richer information to both robot and human users.
Dealing with changes in lighting
Camera-based systems are also faster and far more cost-effective than LIDAR. Initial mapping can be undertaken at much higher speed, and suitable cameras cost 1/100th of the price of LIDAR. But there are challenges here too. Camera-based systems struggle with varying lighting conditions: they work less well in low light and can be confused by shadows or changes in lighting. The large field of view, colour and high level of detail also come at a cost, requiring more computational power to process.
Our new approach overcomes these challenges to provide a vision-based multisession mapping capability using low-cost cameras in a computationally efficient manner.
Building multisession maps
In a first run, a robot under remote control or in explore mode moves through the environment in real time and maps everything in it using two cameras or a low-cost RGB-D sensor. This creates a comprehensive 3D map that captures all landmarks and potentially billions of 3D reference points. Custom algorithms and machine learning identify the ‘keyframes’ that observe lots of ‘landmarks’. At the end of this map-building run, all the keyframe estimates are rapidly refined and then saved in a database. This map can now be loaded and used by robots to locate themselves in future SLAM sessions.
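The idea of keeping only the frames that observe many landmarks can be illustrated with a minimal sketch. This is not SLAMcore's algorithm: the `Frame` structure, the `select_keyframes` function, the `min_new_landmarks` threshold and the greedy coverage strategy are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """One camera frame and the IDs of the landmarks it observes (hypothetical structure)."""
    frame_id: int
    landmark_ids: set[int] = field(default_factory=set)


def select_keyframes(frames: list[Frame], min_new_landmarks: int = 2) -> list[Frame]:
    """Greedily keep frames that add at least `min_new_landmarks` not yet covered."""
    covered: set[int] = set()
    keyframes: list[Frame] = []
    # Visit the frames that observe the most landmarks first.
    for frame in sorted(frames, key=lambda f: len(f.landmark_ids), reverse=True):
        new_landmarks = frame.landmark_ids - covered
        if len(new_landmarks) >= min_new_landmarks:
            keyframes.append(frame)
            covered |= frame.landmark_ids
    return keyframes


# Illustrative data only: four frames observing nine landmarks between them.
frames = [
    Frame(0, {1, 2, 3, 4}),
    Frame(1, {3, 4, 5}),
    Frame(2, {5, 6, 7, 8, 9}),
    Frame(3, {1, 2}),
]
selected = select_keyframes(frames)
print([f.frame_id for f in selected])  # [2, 0]
```

Here frames 1 and 3 are dropped because every landmark they see is already covered by the two selected keyframes, which is one simple way a map can stay compact while still referencing every landmark.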
Using the map
As the robot powers on, it uses the loaded map, including the keyframe database, to bootstrap its initial pose, or location. It does this quickly (within 10 to 100 milliseconds) by matching current visual data with the stored keyframes, just as a human orienteer would assess the landmarks they can see (a church steeple, a prominent peak, or a bridge), relate them to symbols on a map and calculate their location. This is the only initialization the robot needs to obtain an initial pose accurate to centimetres within the map.
As the robot moves from pose to pose, even tens of milliseconds is too long a delay to maintain an accurate position on the map. SLAMcore has created a Visibility Classifier that uses a memory-based learning approach to predict the landmarks most likely to be seen from a rough location. Just as the orienteer looks out for the most obvious features to navigate by, the robot matches against the landmarks it is most likely to see. Working alongside existing SLAM processes, the robot uses these to continuously locate itself within the loaded map in single-digit milliseconds.
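Matching the current view against stored keyframes might look something like the following sketch. It assumes binary feature descriptors compared by Hamming distance, which is a common choice in visual SLAM but is not confirmed by this article; `keyframe_db`, `best_keyframe` and both thresholds are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def count_matches(query, stored, max_distance=2):
    """Count query descriptors that have a stored descriptor within max_distance bits."""
    return sum(
        1 for q in query
        if any(hamming(q, s) <= max_distance for s in stored)
    )


def best_keyframe(query, keyframe_db, min_matches=2):
    """Return the ID of the keyframe with the most matches, or None if too few match."""
    best_id, best_score = None, 0
    for keyframe_id, descriptors in keyframe_db.items():
        score = count_matches(query, descriptors)
        if score > best_score:
            best_id, best_score = keyframe_id, score
    return best_id if best_score >= min_matches else None


# Illustrative 8-bit descriptors; real descriptors are typically much longer.
keyframe_db = {
    "kf_entrance": [0b10110010, 0b01101100, 0b11110000],
    "kf_aisle_3": [0b00001111, 0b11001100, 0b10101010],
}
current_view = [0b00001110, 0b11001101, 0b01010101]
print(best_keyframe(current_view, keyframe_db))  # kf_aisle_3
```

A full system would go on to estimate the camera pose from the matched landmarks; this sketch stops at choosing the most similar keyframe, and returns `None` when nothing matches well enough to trust.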
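One way to picture a memory-based visibility predictor is as a nearest-neighbour lookup: remember which landmarks were seen from each stored pose, then let the poses closest to the rough location vote. The sketch below assumes exactly that; the `memory` layout, `predict_visible`, and the `k` and `min_votes` parameters are illustrative and not taken from SLAMcore's implementation.

```python
import math
from collections import Counter


def predict_visible(memory, rough_xy, k=3, min_votes=2):
    """Predict landmarks likely to be visible near rough_xy by k-nearest-neighbour voting.

    memory is a list of ((x, y), set_of_landmark_ids) pairs recorded while mapping.
    """
    # The k stored poses closest to the rough location act as the "memory".
    nearest = sorted(memory, key=lambda entry: math.dist(entry[0], rough_xy))[:k]
    # Each neighbouring pose votes for every landmark it observed.
    votes = Counter(lm for _, seen in nearest for lm in seen)
    return sorted(lm for lm, n in votes.items() if n >= min_votes)


# Illustrative memory: three poses near the origin and one far away.
memory = [
    ((0.0, 0.0), {"door", "shelf_a"}),
    ((1.0, 0.0), {"door", "shelf_a", "pillar"}),
    ((0.0, 1.0), {"shelf_a", "pillar"}),
    ((9.0, 9.0), {"loading_bay", "shelf_z"}),
]
print(predict_visible(memory, (0.4, 0.3)))  # ['door', 'pillar', 'shelf_a']
```

Restricting matching to the predicted landmarks means far fewer candidates are compared per frame, which is the kind of saving that makes millisecond-scale tracking plausible.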
Multisession mapping in action
The effectiveness of this approach in real-world situations is demonstrated in this video. An initial map can be created and then loaded at a later date. The system is then able to accurately calculate initial and subsequent poses relative to the original map, even when lighting conditions are very different and when the physical environment changes, for example when furniture and objects are moved.
The machine learning model, trained in the first run and then used by robots localizing against the map, is computationally highly efficient: the solution can run on off-the-shelf hardware and processors. It complements existing SLAM processes and additional data feeds, for example from an IMU or wheel odometry, so that robots can operate beyond the boundaries of mapped areas. As additional visual landmarks are detected, they can be added as temporary landmarks to aid tracking on a specific run, without altering the underlying map.
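Keeping temporary landmarks separate from the saved map can be sketched as a session-local overlay on top of a read-only view. The `SessionMap` class and its method names are hypothetical and only illustrate the idea; they do not describe SLAMcore's data structures.

```python
from types import MappingProxyType


class SessionMap:
    """A saved map plus landmarks that exist only for the current run (hypothetical)."""

    def __init__(self, persistent):
        # Read-only view: the session cannot write to the saved map through it.
        self._persistent = MappingProxyType(persistent)
        self._temporary = {}

    def add_temporary(self, landmark_id, position):
        """Record a newly detected landmark for this run only."""
        if landmark_id in self._persistent:
            raise ValueError(f"{landmark_id} already exists in the saved map")
        self._temporary[landmark_id] = position

    def lookup(self, landmark_id):
        """Return a landmark position, preferring the saved map, or None if unknown."""
        if landmark_id in self._persistent:
            return self._persistent[landmark_id]
        return self._temporary.get(landmark_id)

    def end_session(self):
        """Discard everything that was added during this run."""
        self._temporary.clear()


saved_map = {"pillar": (1.0, 2.0, 0.0)}
session = SessionMap(saved_map)
session.add_temporary("pallet_17", (3.0, 1.0, 0.0))
print(session.lookup("pallet_17"))  # (3.0, 1.0, 0.0)
session.end_session()
print(session.lookup("pallet_17"))  # None
```

Because the temporary landmarks live in their own dictionary, ending the run leaves `saved_map` exactly as it was loaded.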
Using cameras to create robust, accurate persistent maps offers significant benefits to the robotics community. Reducing reliance on expensive LIDAR systems, and potentially requiring fewer separate systems and components, reduces costs and shortens time to market. Camera-based mapping is faster and more robust in real-world applications, lowering barriers to entry and operational costs of deploying robots. 3D visual maps can also contain far more information, including colour, and ultimately semantic understanding of different objects within those maps; they are also easier for humans to understand.
Running SLAM from scratch every time makes no sense
Robots most often work in the same place, and multiple robots will work together in that shared space. Creating an accurate, robust and persistent map once and using it multiple times increases effectiveness and efficiency and allows for better planning, interaction and cooperation in real-world situations. SLAMcore’s breakthrough in using low-cost cameras effectively for multisession mapping lays the foundation for immediate improvements and exciting future developments in the use of robots in many scenarios.
To find out more, watch the video or get in touch.