This is the final installment in a three-part series exploring the levels of capability that make up our full-stack SLAM algorithm.
In previous blogs, we’ve outlined Position as the foundation of spatial intelligence and Mapping as the crucial next level. Adding semantic understanding to these two layers completes Slamcore’s full-stack spatial intelligence pyramid. As noted elsewhere, robots, autonomous machines, and extended reality and metaverse devices must answer three questions to demonstrate spatial intelligence: where am I in physical space, what objects are around me, and what are those objects? The ability to Perceive what objects are not only draws upon data from the previous two levels but also significantly improves the accuracy of maps and position calculations.
The first step in creating semantic understanding is to define discrete objects within a scene and accurately label them. Using a process called panoptic segmentation, our algorithms group sets of pixels from camera data into defined objects, such as ‘wall’, ‘chair’, or ‘person’. Thereafter, instead of processing the thousands of pixels that make up a single object, our algorithms treat them as a single entity, massively reducing processing overheads.
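To make that concrete, the sketch below shows how a panoptic segmentation result (a per-pixel class map plus a per-pixel instance map) can be collapsed into a handful of object entities. This is an illustrative example only, not Slamcore’s implementation; the class names, array shapes, and `SceneObject` type are assumptions.

```python
# Illustrative sketch (not Slamcore's code): collapse per-pixel panoptic
# segmentation output into a short list of labeled object entities.
from dataclasses import dataclass
import numpy as np

CLASS_NAMES = {0: "wall", 1: "chair", 2: "person"}  # hypothetical label set

@dataclass
class SceneObject:
    label: str          # semantic class, e.g. "chair"
    instance_id: int    # distinguishes two chairs from each other
    pixel_count: int    # how many pixels were grouped into this entity
    centroid: tuple     # (row, col) centre of the pixel mask
    bbox: tuple         # (row_min, col_min, row_max, col_max)

def group_panoptic(class_map: np.ndarray, instance_map: np.ndarray):
    """Turn per-pixel panoptic output into one entity per object."""
    objects = []
    for inst in np.unique(instance_map):
        mask = instance_map == inst
        rows, cols = np.nonzero(mask)
        label_id = int(np.bincount(class_map[mask]).argmax())
        objects.append(SceneObject(
            label=CLASS_NAMES.get(label_id, "unknown"),
            instance_id=int(inst),
            pixel_count=int(mask.sum()),
            centroid=(float(rows.mean()), float(cols.mean())),
            bbox=(int(rows.min()), int(cols.min()),
                  int(rows.max()), int(cols.max())),
        ))
    return objects

# Toy 4x6 frame: a 'wall' background with one 'chair' and one 'person' region.
class_map = np.zeros((4, 6), dtype=int)
instance_map = np.zeros((4, 6), dtype=int)
class_map[1:3, 1:3] = 1; instance_map[1:3, 1:3] = 1   # chair
class_map[0:2, 4:6] = 2; instance_map[0:2, 4:6] = 2   # person
for obj in group_panoptic(class_map, instance_map):
    print(obj)
```

Each `SceneObject` summarizes thousands of pixels with a label, a centroid, and a bounding box, which is far less data for the downstream Mapping and Position logic to handle.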
Labeling these objects allows the algorithms to treat them in different ways. Using AI and machine learning, our algorithms can identify and label objects ‘seen’ by autonomous devices. Once labeled, further machine learning can define how to respond to each object, significantly improving the speed and accuracy of Positioning and Mapping. For example, humans and other moving objects make poor navigation landmarks, so they can be excluded from Mapping and Position calculations.
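As a minimal sketch of that idea (the `DetectedObject` type and the list of dynamic classes below are assumptions for illustration, not Slamcore’s API), a labeled object list can be filtered so that only static objects ever reach the Mapping and Position layers:

```python
# Illustrative only: decide which labeled objects are safe to use as landmarks.
from collections import namedtuple

DetectedObject = namedtuple("DetectedObject", ["label", "position"])  # hypothetical type
DYNAMIC_CLASSES = {"person", "forklift", "dog"}  # assumed set of moving classes

def landmark_candidates(objects):
    """Drop moving objects so they never enter Mapping or Position estimates."""
    return [obj for obj in objects if obj.label not in DYNAMIC_CLASSES]

scene = [
    DetectedObject("wall", (0.0, 3.2)),
    DetectedObject("chair", (1.5, 0.8)),
    DetectedObject("person", (2.1, 1.0)),   # excluded: people move
]
print([obj.label for obj in landmark_candidates(scene)])
# -> ['wall', 'chair']
```

In a real system the policy would be richer, but the principle is the same: once an object has a label, the SLAM layers can decide how much to trust it.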
The ability to understand ‘what’ different objects are leads to important performance benefits. Autonomous devices and machines are often confused by busy scenes in which many fast-moving objects interfere with maps and localization. Processing large numbers of frequently changing pixels adds to the computational load, slowing responses. Identifying and labeling discrete objects, and removing those that are not usable as navigation landmarks, simplifies Mapping and Position calculations, increases accuracy and speed, and allows the device to continue operating as usual.
To be useful, panoptic segmentation must happen in real time, onboard the device, without relying on cloud-based data processing. This video shows Slamcore’s panoptic segmentation in action at a busy railway station.
We all dream of the day we can ask a robot to fetch a beer from the fridge. To do so, the robot must be able to understand that there is an object recognized as a ‘fridge’, that it is usually found in a room labeled as a ‘kitchen’, and that it has a door that can be opened to find other objects inside, one of which will match its understanding of a ‘beer’. This is complex and still hard to achieve, but the semantic understanding delivered by the Perceive layer of our spatial intelligence stack is the foundation. Categorizing a collection of pixels as a specific type of object, be it a ‘human’, a ‘plant’, or a ‘fridge’, is also essential to develop the understanding necessary for next-generation devices that can respond in far more natural ways.
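As a purely hypothetical sketch of the kind of knowledge this could build toward (the node types, names, and relations below are assumptions, not a Slamcore data structure), a semantic scene graph links labeled objects to the spaces that contain them and the ways they can be used:

```python
# Hypothetical semantic scene graph, for illustration only.
from dataclasses import dataclass, field

@dataclass
class SceneNode:
    name: str                                      # e.g. "fridge"
    category: str                                  # e.g. "appliance", "room"
    contains: list = field(default_factory=list)   # spatial containment
    openable: bool = False                         # simple affordance flag

beer = SceneNode("beer", "item")
fridge = SceneNode("fridge", "appliance", contains=[beer], openable=True)
kitchen = SceneNode("kitchen", "room", contains=[fridge])
home = SceneNode("home", "building", contains=[kitchen])

def find(node, name):
    """Depth-first search for a named object, returning the path to it."""
    if node.name == name:
        return [node.name]
    for child in node.contains:
        path = find(child, name)
        if path:
            return [node.name] + path
    return None

print(find(home, "beer"))   # -> ['home', 'kitchen', 'fridge', 'beer']
```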
Although this type of application is still some way off, Slamcore’s semantic understanding has real-world applications today. For example, robots in busy factories can use vision-based spatial intelligence to perceive other robots, human workers, and even spills or fallen goods, and autonomously reroute to avoid dangerous situations. Perception also enables robots to establish Position, Map their surroundings, and navigate new, previously unmapped areas far more quickly. This has significant advantages, as robots can be deployed more widely and flexibly without the need to shut down and empty a busy warehouse, supermarket, or hospital of people to allow Mapping.
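As a rough illustration of the rerouting idea (a simplified sketch under assumed grid sizes, class names, and penalty values, not Slamcore’s planner), perceived hazards can simply raise traversal costs in a grid map so that a path planner steers around them:

```python
# Illustrative sketch: raise traversal costs around perceived hazards.
import numpy as np

HAZARD_CLASSES = {"person", "spill", "fallen_goods"}  # hypothetical labels

def apply_hazard_costs(cost_grid, detections, radius=2, penalty=50.0):
    """Add a penalty to every cell within `radius` of a perceived hazard."""
    rows, cols = cost_grid.shape
    grid = cost_grid.copy()
    for label, (r, c) in detections:
        if label not in HAZARD_CLASSES:
            continue
        r0, r1 = max(0, r - radius), min(rows, r + radius + 1)
        c0, c1 = max(0, c - radius), min(cols, c + radius + 1)
        grid[r0:r1, c0:c1] += penalty
    return grid

free_space = np.ones((8, 8))                           # uniform base cost
detections = [("person", (3, 4)), ("pallet", (6, 1))]  # only the person is a hazard
print(apply_hazard_costs(free_space, detections))
```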
The power of Slamcore’s full-stack approach to spatial intelligence is that each layer builds on and contributes to the accuracy, robustness, and speed of the others. Position calculations are made more accurate with better maps, and semantic understanding improves maps and speeds up position estimates. These benefits are as relevant to autonomous robots in industrial settings as they are to consumer electronics devices and wearables that allow individuals to experience perfect alignment of real and virtual worlds as part of a metaverse experience.
If you’re working on a commercial project and want to add semantic understanding as part of your spatial intelligence stack, please get in touch.