The main contributions of this approach are as follows:
A multi-threaded, real-time VSLAM system that recognizes, localizes, and maps building components to reduce pose and localization errors
A novel methodology for extracting structural elements (i.e., rooms and corridors) from detected building components (i.e., walls and grounds)
An algorithm for verifying and enriching geometric objects with their corresponding semantic entities
Real-world experiments under various indoor conditions to assess the effectiveness of the proposed approach
The table below shows the performance of vS-Graphs on different datasets, compared with state-of-the-art methods. Performance is measured as Absolute Trajectory Error (ATE), reported in meters. Each system was evaluated over eight runs on each dataset instance.
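As a rough illustration of the metric, ATE is commonly summarized as the root mean square of the translational differences between estimated and ground-truth positions. The sketch below assumes the two trajectories are already time-associated and expressed in the same frame (the alignment step is omitted); the function name `ate_rmse` and the sample poses are hypothetical and are not taken from the vS-Graphs evaluation code.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of translational error between two aligned, equal-length trajectories.

    Each trajectory is a list of (x, y, z) positions in meters.
    Assumes poses are already time-associated and in the same frame.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(est, gt))
        for est, gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical trajectories, for illustration only.
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
estimated = [(0.0, 0.1, 0.0), (1.0, -0.1, 0.0), (2.0, 0.1, 0.0)]

ate = ate_rmse(estimated, ground_truth)
print(f"ATE (RMSE): {ate:.3f} m")
```

With eight runs per dataset instance, one such value would be computed per run and then aggregated (for example, averaged).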
Analyzing the accuracy of the reconstructed maps against the ground truth across eight iterations shows that vS-Graphs is more robust than its baseline, ORB-SLAM 3.0. Accuracy is measured using Root Mean Square Error (RMSE), reported in meters. vS-Graphs achieves a lower RMSE despite generating maps with around 10.15% fewer points (on average).
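One common way to obtain such a map-level RMSE is to take, for every reconstructed point, the distance to its nearest ground-truth point and compute the root mean square of those distances. The page does not state how the correspondence is established, so the brute-force nearest-neighbor search, the function name `map_rmse`, and the sample points below are all assumptions made for illustration.

```python
import math


def map_rmse(reconstructed, reference):
    """RMSE of nearest-neighbor distances from reconstructed to reference points.

    Both inputs are lists of (x, y, z) points in meters, assumed to be
    registered in the same coordinate frame. Brute force, O(n * m).
    """
    if not reconstructed or not reference:
        raise ValueError("both point sets must be non-empty")
    total = 0.0
    for p in reconstructed:
        # Squared distance to the closest reference point.
        total += min(
            sum((a - b) ** 2 for a, b in zip(p, q)) for q in reference
        )
    return math.sqrt(total / len(reconstructed))


# Hypothetical point sets, for illustration only.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
reconstructed = [(0.0, 0.0, 0.3), (1.0, 0.0, -0.4)]

print(f"Map RMSE: {map_rmse(reconstructed, reference):.3f} m")
```

For real point clouds, a spatial index such as a k-d tree would replace the inner loop.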
The evaluations below show the performance of vS-Graphs in terms of scene understanding. The results are presented as reconstructed maps enriched with building components (i.e., walls and ground surfaces). These building components are later used to infer the structural elements of the environment (i.e., rooms and corridors).
Scene understanding accuracy on multi-room sequences of the AutoSense dataset is shown below. Here, BC and SE refer to “building components” and “structural elements,” respectively.
Method | Sequence | Detected / Real (BC) | Detected / Real (SE) | Precision (BC) | Precision (SE) | Recall (BC) | Recall (SE) |
---|---|---|---|---|---|---|---|
vS-Graphs | MR01 | 13 / 12 | 3 / 3 | 0.92 | 1.00 | 0.92 | 1.00 |
vS-Graphs | MR02 | 12 / 13 | 3 / 3 | 1.00 | 1.00 | 0.92 | 1.00 |
vS-Graphs | MR03 | 20 / 17 | 4 / 4 | 0.89 | 1.00 | 0.94 | 1.00 |
Hydra | MR01 | N/A | 3 / 3 | N/A | 1.00 | N/A | 1.00 |
Hydra | MR02 | N/A | 5 / 3 | N/A | 0.75 | N/A | 0.75 |
Hydra | MR03 | N/A | 4 / 4 | N/A | 1.00 | N/A | 1.00 |
vGraphs (ours) | MR01 | 14 / 12 | 3 / 3 | 0.86 | 1.00 | 1.00 | 1.00 |
vGraphs (ours) | MR02 | 14 / 13 | 3 / 3 | 0.92 | 1.00 | 0.92 | 1.00 |
vGraphs (ours) | MR03 | 18 / 17 | 4 / 4 | 0.94 | 1.00 | 1.00 | 1.00 |
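For reference, precision and recall follow the standard definitions: correct detections divided by all detections, and correct detections divided by all real entities. The table lists detected and real counts but not the number of correct detections, so the `true_positives` value in the sketch below is an assumption chosen for illustration; the function name `precision_recall` is likewise hypothetical.

```python
def precision_recall(true_positives, detected, real):
    """Standard precision and recall from detection counts.

    true_positives: detections that match a real entity
    detected: total number of detections
    real: total number of real entities in the scene
    """
    if detected <= 0 or real <= 0:
        raise ValueError("detected and real must be positive")
    if not 0 <= true_positives <= min(detected, real):
        raise ValueError("true_positives must lie in [0, min(detected, real)]")
    return true_positives / detected, true_positives / real


# Counts in the style of a "12 / 13" building-component entry.
# true_positives=12 is an assumed value; the table does not report it.
precision, recall = precision_recall(true_positives=12, detected=12, real=13)
print(f"precision={precision:.2f}, recall={recall:.2f}")
# prints "precision=1.00, recall=0.92"
```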
Recognizing building components (i.e., walls and ground surfaces) and constructing the optimizable scene graph from them is one of the essential validations of vS-Graphs. Below are some of the environment maps reconstructed by the proposed framework on different datasets:
Recognizing structural elements (i.e., rooms and corridors) and constructing the optimizable scene graph from them is another essential validation of vS-Graphs. Below are some of the reconstructed environment maps, enriched with structural elements, produced by the proposed framework on different datasets:
vS-Graphs achieves real-time performance with an average processing rate of 22 ± 3 FPS, exceeding the 20 FPS threshold for real-time operation. Below, you can see the timeline of thread execution while processing a sample dataset instance.
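A figure such as "22 ± 3 FPS" can be derived from per-frame completion timestamps. The sketch below averages the instantaneous frame rates and reports their sample standard deviation; the page does not specify how the statistic was computed, so this procedure, the name `fps_stats`, and the sample timestamps are assumptions. Only the 20 FPS threshold comes from the text above.

```python
import statistics

REAL_TIME_FPS = 20.0  # real-time threshold stated in the text


def fps_stats(timestamps):
    """Mean and sample standard deviation of instantaneous FPS.

    timestamps: strictly increasing frame-completion times in seconds.
    """
    if len(timestamps) < 3:
        raise ValueError("need at least three timestamps")
    rates = []
    for start, end in zip(timestamps, timestamps[1:]):
        if end <= start:
            raise ValueError("timestamps must be strictly increasing")
        rates.append(1.0 / (end - start))
    return statistics.mean(rates), statistics.stdev(rates)


# Hypothetical timestamps, for illustration only.
timestamps = [0.00, 0.05, 0.09, 0.14, 0.18]

mean_fps, std_fps = fps_stats(timestamps)
is_real_time = mean_fps > REAL_TIME_FPS
print(f"{mean_fps:.1f} ± {std_fps:.1f} FPS, real-time: {is_real_time}")
```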
@article{vsgraphs,
title={vS-Graphs: Integrating Visual SLAM and Situational Graphs through Multi-level Scene Understanding},
author={A. Tourani and S. Ejaz and H. Bavle and D. Morilla-Cabello and J.L. Sanchez-Lopez and H. Voos},
year={2025},
url={https://arxiv.org/abs/2503.01783}
}
@article{tourani2024towards,
title={Towards Localizing Structural Elements: Merging Geometrical Detection with Semantic Verification in RGB-D Data},
author={A. Tourani and S. Ejaz and H. Bavle and J.L. Sanchez-Lopez and H. Voos},
year={2024},
url={https://arxiv.org/abs/2409.06625}
}