In vS-Graphs, the Building Components Recognition module extracts semantic entities such as walls and ground surfaces from each KeyFrame using a panoptic segmentation backbone that provides pixel-level labels and instance boundaries. A parallel Structural Elements Recognition thread then infers higher-level entities (rooms and floors) by grouping spatially consistent components into enclosed areas and aggregating them across building levels. This hierarchical reasoning enhances spatial understanding and enables geometry-aware optimization in vS-Graphs. Below, you can find some qualitative results of the generated scene graphs on different datasets, where bc and se refer to the building components and structural elements, respectively.
The table below compares the performance of vS-Graphs with state-of-the-art methods on different datasets. Accuracy is measured as the Absolute Trajectory Error (ATE), reported in meters. Each system was evaluated over eight runs on every dataset instance.
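As a rough illustration of the metric, the sketch below computes the RMSE of the ATE for two trajectories. It assumes the estimated and ground-truth positions are already time-associated and expressed in the same frame; the alignment step and the evaluation tool actually used for these experiments are not stated here. The function name `ate_rmse` and the sample positions are hypothetical.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of the translational error between two aligned trajectories.

    Both arguments are sequences of (x, y, z) positions in meters.
    Assumption: poses are already associated by timestamp and aligned.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Squared Euclidean distance between each estimated/true position pair.
    squared_errors = [
        math.dist(est, gt) ** 2 for est, gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical positions, for illustration only.
estimated = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.3, 0.0)]
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.0, 0.0, 0.0)]
print(f"ATE RMSE: {ate_rmse(estimated, ground_truth):.4f} m")
```

Lower values mean the estimated trajectory stays closer to the ground truth over the whole sequence.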
Root Mean Square Error (RMSE) values, in meters, for ORB-SLAM 3.0 and vS-Graphs across different sequences of the AutoSense dataset (over eight iterations). The results indicate that vS-Graphs generally achieves lower RMSE values while using around 10.15% fewer map points on average.
| Method | Sequence | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| vS-Graphs | SR01 | 0.2598 | 0.3102 | 0.2608 | 0.3454 | 0.3409 | 0.3122 | 0.2891 | 0.3246 |
| vS-Graphs | SR02 | 0.2333 | 0.2165 | 0.2301 | 0.2494 | 0.2499 | 0.2414 | 0.2564 | 0.2853 |
| vS-Graphs | SR03 | 0.2749 | 0.5544 | 0.2426 | 0.3509 | 0.2837 | 0.3801 | 0.2725 | 0.2920 |
| vS-Graphs | MR01 | 4.3774 | 5.5194 | 6.2763 | 4.7143 | 6.8155 | 4.4640 | 4.8322 | 4.5823 |
| vS-Graphs | MR02 | 1.1153 | 1.1836 | 0.9430 | 0.9017 | 0.9210 | 1.0799 | 0.9575 | 0.7648 |
| vS-Graphs | MR03 | 1.2933 | 0.2978 | 0.3472 | 0.3137 | 1.1864 | 0.2894 | 0.4122 | 0.2721 |
| ORB-SLAM 3.0 | SR01 | 0.2772 | 0.4435 | 0.3509 | 0.3964 | 0.2753 | 0.3006 | 0.3995 | 0.3602 |
| ORB-SLAM 3.0 | SR02 | 0.3111 | 0.2781 | 0.2862 | 0.2668 | 0.3000 | 0.2955 | 0.2408 | 0.2787 |
| ORB-SLAM 3.0 | SR03 | 0.3451 | 0.3380 | 0.3279 | 0.2931 | 0.3054 | 0.3728 | 0.3076 | 0.3435 |
| ORB-SLAM 3.0 | MR01 | 5.1266 | 5.0866 | 5.6484 | 5.6531 | 4.7557 | 6.2066 | 5.2847 | 5.2242 |
| ORB-SLAM 3.0 | MR02 | 1.1828 | 1.1521 | 0.9291 | 0.9648 | 0.9379 | 1.0269 | 1.2719 | 1.0016 |
| ORB-SLAM 3.0 | MR03 | 0.2869 | 2.5426 | 1.5765 | 2.3271 | 2.0381 | 0.2725 | 0.6926 | 0.4120 |
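To compare the two systems on a single sequence, the eight per-run RMSE values can be reduced to a mean and a standard deviation. The sketch below does this for SR01 using the values from the table above. Summarizing with mean and sample standard deviation is an assumption made for illustration, not necessarily the aggregation used by the authors.

```python
from statistics import mean, stdev

# Per-run RMSE values (meters) for sequence SR01, copied from the table above.
rmse_runs = {
    "vS-Graphs": [0.2598, 0.3102, 0.2608, 0.3454, 0.3409, 0.3122, 0.2891, 0.3246],
    "ORB-SLAM 3.0": [0.2772, 0.4435, 0.3509, 0.3964, 0.2753, 0.3006, 0.3995, 0.3602],
}

# Mean and sample standard deviation over the eight runs of each method.
summary = {method: (mean(runs), stdev(runs)) for method, runs in rmse_runs.items()}
for method, (avg, spread) in summary.items():
    print(f"{method}: mean = {avg:.4f} m, std = {spread:.4f} m")

# Relative reduction of the mean RMSE with respect to the baseline.
baseline_mean = summary["ORB-SLAM 3.0"][0]
ours_mean = summary["vS-Graphs"][0]
relative_reduction = (baseline_mean - ours_mean) / baseline_mean
print(f"Relative reduction on SR01: {relative_reduction:.1%}")
```

The same dictionary layout extends to the other sequences by adding their rows. For sequences with occasional outlier runs, such as MR03, a median may be a more robust summary than the mean.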
Number of map points generated by ORB-SLAM 3.0 and vS-Graphs on the AutoSense dataset (over eight iterations). The measurements show that vS-Graphs produces fewer points than ORB-SLAM 3.0, while positively impacting the mapping accuracy.
| Method | Sequence | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| vS-Graphs | SR01 | 6759 | 6704 | 6744 | 6741 | 6875 | 6804 | 6780 | 6842 |
| vS-Graphs | SR02 | 7304 | 7120 | 7420 | 7457 | 7421 | 7212 | 7382 | 7160 |
| vS-Graphs | SR03 | 11764 | 11591 | 11937 | 12054 | 12156 | 11844 | 11471 | 11475 |
| vS-Graphs | MR01 | 20607 | 20682 | 19692 | 20714 | 21042 | 20274 | 19433 | 20546 |
| vS-Graphs | MR02 | 16838 | 16622 | 16479 | 16563 | 17065 | 17200 | 16905 | 17274 |
| vS-Graphs | MR03 | 48364 | 46237 | 48250 | 47850 | 47149 | 49008 | 48442 | 45224 |
| ORB-SLAM 3.0 | SR01 | 6931 | 7085 | 7057 | 6903 | 7012 | 6894 | 7063 | 7130 |
| ORB-SLAM 3.0 | SR02 | 7639 | 7378 | 7548 | 7712 | 7540 | 7590 | 7363 | 7377 |
| ORB-SLAM 3.0 | SR03 | 13140 | 12631 | 12818 | 12903 | 13021 | 13187 | 12613 | 12990 |
| ORB-SLAM 3.0 | MR01 | 22564 | 22685 | 21916 | 22688 | 22552 | 22759 | 22701 | 22506 |
| ORB-SLAM 3.0 | MR02 | 18816 | 18196 | 17899 | 18547 | 17722 | 18492 | 18117 | 17643 |
| ORB-SLAM 3.0 | MR03 | 54797 | 55342 | 56531 | 55499 | 56056 | 55853 | 56187 | 54545 |
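The per-sequence reduction in map points can be derived from the counts above. The following sketch computes it for SR01 and MR03 by comparing the mean point count over the eight runs. The helper name `point_reduction` is hypothetical, and comparing per-sequence means is an assumption about how such a reduction might be summarized.

```python
# Map point counts over eight runs, copied from the table above.
map_points = {
    "SR01": {
        "vS-Graphs": [6759, 6704, 6744, 6741, 6875, 6804, 6780, 6842],
        "ORB-SLAM 3.0": [6931, 7085, 7057, 6903, 7012, 6894, 7063, 7130],
    },
    "MR03": {
        "vS-Graphs": [48364, 46237, 48250, 47850, 47149, 49008, 48442, 45224],
        "ORB-SLAM 3.0": [54797, 55342, 56531, 55499, 56056, 55853, 56187, 54545],
    },
}


def point_reduction(ours, baseline):
    """Fraction by which the mean point count of `ours` is below `baseline`."""
    ours_mean = sum(ours) / len(ours)
    baseline_mean = sum(baseline) / len(baseline)
    return (baseline_mean - ours_mean) / baseline_mean


reductions = {
    sequence: point_reduction(counts["vS-Graphs"], counts["ORB-SLAM 3.0"])
    for sequence, counts in map_points.items()
}
for sequence, reduction in reductions.items():
    print(f"{sequence}: {reduction:.1%} fewer map points than ORB-SLAM 3.0")
```

The remaining sequences can be added to `map_points` in the same layout.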
The results are in the form of reconstructed maps enriched with building components, which are later used to infer the structural elements of the environment.
| Method | Sequence | Detected / Real (BC) | Detected / Real (SE) | Precision (BC) | Precision (SE) | Recall (BC) | Recall (SE) |
|---|---|---|---|---|---|---|---|
| S-Graphs | MR01 | 11 / 14 | 4 / 4 | 0.92 | 1.00 | 0.92 | 1.00 |
| S-Graphs | MR02 | 12 / 13 | 4 / 4 | 1.00 | 1.00 | 0.92 | 1.00 |
| S-Graphs | MR03 | 20 / 22 | 6 / 6 | 0.90 | 1.00 | 0.95 | 1.00 |
| Hydra | MR01 | N/A | 4 / 4 | N/A | 1.00 | N/A | 1.00 |
| Hydra | MR02 | N/A | 6 / 4 | N/A | 0.75 | N/A | 0.75 |
| Hydra | MR03 | N/A | 6 / 6 | N/A | 1.00 | N/A | 0.80 |
| vS-Graphs | MR01 | 13 / 14 | 4 / 4 | 0.86 | 1.00 | 1.00 | 1.00 |
| vS-Graphs | MR02 | 13 / 13 | 4 / 4 | 0.92 | 1.00 | 0.92 | 1.00 |
| vS-Graphs | MR03 | 23 / 22 | 6 / 6 | 0.96 | 1.00 | 1.00 | 1.00 |
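For reference, precision and recall follow from the number of correct detections, spurious detections, and missed entities. The sketch below shows the standard definitions. The table lists only detected and real totals, not the underlying true-positive counts, so the counts used here are hypothetical and do not correspond to any row above.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Standard precision and recall from detection counts.

    precision = TP / (TP + FP): share of detections that are correct.
    recall    = TP / (TP + FN): share of real entities that were detected.
    """
    detected = true_positives + false_positives
    real = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / real if real else 0.0
    return precision, recall


# Hypothetical counts: 12 correct detections, 1 spurious, 1 missed.
precision, recall = precision_recall(
    true_positives=12, false_positives=1, false_negatives=1
)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```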
vS-Graphs achieves real-time performance with an average processing rate of 22 ± 3 FPS, exceeding the 20 FPS threshold for real-time operation.
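A figure such as "22 ± 3 FPS" can be obtained from per-frame processing times, as in the sketch below. The timings are hypothetical. Averaging the per-frame rates, and taking 20 FPS as the threshold quoted above, are assumptions about how the check might be carried out rather than a description of the actual benchmark.

```python
from statistics import mean, stdev

REAL_TIME_FPS = 20.0  # threshold quoted in the text above

# Hypothetical per-frame processing times in seconds.
frame_times_s = [0.040, 0.050, 0.045, 0.048]

# Instantaneous frame rate for each processed frame.
fps_per_frame = [1.0 / t for t in frame_times_s]

fps_mean = mean(fps_per_frame)
fps_std = stdev(fps_per_frame)
is_real_time = fps_mean >= REAL_TIME_FPS

print(f"{fps_mean:.1f} ± {fps_std:.1f} FPS, real-time: {is_real_time}")
```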
@article{vsgraphs,
title={vS-Graphs: Tightly Coupling Visual SLAM and 3D Scene Graphs Exploiting Hierarchical Scene Understanding},
author={A. Tourani and S. Ejaz and H. Bavle and M. Fernandez-Cortizas and D. Morilla-Cabello and J.L. Sanchez-Lopez and H. Voos},
year={2025},
url={https://arxiv.org/abs/2503.01783}
}
@article{tourani2024towards,
title={Towards Localizing Structural Elements: Merging Geometrical Detection with Semantic Verification in RGB-D Data},
author={A. Tourani and S. Ejaz and H. Bavle and J.L. Sanchez-Lopez and H. Voos},
year={2024},
url={https://arxiv.org/abs/2409.06625}
}