My research centers on Computer Vision, with an emphasis on using Vision-Language Models for open-vocabulary understanding of 2D and 3D environments. I also work on integrating Multimodal Large Language Models to enable interactive understanding of 3D scenes. This work includes developing algorithms that extract both geometric and semantic information from 3D environments, with applications in robotics, autonomous driving, and augmented reality.