CAVG, developed by a team from the University of Macau, is a context-aware visual grounding model for autonomous driving that integrates natural language processing with large language models. It addresses the public's cautious attitude toward fully autonomous vehicles by allowing passengers to direct the vehicle through voice commands. The model was developed in response to the Talk2Car challenge, which tasks researchers with accurately localizing the region of a front-view image that a textual command refers to.
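
To make the task concrete, the sketch below illustrates the input/output contract of a Talk2Car-style grounding model: a front-view image plus a free-form command in, a bounding box for the referred region out. The class, method, and file names are hypothetical placeholders for illustration only, not the authors' CAVG code.

```python
# Minimal sketch of a Talk2Car-style grounding interface.
# CommandGroundingModel and its predict() method are hypothetical,
# not the CAVG authors' API.

from dataclasses import dataclass
from typing import Tuple


@dataclass
class GroundingResult:
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels
    score: float                            # confidence of the prediction


class CommandGroundingModel:
    """Placeholder for a context-aware visual grounding model."""

    def predict(self, image_path: str, command: str) -> GroundingResult:
        # A real model would encode the front-view image (e.g. region
        # proposals), encode the command (e.g. with an LLM-based text
        # encoder), fuse the two, and score each candidate region.
        # Here we simply return a dummy box to show the contract.
        return GroundingResult(box=(412.0, 230.0, 588.0, 410.0), score=0.91)


if __name__ == "__main__":
    model = CommandGroundingModel()
    result = model.predict(
        image_path="front_view.jpg",
        command="Pull over next to the pedestrian in the red jacket.",
    )
    print(f"Predicted region: {result.box} (confidence {result.score:.2f})")
```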
