Semantic Scene Graph for Ultrasound Image Explanation and Scanning Guidance


1 Computer Aided Medical Procedures (CAMP), Technical University of Munich, Munich, Germany
2 Munich Center for Machine Learning (MCML), Munich, Germany
3 The Chinese University of Hong Kong, Hong Kong, China
MICCAI 2025
*Corresponding Author

Abstract

Understanding medical ultrasound imaging remains a long-standing challenge due to significant visual variability caused by differences in imaging and acquisition parameters. Recent advances in large language models (LLMs) have been used to automatically generate terminology-rich summaries oriented toward clinicians with sufficient physiological knowledge. Nevertheless, the increasing demand for improved ultrasound interpretability and basic scanning guidance among non-expert users, e.g., in point-of-care settings, has not yet been addressed. In this study, we first introduce the scene graph (SG) for ultrasound images to explain image content to ordinary users and to provide guidance for ultrasound scanning. The ultrasound SG is first computed using a transformer-based one-stage method, eliminating the need for explicit object detection. To generate a graspable image explanation for ordinary users, the user query is then used to further refine the abstract SG representation through LLMs. Additionally, the predicted SG is explored for its potential in guiding ultrasound scanning toward anatomies missing from the current imaging view, assisting ordinary users in achieving more standardized and complete anatomical exploration. The effectiveness of this SG-based image explanation and scanning guidance has been validated on images from the left and right neck regions, including the carotid and thyroid, across five volunteers. The results demonstrate the potential of the method to maximally democratize ultrasound by enhancing its interpretability and usability for ordinary users.


Framework

Framework Image

This framework integrates scene graph prediction with large language models (LLMs) for ultrasound image understanding, enabling (i) image summarization by focusing on anatomical relations of target structures and (ii) scanning guidance by leveraging orientation and movement cues.
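To make the two uses of the scene graph more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the predicted SG is available as a list of (subject, predicate, object) triplets. The names `Relation`, `graph_to_prompt`, `missing_anatomies`, and `EXPECTED_ANATOMIES`, as well as the "trachea" entry and the example predicate, are hypothetical and chosen only for illustration. No LLM is called; the sketch only builds the text that could be passed to one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One scene graph edge: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical set of anatomies expected in a complete neck scan (assumption,
# not taken from the paper; only carotid and thyroid are named in the source).
EXPECTED_ANATOMIES = {"carotid artery", "thyroid", "trachea"}


def graph_to_prompt(relations, query):
    """Serialize the scene graph and the user query into one LLM prompt string."""
    facts = "; ".join(f"{r.subject} {r.predicate} {r.obj}" for r in relations)
    return (
        f"Scene graph facts: {facts}.\n"
        f"User question: {query}\n"
        "Answer in plain language."
    )


def missing_anatomies(relations, expected=EXPECTED_ANATOMIES):
    """Return expected anatomies that appear in no relation of the current view."""
    seen = {r.subject for r in relations} | {r.obj for r in relations}
    return sorted(set(expected) - seen)


# Example: a view in which only the carotid artery and thyroid were predicted.
relations = [Relation("carotid artery", "left of", "thyroid")]
prompt = graph_to_prompt(relations, "What am I looking at?")
missing = missing_anatomies(relations)
```

In this sketch, `prompt` would be the input for image summarization, while `missing` lists the structures a guidance step could steer the probe toward. How the actual framework encodes orientation and movement cues is not specified here.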


Experimental Results


Evaluation results of different LLMs on Task I and Task II.


BibTeX

@article{li2025semantic,
  title={Semantic Scene Graph for Ultrasound Image Explanation and Scanning Guidance},
  author={Li, Xuesong and Huang, Dianye and Zhang, Yameng and Navab, Nassir and Jiang, Zhongliang},
  journal={arXiv preprint arXiv:2506.19683},
  year={2025}
}