Abstract:

A range of systems mapping approaches are widely used to support the analysis and design of public policy, but they can be time and resource intensive to implement. Generative AI tools may be able to streamline systems mapping by helping researchers quickly synthesise existing data on policy systems, freeing resources to foster greater stakeholder participation in, and use of, maps. To explore and test the potential of these tools to support systems mapping exercises, we examine the performance of seven proprietary vision-language models (VLMs) on a key task in potential workflows: extracting relevant information from images of existing system maps. VLMs are valuable because they can process textual and image data simultaneously. We test on images of three types of system map diagram: Causal Loop Diagrams, Fuzzy Cognitive Maps and Theory of Change maps, and on three formats for structuring the extracted data: DOT, JSON and Markdown tables. We find that models summarise factors in maps better than connections, with some models extracting factor labels perfectly for certain images and formats. Models appear to perform better with diagrams that have bolder graphics and when there is greater internal consistency between separate node and edge lists. We also find that models appear to omit correct information more often than they include false information, although falsehoods are still common. Our formal approach to testing introduces an empirical framework that will allow researchers to conduct similar research in the future, keeping pace as the applications and capabilities of language models continue to evolve.
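
The distinction between omitting correct information and including false information can be illustrated by scoring a model's extracted edge list against a hand-coded reference. The sketch below is only an illustration under stated assumptions: the function name `score`, the example factor labels, and the use of set-based precision and recall are hypothetical choices, not the scoring method used in the paper.

```python
def score(extracted, reference):
    """Return (precision, recall) of extracted items against a reference set.

    Low recall means correct items were omitted; low precision means
    false items were included. Exact-match comparison is an assumption.
    """
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical reference edges (source, target) coded by hand from a map image.
reference_edges = {
    ("Funding", "Staffing"),
    ("Staffing", "Service quality"),
    ("Service quality", "Public trust"),
    ("Public trust", "Funding"),
}

# Hypothetical model output: two correct edges, two omitted, one false.
extracted_edges = {
    ("Funding", "Staffing"),
    ("Staffing", "Service quality"),
    ("Funding", "Public trust"),
}

precision, recall = score(extracted_edges, reference_edges)
omitted = reference_edges - extracted_edges   # correct edges the model missed
false_edges = extracted_edges - reference_edges  # edges not in the reference
```

In this made-up case recall is lower than precision, which is the pattern described above: more correct edges are missed than false ones added. The same function could be applied to factor labels by passing sets of strings instead of edge tuples.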

Citation:

White, J. & Barbrook-Johnson, P. (2025), 'Using vision-language models to extract network data from images of system maps', INET Oxford Working Paper Series, No. 2025-26