How can LLMs improve medical report scoring?
GEMA-Score: Granular Explainable Multi-Agent Score for Radiology Report Evaluation
This paper introduces GEMA-Score, a new metric for evaluating AI-generated radiology reports. It addresses the limitations of existing metrics by scoring both the accuracy of the medical content and the quality of the language.
GEMA-Score uses a multi-agent system powered by LLMs. Each agent has a specific role: extracting medical entities, calculating objective accuracy scores, assessing subjective aspects such as readability, or combining these results into a final score with explanations. This division of labor yields a more granular and interpretable evaluation than single-LLM or traditional NLP-based methods. The multi-agent workflow and its reliance on LLMs make the approach relevant to developers exploring similar architectures for complex evaluation tasks in other domains.
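The division of roles described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: every function name, the finding vocabulary, the readability proxy, and the 0.7/0.3 weighting are assumptions made for illustration, and each "agent" is a deterministic stub standing in for what would be an LLM call.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical finding vocabulary; a real extraction agent would prompt an LLM.
FINDINGS = ("pneumonia", "effusion", "cardiomegaly", "pneumothorax")


def extract_entities(report: str) -> Set[str]:
    """Entity-extraction agent (stub): plain keyword match, no negation handling."""
    text = report.lower()
    return {finding for finding in FINDINGS if finding in text}


def objective_scores(candidate: Set[str], reference: Set[str]) -> Dict[str, float]:
    """Objective agent (stub): entity-level precision, recall and F1."""
    true_pos = len(candidate & reference)
    precision = true_pos / len(candidate) if candidate else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def subjective_score(report: str, max_avg_words: float = 20.0) -> float:
    """Subjective agent (stub): crude readability proxy in [0, 1]."""
    sentences = [s for s in report.split(".") if s.strip()]
    if not sentences:
        return 0.0
    avg_words = sum(len(s.split()) for s in sentences) / len(sentences)
    # Short sentences score 1.0; longer ones are penalized proportionally.
    return min(1.0, max_avg_words / avg_words)


@dataclass
class Evaluation:
    score: float
    missed: List[str]        # findings in the reference but not the candidate
    unsupported: List[str]   # findings in the candidate but not the reference
    explanation: str


def evaluate(candidate: str, reference: str, objective_weight: float = 0.7) -> Evaluation:
    """Aggregation agent (stub): weighted sum plus a short textual explanation."""
    cand_entities = extract_entities(candidate)
    ref_entities = extract_entities(reference)
    objective = objective_scores(cand_entities, ref_entities)
    subjective = subjective_score(candidate)

    # Assumed weighting, chosen only for this sketch.
    score = objective_weight * objective["f1"] + (1 - objective_weight) * subjective
    missed = sorted(ref_entities - cand_entities)
    unsupported = sorted(cand_entities - ref_entities)
    explanation = (
        f"entity F1={objective['f1']:.2f}, readability={subjective:.2f}; "
        f"missed: {', '.join(missed) or 'none'}; "
        f"unsupported: {', '.join(unsupported) or 'none'}"
    )
    return Evaluation(round(score, 3), missed, unsupported, explanation)


if __name__ == "__main__":
    result = evaluate(
        candidate="Right lower lobe pneumonia. Cardiomegaly is present.",
        reference="Right lower lobe pneumonia. Small left pleural effusion.",
    )
    print(result.score)
    print(result.explanation)
```

Keeping each role in its own function is what makes the final score explainable: the aggregation step can report which findings were missed or unsupported rather than returning a single opaque number. In an LLM-backed version, each stub would be replaced by a prompt to a model, with the same inputs and outputs.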