Can LLMs reliably encode expert knowledge into executable logic?
Integrating Expert Knowledge into Logical Programs via LLMs
This paper introduces ExKLoP, a framework for testing how well Large Language Models (LLMs) can translate human-readable expert knowledge (like engineering rules) into executable Python code and then automatically correct errors in that code.
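To make the rule-to-code translation concrete, here is a minimal hypothetical sketch of the kind of output ExKLoP evaluates: a human-readable engineering rule expressed as an executable Python check. The rule, function names, and threshold values are illustrative assumptions, not taken from the paper's actual prompts or dataset.

```python
# Hypothetical expert rule: "battery temperature must stay between
# -10 and 60 degrees Celsius" translated into executable Python logic.
# All names and limits here are illustrative, not from the paper.

def battery_temp_ok(temp_c: float) -> bool:
    """Return True if the reading satisfies the expert rule."""
    return -10.0 <= temp_c <= 60.0

def validate_readings(readings: list[float]) -> list[float]:
    """Collect readings that violate the rule, as a downstream check might."""
    return [t for t in readings if not battery_temp_ok(t)]

violations = validate_readings([25.0, 61.5, -12.0, 40.0])
print(violations)  # → [61.5, -12.0]
```

The point of the benchmark is that an LLM can usually produce code this syntactically clean, but may still invert a comparison or mishandle a boundary, which is a logical rather than syntactic failure.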
While LLMs excel at generating syntactically correct code, they often make logical errors. Iterative self-correction by the tested LLMs yielded only marginal improvement, suggesting that more sophisticated strategies, such as cross-model correction, may be necessary for reliable multi-agent systems in which an LLM translates expert knowledge into executable actions. The framework and dataset provide a benchmark for evaluating and improving this critical aspect of LLM-powered knowledge integration in future multi-agent applications.
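The iterative self-correction loop described above can be sketched as follows. This is an assumed structure, not the paper's implementation: `ask_llm_to_fix` is a stand-in for a real model call that would receive the failing code and the error message.

```python
# Hypothetical sketch of an iterative self-correction loop: run the
# generated code, feed any error back to the model, and retry.
# `ask_llm_to_fix` is a placeholder for a real LLM call.

def ask_llm_to_fix(code: str, error: str) -> str:
    # A real system would prompt the LLM with the code and error message;
    # this stub returns the code unchanged, so repair always fails here.
    return code

def self_correct(code: str, max_rounds: int = 3) -> tuple[str, bool]:
    """Try to execute `code`; on failure, ask for a fix and retry."""
    for _ in range(max_rounds):
        try:
            exec(compile(code, "<generated>", "exec"), {})
            return code, True
        except Exception as err:
            code = ask_llm_to_fix(code, str(err))
    return code, False

_, ok = self_correct("x = 1 +")  # syntactically broken snippet
print(ok)  # → False: the stub fixer cannot repair it
```

The paper's finding that such loops give only marginal gains suggests the single model repeatedly "fixes" code using the same flawed reasoning that produced it, which is why cross-model correction is proposed as an alternative.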