Can LLMs repair code on SWE-Bench?
Exploring the Potential of Conversational Test Suite Based Program Repair on SWE-bench
October 8, 2024
https://arxiv.org/pdf/2410.04485

This paper studies how well a conversational AI system can fix software bugs (specifically on the SWE-bench benchmark).
- Using either LLaMA or GPT-4, the authors gave the model a conversational "back-and-forth" in which it received feedback on whether its proposed code changes passed the test suite (a minimal sketch of this loop follows the list).
- This conversational approach, while simple, generated working patches more often than just repeatedly asking for a fix without feedback.
- This suggests that even basic multi-agent interaction (AI agent + testing environment) is promising for program repair tasks.
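To make the setup concrete, here is a minimal sketch of such a test-feedback repair loop. It assumes a generic `call_llm` chat helper supplied by the caller, a test command for the target repository, and illustrative prompts; none of these names or message formats are taken from the paper itself.

```python
# Minimal sketch of a conversational, test-suite-driven repair loop.
# `call_llm` is a hypothetical stand-in for any chat-completion client;
# prompts and message structure are illustrative, not the paper's exact setup.
import pathlib
import subprocess


def run_tests(repo_dir: str, test_cmd: list[str]) -> tuple[bool, str]:
    """Run the project's test suite; return (passed, combined output)."""
    proc = subprocess.run(test_cmd, cwd=repo_dir, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr


def conversational_repair(issue: str, buggy_file: str, repo_dir: str,
                          test_cmd: list[str], call_llm, max_rounds: int = 5):
    """Ask the model for a fix, run the tests, and feed failures back until they pass."""
    path = pathlib.Path(repo_dir) / buggy_file
    messages = [
        {"role": "system", "content": "You repair bugs. Reply with the complete corrected file."},
        {"role": "user", "content": f"Issue:\n{issue}\n\nCurrent file:\n{path.read_text()}"},
    ]
    for _ in range(max_rounds):
        candidate = call_llm(messages)            # model proposes a full replacement file
        path.write_text(candidate)                # apply the candidate patch
        passed, log = run_tests(repo_dir, test_cmd)
        if passed:
            return candidate                      # working patch: the test suite now passes
        # Conversational step: show the failing test output and ask for a revision.
        messages.append({"role": "assistant", "content": candidate})
        messages.append({"role": "user",
                         "content": f"The tests failed:\n{log[-2000:]}\n\nPlease revise the file."})
    return None                                   # no passing patch within the round budget
```

The contrast with the paper's no-feedback baseline is the two `messages.append` lines: instead of resampling a fix from the same prompt, each new attempt is conditioned on the previous candidate and its test failures.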