How can LLMs build layered images procedurally?
LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration
April 2, 2025
https://arxiv.org/pdf/2504.00010
LayerCraft enhances text-to-image generation by using multiple AI agents coordinated by an LLM (GPT-4). It allows users to customize objects within a generated image with complex spatial relationships and fine-grained control via natural language. ChainArchitect uses chain-of-thought reasoning to create a 3D scene layout from a user's prompt. OIN (Object Integration Network) uses LoRA fine-tuning to seamlessly insert user-specified objects into designated locations within the image. This agent-based approach offers improved control, simplifies complex scene generation, and enables creative image customization without requiring extensive user input or technical expertise.
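
To make the layered idea concrete, here is a minimal sketch of how a planned layout could drive object placement. It is not LayerCraft's code: the `LayoutObject` fields, the `insertion_order` and `composite` helpers, and the depth convention are all assumptions made for illustration. The layout is written by hand where ChainArchitect would plan it, and a simple back-to-front painter's pass on a character grid stands in for the learned insertion that OIN performs.

```python
from dataclasses import dataclass


@dataclass
class LayoutObject:
    """One entry in a hypothetical scene layout (field names are illustrative)."""
    name: str
    box: tuple    # (left, top, right, bottom) in grid cells; right/bottom exclusive
    depth: float  # assumed convention: larger means farther from the viewer


def insertion_order(layout):
    """Return objects back to front, so nearer objects are placed last."""
    return sorted(layout, key=lambda obj: obj.depth, reverse=True)


def composite(layout, width, height, background="."):
    """Paint each object's box onto a character grid, farthest object first."""
    canvas = [[background] * width for _ in range(height)]
    for obj in insertion_order(layout):
        left, top, right, bottom = obj.box
        # Clip the box to the canvas so out-of-range layouts do not crash.
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                canvas[y][x] = obj.name[0]
    return ["".join(row) for row in canvas]


# Hand-written layout standing in for a planner's output.
layout = [
    LayoutObject("cat", (2, 1, 5, 3), depth=1.0),
    LayoutObject("sofa", (0, 1, 8, 4), depth=2.0),
    LayoutObject("window", (3, 0, 7, 2), depth=3.0),
]
rows = composite(layout, width=8, height=4)
print("\n".join(rows))
```

Sorting by depth before placement means the cat overwrites the sofa cells it overlaps, and the sofa overwrites the lower part of the window, which is the occlusion behaviour a layered pipeline needs to get right regardless of how each object is actually rendered.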