Version history
1 version on record, shown newest first; the live version sits at the top with a live indicator.
- Live: 4/27/2026, 2:59:17 PM
Content snapshot
{
  "doi": "10.48550/arxiv.2303.17651",
  "abstract": "Like humans, large language models (LLMs) do not always generate the best output on their first try. Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improving initial outputs from LLMs through iterative feedback and refinement. The main idea is to generate an initial output using an LLM; then, the same LLM provides feedback for its output and uses it to refine itself, iteratively. Self-Refine does not require any supervised training data, additional training, or reinforcement learning, and instead uses a single LLM as the generator, refiner, and feedback provider. We evaluate Self-Refine across 7 diverse tasks, ranging from dialog response generation to mathematical reasoning, using state-of-the-art (GPT-3.5, ChatGPT, and GPT-4) LLMs. Across all evaluated tasks, outputs generated with Self-Refine are preferred by humans and automatic metrics over those generated with the same LLM using conventional one-step generation, improving by ~20% absolute on average in task performance. Our work demonstrates that even state-of-the-art LLMs like GPT-4 can be further improved at test time using our simple, standalone approach.",
  "journal": "Neural Information Processing Systems",
  "year": 2023,
  "authors": "Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri et al.",
  "url": "http://arxiv.org/pdf/2303.17651",
  "external_ids": {
    "doi": "10.48550/arxiv.2303.17651",
    "s2_id": "257900871",
    "source": "openalex",
    "openalex": "https://openalex.org/W4362508231",
    "normalized_at": "2026-04-27T00:33:10+00:00",
    "normalized_by_task": "7bc2fe4a-bef7-4b2e-83d4-18bbaeba487b",
    "scientist_author_slugs": [ "peter-clark" ],
    "scientist_author_orcids": [ "0000-0002-8006-7015" ]
  },
  "citation_count": 3244
}
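The abstract describes a generate/feedback/refine loop in which a single LLM plays all three roles. A minimal sketch of that control flow, under the assumption that `generate`, `feedback`, and `refine` are caller-supplied stand-ins for prompting the same model (the function names, the `stop` predicate, and the iteration cap are illustrative, not the paper's API):

```python
def self_refine(task, generate, feedback, refine, max_iters=4, stop=None):
    """Iteratively improve an initial output using the model's own feedback.

    `generate`, `feedback`, and `refine` are hypothetical callables standing
    in for three prompts to the same LLM; `stop` optionally ends the loop
    early when the feedback indicates the output is good enough.
    """
    output = generate(task)                  # initial one-step draft
    for _ in range(max_iters):
        fb = feedback(task, output)          # model critiques its own output
        if stop is not None and stop(fb):    # optional early-stopping check
            break
        output = refine(task, output, fb)    # model revises using the critique
    return output
```

With deterministic stub callables in place of the model, the loop can be exercised end to end; note that no training or reinforcement learning is involved, only repeated inference-time calls.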