Version history
1 version on record, shown newest first; the live version sits at the top with a live indicator.
- Live: 4/27/2026, 2:59:17 PM
Content snapshot
{
  "doi": "10.48550/arxiv.2303.17651",
  "abstract": "Like humans, large language models (LLMs) do not always generate the best output on their first try. Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improving initial outputs from LLMs through iterative feedback and refinement. The main idea is to generate an initial output using an LLM; then, the same LLM provides feedback for its output and uses it to refine itself, iteratively. Self-Refine does not require any supervised training data, additional training, or reinforcement learning, and instead uses a single LLM as the generator, refiner, and feedback provider. We evaluate Self-Refine across 7 diverse tasks, ranging from dialog response generation to mathematical reasoning, using state-of-the-art (GPT-3.5, ChatGPT, and GPT-4) LLMs. Across all evaluated tasks, outputs generated with Self-Refine are preferred by humans and automatic metrics over those generated with the same LLM using conventional one-step generation, improving by ~20% absolute on average in task performance. Our work demonstrates that even state-of-the-art LLMs like GPT-4 can be further improved at test time using our simple, standalone approach.",
  "journal": "Neural Information Processing Systems",
  "year": 2023,
  "authors": "Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri et al.",
  "url": "http://arxiv.org/pdf/2303.17651",
  "external_ids": {
    "doi": "10.48550/arxiv.2303.17651",
    "s2_id": "257900871",
    "source": "openalex",
    "openalex": "https://openalex.org/W4362508231",
    "normalized_at": "2026-04-27T00:33:10+00:00",
    "normalized_by_task": "7bc2fe4a-bef7-4b2e-83d4-18bbaeba487b",
    "scientist_author_slugs": [ "peter-clark" ],
    "scientist_author_orcids": [ "0000-0002-8006-7015" ]
  },
  "citation_count": 3244
}
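The abstract describes a generate/feedback/refine loop in which a single LLM plays all three roles. A minimal sketch of that control flow, under the assumption that `generate`, `feedback`, and `refine` are caller-supplied stand-ins for prompting the same model (the function names, the `stop` predicate, and the iteration cap are illustrative, not the paper's API):

```python
def self_refine(task, generate, feedback, refine, max_iters=4, stop=None):
    """Iteratively improve an initial output using the model's own feedback.

    `generate`, `feedback`, and `refine` are hypothetical callables standing
    in for three prompts to the same LLM; `stop` optionally ends the loop
    early when the feedback indicates the output is good enough.
    """
    output = generate(task)                  # initial one-step draft
    for _ in range(max_iters):
        fb = feedback(task, output)          # model critiques its own output
        if stop is not None and stop(fb):    # optional early-stopping check
            break
        output = refine(task, output, fb)    # model revises using the critique
    return output
```

With deterministic stub callables in place of the model, the loop can be exercised end to end; note that no training or reinforcement learning is involved, only repeated inference-time calls.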