Reflexion Loops

Most people use AI like a vending machine: prompt in, answer out. But the best results usually come from iteration: draft → critique → revise. Reflexion takes that human workflow and makes it systematic.
Reflexion is an automated self-correction loop where a model:
- produces an attempt,
- reflects on what’s wrong (or what could be better),
- stores that feedback as “lessons,” and
- retries with improvements.
For AI engineers, this is a pattern for building more reliable agents. For non-technical professionals, it’s a way to get higher-quality outputs without endless back-and-forth.
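Here is what that loop looks like as code. This is a minimal sketch, not a library: `llm(prompt)` stands in for whatever model call you use, and the retry budget and prompt wording are illustrative.

```python
# Minimal Reflexion loop. `llm` is a placeholder for your own model call
# (OpenAI, Anthropic, a local model, ...) that takes a prompt and returns text.
def reflexion(task: str, rubric: str, llm, max_tries: int = 3) -> str:
    lessons: list[str] = []  # the "lessons" memory: critiques from earlier attempts
    attempt = llm(f"Task: {task}")
    for _ in range(max_tries):
        critique = llm(
            f"Critique this attempt against the rubric.\n"
            f"Rubric:\n{rubric}\n\nAttempt:\n{attempt}\n"
            f"Reply with only PASS if every rubric item is satisfied."
        )
        if critique.strip() == "PASS":
            return attempt                    # good enough: stop iterating
        lessons.append(critique)              # store the feedback as a lesson
        attempt = llm(
            f"Task: {task}\n"
            f"Avoid these mistakes from earlier attempts:\n" + "\n".join(lessons)
            + f"\nPrevious attempt:\n{attempt}\nReturn an improved attempt."
        )
    return attempt                            # best effort once the retry budget runs out
```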
Reflexion in One Line
Reflexion = generate → critique → learn → retry. It’s not magic—just structured iteration.
Why It Works
Reflexion helps with three common issues:
- Overconfidence: the model commits to its first answer without checking it.
- Format drift: the output slowly stops matching your desired structure.
- Missed constraints: key requirements get ignored (word limits, tone, policies, schema).
A loop forces the model to compare its output against an explicit rubric and fix the gaps.
How to Prompt Reflexion
A simple Reflexion prompt includes:
- a first-pass task
- a review rubric (what “good” means)
- a revision request
- optionally, a mistake log the model should maintain
Engineers often implement this as two prompts (generator + critic). Non-technical users can do it in one prompt by forcing two phases.
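A sketch of that two-prompt split, assuming the OpenAI Python SDK's chat-completions interface; the model name, task, and rubric here are placeholders, so swap in your own client and criteria.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

def ask(system: str, user: str) -> str:
    # One chat call; "gpt-4o-mini" is just an example model name.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
    )
    return resp.choices[0].message.content

TASK = "Draft a 100-word product update for enterprise customers."      # placeholder task
RUBRIC = "- Under 100 words\n- No unreleased features mentioned\n- Ends with a call to action"

# Generator: produce the first attempt.
draft = ask("You write customer-facing product updates.", TASK)

# Critic: judge the draft against the rubric and nothing else.
critique = ask(
    "You review drafts against a rubric. Mark each item [x] pass or [ ] fail, then explain failures.",
    f"Rubric:\n{RUBRIC}\n\nDraft:\n{draft}",
)

# Generator again: revise using its own draft plus the critique.
final = ask(
    "You write customer-facing product updates.",
    f"{TASK}\n\nPrevious draft:\n{draft}\n\nCritique:\n{critique}\n\nReturn only the revised draft.",
)
print(final)
```

The single-prompt version collapses the same structure into one reply with labeled phases, which is exactly what the two examples below do.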
Example 1: Engineer Mode (Agent Output Reliability)
You’re building an LLM tool that generates SQL for analytics. You want correctness and safety.
Context: You are an AI engineer. The model must generate BigQuery SQL for a production dashboard.
Instruction: Use Reflexion. Phase 1: generate the SQL. Phase 2: critique it against the rubric. Phase 3: output a corrected final SQL.
Input Data:
- Goal: “Weekly active users by country for the last 8 weeks.”
- Tables: users(user_id, country), events(user_id, event_name, event_ts)
- WAU definition: distinct users with >=1 event in the week.
Rubric:
- Uses correct time window (last 8 complete weeks)
- No Cartesian joins
- Uses safe filters and clear aliases
- Outputs columns: week_start, country, wau
Output Indicator:
- Return three sections titled: DRAFT_SQL, CRITIQUE, FINAL_SQL
- In CRITIQUE, list failures as checkboxes: [ ] / [x]
Why this works: you’re forcing the model to self-audit using concrete criteria, then revise. This is basically “unit tests for prompting.”
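Because the output sections are named and the critique uses checkboxes, you can also gate the result in code. A rough sketch, assuming the model honors the DRAFT_SQL / CRITIQUE / FINAL_SQL headings and that an unchecked box ([ ]) means the item still fails:

```python
import re

def split_sections(response: str) -> dict[str, str]:
    """Split the reply into its DRAFT_SQL / CRITIQUE / FINAL_SQL blocks."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in response.splitlines():
        header = line.strip().rstrip(":")
        if header in ("DRAFT_SQL", "CRITIQUE", "FINAL_SQL"):
            current = header
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}

def failing_items(critique: str) -> list[str]:
    """Rubric lines the model left unchecked ([ ]), i.e. self-reported failures."""
    return re.findall(r"^\s*\[ \]\s*(.+)$", critique, flags=re.MULTILINE)

# Toy reply, standing in for the model's actual answer to the prompt above.
reply = """DRAFT_SQL:
SELECT 1
CRITIQUE:
[x] Uses correct time window (last 8 complete weeks)
[ ] Outputs columns: week_start, country, wau
FINAL_SQL:
SELECT 1"""

sections = split_sections(reply)
failures = failing_items(sections["CRITIQUE"])
if failures:
    print("Retry needed, unresolved rubric items:", failures)
else:
    print("FINAL_SQL passes its own critique:", sections["FINAL_SQL"])
```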
Example 2: Business Mode (Legal/Exec Tone + Accuracy)
You’re a founder responding to a contract redline summary and need clarity without overpromising.
Context: You are assisting a CEO. We need to reply to a vendor about contract terms without giving legal advice.
Instruction: Use Reflexion. Phase 1: draft a reply. Phase 2: critique the reply using the rubric. Phase 3: produce a revised final reply.
Input Data:
- Vendor asks: “Can you confirm we can reuse your deliverables in future client work?”
- Our preference: allow reuse only for internal portfolio, not resale or reuse across clients.
Rubric:
- Clear boundary (what’s allowed vs not allowed)
- Professional, calm tone
- No legal posturing (“this is not legal advice” is okay; threats are not)
- Asks for confirmation of specific wording
Output Indicator:
- Provide: DRAFT, CRITIQUE (bullets), FINAL
- Keep FINAL under 140 words
The loop makes the model check tone and constraints before you hit send.
Make the Rubric Explicit
Reflexion is only as good as the checklist. If you don’t define what “good” means, the critique becomes generic and revision won’t improve much.
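For instance, reusing the contract example above, the difference between a rubric the critique can actually fail and one it can't looks like this (the exact wording is illustrative):

```python
# Too vague: the critique will just say "looks fine" or invent nitpicks.
weak_rubric = ["Be accurate", "Sound professional"]

# Explicit: every item is checkable, so the critique has something concrete to fail.
explicit_rubric = [
    "Under 140 words",
    "States what reuse is allowed (internal portfolio) and what is not (resale, other clients)",
    "No threats or legal posturing",
    "Asks the vendor to confirm specific wording",
]
```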
Takeaway
Reflexion turns prompting into a quality system: draft, critique, revise—automatically. Use it when accuracy matters, when format must be strict, or when you’re building agents that need to improve over repeated attempts. The win isn’t that AI becomes perfect—it’s that you build a loop that catches mistakes before they ship.
