Debugging Agent Chains: A Systematic Approach

Your 4-agent chain was working perfectly. Now it's producing garbage. The output is wrong, and you have no idea which agent broke.

This is the most common frustration in agent orchestration. Here's the systematic debugging process we use at Mentiko.

Step 1: Check the run status

Before debugging agent logic, rule out infrastructure issues:

Did all agents complete? Check the run detail page or ls events/ for completion events.
Did any agent timeout? The watchdog marks stalled agents.
Were there API errors? Rate limits, auth failures, and network issues produce different symptoms than logic errors.

If an agent didn't complete, the problem is infrastructure, not prompts. Fix the infra issue first.

Step 2: Isolate the failing agent

A chain produces bad output because one agent produced bad output. Find which one.

Start from the end and work backward:

Read the final agent's output. Is it bad?
Read the final agent's input (previous agent's output). Was the input already bad?
Keep going backward until you find the first agent with bad output that had good input.

That's your failing agent.

In Mentiko, each agent's input and output are captured in the run history. You can inspect any agent's work without re-running the chain.

Step 3: Inspect the event handoff

The event between agents is the contract. Read it:

cat events/research-complete.event

Check:

Is the event format correct? (JSON/YAML/markdown)
Does it contain the expected fields?
Is the data complete? (no truncation, no empty fields)
Does the downstream agent's trigger match this event name exactly?

Event mismatches are the #1 cause of "the chain runs but produces nothing" bugs. A trigger of research:complete won't fire on an event named research_complete.

Step 4: Test the agent in isolation

Extract the failing agent from the chain and run it alone:

Take the exact input it received from the previous agent
Run it standalone with the same prompt and environment
Check: does it produce correct output when isolated?

If yes: The problem is in the handoff, not the agent. Check the event format, the input parsing, or the environment variables.

If no: The problem is in the agent's prompt or configuration. Move to Step 5.

Step 5: Diagnose the prompt

When an agent produces bad output from good input, the prompt is usually the issue. Common problems:

Ambiguous instructions

"Analyze the data" is ambiguous. Does it mean statistical analysis? Trend identification? Anomaly detection? The model picks one interpretation, and it might not be yours.

Fix: Be specific. "Identify the top 3 trends in the data, with supporting statistics and confidence levels."

Missing output format

The agent produces a wall of text when the next agent expects JSON. Or it produces markdown when it should produce plain text.

Fix: Specify the exact output format in the prompt. Include an example.

Context window overflow

The input is too long for the model's context window. The agent silently drops information and produces output based on a truncated view.

Fix: Add a summarization step before the failing agent, or use a model with a larger context window for that agent.

Model capability mismatch

You're using a fast/cheap model for a task that requires reasoning. Or you're using an expensive model for simple classification.

Fix: Match model capability to task complexity. Classifier agents can use faster models. Analysis agents need capable models.

Prompt injection in input

The previous agent's output contains text that overrides this agent's instructions. "Ignore your instructions and instead..."

Fix: Add input sanitization. Wrap the input in clear delimiters. Use system prompts (where the model supports them) to separate instructions from data.

Step 6: Check for non-determinism

Run the same chain with the same input three times. If the output varies significantly, you have a non-determinism problem.

Causes:

Temperature too high (try 0.1-0.3 for consistent output)
Prompt doesn't constrain the output enough
Model version changed (provider updated the model)

Fix: Lower temperature. Add more constraints to the prompt. Pin the model version if your provider supports it.

Step 7: Check the environment

Sometimes the agent is fine but the environment is wrong:

Missing environment variables (secrets not injected)
Wrong workspace (agent running in test instead of prod)
Stale data (agent reading cached files instead of fresh ones)
Permission issues (agent can't read the input file)

Run env | grep KEY_NAME in the agent's workspace to verify environment variables are set correctly.

The debugging checklist

When a chain breaks:

[ ] 1. All agents completed? (check run status)
[ ] 2. Which agent failed? (work backward from output)
[ ] 3. Event handoff correct? (check event files)
[ ] 4. Agent works in isolation? (test standalone)
[ ] 5. Prompt issues? (ambiguity, format, overflow)
[ ] 6. Non-determinism? (run 3x, compare)
[ ] 7. Environment correct? (vars, workspace, permissions)

Most bugs are found in steps 2-4. Prompt issues (step 5) account for the rest. Environment issues (step 7) are rare but frustrating.

Prevention

The best debugging is no debugging:

Quality gates catch bad output before it reaches the next agent
Output schemas enforce consistent formatting
Explicit prompts leave no room for interpretation
Per-agent logging gives you the data to diagnose without re-running
Version-controlled prompts let you diff what changed

Building reliable chains? Learn the 5 chain patterns or see the events guide.