Data Flow Patterns in Agent Chains: Files, Events, and Shared State
Mentiko Team
Every agent chain has two problems to solve: what to do (the agents) and how to pass data between them (the plumbing). Most tutorials focus on the agents and hand-wave the data flow. Then you get to production and discover that how data moves between agents determines whether your chain is debuggable, scalable, and reliable -- or a black box that works until it doesn't.
Mentiko supports three primary data flow patterns: file-based handoff, event payloads, and shared state directories. Each has different tradeoffs around visibility, size limits, performance, and debugging. Here's when to use each and how they work in practice.
File-Based Handoff
This is Mentiko's default and most distinctive pattern. When an agent completes, it writes its output to a file in the chain's run directory. The next agent reads from that file. Every intermediate result is a file on disk that you can inspect, diff, and replay.
{
  "name": "data-pipeline",
  "agents": [
    {
      "name": "extractor",
      "prompt": "Extract customer records from the CSV. Output as a JSON array.",
      "triggers": ["chain:start"],
      "emits": ["extraction:complete"],
      "output": "extracted_records.json"
    },
    {
      "name": "transformer",
      "prompt": "Normalize the extracted records: lowercase emails, format phone numbers, deduplicate by email.",
      "triggers": ["extraction:complete"],
      "input": "extracted_records.json",
      "output": "transformed_records.json",
      "emits": ["transform:complete"]
    },
    {
      "name": "loader",
      "prompt": "Insert the transformed records into the database via the API.",
      "triggers": ["transform:complete"],
      "input": "transformed_records.json",
      "emits": ["chain:complete"]
    }
  ]
}
Each agent declares an output file it writes to and an input file it reads from. The chain's run directory might look like this after completion:
runs/data-pipeline/run-2026-03-19-001/
  chain.json                 # chain definition snapshot
  extracted_records.json     # extractor output
  transformed_records.json   # transformer output
  events/
    chain-start.json
    extraction-complete.json
    transform-complete.json
    chain-complete.json
Every intermediate result is visible. If the loader fails, you can inspect transformed_records.json to see exactly what it received. If the transformer produces bad data, you can look at extracted_records.json to see if the problem originated upstream. You can even re-run the transformer manually by feeding it the extractor's output file -- no need to re-run the entire chain.
This is the single biggest advantage of file-based handoff: total debuggability. There's no message queue to peek into, no in-memory state that vanished when the process died, no "let me add a log statement and run it again." The data is right there, in a file, always.
The tradeoff is disk I/O and size. If your agent produces a 500MB dataset, writing it to disk and reading it back takes time. For most workloads (JSON documents, text content, structured data under 50MB) this is negligible. For large binary datasets, you'll want the shared state pattern instead.
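That manual replay is worth making concrete. A minimal sketch, in plain Python rather than anything Mentiko ships: `normalize_records` reimplements the transformer's rules (lowercase emails, dedupe by email) as a hypothetical illustration, and `replay_transformer` runs just that step against a saved intermediate file from the run directory above.

```python
import json
from pathlib import Path

def normalize_records(records):
    """The transformer's rules: lowercase emails, deduplicate by email."""
    seen = set()
    out = []
    for rec in records:
        email = rec.get("email", "").strip().lower()
        if not email or email in seen:
            continue  # skip records with a missing or already-seen email
        seen.add(email)
        out.append({**rec, "email": email})
    return out

def replay_transformer(run_dir):
    """Re-run the transformer step against a saved intermediate file."""
    run_dir = Path(run_dir)
    records = json.loads((run_dir / "extracted_records.json").read_text())
    result = normalize_records(records)
    (run_dir / "transformed_records.json").write_text(json.dumps(result, indent=2))
    return result
```

Because every intermediate result is a file, this kind of targeted replay needs no orchestrator at all -- point a script at the run directory and re-execute one step.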
Event Payloads
Events in Mentiko aren't just signals -- they carry data. When an agent emits an event, it can attach a payload. The downstream agent receives that payload as part of its trigger context.
{
  "name": "notification-pipeline",
  "agents": [
    {
      "name": "detector",
      "prompt": "Monitor the log stream for error patterns. When you find one, emit an alert with the error type, timestamp, affected service, and severity.",
      "triggers": ["chain:start"],
      "emits": ["alert:detected"],
      "emit_payload": {
        "error_type": "{{detected_type}}",
        "service": "{{affected_service}}",
        "severity": "{{severity_level}}",
        "timestamp": "{{detection_time}}"
      }
    },
    {
      "name": "enricher",
      "prompt": "Look up the affected service in the runbook. Attach the escalation path and known mitigations.",
      "triggers": ["alert:detected"],
      "emits": ["alert:enriched"],
      "emit_payload": {
        "original_alert": "{{trigger_payload}}",
        "escalation_path": "{{runbook_escalation}}",
        "mitigations": "{{runbook_mitigations}}"
      }
    },
    {
      "name": "notifier",
      "prompt": "Format and send the enriched alert to the appropriate channel based on severity.",
      "triggers": ["alert:enriched"],
      "emits": ["chain:complete"]
    }
  ]
}
The detector's payload flows through the enricher and into the notifier. Each agent adds to the payload. The enricher wraps the original alert inside its own payload using trigger_payload, which gives the notifier the complete context chain.
Event payloads work well for small, structured data: IDs, status codes, metadata, routing decisions, scores. They're fast because they travel with the event itself -- no separate file read required. They're also easy to route on: you can have conditional triggers that check payload fields, not just event names.
The limitation is size. Event payloads should stay under a few kilobytes. If you're trying to pass a full document, a dataset, or a generated report through an event payload, use file-based handoff instead. Payloads are for metadata and routing context, not bulk data.
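Both rules of thumb are easy to enforce in code. A sketch, with an assumed 4KB limit (Mentiko's actual threshold, if it enforces one, may differ): `check_payload` rejects payloads that should have been files, and `should_trigger` shows the shape of a conditional trigger that routes on a payload field rather than an event name.

```python
import json

MAX_PAYLOAD_BYTES = 4096  # assumed limit for "a few kilobytes"

def check_payload(payload):
    """Reject payloads that should be files instead of event metadata."""
    size = len(json.dumps(payload).encode("utf-8"))
    if size > MAX_PAYLOAD_BYTES:
        raise ValueError(
            f"payload is {size} bytes; write it to a file and pass the path instead"
        )
    return payload

def should_trigger(payload, field, expected):
    """A conditional-trigger predicate: fire only when a payload field matches."""
    return payload.get(field) == expected
```

The useful habit is the error message: when a payload is too big, the fix is never "raise the limit" -- it's "write a file and pass the path."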
Shared State Directories
Some chains need a shared workspace that multiple agents can read from and write to over the course of the run. A shared state directory acts as a scratch pad that persists across agents.
{
  "name": "research-project",
  "shared_state": {
    "directory": "workspace",
    "cleanup": "on_complete"
  },
  "agents": [
    {
      "name": "planner",
      "prompt": "Read the research brief. Create a research plan with 5 subtopics. Write the plan to workspace/plan.json.",
      "triggers": ["chain:start"],
      "emits": ["plan:ready"]
    },
    {
      "name": "researcher-a",
      "prompt": "Research subtopics 1 and 2 from workspace/plan.json. Write findings to workspace/findings-a.md.",
      "triggers": ["plan:ready"],
      "emits": ["research:partial"]
    },
    {
      "name": "researcher-b",
      "prompt": "Research subtopics 3, 4, and 5 from workspace/plan.json. Write findings to workspace/findings-b.md.",
      "triggers": ["plan:ready"],
      "emits": ["research:partial"]
    },
    {
      "name": "synthesizer",
      "prompt": "Read all files in workspace/. Combine the plan and all findings into a final report. Write to workspace/final-report.md.",
      "triggers": ["research:partial"],
      "collect": 2,
      "emits": ["chain:complete"]
    }
  ]
}
The shared_state block declares a workspace directory that exists for the duration of the run. Every agent can read from and write to this directory. The planner writes the plan. Both researchers read the plan and write their findings. The synthesizer reads everything and produces the final output.
Shared state directories are useful when the data flow isn't a clean pipeline. In this example, both researchers need to read the plan (which isn't their trigger -- it was a previous agent's output) and they both write separate files. The synthesizer reads all of it. This many-to-many data flow is awkward with file-based handoff (which is one-to-one) and impossible with event payloads (which are too small).
The risk with shared state is the same as with any shared mutable resource: race conditions. If two agents write to the same file simultaneously, the last write silently wins and the other is lost. Mentiko mitigates this with per-file locking, but the better approach is to design your agents to write to separate files. findings-a.md and findings-b.md never conflict because they're different files.
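Both mitigations are worth seeing side by side. A POSIX-only sketch (this is an illustration of the general technique, not Mentiko's actual locking code): `locked_write` takes an exclusive advisory lock before writing, and `agent_output_path` shows the better design -- derive a per-agent filename so writes can never collide in the first place.

```python
import fcntl
from pathlib import Path

def locked_write(path, text):
    """Write under an exclusive advisory lock (POSIX-only sketch)."""
    with open(path, "w") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until no other writer holds the lock
        try:
            f.write(text)
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)

def agent_output_path(workspace, agent_name):
    """The safer design: give each agent its own file so writes never conflict."""
    return Path(workspace) / f"findings-{agent_name}.md"
```

Locking serializes writers; separate files remove the contention entirely, which is why it's the better default.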
The cleanup field determines what happens to the workspace after the chain completes. on_complete deletes it to save disk space. retain keeps it for debugging. retain_on_error keeps it only when the chain fails, which is a good default for production -- you get cleanup on success and investigation capability on failure.
Combining Patterns
Real chains mix patterns. Event payloads carry routing metadata while file-based handoff moves the actual data. Shared state holds configuration that multiple agents reference.
{
  "name": "content-workflow",
  "shared_state": {
    "directory": "workspace",
    "cleanup": "retain_on_error"
  },
  "agents": [
    {
      "name": "intake",
      "prompt": "Parse the content request. Write the brief to workspace/brief.json. Determine the content type.",
      "triggers": ["chain:start"],
      "emits": ["content:blog-post", "content:social-thread", "content:email"],
      "emit_payload": {
        "content_type": "{{detected_type}}",
        "priority": "{{request_priority}}"
      }
    },
    {
      "name": "blog-writer",
      "prompt": "Read workspace/brief.json. Write the blog post to workspace/draft.md.",
      "triggers": ["content:blog-post"],
      "output": "workspace/draft.md",
      "emits": ["draft:complete"]
    },
    {
      "name": "social-writer",
      "prompt": "Read workspace/brief.json. Write the social thread to workspace/draft.md.",
      "triggers": ["content:social-thread"],
      "output": "workspace/draft.md",
      "emits": ["draft:complete"]
    },
    {
      "name": "reviewer",
      "prompt": "Read workspace/draft.md. Review for quality and brand voice. Write feedback to workspace/review.json.",
      "triggers": ["draft:complete"],
      "input": "workspace/draft.md",
      "output": "workspace/review.json",
      "emits": ["chain:complete"]
    }
  ]
}
This chain uses all three patterns. The intake agent's event payload carries the content type for conditional routing and the priority level for downstream decisions. The actual brief lives in the shared workspace because multiple agents need to reference it. The writers and reviewer use file-based handoff within the workspace directory, keeping everything in one inspectable location.
Choosing the Right Pattern
Use event payloads for: routing metadata, IDs, scores, status flags, anything under a few KB that downstream agents need to make decisions.
Use file-based handoff for: documents, datasets, generated content, analysis results -- the actual work product that agents produce and consume. Aim for under 50MB per file.
Use shared state directories for: multi-agent workspaces where several agents read from common reference material, research projects with multiple contributors, any chain where the data flow has cycles or many-to-many relationships.
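The three rules above can be collapsed into a rough decision function. The thresholds here are assumptions drawn from this post's guidance, not limits Mentiko enforces:

```python
def choose_pattern(payload_bytes, producers, consumers):
    """A rough heuristic for picking a data flow pattern."""
    if payload_bytes <= 4 * 1024:
        return "event_payload"   # small metadata travels with the event
    if producers <= 1 and consumers <= 1:
        return "file_handoff"    # one writer, one reader: a clean pipeline
    return "shared_state"        # many-to-many flows need a workspace
```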
The default should be file-based handoff with event payloads for metadata. It gives you the best debuggability and the cleanest agent separation. Add shared state when your chain's data flow demands it -- you'll know because you'll find yourself wishing agents could read files they didn't directly produce.
Whatever pattern you choose, the data is always on disk, always inspectable, and always replayable. That's the fundamental design principle: if you can't see what happened between agents, you can't debug it, and if you can't debug it, you can't run it in production.
See the getting started guide for building your first chain, or explore error handling patterns for making your data flow resilient.