Design MCP Servers for Agent Workflows

Overview

A workflow-first method for designing MCP servers that agents can actually use: start from the task, compress tool chains, manage tokens, separate risk levels, and write recovery-oriented errors.

When to use this

You are about to expose an API, database, filesystem, or internal service through MCP and need to decide what the agent-facing tools should be.

Write The Workflow Card

Before naming tools, write one card for the agent job: user request, final answer or artifact, context needed, tools allowed, and done-when check. This keeps the server design anchored in the thing a user actually asks for instead of the shape of an existing API.

User request: the natural-language request the agent should handle.
Final artifact: the answer, file, ticket, patch, or table the agent should return.
Done when: the observable condition that means the tool helped rather than adding noise.
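As a rough sketch, the card can be kept as a small data structure so an incomplete one is caught before any tool is named. The class name, field names, and the example values below are illustrative assumptions, not part of MCP.

```python
from dataclasses import dataclass


@dataclass
class WorkflowCard:
    """One card per agent job. Field names are illustrative, not an MCP schema."""
    user_request: str
    final_artifact: str
    context_needed: list[str]
    tools_allowed: list[str]
    done_when: str

    def missing_fields(self) -> list[str]:
        """Return the names of fields left empty, in declaration order."""
        return [name for name, value in vars(self).items() if not value]


# Hypothetical example card; the service and tool names are made up.
card = WorkflowCard(
    user_request="Summarize yesterday's failed deploys for the payments service",
    final_artifact="A short table of failed deploys with cause and owner",
    context_needed=["deploy log for the last 24 hours", "service ownership map"],
    tools_allowed=["search_deploys"],
    done_when="",  # deliberately left blank to show the check
)
print(card.missing_fields())  # ['done_when']
```

A card with an empty done-when check is a sign the design is still anchored in the API rather than the user's job.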

Draft The Tool Contract

Use the same small contract for every proposed tool before implementation.

Tool name: action-oriented and specific.
Use when: the user situation that should trigger it.
Inputs: only the decisions or identifiers the model can reliably provide.
Returns: compact structured fields and next-step text.
Side effects: files, accounts, money, messages, deletes, or external systems touched.
Errors: explain how the agent can recover, such as narrowing the query or requesting approval.
Example call: one realistic input and one short return shape.
Permission note: read-only, write, delete, external message, or credential use.
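One possible way to make the contract copyable is a plain record plus a small review function. This is a minimal sketch: `ToolContract`, `review_contract`, the `search_tickets` tool, and the specific checks are all hypothetical, chosen only to mirror the fields listed above.

```python
from dataclasses import dataclass

# Permission notes taken from the contract fields above.
PERMISSIONS = {"read-only", "write", "delete", "external message", "credential use"}
MUTATING = {"write", "delete", "external message"}


@dataclass(frozen=True)
class ToolContract:
    tool_name: str
    use_when: str
    inputs: dict[str, str]
    returns: dict[str, str]
    side_effects: list[str]
    errors: dict[str, str]  # error condition -> recovery hint for the agent
    example_call: dict
    permission: str


def review_contract(c: ToolContract) -> list[str]:
    """Return a list of problems; an empty list means the contract looks consistent."""
    problems = []
    if c.permission not in PERMISSIONS:
        problems.append(f"unknown permission note: {c.permission!r}")
    if c.permission == "read-only" and c.side_effects:
        problems.append("read-only tool lists side effects")
    if c.permission in MUTATING and not c.side_effects:
        problems.append("mutating tool lists no side effects")
    for condition, hint in c.errors.items():
        if not hint.strip():
            problems.append(f"error {condition!r} has no recovery hint")
    if not c.example_call:
        problems.append("missing example call")
    return problems


# Hypothetical contract for an imagined ticket-search tool.
contract = ToolContract(
    tool_name="search_tickets",
    use_when="The user asks about open or recent support tickets.",
    inputs={"query": "free-text search terms", "limit": "max results, default 10"},
    returns={"tickets": "id, title, status", "next_step": "hint for the follow-up call"},
    side_effects=[],
    errors={"too_many_results": "Narrow the query or shorten the date range."},
    example_call={"input": {"query": "login failure"},
                  "returns": {"tickets": [], "next_step": "broaden the query"}},
    permission="read-only",
)
print(review_contract(contract))  # []
```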

Set Output Limits Before Coding

Large outputs are one of the easiest ways to make a promising MCP server unpleasant to use. Decide how each tool handles oversized files, logs, images, or query results before implementation: reject with a recoverable error, return a page, summarize where that is trustworthy, or ask for a narrower range.

Check file size, character count, or approximate tokens before returning content.
Return page cursors or ranges when the user may need every record.
Make the recovery path explicit so the agent can ask for the next slice.
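The following sketch shows one way to enforce a budget with page cursors. It assumes a character count as a cheap stand-in for tokens; the `read_log_page` name, the 8,000-character default, and the return fields are assumptions for illustration rather than anything defined by MCP.

```python
MAX_CHARS = 8_000  # assumed budget; tune it for the target client and model


def read_log_page(lines: list[str], cursor: int = 0, max_chars: int = MAX_CHARS) -> dict:
    """Return whole lines that fit in the budget, plus a cursor for the next slice."""
    if cursor < 0 or cursor > len(lines):
        return {
            "ok": False,
            "error": "cursor out of range",
            "recovery": f"Call again with a cursor between 0 and {len(lines)}.",
        }
    page, clipped, used, i = [], [], 0, cursor
    while i < len(lines):
        line = lines[i][:max_chars]  # clip a single oversized line to the budget
        if page and used + len(line) > max_chars:
            break  # budget reached; the remaining lines go on the next page
        if len(line) < len(lines[i]):
            clipped.append(i)  # record which lines were cut so the note is explicit
        page.append(line)
        used += len(line)
        i += 1
    next_cursor = i if i < len(lines) else None
    return {
        "ok": True,
        "lines": page,
        "clipped_lines": clipped,
        "next_cursor": next_cursor,
        "next_step": (
            f"Call again with cursor={next_cursor} for the next slice."
            if next_cursor is not None
            else "End of log."
        ),
    }


log = ["x" * 10] * 5
first = read_log_page(log, max_chars=25)
print(len(first["lines"]), first["next_cursor"])  # 2 2
```

The `next_step` text is what lets the agent ask for the next slice without guessing at the cursor convention.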

Split Read And Write Tools

A tool that mixes harmless reads with consequential writes makes approval prompts harder to trust. Split read-only lookup from create, update, delete, send, spend, or deploy actions unless the workflow truly needs them together, and document that exception in the contract.

Keep read-only tools separate from mutation tools.
Request the minimum OAuth scope needed for the workflow.
Write approval prompts in user language: what will change, where, and how to undo it.
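A minimal sketch of the split, assuming a hypothetical invoicing workflow: the tool names, OAuth scope strings, and the `read_only` flag are invented for illustration, and the flag stands in for whatever annotation the target client actually honors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    name: str
    read_only: bool
    oauth_scope: str  # hypothetical scope strings, minimum needed per tool
    description: str


# The lookup and the mutation are registered as separate tools.
TOOLS = {
    spec.name: spec
    for spec in (
        ToolSpec("find_invoices", True, "invoices:read",
                 "Look up invoices by customer or date range."),
        ToolSpec("void_invoice", False, "invoices:write",
                 "Void one invoice by ID."),
    )
}


def needs_approval(tool_name: str) -> bool:
    """Only mutation tools should trigger an approval prompt."""
    return not TOOLS[tool_name].read_only


def approval_prompt(what_changes: str, where: str, how_to_undo: str) -> str:
    """Phrase the prompt in user language: what will change, where, and how to undo it."""
    return f"This will {what_changes} in {where}. To undo it: {how_to_undo}"


print(needs_approval("find_invoices"))  # False
print(approval_prompt(
    "void invoice INV-1042",
    "the billing account for Acme Ltd",
    "reissue the invoice from the billing dashboard.",
))
```

Because the read tool never prompts, a prompt from this server always means something will change.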

Run Three Agent Tests

A server can pass unit tests and still be frustrating for an agent. Run three end-to-end tasks in the target client before calling the design ready: the happy path, one messy real-world case, and one permission or oversized-output case.

Track unnecessary tool calls, missing context, oversized responses, and vague errors.
Confirm the agent can finish without guessing around the tool contract.
Update tool descriptions and examples after each failed run.
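The observations from the three runs can be tallied with something as simple as the sketch below. The issue labels follow the list above; the `AgentRun` record and the "ready" rule (every run finished and no issues logged) are assumptions, and the runs themselves still have to be carried out by hand in the target client.

```python
from collections import Counter
from dataclasses import dataclass, field

ISSUES = ("unnecessary_call", "missing_context", "oversized_response", "vague_error")


@dataclass
class AgentRun:
    scenario: str          # e.g. "happy path", "messy case", "permission or oversized case"
    finished: bool         # did the agent complete the task without guessing?
    issues: list[str] = field(default_factory=list)


def summarize(runs: list[AgentRun]) -> dict:
    """Count issues across runs and report whether the design looks ready."""
    counts = Counter(issue for run in runs for issue in run.issues)
    unfinished = [run.scenario for run in runs if not run.finished]
    return {
        "ready": not unfinished and not counts,
        "issue_counts": dict(counts),
        "unfinished": unfinished,
    }


# Made-up observations, for illustration only.
runs = [
    AgentRun("happy path", True),
    AgentRun("messy case", True, ["unnecessary_call", "vague_error"]),
    AgentRun("oversized output", False, ["oversized_response", "vague_error"]),
]
print(summarize(runs)["issue_counts"]["vague_error"])  # 2
```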

Method

Write a workflow card first: user request, final answer or artifact, context needed, tools allowed, and done-when check.
Collapse raw API operations into the fewest agent-meaningful tools that can complete the workflow without long chains.
Draft a copyable tool contract for each tool: tool name, use when, inputs, returns, side effects, errors, example call, and permission note.
Set output budgets before implementation, with explicit overflow behavior for large files, logs, images, or query results.
Separate read-only actions from write or delete actions so approval prompts remain easy for a user to judge.
Add server instructions, tool annotations, and examples that teach the client when to call each tool.
Test against realistic agent tasks and revise whenever the agent has to plan around missing context or noisy output.

Before you start

What to clarify first

Target workflow: Write the exact user job the agent should finish, not just the API you want to expose.
Existing API or data schema: Bring the real endpoints, tables, permissions, and data shapes so tool contracts stay grounded.
Expected outputs: Name the artifact the agent should return, such as a summary, patch, ticket, file, or table.
Permission model: Know which actions are read-only, which mutate data, and which need explicit user approval.
Representative test tasks: Use real tasks to check whether the server helps the agent finish work with fewer guesses.

Helpful references

MCP docs: Use the protocol docs to confirm server boundaries, tool metadata, and client expectations.
Agent client: Test in the actual client where the server will run because routing behavior differs by client.
Token budget check: Set size limits before implementation so large logs, files, or query results fail gracefully.
Local test workspace: Run unsafe or messy examples in a sandbox before giving the server access to real systems.

Decision points

Should this be one high-level tool or several smaller tools? Use one high-level tool when the operations are only useful as a sequence and the agent would otherwise repeat the same chain. Split tools when the steps are independently useful or have different permission risk.
Should the tool return raw data or a shaped answer? Return shaped data when the raw payload is verbose, unstable, or full of irrelevant fields. Return raw or queryable data when the agent genuinely needs analytical flexibility, such as SQL over a clean local table.
Should overflow be an error, truncation, pagination, or summarization? Use a recoverable error when a narrower request is easy. Use pagination when the user may need every record. Use truncation only with a clear note. Use summarization when lossy transformation is acceptable.
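The overflow decision can be written down as a small rule so every tool answers it the same way. This is only a sketch: the function name and the order in which the conditions are checked are assumptions, since the guidance above does not rank them.

```python
def choose_overflow(needs_every_record: bool, easy_to_narrow: bool, lossy_ok: bool) -> str:
    """Pick an overflow behavior for a tool. The precedence here is an assumption."""
    if needs_every_record:
        return "paginate"            # the user may need every record
    if easy_to_narrow:
        return "recoverable_error"   # a narrower request is cheap for the agent
    if lossy_ok:
        return "summarize"           # lossy transformation is acceptable
    return "truncate_with_note"      # last resort, always with a clear note


# An audit export needs every row, so it pages rather than summarizes.
print(choose_overflow(needs_every_record=True, easy_to_narrow=True, lossy_ok=True))  # paginate
```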

Common mistakes

Shipping a thin wrapper around every REST or GraphQL endpoint and calling it an MCP server.
Returning giant JSON payloads because they are easy for the server to produce.
Combining read and write behavior in one tool without making the risk obvious.
Treating auth as an install-time hurdle instead of asking for access only when the workflow needs it.
Writing vague tool descriptions that tell the model what not to do instead of giving it a recoverable path.

Troubleshooting

The agent keeps chaining many calls for a common task. Promote that task into a workflow-level tool or provide a query interface over a prepared schema.
The agent loses track of relevant details after a tool call. Shorten the return shape, remove irrelevant fields, and include only the next facts the model needs.
Users cannot tell whether a tool is safe to approve. Separate read-only and mutation actions, add annotations, and make the side effect explicit in the description.
Large files or logs break the session. Preflight size, return a recoverable error with suggested ranges, or expose pagination.

Sources

This playbook is authored from multiple references. Open the originals to inspect details, examples, and current guidance before adapting it.

Block's Playbook for Designing MCP Servers
Primary source for workflow-first design, token budgeting, permission separation, and real MCP server examples.
Model Context Protocol introduction
Protocol mental model and client-server boundary.
MCP example servers
Reference implementations to compare against your proposed tool shape.
Official MCP servers directory
Existing server patterns and integration surface area.

Notes

MCP servers can expose private files, credentials, internal APIs, and write actions. Treat every tool contract as a security boundary and test permissions before sharing.
