Model Context Protocol (MCP) Will Replace Plugins: How to Build One Tool That Works Across LLMs and IDEs
If you built an OpenAI Plugin, then rebuilt it as a ChatGPT Action, then wrote yet another bespoke adapter for a code editor, you’ve felt the fragmentation tax. Each platform invented its own way to expose tools and data to models. Each had its own auth story, its own schema conventions, and its own idea of streaming and state. Multiply by the number of LLMs and clients your users care about, and you’re maintaining a matrix of compatibility rather than one product.
The Model Context Protocol (MCP) is a pragmatic way out. MCP standardizes how LLM clients discover, call, and stream results from external tools and data sources. You implement a single MCP server—which is just a small JSON-RPC service with well-defined methods—and it works across any MCP-capable client: assistant UIs (e.g., Claude), IDE extensions (e.g., VS Code integrations), terminal agents, and, increasingly, other LLM chat products via adapters or native support.
This article is an opinionated, hands-on guide to building a production-grade MCP tool server: secure by default, schema-first, streaming-friendly, and state-aware. The goal is simple: ship once, run everywhere—Claude, ChatGPT (via adapters or emerging support), and code editors—without rewriting your integration per model or per app.
TL;DR
- MCP is a transport-agnostic protocol based on JSON-RPC that standardizes tool discovery, invocation, and streaming.
- One MCP server can be consumed by multiple LLMs and clients (desktop apps, IDEs, terminals) without bespoke adapters.
- Focus on: secure sandboxing, JSON Schema for inputs/outputs, transport-layer auth, traceable streaming, and explicit state.
- Use official SDKs to avoid footguns; don’t invent your own message formats.
- Version your tools and schemas, and treat MCP like a public API.
Why MCP beats one-off plugins
Plugin frameworks coupled tool definitions and transport into a single product-specific package. That seemed convenient, but:
- You had to rebuild per platform (ChatGPT Plugin vs. Actions vs. IDE extension vs. bespoke agent).
- Capabilities diverged: one supported streaming, another didn’t; one had OAuth, another required API keys; one let you read files, another didn’t.
- The LLM-side prompting and tool-use behaviors varied, forcing per-model engineering.
MCP separates concerns cleanly:
- Protocol: standardized methods (“list tools,” “call tool,” “list resources,” “read resource,” “list prompts,” etc.) using JSON-RPC semantics.
- Transport: stdio, WebSocket, or other bidirectional channels; auth is transport-layer (e.g., headers, mTLS).
- Content: tools (with JSON Schemas), resources (e.g., files, URLs, databases) with MIME types, and optional prompt templates.
- Streaming: partial results and progress events over the same channel.
This decoupling is what lets one MCP server run everywhere. Clients can focus on UX and LLM prompting; servers focus on correct behavior, security, and domain logic.
Architecture in one picture (textually)
- MCP client (chat app, IDE, agent) ↔ transport (stdio, WebSocket, process IPC) ↔ your MCP server
- Your server announces:
- Tools: callable operations with input/output schemas
- Resources: things that can be read/browsed (files, APIs, knowledge bases)
- Prompts: structured, parameterized prompt templates
- The client (and the LLM it hosts) discovers tools/resources, decides what to call, streams results, and surfaces them to the user.
What you’ll build in this guide
- A minimal but production-ready MCP server exposing:
- Tools: search, fetch_url, write_note
- Resources: notes/ (listable, readable), config/
- Prompts: “summarize-note” with schema-checked inputs
- Secure defaults: sandboxed filesystem root, rate limiting, structured logs, and auth
- Streaming: progress updates and partial text chunks
- State model: per-session scratchpad and durable per-user store
You can implement it in TypeScript or Python using the official SDKs. We’ll show both patterns conceptually and provide concrete JSON where it matters.
Secure-by-default design: threat model first
Don’t start by wiring up “run arbitrary shell” and hope the client asks nicely. Start with a threat model:
- Data exfiltration: The LLM may try to read secrets (SSH keys, tokens) if you expose the raw filesystem. Solution: an explicit workspace root with an allowlist; disallow upward traversal.
- Destructive actions: “delete all files” or “write to prod DB.” Solution: capability flags and explicit tool design that requires clear user confirmation on destructive operations; implement rate limits and dry-run modes.
- SSRF/API abuse: If you expose “fetch_url,” you may become a proxy to internal services. Solution: an outbound allowlist/denylist that blocks private IP ranges and pins permitted hosts.
- Prompt injection: Untrusted resources (web pages, files) can instruct the LLM to call dangerous tools. Solution: server-side validation, idempotency, and hard authorization checks independent of LLM prompts.
- Secrets leakage: Don’t echo tokens in logs or responses. Solution: secret redaction and sealed storage; avoid returning secrets to the LLM at all.
Security checklist to apply today:
- Drop privileges; run as a non-root user.
- Constrain filesystem access to a single workspace root.
- Enforce JSON Schema on inputs; reject unknown fields in strict modes.
- Validate outputs too if you generate machine-consumed data downstream.
- Add per-tool rate limiting and circuit breakers.
- Implement transport-level auth for remote deployments (mTLS or OAuth 2.0).
- Emit audit logs of tool calls with arguments hashes (not raw PII) and results metadata.
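To make the sandboxing items concrete, here is a minimal path-confinement helper (a sketch; resolveInWorkspace is our own name, not an SDK function). Every tool that touches the filesystem should route user-supplied paths through something like this:

```ts
import path from "node:path";

const WORKSPACE_ROOT = path.resolve(process.env.WORKSPACE_ROOT ?? "./workspace");

// Resolve a user-supplied relative path and refuse anything that escapes the root.
function resolveInWorkspace(requested: string): string {
  const resolved = path.resolve(WORKSPACE_ROOT, requested);
  // path.relative yields a ".." prefix (or an absolute path on Windows) when
  // the target lies outside the root; reject both cases.
  const rel = path.relative(WORKSPACE_ROOT, resolved);
  if (rel.startsWith("..") || path.isAbsolute(rel)) {
    throw new Error(`Path escapes workspace root: ${requested}`);
  }
  return resolved;
}
```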
JSON Schemas: the contract between model and tool
MCP is schema-first. Every tool exposes an input schema. Many servers also document their output schema—even if the protocol doesn’t force it—because clients, evaluators, and downstream tools benefit from structure.
A concrete example (JSON Schema draft-07 style) for a precise search tool:
json{ "$schema": "http://json-schema.org/draft-07/schema#", "title": "search", "type": "object", "additionalProperties": false, "properties": { "query": { "type": "string", "minLength": 1 }, "top_k": { "type": "integer", "minimum": 1, "maximum": 20, "default": 5 }, "site": { "type": "string", "description": "Restrict to a domain (optional)" } }, "required": ["query"] }
Output schema (useful for downstream automation):
json{ "$schema": "http://json-schema.org/draft-07/schema#", "title": "search_result", "type": "object", "properties": { "items": { "type": "array", "items": { "type": "object", "properties": { "title": {"type": "string"}, "url": {"type": "string", "format": "uri"}, "snippet": {"type": "string"} }, "required": ["title", "url"] } } }, "required": ["items"], "additionalProperties": false }
Tips:
- Set additionalProperties: false in strict tools. This prevents silent typo drift.
- Use enums for known modes (e.g., {mode: "fast"|"thorough"}).
- Use bounds like minLength and maximum to give the model guardrails.
- Include descriptions; many clients surface these to the LLM and the user.
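To see why additionalProperties: false earns its keep, here is a quick Ajv demonstration (assuming Ajv v8; the schema is a trimmed version of the search example above):

```ts
import Ajv from "ajv";

const ajv = new Ajv({ allErrors: true });
const validate = ajv.compile({
  type: "object",
  additionalProperties: false,
  properties: {
    query: { type: "string", minLength: 1 },
    top_k: { type: "integer", minimum: 1, maximum: 20 },
  },
  required: ["query"],
});

console.log(validate({ query: "mcp spec", top_k: 3 })); // true
console.log(validate({ query: "mcp spec", topk: 3 }));  // false: "topk" is a typo, not a new mode
console.log(ajv.errorsText(validate.errors));           // "data must NOT have additional properties"
```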
The handshake and core methods (wire-level view)
The MCP exchange follows JSON-RPC semantics. The typical flow is:
- Client initializes, server acknowledges capabilities.
json{ "jsonrpc": "2.0", "id": 1, "method": "initialize", "params": { "protocolVersion": "1.0", "capabilities": { "resources": true, "tools": true, "prompts": true }, "clientInfo": {"name": "example-client", "version": "0.2.3"} }}
json{ "jsonrpc": "2.0", "id": 1, "result": { "protocolVersion": "1.0", "serverInfo": {"name": "my-mcp-server", "version": "0.1.0"}, "capabilities": {"resources": true, "tools": true, "prompts": true} }}
- Client asks for tools/resources/prompts.
json{ "jsonrpc": "2.0", "id": 2, "method": "tools/list" }
json{ "jsonrpc": "2.0", "id": 2, "result": { "tools": [ { "name": "search", "description": "Web search", "inputSchema": {"$schema": "http://json-schema.org/draft-07/schema#", "type": "object", ...} }, { "name": "write_note", "description": "Append a note", "inputSchema": {"type": "object", ...} } ] }}
- Client invokes a tool.
json{ "jsonrpc": "2.0", "id": 3, "method": "tools/call", "params": { "name": "search", "arguments": { "query": "mcp protocol spec", "top_k": 3 } }}
json{ "jsonrpc": "2.0", "id": 3, "result": { "content": [ { "type": "json", "json": { "items": [ {"title": "...", "url": "..."} ] } } ] }}
- Streaming and events are sent as JSON-RPC notifications (no id), interleaved while a call is in progress. For example, progress updates:
json{ "jsonrpc": "2.0", "method": "events/progress", "params": {"tool": "search", "progress": 0.5, "message": "Fetching results"} }
Clients differ in how they surface these, but the wire pattern is the same.
Note: exact method names and fields can evolve; consult the current MCP spec and SDK docs for authoritative names. The pattern above captures the essence you’ll implement using SDK helpers rather than constructing JSON manually.
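For contrast with the raw JSON above, here is roughly what the same handshake-and-tools plumbing looks like through the official TypeScript SDK at the time of writing (verify the import paths and method names against the SDK README; they have shifted between releases):

```ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "example-mcp-server", version: "0.1.0" });

// The SDK derives the JSON Schema advertised in tools/list from the zod shape.
server.tool("echo", { text: z.string().min(1) }, async ({ text }) => ({
  content: [{ type: "text", text }],
}));

// initialize, tools/list, and tools/call framing are all handled for you.
await server.connect(new StdioServerTransport());
```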
A minimal MCP server (TypeScript)
Below is a compact server using an MCP SDK pattern. It exposes three tools, one resource namespace, and emits progress events. Code is illustrative; use the official package names and APIs from the MCP SDK and your framework.
```ts
// ts-node src/server.ts
import { createServer, startWebSocketServer } from "@modelcontextprotocol/sdk"; // illustrative import
import Ajv from "ajv";
import fetch from "node-fetch";
import { RateLimiterMemory } from "rate-limiter-flexible";
import { readFile, writeFile, mkdir } from "node:fs/promises";
import path from "node:path";

const WORKSPACE_ROOT = process.env.WORKSPACE_ROOT || path.resolve(process.cwd(), "./workspace");
const limiter = new RateLimiterMemory({ points: 20, duration: 60 }); // 20 calls/min per connection
const ajv = new Ajv({ allErrors: true, removeAdditional: "failing" });

// Schemas
const searchSchema = {
  $schema: "http://json-schema.org/draft-07/schema#",
  type: "object",
  additionalProperties: false,
  properties: {
    query: { type: "string", minLength: 1 },
    top_k: { type: "integer", minimum: 1, maximum: 10, default: 5 },
  },
  required: ["query"],
} as const;

const fetchSchema = {
  $schema: "http://json-schema.org/draft-07/schema#",
  type: "object",
  additionalProperties: false,
  properties: { url: { type: "string", format: "uri" } },
  required: ["url"],
} as const;

const writeNoteSchema = {
  $schema: "http://json-schema.org/draft-07/schema#",
  type: "object",
  additionalProperties: false,
  properties: {
    title: { type: "string", minLength: 1 },
    body: { type: "string", minLength: 1 },
  },
  required: ["title", "body"],
} as const;

// Pre-compile validators once; compiling inside each handler call is wasteful.
const validateSearch = ajv.compile(searchSchema);
const validateFetch = ajv.compile(fetchSchema);

const server = createServer({
  name: "example-mcp-server",
  version: "0.1.0",
  onConnect: async (ctx) => {
    // Per-connection session state
    ctx.state = { startedAt: Date.now() };
  },
  middleware: [
    async (ctx, next) => {
      // Rate limiting per connection
      await limiter.consume(ctx.connectionId).catch(() => {
        throw new Error("Rate limit exceeded");
      });
      return next();
    },
  ],
});

// Tools
server.registerTool({
  name: "search",
  description: "Perform a web search (demo)",
  inputSchema: searchSchema,
  handler: async (args, ctx) => {
    if (!validateSearch(args)) throw new Error("Invalid arguments: " + ajv.errorsText(validateSearch.errors));
    const { query, top_k } = args as any;

    // Streaming progress
    ctx.events.progress({ tool: "search", progress: 0.1, message: `Searching for "${query}"` });

    // Demo: pretend we call a real search API
    await new Promise((r) => setTimeout(r, 400));
    ctx.events.progress({ tool: "search", progress: 0.7, message: "Ranking results" });

    const items = Array.from({ length: top_k || 5 }).map((_, i) => ({
      title: `Result ${i + 1} for ${query}`,
      url: `https://example.com/${encodeURIComponent(query)}/${i + 1}`,
      snippet: "...",
    }));
    ctx.events.progress({ tool: "search", progress: 1.0, message: "Done" });
    return { content: [{ type: "json", json: { items } }] };
  },
});

server.registerTool({
  name: "fetch_url",
  description: "Fetch the contents of a URL with allowlist",
  inputSchema: fetchSchema,
  handler: async (args) => {
    if (!validateFetch(args)) throw new Error("Invalid arguments");
    const { url } = args as any;

    // Basic SSRF protections: deny loopback, link-local, and RFC 1918 ranges.
    // Hostname checks alone do not stop DNS rebinding; resolve and re-check IPs in production.
    const deny = [
      /^https?:\/\/(10\.|127\.|169\.254\.|172\.(1[6-9]|2\d|3[01])\.|192\.168\.)/i,
      /^https?:\/\/\[?::1\]?/,
    ];
    if (deny.some((re) => re.test(url))) throw new Error("URL not allowed");

    const res = await fetch(url, { redirect: "follow", headers: { "User-Agent": "mcp-server/0.1" } });
    const text = await res.text();
    const mime = res.headers.get("content-type") || "text/plain";
    return { content: [{ type: "text", text: text.slice(0, 4000) }], mimeType: mime };
  },
});

server.registerTool({
  name: "write_note",
  description: "Append a markdown note in the workspace/notes folder",
  inputSchema: writeNoteSchema,
  handler: async (args) => {
    const { title, body } = args as any;
    const notesDir = path.join(WORKSPACE_ROOT, "notes");
    await mkdir(notesDir, { recursive: true });
    const file = path.join(notesDir, `${title.replace(/[^a-z0-9_-]+/gi, "-").toLowerCase()}.md`);
    const content = `# ${title}\n\n${body}\n`;
    await writeFile(file, content, { flag: "a" });
    return { content: [{ type: "text", text: `Saved note: ${file}` }] };
  },
});

// Resources: expose notes directory read-only
server.registerResourceProvider({
  name: "notes",
  list: async () => {
    // Return URIs like mcp://notes/<name>.md and metadata; simplified here
    return [{ uri: "mcp://notes/README.md", mimeType: "text/markdown", description: "Example" }];
  },
  read: async (uri) => {
    const p = uri.replace("mcp://notes/", "");
    const notesDir = path.join(WORKSPACE_ROOT, "notes");
    const file = path.resolve(notesDir, p);
    // Reject upward traversal: the resolved path must stay inside notes/.
    if (path.relative(notesDir, file).startsWith("..")) throw new Error("Out of bounds");
    const text = await readFile(file, "utf8");
    return { mimeType: "text/markdown", text };
  },
});

// Transport: WebSocket with optional Bearer token
const port = Number(process.env.PORT || 8976);
startWebSocketServer({
  server,
  port,
  authenticate: (req) => {
    const token = (req.headers["authorization"] || "").toString().replace(/^Bearer\s+/i, "");
    if (process.env.MCP_TOKEN && token !== process.env.MCP_TOKEN) {
      return { ok: false, status: 401 };
    }
    return { ok: true };
  },
});
console.log(`MCP server listening on ws://localhost:${port}`);
```
What this buys you:
- Strict schema validation with Ajv
- Basic SSRF guardrails and path traversal checks
- Rate limiting per connection
- Clean streaming of progress events
- A read-only resource namespace and a simple write tool within a sandboxed root
- Transport auth via Bearer tokens (upgrade to mTLS for internal networks)
A minimal MCP server (Python)
Python SDKs follow a similar shape. Here’s an illustrative example with Pydantic/JSON Schema validation and asyncio. Treat imports as indicative; use the concrete names from the official MCP Python package.
```python
# python -m server
import asyncio
import os
import re
from pathlib import Path
from typing import Any, Dict

import aiohttp

from mcp.server import Server  # illustrative
from mcp.schemas import JsonSchemaValidator  # illustrative

WORKSPACE_ROOT = Path(os.environ.get("WORKSPACE_ROOT", ".")) / "workspace"
WORKSPACE_ROOT.mkdir(parents=True, exist_ok=True)

search_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "query": {"type": "string", "minLength": 1},
        "top_k": {"type": "integer", "minimum": 1, "maximum": 10, "default": 5},
    },
    "required": ["query"],
}

fetch_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "additionalProperties": False,
    "properties": {"url": {"type": "string", "format": "uri"}},
    "required": ["url"],
}

write_note_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "title": {"type": "string", "minLength": 1},
        "body": {"type": "string", "minLength": 1},
    },
    "required": ["title", "body"],
}

server = Server(name="example-mcp-server", version="0.1.0")
validator = JsonSchemaValidator()


@server.tool(name="search", description="Web search (demo)", input_schema=search_schema)
async def search(args: Dict[str, Any], ctx):
    validator.validate(search_schema, args)
    q = args["query"]
    top_k = int(args.get("top_k", 5))
    ctx.events.progress(tool="search", progress=0.1, message=f"Searching for {q}")
    await asyncio.sleep(0.3)
    ctx.events.progress(tool="search", progress=0.8, message="Ranking")
    items = [
        {"title": f"Result {i+1} for {q}", "url": f"https://example.com/{q}/{i+1}", "snippet": "..."}
        for i in range(top_k)
    ]
    ctx.events.progress(tool="search", progress=1.0, message="Done")
    return {"content": [{"type": "json", "json": {"items": items}}]}


@server.tool(name="fetch_url", description="Fetch URL with allowlist", input_schema=fetch_schema)
async def fetch_url(args: Dict[str, Any], ctx):
    validator.validate(fetch_schema, args)
    url = args["url"]
    # Basic SSRF guard: deny loopback, link-local, and RFC 1918 ranges.
    if re.match(r"^https?://(10\.|127\.|169\.254\.|172\.(1[6-9]|2\d|3[01])\.|192\.168\.|\[?::1\]?)", url):
        raise ValueError("URL not allowed")
    async with aiohttp.ClientSession() as session:
        async with session.get(url, headers={"User-Agent": "mcp-server/0.1"}) as res:
            text = await res.text()
            mime = res.headers.get("content-type", "text/plain")
    return {"content": [{"type": "text", "text": text[:4000]}], "mimeType": mime}


@server.tool(name="write_note", description="Append a markdown note", input_schema=write_note_schema)
async def write_note(args: Dict[str, Any], ctx):
    title = args["title"]
    body = args["body"]
    safe = re.sub(r"[^a-z0-9_-]+", "-", title.lower())
    notes_dir = WORKSPACE_ROOT / "notes"
    notes_dir.mkdir(parents=True, exist_ok=True)
    f = notes_dir / f"{safe}.md"
    with f.open("a", encoding="utf-8") as fp:
        fp.write(f"# {title}\n\n{body}\n")
    return {"content": [{"type": "text", "text": f"Saved note: {f}"}]}


@server.resource_provider(name="notes")
class NotesProvider:
    async def list(self):
        # Simplified
        return [{"uri": "mcp://notes/README.md", "mimeType": "text/markdown", "description": "Example"}]

    async def read(self, uri: str):
        p = uri.replace("mcp://notes/", "")
        notes_dir = (WORKSPACE_ROOT / "notes").resolve()
        f = (notes_dir / p).resolve()
        # Reject upward traversal: the resolved path must stay inside notes/.
        if notes_dir not in f.parents:
            raise ValueError("Out of bounds")
        return {"mimeType": "text/markdown", "text": f.read_text(encoding="utf-8")}


if __name__ == "__main__":
    # Start a WebSocket server; add auth hooks per your framework
    server.run_websocket(host="127.0.0.1", port=int(os.environ.get("PORT", "8976")))
```
Transport and auth: local vs. remote
MCP deliberately keeps auth in the transport layer so you can choose context-appropriate mechanisms.
- Local, spawned via stdio: simplest, no network or auth required. Ideal for IDE extensions spawning local processes. Constrain environment and filesystem.
- Local WebSocket/Unix socket: good for desktop agents connecting to local services. Use a random per-session token or OS socket permissions.
- Remote WebSocket over TLS: use mTLS inside your network or OAuth 2.0 (Client Credentials for app→service; Authorization Code + PKCE for user-consented flows). Put the token in the Authorization header and validate on connection.
- SSH tunnel: useful for on-prem deployments behind firewalls.
Do not embed secrets in MCP messages. Store tokens server-side (keychain, KMS, Vault) and only return minimal metadata to the client.
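For remote WebSocket deployments, token validation belongs in the HTTP upgrade step, before any MCP traffic flows. A minimal sketch using the ws package (the Bearer-token scheme and env names are assumptions; substitute your OAuth or mTLS validation):

```ts
import { createServer } from "node:http";
import { timingSafeEqual } from "node:crypto";
import { WebSocketServer } from "ws";

const expected = Buffer.from(process.env.MCP_TOKEN ?? "");
const wss = new WebSocketServer({ noServer: true });
const http = createServer();

http.on("upgrade", (req, socket, head) => {
  const header = String(req.headers["authorization"] ?? "");
  const token = Buffer.from(header.replace(/^Bearer\s+/i, ""));
  // Constant-time comparison; reject before completing the upgrade.
  const ok = expected.length > 0 && token.length === expected.length && timingSafeEqual(token, expected);
  if (!ok) {
    socket.write("HTTP/1.1 401 Unauthorized\r\n\r\n");
    socket.destroy();
    return;
  }
  wss.handleUpgrade(req, socket, head, (ws) => wss.emit("connection", ws, req));
});

http.listen(Number(process.env.PORT || 8976));
```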
Streaming: progress and partials are first-class
Users expect streaming. LLMs reason better when they can interleave tool output with ongoing planning. MCP’s event notifications let you:
- Emit progress updates: percentage, phase labels
- Send partial text chunks (stdout-like)
- Stream structured JSON chunks for long-running queries
Patterns that work well:
- Always begin with a short progress message like “Starting X (this may take ~20s)” so the client can reassure the user.
- Use monotonic progress increments for a better UX.
- If you stream chunks, also return a final, consolidated result for clients that don’t surface streams.
Example: incremental text stream in a tool handler (TypeScript-like pseudo):
```ts
handler: async (args, ctx) => {
  const stream = ctx.events.textStream({ tool: "index_repo" });
  stream.write("Cloning repo...\n");
  // ...
  stream.write("Indexing 12 files...\n");
  // ...
  stream.end();
  return { content: [{ type: "text", text: "Index complete: 12 files" }] };
}
```
Example: streaming JSON chunks (batched results):
```ts
const stream = ctx.events.jsonStream({ tool: "search" });
for await (const batch of batchedResults) {
  stream.write({ items: batch });
}
stream.end();
return { content: [{ type: "json", json: { total_batches: n } }] };
```
State: session, durable, and idempotency
MCP itself is stateless at the protocol level, but your server will need state:
- Session state: ephemeral per connection, e.g., a scratchpad or cancellation tokens. Use ctx.state keyed by connection/session id.
- Durable per user: preferences, OAuth tokens, cached embeddings. Store server-side keyed by a stable subject (user id) conveyed by the transport (e.g., from an auth token) or by the client in initialization metadata. Do not rely on LLM memory for critical state.
- Idempotency: allow clients to retry tool calls safely. Include an optional idempotency_key in arguments for mutating tools (e.g., write_note). Use it to dedupe.
A simple pattern for idempotent writes in TypeScript:
```ts
server.registerTool({
  name: "write_note_v2",
  inputSchema: {
    type: "object",
    additionalProperties: false,
    properties: {
      title: { type: "string" },
      body: { type: "string" },
      idempotency_key: { type: "string", description: "Client-provided UUID" }
    },
    required: ["title", "body"]
  },
  handler: async (args, ctx) => {
    const { title, body, idempotency_key } = args as any;
    // alreadyProcessed/recordProcessed are backed by a durable store (DB, Redis).
    if (idempotency_key && await alreadyProcessed(idempotency_key)) {
      return { content: [{ type: "text", text: "Already processed" }] };
    }
    const notePath = await saveNote(title, body);
    if (idempotency_key) await recordProcessed(idempotency_key);
    return { content: [{ type: "text", text: `Saved: ${notePath}` }] };
  }
});
```
Resources and prompts: more than just tools
Tools are for actions; resources are for context. Many LLM tasks benefit when the client can browse and selectively include relevant resources without calling tools that mutate state.
- Resources: folder-like namespaces with list and read operations. Good for file trees, knowledge bases, dataset catalogs, API object listings. Always attach a MIME type to help clients render.
- Prompts: named templates with an input schema. They’re a portable way to give the client well-structured prompt building blocks that can be parameterized by the model.
Example prompt definition (JSON):
json{ "name": "summarize-note", "description": "Summarize a markdown note for a busy engineer", "inputSchema": { "$schema": "http://json-schema.org/draft-07/schema#", "type": "object", "properties": { "audience": {"type": "string", "enum": ["lead", "peer", "executive"], "default": "peer"}, "length": {"type": "string", "enum": ["short", "medium", "detailed"], "default": "short"} }, "additionalProperties": false }, "messages": [ {"role": "system", "content": "You write crisp technical summaries with canonical links and bullet points."}, {"role": "user", "content": "Summarize the following note for an {audience} in a {length} length: \n\n{note_text}"} ] }
A client can fetch this by name, fill variables, and call its model. Because the prompt has a schema, the LLM (and the UI) are more likely to honor intended use.
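On the client side, filling a prompt is plain string substitution once the arguments pass the schema. A hypothetical helper (fillPrompt is our own name, not part of any SDK):

```ts
// Replace {name} placeholders with validated arguments; leave unknown keys visible.
function fillPrompt(template: string, args: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (match, key) => args[key] ?? match);
}

const filled = fillPrompt(
  "Summarize the following note for an {audience} in a {length} length: \n\n{note_text}",
  { audience: "peer", length: "short", note_text: "# Release plan\n..." }
);
```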
Compatibility: Claude, ChatGPT, IDEs, and agents
- Claude Desktop and Claude for VS Code offer first-class support for MCP servers and can connect to local or remote servers you run. You can add your server’s WebSocket URL or configure a local process.
- IDE and terminal agents increasingly standardize on MCP for tool integrations. A single server lets you serve those ecosystems without per-editor code.
- ChatGPT and other assistants can consume MCP via community adapters today; native support is emerging. The practical path is to expose your functionality as an MCP server now, and bridge to platforms that don’t yet speak MCP using thin shims rather than re-implementing your business logic.
Deployment patterns that work:
- Local dev tools: spawn your server as a child process (stdio transport) so no network exposure is needed (see the sketch after this list).
- Team/enterprise tools: host a remote WebSocket service behind TLS with mTLS or OAuth; IDEs/agents connect with team-issued credentials.
- Hybrid: a thin local server that proxies to a remote service after adding local capabilities (filesystem, editor buffers) while enforcing least privilege.
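The local-dev pattern is just process spawning plus JSON-RPC over pipes. A sketch of the client side (assuming newline-delimited JSON-RPC framing on stdio and a hypothetical dist/server.js entry point; check the spec for the current framing rules):

```ts
import { spawn } from "node:child_process";

// Spawn the server; stdin/stdout carry JSON-RPC (default stdio is "pipe").
const child = spawn("node", ["dist/server.js"]);
child.stderr.pipe(process.stderr); // keep server logs visible

// Naive line splitting for illustration; a real client must buffer partial lines.
child.stdout.on("data", (chunk: Buffer) => {
  for (const line of chunk.toString().split("\n").filter(Boolean)) {
    console.log("server →", JSON.parse(line));
  }
});

child.stdin.write(
  JSON.stringify({ jsonrpc: "2.0", id: 1, method: "initialize", params: { protocolVersion: "1.0" } }) + "\n"
);
```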
Observability: logs, metrics, traces
Treat your MCP server like any other production API.
- Structured logs: request id, connection id, tool name, latency, status. Redact arguments that may contain PII.
- Metrics: per-tool QPS, error rate, P95 latency, rate limit rejects. Export to Prometheus or your APM.
- Traces: instrument each tool call with OpenTelemetry spans. Propagate a traceparent if the client provides one (e.g., in initialize metadata). This makes debugging LLM plans that call multiple tools tractable (a sketch follows this list).
- Auditing: for mutating tools, record a durable audit entry with a minimal fingerprint of inputs and the effective principal.
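A sketch of the tracing item above using the OpenTelemetry API (the mcp.tool.name attribute is our own convention; the wrapper assumes handlers take a single args object):

```ts
import { trace, SpanStatusCode } from "@opentelemetry/api";

const tracer = trace.getTracer("example-mcp-server");

// Wrap a tool handler so every call produces one span with tool metadata.
function traced<A, R>(toolName: string, handler: (args: A) => Promise<R>) {
  return (args: A): Promise<R> =>
    tracer.startActiveSpan(`tools/call ${toolName}`, async (span) => {
      span.setAttribute("mcp.tool.name", toolName);
      try {
        return await handler(args);
      } catch (err) {
        span.setStatus({ code: SpanStatusCode.ERROR, message: String(err) });
        throw err;
      } finally {
        span.end();
      }
    });
}
```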
Versioning and evolution
Your schemas will change. Keep compatibility frictionless:
- Use semantic versioning for your server and for individual tools. If you make a breaking input change, publish a new tool name (e.g., write_note_v2) and deprecate the old one later.
- Add fields in a backward compatible way when possible and keep additionalProperties false to guard against silent typos.
- Announce capability flags in the initialize response so clients can branch on features rather than probing behavior.
- Provide a changelog in your server’s README and surface deprecation warnings via events/progress to guide users.
Testing and evaluation
- Unit tests: validate schemas, path safety, SSRF guards, and idempotency logic.
- Contract tests: record/replay JSON-RPC sequences (initialize → tools/list → call) and assert stable results.
- LLM-in-the-loop evals: curate prompts that exercise your tools and measure success with golden outputs. Ensure the LLM reliably chooses your tool when appropriate and abstains otherwise.
- Fuzzing: generate random JSON arguments within schema bounds to harden parsing.
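As a starting point for the unit-test item, here is a focused test of the SSRF deny-list from the servers above, using Node’s built-in test runner (the list mirrors the guard in the TypeScript example; in a real project, export it from one module so server and tests share it):

```ts
import { test } from "node:test";
import assert from "node:assert/strict";

const deny = [
  /^https?:\/\/(10\.|127\.|169\.254\.|172\.(1[6-9]|2\d|3[01])\.|192\.168\.)/i,
  /^https?:\/\/\[?::1\]?/,
];
const blocked = (url: string) => deny.some((re) => re.test(url));

test("SSRF guard blocks loopback and private ranges", () => {
  for (const url of ["http://127.0.0.1/admin", "https://10.0.0.8/", "http://172.16.0.1/", "http://[::1]/"]) {
    assert.ok(blocked(url), `${url} should be blocked`);
  }
});

test("SSRF guard allows public hosts", () => {
  assert.ok(!blocked("https://example.com/docs"));
});
```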
End-to-end example: ship once, run anywhere
Suppose you want to ship a “docs assistant” that:
- Searches internal docs (API)
- Fetches pages by URL (with SSRF protections)
- Reads local project notes in an IDE workspace
- Summarizes and produces action items
With MCP:
- Implement tools: search_docs, fetch_url, write_note, list_notes
- Implement resources: notes/ (read-only), bookmarks/ (read-only)
- Implement prompts: summarize, action_items
- Package as an MCP server with both stdio and WebSocket transports
Then, integrate:
- Claude Desktop: add your ws://localhost:8976 endpoint; Claude can call tools/resources directly.
- VS Code: install an MCP-aware extension and point it at your server; now the model can access project notes.
- ChatGPT: use a thin adapter that bridges MCP to the platform’s tool interface until native MCP is available; no changes to your server.
The core logic, security guarantees, and schemas are the same everywhere. You do not rewrite per model.
Common pitfalls and how to avoid them
- Overbroad tools: “run_shell” is a footgun. Prefer specific, parameterized tools with narrow permissions.
- No output schema: downstream chains break when text formats drift. Emit structured JSON whenever possible.
- Silent failures: always return explicit error messages and mark failed results as errors rather than returning empty success. Emit progress with error phases.
- Weak auth: never rely on obscurity. For remote servers, enforce TLS and strong auth. Rotate tokens.
- Streaming-only results: some clients don’t surface streams. Always return a final consolidated result.
- Unbounded outputs: cap text length and number of items to protect client UIs and token budgets.
Migration notes: from Plugins/Actions to MCP
- Map OpenAPI endpoints to MCP tools. Your OpenAPI schemas translate directly to JSON Schemas for inputs/outputs.
- Replace platform-specific auth (e.g., ChatGPT OAuth handshake) with transport-layer OAuth for your MCP server.
- Replace “fetch this URL and let the LLM parse” with a resource provider that serves MIME-typed content and partial reads.
- Replace platform streaming with MCP events; unify your progress semantics across clients.
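The first mapping is often mechanical. A hypothetical converter (toMcpTool and the operation shape are illustrative; real OpenAPI documents need $ref resolution first):

```ts
interface OpenApiOperation {
  operationId: string;
  summary?: string;
  // JSON Schema for the request body, with $refs already resolved
  requestSchema: Record<string, unknown>;
}

// One OpenAPI operation becomes one MCP tool; OpenAPI 3.1 schemas are a JSON Schema dialect.
function toMcpTool(op: OpenApiOperation) {
  return {
    name: op.operationId,
    description: op.summary ?? op.operationId,
    inputSchema: op.requestSchema,
  };
}
```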
You’ll likely delete code, not add it.
Production checklist
- Security: sandbox root paths, network allowlists, mTLS/OAuth, secrets in KMS/keychain, no secrets in logs.
- Reliability: rate limiting, circuit breakers, retry with idempotency, timeouts, health checks.
- Observability: structured logs, Prometheus metrics, OpenTelemetry traces.
- Compatibility: semantic versioning, backward-compatible changes, deprecation plan, explicit capability flags.
- Documentation: README with install/run, tool/resource lists with schemas, examples, changelog.
- Testing: unit, contract, LLM evals, fuzzing.
Final take: MCP as the lingua franca for AI tooling
Fragmentation isn’t inevitable. When the contract between the LLM client and your integration is a small set of predictable JSON-RPC calls with schemas, you can stop maintaining N×M matrices of adapters and start shipping one solid product. MCP is not glamorous—it’s just a clean, practical protocol—but that’s precisely why it will outlast bespoke plugin systems.
Adopt MCP now:
- You’ll gain multi-client reach without per-platform rewrites.
- You’ll improve security by centralizing policy in one server instead of reproducing it across UIs.
- You’ll standardize streaming and state management.
- You’ll make your integration testable and observable like any other API.
The sooner your tools speak MCP, the sooner your users can use them wherever they work—chat, terminal, or IDE—without you shipping one-off adapters.
References and further reading
- Model Context Protocol: specification and SDKs (official repositories)
- JSON-RPC 2.0 Specification: https://www.jsonrpc.org/specification
- JSON Schema: https://json-schema.org
- OWASP SSRF Prevention Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/Server_Side_Request_Forgery_Prevention_Cheat_Sheet.html
- OpenTelemetry: https://opentelemetry.io
Search your MCP SDK’s README for up-to-date method names and capability flags; examples in this article are intentionally illustrative and focus on architecture and security patterns rather than locking to a specific SDK surface.