
@lmmx
Last active March 17, 2025 16:06

Below is a chronological “blow by blow” summary of the discussion. Each heading corresponds to a comment or set of closely related comments on a given date. I’ve highlighted who spoke, when, and the main takeaways.


December 6, 2024

Comment by @jspahrsummers (Original Post)

  • Context: MCP (Model Context Protocol) is currently stateful (long-lived connections). This enables notifications, server-initiated sampling, logging, etc., but makes serverless deployments tricky. Many serverless platforms don’t allow or prefer long-lived connections.
  • Problem: “Autoscaling, ephemeral” serverless environments have short request lifetimes and can’t easily handle SSE or a single persistent connection.
  • Possible solutions:
    1. Session tokens – The connection is re-modeled as repeated short calls with a token (i.e., storing or encoding session state).
    2. Two protocol variants – “Stateless” for simpler servers; “Stateful” for richer features.
    3. Fully stateless – Radically simpler, but we lose features like notifications and server-initiated messages.

He asks for feedback on whether this is a real problem, and which direction might solve it.
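
To make Option 1 concrete, a session-token flow could look like the client-side sketch below. The endpoint path and the `Mcp-Session-Id` header name are illustrative assumptions, not something this discussion settled on.

```python
import requests

MCP_URL = "https://example.com/mcp"  # hypothetical endpoint

# First call: the server hands back a session token alongside its JSON-RPC reply.
resp = requests.post(MCP_URL, json={
    "jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {},
})
session_id = resp.headers.get("Mcp-Session-Id")  # assumed header name

# Every follow-up call repeats the token, so any short-lived instance can
# look up (or re-hydrate) the session instead of relying on a live connection.
resp = requests.post(MCP_URL, headers={"Mcp-Session-Id": session_id}, json={
    "jsonrpc": "2.0", "id": 2, "method": "tools/list", "params": {},
})
print(resp.json())
```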


Follow-up comment by @jspahrsummers (same day)

  • Mentions “Claude’s thoughts” (apparently an internal summary or analysis).
  • Suggests exploring session tokens, or a “hybrid approach” where connection pooling or event-sourcing could be used.

Reply by @jspahrsummers, still December 6, 2024

  • Clarifies that “connection pooling” doesn’t necessarily solve the issue in serverless because sticky routing is annoying across ephemeral instances.
  • Event sourcing is interesting, but might be complicated or produce too much extra event-logging overhead.

December 12, 2024

Comment by @cramforce

  • Speaks from a Vercel perspective.
  • Prefers Option 1 (session token).
  • Possibly do a hybrid approach: a server can still keep in-memory state if it wants, or it can store the session data in some external store.
  • Resumability is a nice feature.

December 19, 2024

Comment by @orliesaurus

  • Asks: “What are some stateful use cases you’ve seen?”

Reply by @jspahrsummers, January 2, 2025

(Though the question was from December 19, the reply is posted January 2.)

  • Example use case: being notified of resource changes. This is effectively “live” state.

Comment by @azdolinski (December 19)

  • Has a prototype with stateful client-server connections.
  • Argues in favor of having an AAA (Auth, Authorization, Accounting) layer.
  • Mentions the usefulness of state across multiple calls (like one tool to write, another to run, a third to analyze).
  • Votes for Option 2: a “stateless” variant plus a “full-featured stateful” variant.

Reply by @soren-olympus, January 16, 2025

  • Points out that preserving context privately on the server side might be important for certain secure or “clean-room” style use cases.

January 2, 2025

Comment by @allenporter

  • Asks whether a max request timeout is truly a big problem, or if we can handle reconnect logic.
  • Wonders if the average LLM session is only a few minutes anyway.
  • Points out that stateful connections can get dropped (like with poor network). So clients must handle reconnection anyway.
  • Believes it may not be worth drastically changing the protocol yet.

Reply by @in-op

  • Common scenario: multiple pods in Kubernetes. The SSE channel goes to one pod, subsequent requests get round-robined, breaking the protocol.

Reply by @allenporter

  • Says the Python SDK already encodes a session ID in follow-up POSTs. One can do it today, but he questions whether that implies bigger protocol changes.

January 6, 2025

Comment by @atesgoral

  • Suggests “progressive enhancement”: a base JSON-RPC call for tools, plus optional SSE for real-time notifications.
  • Doesn’t want to sabotage the simpler approach by forcing SSE or complicated connections.
  • Points out SSE alone doesn’t let the server make requests of the client (like sampling) in a truly symmetrical way.

Reply by @dave-shawley

  • Mentions that “notifications.*” are a way to do some server->client messaging, but acknowledges the difficulty of fully bidirectional messaging on HTTP.

January 8, 2025

Comment by @sean-roberts

  • Favors Option 2 (both stateless and stateful possible). Separating the “transport contract” from the actual payload might reduce complexity.

Reply by @sean-roberts (same day)

  • Reiterates the idea of decoupling how state is maintained from how messages are transported.

Comment by @pcingola

  • Also supports “stateless + stateful.” Thinks it will help with scaling, federation, building “MCP Hubs,” etc.

Earlier question by @orliesaurus (December 19)

  • Already answered above: @jspahrsummers’s January 2 reply gives concrete usage examples.

January 27, 2025

Comment by @calclavia

  • Building a hosting platform for MCP servers. SSE is hard to scale because it’s inherently stateful; multiple replicas get complicated with sticky routing.
  • Wonders why SSE was chosen over websockets or gRPC.

Reply by @jerome3o-anthropic (January 29, 2025)

  • SSE is “simple to implement” for many web stacks, but acknowledges routing complexity across pods.

Reply by @calclavia (January 30, 2025)

  • Would love official WebSocket documentation. Points out it’s in the SDK, but not thoroughly specified.

Reply by @the-vampiire (March 8, 2025, but nested here)

  • Surprised WebSockets vs. SSE hasn’t had more direct discussion. WS feels more natural for an MCP server.

January 30, 2025

Comment by @jerome3o-anthropic

  • Summarizes the “direction he’s leaning”: let typical client->server calls happen over standard HTTP, and let server->client notifications be optional SSE. You only do the “long-lived” piece if you want the extra features.

Reply by @atesgoral

  • The notification URL could also be a ws: link. Could do a two-step upgrade approach.

February 3, 2025

Comment by @jspahrsummers

  • That approach (HTTP requests + optional SSE) “threads the needle” between simple and powerful.
  • Possibly require SSE but allow servers to disconnect quickly if they don’t actually want a long “live” connection.

Comment by @pcingola

  • Summarizes how ephemeral SSE connections can be: open them for the duration of an agent or tool call, then close.

February 19, 2025

Comment by @artpi

  • SSE is tough in PHP/WordPress environments. Has to hack the stack to hold open connections.
  • Expresses preference for simpler “query/response” semantics. Maybe notifications handled with webhooks or polling.
  • So, sees SSE as optional.

February 20, 2025

Comment by @clareliguori

  • Distinguishes two categories of server->client:

    1. “Notifications about what the server can do” (like resource or tool changes).
    2. “Requests in the middle of a long-running job” (like progress notifications).
  • Suggests separate flows for each, or ephemeral sessions that wrap the job. If the connection breaks, the client might need to re-request status.


March 1, 2025

Multiple comments by @atesgoral

  • “MCP Lite” approach: just standard JSON-RPC over HTTP, no SSE.
  • For truly “live” use cases, add an optional streaming or SSE approach after a handshake.
  • Mentions short-lived SSE as a direct response stream, or a “Job” concept for long-running tasks.

March 3, 2025

Comment by @tristanz

  • Mentions typical patterns: “fire and forget” vs. “job endpoints,” referencing Google’s AIP-151.
  • If a job is long-running, the server returns an “operation resource” and you can poll or subscribe for updates (SSE or websockets).
  • That pattern might fit MCP well.

March 4, 2025

Comment by @kentonv (Cloudflare)

  • Stateful is fine if you can pin requests to a Durable Object. WebSockets also recommended.
  • Doesn’t mind session IDs but personally prefers a single, simpler WebSocket for cleanup.

Comment by @daviddenton

  • SSE can also handle pings for cleanup.
  • Noting that websockets can be more complicated in some frameworks.

Comment by @kentonv (further)

  • In Cloudflare’s experience, SSE is often more trouble than websockets in terms of intermediate proxies buffering or dropping connections.

Comment by @topherbullock (Shopify)

  • SSE plus POST side channels is painful to implement in typical web frameworks.
  • WebSockets or ActionCable is more standard in the Rails ecosystem.

Comment by @dsp-ant (March 7, 2025)

  • Mentions it’s tricky to do custom auth headers in the browser’s websockets.

Replies by @atesgoral and @Kludex

  • You can embed tokens in the URL or subprotocol.
  • Usually you do an initial handshake over HTTP, then a “signed” WS link.

March 7, 2025

Various participants (@thdxr, @sean-roberts, @pcingola, etc.)

  • Many repeated the theme: “We want a simpler default HTTP approach but also an optional advanced stateful mode with SSE or WS.”
  • “Stateless by default, plus optional session-based advanced features.”

Comment by @gingerhendrix

  • Suggests “maybe keep it even simpler: just keep stdio for local, and for remote, let devs pick any bridging approach.”

Comment by @wycats (March 12)

  • Also sees that aligning with normal HTTP or optional upgrade to WS is valuable.

March 10, 2025

Comment by @VoloBuilds

  • Argues for Option 3 (fully stateless). Feels “server-initiated sampling” is an anti-pattern, plus it can cause security issues.
  • If we absolutely need stateful, do Option 2 so servers can choose.

Various replies from others:

  • Some people strongly want to keep advanced features like sampling, but possibly as an optional or separate part of the spec.

March 11, 2025

Comment by @lloydzhou

  • Describes a detailed architecture using Nchan (an NGINX module) as a front-end gateway or “transport” for SSE/Websocket bridging.
  • Emphasizes that with an external queue/Redis, we can scale.

March 13–14, 2025

Comment by @Mocuto

  • Asks if sampling couldn’t be done by webhooks. Possibly we’d store session or sampling IDs.

Comment by @richard-rance

  • Also questions whether sampling should exist, rather than making the tool itself “sub-agent with an LLM.”

March 14–15, 2025

Comment by @gtrguy17

  • Quotes an external blog post about SSE difficulties in serverless.
  • Argues that websockets might be more stable than SSE.

Comment by @atesgoral

  • Notes that SSE vs. WebSockets is partly about robust libraries, not the inherent protocol. Both need keepalives, reconnection strategies, etc.

March 17, 2025

Closing comment by @jspahrsummers

  • Thanks everyone.
  • Announces a proposal for a new “Streamable HTTP transport” — basically plain HTTP request/reply plus optional SSE for streaming.
  • Notes that WebSockets remains possible in the future, but for now “plain HTTP + optional SSE” is the path.

Shortly after this, the discussion is closed as resolved.


Overall Takeaways

  1. The Problem

    • Maintaining a long-lived, stateful connection is difficult on autoscaling or serverless systems.
    • SSE requires “sticky” routing or a specialized load-balancer approach to route subsequent calls to the same instance holding state.
  2. Proposed Approaches

    • “Stateless” or “HTTP-only” calls for quick tool calls.
    • Optional SSE or WebSockets for advanced features (notifications, server-initiated messages).
    • Possibly store session data externally if you really need to scale.
  3. Final Decision

    • The maintainers lean toward a simpler “HTTP + optional SSE” approach, known as the “Streamable HTTP transport.”
    • Leaves the door open for WebSockets or extended sessions in the future.

Thus, the entire thread moves from the initial question—“Is our stateful SSE approach too limiting?”—to a fairly strong consensus that “Yes, do something simpler by default (plain HTTP), with SSE or WS as an add-on for advanced use, so we can accommodate ephemeral or serverless environments.”

That’s the high-level summary of every major comment and date in the discussion.

Below is a detailed, comment-by-comment “blow by blow” of the entire GitHub discussion, with special attention to the technical arguments and points of contention around stateful vs. stateless connections, SSE vs. WebSockets, serverless constraints, etc. The headings correspond to chronological order and group related replies as they appeared in the thread. I’ve preserved each participant’s main technical points (and reasoning) while also including date/timestamps where relevant.


December 6, 2024

Original Post by @jspahrsummers

  • Context:
    • MCP is currently “stateful,” expecting a single, long-lived connection (e.g. SSE or similar).
    • This supports features like:
      1. Notifications about resource changes or tool availability (client<->server).
      2. Server-initiated sampling calls, enabling agent workflows.
      3. Arbitrary logging from server to client.
    • However, having a single long-lived connection is difficult in serverless or autoscaling environments (many only allow short-lived requests).
  • Problem:
    • Platforms like AWS Lambda or Google Cloud Run often have time limits (in minutes) and don’t handle sticky connections well.
    • This discourages broader MCP adoption by serverless developers.
  • Possible Solutions:
    1. Encapsulate state via a session token
      • Each new request (client->server) can attach a token.
      • The server might store or re-encode the session data.
      • Pros: Minimal changes to the actual protocol semantics; mostly a transport-layer detail.
      • Cons: Forces server authors to do more complicated session handling or persistence, which can be a non-trivial burden.
    2. Offer “stateless” and “stateful” modes
      • Keep all current stateful features for those who want them.
      • Provide an alternate “stateless” subset for serverless folks who don’t want to hold connections.
      • Pros: Flexible, preserves advanced features.
      • Cons: Splits the spec into two variants, potentially doubling complexity.
    3. Make MCP fully stateless
      • Remove the assumption of a persistent connection.
      • Pros: Much simpler for everyone, easier to deploy, no special connection issues.
      • Cons: Loses real-time notifications; server->client calls must become webhooks or polling.
      • Backwards-incompatible—big change right after release.

He concludes by asking if it’s worth solving, and for pros/cons or other suggestions.
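
One way to realize the “encodes all session state itself” flavor of Option 1 is a signed, self-contained token, sketched below with Python’s standard library; the token format is purely an illustrative assumption.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"server-side-signing-key"  # illustrative; would come from configuration

def encode_session(state: dict) -> str:
    """Pack the whole session state into a token the client echoes back."""
    payload = base64.urlsafe_b64encode(json.dumps(state).encode())
    signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{signature}"

def decode_session(token: str) -> dict:
    """Verify the signature and recover the state on any fresh instance."""
    payload, signature = token.rsplit(".", 1)
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("session token was tampered with")
    return json.loads(base64.urlsafe_b64decode(payload))

token = encode_session({"subscriptions": ["file:///logs"], "log_level": "info"})
print(decode_session(token))  # no server-side storage needed, at the cost of token size
```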


Follow-up Comment by @jspahrsummers the Same Day

He references “Claude’s thoughts,” summarizing them:

  • Claude’s analysis:
    • Definitely a problem worth solving to support serverless.
    • Option 1 (Session Token) is pragmatic but burdens server authors with session serialization and storage. Perhaps the SDK can do some heavy lifting.
    • Option 2 (Stateless/Stateful) is a nice middle ground, but might introduce confusion or overhead.
    • Option 3 (Fully Stateless) is the simplest in theory but too limiting (no real-time events, more complicated client exposure, etc.).
    • Possibly consider a “hybrid approach,” e.g. connection pooling or event-sourcing.
      • But connection pooling can still be tough in ephemeral serverless contexts.
      • Event-sourcing helps reconstruct state, but might produce a lot of overhead or logs.

Reply by @jspahrsummers, still December 6

  • Emphasizes that “connection pooling” doesn’t really solve the ephemeral nature of serverless.
  • Event sourcing might be noisy. If you do frequent checkpoints, that’s basically the same as storing state anyway.
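
A minimal sketch of the event-sourcing-plus-checkpoints idea being debated here (illustrative only, not part of any MCP SDK); it also shows why very frequent checkpoints collapse into “just storing the state”:

```python
import json

class SessionLog:
    """Append-only event log with periodic checkpoints, so a fresh instance
    can rebuild session state without replaying the full history."""

    def __init__(self, checkpoint_every: int = 10):
        self.events: list[dict] = []
        self.checkpoint: dict = {}  # last materialized state
        self.checkpoint_every = checkpoint_every

    def append(self, event: dict) -> None:
        self.events.append(event)
        if len(self.events) >= self.checkpoint_every:
            # Frequent checkpoints are effectively "just store the state".
            self.checkpoint = self._replay(self.checkpoint, self.events)
            self.events.clear()

    @staticmethod
    def _replay(state: dict, events: list[dict]) -> dict:
        state = dict(state)
        for event in events:
            if event["type"] == "subscribe":
                state.setdefault("subscriptions", []).append(event["uri"])
        return state

    def current_state(self) -> dict:
        return self._replay(self.checkpoint, self.events)

log = SessionLog(checkpoint_every=2)
log.append({"type": "subscribe", "uri": "file:///a"})
log.append({"type": "subscribe", "uri": "file:///b"})
print(json.dumps(log.current_state()))
```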

Additional Note by @jspahrsummers, December 6

  • Points out that there’s a difference between protocol-level state (like SSE subscription or open notifications) vs. application-level state (like a stateful agent in memory).
  • The latter might require a persistent connection anyway, depending on the use case.

December 12, 2024

Comment by @cramforce

  • Speaks from a Vercel perspective, leaning toward Option 1 (session token).
  • Suggests a hybrid approach where:
    • If the server is genuinely stateful and has sticky load balancing, it can keep state in memory.
    • Otherwise, have a token-based approach to re-hydrate state from an external store or a session ID.
  • Mentions that ephemeral sessions can be resumed if the session ID is passed around.
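
A sketch of the hybrid @cramforce describes: keep session state in memory when the instance is sticky and long-lived, otherwise re-hydrate it from an external store by session ID. Redis and the key layout are assumptions for illustration.

```python
import json
import uuid

import redis  # external store; any durable key-value store would do

store = redis.Redis(host="localhost", port=6379, decode_responses=True)
memory_cache: dict[str, dict] = {}  # used when the instance is sticky/long-lived

def load_session(session_id: str) -> dict:
    # Fast path: this instance already holds the session in memory.
    if session_id in memory_cache:
        return memory_cache[session_id]
    # Otherwise re-hydrate from the external store (fresh or ephemeral instance).
    raw = store.get(f"mcp:session:{session_id}")
    return json.loads(raw) if raw else {}

def save_session(session_id: str, state: dict) -> None:
    memory_cache[session_id] = state
    store.set(f"mcp:session:{session_id}", json.dumps(state), ex=3600)

session_id = str(uuid.uuid4())
save_session(session_id, {"subscriptions": ["file:///logs"]})
print(load_session(session_id))
```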

December 19, 2024

Comment by @orliesaurus

  • Simple question: “What are some examples of stateful use cases you’ve actually seen?”

(No direct same-day reply; see below for [@jspahrsummers]’s later response in January.)


Comment by @azdolinski

  • Already built a prototype for tool invocation that relies on a stateful client-server flow.
  • Believes AAA (Authentication, Authorization, Accounting) is quite important for “multiple invocation” scenarios.
  • For them, preserving context across many steps is crucial (like multiple micro-steps all happening inside the server).
  • Prefers Option 2 (both stateless and stateful) because they want to preserve advanced features.
  • Also references “Agent as a Tool” patterns, needing continuity inside the server.

January 2, 2025

Reply by @jspahrsummers to [@orliesaurus’s question from Dec 19]

  • Example stateful scenario:
    • “If you want the server to notify you about remote resources or changed events, that effectively requires state or subscription.”
    • You could do it via webhooks, but that reintroduces complexity on the client side or requires the client to be publicly accessible.

Comment by @allenporter

  • Asks if having a short time limit is truly such a big barrier. The average LLM session might be short anyway.
  • Also notes that a persistent connection can drop for many reasons—clients must handle re-connect logic anyway.
  • Summarizes that session-based management can introduce overhead like GC, session lifetime rules, etc.

Reply by @in-op

  • Points out that SSE plus multiple Kubernetes pods can break easily if you can’t do sticky routing (since SSE might go to Pod A, but the next POST could arrive at Pod B).

Reply by @allenporter

  • Acknowledges that the Python SDK already encodes a session ID. This doesn’t necessarily require changes to the entire spec, but it’s a piece of the puzzle.
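
For reference, the pattern @allenporter alludes to looks roughly like the client-side sketch below: the SSE stream announces a session-scoped URL that the client uses for all follow-up POSTs. The event framing and URL shape follow my understanding of the Python SDK’s SSE transport at the time and should be treated as approximate.

```python
import requests

BASE = "https://example.com"

# Open the SSE stream; the server's first event announces where to POST.
stream = requests.get(f"{BASE}/sse", stream=True,
                      headers={"Accept": "text/event-stream"})

post_url = None
for line in stream.iter_lines(decode_unicode=True):
    # e.g. "data: /messages/?session_id=abc123" (session baked into the URL)
    if line and line.startswith("data:"):
        post_url = BASE + line.removeprefix("data:").strip()
        break

# Follow-up requests carry the session_id in the URL, so the receiving replica
# can associate them with the original stream (via sticky routing or shared state).
requests.post(post_url, json={
    "jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {},
})
```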

January 6, 2025

Comment by @atesgoral

  • Proposed “progressive enhancement” approach:
    • Base level: simple, stateless JSON-RPC calls. No SSE or advanced state.
    • Then, if the client wants real-time server->client notifications, it can “subscribe” to an SSE endpoint (discovered during initialization).
  • Argues we shouldn’t hamper “simple calls” with the overhead of SSE.
  • AAA or other complexities can be layered on top.
  • Likes that approach for notifications, but points out that “server making requests of the client” (like sampling) is trickier if we only have SSE. SSE is one-directional, so the server can’t truly prompt the client for something unless the client is also polling or calling back.
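
A client-side sketch of this progressive-enhancement idea: plain JSON-RPC over HTTP for everything, with a notification stream opened only if the server advertised one during initialization. The `notificationsUrl` capability field is a hypothetical name for illustration.

```python
import requests

MCP_URL = "https://example.com/mcp"

# Base level: every interaction is an ordinary, stateless JSON-RPC POST.
init = requests.post(MCP_URL, json={
    "jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {},
}).json()

result = requests.post(MCP_URL, json={
    "jsonrpc": "2.0", "id": 2, "method": "tools/call",
    "params": {"name": "echo", "arguments": {"text": "hi"}},
}).json()

# Enhancement: only open a long-lived stream if the server advertised one.
notify_url = init.get("result", {}).get("capabilities", {}).get("notificationsUrl")
if notify_url:
    sse = requests.get(notify_url, stream=True,
                       headers={"Accept": "text/event-stream"})
    for line in sse.iter_lines(decode_unicode=True):
        if line.startswith("data:"):
            print("notification:", line.removeprefix("data:").strip())
```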

Reply by @dave-shawley

  • Mentions some JSON-RPC constructs (notifications.*) for pushing messages. True bidirectional on standard HTTP is always cumbersome.
  • We might add session IDs or document-level properties to help with partial reconnection.

January 8, 2025

Comment by @sean-roberts

  • Backs Option 2: add a simpler “stateless” plus a more advanced “stateful” variant. Let the developer pick.
  • Believes the protocol payload and the transport could be separately governed, so we don’t fork the entire spec.

Follow-up by @sean-roberts, same day

  • Also says the question of how to represent “state” should be decoupled from “how to do notifications.”

Comment by @pcingola

  • Backs “stateless vs. stateful.”
  • Believes “stateless” is simpler for scale/federation, while advanced “long-lived” features remain an option.

January 2–9, 2025 (overlapping dates)

(Some additional conversation references the earlier question by [@orliesaurus]; [@calclavia] and others appear in the upcoming timeline. Skipping minor interim placeholders.)


January 27, 2025

Comment by @calclavia

  • They run a hosting platform for MCP. SSE is difficult because it’s not easy to route or scale. You often need sticky sessions for SSE.
  • Wonders why SSE was chosen over WebSockets or gRPC, which might handle ephemeral horizontal scaling better.

January 29–30, 2025

Reply by @jerome3o-anthropic

  • The original reason for SSE: it’s “pure HTTP” and simpler for many web frameworks to get started with.
  • But acknowledges complexities for load balancing and sticky routing.
  • Mentions that “the question of SSE vs. WebSockets vs. event logs, etc.” is partly separate from the overall stateful vs. stateless question. Both SSE and WebSockets require holding connections.

Follow-up by @calclavia

  • Encourages official documentation for WebSocket as a first-class transport. It might solve some SSE headaches in ephemeral or high-traffic setups.

Comment by @the-vampiire (March 8 but nested here)

  • Surprised websockets are not the standard from the start, as they’re “more natural” for bidirectional agent connections.

January 30, 2025

Detailed Comment by @jerome3o-anthropic

  • Summarizes they’ve been mulling over the problem.
  • Points to the main reason for a stateful connection: advanced features like real-time resource changes or server-initiated sampling.
  • Observes that most developers do not use those features yet, so it’s creating friction with ephemeral deployments.
  • Proposes an approach:
    • “HTTP for requests, optional SSE for notifications.”
    • If you want advanced features, open (and keep open) an SSE channel.
    • If you’re on a purely ephemeral environment, you can skip that or quickly close it after partial usage.
  • Believes this “progressive enhancement” method doesn’t close the door on future features, but also doesn’t force SSE on simpler use cases.
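
A server-side sketch of that direction, using FastAPI purely for illustration: ordinary POSTs answer JSON-RPC requests (serverless-friendly), while a separate GET endpoint offers SSE only to clients that opt into push notifications. Routes and payload shapes are assumptions.

```python
import asyncio
import json

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse, StreamingResponse

app = FastAPI()

@app.post("/mcp")
async def rpc(request: Request) -> JSONResponse:
    """Plain request/response path: works fine on ephemeral/serverless instances."""
    message = await request.json()
    if message.get("method") == "tools/list":
        return JSONResponse({"jsonrpc": "2.0", "id": message["id"],
                             "result": {"tools": []}})
    return JSONResponse({"jsonrpc": "2.0", "id": message.get("id"),
                         "error": {"code": -32601, "message": "Method not found"}})

@app.get("/mcp/notifications")
async def notifications() -> StreamingResponse:
    """Optional long-lived path: only clients that want push updates open this."""
    async def stream():
        while True:
            note = {"jsonrpc": "2.0",
                    "method": "notifications/resources/list_changed"}
            yield f"data: {json.dumps(note)}\n\n"
            await asyncio.sleep(30)  # doubles as a keep-alive interval
    return StreamingResponse(stream(), media_type="text/event-stream")
```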

February 3, 2025

Replies by @jspahrsummers and @pcingola

  • [@jspahrsummers]: This might be the best of both worlds: plain request-response for normal calls, SSE only if you want “push” style messages.
    • A serverless environment could connect briefly, get updates, then disconnect.
  • [@pcingola]: This might already be possible with minimal changes, as the protocol is flexible on how SSE is used. Just a matter of clarifying best practices.

February 19, 2025

Comment by @artpi (Automattic/WordPress context)

  • SSE is very awkward in PHP because of the stateless nature of typical WordPress hosting.
  • Had to do hacks (database polling, setting php.ini directives, etc.).
  • Argues for a purely “query-response” style by default, perhaps with optional webhooks or polling if you want “live” events.
  • SSE is fine, but only as an optional, advanced feature.

February 20, 2025

Comment by @clareliguori

  • Distinguishes two categories of server->client flows:
    1. Updates about server capabilities (tools/resources changes). The client might poll or subscribe for that.
    2. Long-running calls that need progress notifications or server->client requests.
  • Suggests that if the connection is lost, you might need a “session ID” or “job ID.”
  • If your server is ephemeral, you might store that ID in a data store, then pick up the job if the client reconnects.
  • Points out the difference between “short queries” and “long-living job references.”

March 1, 2025

Multiple Comments by @atesgoral

  1. On “MCP Lite”:
    • They currently do trivial JSON-RPC calls with no subscriptions or advanced notifications. That covers many use cases.
  2. Sampling Without Streaming (borderline idea):
    • The server can respond to a tools/call with a partial result that says “I need sampling from the client.” The client does it locally, then calls back with “tools/continue,” passing the sampling result.
    • This means no direct server->client push. The client orchestrates it.
  3. Short-lived SSE:
    • You can do a single POST for tools/call and the server returns SSE for the duration of that call, streaming partial progress.
    • Then it ends. No indefinite or “forever” SSE.
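
A client-side sketch of the “sampling without streaming” idea; `tools/continue` and the `needs_sampling`/`call_id` fields are the commenter’s hypothetical shape, not spec methods.

```python
import requests

MCP_URL = "https://example.com/mcp"

def run_llm_locally(prompt: str) -> str:
    return "client-side completion for: " + prompt  # stand-in for the host's LLM

# 1. An ordinary tool call.
reply = requests.post(MCP_URL, json={
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {"name": "summarize", "arguments": {"uri": "file:///report"}},
}).json()

# 2. Instead of pushing a request over an open channel, the server answers with
#    a partial result asking the client to perform sampling on its behalf.
result = reply.get("result", {})
if result.get("needs_sampling"):
    completion = run_llm_locally(result["prompt"])
    # 3. The client re-enters with the sampled text; no server->client push needed.
    reply = requests.post(MCP_URL, json={
        "jsonrpc": "2.0", "id": 2, "method": "tools/continue",
        "params": {"call_id": result["call_id"], "sampling_result": completion},
    }).json()

print(reply)
```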

March 3–4, 2025

Comment by @tristanz (March 3)

  • Summarizes that many real-world APIs use a “job pattern” for longer operations. Possibly we can do:
    • tool/call -> returns job_id
    • operation/stream?job_id=xxx -> SSE or WebSocket for updates
    • operation/cancel -> cancels job
  • So it’s effectively “stateless calls” except for references to job objects.
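
A sketch of that job pattern from the client’s point of view; the `tool/call`, `operation/stream`, and `operation/cancel` endpoints follow the comment’s own illustration, while `operation/status` is an assumed polling endpoint added for the example.

```python
import time

import requests

BASE = "https://example.com/mcp"

# Kick off a long-running tool call; the server immediately returns a job id.
job = requests.post(f"{BASE}/tool/call", json={
    "name": "index_repository",
    "arguments": {"url": "https://example.com/repo.git"},
}).json()
job_id = job["job_id"]

# Stateless polling: any replica can answer, as long as job state is shared.
while True:
    status = requests.get(f"{BASE}/operation/status",
                          params={"job_id": job_id}).json()
    if status["state"] in ("succeeded", "failed", "cancelled"):
        break
    time.sleep(2)
print(status)

# Alternatively: operation/stream?job_id=... for SSE/WebSocket updates,
# or operation/cancel to abort, per the pattern described above.
```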

Comment by @kentonv (Cloudflare) (March 4)

  • Cloudflare Workers’ Durable Objects can handle long-lived connections or ephemeral reconnection.
  • If we do session tokens, the question is how to handle cleanup when the client disappears.
  • Prefers WebSockets as an approach because it’s simpler to detect a client disconnect.

Comment by @daviddenton (March 4)

  • SSE is also feasible if you do pings to detect disconnection.
  • But yes, WebSockets can be more direct, although they can also be complicated on the server side in some frameworks.

Comment by @kentonv (further)

  • In practice, SSE sometimes hits issues with proxies buffering or times out. WebSockets are ironically more likely to pass through reliably in certain hosting/CDN layers.
  • Also, in Cloudflare land, they can do fancy things with websockets (like hibernating a durable object).

Comment by @topherbullock (Shopify) (March 4)

  • Echoes that SSE can be tricky with Rails. WebSockets / ActionCable are more mature for typical real-time logic.

March 7–8, 2025

Comment by @dsp-ant

  • One challenge with websockets in browsers is that you cannot set arbitrary headers for authentication. People discuss subprotocol hacks or passing tokens in the URL.

Reply by @atesgoral

  • That’s why sometimes you generate a signed ws:// URL after an HTTP handshake. It’s basically a short-lived token in the URL.

Reply by @Kludex

  • Another trick: use the protocols parameter in the JavaScript WebSocket constructor to pass something like “Authorization, token,” then deny or accept the upgrade.
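
A sketch of the two workarounds mentioned, using the Python `websockets` client for illustration (endpoints and token handling are assumptions): fetch a short-lived signed wss:// URL over HTTP first, or pass the token through the subprotocol list, which browsers do allow.

```python
import asyncio

import requests
import websockets  # pip install websockets

TOKEN = "REPLACE_WITH_SHORT_LIVED_TOKEN"

async def main() -> None:
    # Option A: an HTTP handshake returns a short-lived signed wss:// URL,
    # so the credential never has to live in a custom header.
    signed = requests.post("https://example.com/mcp/ws-session",
                           headers={"Authorization": f"Bearer {TOKEN}"}).json()
    async with websockets.connect(signed["url"]) as ws:
        await ws.send('{"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}}')
        print(await ws.recv())

    # Option B: smuggle the token through the subprotocol list, which the
    # browser WebSocket constructor allows; the server accepts or rejects
    # the upgrade based on it.
    async with websockets.connect("wss://example.com/mcp",
                                  subprotocols=["mcp", f"auth.{TOKEN}"]) as ws:
        await ws.send('{"jsonrpc": "2.0", "id": 2, "method": "ping"}')
        print(await ws.recv())

asyncio.run(main())
```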

Comment by @wycats (March 12)

  • If we stick close to standard HTTP + optional upgrade to WebSocket, we remain compatible with a lot of infra.

March 10, 2025

Comment by @VoloBuilds

  • Argues strongly for Option 3 (fully stateless).
  • Believes sampling is an “anti-pattern,” given the security concerns and complexity. If a server wants to do AI, let it do it on its own.
  • If we absolutely must keep advanced features, do Option 2 (a separate “advanced” path).
  • But simpler is better—most tools are basically REST-like anyway.

Various replies:

  • Some push back that sampling is valuable for agent-driven or private LLM scenarios.
  • Others note that “removing it entirely” would be too big of a break from current spec.

March 11, 2025

Comment by @lloydzhou

  • Detailed architecture using nchan as an NGINX module.
  • SSE or WebSockets can be fronted by nchan, which stores messages in Redis for horizontal scaling.
  • The actual MCP server is behind that, seeing simple HTTP requests. This decouples the ephemeral complexity from the core server logic.

March 13–14, 2025

Comment by @Mocuto

  • Wonders if sampling can be done with webhooks: the server pings the client’s endpoint.
  • Then the client returns data referencing a session ID.
  • But that requires the client to have an inbound route—trickier for local dev, etc.
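
A rough sketch of that webhook variant from the server’s side; the callback URL, the `session_id` correlation field, and the exact payload shape are assumptions for illustration.

```python
import requests

def request_sampling(client_webhook_url: str, session_id: str, prompt: str) -> None:
    """Server side: POST the sampling request to the client's registered webhook,
    tagged with a session/correlation id, instead of pushing over an open stream."""
    requests.post(client_webhook_url, json={
        "jsonrpc": "2.0", "id": 42, "method": "sampling/createMessage",
        "params": {
            "session_id": session_id,  # lets the client's reply be correlated later
            "messages": [{"role": "user",
                          "content": {"type": "text", "text": prompt}}],
        },
    }, timeout=10)

# The client must expose an inbound HTTPS route (the drawback noted above),
# run the sampling locally, then call the server back referencing session_id.
request_sampling("https://client.example.com/mcp/webhook",
                 session_id="abc123", prompt="Summarize the build failure")
```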

Comment by @richard-rance

  • Also questions the necessity of “server-initiated sampling” at all—why not let the server manage its own LLM?

March 14–15, 2025

Comment by @gtrguy17

  • References a blog post about SSE’s limitations in serverless.
  • Concludes that for ephemeral scale, WebSockets might be more robust or flexible than SSE.

Reply by @atesgoral

  • SSE vs. WebSockets is partly about library maturity. Both can lose connections, both require keepalive logic.
  • SSE is a simpler protocol (just streaming text) but definitely one-way. WebSockets is full-duplex but not inherently more stable—just that many frameworks have better built-in handling for WS pings, etc.
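
Either way, as the comment notes, the client ends up owning keep-alive and reconnection logic. A minimal reconnect-with-backoff loop for an SSE stream might look like this (illustrative only):

```python
import time

import requests

def listen(url: str) -> None:
    backoff = 1.0
    while True:
        try:
            with requests.get(url, stream=True, timeout=(5, 60),
                              headers={"Accept": "text/event-stream"}) as resp:
                resp.raise_for_status()
                backoff = 1.0  # connected: reset the backoff
                for line in resp.iter_lines(decode_unicode=True):
                    if not line or line.startswith(":"):
                        continue  # blank lines and ":" comments double as keep-alives
                    print("event line:", line)
        except requests.RequestException:
            pass  # dropped, timed out, or refused: fall through and retry
        time.sleep(backoff)
        backoff = min(backoff * 2, 60.0)  # exponential backoff with a cap

listen("https://example.com/mcp/notifications")
```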

March 17, 2025

Closing Comment by @jspahrsummers

  • Thanks everyone for the deep discussion.
  • Announces a new proposal for a “Streamable HTTP transport” (linking to PR #206), effectively “plain HTTP requests + optional SSE streaming.”
  • States that WebSockets could still happen later, but for now this approach tries to address the short-lived vs. persistent problem.
  • Closes the discussion as resolved.

Key Points of Technical Contention

  1. SSE vs. WebSockets

    • Some found SSE simpler and “pure HTTP,” but others found that ephemeral scaling or certain frameworks (like Rails, or serverless platforms) handle WebSockets better and SSE can cause buffering or proxy issues.
    • There’s no perfect universal consensus; many see SSE as an easy “push-only” model, while WebSockets are truly bidirectional and sometimes more stable in modern hosting.
  2. Stateful vs. Stateless

    • Many want a purely “stateless” mode so simple or serverless deployments can just do short calls, possibly with some “job or session ID.”
    • Others prefer to preserve the advanced features (notifications, server-initiated calls) that rely on long-lived connections.
    • The compromise: Let the server implement the advanced “live” features if it wants, otherwise do simple request-response.
  3. Server-Initiated Sampling

    • This specifically drew questions about security, overhead, or necessity. Some participants called it an “anti-pattern,” while others see it as central for agentic workflows.
    • In a purely stateless environment, sampling might require a “prompt from server -> client,” but the client can simply re-call the server, meaning no persistent SSE or WebSocket is strictly required. Others want a more direct approach (server directly requests a sample on the open connection).
  4. Serverless Constraints

    • Time-limits (like 15 minutes on some platforms), ephemeral containers, and no guaranteed local storage.
    • SSE typically needs a single container to remain open, so you can’t easily scale horizontally. Workarounds include external message brokers or bridging.
    • This friction effectively started the push for “HTTP call, ephemeral SSE if needed, but no forced indefinite connection.”
  5. Load Balancing and Sticky Sessions

    • Many folks pointed out that SSE or long-lived websockets require either sticky routing or a big central state store.
    • If each request is randomly assigned to pods, it breaks the direct streaming connection unless you carefully route it or you have an external “channel” (Nchan, Redis, etc.) in the middle.
  6. Progressive Enhancement

    • A repeated theme: “Simple JSON-RPC or HTTP calls for everything by default, let advanced or real-time subscription be optional.”
    • This ensures that many devs (like WordPress or standard serverless) can skip implementing SSE or a persistent store.

In the end, the official plan that closes out the discussion was “Streamable HTTP,” effectively letting developers do normal POST calls for requests, with an optional SSE-based mechanism (short- or long-lived) if they want partial or real-time updates. WebSockets might still appear, but not mandated at this time.

That concludes the full blow-by-blow with all major technical details and points of contention from the entire GitHub Discussion #102.

1. Core Tension: Stateful vs. Stateless

Problem Statement

  • MCP’s existing design assumes a long-lived, stateful connection (often SSE).
  • This works well for:
    • Notifications of changes (like resource/tool updates).
    • Server-initiated requests (e.g., sampling).
    • Real-time logging or agentic workflows.
  • However, it causes serious friction in serverless/autoscaling platforms where short request lifetimes are common and “sticky routing” is not always feasible.

Key Points of Contention

  1. Long-lived connections: Many see them as powerful (real-time updates, push calls), but they conflict with ephemeral hosting.
  2. Recovery overhead: Some worry about how to restore session or subscription state if a persistent connection drops.
  3. Implementation complexity: Fully stateful servers must handle session tokens or persistent storage. Opponents prefer a simpler, stateless approach for easy scaling and simpler code.

Positions and Arguments

  • Pro-Stateful:
    • Those needing advanced agentic features or continuous server->client updates (e.g., watchers of dynamic resources).
    • They argue it’s essential for real-time collaboration, agent-driven tasks, private “server reasoning,” and easy server->client logging.
  • Pro-Stateless:
    • Serverless/back-end folks who want minimal overhead, no sticky sessions.
    • They claim simpler request-response patterns suffice for most use cases and encourage broader adoption.

2. Transport Mechanisms: SSE vs. WebSockets vs. HTTP

Main Debate

  • MCP originally relied heavily on SSE (Server-Sent Events) for streaming. Critics found it fragile behind load balancers and with short-lived containers.
  • WebSockets was proposed as a potentially more robust or “true” bidirectional transport.
  • Plain HTTP (with short polling or ephemeral streaming) was also favored by some for simplicity and better compatibility.

Points of Contention

  1. SSE Complexity:
    • Requires special handling in ephemeral or highly scaled deployments (sticky sessions, fallback solutions, etc.).
    • Some frameworks (PHP, Rails) find SSE trickier to implement natively.
  2. WebSockets:
    • Often easier to maintain stable connections if the infrastructure supports it.
    • But not all hosting environments handle WebSockets gracefully; some developers also found them tricky to integrate or test, especially regarding custom headers in browsers.
  3. Pure HTTP Requests:
    • Viewed as simplest for basic request-response; no persistent socket.
    • Lacks push from server to client unless you poll or do partial streaming in a single request.

Positions and Arguments

  • SSE Advocates:
    • Preferred it initially due to “standard HTTP” status, simpler for unidirectional push, and no separate upgrade step.
    • Accept it can be combined with short “transaction cycles” (connect for a tool call, then close).
  • WebSocket Advocates:
    • Argued it’s more stable across many proxies or CDNs and truly bidirectional.
    • People from Cloudflare, Shopify, or certain other ecosystems found SSE less reliable in their environments.
  • HTTP-Only Advocates:
    • Simple, minimal overhead.
    • Best for one-shot calls or ephemeral tasks.
    • Fine for scenarios that don’t need real-time push.

3. Server-Initiated “Sampling” Feature

Rationale for Sampling

  • MCP allows servers to request “sampling” from the client’s LLM, letting the server do partial tasks or agentic logic, then ask the client for more text/AI completions.

Controversy

  1. Security & Complexity:
    • Some saw it as an anti-pattern, allowing a remote server to leverage the client’s LLM resources or tokens, potentially raising trust and security issues.
  2. Alternative Approaches:
    • Stateless or minimal servers might just store a partial “prompt” and have the client re-post, or rely on webhooks.
    • Others want direct server->client calls for convenience and real-time agent flows.

Positions

  • In Favor of Sampling:
    • Proponents building “agentic workflows” that can’t run their own LLM.
    • They see server->client text requests as essential to advanced usage.
  • Against or Ambivalent:
    • Many simply do not see a pressing need or are wary of the security implications.
    • They would prefer each server handle its own AI or do a simpler “job token” approach (client calls back with results).

4. Deployment in Serverless Environments

Core Issue

  • Serverless platforms typically:
    • Limit request duration (minutes).
    • Offer ephemeral instances.
    • Often have no guaranteed local storage or sticky routing.

Arguments & Proposed Workarounds

  1. Session Tokens:
    • One viewpoint: re-hydrate state from a token if a container spins down.
    • Con: burdens the developer to store or load complex state externally.
  2. Short-Lived Connections:
    • Use ephemeral SSE or short WebSocket connections only when needed.
    • Others propose a “job model” (server returns an operation ID, client polls or streams).
  3. Intermediate Gateways:
    • Some set up specialized brokers (e.g., Nchan, Redis) so SSE or WebSocket connections are anchored in a stable gateway, letting serverless backends scale freely behind it.

5. Proposed Resolutions & Final Directions

Key Solutions Suggested

  1. Stateless + Optional Statefulness:
    • Provide a minimal HTTP approach for simple calls.
    • Add an optional SSE or WebSocket subscription for advanced features if desired.
  2. Progressive Enhancement:
    • Let a server implement “just JSON-RPC / HTTP” or “HTTP + SSE for notifications.”
    • Possibly add WebSockets down the line for richer bidirectional state.
  3. Streamable HTTP Transport (the eventual consensus):
    • Plain HTTP requests for tool calls, with the option to do an SSE stream on the same endpoint if you need partial results or notifications mid-call.
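
To make the eventual direction concrete: a single POST can advertise that it accepts either a plain JSON reply or an SSE stream, and the server picks. The sketch below reflects the description above; the exact header usage is an assumption of the example, not a statement of the final spec.

```python
import requests

MCP_URL = "https://example.com/mcp"

resp = requests.post(MCP_URL, stream=True, headers={
    "Accept": "application/json, text/event-stream",  # accept either reply style
}, json={
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {"name": "long_running_job", "arguments": {}},
})

if resp.headers.get("Content-Type", "").startswith("text/event-stream"):
    # Server chose to stream: progress notifications, then the final result.
    for line in resp.iter_lines(decode_unicode=True):
        if line.startswith("data:"):
            print("chunk:", line.removeprefix("data:").strip())
else:
    # Server chose a one-shot JSON reply: the serverless-friendly path.
    print(resp.json())
```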

Who Championed Each Side

  • Those Preferring Simplicity/Stateless:
    • Various serverless platform users (e.g., folks from Vercel, WordPress, or ephemeral hosting) stressed minimal overhead and disliked holding open connections.
  • Those Needing Real-Time Stateful Features:
    • People focusing on agentic or advanced collaboration tasks wanted indefinite connections or at least an SSE/WS fallback.
    • Cloudflare was also open to stateful approaches (Durable Objects) but recognized not all platforms would support that.
  • Final Spec Maintainers (e.g., [@jspahrsummers], others)
    • Listened to feedback; concluded an “HTTP + ephemeral SSE” approach is a good compromise.
    • Left the door open for WebSocket support in the future.

6. Overall Outcome

  • Most participants agree that forcing one single approach is not ideal.
  • The new direction: A simpler default (HTTP-based tool calls) plus optional streaming for advanced features.
  • This approach was formally introduced in a “Streamable HTTP Transport” proposal and closed the discussion with an acknowledgment that WebSockets could be added later if needed.

Summary of Key Takeaways

  • Stateful vs. Stateless: The protocol will support both, letting servers remain simple or go advanced.
  • SSE vs. WebSockets: SSE remains in the spec for streaming, but the spec maintainers acknowledge many teams prefer WebSockets and may formally add or refine it later.
  • Sampling: Remains a point of debate; some see it as vital, others see it as too complex or insecure.
  • Serverless Friendliness: The final direction aims to let ephemeral deployments work via short requests or ephemeral SSE channels, instead of mandatory long-lived connections.

read and explain a blow by blow account with dates: modelcontextprotocol/modelcontextprotocol#102


State, and long-lived vs. short-lived connections #102

Closed

jspahrsummers started this conversation in Ideas


Dec 6, 2024 · 31 comments · 53 replies


Context

MCP is currently a stateful protocol, with a long-lived connection between client and server. This allows us to support behaviors like:

  1. Notifications about changes—e.g., changes to resources, or tools being added/removed. (These can occur in either direction too.)
  2. Server-initiated sampling at any point, enabling agentic workflows.
  3. Passing arbitrary server logging through to the client.
  4. … more stuff in future? …

The connection is restartable with fairly little recovery cost (it's not catastrophic, like losing data), but the protocol is definitely not designed around repeatedly opening a connection, issuing one semantic request, then closing.

Problem

This is fairly limiting for serverless deployments, which frequently autoscale up and down, and generally aren't designed around long-lived requests (for example, typically there's a max request lifetime measured in minutes).

Deploying to a Platform-as-a-Service is really nice and convenient as a developer, so not being very compatible with this model creates an impediment to broader MCP adoption.

Possible solutions

I can imagine a few different answers here, each with their own tradeoffs:

Option 1: encapsulate state into a state or session token

Any stateful interaction over a long-lived connection could instead be modeled as independent requests (e.g., webhooks) by passing back and forth some sort of token that either:

  1. Encodes all session state itself, or…
  2. Merely identifies the stateful session, with server and client having some persistent storage associated with that session token.

Pros:

  • This is the simplest, incremental evolution from where MCP is today.
  • Implementable just at the transport layer—protocol and application logic can be (mostly) agnostic to this, I think.

Cons:

  • Somewhat annoying for servers to implement (and a key design principle of MCP is that servers should be really easy to implement):
    • Requires complex state serialization/deserialization. The SDKs could do some of this, but probably not all.
    • May require servers to be deployed with persistent storage.

Option 2: offer "stateless" and "stateful" variants of the protocol

Continue supporting all the behaviors I listed up top, but only when used in "stateful" mode. Offer a "stateless" mode that doesn't have those things.

It's possible that some transports could implement this in a fairly gradated way—e.g., HTTP could be stateful if client -> server can use SSE, but gracefully degrade to stateless by just using POSTed webhooks.

Pros:

  • Stateless servers could be implemented very simply.
  • No additional complexity for stateful servers.
  • Backwards compatible.

Cons:

  • Adds complexity to the spec and SDKs, and creates bifurcation.
  • Plausibly adds complexity to MCP hosts, which may need to connect to both kinds of servers.

Option 3: make all of MCP "stateless"

Make sweeping changes to completely revamp MCP into a fully stateless protocol. Drop all features that require statefulness, like those mentioned up top.

Pros:

  • Simplest for spec, SDKs, and client/server/host implementations (where "simple" here means "fewest concerns," not necessarily "easiest" or "best").

Cons:

  • Agents and "live" behaviors like subscriptions get significantly more difficult to implement.
  • Any server -> client communication would need to use something like web hooks (although this is transport dependent). Importantly, this means that the client would need to be internet-accessible, which isn't currently a requirement.
  • Very backwards incompatible. Massive change right after the initial release.

Thoughts?

I'd welcome all of:

  1. Feedback on whether you think this is a problem worth solving in the first place.
  2. Feedback or questions about the options listed here.
  3. Suggestions for other options I didn't think of and list!


Replies: 31 comments · 53 replies

@jspahrsummers · Dec 6, 2024 · Maintainer Author

Claude's thoughts:

Based on the MCP documentation and the problem you've outlined, I'll share my analysis of the situation and the proposed solutions:

First, I agree this is a problem worth solving. The ability to deploy MCP servers in serverless environments would significantly lower the barrier to entry and increase adoption, especially for developers already working in cloud environments.

Let's analyze each option:

Option 1 (State/Session Token):

  • This feels like the most pragmatic solution while preserving MCP's current capabilities
  • The main concern is the implementation complexity for server authors, which goes against MCP's principle of being easy to implement
  • However, this could potentially be mitigated by:
    1. Having the SDKs handle most of the state management complexity
    2. Providing simple default implementations for common cases
    3. Making state persistence optional - servers could choose to only support ephemeral sessions

Option 2 (Stateless/Stateful Variants):

  • This is an interesting middle ground that provides flexibility
  • The graceful degradation approach with transports is particularly compelling
  • However, the bifurcation could lead to:
    1. Confusion about which mode to use
    2. Inconsistent behavior across different server implementations
    3. Increased complexity in host applications that need to handle both modes

Option 3 (Fully Stateless):

  • While this would simplify the protocol, it seems too restrictive
  • The loss of server->client communication would significantly limit MCP's capabilities
  • The requirement for clients to be internet-accessible is particularly problematic for desktop applications and IDE integrations

Additional Considerations:

  1. Hybrid Approach:
  • What if we kept the protocol stateful but added a "connection pooling" layer in the transport?
  • The transport could maintain a pool of connections and handle reconnection/state recovery transparently
  • This could allow serverless deployments while preserving the protocol's capabilities
  • Servers could optionally implement more sophisticated state management if needed
  1. Event-Sourcing Pattern:
  • Instead of maintaining direct state, servers could maintain an append-only log of events
  • State could be reconstructed from this event log when needed
  • This could simplify state management while preserving functionality
  • Could be particularly well-suited for serverless environments with managed event stores

My recommendation would be to explore a combination of Options 1 and 4:

  1. Add session token support at the transport layer
  2. Implement connection pooling in the SDKs
  3. Provide simple default implementations for common state management patterns
  4. Keep it optional - servers can choose to be completely stateless if they don't need the additional capabilities

This would preserve MCP's powerful features while making it more accessible for serverless deployments. What are your thoughts on this hybrid approach?


1 reply

@jspahrsummers · Maintainer Author

I don't think "connection pooling" really solves anything. The problem with serverless architectures is that you often don't have good ways to coordinate across instances and do this kind of sticky routing (even if possible, it's annoying).

Event sourcing is a good callout that I didn't think of, though. Main question is what the signal-to-noise ratio would be on those logged events, i.e. how many of those logged events will be redundant. Another way to think about this: what would be the ideal ratio between "checkpoints" and events in the log? If the checkpoints are very frequent, then just using that as the backing state seems better.


@jspahrsummers · Dec 6, 2024 · Maintainer Author

Also, although my post almost entirely focuses on protocol state, we could also imagine servers that have application state, like stateful agents. Those might naturally lend themselves to a long-lived connection anyway, even aside from the stateful features in the protocol.


@cramforce · Dec 12, 2024

Speaking from Vercel's perspective option 1 is probably best.

You could consider a hybrid version where

  • You do have a session token
  • Clients can use it to resume
  • Servers can say "I have stateful load-balancing and I wont restart, so I'll just keep state in memory", hence not increasing complexity for such servers
  • But even serverful solutions would benefit from full resumability

I do agree that statefulness is hard to avoid in general (even a trivial stream could be interrupted and ideally the client could resume it), so I would not shy away from it as a general feature–just implement it in a way that is easy for distributed systems to achieve.


@orliesaurus · Dec 19, 2024

What are some stateful use cases that you've seen?


1 reply

@jspahrsummers · Maintainer Author

For example, if you want to be notified of remote resources appearing or disappearing, this is effectively state. It could be delivered via web hooks or other means, but that's a question of how to represent it.


@azdolinski · Dec 19, 2024

Hi everyone, I've been following this discussion with interest. I already have a working prototype solution for tool invocation, based on a stateful client-server connection, as you can see in the attached diagram.
[attached architecture diagram not reproduced here]

To truly achieve the full potential of serverless and make it usable in all dimensions, we really just need a robust AAA (Authentication, Authorization, Accounting) mechanism. This would allow both the MCP and the server to recognize:

1A - Who (Authentication) - who is the user or client initiating the connection (or continuing it in the Nth subsequent session...).
2A - What (Authorization) - what resources and operations this user/client has access to (it would be nice to have such a mechanism and have 1000 tools, but not always all tools need to be exposed to the client).
3A - How (Accounting) - how are the resources being used? For example, has there been any conversation exchange within a given tool? What was the result? Was there a previous error? Has the last operation completed? etc.

For me, maintaining context across multiple invocations is actually quite important... one tool writes, another one executes, and a third one analyzes (and all of this on the server side).

I'm planning to use an Agent as a Tool (Swiss Army Knife/Multitool). Although, as of today, the solution is simple because, as a user, I'm working in my own space / the Docker container is entirely for me.

So, if you want MCP to be stateless also - that's a great idea... but I only ask for one thing... please don't abandon the stateful concept because I think that with the AAA function, it would be a beautiful solution for communicating with much more complex scenarios.

Option 2: +1 vote ! 😉


1 reply

@soren-olympus

For me, maintaining context across multiple invocations is actually quite important... one tool writes, another one executes, and a third one analyzes (and all of this on the server side).

In particular, it could be valuable to maintain context across multiple invocations within a trusted environment (e.g., the server) without requiring sampling calls to a potentially untrusted LM client. This would enable the server to process intermediate states securely (to "privately think") before exposing final results to the client—an extension of AAA that requires statefulness.

As a motivating extension, a naive form of this exists in ad-tech clean rooms, where advertisers and publishers privately combine data for targeting and measurement with agreed-upon egress rules. This allows compute on data with more manageable privacy and competitive risks. [Seems quite likely similar dynamics will appear in MCP applications.]


@allenporter · Jan 2, 2025

  1. Feedback on whether you think this is a problem worth solving in the first place.
  2. Feedback or questions about the options listed here.
  3. Suggestions for other options I didn't think of and list!

Some thoughts that come to mind are:

  • Is a max request timeout in terms of minutes an actual problem? If so, then i think it would be worth getting more explicit about the specific challenges that introduces. For example, if efficiency/scalability of data transfer of resuming a session is the key problem, then could be worth getting more specific about that with use cases. (But even if the session length is in days, not sure it would change that problem dramatically)
  • My assumption is the average duration of an LLM user session is probably also measured in minutes.
  • Even a stateful transport session can disconnect due to poor network connectivity. Are resumable stateful sessions also a requirement? I would say probably not? But the point being that clients already need to handle reconnecting for many reasons
  • Taking on session state management at a higher level may have other additional follow-on problems, e.g. garbage collecting old sessions, managing secondary resources associated with a session, leaking connections, etc. The session will need to have some notion of lifetime/TTL/keep-alive, etc.; at that point, are we back to a similar question about the max request timeout?
  • My impression is that today you can solve this by building a local MCP server that provides a custom transport to use any protocol. Then another way to think about this problem is improving distribution of local MCP servers. This may be an existing problem to solve anyway.

My take is that it is not clear this is worth solving given the current motivation/problem statement.


2 replies

@in-op

A pretty common use case would be having your MCP server deployed using Kubernetes with multiple pods due to high traffic. The client connects to 1 pod for the SSE connection, but subsequent POST requests get round-robined to any other pod which totally breaks the protocol.


@allenporter


> A pretty common use case would be having your MCP server deployed using Kubernetes with multiple pods due to high traffic. The client connects to 1 pod for the SSE connection, but subsequent POST requests get round-robined to any other pod which totally breaks the protocol.

Sure, but what are you proposing? I am aware that's how SSE works today. The SSE implementation in the python-sdk already handles this by encoding a session id in the follow-up POST URLs and associating it with the original request. That doesn't need an additional protocol extension, as it's already supported with external state if needed... unless you are arguing for removing state from the protocol.
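A rough illustration of that behaviour (the event name and query parameter below are illustrative, not taken from the spec): on connect, the SSE stream tells the client where to POST, with a session identifier baked into the URL so a router or shared store can find the right instance.

```ts
// Minimal sketch, assuming the server's SSE stream announces a POST endpoint
// that already carries a session identifier (names are illustrative).
const origin = "https://mcp.example.com";
const sse = new EventSource(`${origin}/sse`);
let postUrl: string | null = null;

sse.addEventListener("endpoint", (event) => {
  // e.g. "/messages?session_id=2f1c..." relative to the SSE origin
  postUrl = new URL((event as MessageEvent).data, origin).toString();
});

async function send(message: unknown) {
  if (!postUrl) throw new Error("endpoint not announced yet");
  await fetch(postUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(message),
  });
}
```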


@atesgoral (Collaborator, Jan 6, 2025)

Progressive enhancement could be an option.

At the base level, keep a very simple mental model: tool call === procedure call === JSON-RPC.

If a client needs real-time notifications, it would call some sort of "subscribe" method to get back an SSE URL (the presence of which can be negotiated with MCP's capability negotiation model).

i.e. Don't tightly couple JSON-RPC with SSE. JSON-RPC is very simple to implement. Don't sabotage this simplicity by coupling it with SSE.

Also, tool calling should ideally be stateless / idempotent: Tool calls are just procedures that take a bunch of arguments and return some result. Application state should only be part of it by including a chunk of grounding data / conversation context as an argument to a tool call. Tools would remain portable/reusable with that. EDIT: Not so sure about this.

AAA can be an envelope/tunnel around MCP and not overcomplicate MCP itself. JSON-RPC and SSE URLs could be signed URLs generated by off-MCP API requests that establish AAA.

Apologies if all this is too terse / abstract. Just wanted to quickly dump things at the top of my mind.


@jspahrsummers (Maintainer, Author)

This could work for notifications and anything directly tied to an issued request, but doesn't provide an easy way for the server to make requests of the client, which is a core feature (e.g., for sampling).


@dave-shawley


That is where the parameters to notifications.* come in. Bi-directional messaging over HTTP where the message is not in response to a request is difficult. Both SSE and web sockets provide a pattern that works over an established connection; however, if the connection is severed, then all bets are off. We can add parameters that make reconnecting a disconnected session (stream?) possible. I'm going to call it a session for the time being. Stream might be a better way to think of it though.

I was working on a response to the "tool calling should ideally be stateless" comment that I abandoned. There is a similar problem there, since many tools need to identify/cache/stash information that is local to the session. Perhaps we could create a session ID in the protocol that is included in messages. It could be optionally included in the capabilities.notifications.SSE param of initialize to resume a disconnected session. The server could create a new session and return the identifier in the response. The same identifier could be passed into tool calling somehow (I haven't fleshed out what this would look like yet). The idea is not to mandate a session storage mechanism, but to enable one for clients, servers, and tools that opt in. Having document-level properties would work nicely here.
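As a rough sketch of that opt-in shape (none of these field names exist in the spec today; they are purely hypothetical):

```ts
// Hypothetical shapes for an opt-in, resumable session (all field names are
// assumptions, not part of the MCP spec).
interface InitializeParams {
  capabilities: {
    notifications?: {
      SSE?: { sessionId?: string }; // pass a previously issued ID to resume
    };
  };
}

interface InitializeResult {
  sessionId?: string; // server-issued; a stateless server simply omits it
}

// A tool call could then carry the same identifier, so server-side tools can
// stash and recall per-session data without the spec mandating any storage.
const toolCall = {
  jsonrpc: "2.0",
  id: 2,
  method: "tools/call",
  params: {
    name: "query_database",
    arguments: { sql: "SELECT 1" },
    _meta: { sessionId: "abc123" }, // hypothetical placement
  },
};
```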


@sean-roberts


> This could work for notifications and anything directly tied to an issued request, but doesn't provide an easy way for the server to make requests of the client, which is a core feature (e.g., for sampling).

```
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "capabilities": {
      "notifications": {
        "SSE": {},
        "websocket": {},
        "polling": {
          "interval": {number}
        }
      }
    }
  }
}
```

Riffing on the idea, what if the client were informed that it could "check in" with the server using some sort of client state? The server could then inform the client that it needs to make a request. There might be some piggybacking possible with the expectation of regular interval pings.

In this world, the client can always send standard requests/notifications as needed (using the allowed notifications) to the server. With polling, it can check in with the server to identify whether it wants to request anything. This would only happen for the duration of the session as determined by the client, no different from the expectations for SSE/WS. The server can then inform the client that it wants a status update if it's stale on something it needs.

This might also help as a means of graceful degradation for servers if/when there are issues with connection management.
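A minimal sketch of what that check-in could look like, assuming a hypothetical JSON-RPC method (called "notifications/poll" here) that returns whatever server-initiated requests have queued up since the last check-in:

```ts
// Client-side polling loop for the hypothetical "notifications/poll" method.
async function pollLoop(endpoint: string, intervalMs: number) {
  let lastSeen = 0;
  for (;;) {
    const res = await fetch(endpoint, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        jsonrpc: "2.0",
        id: Date.now(),
        method: "notifications/poll", // hypothetical method name
        params: { since: lastSeen },
      }),
    });
    const { result } = await res.json();
    for (const pending of result?.requests ?? []) {
      // e.g. a queued sampling request that the client now services and
      // answers with a normal client->server POST
      lastSeen = Math.max(lastSeen, pending.seq ?? lastSeen);
    }
    await new Promise((r) => setTimeout(r, intervalMs));
  }
}
```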


@apryiomka


> This could work for notifications and anything directly tied to an issued request, but doesn't provide an easy way for the server to make requests of the client, which is a core feature (e.g., for sampling).

```
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "capabilities": {
      "notifications": {
        "SSE": {},
        "websocket": {},
        "polling": {
          "interval": {number}
        }
      }
    }
  }
}
```

SSE/polling makes sense for long-running operations. I would like to see support for regular synchronous request/response. I would assume many tools would just return the output right away or within a meaningfully small interval.

```
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "capabilities": {
      "notifications": {
        "SSE": {},
        "websocket": {},
        "http": {},
        "polling": {
          "interval": {number}
        }
      }
    }
  }
}
```


@allan-simon


I agree, statefulness should be negotiated, and the client should assume it's not available unless negotiated with the server.

So that even on the "STDIO" transport, you could just call one-shot commands (grep, jq, etc.) without needing to add a wrapper around them.

On a side note, LibreChat currently provides an option to feed the tools with an OpenAPI spec (swagger.io/specification) and it's a real time saver; suddenly all the hundreds of SaaS products with a REST API become accessible without any wrapper.


@sean-roberts (Jan 8, 2025)

Big fan here 👋

Option 2 feels right and is in line with the other conversations around authorization that enable multiple paths depending on the server's capabilities. The trade-off is the additional complexity in the architecture itself. That said, this feels solvable by looking at the protocol payloads as separate from the delivery mechanism, and letting the delivery mechanisms abide by a separate contract layer, which itself could be decoupled from clients as SDKs. With that, the protocol shouldn't bifurcate, assuming we achieve sufficient parity.


@sean-roberts


What's more, we should probably decouple the notifications/communication from how we represent state as well. Even for WS/SSE this will be useful, but it's a little different from the problem of how we provide sufficient request capabilities from client to server and from server to connected client.


@pcingola (Jan 8, 2025)

Thank you @jspahrsummers and others for this discussion.

From my perspective (I'm quite new to MCP, so please correct me if I'm wrong), the most interesting option would be

> Option 2: offer "stateless" and "stateful" variants of the protocol

Why?

  1. It keeps the full functionality of the stateful server, for those that need it.

  2. It opens the door for a "simplified" server (with reduced functionality) to be implemented in a stateless manner.

I think stateless enables easier scaling, federation of MCP servers, "tools discovery", and MCP proxies. If we want to create some "HuggingFace" for MCP tools, "MCP Stateless" would make it easier.

I currently see the whole AAA layer as completely independent from the stateful/stateless discussion.

I'll be happy to help / contribute if "Stateless MCP" becomes a thing.

Just out of curiosity, what does the decision process to change / improve the MCP protocol look like? I mean, other than creating a change in the spec and SDK code changes, how are these proposals reviewed and approved / rejected?


@jspahrsummers (Maintainer, Author)

> Just out of curiosity, what does the decision process to change / improve the MCP protocol look like? I mean, other than creating a change in the spec and SDK code changes, how are these proposals reviewed and approved / rejected?

That's how. 🙂 We have a core set of maintainers that we are looking to expand over time; they are responsible for reviewing proposals and offering feedback. Changes to the spec specifically require the approval of multiple maintainers.


@calclavia (Collaborator, Jan 27, 2025)

I'm building a hosting platform for deploying MCPs and SSE makes it hard to scale remote MCPs because we can't use serverless.

I did more research into this, and it seems like there's no way to properly route a connection under a protocol like SSE because all the POST requests are independent (REST is stateless, after all). So if you scale up any server to multiple replicas (even in a non-serverless way using VMs or Kubernetes), it's a pain to figure out which spun-up instance to route the messages to.

Actually, statefulness isn't the issue here - it's SSE. One way to side-step this is via gRPC or WebSockets due to how they retain the connection on subsequent requests (there's a sense of session affinity). Is there a reason why WS or gRPC wasn't chosen as the primary transport and SSE was chosen instead? Just want to fully understand the motivations.
@jspahrsummers


@jerome3o-anthropic (Maintainer)

iirc the reason we went for SSE over websockets/gRPC is because SSE exists within standard HTTP, and we figured that adopting websockets on average would probably be a bigger lift than supporting SSE for existing web stacks.

It's a good point regarding additional complexity of routing of subsequent requests back to the container/instance that is holding open the SSE connection. This is another complexity/barrier to the deployment of servers. Some off the cuff ways to solve this:
* Use something like redis to route messages to the correct places
* During the configuration of the SSE transport, the server specifies the endpoint in which it will listen for messages in the session - this endpoint would be used to route back to the correct server instances

That being said, I think stateful/stateless discussion is still relevant - as supporting long lived websockets in a webapp would still necessitate solving all the same issues with stateful/long lived connections


@calclavia (Collaborator)

> iirc the reason we went for SSE over websockets/gRPC is because SSE exists within standard HTTP, and we figured that adopting websockets on average would probably be a bigger lift than supporting SSE for existing web stacks.
>
> It's a good point regarding additional complexity of routing of subsequent requests back to the container/instance that is holding open the SSE connection. This is another complexity/barrier to the deployment of servers. Some off the cuff ways to solve this:
> * Use something like redis to route messages to the correct places
> * During the configuration of the SSE transport, the server specifies the endpoint in which it will listen for messages in the session - this endpoint would be used to route back to the correct server instances
>
> That being said, I think stateful/stateless discussion is still relevant - as supporting long lived websockets in a webapp would still necessitate solving all the same issues with stateful/long lived connections

Thanks for clarifying the motivation behind SSE.

I think having both options (WS/SSE) officially documented would be great, since I think for practical purposes WS is more efficient and designed specifically for these long-lived bidirectional connections and avoids requiring server-side complexities.

Plus, I noticed WS is already in the SDKs, so it would be good if that's formalized.


@the-vampiire


MCP servers can already be seen as a proxy that requires new development. I’m not convinced of the argument that SSE is preferred for simplicity of being native to HTTP.

WS seems far more natural for MCP and is (relatively) trivial to add to SDKs whose purpose is to already abstract development of MCP servers.

I’m surprised this hasn't received more discussion (catching up so maybe it does further in the discussion)


@jerome3o-anthropic (Maintainer, Jan 30, 2025)

I've been mulling this over a bit and wanted to share my (candid and somewhat rambly) thoughts on this.

A bit of a recap of the problem

The key issue with the statefulness is the scaling characteristics of long lived connections / inability to use serverless deployments. There is also an issue with the SSE transport where the "side channel" post requests need to be routed to the server instance holding open the SSE stream.

The reason we have a stateful bidirectional protocol is to enable some really nice features (quoting justin):

  1. Notifications about changes—e.g., changes to resources, or tools being added/removed. (These can occur in either direction too.)
  2. Server-initiated sampling at any point, enabling agentic workflows.
  3. Passing arbitrary server logging through to the client.
  4. … more stuff in future? …

I think these (+ future bidirectional) features will be important in the long run to achieve great UX in user facing apps and rich and efficient communication between agents (somewhat speculative, but I can definitely imagine graphs of agents being well served by stateful bidirectional communication). It's still very early days, but most servers and clients aren't properly leveraging these features. I suspect this is because they are harder to implement, and there aren't many good examples of clients in the wild that support the features.

It's important for adoption that we don't add undue complexity/friction to client and server developers early on, but it's also important that we don't close doors on the aspects of the protocol that will enable the long-tail of great features.

The direction I'm currently leaning in

I really like @atesgoral's approach of progressive enhancement:

> If a client needs real-time notifications, it would call some sort of "subscribe" method to get back an SSE URL

I feel like we could update the SSE transport (or just make a new transport) where:

All client->server messages go through HTTP POST requests (including initialization), with the responses returned directly, i.e.:

```
→ POST body contains:
  {
    "method": "tools/call",
    "params": {
      "name": "string",
      "arguments": {...}
    }
  }

← Response 200 contains:
  {
    "content": [
      // Array of TextContent, ImageContent, or EmbeddedResource
    ],
    "isError": false  // Optional, defaults to false
  }

→ POST body contains:
  {
    "method": "resources/read",
    "params": {
      "uri": "resource-uri"
    }
  }

← Response 200 contains:
  {
    "contents": [
      // Array of TextResourceContents or BlobResourceContents
    ]
  }
```

(Note: In the current SSE implementation all server->client messages come through the open SSE channel)

And all server initiated messages (i.e. notifications and sampling requests) come through an SSE stream that the client can optionally subscribe to.

The implementation of the SSE channel is optional for servers, allowing server implementers to get some value from MCP (tool calls, read resources, evaluate prompts, resource/prompt completions) without needing to support long lived connections.

Then, when server implementers and clients decide to implement the richer stateful features, they can implement the SSE channel and tackle the scaling implications.

These SSE channels could also be best effort, and it's okay for them to occasionally disconnect (i.e. when a deployment occurs).

Pros:

  • Easy for server implementers to deploy MCP at scale
  • Only tackle the complexity of long-lived connections if they want/need the additional features
  • Don't close the door to the richer bidi features of the protocol

Cons:

  • Most servers won't support the richer features
  • Some additional complexity of the transport implementation for clients

There are probably other issues with this that I haven't thought through


@atesgoral (Collaborator)

What's more, the real-time notification URL could also be a ws: URL, telling the client to either:

  • Open a unidirectional WebSockets stream for notifications
  • Move over to bidirectional WebSockets entirely. This is fun because it will effectively be a 2-step upgrade, from JSON-RPC to HTTP to WebSockets (using the Upgrade/Connection headers).


@jspahrsummers (Maintainer, Author)

This direction makes sense to me! I think this threads the needle well between simplicity while still supporting these features that we believe will be important—especially for agents.

Perhaps we could make SSE support a stronger requirement on servers, but allow disconnecting it at any time? Then a short-lived interaction that still supports bidirectionality could look like this (a minimal client-side sketch follows the list):

  1. Client periodically sends requests/notifications to server via HTTP POST
  2. Client periodically connects to server over SSE
  3. Server delivers any "queued" requests/notifications over SSE
  4. If the server doesn't wish to support a long-lived connection, it then cleanly disconnects the SSE stream
  5. [repeat]
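A minimal client-side sketch of that reconnect cycle, assuming a hypothetical /sse endpoint that the server is free to close at any time:

```ts
// Best-effort SSE subscription: the server may cleanly drop the stream
// whenever it likes; the client simply reconnects later to drain anything
// queued in the meantime. (EventSource also auto-reconnects on its own;
// closing and reopening here just makes the backoff explicit.)
function startBestEffortSse(url: string, onMessage: (data: string) => void) {
  let source: EventSource | null = null;

  const connect = () => {
    source = new EventSource(url);
    source.onmessage = (event) => onMessage(event.data); // queued server requests/notifications
    source.onerror = () => {
      source?.close();
      setTimeout(connect, 5_000); // back off, then repeat the cycle
    };
  };

  connect();
  return () => source?.close(); // stop the cycle
}
```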


@pcingola


I think we all tend to agree that the crux of the server scaling problem is not as much about "state", but more about "long lived connections" (i.e. SSE).

As @jerome3o-anthropic, @jspahrsummers, @atesgoral , and others mentioned, if the SSE connections are "relatively short", and the servers are "allowed to disconnect" (i.e. best effort), the scaling issues should be minimized.

It seems to me that we are reaching the conclusion that SSE connections should be alive only during a "transaction cycle", e.g. a request for an "agent / tool" where the agent uses sampling capabilities. Within this back and forth (which may last a few seconds, or up to a couple of minutes), we keep the SSE channel open, but then it's closed when the end of the "agent / tool cycle" is reached.

IMO the protocol "as is" allows this behaviour (perhaps with some minor additions to #153). The changes in specification are probably in the clarifications and examples on "how to implement" the client and server. Some changes will also be needed in the reference implementations, which I'm happy to contribute to if there is a decision to move forward.


@artpi


This is a very good direction!

At Automattic we are trying to use MCP for WordPress.

I managed to hack together native WordPress support for the SSE approach, but because PHP is kind of stateless, I had to:

  • Throw in a bunch of php.ini directives to hack together a working SSE connection
  • Store responses in the database
  • Read them in a loop in the SSE endpoint.

This is very sub-optimal and I would love for the HTTP transport to just be query/response. The current SSE implementation frankly sounds like the STDIO approach ported directly to the web. HTTP/REST is a stateless protocol, and yet there are a lot of hacks to make it behave statefully by using databases, memcache, etc.

> The implementation of the SSE channel is optional for servers, allowing server implementers to get some value from MCP (tool calls, read resources, evaluate prompts, resource/prompt completions) without needing to support long lived connections.

YES PLEASE

> I think these (+ future bidirectional) features will be important in the long run to achieve great UX in user facing apps and rich and efficient communication between agents (somewhat speculative, but I can definitely imagine graphs of agents being well served by stateful bidirectional communication)

> Cons:
> Most servers won't support the richer features

@jerome3o-anthropic I don't think the current SSE implementation is helping here. Again, it looks like a direct port of STDIO approach to the web, which is not how most web apps seem to operate.
The agentic workflows and larger systems will require long-running intermittent connections to operate and the way to do those would be to introduce

  • Webhooks
  • Message endpoints that would be polled /checked periodically.

Taking inspiration from payments

Come to think of it, these workflows could have similar latency to payments flows, where authorizing a credit card could take up to a minute, invoicing could take minutes, and renewals could take years - it's a mix of short- and long-term connections.

So from Stripe API we have:

Then a web-based client could send a webhook, or poll the server for an event.
I know this all requires clients to be a little more complicated, but I also think we'll end up in a world where there are more servers than clients, thus we should optimize a bit for the ease of use of servers vs. the clients.

In any event, your proposal of the SSE channel being optional is great.


@Fraggle


Hello, I have been following the discussions with great interest and I really like this approach too.

A few remarks about the server response in a stateless situation as described in the OP (maybe it's already covered somewhere else that I missed).

First, I think we should still use the JSONRPC format for the response instead of using another format.

Also, instead of only sending the final response directly, I believe it would be useful for the server to be able to stream back messages to the client in order to inform it of progress during long-running operations. I understand it's somehow still server-side events, but in a transactional / serverless use case. People using LLMs are used to it, as most LLM APIs are streaming.

It could be configured by a client header (Accept: application/json-rpc-stream) or in clientCapabilities; if not supported, the server would just discard all notifications/* messages.

(Maybe it's just simpler to always stream the answer)

Very exciting stuff anyway !


@clareliguori (Feb 20, 2025)

Thank you for this discussion!

If I'm understanding the current spec correctly, I think there are two categories of server>client communication to solve for over short-lived and/or interruptable connections, but today they are not distinguished between each other in the spec. I'm wondering if they should be, and if they should happen over distinct connections between client and server, instead of over one monolith streaming connection.

My rough stab at how that might look, without perfectly understanding the spec today:

Category 1. Notifications about changes to what the server can provide to the client

Examples: Resource/prompt/tool list changes, resource content changes

Use case: As the client application, I need to keep track of the resources/prompts/tools that a server can provide to me, so I can reason about using those resources/prompts/tools and/or present that list to the user. Streaming notifications from the server help me keep my local list of resources/prompts/tools up-to-date in real time. If I get disconnected, I can re-build my local list of resources/prompts/tools by calling the server's List/Get APIs, and then connect to a stream for updates. If a server does not support streaming updates, I can poll the server's List/Get APIs periodically to keep my local list up-to-date.

For servers that don't support streaming (or clients who don't want to stream):
Client periodically sends requests/notifications to server via HTTP POST

For servers that support streaming:

  1. Client discovers server's resources/prompts/tools via HTTP POST
  2. Client subscribes to change notifications over SSE
  3. Server delivers change notifications over SSE
  4. SSE connection disconnects (this could be a network issue, the server going through a deployment or scale down, etc)
  5. Repeat 1-4

Category 2. Requests/notifications that are (hopefully?) directly related to some work that the client requested

Examples: Sampling requests, tool progress notifications, (logging?), (roots?)

Use case: As the client application, I want to use prompts, tools, and agents from a server. In the course of completing my request to the server, the server may need additional information from me (like LLM samples). Or, it may want to send me occasional updates like progress notifications and logs. I establish a bidirectional communication stream with the server, so that the server can send me the information and requests it needs to complete my work. If the stream is disconnected mid-way, the server may not be able to complete my request and I may need to start a new request.

In the spec today, there doesn't seem to be any kind of a "session ID" or "job ID" associated with a request that might take a while to complete and might require some back-and-forth communication. For example, sampling requests and progress notifications from server>client don't seem to be directly associated to the original tool call request initiated from client>server. It seems like today it is technically valid for a server to spam the client with sampling requests and root requests over the long-running connection, without the client ever actually using the server.

Let's assume that some kind of session ID is introduced that is assigned to requests from the client for using prompts/tools/agents. For certain types of server>client requests, they must be within the context of a session ID. The original request from the client can be upgraded to a stream for bidirectional communication for that session only. The server completes the session when it has completed the requested work. (I think this pattern is similar to the "transactions" @pcingola was describing in his comment above)

For servers that don't support streaming:
Client sends a tool call request to server via HTTP POST.
Connection is held open until the server returns the result.
If the connection breaks in the middle, the client must send a new request.

For servers that support streaming:

  1. Client sends a tool call request to server via HTTP POST
  2. The request is upgraded to SSE
  3. Server delivers requests/notifications to the client via SSE
  4. Server delivers final result over SSE
  5. Server closes the SSE connection

If the connection breaks in the middle, the client must send a new request.

Optional: For servers that persist session state (for example, by session ID):
For non-streaming servers, the client could poll the server by session ID for any requests the server has for the client.

If the connection is broken (for either streaming or non-streaming servers), the client can make a request to get the results of a session ID. The result comes back immediately if the session is already complete. The request is resumed if the session ID exists. The behavior then depends on whether the server supports streaming, as above - either the HTTP request is held open until the result is ready, or a stream is started for bidirectional communication.


@ErenArslan (Feb 25, 2025)

We have a solution: we manage a transport and server instance per connection.

We can handle multiple remote SSE sessions this way.

```ts
import express, { RequestHandler } from "express";
// Note: SSEServerTransport is used below but was not imported in the original
// snippet; it ships with the MCP TypeScript SDK.
import { SSEServerTransport } from "@modelcontextprotocol/sdk/server/sse.js";
import createServer from "./server.js";

// Add Winston logger for better logging
import winston from 'winston';

// Configure logger
const logger = winston.createLogger({
  level: 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.json()
  ),
  transports: [
    new winston.transports.Console(),
  ]
});

// Add uncaught exception handler
process.on('uncaughtException', (error) => {
  logger.error('Uncaught Exception:', { error: error.message, stack: error.stack });
  // Give time for logs to be written before potential pod restart
  setTimeout(() => process.exit(1), 1000);
});

process.on('unhandledRejection', (reason, promise) => {
  logger.error('Unhandled Rejection:', { reason, promise });
});



// Start receiving messages on stdin and sending messages on stdout
const transportMap = new Map<string, SSEServerTransport>();

const app = express();

// Add request logging middleware
app.use((req, res, next) => {
  const start = Date.now();
  res.on('finish', () => {
    const duration = Date.now() - start;
    logger.info('Request completed', {
      method: req.method,
      path: req.path,
      statusCode: res.statusCode,
      duration,
      userAgent: req.get('user-agent')
    });
  });
  next();
});

const sseHandler: RequestHandler = async (req, res) => {
  const transport = new SSEServerTransport("/messages", res);
  const server = createServer();
  res.setHeader('X-Accel-Buffering',"no");

  try {
    transportMap.set(transport.sessionId, transport);

    res.on('close', async () => {
      logger.info('SSE connection closed', { sessionId: transport.sessionId });
      await server.close()
      transportMap.delete(transport.sessionId);
    });

    await server.connect(transport);
    logger.info('SSE connection established successfully', { sessionId: transport.sessionId });
  } catch (error) {
    logger.error('Failed to establish SSE connection', {
      sessionId: transport.sessionId,
      error: error instanceof Error ? error.message : 'Unknown error',
      stack: error instanceof Error ? error.stack : undefined
    });
    transportMap.delete(transport.sessionId);
    res.status(500).end();
  }
};

const messageHandler: RequestHandler = async (req, res) => {
  const sessionId = req.query.sessionId as string;
  res.setHeader('X-Accel-Buffering',"no");

  console.log("Messages sessionId", sessionId);
  if (!sessionId) {
    logger.error('Message received without sessionId');
    res.status(400).json({ error: 'sessionId is required' });
    return;
  }

  const transport = transportMap.get(sessionId);
  if (!transport) {
    logger.error('No active transport found for session', { sessionId });
    res.status(404).json({ error: 'No active connection found for this session' });
    return;
  }

  try {
    logger.debug('Received message', { sessionId, body: req.body });
    await transport.handlePostMessage(req, res);
    logger.debug('Message handled successfully', { sessionId });
  } catch (error) {
    logger.error('Error handling message', {
      sessionId,
      error: error instanceof Error ? error.message : 'Unknown error',
      stack: error instanceof Error ? error.stack : undefined
    });
    res.status(500).json({ error: 'Internal server error' });
  }
};

app.get("/sse", sseHandler);
app.post("/messages", messageHandler);

const port = process.env.PORT || 3001;

// Add error handling for server startup
const httpServer = app.listen(port, () => {
  logger.info(`Server started`, { 
    port, 
    nodeEnv: process.env.NODE_ENV,
    pid: process.pid,
    memory: process.memoryUsage()
  });
});

// Handle server-specific errors
httpServer.on('error', (error: Error) => {
  logger.error('Server startup error:', {
    error: error.message,
    stack: error.stack,
    port,
    pid: process.pid
  });
  
  // Exit process on critical errors
  if ((error as any).code === 'EADDRINUSE') {
    logger.error('Port is already in use, exiting process');
    process.exit(1);
  }
});

// Handle process termination
process.on('SIGTERM', () => {
  logger.info('SIGTERM received, shutting down gracefully');
  httpServer.close(() => {
    logger.info('Server closed');
    process.exit(0);
  });
  
  // Force close if graceful shutdown fails
  setTimeout(() => {
    logger.error('Could not close server gracefully, forcing shutdown');
    process.exit(1);
  }, 10000);
});

process.on('SIGINT', () => {
  logger.info('SIGINT received, shutting down gracefully');
  httpServer.close(() => {
    logger.info('Server closed');
    process.exit(0);
  });
  
  // Force close if graceful shutdown fails
  setTimeout(() => {
    logger.error('Could not close server gracefully, forcing shutdown');
    process.exit(1);
  }, 10000);
});
```



@atesgoral (Collaborator, Mar 1, 2025)

At Shopify, we're so far mostly using what we call "MCP Lite": Just regular, transactional (POST and get the result in the HTTP response) JSON-RPC, and often just implementing the MCP tools/call method. This meets most of our current use cases since we have a fledgling ecosystem of internal tools with no immediate need for dynamic server/tool discovery. But that need is fast-approaching!

We have in fact done a PoC implementation of the JSON-RPC-SSE transport when it first came out, but as others in this thread have pointed out, it's awkward to implement: In podded deployments we are forced to use an inter-processes message passing mechanism to link the JSON-RPC POST request to the SSE stream.

I proposed progressive enhancement above, without pictures. Time for some pictures.

Selective notification subscription

"MCP Lite", using plain JSON-RPC. No SSE in sight. Very simple for adoption:

```mermaid
sequenceDiagram
    participant C as MCP Client
    participant S as MCP Server

    C->>+S: POST JSON-RPC tools/call
    S-->>-C: tool result
```

Important points:

  1. initialize can be called for capability negotiation, but maybe there are smart/assumed defaults that make this step optional
  2. tools/list can be called to discover tools, but if servers talking to each other already know what tools exist at the destination, they can skip this

Discovering notification URLs during initialization, splitting the MCP Server's JSON-RPC and notification endpoints for clarity:

```mermaid
sequenceDiagram
    participant C as MCP Client
    box MCP Server
    participant J as JSON-RPC Endpoint
    participant N as Notification Endpoint
    end

    C->>+J: POST JSON-RPC initialize
    J-->>-C: Notification URLs

    C->>N: Start streaming from a notification URL above

    N-->>C: event 1

    C->>+J: POST JSON-RPC tools/call
    J-->>-C: tool result

    N-->>C: event 2
```

Important points:

  1. The notification stream returned from initialize can be zero or more transports supported by the server. It could also be a subset based on a capability query from the client, making it easy for the client to pick the first one that it ranks as preferable: HTTP poll URL, SSE URL, WebSocket URL, Kafka URL, ... (a rough payload shape is sketched after this list)
  2. The JSON-RPC tool/call is completely unhindered by any notification streaming happening on some other pipe. They are on different planes of existence.
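As a rough sketch of what an initialize result advertising notification URLs might look like (every field name here is an assumption, not something from the spec):

```ts
// Hypothetical initialize result that advertises notification transports,
// per the diagram above (shape and field names are assumptions).
const initializeResult = {
  jsonrpc: "2.0",
  id: 1,
  result: {
    capabilities: { tools: {}, resources: {} },
    notifications: {
      sse: { url: "https://mcp.example.com/notifications/sse?sig=..." },
      websocket: { url: "wss://mcp.example.com/notifications/ws?sig=..." },
      poll: { url: "https://mcp.example.com/notifications/poll", intervalMs: 5000 },
    },
  },
};
```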


@atesgoral (Collaborator, Mar 1, 2025)

Sampling without streaming (borderline crazy idea)

In an "MCP Lite" world (see above), how can MCP-server-initiated sampling work?

Borrowing from HTTP, where servers can emit different response codes to ask clients to take certain actions (e.g. provide credentials, redirect away and forget this URL, I'm busy backoff, etc.) the tools/call method could return a special result to prompt the client to process sampling, and then return the result to the MCP server as a "tool call continuation":

```mermaid
sequenceDiagram
    participant C as MCP Client
    participant S as MCP Server
    participant U as User
    participant L as LLM

    C->>+S: POST JSON-RPC tools/call
    S-->>-C: sampling request, continuation payload

    C->>+U: Get user approval
    U-->>-C: Go ahead

    C->>+L: Perform completion
    L-->>-C: Completion

    C->>+U: Get user approval
    U-->>-C: Go ahead

    C->>+S: POST JSON-RPC tools/continue
    S-->>-C: tool result
```

Assumption: The MCP Server will never send an unsolicited sampling request to the client, but these will all be as a response to tool calls.

Abstractly, this treats the tool as a finite state machine. When sampling is needed, the state of the tool is bounced back to the client and the client can progress the state of the tool by passing it the state + completion to transition the state back to running. This "state" could simply be a tool call reference if the MCP Server is stateful and can persist the paused tool state on its side.
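Rough payload shapes for that continuation flow (the "tools/continue" method comes from the diagram above; the exact field names are assumptions):

```ts
// Hypothetical shapes for the "tool call continuation" idea; field names are
// assumptions, not part of the MCP spec.
const toolCallResult = {
  jsonrpc: "2.0",
  id: 1,
  result: {
    // Instead of final content, the server bounces a sampling request back...
    samplingRequest: {
      messages: [{ role: "user", content: { type: "text", text: "Summarize X" } }],
      maxTokens: 256,
    },
    // ...plus an opaque token (or fully serialized state) to resume the tool.
    continuation: "eyJ0b29sIjoid2ViX3NlYXJjaCIsInN0ZXAiOjJ9",
  },
};

const continueRequest = {
  jsonrpc: "2.0",
  id: 2,
  method: "tools/continue", // from the sequence diagram above
  params: {
    continuation: toolCallResult.result.continuation,
    completion: { role: "assistant", content: { type: "text", text: "..." } },
  },
};
```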


@atesgoral (Collaborator, Mar 1, 2025)

Short-lived SSE as JSON-RPC response

I think others might have suggested or alluded to this already. Focusing on tool calling only:

POST to the JSON-RPC endpoint, get back an SSE response. The stream only lasts for the duration of a tool call.

Simple tool response over a single SSE event:

```mermaid
sequenceDiagram
    participant C as MCP Client
    participant S as MCP Server

    C->>+S: POST JSON-RPC tools/call
    S-->>-C: tool result over SSE
```

Certain implementations may support tools emitting intermediate diagnostic events or progress events usually meant for rendering on the UI.

```mermaid
sequenceDiagram
    participant C as MCP Client
    participant S as MCP Server

    C->>+S: POST JSON-RPC tools/call
    S-->>C: reticulating splines
    S-->>C: modulating frequencies
    S-->>-C: tool result over SSE
```

A tool can also emit one or more sampling requests over SSE (even at different times in its processing cycle) and the same continuation mechanism in my post above can be used to resume the tool when all sampling is completed.
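A sketch of what consuming such a short-lived stream could look like on the client, assuming the POST negotiates SSE via an Accept header and each event's data field carries a JSON-RPC message (both assumptions):

```ts
// Client-side sketch: POST a tools/call and read a short-lived SSE response.
async function callToolStreaming(endpoint: string, request: unknown) {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json", Accept: "text/event-stream" },
    body: JSON.stringify(request),
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  for (;;) {
    const { value, done } = await reader.read();
    if (done) break; // server closes the stream once the tool call completes
    buffer += decoder.decode(value, { stream: true });

    let boundary;
    while ((boundary = buffer.indexOf("\n\n")) >= 0) {
      const rawEvent = buffer.slice(0, boundary);
      buffer = buffer.slice(boundary + 2);
      const data = rawEvent
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).trim())
        .join("\n");
      if (data) console.log("event:", JSON.parse(data)); // progress, sampling request, or final result
    }
  }
}
```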


@daviddenton


This is exactly the approach we're taking for the MCP SDK which we're currently implementing in @http4k, and we think it hits a good balance and is consistent from a client POV. This would be great as a standardised fallback for simple HTTP usage in the spec - we still require an Accept header of the text/event-stream content type. The only open question for us would be a standardised (default) URL convention on which this endpoint would be hosted. I can think of the following schemes:

  1. Reusing /sse - this would be consistent, but would prohibit both request schemes being mixed on the same server.
  2. Use /messages but with an Accept header of text/event-stream to disambiguate it from the established mechanism. This of course could potentially break existing clients, but is HTTP compliant.
  3. Using the method name as the endpoint path (eg. /tools/call). This would be easy to do but would be duplicative with what is already in the JSON RPC message received and I'm not sure it adds any value.
  4. Something else we haven't thought of!


@tristanz (Mar 3, 2025)

We've been struggling with this too. Long-lived connections are problematic for the reasons others have listed.

It seems like robust tool calling needs to satisfy two constraints:

  1. Simple tools should be fast and familiar.
  2. Long-running tools should be cancellable, streamable, and reliable.

Most cloud APIs solve this by having two types of endpoints:

  1. Standard endpoints that return the result immediately, without support for streaming or cancellation.
  2. Job endpoints that return a reference to a job that is cancellable, reliable, and streamable.

Good examples of this pattern are Google's AIP-151 for Long-Running Operations and Fal AI's Queue Endpoint. FAL's Queue API is a good reference implementation for long-running operations for models and tools that have streaming output.

Modifying this to MCP's JSON-RPC protocol would be relatively straightforward.

For simple tools:

tool/call -> immediate result

For long-running tools:

tool/call -> returns Operation reference (job_id)
operation/stream?id=xxx -> SSE stream of results
operation/cancel?id=xxx -> cancel operation
operation/get?id=xxx -> get current state/result

This gives you a stable job id that you can cancel and reconnect to regardless of connection stability.

This is slightly more complicated than just upgrading to SSE on the initial call, as proposed in previous comments, but is easy to understand. I guess you could also support upgrading to SSE directly if optimizing was a priority, but conceptually there is a job.

Sequence diagrams end up like:

```mermaid
sequenceDiagram
    participant Client
    participant MCP as MCP (Job Manager)
    participant Tool

%% Simple Tool Flow
Client->>MCP: tool/call (simple tool)
MCP->>Tool: Execute simple tool
Tool-->>MCP: Result
MCP-->>Client: Immediate response

%% Long-running Tool Flow
Client->>MCP: tool/call (long-running tool)
MCP->>MCP: Create job record
MCP->>Tool: Start job execution
Note right of MCP: MCP tracks job state
MCP-->>Client: Return Operation reference (job_id)

Client->>MCP: operation/stream?id=xxx
Tool-->>MCP: Job progress updates
MCP-->>Client: Stream updates via SSE

%% Optional Get/Cancel Flow
opt Get Operation State
    Client->>MCP: operation/get?id=xxx
    MCP-->>Client: Current state/result
end

opt Cancel Operation
    Client->>MCP: operation/cancel?id=xxx
    MCP->>Tool: Cancel job execution
    Tool-->>MCP: Execution cancelled
    MCP-->>Client: Cancellation confirmed
end
```

If there are needs for other types of notifications than job progress updates, that seems like a separate Events API. I'd lean toward making that happen via reliable webhook delivery vs. a single long-lived SSE connection.
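A small client-side sketch of the job pattern above, using the method names from the list (operation/get, etc.); the payload shapes are assumptions:

```ts
// Hypothetical client flow: fast tools return immediately, long-running tools
// return an operation/job reference that can be polled (or streamed) and cancelled.
type Rpc = (method: string, params?: unknown) => Promise<any>;

async function runTool(rpc: Rpc, name: string, args: unknown) {
  const first = await rpc("tools/call", { name, arguments: args });
  if (!first.operation) return first; // simple tool: immediate result

  const jobId = first.operation.id;   // long-running tool: job reference
  for (;;) {
    const state = await rpc("operation/get", { id: jobId });
    if (state.status === "succeeded") return state.result;
    if (state.status === "failed") throw new Error(state.error);
    await new Promise((r) => setTimeout(r, 1_000)); // or open operation/stream for SSE updates
  }
}
```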


@madevoge (Mar 4, 2025)

I also believe Option 1 makes sense as a way to disconnect sockets from sessions.
I have a transport proposal #182 that goes into some of the implications of bi-directional traffic, such as the lifecycle and security.

From the discussion post, I would argue that we can keep the management of session context and state management up to the server to decide upon.
We can enable flexibility by being generous and undescriptive in the "session id" format.
Whether a server wants to encode data or include a Secure Access Signature as a pointer, this should be opaque to the client.

In terms of extensibility, a few additions that could be great but are not required to solve the long-running sessions:

  • we would not lock ourselves out of defining extensibility points for clients and servers to support an additional SSE "subscribing" channel to listen on a session.
  • Potentially allow the ability to "return early" with a response body when full async is not required for a given call. When a server always returns in the same http request, this would essentially define a stateless server. The server maintainer could decide at any time to start leveraging the callback URI as the server evolves and starts operating more asynchronously.


@kentonv (Mar 4, 2025)

Coming to this thread a bit late, but speaking for Cloudflare Workers:

Statefulness is just fine for us. Durable Objects are all about handling stateful protocols. The original stateful MCP protocol over a WebSocket transport should be a great fit for MCP servers built on Workers.

A protocol involving session IDs would also be OK -- it's trivial for Workers to route requests with the same session ID to the same Durable Object, where its state is tracked. The main problem is lifecycle: if the MCP client disappears without explicitly ending the session, how does the MCP server decide when it can clean up? WebSockets are nice because you naturally clean up when the connection is closed. So MCP servers built on Workers would probably prefer a stateful WebSocket-based protocol, but could also live with session IDs.

I am not sure how a session token that "Encodes all session state itself" would work exactly, but it sounds like complexity that wouldn't benefit Workers users.
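For concreteness, a minimal sketch of the session-ID routing described above, on Workers (the binding name and URL scheme are placeholders):

```ts
// Route every request carrying the same session ID to the same Durable Object,
// which holds that session's MCP state.
export default {
  async fetch(request: Request, env: { SESSIONS: DurableObjectNamespace }) {
    const sessionId = new URL(request.url).searchParams.get("sessionId");
    if (!sessionId) return new Response("sessionId required", { status: 400 });

    const id = env.SESSIONS.idFromName(sessionId); // stable mapping: session -> object
    return env.SESSIONS.get(id).fetch(request);    // the object handles the MCP message
  },
};
```

The lifecycle question remains either way: something (an explicit session-end message, a TTL, or similar) still has to decide when the per-session state can be discarded.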


@daviddenton


Speaking selfishly (because I'm maintaining a serverside SDK for MCP 😄):

> WebSockets are nice because you naturally clean up when the connection is closed

Overall, we're bearish on Websockets over SSE or straight HTTP because of the inherent complexity - we've found they don't really play nicely (and definitely not with serverless workers).

> The main problem is lifecycle: if the MCP client disappears without explicitly ending the session, how does the MCP server decide when it can clean up?

For SSE session termination, we've implemented a regular "ping" event which is sent from the server through the SSE. On failure, it just cleans up the server-side state.
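One way that ping-based cleanup can look in an Express-style server (a sketch assuming the SSE headers are already set by the transport; the interval and ping shape are arbitrary):

```ts
import type { Response } from "express";

// Periodically write an SSE comment as a keep-alive; when the socket closes
// (or a write throws), tear down the per-session state.
function attachSsePing(res: Response, cleanup: () => void, intervalMs = 15_000) {
  const timer = setInterval(() => {
    try {
      res.write(`: ping ${Date.now()}\n\n`); // SSE comment line, ignored by clients
    } catch {
      clearInterval(timer);
      cleanup();
    }
  }, intervalMs);

  res.on("close", () => {
    clearInterval(timer);
    cleanup();
  });
}
```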


@kentonv


FWIW, in Cloudflare Workers, WebSocket is very much preferred over SSE. We have a lot of infrastructure around WebSockets that is missing for SSE, like Durable Object hibernation (which allows the server to shut down while the WebSocket is idle and start back up when something interesting happens, without losing the connection).

More generally (not specific to Workers), I find that trying to stream events over plain HTTP (in either direction; requests or responses) doesn't work well because proxies commonly assume that they can safely buffer the body bytes, and there's no standard way to tell them not to. Whereas when the proxy sees a WebSocket, it knows that obviously it should not buffer WebSocket messages. So WebSockets ironically tend to have fewer compatibility issues with CDNs and other middleboxes.


@topherbullock (Collaborator)

> I find that trying to stream events over plain HTTP (in either direction; requests or responses) doesn't work well because proxies commonly assume that they can safely buffer the body bytes

got bit by this same issue and it led to a long night of pleading and poking at settings to get a Google Cloud Run app to play nicely and stream over SSE.

> WebSocket is very much preferred over SSE

Same from the Shopify standpoint / Ruby + Rails ecosystem. The POST request and side-channel SSE dance is a pain to implement and there isn't a lot out of the box for this. ActionCable is much easier to adopt and more mature; will be easier to integrate as a standard transport in the Ruby SDK.


@dsp-ant (Maintainer)

This is super useful information.

One thing we currently ran into is that we really want a potential Websocket connection to work with WebAPIs. It turns out the browser/webapi for WS doesn't support adding custom headers for authentication, etc. I am curious what people think are potential options to ensure if we do Websockets that we can have arbitrary headers (as allowed per HTTP standard), while keeping webapi compat.


@atesgoral (Collaborator)

Going back to the motivation behind my suggestion above: After a regular POST initialize handshake, the returned WS stream URL can be a signed URL that can embody anything from authentication to some sort of session state (and thus no need for custom headers).


@Kludex


Collaborator


> This is super useful information.
>
> One thing we recently ran into is that we really want a potential WebSocket connection to work with the browser Web APIs. It turns out the browser WebSocket API doesn't support adding custom headers for authentication, etc. I'm curious what people think the options are for ensuring that, if we do WebSockets, we can have arbitrary headers (as allowed by the HTTP standard) while keeping Web API compatibility.

What I usually recommend is to use the protocols parameter of the Web API WebSocket constructor and send "Authorization" and the token as subprotocols.

That way the server can deny the WebSocket connection and send a 401 HTTP response, because the connection has not been upgraded yet.

Disclaimer: I maintain uvicorn and starlette, the server dependencies of the MCP package in Python.
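A browser-side sketch of that trick; the subprotocol names ("mcp" and the "bearer.&lt;token&gt;" convention) are made up for illustration and not part of any spec.

```typescript
const token = "eyJhbGciOi..."; // placeholder access token

// The browser WebSocket API cannot set headers, but it can offer subprotocols,
// which travel in the Sec-WebSocket-Protocol header of the upgrade request.
const ws = new WebSocket("wss://mcp.example.com/stream", ["mcp", `bearer.${token}`]);

ws.onopen = () => {
  // If the server accepted the handshake it echoes back one chosen subprotocol;
  // if the token was bad, the upgrade is refused (e.g. with a 401) before any MCP traffic.
  console.log("connected with subprotocol:", ws.protocol);
};
```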


@daviddenton


> This is super useful information.
>
> One thing we recently ran into is that we really want a potential WebSocket connection to work with the browser Web APIs. It turns out the browser WebSocket API doesn't support adding custom headers for authentication, etc. I'm curious what people think the options are for ensuring that, if we do WebSockets, we can have arbitrary headers (as allowed by the HTTP standard) while keeping Web API compatibility.

From the JVM perspective (so speaking entirely selfishly, since I operate mostly in that space 🙃), WebSocket server implementations are fairly poorly featured, inconsistent, hard to test, and trickier to secure, so I believe requiring them would definitely hinder adoption in that space. Our current SDK model is to provide standard protocol options for SSE + WS + straight JSON-RPC (no streaming) + StdIo.

That said, a regular POST plus a redirect to WebSocket makes good sense here from a usability and consistency viewpoint, i.e. if you want streaming, then expect an endpoint to be returned from the initial call.


@kentonv


It's indeed annoying that the in-browser WebSocket API does not allow you to set headers. IMO they should allow you to set the Authorization header at least. But we don't control that.

  • I suspect most non-browser WebSocket implementations will let you set headers, since the specific security concerns motivating this restriction don't really apply outside a browser. Certainly Cloudflare Workers supports setting headers on outgoing WebSockets. Do we expect MCP clients to be running in-browser or do we expect this to be server-to-server communications?
  • The "recommended" approach is of course to send credentials as the first message on the WebSocket. Of course, this doesn't always fit nicely into HTTP frameworks that would like to do authorization as middleware.
  • A simple compromise is to just put the token in the URL. This might have some security concerns because some systems assume URLs are not secrets and might decide to log them and such, but perhaps that can be mitigated by using a single-use, short-lived token? (I think this is what @daviddenton is suggesting: the initial POST would be authenticated and then redirect to a single-use secret URL.)
  • I haven't heard of @Kludex's approach before of using the protocols parameter, but that does seem like a neat hack and I can't think of a real problem with it!

My feeling right now is: Support the regular Authorization header and also support stuffing the token into the protocols as a work-around for browsers. Show what we've done to browser-makers and hope that it bothers them enough that they just start supporting setting the Authorization header directly in the API.
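A sketch of what that compromise could look like on the server, using the Node ws package: accept a regular Authorization header when the client can set one, and fall back to a token smuggled into the subprotocol list for browsers. The "bearer.&lt;token&gt;" convention and the validation step are assumptions for illustration.

```typescript
import { createServer, type IncomingMessage } from "node:http";
import { WebSocketServer } from "ws";

const wss = new WebSocketServer({ noServer: true });
const httpServer = createServer();

function extractToken(req: IncomingMessage): string | undefined {
  // Non-browser clients: a regular Authorization header.
  const auth = req.headers.authorization;
  if (auth?.startsWith("Bearer ")) return auth.slice("Bearer ".length);

  // Browsers: token smuggled into the offered subprotocol list.
  const raw = req.headers["sec-websocket-protocol"];
  const offered = (typeof raw === "string" ? raw : "").split(",").map((p) => p.trim());
  return offered.find((p) => p.startsWith("bearer."))?.slice("bearer.".length);
}

httpServer.on("upgrade", (req, socket, head) => {
  const token = extractToken(req);
  if (!token /* || !isValid(token), a hypothetical validation step */) {
    // Reject before upgrading, so the client sees an ordinary 401 response.
    socket.write("HTTP/1.1 401 Unauthorized\r\n\r\n");
    socket.destroy();
    return;
  }
  // NB: a real server should also echo one accepted subprotocol back (the ws
  // "handleProtocols" option) so that browsers accept the handshake.
  wss.handleUpgrade(req, socket, head, (ws) => wss.emit("connection", ws, req));
});

wss.on("connection", (ws) => {
  ws.on("message", (data) => {
    // ...dispatch the JSON-RPC message to the MCP server implementation...
    void data;
  });
});

httpServer.listen(3000);
```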


@tristanz


In a multi-tenant setup, what defines a session's duration: the user, the thread, the connection, the turn, or the tool call?


@thdxr

Mar 7, 2025


We're working on solving internal operations things over at SST for our users, and letting them ship tools in a Lambda is super important. It becomes a no-brainer vs. something they have to think about if it has to be containerized.

Option 2 is obviously the simplest for us, and we actually already built this in the short term so we can get moving: a bridge MCP server that can talk to a stateless implementation of the MCP protocol hosted at some URL.


@ognis1205

Mar 7, 2025


Following.


@gingerhendrix

Mar 7, 2025


For client -> server: just remove the SSE transport from the spec and have everyone use stdio. Developers are free to implement any protocol they wish to connect to their web service, and then expose that client as an MCP server.

This is the "paving the cowpaths" way: it's what most MCP servers in the wild are already doing (e.g. Dax's comment), and it leaves developers to come up with the best solution for their needs. It also leaves the door open to future standardization on (possibly multiple) protocols better suited to client -> server.

(MCP Host / Client) --- MCP/STDIO ---> (standardized MCP Server/Web Client) --- Some new thing ---> (Web Service)

The SSE transport could still be used, but now via a standard bridge client, e.g. npx mcp-see https://my-cool-tool.ai/mcp (sketched below). Similar clients could be made for WebSockets etc.
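A sketch of the kind of bridge client described above: a stdio MCP "server" that forwards each newline-delimited JSON-RPC message to a remote HTTP endpoint and relays the response back over stdout. The URL and the plain-POST backend are assumptions for illustration.

```typescript
import { createInterface } from "node:readline";

const REMOTE_URL = process.argv[2] ?? "https://my-cool-tool.ai/mcp"; // hypothetical endpoint

const rl = createInterface({ input: process.stdin });

rl.on("line", async (line) => {
  if (!line.trim()) return;
  // Forward the host's JSON-RPC request as a plain POST to the web service.
  const res = await fetch(REMOTE_URL, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: line,
  });
  // Relay the JSON-RPC response back to the MCP host over stdout.
  process.stdout.write((await res.text()).trim() + "\n");
});
```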


I think server -> server is a completely different problem (i.e. the host application/mcp client is a web app) - but here tbh I think a completely different protocol would make more sense, so you can take advantage of standard conventions like http callbacks.


@daviddenton


> I think server -> server is a completely different problem (i.e. the host application/mcp client is a web app) - but here tbh I think a completely different protocol would make more sense, so you can take advantage of standard conventions like http callbacks.

For server->server, we were working under the assumption that each server can simply connect to the other as a client; that would be simple, and it would mean that each participant has access to the full range of tools from the other.


@atesgoral


Collaborator


Server-to-server, a server that hosts an agent (red process boundary) is also an MCP Client:

(diagram image omitted)


@thiagovictorino

Mar 7, 2025


Good morning, folks!

Maybe I’m too unfamiliar with this subject to offer a fully informed opinion, but I can share my experience with MCP as a developer user.

From my perspective, I’d go all in with HTTP requests. It could significantly increase the number of available servers since it opens up opportunities for people to monetize them. In my experience with MCP, a single request is usually enough to get what I need—I don’t have to listen for ongoing updates. This makes synchronous communication simple to implement and straightforward to use.

I suggest keeping the current SSE approach but adding this new HTTP-based option, each with its own pros and cons. The server’s developer can then decide which protocol best suits their needs.

Just sharing my two cents—keep rocking!


@daviddenton


From the perspective of those of us building out MCP SDKs, adding a straight stateless JSON-RPC transport (without negotiation, and hence no notifications or sampling) was actually quite simple, as was WS, so it seemed a shame not to speculatively add it. 🙃


@sean-roberts

Mar 7, 2025


From my understanding, the biggest issue with supporting standard HTTP endpoint calls is that there isn't a means for the server to do sampling, same-connection resource-change subscriptions, or general server-initiated communication. But what if that is the trade-off we're asking the spec to make? Every MCP server that wants to do server-initiated communication should use SSE/WS-style transports. This is just the common expectation for the web in general: you either implement polling/SSE/WS to do it, or you're looking at client-initiated transactions only.

I can only think of a handful of use cases that would want to support sampling, but countless that only want to expose their capabilities reliably using the systems they understand today. Will this change in the future? Maybe, but we should build a system that supports developers today, with the option to evolve it to support future capabilities. This is why there is a capabilities negotiation layer.
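A sketch of how that negotiation could mark a server as stateless-friendly: an initialize result that advertises tools and prompts but omits everything that implies server-initiated traffic. Field names follow the current MCP initialize shape; the values are illustrative (and sampling is a client capability, so it never appears in a server's capabilities anyway).

```typescript
// A hypothetical "stateless-friendly" server's initialize result.
const initializeResult = {
  protocolVersion: "2024-11-05",
  serverInfo: { name: "my-stateless-server", version: "1.0.0" }, // hypothetical
  capabilities: {
    tools: {},     // no listChanged => the client never needs to hold a connection open
    prompts: {},   // ditto
    resources: {}, // no subscribe / listChanged
  },
};
```

A client seeing this result knows it can treat every interaction as an independent request/response pair.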


@mbleigh


I agree that, looking at the ecosystem today, it seems like >90% of all MCP servers are doing stateless things that do not require subscriptions or ongoing connections. Most of them are tools or prompts.

To that end, it seems like Option 2 could be pursued -- capabilities negotiation can determine whether a stateful connection is required, and servers and clients both can avoid the added complexities of persistent connections if they don't need it. Option 1 could be pursued in parallel to make it easier to build stateful connections on stateless infra (along with e.g. a webhook transport spec).


@boxabirds


Exactly. Consider game creation in AI coding agents. It'll be pretty whizzy for the agent to be able to convert a user's plain-English request into media asset discovery, generation, and procurement, directly into the project. At best it's stateless, and at worst it needs an auth token for context, just like normal web services.


@cramforce


👍

HTTP has this right:

  • Stateless by default
  • Session token on initial connect if stateful
  • Use existing UPGRADE mechanism to switch to websocket when desirable
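A client-side sketch of that flow; the mcp-session-id header name and the single /mcp-style endpoint are placeholders, not spec'd names.

```typescript
async function callRemoteMcp(endpoint: string) {
  // 1. Stateless by default: initialize is just a POST. The server may return a
  //    session token if it wants to keep state somewhere (memory, Redis, ...).
  const init = await fetch(endpoint, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: { /* protocolVersion, clientInfo, capabilities */ },
    }),
  });
  const sessionId = init.headers.get("mcp-session-id"); // hypothetical header name

  // 2. Subsequent calls echo the token back, so any instance behind a load balancer
  //    can serve them.
  const listed = await fetch(endpoint, {
    method: "POST",
    headers: {
      "content-type": "application/json",
      ...(sessionId ? { "mcp-session-id": sessionId } : {}),
    },
    body: JSON.stringify({ jsonrpc: "2.0", id: 2, method: "tools/list" }),
  });
  return listed.json();
}

// 3. Only when server-initiated traffic is actually needed would the client switch to a
//    persistent connection (e.g. an Upgrade to WebSocket on the same endpoint),
//    carrying the same session token.
```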


@wycats


@cramforce I agree with this completely.

If we stick closely to stateless HTTP (w/ WebSocket upgrade), then MCPs are "just another kind of server" from the perspective of quite a bit of infrastructure. This is a very valuable property, and it's hard to find any MCP constraints that make it so special that it needs a special kind of network protocol.


@VoloBuilds

Mar 10, 2025


Personally, I'd go with Option 3.

MCP is supposed to make it easy for AI agents to integrate with tools and resources. This is a data integration problem. The industry standard for integrating data across platforms is REST APIs, which is what 99% of companies will already have up and running. The burden of integration for MCP is largely on the server developers, and expecting them not only to create a new set of endpoints but to run their software in an entirely different way (requiring long-running servers) feels absurd to me.

You could argue that it is to support additional capabilities. But the two main capabilities I am seeing above are 'sampling' and the server informing the client about updated resources/capabilities. The latter is easily solved 90% of the time by the client polling the server - and for the last 10%, the server can simply reply with a 400-level error.

As far as 'sampling' goes, I believe this is an anti-pattern and should be out of scope for MCP. If servers need AI capabilities to properly respond to tool/resource requests, they should implement that behind their API. They shouldn't have to depend on the unpredictable AI capabilities of an unknown client. I don't think this capability should even be something that servers are able to do. It creates security issues where servers can covertly request sensitive data that clients may have. It also adds unnecessary risk for client developers, since servers can effectively utilize the client's AI tokens. I'm not sure why a client developer would even build support for sampling given these concerns (what do they really have to gain?); speaking of which, none of the currently documented clients support sampling: modelcontextprotocol.io/clients

Any other more complex server-client interactions should be handled by multiple separate tool/resource calls.

In my opinion, a stateless version is an absolute must. Many developers are using serverless solutions and long-running servers/connections are a non-option for them. So at a minimum, we should go with Option 2. But I would go a step further and simplify the protocol by removing features which (in my opinion) shouldn't be there in the first place.


@sean-roberts


I generally agree with you @VoloBuilds

For those suggesting there aren't security concerns with it: the security concern with sampling is not the communication mechanism; it's the authorization, scoping, and access controls that are not yet solved when a remote server reaches into a local computer. This class of security controls is less of an issue for programs talking to one another on the same system.

Where I don't agree: we don't have to throw out the parts that aren't universally applicable just because they aren't universally needed. There is utility and opportunity to leverage these systems in the future, specifically once these security concerns have proper answers.

I don't think Option 3 is really viable: not impossible, but it has massive negatives. MCP has already gained a lot of traction, and removing these features would be a substantial impact to the (still very early) community, and not necessarily for the best. Option 2 is what I believe is the right path: support simple paths while also supporting existing implementations. The "complexity" of this should be easy to remove if the focus changes from MCP being concerned with the end-to-end process to treating payloads and transports as separate parts. As a client or a server, you should be focusing on payloads (probably generating these), and the transports sort themselves out in the negotiation phase. Whether it's SSE vs. WS vs. streaming vs. RPC vs. ..., it shouldn't matter, so long as the system can respond to events and provide the correct structure.


@Jflick58


I agree that if sampling is truly a core part of the protocol, I would like to see official support and/or examples. As of now, I don't feel confident in the value prop of sampling especially given the tradeoffs we're discussing in protocol implementation.

That being said - "a simplified OpenAPI alternative with a few standardised endpoints." is exactly what I'm looking for, especially if it means I can pull in 3rd party implementations easier. I see managing the LLM as a separate task from managing the resources a semi-agentic LLM can access. In fact, I prefer that be decoupled for ease of building model-agnostic solutions. The reality for me (and the org that I'm a part of where I'm trying to push MCP) is that the alternative to MCP is building a custom set of APIs to centralize tool/prompt/resource management to pair with our various LLM deployments and interfaces.


@Mocuto


It's not clear to me why sampling can't be implemented in a stateless manner via webhooks. Ideally the server sends the client a sampling request with the session ID and some kind of sampling ID and the client can decide how it wants to handle it, either ignoring it or satisfying it according to its own logic. The client's response can include the sampling ID and the session ID which can allow the server to recover any state-related information upon receiving the response. Are there any flaws in this approach?

Ideally, if a tool requires sampling, that could be specified up front in its definition structure so that application/agent developers can reason about it explicitly.
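A sketch of what that could look like, assuming a hypothetical sampling webhook route on the client and a callback route on the server; none of these routes, field names, or types are part of MCP today.

```typescript
import express from "express";

interface SamplingWebhookRequest {
  sessionId: string;   // lets the server recover its state when the reply arrives
  samplingId: string;  // correlates this request with the eventual completion
  messages: Array<{ role: "user" | "assistant"; content: string }>;
}

const app = express();
app.use(express.json());

// Client side: an HTTP endpoint the server can call instead of pushing a sampling
// request over a long-lived connection.
app.post("/mcp/sampling-webhook", async (req, res) => {
  const { sessionId, samplingId, messages } = req.body as SamplingWebhookRequest;

  // The client decides whether to satisfy the request (possibly after user consent).
  const completion = await runLocalModel(messages); // hypothetical helper

  // Reply out-of-band; the server uses sessionId + samplingId to resume its work.
  await fetch("https://server.example/mcp/sampling-result", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ sessionId, samplingId, completion }),
  });

  res.status(202).end();
});

async function runLocalModel(messages: SamplingWebhookRequest["messages"]): Promise<string> {
  return "…"; // stand-in for an actual model call
}

app.listen(3000);
```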

Forgive me if this is a dumb comment, I'm currently doing a review of MCP as part of a broader literature review on agent communication protocols and am still ramping up on the space.


@sean-roberts


It's not a dumb comment, and it's true; however, the spec isn't oriented around these stateless approaches. Another challenge is that local computers would need to expose a webhook to support this as well. That itself is not an insurmountable problem, but it would open up a lot of new problems around allowing servers to call back into local clients. For web


@richard-rance


Sampling is the one part of the spec that didn't make sense to me. What are some examples as to why you would use sampling instead of treating the tool as an independent sub-agent with its own access to run a model and context? If it is simply about saving the tool from configuring/running a model then I wonder if the MCP Host could provide a signed URL that the tool could make a simple request to.


@lloydzhou

Mar 11, 2025

@jspahrsummers

In production, I've run into similar problems.
I think I've found a relatively good abstraction (following the Option 1 pattern) that lets the MCP server keep state via a session_id while still scaling well in the traditional way.

  1. Re-abstract a Transport based on nchan.io

Unlike SseServerTransport, this Transport is an nginx module that sits in front of the MCP server and also acts as a Gateway,
supporting both SSE + WebSocket.

    1. Forward the ClientRequest sent by the MCP client to the backend MCP server via nchan_publisher_upstream_request. (image omitted)

    2. Implement an HTTMCP that inherits from FastMCP (the built-in decorators can be used to define tools/resources...).

    3. Instead of starting via the built-in server.run method, register all request_handlers on an APIRouter. (image omitted)

    4. Finally, mount one or more MCP servers on the FastAPI app and deploy them as an ordinary web API. (image omitted)

System availability

  1. nchan is purpose-built messaging/IM software and can support fairly high concurrency.
  2. nchan acts both as the Transport and as the Gateway; it is built on nginx, itself a high-performance web server.
  3. An nginx upstream is used to talk to the real MCP server. The protocol is plain HTTP, so the backend MCP functionality can be implemented with any traditional web server.

System scalability

  1. nchan supports horizontal scaling with a Redis cluster as storage.
  2. Since an nginx upstream is used to talk to the MCP server, the backend MCP servers can also be scaled easily by adding load balancing.


@Jflick58


Claude provided translation:

@jspahrsummers
In my production practice, I've encountered similar issues. I think I've found a relatively good abstraction method (which should follow pattern one) that allows the mcp server to maintain state through session_id while also achieving good scalability using traditional methods.

  1. Re-abstract a Transport based on nchan.io Unlike SseServerTransport, this Transport actually exists in front of the mcp server as a kind of Gateway in the form of an nginx module, supporting both sse + websocket
    1. Forward ClientRequest from mcp client to the backend mcp server through nchan_publisher_upstream_request
    2. Implement an HTTMCP that inherits from FastMCP (can use built-in decorators to define tools/resources...)
    3. Abandon the built-in server.run method to start, but instead register all request_handlers to an APIRouter
    4. Finally, mount one or more mcp servers on the fastapi app and deploy them as ordinary web APIs.

System availability:

  1. nchan is professional software for IM communication that can support relatively high concurrent access capabilities
  2. nchan exists as both a Transport and a Gateway, with its underlying technology based on nginx, which is also a high-performance web server.
  3. Here, nginx's upstream is used to communicate with the real mcp server. The communication protocol is http, and the backend can use any traditional web server to develop mcp functionality.

System scalability:

  1. nchan supports horizontal scaling using redis cluster for storage.
  2. Here, nginx upstream is used to communicate with the mcp server, and the backend mcp server can also be easily scaled by adding load balancing.


@lloydzhou


I drew a sequence diagram, which should illustrate the entire process well:

  1. the session_id (which is equivalent to nchan_channel_id) will remain from the start of the connection until the connection ends.

  2. If the task has a short execution time, it can return directly.

  3. If the task has a long execution time, it can be executed in an asynchronous queue, only requiring progress to be pushed in between and the final result to be pushed at the end.

```mermaid
sequenceDiagram
    MCP Client->>NCNAN: connect
    activate NCNAN
    MCP Client-->>NCNAN: jsonrpc request
    NCNAN-->>FastAPI: nchan_publisher_upstream_request
    FastAPI-->>MCP Server: call_tool(name, args)
    MCP Server-->>FastAPI: result
    FastAPI-->>NCNAN: jsonrpc response
    NCNAN-->>MCP Client: jsonrpc response

    MCP Client-->>NCNAN: jsonrpc request
    NCNAN-->>FastAPI: nchan_publisher_upstream_request
    FastAPI-->>MCP Server: call_tool(name, args) in backend
    MCP Server-->>NCNAN: push notification
    NCNAN-->>MCP Client: notification
    MCP Server-->>NCNAN: push jsonrpc response
    NCNAN-->>MCP Client: jsonrpc response
    NCNAN->>MCP Client: close
    deactivate NCNAN
```


@kfischer-okarin

Mar 11, 2025


I'm late to the conversation and just a random guy who started using and implementing MCPs recently.

Option 1 makes the most sense to me, and its main con could be mitigated, since the state management could be handled by a server-agnostic persistence wrapper that the serverless containers would use.

Basically with a session state structure like this:

```typescript
interface Subscription {
  subscribeRequest: SubscribeRequest;
  resourceHash: string;
}

interface MCPSessionState {
  initializeMessage: InitializeRequest;
  latestResponses: {
    toolList?: Tool[];
    promptList?: Prompt[];
    resourceList?: Resource[];
    resourceTemplateList?: ResourceTemplate[];
  };
  subscriptions: Subscription[];
}
```

Starting a new session

  • Initialize a new session state object with the received initialize request
  • Store it somewhere using the session ID (assuming the transport layer generates that somehow)
  • Spin up the actual server as subprocess etc. and bridge all communication
    • Remember which of the listChanged capabilities are available

During a running Session (while the serverless container is alive)

  • When responding to a {feature}/list request and the server has the respective listChanged capability:
    • Store the server response in the respective latestResponse property
  • When receiving a resources/subscribe request from the client:
    • Remember the subscription by:
      • Storing the subscription message
      • Retrieving the resource from the server, calculating a hash and storing that
  • When receiving a resources/unsubscribe request:
    • Remove the matching subscription
  • When the server sends a notifications/resources/updated notification:
    • Retrieve the resource yourself too and update the hash of the subscription

Continuing a running session (in a new serverless container etc.)

  • Retrieve the session state object via the session ID
  • Spin up the server
  • Replay the initialize message
    • Ignore the response (assuming the MCP servers are deterministic with regard to their initialize message exchanges)
  • For each stored latest response:
    • Retrieve the current list from the server
    • If it is different from the stored result, send a notifications/{feature}/list_changed notification to the client and update the stored value
  • For each subscription:
    • Replay the subscription message
    • Handle resource changes that occurred between session interruption and continuation:
      • Retrieve the resource and compare its hash to the stored hash; if changed, send a notifications/resources/updated notification to the client
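A sketch of what the "continuing a running session" steps above could look like on top of those interfaces. The persistence and process-management helpers (loadSessionState, spawnServer, notifyClient, saveSessionState) are hypothetical; the request and notification method names follow the current MCP spec.

```typescript
import { createHash } from "node:crypto";

declare function loadSessionState(id: string): Promise<MCPSessionState>;
declare function saveSessionState(id: string, state: MCPSessionState): Promise<void>;
declare function spawnServer(): Promise<{ request(msg: unknown): Promise<any> }>;
declare function notifyClient(sessionId: string, notification: unknown): Promise<void>;

const sha256 = (s: string) => createHash("sha256").update(s).digest("hex");

async function continueSession(sessionId: string) {
  const state = await loadSessionState(sessionId); // e.g. Redis / DynamoDB
  const server = await spawnServer();              // stdio subprocess, etc.

  // Replay initialize; assume the exchange is deterministic, so the response is ignored.
  await server.request(state.initializeMessage);

  // Detect tool-list changes that happened while no container was alive
  // (prompts/resources would be handled the same way).
  if (state.latestResponses.toolList) {
    const current = await server.request({ method: "tools/list" });
    if (JSON.stringify(current.tools) !== JSON.stringify(state.latestResponses.toolList)) {
      await notifyClient(sessionId, { method: "notifications/tools/list_changed" });
      state.latestResponses.toolList = current.tools;
    }
  }

  // Re-establish subscriptions and catch resource changes that occurred in between.
  for (const sub of state.subscriptions) {
    await server.request(sub.subscribeRequest);
    const uri = sub.subscribeRequest.params.uri;
    const resource = await server.request({ method: "resources/read", params: { uri } });
    const hash = sha256(JSON.stringify(resource));
    if (hash !== sub.resourceHash) {
      await notifyClient(sessionId, { method: "notifications/resources/updated", params: { uri } });
      sub.resourceHash = hash;
    }
  }

  await saveSessionState(sessionId, state);
}
```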


@kfischer-okarin


Of course, if the protocol were to evolve to add, for example, a "HEAD"-style request for lists and resources that returns only a last_updated_at timestamp, then some of the bookkeeping inside the wrapper could become much simpler, since only the timestamps would need to be compared.

But that might be out of scope for this discussion... this idea assumes the current stable protocol.


@gtrguy17

Mar 14, 2025


This seems to be a good option:

blog.blaxel.ai/how-we-optimized-mcp-servers-for-serverless

WebSockets: a game-changer for MCP

Facing these challenges, we made the decision to switch from SSE to WebSockets for all our MCP server communications. Despite the fact that MCP documentation doesn't extensively discuss WebSockets, they are officially supported - and as we discovered, they work significantly better in cloud environments.

Why WebSockets outperform SSE for MCP servers

WebSockets establish a persistent, full-duplex TCP connection between client and server, allowing for bidirectional communication. This architecture offers several advantages over SSE for MCP servers:

  1. Connection Stability: WebSockets maintain more stable connections, with built-in mechanisms for handling disconnections and reconnections.
  2. Bidirectional Communication: While MCP often doesn't require extensive client-to-server communication, having the capability for bidirectional data flow eliminates the need for separate HTTP requests for client-initiated actions.
  3. Binary Data Support: WebSockets can transmit both binary data and UTF-8 text, whereas SSE is limited to UTF-8. This provides more flexibility for different types of data exchange.
  4. Better Performance: WebSockets typically offer lower latency and overhead compared to SSE, especially for frequent communications.
  5. No Connection Limits: WebSockets don't suffer from the same browser connection limits as SSE, making them more suitable for applications where users might have multiple tabs open.

Forking Supergateway

To implement our WebSocket solution, we forked the Supergateway project and modified it to use WebSockets instead of SSE. The core changes involved:

  1. Protocol Adaptation: Modifying the communication layer to use WebSocket protocol instead of HTTP streaming.
  2. Connection Management: Implementing robust connection handling with automatic reconnection logic.
  3. Error Handling: Enhancing error detection and recovery mechanisms to ensure reliable operation in cloud environments.
  4. Scaling Optimizations: Adding features to better support horizontal scaling across multiple instances.

Our modified version of Supergateway is available on GitHub as Blaxel's Supergateway, and we welcome contributions and feedback from the community!

Technical implementation: WebSockets for MCP

For those interested in the technical details, here's how we implemented WebSockets for our MCP servers. Please note that the entire code can be found in open-source on our GitHub on Blaxel's Supergateway and Blaxel’s SDK.


@EditUndo


Kudos on this solution, and the article was a great read, but it's unclear to me how this resolves the stateful/stateless debate. Wouldn't long-lived serverless functions be an anti-pattern?


@Lurrobert

Mar 15, 2025


Could the MCP team please add WebSockets? It personally feels like chasing the "smartest universal solution" has led to overly increased complexity, making it hard to adopt and build on top of the tech, and no one is happy....

SSE: a promising start with disappointing results

During our search for better solutions, we discovered Supergateway, a tool that wraps stdio-based MCP servers with Server-Sent Events (SSE). On paper, this looked like an elegant solution.

For those unfamiliar with SSE, it's a technology that establishes a one-way communication channel from server to client over HTTP. Unlike WebSockets, which provide full-duplex communication, SSE is designed specifically for server-to-client updates. This makes it seemingly ideal for scenarios where clients primarily need to receive updates from servers.

We implemented Supergateway with SSE, but quickly ran into significant issues:

What are the problems with SSE in serverless environments

  1. Connection Instability: In serverless environments, SSE connections dropped randomly and frequently. This is particularly problematic for AI agents that need reliable, persistent connections to function properly.
  2. Scaling Challenges: As we tried to scale our platform, the limitations of SSE became increasingly apparent. The protocol wasn't designed with cloud-native architectures in mind.
  3. Browser Connection Limits: SSE suffers from a limitation to the maximum number of open connections, which is set to a very low number (6) per browser and domain. This became problematic when users opened multiple tabs.
  4. Proxy and Firewall Issues: Some proxies and firewalls block SSE connections because they don't have a Content-Length header, creating deployment challenges in enterprise environments.

After extensive testing, we concluded that while SSE might work well for simpler use cases or controlled environments, it wasn't robust enough for our cloud-based AI agent platform.

(Edited to replace the screenshot with text in the original post. Point 2 above was highlighted. - @atesgoral)


@atesgoral

Mar 15, 2025

Collaborator


WebSockets can surely be a more ergonomic transport for persistent connections, especially since they provide full-duplex message transfer out-of-the-box.

However, some of the SSE criticism and WebSocket praise above is a bit misinformed.

SSE is not a protocol upgrade in the same sense WebSockets is. SSE is plain old HTTP, using "text/event-stream" as content type, with the server sending events as streaming body chunks. It's up to the SSE server and clients to reestablish the connection and make use of SSE event ids for reliable delivery. But most clients are implemented very naively and are not robust. This is not an inherent problem with SSE itself. SSE actually has some built-in reconnection capabilities in the spec (Last-Event-ID header and automatic reconnection in the EventSource API) that are often overlooked.
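A sketch of those often-overlooked pieces, using Node's http module; the /events route and payloads are illustrative only.

```typescript
import { createServer } from "node:http";

const server = createServer((req, res) => {
  if (req.url !== "/events") {
    res.writeHead(404).end();
    return;
  }

  // The EventSource API reconnects automatically and replays the last id it saw via
  // the Last-Event-ID header, so the server can resume from that point.
  const lastEventId = Number(req.headers["last-event-id"] ?? 0);

  res.writeHead(200, {
    "content-type": "text/event-stream",
    "cache-control": "no-cache",
  });

  let id = lastEventId;
  const timer = setInterval(() => {
    id += 1;
    // `id:` is what the client echoes back on reconnect; `retry:` tunes its backoff.
    res.write(`id: ${id}\nretry: 2000\ndata: ${JSON.stringify({ tick: id })}\n\n`);
  }, 1000);

  req.on("close", () => clearInterval(timer));
});

server.listen(3000);
```

On the browser side, new EventSource("/events") handles reconnection and Last-Event-ID on its own; the robustness gap is mostly in non-browser clients that reimplement SSE naively.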

WebSocket is also not this magical transport with connection and delivery guarantees. It's just a persistent TCP socket. It's still up to the clients and servers to properly implement robustness measures like pings and reconnects. There are mature WebSocket client/server libraries out there that can create the impression that robustness is a built-in feature of WebSockets. You can experience this by implementing your own WebSocket client from scratch, and then reinventing all the robustness measures that libraries that have been around for a while have implemented.

Also, in the end, SSE is also just another persistent TCP socket, but with only one side doing the talking (if we ignore HTTP/3 QUIC).


@jspahrsummers

Mar 17, 2025

Maintainer Author


Thank you all for the discussion here! This has been a very thorny topic with a lot of tradeoffs, no matter which path we pursue.

Incorporating all of the input we've heard here and elsewhere, we are proposing a new Streamable HTTP transport, which roughly amounts to plain HTTP plus optional SSE—very similar to what @jerome3o-anthropic described above. Please take a look at that PR description and the associated spec language, and flag any questions or concerns!

Note that other options, like WebSocket, remain possible in future, but we believe strongly this is the right path to pursue for now. All of your comments are greatly appreciated! 🙇


@r03t


Nice. Go simple live life!!!


