Below is a chronological “blow by blow” summary of the discussion. Each heading corresponds to a comment or set of closely related comments on a given date. I’ve highlighted who spoke, when, and the main takeaways.
Context: MCP (Model Context Protocol) is currently stateful (long-lived connections). This enables notifications, server-initiated sampling, logging, etc., but makes serverless deployments tricky. Many serverless platforms disallow or discourage long-lived connections.
Problem: “Autoscaling, ephemeral” serverless environments have short request lifetimes and can’t easily handle SSE or a single persistent connection.
Possible solutions:
Session tokens – The connection is re-modeled as repeated short calls with a token (i.e., storing or encoding session state).
Two protocol variants – “Stateless” for simpler servers; “Stateful” for richer features.
Fully stateless – Radically simpler, but we lose features like notifications and server-initiated messages.
The original poster asks for feedback on whether this is a real problem, and which direction might solve it.
Mentions that “notifications.*” are a way to do some server->client messaging, but acknowledges the difficulty of fully bidirectional messaging on HTTP.
Building a hosting platform for MCP servers. SSE is hard to scale because it’s inherently stateful; multiple replicas get complicated with sticky routing.
Wonders why SSE was chosen over websockets or gRPC.
Summarizes the “direction he’s leaning”: Let typical client->server calls happen over standard HTTP, and let server->client notifications be optional SSE. So you only do the “long lived” piece if you want the extra features.
Announces a proposal for a new “Streamable HTTP transport” — basically plain HTTP request/reply plus optional SSE for streaming.
Notes that WebSockets remains possible in the future, but for now “plain HTTP + optional SSE” is the path.
Shortly after this, the discussion is closed as resolved.
Overall Takeaways
The Problem
Maintaining a long-lived, stateful connection is difficult on autoscaling or serverless systems.
SSE requires “sticky” routing or a specialized load-balancer approach to route subsequent calls to the same instance holding state.
Proposed Approaches
“Stateless” or “HTTP-only” calls for quick tool calls.
Optional SSE or WebSockets for advanced features (notifications, server-initiated messages).
Possibly store session data externally if you really need to scale.
Final Decision
The maintainers lean toward a simpler “HTTP + optional SSE” approach, known as the “Streamable HTTP transport.”
Leaves the door open for WebSockets or extended sessions in the future.
Thus, the entire thread moves from the initial question—“Is our stateful SSE approach too limiting?”—to a fairly strong consensus that “Yes, do something simpler by default (plain HTTP), with SSE or WS as an add-on for advanced use, so we can accommodate ephemeral or serverless environments.”
That’s the high-level summary of every major comment and date in the discussion.
Below is a detailed, comment-by-comment “blow by blow” of the entire GitHub discussion, with special attention to the technical arguments and points of contention around stateful vs. stateless connections, SSE vs. WebSockets, serverless constraints, etc. The headings correspond to chronological order and group related replies as they appeared in the thread. I’ve preserved each participant’s main technical points (and reasoning) while also including date/timestamps where relevant.
Points out that there’s a difference between protocol-level state (like SSE subscription or open notifications) vs. application-level state (like a stateful agent in memory).
The latter might require a persistent connection anyway, depending on the use case.
Points out that SSE plus multiple Kubernetes pods can break easily if you can’t do sticky routing (since SSE might go to Pod A, but the next POST could arrive at Pod B).
Likes that approach for notifications, but points out that “server making requests of the client” (like sampling) is trickier if we only have SSE. SSE is one-directional, so the server can’t truly prompt the client for something unless the client is also polling or calling back.
Believes “stateless” is simpler for scale/federation, while advanced “long-lived” features remain an option.
January 2–9, 2025 (overlapping dates)
(Some additional conversation references the earlier question by [@orliesaurus]; [@calclavia] and [@orliesaurus] also appear in the upcoming timeline. Skipping minor interim placeholders.)
The original reason for SSE: it’s “pure HTTP” and simpler for many web frameworks to get started.
But acknowledges complexities for load balancing, sticky routing.
Mentions that “the question of SSE vs. WebSockets vs. event logs, etc.” is partly separate from the overall stateful vs. stateless question. Both SSE and WebSockets require holding connections.
[@jspahrsummers]: This might be the best of both worlds: plain request-response for normal calls, SSE only if you want “push” style messages.
A serverless environment could connect briefly, get updates, then disconnect.
[@pcingola]: This might already be possible with minimal changes, as the protocol is flexible on how SSE is used. Just a matter of clarifying best practices.
They currently do trivial JSON-RPC calls with no subscriptions or advanced notifications. That covers many use cases.
Sampling Without Streaming (borderline idea):
The server can respond to a tools/call with a partial result that says “I need sampling from the client.” The client does it locally, then calls back with “tools/continue,” passing the sampling result.
This means no direct server->client push. The client orchestrates it.
Short-lived SSE:
You can do a single POST for tools/call and the server returns SSE for the duration of that call, streaming partial progress.
In practice, SSE sometimes hits issues with proxy buffering or timeouts. WebSockets are ironically more likely to pass through reliably in certain hosting/CDN layers.
Also, in Cloudflare land, they can do fancy things with websockets (like hibernating a durable object).
One challenge with websockets in browsers is that you cannot set arbitrary headers for authentication. People discuss subprotocol hacks or passing tokens in the URL.
Another trick: use the protocols parameter in the JavaScript WebSocket constructor to pass something like “Authorization, token,” then deny or accept the upgrade.
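For illustration, that subprotocol trick looks roughly like this in the browser (endpoint, token, and the subprotocol convention are all hypothetical; the server must validate the Sec-WebSocket-Protocol header and echo back an accepted value during the upgrade):

```typescript
// Browsers can't set an Authorization header on the WebSocket upgrade,
// so the token rides in the subprotocol list. The server must echo back
// an accepted subprotocol (here "mcp"), or the browser aborts the connection.
const token = "hypothetical-bearer-token";
const ws = new WebSocket("wss://mcp.example.com/ws", ["mcp", `bearer.${token}`]);
ws.onopen = () => {
  // Upgrade accepted (and, implicitly, the token validated).
  ws.send(JSON.stringify({ jsonrpc: "2.0", id: 1, method: "initialize", params: {} }));
};
```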
SSE vs. WebSockets is partly about library maturity. Both can lose connections, both require keepalive logic.
SSE is a simpler protocol (just streaming text) but definitely one-way. WebSockets is full-duplex but not inherently more stable—just that many frameworks have better built-in handling for WS pings, etc.
Announces a new proposal for a “Streamable HTTP transport” (linking to PR #206), effectively “plain HTTP requests + optional SSE streaming.”
States that WebSockets could still happen later, but for now this approach tries to address the short-lived vs. persistent problem.
Closes the discussion as resolved.
Key Points of Technical Contention
SSE vs. WebSockets
Some found SSE simpler and “pure HTTP,” but others found that ephemeral scaling or certain frameworks (like Rails, or serverless platforms) handle WebSockets better and SSE can cause buffering or proxy issues.
There’s no perfect universal consensus; many see SSE as an easy “push-only” model, while WebSockets are truly bidirectional and sometimes more stable in modern hosting.
Stateful vs. Stateless
Many want a purely “stateless” mode so simple or serverless deployments can just do short calls, possibly with some “job or session ID.”
Others prefer to preserve the advanced features (notifications, server-initiated calls) that rely on long-lived connections.
The compromise: Let the server implement the advanced “live” features if it wants, otherwise do simple request-response.
Server-Initiated Sampling
This specifically drew questions about security, overhead, or necessity. Some participants called it an “anti-pattern,” while others see it as central for agentic workflows.
In a purely stateless environment, sampling might require a “prompt from server -> client,” but the client can simply re-call the server, meaning no persistent SSE or WebSocket is strictly required. Others want a more direct approach (server directly requests a sample on the open connection).
Serverless Constraints
Time-limits (like 15 minutes on some platforms), ephemeral containers, and no guaranteed local storage.
SSE typically needs a single container to remain open, so you can’t easily scale horizontally. Workarounds include external message brokers or bridging.
This effectively started the push for "HTTP call, ephemeral SSE if needed, but no forced indefinite connection."
Load Balancing and Sticky Sessions
Many folks pointed out that SSE or long-lived websockets require either sticky routing or a big central state store.
If each request is randomly assigned to pods, it breaks the direct streaming connection unless you carefully route it or you have an external “channel” (Nchan, Redis, etc.) in the middle.
Progressive Enhancement
A repeated theme: “Simple JSON-RPC or HTTP calls for everything by default, let advanced or real-time subscription be optional.”
This ensures that many devs (like WordPress or standard serverless) can skip implementing SSE or a persistent store.
In the end, the official plan that closes out the discussion was “Streamable HTTP,” effectively letting developers do normal POST calls for requests, with an optional SSE-based mechanism (short- or long-lived) if they want partial or real-time updates. WebSockets might still appear, but not mandated at this time.
That concludes the full blow-by-blow with all major technical details and points of contention from the entire GitHub Discussion #102.
1. Stateful vs. Stateless Connections
MCP's existing design assumes a long-lived, stateful connection (often SSE).
This works well for:
Notifications of changes (like resource/tool updates).
Server-initiated requests (e.g., sampling).
Real-time logging or agentic workflows.
However, it causes serious friction in serverless/autoscaling platforms where short request lifetimes are common and “sticky routing” is not always feasible.
Key Points of Contention
Long-lived connections: Many see them as powerful (real-time updates, push calls), but they conflict with ephemeral hosting.
Recovery overhead: Some worry about how to restore session or subscription state if a persistent connection drops.
Implementation complexity: Fully stateful servers must handle session tokens or persistent storage. Opponents prefer a simpler, stateless approach for easy scaling and simpler code.
Positions and Arguments
Pro-Stateful:
Those needing advanced agentic features or continuous server->client updates (e.g., watchers of dynamic resources).
They argue it’s essential for real-time collaboration, agent-driven tasks, private “server reasoning,” and easy server->client logging.
Pro-Stateless:
Serverless/back-end folks who want minimal overhead, no sticky sessions.
They claim simpler request-response patterns suffice for most use cases and encourage broader adoption.
2. Transport Mechanisms: SSE vs. WebSockets vs. HTTP
Main Debate
MCP originally relied heavily on SSE (Server-Sent Events) for streaming. Critics found it fragile behind load balancers and with short-lived containers.
WebSockets was proposed as a potentially more robust or “true” bidirectional transport.
Plain HTTP (with short polling or ephemeral streaming) was also favored by some for simplicity and better compatibility.
Points of Contention
SSE Complexity:
Requires special handling in ephemeral or highly scaled deployments (sticky sessions, fallback solutions, etc.).
Some frameworks (PHP, Rails) find SSE trickier to implement natively.
WebSockets:
Often easier to maintain stable connections if the infrastructure supports it.
But not all hosting environments handle WebSockets gracefully; some developers also found them tricky to integrate or test, especially regarding custom headers in browsers.
Pure HTTP Requests:
Viewed as simplest for basic request-response; no persistent socket.
Lacks push from server to client unless you poll or do partial streaming in a single request.
Positions and Arguments
SSE Advocates:
Preferred it initially due to “standard HTTP” status, simpler for unidirectional push, and no separate upgrade step.
Accept it can be combined with short “transaction cycles” (connect for a tool call, then close).
WebSocket Advocates:
Argued it’s more stable across many proxies or CDNs and truly bidirectional.
People from Cloudflare, Shopify, or certain other ecosystems found SSE less reliable in their environments.
HTTP-Only Advocates:
Simple, minimal overhead.
Best for one-shot calls or ephemeral tasks.
Fine for scenarios that don’t need real-time push.
3. Server-Initiated “Sampling” Feature
Rationale for Sampling
MCP allows servers to request “sampling” from the client’s LLM, letting the server do partial tasks or agentic logic, then ask the client for more text/AI completions.
Controversy
Security & Complexity:
Some saw it as an anti-pattern, allowing a remote server to leverage the client’s LLM resources or tokens, potentially raising trust and security issues.
Alternative Approaches:
Stateless or minimal servers might just store a partial "prompt" and have the client re-post, or rely on webhooks.
Others want direct server->client calls for convenience and real-time agent flows.
Positions
In Favor of Sampling:
Proponents building “agentic workflows” that can’t run their own LLM.
They see server->client text requests as essential to advanced usage.
Against or Ambivalent:
Many simply do not see a pressing need or are wary of the security implications.
They would prefer each server handle its own AI or do a simpler “job token” approach (client calls back with results).
4. Deployment in Serverless Environments
Core Issue
Serverless platforms typically:
Limit request duration (minutes).
Offer ephemeral instances.
Often have no guaranteed local storage or sticky routing.
Arguments & Proposed Workarounds
Session Tokens:
One viewpoint: re-hydrate state from a token if a container spins down.
Con: burdens the developer to store or load complex state externally.
Short-Lived Connections:
Use ephemeral SSE or short WebSocket connections only when needed.
Others propose a "job model" (server returns an operation ID, client polls or streams); see the sketch after this list.
Intermediate Gateways:
Some set up specialized brokers (e.g., Nchan, Redis) so SSE or WebSocket connections are anchored in a stable gateway, letting serverless backends scale freely behind it.
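A minimal client-side sketch of the job model mentioned above, under hypothetical endpoints (/tools/call returning an operationId, /operations/:id for polling) that no spec defines:

```typescript
// Hypothetical job-model client: submit the call, then poll for the result.
async function callToolAsJob(baseUrl: string, name: string, args: object) {
  const submit = await fetch(`${baseUrl}/tools/call`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      jsonrpc: "2.0",
      id: 1,
      method: "tools/call",
      params: { name, arguments: args },
    }),
  });
  const { operationId } = await submit.json(); // server hands back a job handle
  for (;;) {
    const poll = await fetch(`${baseUrl}/operations/${operationId}`);
    const status = await poll.json();
    if (status.state === "complete") return status.result;
    await new Promise((r) => setTimeout(r, 1_000)); // wait before re-polling
  }
}
```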
5. Proposed Resolutions & Final Directions
Key Solutions Suggested
Stateless + Optional Statefulness:
Provide a minimal HTTP approach for simple calls.
Add an optional SSE or WebSocket subscription for advanced features if desired.
Progressive Enhancement:
Let a server implement “just JSON-RPC / HTTP” or “HTTP + SSE for notifications.”
Possibly add WebSockets down the line for richer bidirectional state.
Streamable HTTP Transport (the eventual consensus):
Plain HTTP requests for tool calls, with the option to do an SSE stream on the same endpoint if you need partial results or notifications mid-call.
Who Championed Each Side
Those Preferring Simplicity/Stateless:
Various serverless platform users (e.g., folks from Vercel, WordPress, or ephemeral hosting) stressed minimal overhead and disliked holding open connections.
Those Needing Real-Time Stateful Features:
People focusing on agentic or advanced collaboration tasks wanted indefinite connections or at least an SSE/WS fallback.
Cloudflare also was open to stateful approaches (Durable Objects) but recognized not all would support that.
Final Spec Maintainers (e.g., [@jspahrsummers], others)
Listened to feedback; concluded an “HTTP + ephemeral SSE” approach is a good compromise.
Left the door open for WebSocket support in the future.
6. Overall Outcome
Most participants agree that forcing one single approach is not ideal.
The new direction: A simpler default (HTTP-based tool calls) plus optional streaming for advanced features.
This approach was formally introduced in a “Streamable HTTP Transport” proposal and closed the discussion with an acknowledgment that WebSockets could be added later if needed.
Summary of Key Takeaways
Stateful vs. Stateless: The protocol will support both, letting servers remain simple or go advanced.
SSE vs. WebSockets: SSE remains in the spec for streaming, but the spec maintainers acknowledge many teams prefer WebSockets and may formally add or refine it later.
Sampling: Remains a point of debate; some see it as vital, others see it as too complex or insecure.
Serverless Friendliness: The final direction aims to let ephemeral deployments work via short requests or ephemeral SSE channels, instead of mandatory long-lived connections.
---
Context
MCP is currently a stateful protocol, with a long-lived connection between client and server. This allows us to support behaviors like:
Notifications about changes—e.g., changes to resources, or tools being added/removed. (These can occur in either direction too.)
Server-initiated sampling at any point, enabling agentic workflows.
Passing arbitrary server logging through to the client.
… more stuff in future? …
The connection is restartable with fairly little recovery cost (it's not catastrophic, like losing data), but the protocol is definitely not designed around repeatedly opening a connection, issuing one semantic request, then closing.
Problem
This is fairly limiting for serverless deployments, which frequently autoscale up and down, and generally aren't designed around long-lived requests (for example, typically there's a max request lifetime measured in minutes).
Deploying to a Platform-as-a-Service is really nice and convenient as a developer, so not being very compatible with this model creates an impediment to broader MCP adoption.
Possible solutions
I can imagine a few different answers here, each with their own tradeoffs:
Option 1: encapsulate state into a state or session token
Any stateful interaction over a long-lived connection could instead be modeled as independent requests (e.g., webhooks) by passing back and forth some sort of token that either:
Encodes all session state itself, or…
Merely identifies the stateful session, with server and client having some persistent storage associated with that session token.
Pros:
This is the simplest, incremental evolution from where MCP is today.
Implementable just at the transport layer—protocol and application logic can be (mostly) agnostic to this, I think.
Cons:
Somewhat annoying for servers to implement (and a key design principle of MCP is that servers should be really easy to implement):
Requires complex state serialization/deserialization. The SDKs could do some of this, but probably not all.
May require servers to be deployed with persistent storage.
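As a concrete (and entirely invented) sketch of Option 1's first variant, the token itself can carry the session state, HMAC-signed so a fresh serverless instance can trust it without any persistent storage:

```typescript
// Hedged sketch: a self-contained session token. Nothing here is SDK or
// spec API; SessionState and the helpers are invented for illustration.
import { createHmac, timingSafeEqual } from "node:crypto";

interface SessionState {
  subscriptions: string[]; // resource URIs the client subscribed to
  logLevel: string;        // negotiated logging level
}

const SECRET = process.env.SESSION_SECRET ?? "dev-only-secret";

function encodeSessionToken(state: SessionState): string {
  const payload = Buffer.from(JSON.stringify(state)).toString("base64url");
  const mac = createHmac("sha256", SECRET).update(payload).digest("base64url");
  return `${payload}.${mac}`;
}

function decodeSessionToken(token: string): SessionState | null {
  const [payload, mac] = token.split(".");
  const expected = createHmac("sha256", SECRET).update(payload).digest();
  const given = Buffer.from(mac ?? "", "base64url");
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) {
    return null; // malformed or tampered token
  }
  return JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
}
```

The obvious drawback, per the cons above, is that any meaningfully complex state makes this serialization burden land on server authors.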
Option 2: offer "stateless" and "stateful" variants of the protocol
Continue supporting all the behaviors I listed up top, but only when used in "stateful" mode. Offer a "stateless" mode that doesn't have those things.
It's possible that some transports could implement this in a fairly gradated way—e.g., HTTP could be stateful if server -> client can use SSE, but gracefully degrade to stateless by just using POSTed webhooks.
Pros:
Stateless servers could be implemented very simply.
No additional complexity for stateful servers.
Backwards compatible.
Cons:
Adds complexity to the spec and SDKs, and creates bifurcation.
Plausibly adds complexity to MCP hosts, which may need to connect to both kinds of servers.
Option 3: make all of MCP "stateless"
Make sweeping changes to completely revamp MCP into a fully stateless protocol. Drop all features that require statefulness, like those mentioned up top.
Pros:
Simplest for spec, SDKs, and client/server/host implementations (where "simple" here means "fewest concerns," not necessarily "easiest" or "best").
Cons:
Agents and "live" behaviors like subscriptions get significantly more difficult to implement.
Any server -> client communication would need to use something like web hooks (although this is transport dependent). Importantly, this means that the client would need to be internet-accessible, which isn't currently a requirement.
Very backwards incompatible. Massive change right after the initial release.
Thoughts?
I'd welcome all of:
Feedback on whether you think this is a problem worth solving in the first place.
Feedback or questions about the options listed here.
Suggestions for other options I didn't think of and list!
---
Claude's thoughts:
Based on the MCP documentation and the problem you've outlined, I'll share my analysis of the situation and the proposed solutions:
First, I agree this is a problem worth solving. The ability to deploy MCP servers in serverless environments would significantly lower the barrier to entry and increase adoption, especially for developers already working in cloud environments.
Let's analyze each option:
Option 1 (State/Session Token):
This feels like the most pragmatic solution while preserving MCP's current capabilities
The main concern is the implementation complexity for server authors, which goes against MCP's principle of being easy to implement
However, this could potentially be mitigated by:
Having the SDKs handle most of the state management complexity
Providing simple default implementations for common cases
Making state persistence optional - servers could choose to only support ephemeral sessions
Option 2 (Stateless/Stateful Variants):
This is an interesting middle ground that provides flexibility
The graceful degradation approach with transports is particularly compelling
However, the bifurcation could lead to:
Confusion about which mode to use
Inconsistent behavior across different server implementations
Increased complexity in host applications that need to handle both modes
Option 3 (Fully Stateless):
While this would simplify the protocol, it seems too restrictive
The loss of server->client communication would significantly limit MCP's capabilities
The requirement for clients to be internet-accessible is particularly problematic for desktop applications and IDE integrations
Additional Considerations:
Hybrid Approach:
What if we kept the protocol stateful but added a "connection pooling" layer in the transport?
The transport could maintain a pool of connections and handle reconnection/state recovery transparently
This could allow serverless deployments while preserving the protocol's capabilities
Servers could optionally implement more sophisticated state management if needed
Event-Sourcing Pattern:
Instead of maintaining direct state, servers could maintain an append-only log of events
State could be reconstructed from this event log when needed
This could simplify state management while preserving functionality
Could be particularly well-suited for serverless environments with managed event stores
My recommendation would be to explore a combination of Option 1 and the hybrid approach:
Add session token support at the transport layer
Implement connection pooling in the SDKs
Provide simple default implementations for common state management patterns
Keep it optional - servers can choose to be completely stateless if they don't need the additional capabilities
This would preserve MCP's powerful features while making it more accessible for serverless deployments. What are your thoughts on this hybrid approach?
---
I don't think "connection pooling" really solves anything. The problem with serverless architectures is that you often don't have good ways to coordinate across instances and do this kind of sticky routing (even if possible, it's annoying).
Event sourcing is a good callout that I didn't think of, though. The main question is what the signal-to-noise ratio would be on those logged events, i.e., how many of them would be redundant. Another way to think about this: what would be the ideal ratio between "checkpoints" and events in the log? If the checkpoints are very frequent, then just using that as the backing state seems better.
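To make the checkpoint/event trade-off concrete, here is a tiny sketch of replay (types invented for illustration): the more frequent the checkpoints, the less the reduce below does, which is exactly the argument for just using the checkpoint as the backing state.

```typescript
// Rebuild session state from the latest checkpoint plus later events.
// Purely illustrative types; MCP defines none of this.
interface Checkpoint<S> { seq: number; state: S; }
interface LoggedEvent<S> { seq: number; apply: (state: S) => S; }

function rebuild<S>(checkpoint: Checkpoint<S>, log: LoggedEvent<S>[]): S {
  return log
    .filter((e) => e.seq > checkpoint.seq) // only events newer than the checkpoint
    .reduce((state, e) => e.apply(state), checkpoint.state);
}
```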
---
Also, although my post almost entirely focuses on protocol state, we could also imagine servers that have application state, like stateful agents. Those might naturally lend themselves to a long-lived connection anyway, even aside from the stateful features in the protocol.
---
Speaking from Vercel's perspective option 1 is probably best.
You could consider a hybrid version where
You do have a session token
Clients can use it to resume
Servers can say "I have stateful load-balancing and I won't restart, so I'll just keep state in memory", hence not increasing complexity for such servers
But even serverful solutions would benefit from full resumability
I do agree that statefulness is hard to avoid in general (even a trivial stream could be interrupted and ideally the client could resume it), so I would not shy away from it as a general feature–just implement it in a way that is easy for distributed systems to achieve.
---
For example, if you want to be notified of remote resources appearing or disappearing, this is effectively state. It could be delivered via web hooks or other means, but that's a question of how to represent it.
---
Hi everyone, I've been following this discussion with interest. I already have a working prototype solution for tool invocation, based on a stateful client-server connection, as you can see in the attached diagram.
To truly achieve the full potential of serverless and make it usable in all dimensions, we really just need a robust AAA (Authentication, Authorization, Accounting) mechanism. This would allow both the MCP and the server to recognize:
1A - Who (Authentication) - who is the user or client initiating the connection (or continuing it in the Nth subsequent session...).
2A - What (Authorization) - what resources and operations this user/client has access to (it would be nice to have such a mechanism and have 1000 tools, but not all tools always need to be exposed to the client).
3A - How (Accounting) - how are the resources being used? For example, has there been any conversation exchange within a given tool? What was the result? Was there a previous error? Has the last operation completed? etc.
For me, maintaining context across multiple invocations is actually quite important... one tool writes, another one executes, and a third one analyzes (and all of this on the server side).
I'm planning to use an Agent as a Tool (Swiss Army Knife/Multitool). Although, as of today, the solution is simple because, as a user, I'm working in my own space / the Docker container is entirely for me.
So, if you want MCP to be stateless also - that's a great idea... but I only ask for one thing... please don't abandon the stateful concept because I think that with the AAA function, it would be a beautiful solution for communicating with much more complex scenarios.
---
For me, maintaining context across multiple invocations is actually quite important... one tool writes, another one executes, and a third one analyzes (and all of this on the server side).
In particular, it could be valuable to maintain context across multiple invocations within a trusted environment (e.g., the server) without requiring sampling calls to a potentially untrusted LM client. This would enable the server to process intermediate states securely (to "privately think") before exposing final results to the client—an extension of AAA that requires statefulness.
As a motivating extension, a naive form of this exists in ad-tech clean rooms, where advertisers and publishers privately combine data for targeting and measurement with agreed-upon egress rules. This allows compute on data with more manageable privacy and competitive risks. [Seems quite likely similar dynamics will appear in MCP applications.]
---
Feedback on whether you think this is a problem worth solving in the first place.
Feedback or questions about the options listed here.
Suggestions for other options I didn't think of and list!
Some thoughts that come to mind are:
Is a max request timeout in terms of minutes an actual problem? If so, then I think it would be worth getting more explicit about the specific challenges that introduces. For example, if the efficiency/scalability of data transfer when resuming a session is the key problem, then it could be worth getting more specific about that with use cases. (But even if the session length is in days, not sure it would change that problem dramatically)
My assumption is the average duration of an LLM user session is probably also measured in minutes.
Even a stateful transport session can disconnect due to poor network connectivity. Are resumable stateful sessions also a requirement? I would say probably not? But the point is that clients already need to handle reconnecting for many reasons.
Taking on session state management at a higher level may have other additional follow-on problems, e.g. garbage collecting old sessions, managing secondary resources associated with a session, leaking connections, etc. The session will need to have some notion of lifetime/TTL/keep-alive; at that point, are we back to a similar question about the max request timeout?
My impression is that today you can solve this by building a local MCP server that provides a custom transport to use any protocol. Then another way to think about this problem is improving distribution of local MCP servers. This may be an existing problem to solve anyway.
My take is that it is not clear this is worth solving given the current motivation/problem statement.
---
A pretty common use case would be having your MCP server deployed using Kubernetes with multiple pods due to high traffic. The client connects to 1 pod for the SSE connection, but subsequent POST requests get round-robined to any other pod which totally breaks the protocol.
---
A pretty common use case would be having your MCP server deployed using Kubernetes with multiple pods due to high traffic. The client connects to 1 pod for the SSE connection, but subsequent POST requests get round-robined to any other pod which totally breaks the protocol.
Sure, but what are you proposing? I am aware that's how SSE works today. The SSE implementation in the python-sdk already handles this by encoding a session ID in the follow-up POST URLs and associating it with the original request. That doesn't need an additional protocol extension, as it's already supported with external state if needed... unless you are arguing for removing state from the protocol.
---
Progressive enhancement could be an option.
At the base level, keep a very simple mental model: tool call === procedure call === JSON-RPC.
If a client needs real-time notifications, it would call some sort of "subscribe" method to get back an SSE URL (the presence of which can be negotiated with MCP's capability negotiation model). A sketch of this appears at the end of this comment.
i.e. Don't tightly couple JSON-RPC with SSE. JSON-RPC is very simple to implement. Don't sabotage this simplicity by coupling it with SSE.
Also, tool calling should ideally be stateless / idempotent: Tool calls are just procedures that take a bunch of arguments and return some result. Application state should only be part of it by including a chunk of grounding data / conversation context as an argument to a tool call. Tools would remain portable/reusable with that. EDIT: Not so sure about this.
AAA can be an envelope/tunnel around MCP and not overcomplicate MCP itself. JSON-RPC and SSE URLs could be signed URLs generated by off-MCP API requests that establish AAA.
Apologies if all this is too terse / abstract. Just wanted to quickly dump things at the top of my mind.
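As a client-side sketch of the subscribe idea above (the notifications/subscribe method and result shape are hypothetical):

```typescript
// Hypothetical: plain JSON-RPC by default; an explicit subscribe call
// hands back an SSE URL only when the client wants real-time updates.
async function subscribeToNotifications(rpcUrl: string): Promise<EventSource> {
  const res = await fetch(rpcUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ jsonrpc: "2.0", id: 1, method: "notifications/subscribe", params: {} }),
  });
  const { result } = await res.json(); // assumed shape: { sseUrl: string }
  const stream = new EventSource(result.sseUrl);
  stream.onmessage = (e) => console.log("notification:", JSON.parse(e.data));
  return stream;
}
```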
---
This could work for notifications and anything directly tied to an issued request, but doesn't provide an easy way for the server to make requests of the client, which is a core feature (e.g., for sampling).
---
That is where the parameters to notifications.* come in. Bi-directional messaging over HTTP where the message is not in response to a request is difficult. Both SSE and web sockets provide a pattern that works over an established connection; however, if the connection is severed, then all bets are off. We can add parameters that make reconnecting a disconnected session (stream?) possible. I'm going to call it a session for the time being. Stream might be a better way to think of it though.
I was working on a response to the "tool calling should ideally be stateless" comment that I abandoned. There is a similar problem there, since many tools need to identify/cache/stash information that is local to the session. Perhaps we could create a session ID in the protocol that is included in messages. It could be optionally included in the capabilities.notifications.SSE param of initialize to resume a disconnected session. The server could create a new session and return the identifier in the response. The same identifier could be passed into tool calling somehow (haven't fleshed out what this would look like yet). The idea is to not necessarily mandate a session storage mechanism but to enable one for clients, servers, and tools that opt in. Having document-level properties would work nicely here.
---
This could work for notifications and anything directly tied to an issued request, but doesn't provide an easy way for the server to make requests of the client, which is a core feature (e.g., for sampling).
Riffing on the idea, what if the client was informed that it could "check in" with some sort of client state. The server could then inform the client that it needs to make a request. There might be some piggybacking that's possible with the expectation of regular-interval pings.
In this world, the client can always send standard requests/notifications as needed (using the allowed notifications) to the server. With polling, it can check in with the server to see whether the server wants to request anything. This would only happen during the duration of the session as determined by the client - no different from the expectations for SSE/WS. The server can then inform the client that it wants a status update if it's stale on something it needs.
This might also help as a means of graceful degradation for servers, if/when there are issues with connection management.
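A rough sketch of that check-in loop, assuming an invented server/poll method that drains whatever requests the server has queued for the client:

```typescript
// Hypothetical polling "check in": the client periodically asks whether
// the server has pending requests (e.g. sampling) and handles them.
async function checkInLoop(rpcUrl: string, sessionId: string) {
  for (;;) {
    const res = await fetch(rpcUrl, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ jsonrpc: "2.0", id: 1, method: "server/poll", params: { sessionId } }),
    });
    const { result } = await res.json();
    for (const pending of result?.pendingRequests ?? []) {
      await handleServerRequest(pending); // e.g. run a sampling request, POST the answer back
    }
    await new Promise((r) => setTimeout(r, 5_000)); // regular-interval ping
  }
}
declare function handleServerRequest(req: unknown): Promise<void>;
```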
---
This could work for notifications and anything directly tied to an issued request, but doesn't provide an easy way for the server to make requests of the client, which is a core feature (e.g., for sampling).
SSE or polling makes sense for long-running operations. I would like to see support for regular synchronous request/response. I would assume many tools would just return the output right away or within a meaningfully small interval.
---
I agree; statefulness should be negotiated, and the client should assume it's not available unless negotiated with the server.
That way, even on the "STDIO" transport, you could just call one-shot commands (grep, jq, etc.) without needing to add wrappers around them.
On a side note, LibreChat currently provides an option to feed the tools with an OpenAPI spec (swagger.io/specification) and it's a real time saver; suddenly hundreds of SaaS products with a REST API become accessible without any wrapper.
---
Big fan here 👋
Option 2 feels right and is in line with the other conversations around authorization that enable multiple paths depending on the server's capabilities. The trade-off is the additional complexity to the architecture itself. That said, this feels solvable by looking at the protocol payloads as separate from the delivery mechanism and letting the delivery mechanisms abide by a separate contract layer, which itself could be decoupled from clients as SDKs. With that, the protocol shouldn't bifurcate - assuming we solve sufficient parity.
---
What's more, we should probably decouple the notifications/communication from how we represent state as well. Even for WS/SSE this will be useful, but it is a little different from the problem of how we provide sufficient requesting capabilities from client to server and from server to connected client.
---
Thank you @jspahrsummers and others for this discussion.
From my perspective (I'm quite new to MCP, so please correct me if I'm wrong), the most interesting would be
Option 2: offer "stateless" and "stateful" variants of the protocol
Why?
It keeps the full functionality of the stateful server, for those that need it.
It opens the door for a "simplified" server (with reduced functionality) to be implemented in a stateless manner.
I think stateless enables easier scaling, federation of MCP servers, "tools discovery", and MCP proxies. If we want to create some "HuggingFace" for MCP tools, "MCP Stateless" would make it easier.
I currently see the whole AAA layer as completely independent from the Stateful / Stateless discussion.
I'll be happy to help / contribute if "Stateless MCP" becomes a thing.
Just out of curiosity, what does the decision process to change / improve the MCP protocol look like? I mean, other than creating a change in the spec and SDK code changes, how are these proposals reviewed and approved / rejected?
---
Just out of curiosity, what does the decision process to change / improve the MCP protocol look like? I mean, other than creating a change in the spec and SDK code changes, how are these proposals reviewed and approved / rejected?
That's how. 🙂 We have a core set of maintainers that we are looking to expand over time; they are responsible for reviewing proposals and offering feedback. Changes to the spec specifically require the approval of multiple maintainers.
---
I'm building a hosting platform for deploying MCPs and SSE makes it hard to scale remote MCPs because we can't use serverless.
I did more research into this, and it seems like there's no way to properly route a connection under a protocol like SSE because all the POST requests are independent (REST is stateless, after all). So if you scale up any server to multiple replicas (even in a non-serverless way using VMs or Kubernetes), it's a pain to figure out which spun-up instance to route the messages to.
Actually, statefulness isn't the issue here - it's SSE. One way to side-step this is via gRPC or WebSockets due to how they retain the connection on subsequent requests (there's a sense of session affinity). Is there a reason why WS or gRPC wasn't chosen as the primary transport and SSE was chosen instead? Just want to fully understand the motivations. @jspahrsummers
---
iirc the reason we went for SSE over websockets/gRPC is because SSE exists within standard HTTP, and we figured that adopting websockets on average would probably be a bigger lift than supporting SSE for existing web stacks.
It's a good point regarding additional complexity of routing of subsequent requests back to the container/instance that is holding open the SSE connection. This is another complexity/barrier to the deployment of servers. Some off the cuff ways to solve this:
* Use something like redis to route messages to the correct places
* During the configuration of the SSE transport, the server specifies the endpoint in which it will listen for messages in the session - this endpoint would be used to route back to the correct server instances
That being said, I think stateful/stateless discussion is still relevant - as supporting long lived websockets in a webapp would still necessitate solving all the same issues with stateful/long lived connections
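As a sketch of the first suggestion: under the current SSE transport, the pod holding the SSE stream could subscribe to a per-session Redis channel, and whichever pod receives a follow-up POST publishes into it (channel naming and wiring are assumptions; this uses the Node redis v4 client):

```typescript
import { createClient } from "redis";

// Pod A: holds the SSE response stream for a session.
async function attachSseToSession(sessionId: string, write: (msg: string) => void) {
  const sub = createClient();
  await sub.connect();
  // Forward every message published for this session down the SSE stream.
  await sub.subscribe(`mcp:session:${sessionId}`, (message) => {
    write(`data: ${message}\n\n`);
  });
}

// Pod B: receives the client's follow-up POST and routes it via Redis.
async function routeIncomingPost(sessionId: string, jsonRpcBody: string) {
  const pub = createClient();
  await pub.connect();
  await pub.publish(`mcp:session:${sessionId}`, jsonRpcBody);
}
```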
---
iirc the reason we went for SSE over websockets/gRPC is because SSE exists within standard HTTP, and we figured that adopting websockets on average would probably be a bigger lift than supporting SSE for existing web stacks.
It's a good point regarding additional complexity of routing of subsequent requests back to the container/instance that is holding open the SSE connection. This is another complexity/barrier to the deployment of servers. Some off the cuff ways to solve this: * Use something like redis to route messages to the correct places * During the configuration of the SSE transport, the server specifies the endpoint in which it will listen for messages in the session - this endpoint would be used to route back to the correct server instances
That being said, I think stateful/stateless discussion is still relevant - as supporting long lived websockets in a webapp would still necessitate solving all the same issues with stateful/long lived connections
Thanks for clarifying the motivation behind SSE.
I think having both options (WS/SSE) officially documented would be great, since I think for practical purposes WS is more efficient and designed specifically for these long-lived bidirectional connections and avoids requiring server-side complexities.
Plus, I noticed WS is already in the SDKs, so it would be good if that's formalized.
---
MCP servers can already be seen as a proxy that requires new development. I'm not convinced by the argument that SSE is preferred for the simplicity of being native to HTTP.
WS seems far more natural for MCP and is (relatively) trivial to add to SDKs whose purpose is to already abstract development of MCP servers.
I'm surprised this hasn't received more discussion (still catching up, so maybe it does further down the thread).
---
I've been mulling this over a bit and wanted to share my (candid and somewhat rambly) thoughts on this.
A bit of a recap of the problem
The key issue with the statefulness is the scaling characteristics of long lived connections / inability to use serverless deployments. There is also an issue with the SSE transport where the "side channel" post requests need to be routed to the server instance holding open the SSE stream.
The reason we have a stateful bidirectional protocol is to enable some really nice features (quoting justin):
Notifications about changes—e.g., changes to resources, or tools being added/removed. (These can occur in either direction too.)
Server-initiated sampling at any point, enabling agentic workflows.
Passing arbitrary server logging through to the client.
… more stuff in future? …
I think these (+ future bidirectional) features will be important in the long run to achieve great UX in user facing apps and rich and efficient communication between agents (somewhat speculative, but I can definitely imagine graphs of agents being well served by stateful bidirectional communication). It's still very early days, but most servers and clients aren't properly leveraging these features. I suspect this is because they are harder to implement, and there aren't many good examples of clients in the wild that support the features.
It's important for adoption that we don't add undue complexity/friction to client and server developers early on, but it's also important that we don't close doors on the aspects of the protocol that will enable the long-tail of great features.
The direction I'm currently leaning in
I really like @atesgoral's approach of progressive enhancement:
If a client needs real-time notifications, it would call some sort of "subscribe" method to get back an SSE URL
I feel like we could update the SSE transport (or just make a new transport) where:
All client->server messages go through HTTP POST requests (including initialization), with responses returned directly, i.e.:
→ POST body contains:

```
{
  "method": "tools/call",
  "params": {
    "name": "string",
    "arguments": {...}
  }
}
```

← Response 200 contains:

```
{
  "content": [
    // Array of TextContent, ImageContent, or EmbeddedResource
  ],
  "isError": false // Optional, defaults to false
}
```

→ POST body contains:

```
{
  "method": "resources/read",
  "params": {
    "uri": "resource-uri"
  }
}
```

← Response 200 contains:

```
{
  "contents": [
    // Array of TextResourceContents or BlobResourceContents
  ]
}
```
(Note: In the current SSE implementation all server->client messages come through the open SSE channel)
And all server initiated messages (i.e. notifications and sampling requests) come through an SSE stream that the client can optionally subscribe to.
The implementation of the SSE channel is optional for servers, allowing server implementers to get some value from MCP (tool calls, read resources, evaluate prompts, resource/prompt completions) without needing to support long lived connections.
Then, when server implementers and clients decide to implement the richer stateful features, they can implement the SSE channel and tackle the scaling implications.
These SSE channels could also be best effort, and it's okay for them to occasionally disconnect (i.e. when a deployment occurs).
Pros:
Easy for server implementers to deploy MCP at scale
Only tackle the complexity of long-lived connections if they want/need the additional features
Don't close the door to the richer bidi features of the protocol
Cons:
Most servers won't support the richer features
Some additional complexity of the transport implementation for clients
There are probably other issues with this that I haven't thought through
---
What's more, the real-time notification URL could also be a ws: URL, telling the client to either:
Open a unidirectional WebSockets stream for notifications
Move over to bidirectional WebSockets entirely. This is fun because it will effectively be a 2-step upgrade, from JSON-RPC to HTTP to WebSockets (using the Upgrade/Connection headers).
---
This direction makes sense to me! I think this threads the needle well between simplicity while still supporting these features that we believe will be important—especially for agents.
Perhaps we could make SSE support a stronger requirement on servers, but allow disconnecting it at any time? Then a short-lived interaction but still supporting bidirectionality could look like:
Client periodically sends requests/notifications to server via HTTP POST
Client periodically connects to server over SSE
Server delivers any "queued" requests/notifications over SSE
If the server doesn't wish to support a long-lived connection, it then cleanly disconnects the SSE stream
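From the client's side, that connect-drain-disconnect cycle might look roughly like this naive sketch (the events URL and the simplified SSE parsing are assumptions):

```typescript
// Connect briefly, drain whatever the server queued, and treat a clean
// disconnect as normal rather than as an error.
async function drainQueuedMessages(eventsUrl: string) {
  const res = await fetch(eventsUrl, { headers: { Accept: "text/event-stream" } });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffered = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break; // server chose to disconnect; reconnect on the next cycle
    buffered += decoder.decode(value, { stream: true });
    const lines = buffered.split("\n");
    buffered = lines.pop() ?? ""; // keep any partial line for the next chunk
    for (const line of lines) {
      if (line.startsWith("data: ")) handleServerMessage(JSON.parse(line.slice(6)));
    }
  }
}
declare function handleServerMessage(msg: unknown): void;
```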
---
I think we all tend to agree that the crux of the server scaling problem is not so much about "state", but more about "long lived connections" (i.e. SSE).
As @jerome3o-anthropic, @jspahrsummers, @atesgoral, and others mentioned, if the SSE connections are "relatively short", and the servers are "allowed to disconnect" (i.e. best effort), the scaling issues should be minimized.
It seems to me that we are reaching the conclusion that SSE connections should be alive only during a "transaction cycle", e.g. a request for an "agent / tool" where the agent uses sampling capabilities. Within this back and forth (which may last a few seconds, or up to a couple of minutes), we keep the SSE channel open, but it's closed when the end of the "agent / tool cycle" is reached.
IMO the protocol "as is" allows this behaviour (perhaps with some minor additions to #153). The changes in specification are probably in the clarifications and examples on "how to implement" the client and server. Some changes will also be needed in the reference implementations, which I'm happy to contribute to if there is a decision to move forward.
---
This is a very good direction!
At Automattic we are trying to use MCP for WordPress.
I managed to hack together native WordPress support for the SSE approach, but because PHP is kind of stateless, I had to:
Throw in a bunch of php.ini directives to hack together a working SSE connection
Store responses in the database
Read them in a loop in the SSE endpoint.
This is very sub-optimal and I would love for the HTTP transport to just be query/response. The current SSE implementation frankly sounds like the STDIO approach ported directly to the web. HTTP/REST is a stateless protocol, and yet there are a lot of hacks to make it behave statefully using databases, memcache, etc.
The implementation of the SSE channel is optional for servers, allowing server implementers to get some value from MCP (tool calls, read resources, evaluate prompts, resource/prompt completions) without needing to support long lived connections.
YES PLEASE
I think these (+ future bidirectional) features will be important in the long run to achieve great UX in user facing apps and rich and efficient communication between agents (somewhat speculative, but I can definitely imagine graphs of agents being well served by stateful bidirectional communication)
Cons:
Most servers won't support the richer features
@jerome3o-anthropic I don't think the current SSE implementation is helping here. Again, it looks like a direct port of the STDIO approach to the web, which is not how most web apps seem to operate.
The agentic workflows and larger systems will require long-running intermittent connections to operate, and the way to do those would be to introduce:
Webhooks
Message endpoints that would be polled /checked periodically.
Taking inspiration from payments
Come to think of it, these workflows could have similar latency to payments flows, where authorizing a credit card could take up to a minute, invoicing could take minutes, and renewals could take years; it's a mix of short- and long-term connections.
Then a web-based client could send a webhook, or poll the server for an event.
I know this all requires clients to be a little more complicated, but I also think we'll end up in a world where there are more servers than clients; thus we should optimize a bit for the ease of use of servers vs. clients.
In any event, your proposal of the SSE channel being optional is great.
---
Hello, I have been following the discussions with great interest and I really like this approach too.
A few remarks about the server response in a stateless situation as described in the OP (maybe it's already covered somewhere else that I missed).
First, I think we should still use the JSON-RPC format for the response instead of using another format.
Also, instead of only sending the final response directly, I believe it would be useful for the server to be able to stream back messages to the client in order to inform of progress during long running operations.
I understand it's somehow still server-sent events, but in a transactional / serverless use case. People using LLMs are used to it, as most LLM APIs are streaming.
It could be configured by a client header (Accept: application/json-rpc-stream) or in clientCapabilities; if not supported, the server would just discard all notifications/* messages.
(Maybe it's just simpler to always stream the answer)
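On the server, that content negotiation could be a small branch like the following sketch (Express is used here for illustration, checking for text/event-stream, though the suggested application/json-rpc-stream media type would work the same way; the dispatcher functions are stand-ins):

```typescript
import express from "express";

const app = express();
app.use(express.json());

app.post("/mcp", async (req, res) => {
  if (!req.headers.accept?.includes("text/event-stream")) {
    // Client can't consume a stream: drop notifications/* and
    // reply with the final JSON-RPC response only.
    res.json(await runToCompletion(req.body));
    return;
  }
  res.setHeader("Content-Type", "text/event-stream");
  res.flushHeaders();
  for await (const message of runWithProgress(req.body)) {
    res.write(`data: ${JSON.stringify(message)}\n\n`); // progress, then final result
  }
  res.end();
});

// Stand-ins for the server's actual JSON-RPC dispatcher.
declare function runToCompletion(msg: unknown): Promise<unknown>;
declare function runWithProgress(msg: unknown): AsyncIterable<unknown>;
```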
---
Thank you for this discussion!
If I'm understanding the current spec correctly, I think there are two categories of server->client communication to solve for over short-lived and/or interruptible connections, but today they are not distinguished from each other in the spec. I'm wondering if they should be, and if they should happen over distinct connections between client and server, instead of over one monolithic streaming connection.
My rough stab at how that might look, without perfectly understanding the spec today:
Category 1. Notifications about changes to what the server can provide to the client
Examples: Resource/prompt/tool list changes, resource content changes
Use case: As the client application, I need to keep track of the resources/prompts/tools that a server can provide to me, so I can reason about using those resources/prompts/tools and/or present that list to the user. Streaming notifications from the server help me keep my local list of resources/prompts/tools up-to-date in real time. If I get disconnected, I can re-build my local list of resources/prompts/tools by calling the server's List/Get APIs, and then connect to a stream for updates. If a server does not support streaming updates, I can poll the server's List/Get APIs periodically to keep my local list up-to-date.
For servers that don't support streaming (or clients who don't want to stream):
Client periodically sends requests/notifications to server via HTTP POST
For servers that support streaming:
Client discovers server's resources/prompts/tools via HTTP POST
Client subscribes to change notifications over SSE
Server delivers change notifications over SSE
SSE connection disconnects (this could be a network issue, the server going through a deployment or scale down, etc)
Repeat 1-4
Category 2. Requests/notifications that are (hopefully?) directly related to some work that the client requested
Use case: As the client application, I want to use prompts, tools, and agents from a server. In the course of completing my request to the server, the server may need additional information from me (like LLM samples). Or, it may want to send me occasional updates like progress notifications and logs. I establish a bidirectional communication stream with the server, so that the server can send me the information and requests it needs to complete my work. If the stream is disconnected mid-way, the server may not be able to complete my request and I may need to start a new request.
In the spec today, there doesn't seem to be any kind of a "session ID" or "job ID" associated with a request that might take a while to complete and might require some back-and-forth communication. For example, sampling requests and progress notifications from server->client don't seem to be directly associated with the original tool call request initiated from client->server. It seems like today it is technically valid for a server to spam the client with sampling requests and root requests over the long-running connection, without the client ever actually using the server.
Let's assume that some kind of session ID is introduced that is assigned to requests from the client for using prompts/tools/agents. For certain types of server>client requests, they must be within the context of a session ID. The original request from the client can be upgraded to a stream for bidirectional communication for that session only. The server completes the session when it has completed the requested work. (I think this pattern is similar to the "transactions" @pcingola was describing in his comment above)
For servers that don't support streaming:
Client sends a tool call request to server via HTTP POST.
Connection is held open until the server returns the result.
If the connection breaks in the middle, the client must send a new request.
For servers that support streaming:
Client sends a tool call request to server via HTTP POST
The request is upgraded to SSE
Server delivers requests/notifications to the client via SSE
Server delivers final result over SSE
Server closes the SSE connection
If the connection breaks in the middle, the client must send a new request.
Optional: For servers that persist session state (for example, by session ID):
For non-streaming servers, the client could poll the server by session ID for any requests the server has for the client.
If the connection is broken (for either streaming or non-streaming servers), the client can make a request to get the results of a session ID. The result comes back immediately if the session is already complete. The request is resumed if the session ID exists. The behavior then depends on whether the server supports streaming, as above - either the HTTP request is held open until the result is ready, or a stream is started for bidirectional communication.
At Shopify, we're so far mostly using what we call "MCP Lite": Just regular, transactional (POST and get the result in the HTTP response) JSON-RPC, and often just implementing the MCP tools/call method. This meets most of our current use cases since we have a fledgling ecosystem of internal tools with no immediate need for dynamic server/tool discovery. But that need is fast-approaching!
We have in fact done a PoC implementation of the JSON-RPC-SSE transport when it first came out, but as others in this thread have pointed out, it's awkward to implement: in podded deployments we are forced to use an inter-process message-passing mechanism to link the JSON-RPC POST request to the SSE stream.
I proposed progressive enhancement above, without pictures. Time for some pictures.
Selective notification subscription
"MCP Lite", using plain JSON-RPC. No SSE in sight. Very simple for adoption:
```mermaid
sequenceDiagram
    participant C as MCP Client
    participant S as MCP Server
    C->>+S: POST JSON-RPC tools/call
    S-->>-C: tool result
```
Important points:
initialize can be called for capability negotiation, but maybe there are smart/assumed defaults that make this step optional
tools/list can be called to discover tools, but if servers talking to each other already know what tools exist at the destination, they can skip this
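To make that simplicity concrete, a one-shot "MCP Lite" call might look like the following sketch (endpoint URL and tool name are made up):

```typescript
// One-shot "MCP Lite" tool call: plain JSON-RPC over a single POST, with the
// result in the HTTP response. Endpoint URL and tool name are made up.
const res = await fetch("https://example.com/mcp", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    jsonrpc: "2.0",
    id: 1,
    method: "tools/call",
    params: { name: "get_weather", arguments: { city: "Ottawa" } },
  }),
});
const { result } = await res.json(); // no SSE in sight
```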
Discovering notification URLs during initialization, splitting the MCP Server's JSON-RPC and notification endpoints for clarity:
```mermaid
sequenceDiagram
    participant C as MCP Client
    box MCP Server
        participant J as JSON-RPC Endpoint
        participant N as Notification Endpoint
    end
    C->>+J: POST JSON-RPC initialize
    J-->>-C: Notification URLs
    C->>N: Start streaming from a notification URL above
    N-->>C: event 1
    C->>+J: POST JSON-RPC tools/call
    J-->>-C: tool result
    N-->>C: event 2
```
Important points:
The notification stream URLs returned from initialize can cover zero or more transports supported by the server. They could also be a subset based on a capability query from the client, making it easy for the client to pick the first one it ranks as preferable: an HTTP poll URL, SSE URL, WebSocket URL, Kafka URL, ...
The JSON-RPC tool/call is completely unhindered by any notification streaming happening on some other pipe. They are on different planes of existence.
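A sketch of that discovery handshake, assuming a hypothetical notificationUrls field in the initialize result (EventSource is the browser API; recent Node versions also ship it):

```typescript
// Sketch of the proposed split: initialize over POST returns notification
// URLs, and the client streams from whichever transport it prefers.
// The "notificationUrls" result field is hypothetical.
const init = await fetch("https://example.com/jsonrpc", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ jsonrpc: "2.0", id: 1, method: "initialize", params: {} }),
});
const { result } = await init.json();

const sseUrl: string | undefined = result.notificationUrls?.sse;
if (sseUrl) {
  const events = new EventSource(sseUrl); // notifications live on their own pipe
  events.onmessage = (e) => console.log("notification:", e.data);
}
// tools/call POSTs proceed independently of any notification stream.
```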
Sampling without streaming (borderline crazy idea)
In an "MCP Lite" world (see above), how can MCP-server-initiated sampling work?
Borrowing from HTTP, where servers can emit different response codes to ask clients to take certain actions (e.g. provide credentials; redirect away and forget this URL; I'm busy, back off), the tools/call method could return a special result prompting the client to process sampling and then return the result to the MCP server as a "tool call continuation":
```mermaid
sequenceDiagram
    participant C as MCP Client
    participant S as MCP Server
    participant U as User
    participant L as LLM
    C->>+S: POST JSON-RPC tools/call
    S-->>-C: sampling request, continuation payload
    C->>+U: Get user approval
    U-->>-C: Go ahead
    C->>+L: Perform completion
    L-->>-C: Completion
    C->>+U: Get user approval
    U-->>-C: Go ahead
    C->>+S: POST JSON-RPC tools/continue
    S-->>-C: tool result
```
Assumption: The MCP Server will never send an unsolicited sampling request to the client, but these will all be as a response to tool calls.
Abstractly, this treats the tool as a finite state machine. When sampling is needed, the state of the tool is bounced back to the client and the client can progress the state of the tool by passing it the state + completion to transition the state back to running. This "state" could simply be a tool call reference if the MCP Server is stateful and can persist the paused tool state on its side.
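A client-side sketch of that continuation loop; tools/continue, samplingRequest, and continuation are the names proposed here, not spec:

```typescript
// Sketch of the "tool call continuation" loop. tools/continue,
// samplingRequest, and continuation are this comment's proposed names,
// not part of the MCP spec.
async function rpc(endpoint: string, method: string, params: unknown, id: number) {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ jsonrpc: "2.0", id, method, params }),
  });
  return (await res.json()).result;
}

async function callToolWithSampling(endpoint: string, params: unknown) {
  let result = await rpc(endpoint, "tools/call", params, 1);
  let nextId = 2;
  // The server may bounce back repeatedly, once per sampling round.
  while (result.samplingRequest) {
    const completion = await runLocalCompletion(result.samplingRequest);
    result = await rpc(
      endpoint,
      "tools/continue", // invented continuation method
      { continuation: result.continuation, completion }, // opaque paused-tool state
      nextId++,
    );
  }
  return result;
}

// App-specific: get user approval, call the LLM, get approval again.
declare function runLocalCompletion(req: unknown): Promise<unknown>;
```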
Short-lived SSE as JSON-RPC response
I think others might have suggested or alluded to this already. Focusing on tool calling only:
POST to the JSON-RPC endpoint, get back an SSE response. The stream only lasts for the duration of a tool call.
Simple tool response over a single SSE event:
```mermaid
sequenceDiagram
    participant C as MCP Client
    participant S as MCP Server
    C->>+S: POST JSON-RPC tools/call
    S-->>-C: tool result over SSE
```
Certain implementations may support tools emitting intermediate diagnostic events or progress events usually meant for rendering on the UI.
```mermaid
sequenceDiagram
    participant C as MCP Client
    participant S as MCP Server
    C->>+S: POST JSON-RPC tools/call
    S-->>C: reticulating splines
    S-->>C: modulating frequencies
    S-->>-C: tool result over SSE
```
A tool can also emit one or more sampling requests over SSE (even at different times in its processing cycle) and the same continuation mechanism in my post above can be used to resume the tool when all sampling is completed.
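A sketch of consuming such a short-lived SSE response; EventSource cannot POST, so the event framing is parsed by hand here:

```typescript
// Sketch: POST a tools/call and read the response body as a short-lived SSE
// stream. EventSource cannot POST, so the event framing is parsed manually.
async function callToolOverSse(endpoint: string, params: unknown) {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Accept: "text/event-stream",
    },
    body: JSON.stringify({ jsonrpc: "2.0", id: 1, method: "tools/call", params }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { value, done } = await reader.read();
    if (done) break; // server closes the stream after the final result
    buffer += decoder.decode(value, { stream: true });
    let sep;
    while ((sep = buffer.indexOf("\n\n")) !== -1) { // events end with a blank line
      const event = buffer.slice(0, sep);
      buffer = buffer.slice(sep + 2);
      for (const line of event.split("\n")) {
        if (line.startsWith("data: ")) console.log("event:", line.slice(6));
      }
    }
  }
}
```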
This is exactly the approach we're taking for the MCP SDK we're currently implementing in @http4k, and we think it hits a good balance and is consistent from a client POV. This would be great as a standardised fallback for simple HTTP usage in the spec - we still require an Accept header of the text/event-stream content type. The only open question for us is a standardised (default) URL convention on which this endpoint would be hosted. I can think of the following schemes:
Reusing /sse - this would be consistent, but would prohibit mixing both request schemes on the same server.
Using /messages but with an Accept header of text/event-stream to disambiguate it from the established mechanism. This could potentially break existing clients, but is HTTP compliant.
Using the method name as the endpoint path (e.g. /tools/call). This would be easy to do but would duplicate what is already in the JSON-RPC message received, and I'm not sure it adds any value.
Adapting this to MCP's JSON-RPC protocol would be relatively straightforward.
For simple tools:
tool/call -> immediate result
For long-running tools:
tool/call -> returns Operation reference (job_id)
operation/stream?id=xxx -> SSE stream of results
operation/cancel?id=xxx -> cancel operation
operation/get?id=xxx -> get current state/result
This gives you a stable job id that you can cancel and reconnect to regardless of connection stability.
This is slightly more complicated than just upgrading to SSE on the initial call, as proposed in previous comments, but is easy to understand. You could also support upgrading to SSE directly if optimizing were a priority, but conceptually there is a job.
```mermaid
sequenceDiagram
    %% Simple Tool Flow
    Client->>MCP: tool/call (simple tool)
    MCP->>Tool: Execute simple tool
    Tool-->>MCP: Result
    MCP-->>Client: Immediate response

    %% Long-running Tool Flow
    Client->>MCP: tool/call (long-running tool)
    MCP->>MCP: Create job record
    MCP->>Tool: Start job execution
    Note right of MCP: MCP tracks job state
    MCP-->>Client: Return Operation reference (job_id)
    Client->>MCP: operation/stream?id=xxx
    Tool-->>MCP: Job progress updates
    MCP-->>Client: Stream updates via SSE

    %% Optional Get/Cancel Flow
    opt Get Operation State
        Client->>MCP: operation/get?id=xxx
        MCP-->>Client: Current state/result
    end
    opt Cancel Operation
        Client->>MCP: operation/cancel?id=xxx
        MCP->>Tool: Cancel job execution
        Tool-->>MCP: Execution cancelled
        MCP-->>Client: Cancellation confirmed
    end
```
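A client-side sketch of this hypothetical operation/* scheme (all paths and field names are this proposal's, not the spec's):

```typescript
// Client-side sketch of the hypothetical operation/* scheme above; all
// endpoint paths and field names are the commenter's proposal, not the spec.
const base = "https://example.com";

// Start a long-running tool; the server answers with a job reference.
const start = await fetch(`${base}/tool/call`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ name: "long_job", arguments: {} }),
});
const { job_id } = await start.json();

// Stream progress; if the connection drops, EventSource reconnects, and the
// stable job_id means nothing is lost conceptually.
const stream = new EventSource(`${base}/operation/stream?id=${job_id}`);
stream.onmessage = (e) => console.log("progress:", e.data);

// The job can be inspected or cancelled by id at any time:
const state = await (await fetch(`${base}/operation/get?id=${job_id}`)).json();
console.log("state:", state);
// await fetch(`${base}/operation/cancel?id=${job_id}`, { method: "POST" });
```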
If there's a need for other types of notifications beyond job progress updates, that seems like a separate Events API. I'd lean toward doing that via reliable webhook delivery rather than a single long-lived SSE connection.
I also believe Option 1 makes sense as a way to disconnect sockets from sessions.
I have a transport proposal #182 that goes into some of the implications of bi-directional traffic, such as the lifecycle and security.
From the discussion post, I would argue that we can leave session context and state management up to the server to decide.
We can enable flexibility by being generous and unprescriptive about the "session id" format.
Whether a server wants to encode data into it or include a Shared Access Signature as a pointer, this should be opaque to the client.
In terms of extensibility, a few additions could be valuable but are not required to solve long-running sessions:
We would not lock ourselves out of defining extensibility points for clients and servers to support an additional SSE "subscribing" channel to listen on a session.
Potentially allow "returning early" with a response body when full async is not required for a given call.
When a server always returns in the same http request, this would essentially define a stateless server. The server maintainer could decide at any time to start leveraging the callback URI as the server evolves and starts operating more asynchronously.
Coming to this thread a bit late, but speaking for Cloudflare Workers:
Statefulness is just fine for us. Durable Objects are all about handling stateful protocols. The original stateful MCP protocol over a WebSocket transport should be a great fit for MCP servers built on Workers.
A protocol involving session IDs would also be OK -- it's trivial for Workers to route requests with the same session ID to the same Durable Object, where its state is tracked. The main problem is lifecycle: if the MCP client disappears without explicitly ending the session, how does the MCP server decide when it can clean up? WebSockets are nice because you naturally clean up when the connection is closed. So MCP servers built on Workers would probably prefer a stateful WebSocket-based protocol, but could also live with session IDs.
I am not sure how a session token that "Encodes all session state itself" would work exactly, but it sounds like complexity that wouldn't benefit Workers users.
Speaking selfishly (because I'm maintaining a serverside SDK for MCP 😄):
> WebSockets are nice because you naturally clean up when the connection is closed
Overall, we're bearish on Websockets over SSE or straight HTTP because of the inherent complexity - we've found they don't really play nicely (and definitely not with serverless workers).
> The main problem is lifecycle: if the MCP client disappears without explicitly ending the session, how does the MCP server decide when it can clean up?
For SSE session termination, we've implemented a regular "ping" event which is sent from the server through the SSE. On failure, it just cleans up the server-side state.
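A minimal server-side sketch of that ping-based cleanup, using Node's http module; the 15-second interval and in-memory session map are arbitrary choices:

```typescript
// Server-side sketch of ping-based SSE session cleanup, using Node's http
// module (Node 19+ for the global crypto.randomUUID). Interval is arbitrary.
import { createServer, type ServerResponse } from "node:http";

const sessions = new Map<string, ServerResponse>();

createServer((req, res) => {
  const id = crypto.randomUUID();
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });
  sessions.set(id, res);

  // A regular ping keeps intermediaries from timing out the connection and
  // surfaces dead clients: once the socket is gone, "close" fires below.
  const ping = setInterval(() => res.write("event: ping\ndata: {}\n\n"), 15_000);

  req.on("close", () => {
    clearInterval(ping); // client vanished: tear down server-side state
    sessions.delete(id);
  });
}).listen(8080);
```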
FWIW, in Cloudflare Workers, WebSocket is very much preferred over SSE. We have a lot of infrastructure around WebSockets that is missing for SSE, like Durable Object hibernation (which allows the server to shut down while the WebSocket is idle and start back up when something interesting happens, without losing the connection).
More generally (not specific to Workers), I find that trying to stream events over plain HTTP (in either direction; requests or responses) doesn't work well because proxies commonly assume that they can safely buffer the body bytes, and there's no standard way to tell them not to. Whereas when the proxy sees a WebSocket, it knows that obviously it should not buffer WebSocket messages. So WebSockets ironically tend to have fewer compatibility issues with CDNs and other middleboxes.
> I find that trying to stream events over plain HTTP (in either direction; requests or responses) doesn't work well because proxies commonly assume that they can safely buffer the body bytes
got bit by this same issue and it led to a long night of pleading and poking at settings to get a Google Cloud Run app to play nicely and stream over SSE.
> WebSocket is very much preferred over SSE
Same from the Shopify standpoint / Ruby + Rails ecosystem. The POST request and side-channel SSE dance is a pain to implement and there isn't a lot out of the box for this. ActionCable is much easier to adopt and more mature; will be easier to integrate as a standard transport in the Ruby SDK.
This is super useful information.
One thing we recently ran into is that we really want a potential WebSocket connection to work with the browser Web APIs. It turns out the browser WebSocket API doesn't support adding custom headers for authentication, etc. I'm curious what people think are potential options to ensure, if we do WebSockets, that we can have arbitrary headers (as allowed per the HTTP standard) while keeping Web API compatibility.
Going back to the motivation behind my suggestion above: After a regular POST initialize handshake, the returned WS stream URL can be a signed URL that can embody anything from authentication to some sort of session state (and thus no need for custom headers).
> This is super useful information.
> One thing we recently ran into is that we really want a potential WebSocket connection to work with the browser Web APIs. It turns out the browser WebSocket API doesn't support adding custom headers for authentication, etc. I'm curious what people think are potential options to ensure, if we do WebSockets, that we can have arbitrary headers (as allowed per the HTTP standard) while keeping Web API compatibility.
What I usually recommend is to use the protocols parameter from the Web API and send the Authorization and the token as subprotocols.
That way you can deny the WebSocket connection and send a 401 HTTP response, because the connection has not been upgraded yet.
Disclaimer: I maintain uvicorn and starlette. The server dependencies of the MCP package in Python.
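A sketch of that subprotocol workaround; the bearer. prefix is an arbitrary convention for this sketch, and the server side assumes Node's popular ws package:

```typescript
// Browser side: no custom headers allowed, but subprotocols can smuggle a
// token. The "bearer." prefix is an arbitrary convention for this sketch.
const token = "eyJ...";
const ws = new WebSocket("wss://example.com/mcp", ["mcp", `bearer.${token}`]);

// Server side (Node with the "ws" package): inspect the offered protocols
// before the upgrade completes, and abort the handshake if the token is bad.
import { WebSocketServer } from "ws";

const wss = new WebSocketServer({
  port: 8080,
  handleProtocols: (protocols) => {
    const offered = [...protocols].find((p) => p.startsWith("bearer."));
    if (!offered || !isValidToken(offered.slice("bearer.".length))) {
      return false; // handshake fails before any WebSocket traffic flows
    }
    return "mcp"; // echo the real subprotocol back to the client
  },
});

declare function isValidToken(token: string): boolean; // app-specific check
```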
> This is super useful information.
> One thing we recently ran into is that we really want a potential WebSocket connection to work with the browser Web APIs. It turns out the browser WebSocket API doesn't support adding custom headers for authentication, etc. I'm curious what people think are potential options to ensure, if we do WebSockets, that we can have arbitrary headers (as allowed per the HTTP standard) while keeping Web API compatibility.
From the JVM perspective - so speaking entirely selfishly since I operate mostly in that space🙃 - websocket server implementations are fairly badly featured/inconsistent/untestable/trickier to secure, so I believe requiring them would definitely hinder adoption in that space. Our current SDK model is to provide standard protocol options for SSE + WS + straight JsonRpc (no streaming) + StdIo.
That said, a regular POST plus redirect to Websocket makes good sense here from a usability and consistency viewpoint - ie if you want streaming then expect an endpoint to be returned from the initial call.
It's indeed annoying that the in-browser WebSocket API does not allow you to set headers. IMO they should allow you to set the Authorization header at least. But we don't control that.
I suspect most non-browser WebSocket implementations will let you set headers, since the specific security concerns motivating this restriction don't really apply outside a browser. Certainly Cloudflare Workers supports setting headers on outgoing WebSockets. Do we expect MCP clients to be running in-browser or do we expect this to be server-to-server communications?
The "recommended" approach is of course to send credentials as the first message on the WebSocket. Of course, this doesn't always fit nicely into HTTP frameworks that would like to do authorization as middleware.
A simple compromise is to just put the token in the URL. This might have some security concerns due to the fact that some systems may assume URLs are not secrets and might decide to log them and such, but perhaps that can be mitigated by using a single-use short-lived token? (I think this is what @daviddenton is suggesting, the initial POST would be authenticated and then redirect to a single-use secret URL.)
I haven't heard of @Kludex's approach before of using the protocols parameter, but that does seem like a neat hack and I can't think of a real problem with it!
My feeling right now is: Support the regular Authorization header and also support stuffing the token into the protocols as a work-around for browsers. Show what we've done to browser-makers and hope that it bothers them enough that they just start supporting setting the Authorization header directly in the API.
we're working on solving internal operations things over at SST for our users and letting them ship tools in a lambda is super important. it becomes a no brainer vs something they have to think about if it has to be containerized
option 2 is obviously the simplest for us - and we actually already built this in the short term so we can get moving: a bridge mcp server that can talk to a stateless implementation of the mcp protocol hosted at some url
For client -> server - Just remove the SSE transport from the spec and have everyone use stdio. Developers are free to implement any protocol they wish to connect to their web service and then expose the client as an MCP-server.
This is the "paving the cowpaths" way, it's what most MCP servers in the wild are already doing (e.g Dax's comment), and it leaves developers to come up with the best solution for their needs. This also leaves the door open to future standardization on (possibly multiple) protocols more suited to client -> server.
(MCP Host / Client) --- MCP/STDIO ---> (standardized MCP Server/Web Client) --- Some new thing ---> (Web Service)
The SSE transport could still be used - but now via a standard client: npx mcp-sse https://my-cool-tool.ai/mcp. Similar clients could be made for websockets etc.
I think server -> server is a completely different problem (i.e. the host application/mcp client is a web app) - but here tbh I think a completely different protocol would make more sense, so you can take advantage of standard conventions like http callbacks.
> I think server -> server is a completely different problem (i.e. the host application/mcp client is a web app) - but here tbh I think a completely different protocol would make more sense, so you can take advantage of standard conventions like http callbacks.
For server->server we were working under the assumption that each server can simply connect to the other as a client - that would be simple and mean that each participant had access to the full range of tools from the other.
Good morning, folks!
Maybe I’m too unfamiliar with this subject to offer a fully informed opinion, but I can share my experience with MCP as a developer user.
From my perspective, I’d go all in with HTTP requests. It could significantly increase the number of available servers since it opens up opportunities for people to monetize them. In my experience with MCP, a single request is usually enough to get what I need—I don’t have to listen for ongoing updates. This makes synchronous communication simple to implement and straightforward to use.
I suggest keeping the current SSE approach but adding this new HTTP-based option, each with its own pros and cons. The server’s developer can then decide which protocol best suits their needs.
From the perspective of those of us building out MCP SDKs, adding straight stateless JSON-RPC (without negotiation, and hence no notifications or sampling) was actually quite simple, as was WS - so it seemed a shame not to speculatively add it. 🙃
From my understanding, the biggest issue with supporting standard HTTP endpoint calls is that there isn't a means for the server to do sampling, same-connection resource change subscriptions, or general server-initiated communication. But what if that is the trade-off we're asking the spec to make? Every MCP server that wants to do server-initiated communication should use SSE/WS-style transports. This is just the common expectation for the web in general - you either implement polling/SSE/WS or you're looking at client-initiated transactions only.
I can only think of a handful of use cases that would want to support sampling, but countless that only want to expose their capabilities reliably using the systems they understand today. Will this change in the future? Maybe... but we should build a system that supports developers today, with the option to evolve it to support future capabilities. This is why there is a capabilities negotiation layer.
I agree that looking in the ecosystem today it seems like >90% of all MCP servers are doing stateless things that do not require subscriptions or ongoing connections. Most of them are tools or prompts.
To that end, it seems like Option 2 could be pursued -- capabilities negotiation can determine whether a stateful connection is required, and servers and clients both can avoid the added complexities of persistent connections if they don't need it. Option 1 could be pursued in parallel to make it easier to build stateful connections on stateless infra (along with e.g. a webhook transport spec).
Exactly. Consider game creation in AI coding agents. It'll be pretty wizzy for the agent to be able to convert a user's plain English request into media asset discovery / generation & procurement, directly into the project. At best it's stateless, and at worst it needs an auth token for context just like normal web services.
If we stick closely to stateless HTTP (w/ WebSocket upgrade), then MCPs are "just another kind of server" from the perspective of quite a bit of infrastructure. This is a very valuable property, and it's hard to find any MCP constraints that make it so special that it needs a special kind of network protocol.
Personally, I'd go with Option 3.
MCP is supposed to make it easy for AI agents to integrate with tools and resources. This is a data integration problem. The industry standard for integrating data across platforms is REST APIs. This is what 99% of companies will already have up and running. The burden of integration for MCP is largely on server developers - and expecting them not only to create a new set of endpoints but to run their software in an entirely different way (requiring long-running servers) feels absurd to me.
You could argue that it is to support additional capabilities. But the two main capabilities I am seeing above are 'sampling' and the server informing the client about updated resources/capabilities. The latter is easily solved 90% of the time by the client polling the server - and for the last 10%, the server can simply reply with a 400-level error.
As far as 'sampling' - I believe this is an anti-pattern and should be out of scope for MCP. If servers need AI capabilities to properly respond to tool/resource requests, they should implement that behind their API. They shouldn't have to depend on the unpredictable AI capabilities of an unknown client. I don't think this capability should even be something servers are able to do. It creates security issues where servers can covertly request sensitive data that clients may have. It also adds unnecessary risk for client developers, since servers can effectively utilize the client's AI tokens. I'm not sure why a client developer would even build support for sampling given these concerns (what do they really have to gain?) - speaking of which, none of the currently documented clients support sampling: modelcontextprotocol.io/clients
Any other more complex server-client interactions should be handled by multiple separate tool/resource calls.
In my opinion, a stateless version is an absolute must. Many developers are using serverless solutions and long-running servers/connections are a non-option for them. So at a minimum, we should go with Option 2. But I would go a step further and simplify the protocol by removing features which (in my opinion) shouldn't be there in the first place.
For those suggesting there aren't security concerns with it: the security concern with sampling is not the communication mechanism; it's the authorization, scoping, and access controls, which are not yet solved when it comes to a remote server reaching into a local computer. This area of security controls is less of an issue for programs talking to one another on the same system.
Where I don't agree: we don't have to throw out the parts that aren't universally applicable just because they aren't universally needed. There is utility and opportunity to leverage these systems in the future - specifically after these security concerns have proper answers.
I don't think Option 3 is really a viable option - not impossible, but it has massive negatives. MCP has already gained a lot of traction, and removing these features would be a substantial impact on the (still very early) community, and not necessarily for the best. Option 2 is what I believe is the right path: support simple paths while also supporting existing implementations. The "complexity" of this should be easy to remove if the focus shifts from MCP being concerned with the end-to-end process to treating payloads and transports as separate parts. As a client or a server, you should be focusing on payloads (probably generating these), and the transports sort themselves out in the negotiation phase. Whether it's SSE vs WS vs streaming vs RPC vs ..., it shouldn't matter so long as the system can respond to events and provide the correct structure.
I agree that if sampling is truly a core part of the protocol, I would like to see official support and/or examples. As of now, I don't feel confident in the value prop of sampling especially given the tradeoffs we're discussing in protocol implementation.
That being said - "a simplified OpenAPI alternative with a few standardised endpoints" is exactly what I'm looking for, especially if it means I can pull in 3rd-party implementations more easily. I see managing the LLM as a separate task from managing the resources a semi-agentic LLM can access. In fact, I prefer that they be decoupled, for ease of building model-agnostic solutions. The reality for me (and the org I'm part of, where I'm trying to push MCP) is that the alternative to MCP is building a custom set of APIs to centralize tool/prompt/resource management to pair with our various LLM deployments and interfaces.
It's not clear to me why sampling can't be implemented in a stateless manner via webhooks. Ideally the server sends the client a sampling request with the session ID and some kind of sampling ID and the client can decide how it wants to handle it, either ignoring it or satisfying it according to its own logic. The client's response can include the sampling ID and the session ID which can allow the server to recover any state-related information upon receiving the response. Are there any flaws in this approach?
Ideally, if a tool requires sampling, that could be specified up front in its definition structure so that application/agent developers can reason about it explicitly.
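A sketch of that webhook exchange; every route, ID, and field name below is hypothetical:

```typescript
// Hypothetical stateless sampling over webhooks. Every route, ID, and field
// name below is invented for illustration.
const clientWebhookUrl = "https://client.example.com/sampling-hook";
const serverCallbackUrl = "https://server.example.com/sampling-response";

// Server -> client: deliver a sampling request, tagged with enough IDs that
// neither side has to hold the session in memory.
await fetch(clientWebhookUrl, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    sessionId: "sess-123",  // lets the server rehydrate its state later
    samplingId: "samp-456", // correlates the eventual response
    request: { messages: [] },
  }),
});

// Client -> server: return the completion (or ignore the request entirely),
// echoing both IDs so the server can look everything up from storage.
await fetch(serverCallbackUrl, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    sessionId: "sess-123",
    samplingId: "samp-456",
    completion: "...model output...",
  }),
});
```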
Forgive me if this is a dumb comment, I'm currently doing a review of MCP as part of a broader literature review on agent communication protocols and am still ramping up on the space.
It's not a dumb comment, and it's true; however, the spec isn't oriented around these stateless approaches. Another challenge is that local computers would require a webhook to execute this as well. That itself is not an insurmountable problem, but it would open up a lot of new problems in allowing servers to call back into local clients. For web
Sampling is the one part of the spec that didn't make sense to me. What are some examples as to why you would use sampling instead of treating the tool as an independent sub-agent with its own access to run a model and context? If it is simply about saving the tool from configuring/running a model then I wonder if the MCP Host could provide a signed URL that the tool could make a simple request to.
Claude provided translation:
@jspahrsummers
In my production practice, I've encountered similar issues. I think I've found a relatively good abstraction method (which should follow pattern one) that allows the mcp server to maintain state through session_id while also achieving good scalability using traditional methods.
Re-abstract a Transport based on nchan.io:
Unlike SseServerTransport, this Transport sits in front of the mcp server as a kind of Gateway, in the form of an nginx module, supporting both SSE + WebSocket.
It forwards ClientRequests from the mcp client to the backend mcp server through nchan_publisher_upstream_request.
Implement an HTTMCP that inherits from FastMCP (it can use the built-in decorators to define tools/resources/...).
Abandon the built-in server.run startup method and instead register all request_handlers on an APIRouter.
Finally, mount one or more mcp servers on the fastapi app and deploy them as ordinary web APIs.
System availability:
nchan is mature software for real-time messaging and can support relatively high levels of concurrent access.
nchan acts as both a Transport and a Gateway; its underlying technology is nginx, itself a high-performance web server.
nginx's upstream is used to communicate with the real mcp server. The communication protocol is HTTP, and the backend can use any traditional web server to implement mcp functionality.
System scalability:
nchan supports horizontal scaling using redis cluster for storage.
Here, nginx upstream is used to communicate with the mcp server, and the backend mcp server can also be easily scaled by adding load balancing.
I drew a sequence diagram, which should illustrate the entire process well:
The session_id (which is equivalent to the nchan_channel_id) persists from the start of the connection until the connection ends.
If the task has a short execution time, it can return directly.
If the task has a long execution time, it can be executed in an asynchronous queue, only requiring progress to be pushed in between and the final result to be pushed at the end.
I'm late to the conversation and just a random guy who started using and implementing MCPs recently.
Option 1 makes the most sense to me, and the con could be mitigated: I think the state management could be handled by a server-agnostic persistence wrapper that the serverless containers would use.
Basically with a session state structure like this:
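Something along these lines, perhaps (every field name in this sketch is an illustrative assumption, not from the spec or the original comment):

```typescript
// Purely illustrative guess at a wrapper-managed session state; every field
// name here is an assumption, not from the spec or the original comment.
interface SessionState {
  sessionId: string;
  clientCapabilities: Record<string, unknown>;
  // Cached server inventory, with timestamps to cheapen change detection:
  tools: { items: unknown[]; lastUpdatedAt: string };
  resources: { items: unknown[]; lastUpdatedAt: string };
  prompts: { items: unknown[]; lastUpdatedAt: string };
  pendingServerRequests: Record<string, unknown>; // server>client asks awaiting replies
}
```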
Of course, if the protocol evolved to add, for example, something like "HEAD"-style requests for lists and resources that return only a last_updated_at timestamp, then some of the bookkeeping inside the wrapper could become much simpler, since only the timestamps would need to be compared.
But that might be out of scope for this discussion... This idea assumes the current stable protocol.
Facing these challenges, we made the decision to switch from SSE to WebSockets for all our MCP server communications. Despite the fact that MCP documentation doesn't extensively discuss WebSockets, they are officially supported - and as we discovered, they work significantly better in cloud environments.
Why WebSockets outperform SSE for MCP servers
WebSockets establish a persistent, full-duplex TCP connection between client and server, allowing for bidirectional communication. This architecture offers several advantages over SSE for MCP servers:
Connection Stability: WebSockets maintain more stable connections, with built-in mechanisms for handling disconnections and reconnections.
Bidirectional Communication: While MCP often doesn't require extensive client-to-server communication, having the capability for bidirectional data flow eliminates the need for separate HTTP requests for client-initiated actions.
Binary Data Support: WebSockets can transmit both binary data and UTF-8 text, whereas SSE is limited to UTF-8. This provides more flexibility for different types of data exchange.
Better Performance: WebSockets typically offer lower latency and overhead compared to SSE, especially for frequent communications.
No Connection Limits: WebSockets don't suffer from the same browser connection limits as SSE, making them more suitable for applications where users might have multiple tabs open.
Forking Supergateway
To implement our WebSocket solution, we forked the Supergateway project and modified it to use WebSockets instead of SSE. The core changes involved:
Protocol Adaptation: Modifying the communication layer to use WebSocket protocol instead of HTTP streaming.
Connection Management: Implementing robust connection handling with automatic reconnection logic.
Error Handling: Enhancing error detection and recovery mechanisms to ensure reliable operation in cloud environments.
Scaling Optimizations: Adding features to better support horizontal scaling across multiple instances.
Our modified version of Supergateway is available on GitHub as Blaxel's Supergateway, and we welcome contributions and feedback from the community!
Technical implementation: WebSockets for MCP
For those interested in the technical details, here's how we implemented WebSockets for our MCP servers. Please note that the entire implementation is open source on our GitHub, in Blaxel's Supergateway and Blaxel's SDK.
Kudos on this solution, and the article was a great read, but it's unclear to me how this resolves the stateful/stateless debate. Wouldn't long-lived serverless functions be an anti-pattern?
Could the MCP team please add WebSockets? Personally, it feels like chasing the "smartest universal solution" has led to excessive complexity, making it hard to adopt and build on top of the tech, and no one is happy...
SSE: a promising start with disappointing results
During our search for better solutions, we discovered Supergateway, a tool that wraps stdio-based MCP servers with Server-Sent Events (SSE). On paper, this looked like an elegant solution.
For those unfamiliar with SSE, it's a technology that establishes a one-way communication channel from server to client over HTTP. Unlike WebSockets, which provide full-duplex communication, SSE is designed specifically for server-to-client updates. This makes it seemingly ideal for scenarios where clients primarily need to receive updates from servers.
We implemented Supergateway with SSE, but quickly ran into significant issues:
What are the problems with SSE in serverless environments
Connection Instability: In serverless environments, SSE connections dropped randomly and frequently. This is particularly problematic for AI agents that need reliable, persistent connections to function properly.
Scaling Challenges: As we tried to scale our platform, the limitations of SSE became increasingly apparent. The protocol wasn't designed with cloud-native architectures in mind.
Browser Connection Limits: SSE suffers from a limit on the maximum number of open connections, which is set to a very low number (6) per browser and domain. This became problematic when users opened multiple tabs.
Proxy and Firewall Issues: Some proxies and firewalls block SSE connections because they don't have a Content-Length header, creating deployment challenges in enterprise environments.
After extensive testing, we concluded that while SSE might work well for simpler use cases or controlled environments, it wasn't robust enough for our cloud-based AI agent platform.
(Edited to replace the screenshot with text in the original post. Point 2 above was highlighted. - @atesgoral)
WebSockets can surely be a more ergonomic transport for persistent connections, especially since they provide full-duplex message transfer out-of-the-box.
However some of the SSE criticism and WebSocket praising above is a bit misinformed.
SSE is not a protocol upgrade in the same sense WebSockets is. SSE is plain old HTTP, using "text/event-stream" as content type, with the server sending events as streaming body chunks. It's up to the SSE server and clients to reestablish the connection and make use of SSE event ids for reliable delivery. But most clients are implemented very naively and are not robust. This is not an inherent problem with SSE itself. SSE actually has some built-in reconnection capabilities in the spec (Last-Event-ID header and automatic reconnection in the EventSource API) that are often overlooked.
WebSocket is also not this magical transport with connection and delivery guarantees. It's just a persistent TCP socket. It's still up to the clients and servers to properly implement robustness measures like pings and reconnects. There are mature WebSocket client/server libraries out there that can create the impression that robustness is a built-in feature of WebSockets. You can experience this by implementing your own WebSocket client from scratch, and then reinventing all the robustness measures that libraries that have been around for a while have implemented.
Also, in the end, SSE is also just another persistent TCP socket, but with only one side doing the talking (if we ignore HTTP/3 QUIC).
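A sketch of the resumption machinery being described: the client remembers the last event id it saw and presents it via the Last-Event-ID header on reconnect (EventSource does this automatically; it's spelled out manually here):

```typescript
// Manual sketch of SSE resumption: remember each event's id and present it
// via Last-Event-ID on reconnect. Browsers' EventSource does this for you;
// it is spelled out here to show there is no magic involved.
let lastEventId: string | undefined;

async function subscribe(url: string): Promise<void> {
  const res = await fetch(url, {
    headers: lastEventId ? { "Last-Event-ID": lastEventId } : {},
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buf = "";
  while (true) {
    const { value, done } = await reader.read();
    if (done) break; // connection dropped or server closed the stream
    buf += decoder.decode(value, { stream: true });
    const lines = buf.split("\n");
    buf = lines.pop() ?? ""; // keep the trailing partial line
    for (const line of lines) {
      if (line.startsWith("id: ")) lastEventId = line.slice(4); // our offset
      if (line.startsWith("data: ")) console.log("event:", line.slice(6));
    }
  }
  setTimeout(() => subscribe(url), 1_000); // naive reconnect; backoff omitted
}
```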
Thank you all for the discussion here! This has been a very thorny topic with a lot of tradeoffs, no matter which path we pursue.
Incorporating all of the input we've heard here and elsewhere, we are proposing a new Streamable HTTP transport, which roughly amounts to plain HTTP plus optional SSE, very similar to what @jerome3o-anthropic described above. Please take a look at that PR description and the associated spec language, and flag any questions or concerns!
Note that other options, like WebSocket, remain possible in future, but we believe strongly this is the right path to pursue for now. All of your comments are greatly appreciated! 🙇