fix: add status field in Resource class and Resource as a return type in CallToolResult #549
base: main
Conversation
I previously left some feedback regarding the async resource approach: #491 (comment). I believe all of those points apply to this approach as well. Some additional points for this approach:
+1 this is a general problem we need to solve -- clients should be able to GetResource wherein they provide a URI and get the details of that URI. I feel that is a key thing missing from this PR.
Adding GetResource will provide an alternative to subscription.
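To make the polling alternative concrete, here is a minimal sketch of a client polling loop built on a hypothetical point-lookup operation. `get_resource`, the in-memory server state, and the status values are all assumptions for illustration -- no such request exists in the current spec.

```python
import time

# Hypothetical in-memory server state; "get_resource" is an assumed
# point-lookup operation (not part of the current spec).
SERVER_RESOURCES = {
    "resource://job/42": {"uri": "resource://job/42", "status": "pending"},
}

def get_resource(uri):
    """Return metadata (including status) for a single URI."""
    return SERVER_RESOURCES[uri]

def poll_until_done(uri, interval=0.01, attempts=50):
    """Poll a single resource as an alternative to maintaining a subscription."""
    for _ in range(attempts):
        meta = get_resource(uri)
        if meta["status"] in ("available", "error"):
            return meta
        time.sleep(interval)
    raise TimeoutError(uri)

# Simulate the server finishing the job, then poll for the outcome.
SERVER_RESOURCES["resource://job/42"]["status"] = "available"
result = poll_until_done("resource://job/42")
```

As the next comment notes, this trades maintaining multiple subscriptions for polling multiple resources.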
I don't see why the nature of progress notifications, results, and errors changes at all. This is just adding a new possible return type for tools and ensuring resources have status. Progress messages are still supported via the normal channel. Think of it this way: this PR only (1) adds a new possible return type for tools, and (2) ensures resources have a status.
Why is the partial-results pattern any different? Servers still publish partial results over a stream. When servers eventually provide a final result, it now may have a Resource attached. Am I missing something?
This seems specific to the prior PR -- I don't know if this applies here.
This is a great point -- we really need a GetResource operation in addition to a ListResource / ReadResource one.
This also seems specific to the prior PR.
That would shift the problem from maintaining multiple subscriptions to polling multiple resources.
By "changing the nature of", I mean there will be a bifurcation. Think about how sync tools and async tools will require entirely separate code paths in the SDKs. Think about how defining a new property on, say,

By "no support for progress messages", I was referring to the prior PR having a progress number and total, but not capturing progress messages. This PR has neither of those things; however, if clients must still use the normal channel for progress notifications, why not get a normal result type directly from that channel as well?
This is an important use case, but it is orthogonal to async tools. Representing the data as a resource is one approach, but it does not require adding a
It's worth pointing out that the
I was assuming that partial results would not be over a stream, because if they were over a stream, why have a resource attached?
Neither PR addresses what happens when an async tool makes a request to a disconnected client, so the point applies to both PRs. Again, if the answer is that the request is sent over the resumable HTTP SSE stream, then why isn't the full tool result also sent over the stream? If we need an API to poll the status of the stream after a disconnection, then let's just add that.
Neither PR addresses how the server should handle resource reclamation, so the point applies to both PRs. When should a resource expire? How is that time frame communicated to the client? If a server sends a request to a disconnected client, how long must the server wait before deleting the "pending" resource? When an "available" or "error" resource has been read, can it be immediately deleted? How long must the server maintain a record of a "deleted" resource?
I'm looking into this problem at the moment too; this may have already been considered, however I wonder if the simplest possible option is just to allow a resource URI reference in a progress notification, e.g. for FastMCP:

```python
context.report_progress(
    progress=5,
    total=10,
    message='generating content',
    resource_uri='resource:/content_5_10.mp4'
)
```

That way there are virtually no new concepts required. I suggest it is up to server implementors how long they keep any in-progress resources hanging around, i.e. no guarantees and they are expected to be ephemeral, but I would imagine most servers would/could make these URIs valid for minutes with little overall cost implication, to give clients time to cache the results. I suggest resource_uri should be an optional field that defaults to None for backwards compatibility. If this is of interest I can draft it as an RFC; I'll try to have a pass at implementing it as a branch to the python-sdk to demonstrate the idea further.
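For the client side of this idea, a rough sketch of a handler that notices the optional URI and fetches the intermediate artifact. The notification shape, `resource_uri` field, and `read_resource` stand-in are all assumptions from the proposal above, not existing API.

```python
# Hypothetical client-side handler: a progress notification optionally
# carries a resource_uri pointing at an intermediate artifact.
fetched = []

def read_resource(uri):
    # Stand-in for an MCP resources/read call.
    return b"bytes for " + uri.encode()

def on_progress(notification):
    uri = notification.get("resource_uri")  # optional; absent for old servers
    if uri is not None:
        fetched.append((uri, read_resource(uri)))

# A plain progress notification is still handled unchanged...
on_progress({"progress": 5, "total": 10, "message": "generating content"})
# ...while one carrying a resource_uri triggers a fetch.
on_progress({"progress": 6, "total": 10, "message": "chunk ready",
             "resource_uri": "resource:/content_5_10.mp4"})
```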
This is a nuanced discussion involving modelcontextprotocol/python-sdk#800 as well. I commented there that it's already permitted (from a model POV) for Notifications to include arbitrary extra data (models have
FWIW I think this need to keep parity already exists -- I don't think this PR changes the need to ensure TextResourceContents and TextContent evolve in sync. If we were to redesign this we might use the same Content objects for both, but at this point we need to continue with this abstraction. Also, philosophically, while it is nice if clients can treat async results identically to sync ones, I don't think this is a necessity. The code on the client side will meaningfully differ for async regardless.
Servers can choose similar to APIs. Tools which execute short, simple operations are probably better to model as sync while ones which are long-running will be better modeled as async.
They can -- this is an alternative to sitting on the line waiting for streaming results.
I don't see anything wrong with adding other content types to Resource in the fullness of time.
That's a great point -- adding status will be backwards incompatible for existing consumers. Keeping status in a separate extension class of Resource or creating a brand-new class would be two options here. @bzsurbhi, I would suggest thinking through this point.
Basically I'm just saying that servers can decide to still send progress notifications if they really want to and clients can decide whether to ignore them or not. That said, I don't expect this to happen in general. The point of this is that clients don't need to wait on the line for completion.
I'm not fully understanding this comment. So in Streamable HTTP client/server interactions can be completely stateless i.e. single request/response similar to REST APIs. This PR is defining semantics for indicating to clients when some artifact of a request is still being processed + allowing tools to specify Resources as responses. When requests are not stateless or involve streaming, servers can still send these Resources but the use-cases for doing so would probably be different.
Resource reclamation seems like an orthogonal problem that already exists for Resources today. Not sure if we need to couple it with this proposal on enhancing Tool calls w/ Resources.
I think this kind of defeats the purpose here of facilitating flows where clients do not need to wait on the line for servers to complete.
It occurs to me that rather than a resource with a status, perhaps a future result might be a better and more generalisable approach. This could then be solved at the SDK layer rather than the protocol layer, and would allow any type of response, not just content, so it would also support the structuredContent approach too. This avoids the need to complicate the resource protocol: a resource is always expected to be 'ready', it just might be ready in the future. E.g. for the python-sdk this might look like:

```python
future: Future[ImageContent] = session.call_tool_soon('generate_image', progress_callback=progress_callback)
image = future.result(timeout=60)  # 60 seconds
```

The naming convention here is meant to mirror start_task_soon from anyio, but other naming options are obviously available. In this case the python client can opt into this behaviour and use a callback, or wait for a result and cancel in the case it's taking too long. I think this covers the use case that this change is trying to address and doesn't require a new concept at the protocol level.

This might be going off on a completely new use case, but I'm interested in the idea of creating tools to perform open-ended long-running tasks like training a model. In this scenario a tool will output many intermediary artefacts such as metrics, validation data and checkpoints, and a single response object doesn't feel that useful. It would be neat if there were a way to return a sequence of future resources that the server could emit as they become available. That was the main reason for allowing progress to respond with resources; however, the limitation in this approach is that it's not obvious how a client would reconnect with a long-running task if it disconnects and then rejoins, say, the next day if the training is taking a long time. It might be because it's late in the day in the UK, but this last part seems hard to implement both with this Future approach and with Resources that can be updated (as per the discussion here: modelcontextprotocol/python-sdk#800 (comment)).
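As a runnable illustration of the SDK-layer idea above: a pending tool call can be backed by a standard `concurrent.futures.Future` that the transport resolves when the final result arrives. `Session`, `call_tool_soon`, and the result shape are hypothetical; only `Future` itself is real stdlib API.

```python
import threading
from concurrent.futures import Future

class Session:
    """Sketch of an SDK session; call_tool_soon is an assumed method."""

    def call_tool_soon(self, name, progress_callback=None):
        fut = Future()

        def run():
            # Stand-in for the real request/notification plumbing:
            # emit progress, then resolve the future with the final result.
            if progress_callback:
                progress_callback(5, 10, "working")
            fut.set_result({"tool": name, "content": "image-bytes"})

        threading.Thread(target=run).start()
        return fut

progress_log = []
session = Session()
future = session.call_tool_soon(
    "generate_image",
    progress_callback=lambda *args: progress_log.append(args),
)
image = future.result(timeout=60)  # block with a timeout, or poll future.done()
```

The appeal is that `Future` already gives you timeouts, cancellation, and callbacks for free; the open question, as noted above, is what happens across a disconnect.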
Solving on the client side won't help since it still requires a persistent connection. But I do think the idea of creating a new class like "Promise" and a new alternative to CallTool might be another way to achieve this.
Yep, that's the sort of direction I'm thinking too, if there's an appetite for a new method. I agree supporting non-persistent connections is going to be really useful for these sorts of use cases. Perhaps something along the lines of the following for the low-level sdk:

```python
task: TaskReference = session.start_task('train_model', training_config)
session.subscribe_updates(task, 'progress', progress_callback)
session.subscribe_updates(task, 'metrics', metrics_callback)
result = session.wait_result(task, timeout=60)
```

The start_task would behave a bit like call_tool, but instead of being expected to get a result immediately, the client gets a reference that can be used to get the result later. The task reference should be externalisable so another process can pick up the reference after a disconnect:

```python
rejoined_task = session.join_task(old_task)
session.subscribe_updates(rejoined_task, 'progress', progress_callback)
session.subscribe_updates(rejoined_task, 'metrics', metrics_callback)
result = session.result(rejoined_task, timeout=60)
```

There's a question of how the client would request updates that it had missed whilst disconnected; perhaps allowing an optional position marker on rejoin would enable the client to say "replay updates after a certain point". It's not obvious how to pass the position markers around, I might need to think on that some more... somehow associated with the old_task?

The task reference might advertise the channels the client can subscribe to, with a description, e.g.:

```python
class TaskReference(BaseModel):
    task_token: str
    channels: Dict[str, str]
```

I'm being very lazy with the channels description there, but the idea would be to provide at the very least a string description of the channels the client could subscribe to. High-level sdk libs might then wrap up this low-level task protocol into Promises to make the simple cases easy but allow for more complex behaviour without requiring lots of new semantics. I realise this is going way off piste from the original ticket.
Lemme try to pull some strands together, maybe just to clear up my own thinking. So the problem to address is how to deal with really long-running tool processes in a way that the user can stay informed about the status. Is that roughly accurate?

So an LLM decides to output a tool_call to respond to a user prompt. Maybe the prompt is even like, "solve this incredibly hard problem" ("What is the meaning of life?"), and the LLM has a "build-an-LLM tool", so it decides the best thing to do is to train a new foundational model to solve the problem. So unlike a "normal" tool, you can't wait synchronously for the answer; that can't be the expected response. So to me, it's like the tool's function is to kick off a long process, with maybe many intermediate steps and many resources or outputs along the way.

So the way I would think about it is you get a response with metadata for the overall "job", and maybe the response from the tool is a description of all the sub-tasks like "data collection task", "training task", "evaluation task"; finally the new model is built and it gets to the "answer the original question" task. For me, I would picture this like the big job is a resource with a URI, and the response from the tool is just this URI and a "plan" of all the sub-tasks or something. Then the client is happy and life goes on, and either it subscribes to updates about this

The closest thing I can think of is the "Deep Research" "tool" like in ChatGPT, e.g., it can take like 20 minutes, but it's synchronous and blocking, so are we trying to think about how to deal with situations like that but even on a larger scale?

Supposing I'm actually following the thread (maybe I'm just following my own thread), I guess I like this approach overall. Are this and this competing proposals? If so, it would be easier to reason about if they were both at the same level, like @davemssavage's PR is on the SDK level -- or is that also part of the debate, like "is this a protocol change or an SDK change"? OK, so if what I said above is off base, then sorry for clogging up the debate, just minimize this comment lol, but if I'm following, here's closing thoughts:

Hope this helped. 🤷

EDIT (caveat): Maybe I'm thinking of a resource too loosely. I think of it as literally pretty much any entity in the universe, and the contents is data about the entity, but a resource itself does not have to be a bit of data. Maybe that's contrary to the protocol, which says "Resources represent any kind of data that an MCP server wants to make available to clients", but I think it's nice to think of a resource as ANYTHING, and if you want you can argue that anything can be represented "as data", but maybe I am on my own in thinking that way.

EDIT: I noticed that the notion of a "sub-resource" is already mentioned (maybe not defined) in the schema.ts:
Final thought, then time for some sleep: it would be neat if start_task played nicely with existing tools, e.g. server implementors might perform some mapping to make tools automatically appear as tasks so existing libraries don't need rewriting. Some of the semantics around channels in my previous post would require new server-side APIs, but these should be opt-in rather than mandatory.
I don't disagree that it would be nice to unify many types, but, other than this PR, why would
If clients can treat sync and async results identically, why would the code meaningfully differ? The spec supports resuming a disconnected stream. If a result is transmitted via the stream, then the client can handle it in the same way upon resume, regardless of whether the tool is sync or async. There just needs to be a mechanism to trigger the resume, such as a timer or a webhook.
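The "mechanism to trigger the resume" can be sketched as a timer-driven loop: the client reconnects with the id of the last event it saw and replays whatever it missed until the result arrives. `resume_stream` and the event shapes are stand-ins for an SSE reconnect; only the resumability concept comes from the spec.

```python
import time

# Simulated server-side event log: (event_id, kind) pairs, where the
# final result is delivered over the same stream as progress events.
EVENTS = [(1, "progress"), (2, "progress"), (3, "result")]

def resume_stream(last_event_id):
    """Stand-in for reconnecting an SSE stream with Last-Event-ID:
    return only the events after last_event_id."""
    return [e for e in EVENTS if e[0] > last_event_id]

def wait_for_result(poll_interval=0.01):
    """Resume on a timer until the result event appears on the stream."""
    last_id = 0
    while True:
        for event_id, kind in resume_stream(last_id):
            last_id = event_id
            if kind == "result":
                return event_id
        time.sleep(poll_interval)  # timer; a webhook could trigger this instead

final_id = wait_for_result()
```

Under this view the tool result needs no special status-bearing resource: it is just the last event on a resumable stream.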
What about a tool that executes a short, simple operation, except that a particular input causes it to run for a very long time?
The client does not need to stay online while waiting for streamed results.
I don't understand the relationship between these two comments.
Servers can make requests to clients, e.g. sampling requests. Much like progress notifications, those requests are sent via the stream. If a client must read the stream to receive and handle such messages, why not also send the result via the stream? The only real purpose of the resource is to monitor the status of the result. Instead of altering the resource primitive to suit that purpose (and dealing with the ensuing consequences), let's add a separate API specifically for that purpose.
I'm not sure that resource reclamation is actually a problem that exists today. Regardless, there is a fundamental difference when you are creating resources on every tool call, so it is not an orthogonal problem. The questions I asked in my previous comment were specific to tool-created resources.
This sounds like the approach I proposed in #491 (comment). Ultimately, I concluded that it would be better to generalize the approach for all tools, not just (anticipated) long-running tasks. See #543.
See #543 (comment).
@davemssavage I agree adding I overall like the idea of introducing a new class and an alternative to tool/call
This implementation provides:
The client workflow would be:
This approach:
Note: The class names, method names, and exact structure shown here are for demonstration purposes only and not final. The actual implementation would need to be refined through discussion and align with the protocol's naming conventions and design patterns. Let me know what you think.
I agree trying to keep things close to the original tool protocol is probably better on reflection. Thinking about it some more, the JSONRPCRequest interface already has the idea of a requestId https://github.com/modelcontextprotocol/modelcontextprotocol/blob/main/schema/draft/schema.ts#L79 -- I wonder if what we're actually talking about is externalising this so a client can get the result of a previously submitted JSONRPCRequest?

A couple of points: I'm generally not a big fan of polling. Given the protocol already has support for server-side notifications, perhaps these should be used to notify a client that a task is complete or failed rather than the client having to keep polling the result; that tends to lead to unnecessary delays. I guess the client should always be able to fall back to polling, but ideally it wouldn't be the primary mechanism.

Also, I think it would be sensible to support task progress for async tool calls, so supporting notifications of async task progress should be allowed. If this change also added in the idea of notifying of a resource alongside the string message, I think this would cover the original idea I had of reporting arbitrary info, e.g. metrics, validation data, checkpoints etc., as per #549 (comment). We should also make sure it works with the structuredContent responses as well as the TextContent | ImageContent | Embedded

I think we're all thinking along the same lines; perhaps it's worth putting together a new RFC. I find it a bit easier to comment on code when it's in a merge request than buried in comments. I'd also be happy to contribute towards a python-sdk experiment to prove it works in reality if you need a volunteer for such work.
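The "externalised requestId" idea can be sketched as a server-side store of completed results keyed by request id, so a client can fetch an outcome after a disconnect instead of polling a live connection. `complete_request` and `get_result` are assumed names for a method that does not exist in the spec today.

```python
# Completed results keyed by JSON-RPC request id (hypothetical store).
COMPLETED = {}

def complete_request(request_id, result):
    """Server side: record the result of a finished request."""
    COMPLETED[request_id] = result

def get_result(request_id):
    """Assumed new method: point lookup for the result of a previously
    submitted request; None if unknown or not yet finished."""
    return COMPLETED.get(request_id)

# A long-running request finishes while the client is disconnected...
complete_request("req-17", {"status": "done", "content": "model weights"})

# ...and the client (or another process holding the id) reclaims it later.
reclaimed = get_result("req-17")
missing = get_result("req-99")  # unknown id -> None
```

A completion notification could still be the primary signal, with this lookup as the fallback for clients that were offline when it fired.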
I believe we could use the existing Resource subscription and notification mechanisms rather than implementing new asynchronous code. For polling support, we can add a status field through a new AsyncOperation/AsyncResource class instead of modifying the base Resource class directly. This approach leverages our existing infrastructure while maintaining cleaner separation of concerns.
In this approach:
The server implementation would:
The client workflow would be:
This approach leverages existing resource endpoints for content retrieval and notifications. I'm down to create a new RFC once we agree on the high-level approach. I'm happy to continue this discussion offline to explore more ideas and reach a conclusion. Could you share your Discord username? I wasn't able to locate you there.
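The AsyncOperation idea sketched above can be illustrated in a few lines: status lives on a new class rather than on the base Resource, completion is delivered through the existing resource-subscription mechanism, and reading the field directly is the polling fallback. The class, field, and status names are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class AsyncOperation:
    """Hypothetical wrapper: status on a new class, not on base Resource."""
    uri: str
    status: str = "pending"  # pending -> available | error
    subscribers: list = field(default_factory=list)

    def subscribe(self, callback):
        # Analogue of resources/subscribe on the operation's URI.
        self.subscribers.append(callback)

    def set_status(self, status):
        self.status = status
        # Analogue of notifications/resources/updated fan-out.
        for cb in self.subscribers:
            cb(self.uri, status)

seen = []
op = AsyncOperation(uri="resource://op/1")
op.subscribe(lambda uri, status: seen.append((uri, status)))
op.set_status("available")

# Polling fallback for clients that don't subscribe: just read the field.
current = op.status
```

This keeps the base Resource class untouched, which sidesteps the backwards-compatibility concern raised earlier in the thread.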
Hi @bzsurbhi, I have updated my nickname on the MCP Discord server; hopefully that makes me easier to find. I'm not a big user of Discord to date, so this might not be sufficient. I took a pass at a merge request that captures some of my thinking here: Happy to progress this either on Discord or via either pull request as appropriate. I should probably flag that I'm working on this as a background project, so it's evenings and weekends and responses may be sporadic.
Motivation and Context
Enhanced support for long-running operations by implementing a Resource-based polling mechanism. The server can now return a Resource object (containing URI and status) from call_tool(), allowing clients to poll the resource status until it reaches the Ready state. Once ready, clients can retrieve the content using read_resource().
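The workflow described above can be sketched end to end. Class and function names follow the description (call_tool, read_resource, a Resource with uri and status, a Ready state), but the bodies are illustrative stand-ins, not this PR's implementation.

```python
import time

class Resource:
    """Minimal stand-in for the proposed Resource with a status field."""
    def __init__(self, uri, status):
        self.uri = uri
        self.status = status

# Simulated server state: (status, content) per URI.
STORE = {"resource://report/1": ("Ready", "final report text")}

def call_tool(name, arguments):
    # Returns immediately with a Resource instead of the final content.
    return Resource(uri="resource://report/1", status="Processing")

def get_status(uri):
    return STORE[uri][0]

def read_resource(uri):
    return STORE[uri][1]

# Client workflow: call the tool, poll until Ready, then read the content.
res = call_tool("generate_report", {})
while get_status(res.uri) != "Ready":
    time.sleep(0.01)
content = read_resource(res.uri)
```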
How Has This Been Tested?
Breaking Changes
This is not a breaking change
Types of changes
Checklist
Additional context