Is your feature request related to a problem? Please describe.
Most MCP client applications (such as the Claude Desktop app) ask users to approve many minor actions through a confirmation dialog.
This can be frustrating when users are working with many tools. Having to approve requests over and over will likely lead users to default to allowing everything (alarm fatigue). Other MCP client apps might choose not to ask users for permission at all, which seems dangerous.
Ideally we want some way to:
- Enable users to grant permission to run low-risk actions automatically
- Flag to users when actions really are potentially high-risk, and what the consequences would be (without causing alarm fatigue)
Describe the solution you'd like
Currently, the protocol does not provide a way for servers to indicate how 'risky' an action is (apart from, perhaps, unstructured text in the description). There's also no straightforward way for the server to provide context about how risky a particular call would be.
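For context, a tool declaration today is essentially just a name, an optional description and an input schema (simplified sketch below, not the exact spec types), so any risk information has to be squeezed into free text:

```typescript
// Simplified sketch of what a server advertises for a tool today (not the
// exact spec types): there is nowhere structured to express risk, so it can
// only be hinted at inside the free-text description.
interface Tool {
  name: string;
  description?: string;
  inputSchema: object; // JSON Schema for the tool's arguments
}

const makeAmazonPurchase: Tool = {
  name: "make_amazon_purchase",
  description:
    "Purchase an item from Amazon. WARNING: this spends real money.", // risk buried in prose
  inputSchema: { type: "object", properties: { itemId: { type: "string" } } },
};
```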
One idea might be to add properties to the Tool data type, adding something like the following (a rough sketch follows this list):
- An expression of how risky an action is in general.
- Intuitively, this might be thought of as a `risk_level` of `low`, `moderate` or `high`
- In practice we might want something richer, and invite ideas on how we could do this better! Maybe an array of risk types, each with some value for its impact. For example (ideally 'low' or 'high' would be defined on an impact scale for each category):
- make_amazon_purchase => financial_risk: ~£100-£1000
- access_google_drive => privacy_risk: moderate?
- access_medical_records => privacy_risk: high
- control_smart_light_state => disruption_risk: low
- start_aws_server => financial_risk: ~£100, cyber_risk: moderate
- [I think these categories could do with a lot of refinement, and I don't stand by them - I'm sure there are better papers exploring risk taxonomies for AI agents out there!]
- Another way to think of this might be something like OAuth scopes, which similarly define a class of actions, usually by how impactful they can be.
- A way for the server to easily respond with a more precise risk level for a given call
- e.g. if making an Amazon purchase of a specific item, it might return an impact statement like 'This will authorise a payment of £25.99 to buy Cuddly Stuffed Animal Sloth Soft Toy'. It might also attach a structured risk statement of `financial_risk: £25.99`.
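A rough sketch of what this could look like, purely illustrative: none of these fields (`risks`, `RiskAnnotation`, `CallRiskStatement`) exist in the MCP spec, and the category names are just the ones from the examples above.

```typescript
// Purely illustrative: none of these fields exist in the MCP spec today.
// It shows one possible shape for (a) declaring a tool's general risk
// profile and (b) returning a structured per-call risk statement.

type RiskLevel = "low" | "moderate" | "high";

// One risk annotation: a category, an impact level, and (for financial
// risk) an optional estimated cost range.
interface RiskAnnotation {
  category: "financial_risk" | "privacy_risk" | "disruption_risk" | "cyber_risk";
  level: RiskLevel;
  estimatedCostRange?: { min: number; max: number; currency: string };
}

// The Tool shape from the earlier sketch, extended with a hypothetical
// `risks` field describing how risky the action is in general.
interface ToolWithRisk {
  name: string;
  description?: string;
  inputSchema: object;
  risks?: RiskAnnotation[];
}

const makeAmazonPurchase: ToolWithRisk = {
  name: "make_amazon_purchase",
  description: "Purchase an item from Amazon on the user's behalf",
  inputSchema: { type: "object", properties: { itemId: { type: "string" } } },
  risks: [
    {
      category: "financial_risk",
      level: "high",
      estimatedCostRange: { min: 100, max: 1000, currency: "GBP" },
    },
  ],
};

// A per-call risk statement the server could attach once the concrete
// impact is known (e.g. the exact price of the item being bought).
interface CallRiskStatement {
  impactStatement: string; // human-readable text for the approval dialog
  risks: RiskAnnotation[]; // structured detail for client-side policy
}

const slothPurchaseRisk: CallRiskStatement = {
  impactStatement:
    "This will authorise a payment of £25.99 to buy Cuddly Stuffed Animal Sloth Soft Toy",
  risks: [
    {
      category: "financial_risk",
      level: "low",
      estimatedCostRange: { min: 25.99, max: 25.99, currency: "GBP" },
    },
  ],
};
```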
Clients could then have more flexibility in how they warn users about actions. E.g.
- maybe a user is happy auto-approving an AI system taking any action with moderate privacy risk (roughly the level of accessing general documents), but zero financial risk.
- a user might be happy with AI systems reading any of the data in their database, but not editing any of it without checking with them
(In the future, AI systems might themselves be able to make these judgements based on a risk profile set by the user - e.g. evaluating the request against a user's risk appetite statement. Returning the risk information would then help this system evaluate more complex rules, such as 'Autoapprove edits to database table X, but only allow read access to table Y' OR 'Autoapprove creating email drafts, but ask me before sending them.')
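To make the client-side half concrete, here is a rough sketch (again entirely hypothetical: the `RiskAppetite` shape and `canAutoApprove` helper are invented for illustration) of how a client could compare a call's declared risks against a user's risk appetite before deciding whether to auto-approve:

```typescript
// Hypothetical client-side policy check: the risk shapes mirror the earlier
// sketch; the RiskAppetite type and canAutoApprove helper are invented here.

type RiskLevel = "low" | "moderate" | "high";

interface RiskAnnotation {
  category: string;
  level: RiskLevel;
}

// Per-category ceilings the user is comfortable auto-approving.
interface RiskAppetite {
  maxAutoApprove: Partial<Record<string, RiskLevel>>;
}

const LEVEL_ORDER: Record<RiskLevel, number> = { low: 0, moderate: 1, high: 2 };

// Auto-approve only if every declared risk is at or below the user's
// ceiling for that category; anything unrecognised falls back to asking.
function canAutoApprove(risks: RiskAnnotation[], appetite: RiskAppetite): boolean {
  return risks.every((risk) => {
    const ceiling = appetite.maxAutoApprove[risk.category];
    if (ceiling === undefined) return false; // unknown or disallowed category: ask the user
    return LEVEL_ORDER[risk.level] <= LEVEL_ORDER[ceiling];
  });
}

// Example: fine with moderate privacy risk, but any financial risk needs approval.
const appetite: RiskAppetite = { maxAutoApprove: { privacy_risk: "moderate" } };

canAutoApprove([{ category: "privacy_risk", level: "moderate" }], appetite); // true
canAutoApprove([{ category: "financial_risk", level: "low" }], appetite); // false -> prompt the user
```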
Describe alternatives you've considered
I'm open to other ways of solving the problem (improving the safety of MCP use by avoiding alarm fatigue).