String representation of Sigstore identities

There are a number of places where users must ask "does this signature come from X?", where X is an "identity." This is non-trivial to get right: you can't just ask for user@example.com, because anyone could register the username user@example.com with some other OIDC provider that Fulcio happens to trust (like justtrustme.dev). See sigstore/cosign#1947.
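
To make the pitfall concrete, here is a minimal sketch of why a subject-only check is not enough. The `CertIdentity` type and both matching functions are hypothetical illustrations, not part of Cosign or any Sigstore client, and the issuer URLs are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CertIdentity:
    """Hypothetical, simplified view of the identity in a signing certificate."""
    issuer: str   # OIDC issuer that vouched for the subject
    subject: str  # e.g. an email address


def naive_match(cert: CertIdentity, wanted_subject: str) -> bool:
    # Insecure: ignores which issuer asserted the subject.
    return cert.subject == wanted_subject


def strict_match(cert: CertIdentity, wanted_issuer: str, wanted_subject: str) -> bool:
    # Both the issuer and the subject must match.
    return cert.issuer == wanted_issuer and cert.subject == wanted_subject


genuine = CertIdentity("https://accounts.google.com", "user@example.com")
impostor = CertIdentity("https://justtrustme.dev", "user@example.com")

print(naive_match(impostor, "user@example.com"))
print(strict_match(impostor, "https://accounts.google.com", "user@example.com"))
print(strict_match(genuine, "https://accounts.google.com", "user@example.com"))
```

The naive check accepts the impostor because the subject string is identical; only the check that also pins the issuer tells the two certificates apart.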

So we've settled on a UX in Cosign that's kind of a pain: you have to pass a magic combination of flags (--certificate-oidc-issuer, --certificate-identity), and this gets even worse once you start considering e.g. workload identities via GitHub Actions (sigstore/cosign#2691). There are a number of issues related to this UX:

A number of folks have remarked something like "wouldn't it be nice if I could just pass in a string to represent the identity and Cosign figured the rest out?" While each project (policy-controller, Cosign, sigstore-python, any other CLI or otherwise user-facing implementations) could figure this out on their own, it seems useful to have a consistent notion of identity across the Sigstore ecosystem, and sig-clients seems like a good place to coordinate.

(Note that this issue is mostly about some kind of user-facing string, not a string for embedding into the Fulcio certificate in place of the multiple extensions that get used.)

Some general considerations:

Do we even want to encourage typing identities into the CLI?
- In the long term, you should always be getting your identities via some meta-policy like TUF.
- But in the short-term, there is a need to provide identities yourself.
- Maybe we could have users provide this input in the form of a file, which may be machine-readable but not necessarily human-friendly. This avoids risks like typosquatting.
  - This is somewhat user-hostile but maybe the security tradeoffs are worth it.
Backwards compatibility: you don't want to make old identities, which used to embed hard-coded strings, suddenly more liberal than they were.
- Probably easy to avoid: just use a different, mutually exclusive flag.
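
One way the "different, mutually exclusive flag" idea might look is sketched below with Python's `argparse`. The `--identity` flag is hypothetical (no such flag is proposed by name in this issue), as are the `parse` helper and the `verify-sketch` program name; only `--certificate-identity` and `--certificate-oidc-issuer` are existing Cosign flags.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="verify-sketch")
    # Existing pair of flags, which must be given together.
    parser.add_argument("--certificate-identity")
    parser.add_argument("--certificate-oidc-issuer")
    # Hypothetical new flag carrying a single identity string.
    parser.add_argument("--identity")
    return parser


def parse(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    legacy = (args.certificate_identity, args.certificate_oidc_issuer)
    if args.identity is not None:
        # The new flag never mixes with the old ones, so the meaning
        # of the old flags cannot silently change.
        if any(value is not None for value in legacy):
            parser.error("--identity cannot be combined with the --certificate-* flags")
        return ("new", args.identity)
    if any(value is None for value in legacy):
        parser.error("--certificate-identity and --certificate-oidc-issuer are both required")
    return ("legacy", legacy)
```

With this shape, `parse(["--identity", "<identity-string>"])` selects the new code path, while any invocation that mixes the two styles exits with a usage error.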

Some options that have been discussed:

Distinguished Names in X.500

RFC 1779 provides a notion of a string representation of distinguished names. This uses ASCII strings (though it is capable of representing arbitrary ASN.1 BER-encoded data via an "escaped" notation): Foo=lol, Bar=baz.

Pros:

It works.
No need to invent something new.

Cons:

No support for more advanced matching (wildcards, regular expressions).
A little gross once special characters get involved.
Hard for clients to look at such a string and figure out if you're asking for something sensible or something totally insecure (e.g., omitting the issuer).
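
The last point can be illustrated with a rough sketch. The parser below handles only the simplest `Key=value, Key=value` form and deliberately ignores the quoting and escaping rules of RFC 1779; the attribute names `Issuer` and `Subject` are made up for illustration and are not standard X.500 attribute types.

```python
def parse_simple_dn(text: str) -> dict:
    """Parse 'Key=value, Key=value' into a dict (no quoting or escaping)."""
    pairs = {}
    for part in text.split(","):
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in component: {part!r}")
        pairs[key.strip()] = value.strip()
    return pairs


def omits_issuer(text: str) -> bool:
    # The format itself does not say which attributes are required;
    # each client has to encode that knowledge separately.
    return "Issuer" not in parse_simple_dn(text)


print(parse_simple_dn("Foo=lol, Bar=baz"))
print(omits_issuer("Subject=user@example.com"))
print(omits_issuer("Issuer=accounts.google.com, Subject=user@example.com"))
```

Both queries are syntactically valid, so any "is this sensible?" check has to live outside the string format.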

Invent something ourselves

Pros:

Meets our needs: easy to define a set of common patterns that don't have footguns (e.g. user("user@example.com", "accounts.google.com") wouldn't let you omit the issuer)
We can make it quite ergonomic.

Cons:

Nonstandard.
We may get something wrong.
Hard to enable flexibility (e.g., using regular expressions or wildcards). This could also be considered a pro 🙂
Requires development work for every new "archetype" of identity (e.g., BYO PKI).
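
As a sketch of the "no footguns" idea, a constructor in the spirit of the `user(...)` example above could simply make the issuer a required argument. The `Identity` type and its field names are assumptions made for illustration, not a proposed schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    """Hypothetical identity archetype with no optional fields."""
    kind: str
    subject: str
    issuer: str


def user(email: str, issuer: str) -> Identity:
    # Both arguments are positional and required, so the issuer cannot
    # be omitted by accident; empty strings are rejected as well.
    if not email or not issuer:
        raise ValueError("email and issuer must both be non-empty")
    return Identity(kind="user", subject=email, issuer=issuer)


alice = user("user@example.com", "accounts.google.com")
print(alice)
```

Calling `user("user@example.com")` without an issuer fails with a `TypeError` before any verification happens. The flip side, as noted in the cons, is that every new archetype needs its own constructor.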

Logic programming / expression language

Squinting, you might realize that the "identities" we're talking about aren't so much fixed identities as a predicate over the certificate. That is, sometimes I want to match all of some number of X.509 extensions; sometimes I just want a few. Maybe I want to express things like "signed by Alice or Bob."

There are existing languages for expressing predicates. These include full-blown programming languages (a terrible idea in this case!) and more-restricted languages, like logic programming languages (Rego, CUE) and expression/filtering languages (jq, CEL). Cosign already supports Rego and CUE for matching predicates over attestations. Could we have users provide expressions for identity matching?

Pros:

As flexible as we need.
Could be used internally for validation. Prevents bad verification and false positives due to client implementation bugs.
We could ship a Sigstore "standard library" for these languages to embed common patterns.
Matches the way we already handle attestations.

Cons:

Requires shipping an engine for these languages with each client.
Complexity: way more effort to implement than something hardcoded.
- Though arguably this cuts through an existing Gordian knot of verification somewhat, and overall simplifies/unifies things.
Yet another concept for users to learn. Maybe while learning they'll make security-critical mistakes.
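
To show what "a predicate over the certificate" could mean without committing to Rego, CUE, or CEL, here is a small sketch using plain Python combinators. The certificate is modelled as a flat dictionary with made-up keys `issuer` and `subject`; real Fulcio certificates carry this information in X.509 extensions, which this sketch does not parse.

```python
from typing import Callable, Mapping

Cert = Mapping[str, str]
Predicate = Callable[[Cert], bool]


def field_equals(name: str, expected: str) -> Predicate:
    return lambda cert: cert.get(name) == expected


def all_of(*predicates: Predicate) -> Predicate:
    return lambda cert: all(p(cert) for p in predicates)


def any_of(*predicates: Predicate) -> Predicate:
    return lambda cert: any(p(cert) for p in predicates)


def user(email: str, issuer: str) -> Predicate:
    # A "standard library" style helper: always pins issuer and subject.
    return all_of(field_equals("issuer", issuer), field_equals("subject", email))


# "Signed by Alice or Bob", each pinned to the same issuer.
alice_or_bob = any_of(
    user("alice@example.com", "https://accounts.google.com"),
    user("bob@example.com", "https://accounts.google.com"),
)

cert = {"issuer": "https://accounts.google.com", "subject": "bob@example.com"}
print(alice_or_bob(cert))
```

A dedicated expression language would offer the same composition, with the added requirement that every client embeds an engine for it.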

Do nothing

It’s somewhat tough to express these in CLI flags, but maybe we just have the wrong flags? You could still do CLI flags to express common patterns. Maybe you need something like mutually exclusive groups with more specific requirements. Or, people are getting by with the current flags (though they are frequently complaining, as the issues mentioned above illustrate).

Pros:

Easy.
Familiar. Works.

Cons:

No flexibility.
Checking that a user's query is sensible is quite hard, and the check needs to be repeated across ecosystems.
This is really easy to shoot yourself in the foot with. What’s AND-ed, what’s OR-ed?
