10000 [SwiftLexicalLookup][GSoC] Add initial name lookup functionality by MAJKFL · Pull Request #2719 · swiftlang/swift-syntax · GitHub
[go: up one dir, main page]

Skip to content

[SwiftLexicalLookup][GSoC] Add initial name lookup functionality #2719

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jul 23, 2024

Conversation

MAJKFL
Copy link
Contributor
@MAJKFL MAJKFL commented Jul 5, 2024

This PR introduces ScopeSyntax protocol and some initial scope implementations.

  • The implementation works with tuples let (a, b) = (0, 0) and inline chaining let a = 0, b = 0.
  • Result of a lookup is an array of LookupName, an enum that distinguishes identifiers (extracted from variable declarations and/or closure/function parameters) from declarations (struct, class, actor, function etc.). We could possibly also add more cases that would serve as flags for the compiler later on.
  • Supports following scopes: code block, member block, for and while loops and closures.
  • Additionally, introduced result type assertion to assertLexicalScopeQuery test harness.
  • As discussed in [SwiftLexicalScopes][GSoC] Add simple scope queries and initial name lookup API structure #2696, this approach is centered around the syntax nodes using ScopeSyntax protocol to build hierarchy and uses position of the node used for lookup to determine availability of certain declarations (like in code block or if let).

@MAJKFL MAJKFL requested review from ahoppen and bnbarham as code owners July 5, 2024 18:17
case identifier(String, SyntaxProtocol)
/// Declaration associated with the name.
/// Could be class, struct, actor, protocol, function and more
case declaration(String, DeclSyntaxProtocol)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the declarations with names conform to the NamedDeclSyntax protocol. Please replace the (String, DeclSyntaxProtocl) with NamedDeclSyntax rather than duplicating the name.

Copy link
Contributor Author
@MAJKFL MAJKFL Jul 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it now to use NamedDeclSyntax. I also introduced IdentifiableSyntax protocol to introduce similar sort of logic to the rest of the relevant syntax nodes.

}

/// Extracts name introduced by `closureShorthandParameter`.
private static func handle(closureShorthandParameter: ClosureShorthandParameterSyntax) -> [LookupName] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you haven't covered the whole language yet, but closure captures are another thing to consider in the design.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out, I forgot about this case. Apart from closure captures, I'll try to also include guard and file scope with this PR. With those changes, it should be pretty straight forward to introduce those and then we could move on to more complex scopes with the next PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added closure capture, if case support, guard and file scope. I couldn't find a case where ClosureCaptureSyntax expression wouldn't be a DeclReferenceExprSyntax hence the force unwrap. Do you think we can make this assumption?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed on Monday, the desired behavior for the file scope can be adjusted by passing the right config to the lookup method. By introducing LookupConfig protocol we could also specify more configuration options that could add more specific behavior to name lookup in the future (something like: setting some default shadowing rules, filtering specific compiler result flags etc.). Also there's currently one quirk in the guard scope implementation, even though it responds to all name lookup queries correctly, it assigns the declarations to enclosing scope. If you think we shouldn't merge it like this, I can put it fixed in the next PR instead. This one is already much larger than I anticipated.

/// declaration, followed by the first function name, and then the second function name,
/// in this exact order. The constant declaration within the function body is omitted
/// due to the ordering rules that prioritize visibility within the function body.
@_spi(Experimental) public func lookup(for name: String) -> [LookupName] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should proactively unify the queries for "give me all of the entities visible at this point" (which is used for code completion, typo correction, etc.) and "give me all of the entities matching at this point" (which is used for resolving names). That could be as simple as making the name parameter optional and, when it's non-nil, filtering out any names that don't match.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the name parameter optional and added support for looking up all names accessible at a given node. I made one test case that exercises this type of lookup. I'm only curious about the desired ordering.

class foo {
  let 1️⃣a = 0
  let 2️⃣b = 0

  3️⃣func foo() {
    let 4️⃣a = 0
    let 5️⃣c = 0
  
    6️⃣x
  }
}

Like in this snippet, when running the lookup for all names at position 6️⃣, should we return the result as [(4️⃣, 5️⃣), (1️⃣, 2️⃣, 3️⃣)] or [(5️⃣, 4️⃣), (3️⃣, 2️⃣, 1️⃣)] (i.e. in the order of introduction or accessibility from the node used for lookup)? I think the first option would make more sense as we wouldn't have to account for cases like an additional declaration in a member block under the scope we perform lookup in and it's also the way name lookup works now, but I'm also curious of your take on this one.

/// declaration, followed by the first function name, and then the second function name,
/// in this exact order. The constant declaration within the function body is omitted
/// due to the ordering rules that prioritize visibility within the function body.
@_spi(Experimental) public func lookup(for name: String) -> [LookupName] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally and unrelatedly, I think we're going to need more structure in the result than a flat array [LookupName]. The current structure does not allow a client to distinguish between "there are two things that match f in the same scope" and "there are two things that match f but the first one shadows the second one", which is important for clients. I suggest returning an array of "interesting" scopes, where an interesting scope contains the names the found in that scope, and perhaps other directions like "look into the instance members of ". For now, we could just have an array of scopes, where each scope has the LookupNames found in it, so we can handle shadowing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reaso 8000 n will be displayed to describe this comment to others. Learn more.

I introduced a new enum LookupResult that, as for now, has one case fromScope that holds ScopeSyntax and [LookupName] associated with that scope. With the lookup function result type now being [LookupResult], I think this could be a nice structure to work with. We could easily add new cases for the result that could encapsulate more information and e.g. act as compiler flags or provide suggestions for shadowing. What do you think about it?

) -> [LookupName] {
introducedNames
.filter { introducedName in
(!positionSensitive || introducedName.isBefore(syntax)) && introducedName.refersTo(name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "is before" check isn't going to work when we start modeling the precise location where a variable in a pattern is introduced.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it now to isAccessible which is used in every filtering condition and accounts for the precise location of introduction set by the parent scope. Do you think this could work in the long run?

@MAJKFL MAJKFL requested a review from DougGregor July 10, 2024 14:46

extension ClosureCaptureSyntax: IdentifiableSyntax {
@_spi(Experimental) public var identifier: TokenSyntax {
expression.as(DeclReferenceExprSyntax.self)!.baseName
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't going to handle all of the possible kinds of captures. For example, here are some captures that aren't a DeclReferenceExprSyntax:

{ [x=y+1, weak self] in print(x) }

I think the best result here would involve reworking the syntax tree and parser to provide more structure for the parsed closure captures, so there's a single node describing the captured name that we can make IdentifiableSyntax.

import SwiftSyntax

/// Syntax node that can be refered to with an identifier.
public protocol IdentifiableSyntax: SyntaxProtocol {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like the kind of thing that we should sink down into SwiftSyntax itself. It can be a refactoring we apply later, when we're more sure about the design.

case .identifier(let syntax, _):
syntax.identifier.text
case .declaration(let syntax, _):
syntax.name.text
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be using Identifier here to properly work with back-ticked names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. I changed it to use Identifier for name comparison. That also means, the current implementation doesn't work for implicit names like self as it's token kind is not identifier. I removed one test case that exercised self capture in closures, but will add it back once we properly model implicit names (hopefully in the next PR).

/// declaration, followed by the first function name, and then the second function name,
/// in this exact order. The constant declaration within the function body is omitted
/// due to the ordering rules that prioritize visibility within the function body.
@_spi(Experimental) public func lookup(for name: String?, with configArr: [LookupConfig] = []) -> [LookupResult] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the array of LookupConfig is more abstract than we need. Any kind of configuration is going to need custom logic within the lookup function. I recommend replacing this array (and the LookupConfig protocol) with a simple struct with configuration options, e.g.,

struct LookupConfig {
  var fileScopeHandling: FileScopeHandlingConfig = (whatever the default is)
}

Over time, we can expand it if we need to. For example, we could add an optional build configuration when we're ready to deal with #ifs:

  var buildConfiguration: (any BuildConfiguration)? = nil

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I introduced LookupConfig struct to simplify configuration handling. Thank you for pointing out using build configuration. Do you think we could also use it now to provide default file scope behavior? Or should we leave it for later when we have more configurations?

Copy link
Member
@DougGregor DougGregor Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to use it now. We'll benefit from having it threaded through the APIs that need it.

case identifier(IdentifiableSyntax, accessibleAfter: AbsolutePosition?)
/// Declaration associated with the name.
/// Could be class, struct, actor, protocol, function and more
case declaration(NamedDeclSyntax, accessibleAfter: AbsolutePosition?)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect that we'll need another case here eventually for "implicit" names, such as the error introduced in

do {
  try something()
} catch { // the catch clause introduces the name `error`.
}

This can, of course, come later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. I'll introduce implicit names with the next PR.


extension ClosureCaptureSyntax: IdentifiableSyntax {
@_spi(Experimental) public var identifier: TokenSyntax {
expression.as(DeclReferenceExprSyntax.self)!.baseName
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work for all closure captures. For example:

{ [x=y+1, weak self] in ... }

I suspect the right answer is to rework the syntax tree so that the syntax structure always provides something that we can make an IdentifiableSyntax without the !.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(this doesn't have to happen now)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for now, I left a comment next to the force unwrap so we don't forget about it. Do you have already an idea how could we alter the syntax tree?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we want different syntactic forms for captures like x, x=<expr>, and weak x, where the x in every case is just the syntax for the capture name. It will make it a lot easier to identify the names introduced by captures.

import SwiftSyntax

/// Syntax node that can be refered to with an identifier.
public protocol IdentifiableSyntax: SyntaxProtocol {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's iterate with this protocol some more. We might want to sink it down into SwiftSyntax and make the NamedDeclSyntax protocol inherit from it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(this doesn't have to happen now)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll keep adding the protocol to more syntax nodes like FunctionParameterSyntax and GenericParameterSyntax in the coming PRs. Also, would it be a good idea to soon fix the problem with ClosureCaptureSyntax so we can complete implementation of IdentifiableSyntax?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the fix with ClosureCaptureSyntax should be a separate PR from this one entirely, since it will involve changing the syntax tree. I don't specifically mind whether it goes before or after this PR; we can fix it up either way. (But we shouldn't sink this protocol down into SwiftSyntax until we fix the issue)

@MAJKFL MAJKFL requested a review from DougGregor July 16, 2024 15:58
Copy link
Member
@DougGregor DougGregor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, I think I'm happy with this PR as the starting point, and then we can refine it in subsequence PRs to improve the modeling of closures and to introduce implicit names.

@DougGregor
Copy link
Member

@swift-ci please test

@DougGregor
Copy link
Member

@swift-ci please test

@DougGregor
Copy link
Member

@swift-ci please test Windows

/// in this exact order. The constant declaration within the function body is omitted
/// due to the ordering rules that prioritize visibility within the function body.
@_spi(Experimental) public func lookup(
for name: String?,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these API’s take String as a parameter or Identifier? I think it really depends on how we envision this API to be used. If we mostly expect it to be used to look up names of eg. DeclReferenceExprSyntax, then I’d prefer to use Identifier because it means that callers can’t get handling of escaping using backticks wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could maybe be a bit tricky to also support implicit name lookup like self, newValue etc. when using Identifier. That’s why I actually had to remove self capture from this PR as implicit name handling is now part of #2748. I might’ve missed something, so I’d love to hear your take on this.

Copy link
Member
@ahoppen ahoppen Jul 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think newValue can be easily represented as an Identifier. We just need to add an initializer that creates a new Identifier from a String.

I was briefly suspicious of self because I thought that the keyword self and the identifier self could refer to different things. But looks like they are considered identical

struct Foo {
  func test() {
    let `self` = 1
    print(self)
    print(`self`)
  }
}

Foo().test()
// Prints
// 1
// 1

// I might have expected
// Foo()
// 1

The following deviation of this might be a good test case as well

struct Foo {
  func test() {
    print(self)
    let `self` = 1
    print(`self`)
  }
}

Another advantage of using Identifier would be that, if we decide to introduce some semantics where eg. the self keyword refers to something different than the self identifier, we might be able to add that to the Identifier type. If we use String as the basis, our hands are tied.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created a separate branch with Identifier-based lookup. I'm not entirely sure this is the most optimal solution, as it's a naive one, so I'd love to hear your feedback.

I added a new StringIdentifier that strips backticks and added it as an optional property to Identifier. I also made optional the original raw and arena properties. I'm mainly concerned about having the old properties as optional.

public struct StringIdentifier: Equatable, Hashable, Sendable {
  public let name: String

  fileprivate init(_ string: String) {
    let backtick = "`"
    if string.count > 2 && string.hasPrefix(backtick) && string.hasSuffix(backtick) {
      let startIndex = string.index(after: string.startIndex)
      let endIndex = string.index(before: string.endIndex)
      self.name = String(string[startIndex..<endIndex])
    } else {
      self.name = string
    }
  }
}

I also added the test cases you've mentioned, that now work as expected due to the new Identifier-based name comparison.

As it's an internal change in SwiftSyntax module, I'd like to first ask for your feedback before adding it into the next PR. Can you perhaps think of a better way to handle this?

assertLexicalScopeQuery(
source: source,
methodUnderTest: { marker, argument in< 10000 /span>
let result = argument.lookup(for: useNilAsTheParameter ? nil : argument.text, with: config)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be argument.identifier.name so that we handle escaped names correctly? Also seem my comment on whether the lookup API should be based on String or Identifier.

Suggested change
let result = argument.lookup(for: useNilAsTheParameter ? nil : argument.text, with: config)
let result = argument.lookup(for: useNilAsTheParameter ? nil : argument.text, with: config)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wouldn’t work for implicit name references like self, as their identifiers are nil. In this case, those are just internals of a test method so I think it should be fine to just use argument.text. I think, as for now at least, we should left out using token.identifier.name over token.text up to the client and keep using String in name lookup for the sake of simplicity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the result of this will be whether the lookup API should be Identifier-based or String-based. So let’s continue that one.

@DougGregor DougGregor merged commit 3db6f01 into swiftlang:main Jul 23, 2024
3 checks passed
@DougGregor
Copy link
Member

@MAJKFL I've merged, but please look at Alex's comments above to address in a follow-up PR.

@MAJKFL
Copy link
Contributor Author
MAJKFL commented Jul 23, 2024

Thanks for merging! Once I create a new PR with the suggested changes, I'll respond to the comments and link the fixes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0