10000 WIP: Adding typo correction. by Keboo · Pull Request #377 · dotnet/command-line-api · GitHub
[go: up one dir, main page]

Skip to content

WIP: Adding typo correction. #377

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jan 23, 2019
Merged

Conversation

Keboo
Copy link
Member
@Keboo Keboo commented Jan 16, 2019

Seemed like a fun little thing to mock out.
Opening this PR just to get feedback on if people want this finished or not.

Fixes #358

@@ -184,6 +186,24 @@ public static CommandLineBuilder CancelOnProcessTermination(this CommandLineBuil
return builder;
}

public static CommandLineBuilder UseTypoCorrections(
this CommandLineBuilder builder, int maxLevenshteinDistance = 3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Keboo Where did the default of 3 come from? I'm not saying it is a bad default. I am just curious.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly from here and the simplest thing to get working. There is probably some better heuristics to use rather than just picking a number out of a hat.
I considered using something like:

string unmatchedToken = ...;
int maxLevenshteinDistance = (int)Math.Ceiling(unmatchedToken.Length / 2.0);

Open to other suggestions too.

The other part I am not completely happy with in single character aliases. You can imaging have a command with options -a, -b, and -c and then the user types -d. All of those would show as suggestions since they all have the same distance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't TypoSuggestion() just refuse to suggest alternatives if the length was under some value?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly. I think there is an outstanding question of what is expected. I think for single character tokens it might be reasonable to either not provide any suggestions or do something different (perhaps only return suggestions for things that start with the same character as the token). I am not really sure what would be expected or desired.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there should be more consistency of behavior here with tab suggestions. There, we do substring matching, which is a different kind of fuzzy matching. If someone types a single letter and hits tab, they get back all of the symbols containing that letter. That could work for typo suggestions as well.

Similarly, if someone types myapp --nfo⇥, maybe suggestions should allow tabbing through the typo-corrected variants?

In other words, is typo correction just a different interaction model for the existing ISuggestionSource abstractions, which should be expanded to consider misspellings in addition to the existing substring matching?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonsequitur yea I could see exposing the typo correction functionality as a suggestion source as well. However, I do think there is a difference between suggestions and typo correction. Similar to intellisense vs compiler errors, I think both can be valuable.

Copy link
Contributor
@jeredm jeredm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far. Nothing I noted was really critical to change. This should probably be a part of the defaults here:

public static CommandLineBuilder UseDefaults(this CommandLineBuilder builder)
{
return builder
.AddVersionOption()
.UseHelp()
.UseParseDirective()
.UseDebugDirective()
.UseSuggestDirective()
.RegisterWithDotnetSuggest()
.UseParseErrorReporting()
.UseExceptionHandler()
.CancelOnProcessTermination();
}

{
foreach (var token in result.UnmatchedTokens)
{
string suggestions = string.Join(", or ", GetPossibleTokens(result.CommandResult.Command, token).Select(x => $"'{x}'"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Join could be improved to provide better sentence structure. Here is what I expect the current solution will look like:
'car' was not matched, did you mean cart, or carb, or crab, or camp?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completely agree. The message also needs to be moved out so it can be localized.

})
.Select(tuple => tuple.possibleMatch);

string GetSymbolMatchValue(ISymbol symbol)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving this into a local function doesn't seem to improve code readability. It is being put into a clearly named variable called possible matches, so I think it is helpful to see how that happens next to the variable it is assigned to.


public TypoCorrection(int maxLevenshteinDistance)
{
_maxLevenshteinDistance = maxLevenshteinDistance;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this check for >= 0 distance?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, assuming we keep this as an incoming parameter, that should be checked. Ideally it would be nice to eliminate the fixed value and use some heuristics based on the unmatched token.

foreach (var token in result.UnmatchedTokens)
{
string suggestions = string.Join(", or ", GetPossibleTokens(result.CommandResult.Command, token).Select(x => $"'{x}'"));
console.Out.Write($"'{token}' was not matched, did you mean {suggestions}?");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...matched. Did you mean...

Copy link
Contributor
@KathleenDollard KathleenDollard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cool

@@ -184,6 +186,24 @@ public static CommandLineBuilder CancelOnProcessTermination(this CommandLineBuil
return builder;
}

public static CommandLineBuilder UseTypoCorrections(
this CommandLineBuilder builder, int maxLevenshteinDistance = 3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't TypoSuggestion() just refuse to suggest alternatives if the length was under some value?

@jonsequitur jonsequitur merged commit a0e927e into dotnet:master Jan 23, 2019
@Keboo Keboo deleted the typoCorrections branch March 30, 2020 15:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0