-
Notifications
You must be signed in to change notification settings - Fork 400
WIP: Adding typo correction. #377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -184,6 +186,24 @@ public static CommandLineBuilder CancelOnProcessTermination(this CommandLineBuil | |||
return builder; | |||
} | |||
|
|||
public static CommandLineBuilder UseTypoCorrections( | |||
this CommandLineBuilder builder, int maxLevenshteinDistance = 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Keboo Where did the default of 3 come from? I'm not saying it is a bad default. I am just curious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly from here and the simplest thing to get working. There is probably some better heuristics to use rather than just picking a number out of a hat.
I considered using something like:
string unmatchedToken = ...;
int maxLevenshteinDistance = (int)Math.Ceiling(unmatchedToken.Length / 2.0);
Open to other suggestions too.
The other part I am not completely happy with in single character aliases. You can imaging have a command with options -a
, -b
, and -c
and then the user types -d
. All of those would show as suggestions since they all have the same distance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't TypoSuggestion()
just refuse to suggest alternatives if the length was under some value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Certainly. I think there is an outstanding question of what is expected. I think for single character tokens it might be reasonable to either not provide any suggestions or do something different (perhaps only return suggestions for things that start with the same character as the token). I am not really sure what would be expected or desired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there should be more consistency of behavior here with tab suggestions. There, we do substring matching, which is a different kind of fuzzy matching. If someone types a single letter and hits tab, they get back all of the symbols containing that letter. That could work for typo suggestions as well.
Similarly, if someone types myapp --nfo⇥
, maybe suggestions should allow tabbing through the typo-corrected variants?
In other words, is typo correction just a different interaction model for the existing ISuggestionSource
abstractions, which should be expanded to consider misspellings in addition to the existing substring matching?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonsequitur yea I could see exposing the typo correction functionality as a suggestion source as well. However, I do think there is a difference between suggestions and typo correction. Similar to intellisense vs compiler errors, I think both can be valuable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good so far. Nothing I noted was really critical to change. This should probably be a part of the defaults here:
command-line-api/src/System.CommandLine/Builder/CommandLineBuilderExtensions.cs
Lines 137 to 149 in fd4a14d
public static CommandLineBuilder UseDefaults(this CommandLineBuilder builder) | |
{ | |
return builder | |
.AddVersionOption() | |
.UseHelp() | |
.UseParseDirective() | |
.UseDebugDirective() | |
.UseSuggestDirective() | |
.RegisterWithDotnetSuggest() | |
.UseParseErrorReporting() | |
.UseExceptionHandler() | |
.CancelOnProcessTermination(); | |
} |
{ | ||
foreach (var token in result.UnmatchedTokens) | ||
{ | ||
string suggestions = string.Join(", or ", GetPossibleTokens(result.CommandResult.Command, token).Select(x => $"'{x}'")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This Join could be improved to provide better sentence structure. Here is what I expect the current solution will look like:
'car' was not matched, did you mean cart, or carb, or crab, or camp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Completely agree. The message also needs to be moved out so it can be localized.
}) | ||
.Select(tuple => tuple.possibleMatch); | ||
|
||
string GetSymbolMatchValue(ISymbol symbol) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving this into a local function doesn't seem to improve code readability. It is being put into a clearly named variable called possible matches, so I think it is helpful to see how that happens next to the variable it is assigned to.
|
||
public TypoCorrection(int maxLevenshteinDistance) | ||
{ | ||
_maxLevenshteinDistance = maxLevenshteinDistance; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this check for >= 0 distance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, assuming we keep this as an incoming parameter, that should be checked. Ideally it would be nice to eliminate the fixed value and use some heuristics based on the unmatched token.
foreach (var token in result.UnmatchedTokens) | ||
{ | ||
string suggestions = string.Join(", or ", GetPossibleTokens(result.CommandResult.Command, token).Select(x => $"'{x}'")); | ||
console.Out.Write($"'{token}' was not matched, did you mean {suggestions}?"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...matched. Did you mean...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is cool
@@ -184,6 +186,24 @@ public static CommandLineBuilder CancelOnProcessTermination(this CommandLineBuil | |||
return builder; | |||
} | |||
|
|||
public static CommandLineBuilder UseTypoCorrections( | |||
this CommandLineBuilder builder, int maxLevenshteinDistance = 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't TypoSuggestion()
just refuse to suggest alternatives if the length was under some value?
Seemed like a fun little thing to mock out.
Opening this PR just to get feedback on if people want this finished or not.
Fixes #358