Switch over to using lingo for language detection #230
@@ -180,8 +180,22 @@ filePathReader = eitherReader parseFilePath
parseFilePath arg = case splitWhen (== ':') arg of
I don't love that we allow this on the CLI, but I didn't want to make breaking changes in this PR. In a future iteration we should probably remove the ability to specify a file's language, now that our file detection is a bit better.
"typescript" -> Just TypeScript
"php" -> Just PHP
_ -> Nothing
pure $ textToLanguage l
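For context, here is a rough, self-contained sketch of the shape this reader takes after the change: a `path` or `path:Language` argument is split on `:` and the language part goes through `textToLanguage`. The `Language` constructors, the helper names `splitOnColon` and `parseFilePathArg`, and the case-sensitive lookup are assumptions for illustration, not semantic's actual definitions. `splitOnColon` stands in for `splitWhen (== ':')` so the example needs only base.

```haskell
-- Hypothetical sketch: 'Language', 'textToLanguage', 'splitOnColon' and
-- 'parseFilePathArg' are illustrative stand-ins, not semantic's real code.

data Language = TypeScript | PHP | Unknown
  deriving (Eq, Show)

-- Assumed case-sensitive lookup: names must match linguist's spelling.
textToLanguage :: String -> Language
textToLanguage "TypeScript" = TypeScript
textToLanguage "PHP"        = PHP
textToLanguage _            = Unknown

-- Base-only equivalent of Data.List.Split.splitWhen (== ':').
splitOnColon :: String -> [String]
splitOnColon s = case break (== ':') s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnColon rest

-- Same shape as the function handed to optparse-applicative's eitherReader.
parseFilePathArg :: String -> Either String (FilePath, Maybe Language)
parseFilePathArg arg = case splitOnColon arg of
  [path]       -> Right (path, Nothing)
  [path, lang] -> Right (path, Just (textToLanguage lang))
  _            -> Left ("cannot parse file path argument: " ++ arg)
```

With this shape, `app.ts:TypeScript` resolves to `TypeScript`, while the old lowercase spelling `index.php:php` now falls through to `Unknown` instead of `Nothing`, which is the behaviour change worth keeping in mind for CLI users.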
I made our JSON parsing a little stricter here: language names are case-sensitive and must match exactly how they are spelled in linguist. I'd like some feedback on whether that's OK.
IMO it’s a good thing.
"TSX" -> Data.TSX
"PHP" -> Data.PHP
_ -> Data.Unknown
bridging = iso Data.textToLanguage Data.languageToText
I needed these helpers and couldn't figure out how to use the APIBridge directly due to circular references with Data.Language.
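To make the `bridging` helper concrete, here is a minimal sketch of the pair of functions that `iso Data.textToLanguage Data.languageToText` packages up. It assumes a small `Language` enumeration and that `Unknown` renders as the empty string, and it uses a hypothetical two-field `Bridge` record instead of lens's `Iso` so it runs without dependencies; none of these names are semantic's actual definitions.

```haskell
-- Hypothetical sketch: 'Language', 'Bridge' and the name list are
-- illustrative; semantic's real bridge is built with lens's 'iso'.

data Language = Go | Python | TypeScript | TSX | PHP | Unknown
  deriving (Eq, Show, Enum, Bounded)

-- Dependency-free stand-in for an Iso: a forward and a backward function.
data Bridge a b = Bridge { forward :: a -> b, backward :: b -> a }

-- Case-sensitive: only the exact linguist spelling is recognised.
textToLanguage :: String -> Language
textToLanguage name = case name of
  "Go"         -> Go
  "Python"     -> Python
  "TypeScript" -> TypeScript
  "TSX"        -> TSX
  "PHP"        -> PHP
  _            -> Unknown

-- Assumption: Unknown renders as the empty string.
languageToText :: Language -> String
languageToText lang = case lang of
  Go         -> "Go"
  Python     -> "Python"
  TypeScript -> "TypeScript"
  TSX        -> "TSX"
  PHP        -> "PHP"
  Unknown    -> ""

bridging :: Bridge String Language
bridging = Bridge textToLanguage languageToText

-- Language -> String -> Language returns the original for every constructor.
roundTrips :: Bool
roundTrips = all (\l -> forward bridging (backward bridging l) == l) [minBound .. maxBound]
```

The round trip only holds in one direction: a name that isn't spelled exactly as in linguist (for example `"tsx"`) becomes `Unknown` and comes back as `""`. That lossiness is the flip side of the stricter, case-sensitive parsing discussed above.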
This is fine by me.
Looking great!
Seems to have changed
Right now semantic uses some very basic file extension matching for language detection. This moves us toward a slightly more sophisticated approach based on linguist's canonical list of languages.
I don't think we're ready to take on a full dependency on linguist (it's a Ruby gem), but this is a nice intermediate step. Lingo generates, at compile time, a map from file extensions and common filenames to languages, which allows fast lookups. We don't support languages that share a file extension (the first one wins), but that's not an issue for the languages we currently target.
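A rough sketch of the lookup strategy described above: exact filenames are checked first, then the file extension, and when two languages claim the same extension the first entry in the table wins. This is not lingo's API; `languageTable`, `detectLanguage` and the table entries are hypothetical and illustrative, and the maps here are built at runtime on first use rather than generated at compile time from linguist's data.

```haskell
import qualified Data.Map.Strict as Map
import System.FilePath (takeExtension, takeFileName)

-- Hypothetical, illustrative table: (language name, extensions, filenames).
languageTable :: [(String, [String], [String])]
languageTable =
  [ ("TypeScript", [".ts"], [])
  , ("TSX", [".tsx"], [])
  , ("PHP", [".php"], [])
  , ("Ruby", [".rb"], ["Gemfile", "Rakefile"])
  , ("Hack", [".hack", ".php"], [])  -- illustrative clash: also claims ".php"
  ]

-- fromListWith calls f newValue oldValue, so keeping the old value
-- means the earliest entry in the table wins on a clash.
keepFirst :: a -> a -> a
keepFirst _new old = old

byExtension :: Map.Map String String
byExtension = Map.fromListWith keepFirst
  [ (ext, name) | (name, exts, _) <- languageTable, ext <- exts ]

byFileName :: Map.Map String String
byFileName = Map.fromListWith keepFirst
  [ (file, name) | (name, _, files) <- languageTable, file <- files ]

-- Exact filename match first, then fall back to the extension.
detectLanguage :: FilePath -> Maybe String
detectLanguage path =
  case Map.lookup (takeFileName path) byFileName of
    Just name -> Just name
    Nothing   -> Map.lookup (takeExtension path) byExtension
```

Here `index.php` resolves to `PHP` because that entry comes before the clashing one, which is the "first one wins" limitation in miniature; `Rakefile` resolves through the filename map even though it has no extension.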