8000 Switch over to using lingo for language detection by tclem · Pull Request #230 · github/semantic · GitHub
[go: up one dir, main page]

Skip to content
This repository was archived by the owner on Apr 1, 2025. It is now read-only.

Switch over to using lingo for language detection #230

Merged
merged 6 commits into from
Aug 26, 2019
Merged

Switch over to using lingo for language detection #230

merged 6 commits into from
Aug 26, 2019

Conversation

tclem
Copy link
Member
@tclem tclem commented Aug 14, 2019

Right now semantic uses some very basic file extension matching for language detection. This moves us toward a slightly more sophisticated approach based on linguist's canonical list of languages.

I don't think we're ready to pull in a full dependency on linguist (it's a ruby gem), but this is a nice intermediate step. Lingo does some compile time map generation of extensions and common filenames to language, allowing fast lookups. We don't support anything where languages share a file extension (first one wins), but that's not an issue for the languages we currently target.

@tclem tclem requested a review from a team August 16, 2019 21:42
@@ -180,8 +180,22 @@ filePathReader = eitherReader parseFilePath
parseFilePath arg = case splitWhen (== ':') arg of
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love that we allow this on the cli, but didn't want to make breaking changes in this PR. I think in a future iteration we should remove this ability to specify a file language now that our file detection is a bit better.

"typescript" -> Just TypeScript
"php" -> Just PHP
_ -> Nothing
pure $ textToLanguage l
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made our json parsing a little stricter here, languages are case sensitive and must be exactly how they are specified in linguist. Would like some feedback here on if that's OK.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it’s a good thing.

"TSX" -> Data.TSX
"PHP" -> Data.PHP
_ -> Data.Unknown
bridging = iso Data.textToLanguage Data.languageToText
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed these helpers and couldn't figure out how to use the APIBridge directly due to circular references with Data.Language.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine by me.

@patrickt
Copy link
Contributor
patrickt commented Aug 19, 2019

Can we publish lingo to Hackage? Adding more pinned dependencies makes it harder for us to upload semantic to Hackage someday (cf #16). (Happy to 🍐 on this if you want a hand, @tclem).

@tclem
Copy link
Member Author
tclem commented Aug 19, 2019

Can we publish lingo to Hackage?

Yup, Done! Mind taking another look at this @patrickt?

Copy link
Contributor
@patrickt patrickt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great!

@tclem tclem merged commit c1486db into master Aug 26, 2019
@tclem tclem deleted the lingo branch August 26, 2019 18:05
Copy link
@kristinoliver37 kristinoliver37 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to have changed

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0