[WIP: Apr2025] TorchText #590
GeorgeS2019 started this conversation in Ideas
Now that we have Microsoft.ML.Tokenizers, it is possible to start thinking about TorchText for TorchSharp.
### [torchtext.transforms](https://pytorch.org/text/stable/transforms.html)
[TorchText.Transforms] GPT2BPETokenizer

TorchText.Transforms: BPEncoder, see https://github.com/openai/gpt-2/blob/master/src/encoder.py (Parameters)

[Microsoft.ML.Tokenizers]

Microsoft.ML.Tokenizers: BPEncoder => Microsoft.ML.Tokenizers/Model/BPE.cs

Parameters:

/// <param name="vocabFile">The JSON file path containing the dictionary of string keys and their ids.</param>
/// <param name="mergesFile">The file path containing the tokens's pairs list.</param>
/// <param name="unknownToken"> The unknown token to be used by the model.</param>
/// <param name="continuingSubwordPrefix">The prefix to attach to sub-word units that don’t represent a beginning of word.</param>
/// <param name="endOfWordSuffix">The suffix to attach to sub-word units that represent an end of word.</param>

Bpe bpe = new Bpe(vocabFile, mergesFile, UnknownToken);

private static readonly string _vocabUrl = "https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/encoder.json";
private static readonly string _mergeUrl = "https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/vocab.bpe";
private static readonly string _dictUrl = "https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/dict.txt";
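
To make the roles of the vocabulary file, the merges file, and the unknown token more concrete, here is a minimal sketch of the greedy merge loop that a BPE encoder performs. It is written in Python because the referenced GPT-2 `encoder.py` is Python. It is not the Microsoft.ML.Tokenizers or TorchText implementation. The merge list, the vocabulary, and the names `bpe`, `encode`, `merge_ranks`, and `<unk>` are hypothetical and chosen only for illustration. The byte-level mapping and regex pre-tokenization that GPT-2 applies before merging are left out.

```python
def get_pairs(symbols):
    """Return the set of adjacent symbol pairs in a word."""
    return set(zip(symbols, symbols[1:]))


def bpe(word, merge_ranks):
    """Greedily apply the lowest-ranked known merge until none applies."""
    symbols = list(word)
    while len(symbols) > 1:
        pairs = get_pairs(symbols)
        # The pair learned earliest (lowest rank) is merged first.
        best = min(pairs, key=lambda p: merge_ranks.get(p, float("inf")))
        if best not in merge_ranks:
            break  # no remaining pair appears in the merges table
        first, second = best
        merged = []
        i = 0
        while i < len(symbols):
            if (i < len(symbols) - 1
                    and symbols[i] == first
                    and symbols[i + 1] == second):
                merged.append(first + second)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


# Hypothetical toy data standing in for the merges file and vocab file.
merges = [("l", "o"), ("lo", "w"), ("e", "r")]
merge_ranks = {pair: rank for rank, pair in enumerate(merges)}
vocab = {"low": 0, "er": 1, "n": 2, "e": 3, "w": 4, "<unk>": 5}


def encode(word):
    """Map each BPE token to its id, falling back to the unknown token."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in bpe(word, merge_ranks)]


print(bpe("lower", merge_ranks))
print(encode("newer"))
```

In this toy setup the merges table plays the part of `mergesFile`, the dictionary plays the part of `vocabFile`, and `<unk>` stands in for `unknownToken`. A real port would load these from the files listed above.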
[WIP: Apr2025] TorchText
This list needs to be updated
torchtext.transforms
[WIP: Mar2023] TorchSharp TorchText
PyTorch TorchText
LANGUAGE MODELING WITH NN.TRANSFORMER AND TORCHTEXT
Now we have a .NET version of this TorchText tutorial
We could NOW start a discussion on how to support TorchSharp TorchText.
Language translation with transformer using TorchData and TorchText
[Original]Language translation with transformer using TorchData and TorchText
This text is a work in progress.
🔵 Objectives of this discussion
📗 Components needed:
TorchData
What is TorchData?
TorchText
📗 How to leverage the existing .NET frameworks?
What are the .NET NLP frameworks that could be leveraged to deliver the missing components?
Machine
Using .NET catalyst to work with Spacy
🔵 Objectives of the tutorial: language translation with transformer
Objectives of the tutorial
Data Sourcing and Processing
🔵 Relevant TorchSharp open issues
📗 Missing TorchText.NN feature
📗 Proposal for common .NET tokenization library
- [Proposal for common .NET tokenization library](https://github.com//issues/248)