
Benchmarking Lexer Against Hugging Face Transformer #12

Open · 7 tasks
p3nGu1nZz opened this issue Mar 30, 2024 · 2 comments
Labels: documentation, good first issue, help wanted

@p3nGu1nZz
Collaborator

Benchmarking Lexer Against Hugging Face Transformer

Objective:

To evaluate the performance and effectiveness of our custom Lexer in comparison to the Hugging Face transformer, we will create a benchmark that measures speed, memory usage, and the quality of context representation.

Tasks:

  • Set up a Python environment with the necessary Hugging Face transformers library and dependencies.
  • Develop a Python script to tokenize and vectorize text using the Hugging Face transformer.
  • Include additional context calculations in the Python script, such as entropy, whitespace, and variance (see the sketch after this list).
  • Create a mechanism within Unity to call the Python script and capture its output.
  • Design the benchmark to measure the processing time, output size, and context quality for both systems.
  • Ensure the benchmark tests are repeatable and consistent across multiple runs.
  • Document the benchmark process, including setup, execution, and result interpretation.
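
A first sketch of what that script could look like, assuming the `transformers` library with a PyTorch backend. The model name, the exact metric definitions, and the JSON-on-stdout convention (so Unity can capture the output from a subprocess) are placeholders, not decisions made in this ticket:

```python
# benchmark_hf.py -- sketch only; model choice and metric definitions are assumptions.
import json
import math
import sys

from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # placeholder; swap in whichever model we benchmark


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy of the input, in bits."""
    if not text:
        return 0.0
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def main() -> None:
    text = sys.stdin.read()
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModel.from_pretrained(MODEL_NAME)

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    outputs = model(**inputs)                           # vectorize: hidden states
    embedding = outputs.last_hidden_state.mean(dim=1)   # mean-pool to one vector

    report = {
        "token_count": int(inputs["input_ids"].shape[1]),
        "embedding_dim": int(embedding.shape[-1]),
        "entropy_bits": shannon_entropy(text),
        "whitespace_ratio": sum(ch.isspace() for ch in text) / max(len(text), 1),
        # variance of the pooled embedding as a crude context-quality proxy
        "embedding_variance": float(embedding.var().item()),
    }
    print(json.dumps(report))  # JSON on stdout so Unity can read it


if __name__ == "__main__":
    main()
```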

Acceptance Criteria:

  • The benchmark should accurately measure and compare the performance of our Lexer and the Hugging Face transformer.
  • Results should highlight the strengths and weaknesses of each approach in terms of speed, efficiency, and context representation.
  • The benchmarking process should be well-documented and easily reproducible for future testing and development.

This ticket will guide the development of a comprehensive benchmarking suite that will inform our decision-making process regarding text processing tools within our project.

@p3nGu1nZz p3nGu1nZz added the help wanted Extra attention is needed label Mar 30, 2024
@p3nGu1nZz p3nGu1nZz self-assigned this Mar 30, 2024
@Josephrp

I want to follow along with this, but I don't know how much I can help ^^

@p3nGu1nZz
Collaborator Author

> I want to follow along with this, but I don't know how much I can help ^^

You could make a simple Python script that tokenizes a string of words (using Hugging Face transformers) of no more than 1,000 characters, and track how long it takes to tokenize that string as accurately as possible.
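
A minimal sketch of that experiment, assuming the `transformers` library; the tokenizer choice, sample text, and run count here are arbitrary placeholders:

```python
# time_tokenize.py -- sketch of the timing experiment suggested above.
import time
from statistics import mean, stdev

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder tokenizer
sample = ("The quick brown fox jumps over the lazy dog. " * 23)[:1000]  # <= 1000 chars

tokenizer(sample)  # warm-up call so one-time setup cost doesn't skew the numbers

timings = []
for _ in range(100):
    start = time.perf_counter()  # highest-resolution clock in the stdlib
    encoding = tokenizer(sample)
    timings.append(time.perf_counter() - start)

print(f"tokens: {len(encoding['input_ids'])}")
print(f"mean: {mean(timings) * 1e6:.1f} us, stdev: {stdev(timings) * 1e6:.1f} us")
```

Reporting the mean and standard deviation over many runs, rather than timing a single call, is what makes the measurement repeatable enough to compare against the Lexer.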

@p3nGu1nZz p3nGu1nZz added documentation Improvements or additions to documentation good first issue Good for newcomers labels Apr 8, 2024