We could use std::unordered_map over std::map #305
Changes from 4 commits
25ef27c
78b964e
40ab486
ef792ae
3459653
a19aa63
cfdf363
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,7 +3,7 @@ | |
#pragma once | ||
|
||
#include <string> | ||
#include <map> | ||
#include <unordered_map> | ||
#include <vector> | ||
#include <random> | ||
#include <thread> | ||
|
@@ -52,19 +52,24 @@ std::string gpt_random_prompt(std::mt19937 & rng); | |
// Vocab utils | ||
// | ||
|
||
struct token_score { | ||
Review comment: this is confusingly named, same with token_t. the type is only used inside gpt_vocab, so why not nest it.
Review comment: also gpt_vocab is token_t already in this case
Review comment: hm, first thought was token_t, but that is too close to token, so, just leave it as token_score.
Review comment: This compiler version seems to not accept token token; Should I rename the using token = std::string; to token_t?
Review comment: The quickest and simplest fix to that would be to just rename the data member to |
||
using token_t = std::string; | ||
token_t token; | ||
float score; | ||
}; | ||
|
||
struct gpt_vocab { | ||
using id = int32_t; | ||
using token = std::string; | ||
|
||
std::map<token, id> token_to_id; | ||
std::map<id, token> id_to_token; | ||
std::map<id, float> score; | ||
std::unordered_map<token, id> token_to_id; | ||
std::vector<token_score> id_to_token; | ||
}; | ||
|
||
void replace(std::string & str, const std::string & needle, const std::string & replacement); | ||
|
||
// poor-man's JSON parsing | ||
std::map<std::string, int32_t> json_parse(const std::string & fname); | ||
std::unordered_map<std::string, int32_t> json_parse(const std::string & fname); | ||
|
||
// split text into tokens | ||
// | ||
|
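To make the effect of the gpt_vocab change concrete, here is a minimal sketch of how the new layout might be filled and queried. The two structs are copied from the diff above; the helpers vocab_add and vocab_find are hypothetical names introduced only for this illustration and are not part of the PR. The sketch also assumes that ids are dense and assigned in insertion order, which is what makes a std::vector indexed by id a workable replacement for the two std::map<id, ...> members.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Copied from the diff: a token string together with its score.
struct token_score {
    using token_t = std::string;
    token_t token;
    float score;
};

// Copied from the diff: hash map for token -> id, vector for id -> (token, score).
struct gpt_vocab {
    using id = int32_t;
    using token = std::string;

    std::unordered_map<token, id> token_to_id;
    std::vector<token_score> id_to_token;
};

// Hypothetical helper (not in the PR): append a token and use its position
// in the vector as its id. Assumes ids are dense and start at 0.
inline gpt_vocab::id vocab_add(gpt_vocab & vocab, const gpt_vocab::token & tok, float score) {
    const auto new_id = static_cast<gpt_vocab::id>(vocab.id_to_token.size());
    vocab.id_to_token.push_back(token_score{tok, score});
    vocab.token_to_id[tok] = new_id;
    return new_id;
}

// Hypothetical helper (not in the PR): average O(1) lookup by token,
// returning -1 when the token is unknown.
inline gpt_vocab::id vocab_find(const gpt_vocab & vocab, const gpt_vocab::token & tok) {
    const auto it = vocab.token_to_id.find(tok);
    return it == vocab.token_to_id.end() ? -1 : it->second;
}
```

With this layout the reverse lookup becomes plain indexing, for example vocab.id_to_token[some_id].token and vocab.id_to_token[some_id].score, so the token text and its score sit next to each other in memory. One trade-off to keep in mind is that std::unordered_map does not iterate in key order, so any code that relied on the sorted order of the old std::map would need to be checked.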