Incorporate embedding pooling layer fixes #1194

iamlemec · 2024-02-15T17:56:49Z

Made some fixes to the pooling layer in llama.cpp that are reflected here. Previously we had to divide the returned embeddings by the number of tokens in the sequence. Now we can just take them as-is and optionally normalize. Also changed the truncation limit from n_ctx to n_batch, since the batch is what we're writing to.
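For readers unfamiliar with the terms, here is a rough illustration of the operations mentioned above. It is not the code from this PR or from llama.cpp. The helper names `mean_pool`, `l2_normalize`, and `truncate_tokens` are hypothetical, and the inputs are plain Python lists rather than real model output.

```python
import math

def mean_pool(token_embeddings):
    """Average per-token vectors into a single sequence vector.

    The division by the token count here is the step the caller
    no longer has to perform after the pooling fix.
    """
    n_tokens = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[i] for tok in token_embeddings) / n_tokens for i in range(dim)]

def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def truncate_tokens(tokens, n_batch):
    """Keep at most n_batch tokens (hypothetical stand-in for the truncation step)."""
    return tokens[:n_batch]

# Two tokens with 2-dimensional embeddings (made-up values).
pooled = mean_pool([[1.0, 2.0], [3.0, 4.0]])
unit = l2_normalize([3.0, 4.0])
```

Normalization is kept as a separate step so that it stays optional, matching the description above.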

Embedding numbers now match up very closely with SentenceTransformers, usually around 1-(1e-7) cosine similarity, though there are some remaining issues with tokenizing text with accents.
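A comparison like the one above can be reproduced with a plain cosine similarity check. The following is a minimal sketch, assuming the two embeddings are already available as lists of floats. The function name `cosine_similarity` and the sample vectors are illustrative and are not taken from either library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Made-up vectors standing in for the two libraries' outputs.
same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Because cosine similarity ignores vector length, this check gives the same result whether or not the embeddings were normalized first.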

abetlen · 2024-02-15T20:23:21Z

Thanks @iamlemec

iamlemec added 2 commits February 15, 2024 01:31

remove division by token count

ee84ca1

truncate to n_batch, not n_ctx

fa7f1cd

iamlemec mentioned this pull request Feb 15, 2024

Batch embed strings to fit model context #1186

Merged

abetlen merged commit 7bb91f0 into abetlen:main Feb 15, 2024
