Vector embeddings

Learn how to turn text into numbers, unlocking use cases like search.

New embedding models


text-embedding-3-small and text-embedding-3-large, our newest and most performant
embedding models, are now available. They feature lower costs, higher multilingual performance,
and new parameters to control the overall size.

What are embeddings?


OpenAI’s text embeddings measure the relatedness of text strings. Embeddings are
commonly used for:

Search (where results are ranked by relevance to a query string)
Clustering (where text strings are grouped by similarity)
Recommendations (where items with related text strings are recommended)
Anomaly detection (where outliers with little relatedness are identified)
Diversity measurement (where similarity distributions are analyzed)
Classification (where text strings are classified by their most similar label)

An embedding is a vector (list) of floating point numbers. The distance between two vectors
measures their relatedness. Small distances suggest high relatedness and large distances
suggest low relatedness.
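
For instance, here is a minimal sketch of that idea in Python (the toy 4-dimensional vectors are made up for illustration; real embeddings have 1536 or more dimensions):

import numpy as np

# Toy vectors standing in for real embeddings
a = np.array([0.1, -0.2, 0.4, 0.3])
b = np.array([0.1, -0.1, 0.5, 0.2])

# A smaller Euclidean distance suggests higher relatedness
print(np.linalg.norm(a - b))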

Visit our pricing page to learn about embeddings pricing. Requests are billed based on the
number of tokens in the input.

How to get embeddings


To get an embedding, send your text string to the embeddings API endpoint along with the
embedding model name (e.g., text-embedding-3-small):

Example: Getting embeddings (JavaScript)

import OpenAI from "openai";

const openai = new OpenAI();

const embedding = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: "Your text string goes here",
  encoding_format: "float",
});

console.log(embedding);

The response contains the embedding vector (a list of floating point numbers) along with some
additional metadata. You can extract the embedding vector, save it in a vector database, and
use it for many different use cases.

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        -0.006929283495992422,
        -0.005336422007530928,
        -4.547132266452536e-05,
        -0.024047505110502243
      ]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 5
  }
}
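
As a small Python sketch mirroring the JavaScript call above (the openai Python client is used later in this guide), extracting the vector from the response looks like this:

from openai import OpenAI

client = OpenAI()

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Your text string goes here",
)

# The vector itself is a plain list of floats, ready to store in a vector database
vector = response.data[0].embedding
print(len(vector))  # 1536 by default for text-embedding-3-small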

By default, the length of the embedding vector is 1536 for text-embedding-3-small or
3072 for text-embedding-3-large. To reduce the embedding's dimensions without losing its
concept-representing properties, pass in the dimensions parameter. Find more detail on
embedding dimensions in the embedding use case section.
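
For example, a minimal sketch of requesting a shortened embedding (the value 256 is an arbitrary illustration, not a recommendation):

from openai import OpenAI

client = OpenAI()

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Your text string goes here",
    dimensions=256,  # arbitrary example value; must not exceed the model's native size
)

print(len(response.data[0].embedding))  # 256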

Embedding models
OpenAI offers two powerful third-generation embedding models (denoted by -3 in the model
ID). Read the embedding v3 announcement blog post for more details.

Usage is priced per input token. Below is an estimate of how many pages of text can be
embedded per US dollar (assuming ~800 tokens per page):

MODEL                     ~ PAGES PER DOLLAR    PERFORMANCE ON MTEB EVAL    MAX INPUT (TOKENS)

text-embedding-3-small    62,500                62.3%                       8191

text-embedding-3-large    9,615                 64.6%                       8191

text-embedding-ada-002    12,500                61.0%                       8191
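
The pages-per-dollar figures follow directly from per-token pricing. As a quick sanity check in Python (the $0.02 per 1M tokens price for text-embedding-3-small is an assumption inferred from the 62,500 figure, not quoted in this guide):

# Back-of-envelope check of the table above
price_per_million_tokens = 0.02   # assumed price implied by 62,500 pages/dollar
tokens_per_page = 800

tokens_per_dollar = 1_000_000 / price_per_million_tokens
print(tokens_per_dollar / tokens_per_page)  # 62500.0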

Use cases
Here we show some representative use cases, using the Amazon fine-food reviews dataset.

Obtaining the embeddings

The dataset contains a total of 568,454 food reviews left by Amazon users up to October 2012.
We use a subset of the 1,000 most recent reviews for illustration purposes. The reviews are in
English and tend to be positive or negative. Each review has a ProductId, UserId, Score,
review title (Summary), and review body (Text). For example:

PRODUCT ID    USER ID           SCORE    SUMMARY                  TEXT

B001E4KFG0    A3SGXH7AUHU8GW    5        Good Quality Dog Food    I have bought several of the Vitality canned...

B00813GRG4    A1D87F6ZCVE5NK    1        Not as Advertised        Product arrived labeled as Jumbo Salted Peanut...

Below, we combine the review summary and review text into a single combined text. The
model encodes this combined text and outputs a single vector embedding.

Get_embeddings_from_dataset.ipynb

from openai import OpenAI

client = OpenAI()

def get_embedding(text, model="text-embedding-3-small"):
    text = text.replace("\n", " ")
    return client.embeddings.create(input=[text], model=model).data[0].embedding

# df is a pandas DataFrame of reviews with a "combined" column (summary + body)
df['ada_embedding'] = df.combined.apply(lambda x: get_embedding(x, model='text-embedding-3-small'))
df.to_csv('output/embedded_1k_reviews.csv', index=False)

To load the data from a saved file, you can run the following:


import pandas as pd
import numpy as np

df = pd.read_csv('output/embedded_1k_reviews.csv')
# Stored embeddings are read back as strings; convert them to numpy arrays
df['ada_embedding'] = df.ada_embedding.apply(eval).apply(np.array)

Reducing embedding dimensions

Question answering using embeddings-based search

Text search using embeddings

Code search using embeddings

Recommendations using embeddings

Data visualization in 2D

Embedding as a text feature encoder for ML algorithms

Classification using the embedding features

Zero-shot classification

Obtaining user and product embeddings for cold-start recommendation

Clustering

FAQ

How can I tell how many tokens a string has before I embed it?

In Python, you can split a string into tokens with OpenAI's tokenizer, tiktoken.

Example code:

import tiktoken

def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

num_tokens_from_string("tiktoken is great!", "cl100k_base")

For third-generation embedding models like text-embedding-3-small, use the cl100k_base
encoding.

More details and example code are in the OpenAI Cookbook guide on how to count tokens
with tiktoken.

How can I retrieve K nearest embedding vectors quickly?

For searching over many vectors quickly, we recommend using a vector database. You can find
examples of working with vector databases and the OpenAI API in our Cookbook on GitHub.
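
For smaller collections, a brute-force search over an in-memory matrix can be enough before reaching for a database. A minimal sketch (the helper below is illustrative, not from this guide; it assumes rows are unit-length, as OpenAI embeddings are):

import numpy as np

def k_nearest(query_vec, embedding_matrix, k=5):
    # Dot product equals cosine similarity for unit-length vectors
    scores = embedding_matrix @ np.asarray(query_vec)
    return np.argsort(-scores)[:k]  # indices of the k most similar rows

At larger scales, the approximate-nearest-neighbor indexes that vector databases provide will be much faster than this exact scan.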

Which distance function should I use?

We recommend cosine similarity. The choice of distance function typically doesn't matter
much.

OpenAI embeddings are normalized to length 1, which means that:

Cosine similarity can be computed slightly faster using just a dot product
Cosine similarity and Euclidean distance will result in identical rankings
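
Both points are easy to verify with a few lines of numpy (the 2-dimensional unit vectors are toy stand-ins for real normalized embeddings):

import numpy as np

# Toy unit-length vectors standing in for normalized embeddings
a = np.array([0.6, 0.8])
b = np.array([0.8, 0.6])

cosine = float(np.dot(a, b))             # no norm division needed: both have length 1
distance = float(np.linalg.norm(a - b))  # Euclidean distance

# For unit vectors, distance^2 == 2 - 2 * cosine,
# so the two measures always rank neighbors identically
assert abs(distance**2 - (2 - 2 * cosine)) < 1e-12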

Can I share my embeddings online?

Yes, customers own their input and output from our models, including in the case of
embeddings. You are responsible for ensuring that the content you input to our API does not
violate any applicable law or our Terms of Use.

Do V3 embedding models know about recent events?

No, the text-embedding-3-large and text-embedding-3-small models lack knowledge of
events that occurred after September 2021. This is generally not as much of a limitation as it
would be for text generation models, but in certain edge cases it can reduce performance.

