# Vector similarity search with Azure SQL & Azure OpenAI
This example shows how to use Azure OpenAI from an Azure SQL database to get the vector embeddings of any chosen text, and then calculate the [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) against the Wikipedia articles (for which vector embeddings have already been calculated) to find the articles that cover topics that are close, or similar, to the provided text.
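
As an illustration of the math only (in Python rather than the T-SQL used by this sample), the minimal sketch below computes cosine similarity, the dot product of two vectors divided by the product of their lengths, and ranks a few articles by it. The titles and the 3-dimensional vectors are made up; real embeddings have many more dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """dot(a, b) / (|a| * |b|); 1.0 means same direction, 0.0 means orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def most_similar(query: list[float],
                 articles: list[tuple[str, list[float]]],
                 top_n: int = 3) -> list[tuple[str, float]]:
    """Score every (title, vector) pair against the query and keep the best top_n."""
    scored = [(title, cosine_similarity(query, vector)) for title, vector in articles]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the Wikipedia dataset.
    articles = [
        ("Cat", [0.9, 0.1, 0.0]),
        ("Dog", [0.8, 0.2, 0.1]),
        ("Car", [0.0, 0.1, 0.9]),
    ]
    for title, score in most_similar([1.0, 0.0, 0.0], articles, top_n=2):
        print(f"{title}: {score:.3f}")
```

Running it prints `Cat` and then `Dog`. `Car`, whose vector is orthogonal to the query, scores 0 and falls outside the top two.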
Azure SQL database can be used to significantly speed up vector operations using columnstore indexes, so that searches can have sub-second performance even on large datasets.
## Download and import the Wikipedia Articles with Vector Embeddings
Download the [Wikipedia embeddings from here](https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedded.zip), unzip the file and upload it (using [Azure Storage Explorer](https://learn.microsoft.com/en-us/azure/vs-azure-tools-storage-manage-with-storage-explorer?tabs=windows), for example) to an Azure Blob Storage container.
In this example, the unzipped CSV file `vector_database_wikipedia_articles_embedded.csv` is assumed to have been uploaded to a blob container named `playground`, in a folder named `wikipedia`.
Once the file is uploaded, get a [SAS token](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview) to allow Azure SQL database to access it. From Azure Storage Explorer, right-click on the `playground` container and then select `Get Shared Access Signature`. Set the expiration date to some time in the future and then click on "Create". Copy the generated query string somewhere, for example into Notepad, as it will be needed later.
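
The values collected in these steps are what the import script's placeholders stand for. The Python sketch below is a hedged illustration, not part of this sample. It shows how an account name, the `playground` container and the `wikipedia` folder combine into the standard Azure Blob Storage URL, and how to read the expiry (`se`) parameter from the copied query string. It also strips the leading `?`, because the SQL Server documentation notes that the SAS secret of a database scoped credential must not start with `?`. Whether `01-import-wikipedia.sql` expects the token with or without it is an assumption to verify against the script. The account name and token values are hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs


def blob_url(account: str, container: str, blob_path: str) -> str:
    """Standard Azure Blob Storage URL for a blob in the given account and container."""
    return f"https://{account}.blob.core.windows.net/{container}/{blob_path}"


def sas_secret(query_string: str) -> str:
    """Drop the leading '?' that a copied SAS query string usually starts with."""
    return query_string[1:] if query_string.startswith("?") else query_string


def sas_expiry(query_string: str) -> datetime:
    """Read the signed expiry ('se') parameter as a UTC datetime.

    Assumes the 'YYYY-MM-DDTHH:MM:SSZ' form; parse_qs also decodes '%3A' to ':'.
    """
    params = parse_qs(sas_secret(query_string))
    expiry = params["se"][0]
    return datetime.strptime(expiry, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    # Hypothetical values: replace with your own account name and copied query string.
    token = "?sv=2021-08-06&sp=rl&se=2030-01-01T00:00:00Z&sig=REDACTED"
    print(blob_url("myaccount", "playground",
                   "wikipedia/vector_database_wikipedia_articles_embedded.csv"))
    print(sas_secret(token))
    print("expired" if sas_expiry(token) < datetime.now(timezone.utc) else "still valid")
```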
Use a client tool like [Azure Data Studio](https://azure.microsoft.com/en-us/products/data-studio/) to connect to an Azure SQL database and then use the `./vector-embeddings/01-import-wikipedia.sql` script to create the `wikipedia_articles_embeddings` table, into which the uploaded CSV file will be imported.
Make sure to replace the `<account>` and `<sas-token>` placeholders with the correct values for your environment:
Run each section (each section starts with a comment) separately. At the end of …
## Create Vectors Table
In the imported data, vectors are stored as JSON arrays. To take advantage of vector processing, the arrays must be saved into a columnstore index. Thanks to `OPENJSON`, turning a vector into a set of values that can be saved into a column is very easy:
```sql
select
...
openjson(@response, '$.result.data[0].embedding')
```
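
`OPENJSON` with its default schema returns one row per array element, with the element's zero-based position in the `key` column and the element itself in `value`. The hedged Python sketch below mimics that step to show the resulting row-per-dimension layout, then runs a set-based cosine similarity (sums of products, joined on the dimension index and grouped by article) in an in-memory SQLite database. SQLite only stands in for Azure SQL for illustration; the table and column names (`vectors`, `query_vector`, `article_id`, `vector_value_id`, `vector_value`) are hypothetical and the sample's actual T-SQL may differ.

```python
import json
import math
import sqlite3


def explode_vector(article_id: int, vector_json: str) -> list[tuple[int, int, float]]:
    """Turn a JSON array into (article_id, index, value) rows.

    Mirrors OPENJSON's default key/value output: a zero-based position per element.
    """
    return [(article_id, i, float(v)) for i, v in enumerate(json.loads(vector_json))]


def build_db(articles: dict[int, str]) -> sqlite3.Connection:
    """Load article vectors into a hypothetical row-per-dimension table."""
    conn = sqlite3.connect(":memory:")
    # Register sqrt in case this SQLite build lacks the built-in math functions.
    conn.create_function("sqrt", 1, math.sqrt)
    conn.execute("CREATE TABLE vectors (article_id INTEGER, vector_value_id INTEGER, vector_value REAL)")
    conn.execute("CREATE TABLE query_vector (vector_value_id INTEGER, vector_value REAL)")
    for article_id, vector_json in articles.items():
        conn.executemany("INSERT INTO vectors VALUES (?, ?, ?)", explode_vector(article_id, vector_json))
    return conn


def top_matches(conn: sqlite3.Connection, query: list[float], top_n: int = 3) -> list[tuple[int, float]]:
    """Set-based cosine similarity: join on the dimension index, aggregate per article."""
    conn.execute("DELETE FROM query_vector")
    conn.executemany("INSERT INTO query_vector VALUES (?, ?)", list(enumerate(query)))
    sql = """
        SELECT v.article_id,
               SUM(v.vector_value * q.vector_value)
                 / (sqrt(SUM(v.vector_value * v.vector_value))
                    * sqrt(SUM(q.vector_value * q.vector_value))) AS cosine_similarity
        FROM vectors AS v
        JOIN query_vector AS q ON q.vector_value_id = v.vector_value_id
        GROUP BY v.article_id
        ORDER BY cosine_similarity DESC
        LIMIT ?
    """
    return conn.execute(sql, (top_n,)).fetchall()


if __name__ == "__main__":
    # Toy data: article id -> vector stored as a JSON array, as in the imported CSV.
    conn = build_db({1: "[0.9, 0.1, 0.0]", 2: "[0.8, 0.2, 0.1]", 3: "[0.0, 0.1, 0.9]"})
    for article_id, score in top_matches(conn, [1.0, 0.0, 0.0], top_n=2):
        print(article_id, round(score, 3))
```

In Azure SQL, a row-per-dimension table like this is what the columnstore index is built on, which is where the speed-up mentioned at the top of this page comes from. SQLite has no columnstore indexes, so the sketch only shows the shape of the query, not its performance.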
Now it is just a matter of taking the vector of the sample text and the vectors of all Wikipedia articles and calculating the cosine similarity. The math can be easily expressed in T-SQL: