updated content · sirmnemonic/azure-sql-db-openai@720d05e · GitHub

Commit 720d05e

updated content
1 parent f5b0c21 commit 720d05e

2 files changed

+10
-12
lines changed

README.md

Lines changed: 7 additions & 11 deletions
@@ -1,22 +1,18 @@
1-
# Vector Similarity Search with Azure SQL & Azure OpenAI
1+
# Vector similarity search with Azure SQL & Azure OpenAI
22

3-
This example shows how to use Azure OpenAI from Azure SQL database to get the vector embeddings of any choose text, and then calculate the cosine distance against the Wikipedia articles (for which vector embeddings have been already calculated,) to find the articles that covers topics that are close - or similar - to the searched text.
3+
This example shows how to use Azure OpenAI from Azure SQL database to get the vector embeddings of any chosen text, and then calculate the [cosine similarity](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview) against the Wikipedia articles (for which vector embeddings have been already calculated,) to find the articles that covers topics that are close - or similar - to the provided text.
44

55
Azure SQL database can be used to significantly speed up vectors operations using column store indexes, so that search can have sub-seconds performances even on large datasets.
66

77
## Download and import the Wikipedia Article with Vector Embeddings
88

99
Download the [wikipedia embeddings from here](https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedded.zip), unzip it and upload it (using [Azure Storage Explorer](https://learn.microsoft.com/en-us/azure/vs-azure-tools-storage-manage-with-storage-explorer?tabs=windows) for example) to an Azure Blob Storage container.
1010

11-
In the example the unzipped csv file to `vector_database_wikipedia_articles_embedded.csv` is assumed to be upload to a blob container name `playground` and in a folder named `wikipedia`:
11+
In the example the unzipped csv file to `vector_database_wikipedia_articles_embedded.csv` is assumed to be uploaded to a blob container name `playground` and in a folder named `wikipedia`.
1212

13-
```
14-
https://<myaccount>.blob.core.windows.net/playground/wikipedia/vector_database_wikipedia_articles_embedded.csv
15-
```
16-
17-
Once the file is uploaded, get the SAS token to allow Azure SQL database to access it. (From Azure storage Explorer, right click on the `playground` container and then select `Get Shared Access Signature`. Set the expiration date to some time in future and then click on "Create". Copy the generated query string somewhere, for example into the Notepad, as it will be needed later)
13+
Once the file is uploaded, get the [SAS token](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview) to allow Azure SQL database to access it. (From Azure storage Explorer, right click on the `playground` container and then select `Get Shared Access Signature`. Set the expiration date to some time in future and then click on "Create". Copy the generated query string somewhere, for example into the Notepad, as it will be needed later)
1814

19-
Use a client tool like Azure Data Studio to connect to an Azure SQL database and then use the `./vector-embeddings/01-import-wikipedia.sql` to create the `wikipedia_articles_embeddings` where the uploaded CSV file will be imported.
15+
Use a client tool like [Azure Data Studio](https://azure.microsoft.com/en-us/products/data-studio/) to connect to an Azure SQL database and then use the `./vector-embeddings/01-import-wikipedia.sql` to create the `wikipedia_articles_embeddings` where the uploaded CSV file will be imported.
2016

2117
Make sure to replace the `<account>` and `<sas-token>` placeholders with the value correct for your environment:
2218

@@ -27,7 +23,7 @@ Run each section (each section starts with a comment) separately. At the end of
2723

2824
## Create Vectors Table
2925

30-
In the imported data, vectors are stored as JSON arrays. To take advantage of vector processing, the arrays must be saved into a columnstore index. Thanks to `OPENJSON` turning a vector into a set of values that can be saved into a column is very easy:
26+
In the imported data, vectors are stored as JSON arrays. To take advantage of vector processing, the arrays must be saved into a columnstore index. Thanks to `OPENJSON`, turning a vector into a set of values that can be saved into a column is very easy:
3127

3228
```sql
3329
select
@@ -75,7 +71,7 @@ from
7571
openjson(@response, '$.result.data[0].embedding')
7672
```
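
For illustration, here is a minimal sketch of what that `openjson` call produces. The `@response` payload below is a hand-written stand-in (an assumption, not an actual Azure OpenAI response) that only mimics the `$.result.data[0].embedding` path used above, and the column aliases `vector_value_id` and `vector_value` are illustrative names:

```sql
-- Hypothetical payload: only the path '$.result.data[0].embedding' is taken
-- from the query above; the three numbers are made-up sample values.
declare @response nvarchar(max) =
    N'{"result":{"data":[{"embedding":[0.1,-0.2,0.3]}]}}';

select
    cast([key] as int) as vector_value_id,   -- zero-based position in the array
    cast([value] as float) as vector_value   -- the array element at that position
from
    openjson(@response, '$.result.data[0].embedding');
```

Under these assumptions the query should return one row per array element (three rows here), which is the row-per-value shape needed to store a vector in a column.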
7773

78-
Now is just a matter of taking the vector of the sample text and the vectors of all wikipedia articles and calculate the cosine distance. The math can be easily expressed in T-SQL:
74+
Now is just a matter of taking the vector of the sample text and the vectors of all wikipedia articles and calculate the cosine similarity. The math can be easily expressed in T-SQL:
7975

8076
```sql
8177
SUM(v1.[vector_value] * v2.[vector_value]) /

vector-embeddings/01-import-wikipedia.sql

Lines changed: 3 additions & 1 deletion
@@ -18,7 +18,9 @@ end
1818
go
1919

2020
/*
21-
Create database scoped credential and external data source
21+
Create database scoped credential and external data source.
22+
File is assumed to be in a path like:
23+
https://<myaccount>.blob.core.windows.net/playground/wikipedia/vector_database_wikipedia_articles_embedded.csv
2224
*/
2325
create database scoped credential [openai_playground]
2426
with identity = 'SHARED ACCESS SIGNATURE',
