Some more language cleanup. I apologize for the series of requests; I was submitting them as I noticed things on my earlier read-through.
In this request, I've looked through the remainder of the README for spelling and grammar issues.
I hope you find these helpful. I know how difficult it is to write well in Markdown, and my own GitHub repos often have similar issues.
README.md: 10 additions & 10 deletions
@@ -39,9 +39,9 @@ Download the [wikipedia embeddings from here](https://cdn.openai.com/API/example
In the example the unzipped csv file `vector_database_wikipedia_articles_embedded.csv` is assumed to be uploaded to a blob container name `playground` and in a folder named `wikipedia`.
-
Once the file is uploaded, get the [SAS token](https://learn.microsoft.com/azure/storage/common/storage-sas-overview) to allow Azure SQL database to access it. (From Azure storage Explorer, right click on the `playground` container and than select `Get Shared Access Signature`. Set the expiration date to some time in future and then click on "Create". Copy the generated query string somewhere, for example into the Notepad, as it will be needed later)
+
Once the file is uploaded, get the [SAS token](https://learn.microsoft.com/azure/storage/common/storage-sas-overview) to allow Azure SQL database to access it. (From Azure storage Explorer, right click on the `playground` container and then select `Get Shared Access Signature`. Set the expiration date to some time in future and then click on "Create". Copy the generated query string somewhere, for example into Notepad, as it will be needed later)
-
Use a client tool like [Azure Data Studio](https://azure.microsoft.com/products/data-studio/) to connect to an Azure SQL database and then use the `./vector-embeddings/01-import-wikipedia.sql` to create the `wikipedia_articles_embeddings` where the uploaded CSV file will be imported.
+
Use a client tool like [Azure Data Studio](https://azure.microsoft.com/products/data-studio/) to connect to an Azure SQL database and then use the `./vector-embeddings/01-import-wikipedia.sql` to create the `wikipedia_articles_embeddings`table where the uploaded CSV file will be imported.
Make sure to replace the `<account>` and `<sas-token>` placeholders with the value correct for your environment:
@@ -52,7 +52,7 @@ Run each section (each section starts with a comment) separately. At the end of
## Add embeddings columns to table
-
In the imported data, vectors are stored as JSON arrays. To take advtange of vector processing, the arrays must be saved into more compact and optimzed binary format index. Thanks to the new `VECTOR` type, turning a vector into a set of values that can be saved into a column is very easy:
+
In the imported data, vectors are stored as JSON arrays. To take advtange of vector processing, the arrays must be saved into a more compact and optimized binary format index. Thanks to the new `VECTOR` type, turning a vector into a set of values that can be saved into a column is very easy:
-
The vector returned in the response can extrated using `json_query`:
+
The vector returned in the response can extracted using `json_query`:
```sql
set @re = json_query(@response, '$.result.data[0].embedding')
```
-
Now is just a matter of taking the vector of the sample text and the vectors of all wikipedia articles and calculate the cosine similarity. The math can be easily expressed in T-SQL:
+
Now it is just a matter of taking the vector of the sample text and the vectors of all wikipedia articles and calculating the cosine similarity. The math can be easily expressed in T-SQL:
-
The described process can be wrapped into stored procedures to make it easy to re-use it. The scripts in the `./vector-embeddings/` show how to create a stored procedure to retrieve the embeddings from OpenAI:
+
The described process can be wrapped into stored procedures to make it easy to re-use. The scripts in the `./vector-embeddings/` directory show how to create a stored procedure to retrieve the embeddings from OpenAI:
-`03-store-openai-credentials.sql`: stores the Azure OpenAI credentials in the Azure SQL database
-
-`04-create-get-embeddings-procedure.sql`: create a stored procedure to encapsulate the call to OpenAI using the script.
+
-`04-create-get-embeddings-procedure.sql`: creates a stored procedure to encapsulate the call to OpenAI using the script.
## Finding similar articles
The script `05-find-similar-articles.sql` uses the created stored procedure and the process explained above to find similar articles to the provided text.
## Alternative sample with Python and a local embedding model
-
If you don't want or can't use OpenAI to generate embeddings, you can use a local model like `https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1` to generate embeddings. The Python script `./python/hybrid_search.py` shows how to
+
If you don't want to, or can't use OpenAI to generate embeddings, you can use a local model like `https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1` to generate embeddings. The Python script `./python/hybrid_search.py` shows how to
- use Python to generate the embeddings
- do similarity search in Azure SQL database
- use [Fulltext search in Azure SQL database with BM25 ranking](https://learn.microsoft.com/en-us/sql/relational-databases/search/limit-search-results-with-rank?view=sql-server-ver16#ranking-of-freetexttable)
- do re-ranking applying Reciprocal Rank Fusion (RRF) to combine the BM25 ranking with the cosine similarity ranking
-
Make sure to setup the database for this sample using the `./python/00-setup-database.sql` script. Database can be either an Azure SQL DB or a SQL Server database.
+
Make sure to setup the database for this sample using the `./python/00-setup-database.sql` script. The database can be either an Azure SQL DB or a SQL Server database.
## Conclusions
-
Azure SQL database, has now support to perform vector operations directly in the database, making it easy to perform vector similarity search. Using vector search along with fulltext search and BM25 ranking, it is possible to build powerful search engines that can be used in a variety of scenarios.
+
Azure SQL database, now has support to perform vector operations directly in the database, making it easy to perform vector similarity search. Using vector search along with fulltext search and BM25 ranking, it is possible to build powerful search engines that can be used in a variety of scenarios.
> [!NOTE]
> Vector Functions are in Early Adopter Preview. Get access to the preview via https://aka.ms/azuresql-vector-eap-announcement