Update README.md · doraig/azure-sql-db-openai@29741ae · GitHub


Commit 29741ae

Update README.md
Some more language clean up. I apologize for the series of requests; I was submitting as I noticed things while reading through before. In this request, I've looked through the remainder of the README for spelling and grammar issues. I hope you find these helpful -- I know how difficult it is to write well in markdown format -- and my GitHub repo often has similar things amok with it.
1 parent 1873d18 commit 29741ae


1 file changed (+10 -10 lines changed)


README.md

Lines changed: 10 additions & 10 deletions
````diff
@@ -39,9 +39,9 @@ Download the [wikipedia embeddings from here](https://cdn.openai.com/API/example

 In the example the unzipped csv file `vector_database_wikipedia_articles_embedded.csv` is assumed to be uploaded to a blob container name `playground` and in a folder named `wikipedia`.

-Once the file is uploaded, get the [SAS token](https://learn.microsoft.com/azure/storage/common/storage-sas-overview) to allow Azure SQL database to access it. (From Azure storage Explorer, right click on the `playground` container and than select `Get Shared Access Signature`. Set the expiration date to some time in future and then click on "Create". Copy the generated query string somewhere, for example into the Notepad, as it will be needed later)
+Once the file is uploaded, get the [SAS token](https://learn.microsoft.com/azure/storage/common/storage-sas-overview) to allow Azure SQL database to access it. (From Azure storage Explorer, right click on the `playground` container and then select `Get Shared Access Signature`. Set the expiration date to some time in future and then click on "Create". Copy the generated query string somewhere, for example into Notepad, as it will be needed later)

-Use a client tool like [Azure Data Studio](https://azure.microsoft.com/products/data-studio/) to connect to an Azure SQL database and then use the `./vector-embeddings/01-import-wikipedia.sql` to create the `wikipedia_articles_embeddings` where the uploaded CSV file will be imported.
+Use a client tool like [Azure Data Studio](https://azure.microsoft.com/products/data-studio/) to connect to an Azure SQL database and then use the `./vector-embeddings/01-import-wikipedia.sql` to create the `wikipedia_articles_embeddings` table where the uploaded CSV file will be imported.

 Make sure to replace the `<account>` and `<sas-token>` placeholders with the value correct for your environment:

````

````diff
@@ -52,7 +52,7 @@ Run each section (each section starts with a comment) separately. At the end of

 ## Add embeddings columns to table

-In the imported data, vectors are stored as JSON arrays. To take advtange of vector processing, the arrays must be saved into more compact and optimzed binary format index. Thanks to the new `VECTOR` type, turning a vector into a set of values that can be saved into a column is very easy:
+In the imported data, vectors are stored as JSON arrays. To take advtange of vector processing, the arrays must be saved into a more compact and optimized binary format index. Thanks to the new `VECTOR` type, turning a vector into a set of values that can be saved into a column is very easy:

 ```sql
 alter table wikipedia_articles_embeddings
````

````diff
@@ -85,43 +85,43 @@ exec @retval = sp_invoke_external_rest_endpoint
 select @response;
 ```

-The vector returned in the response can extrated using `json_query`:
+The vector returned in the response can extracted using `json_query`:

 ```sql
 set @re = json_query(@response, '$.result.data[0].embedding')
 ```

-Now is just a matter of taking the vector of the sample text and the vectors of all wikipedia articles and calculate the cosine similarity. The math can be easily expressed in T-SQL:
+Now it is just a matter of taking the vector of the sample text and the vectors of all wikipedia articles and calculating the cosine similarity. The math can be easily expressed in T-SQL:

 ```sql
 vector_distance('cosine', @embedding, title_vector)
 ```

 ## Encapsulating logic to retrieve embeddings

-The described process can be wrapped into stored procedures to make it easy to re-use it. The scripts in the `./vector-embeddings/` show how to create a stored procedure to retrieve the embeddings from OpenAI:
+The described process can be wrapped into stored procedures to make it easy to re-use. The scripts in the `./vector-embeddings/` directory show how to create a stored procedure to retrieve the embeddings from OpenAI:

 - `03-store-openai-credentials.sql`: stores the Azure OpenAI credentials in the Azure SQL database
-- `04-create-get-embeddings-procedure.sql`: create a stored procedure to encapsulate the call to OpenAI using the script.
+- `04-create-get-embeddings-procedure.sql`: creates a stored procedure to encapsulate the call to OpenAI using the script.

 ## Finding similar articles

 The script `05-find-similar-articles.sql` uses the created stored procedure and the process explained above to find similar articles to the provided text.

 ## Alternative sample with Python and a local embedding model

-If you don't want or can't use OpenAI to generate embeddings, you can use a local model like `https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1` to generate embeddings. The Python script `./python/hybrid_search.py` shows how to
+If you don't want to, or can't use OpenAI to generate embeddings, you can use a local model like `https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1` to generate embeddings. The Python script `./python/hybrid_search.py` shows how to

 - use Python to generate the embeddings
 - do similarity search in Azure SQL database
 - use [Fulltext search in Azure SQL database with BM25 ranking](https://learn.microsoft.com/en-us/sql/relational-databases/search/limit-search-results-with-rank?view=sql-server-ver16#ranking-of-freetexttable)
 - do re-ranking applying Reciprocal Rank Fusion (RRF) to combine the BM25 ranking with the cosine similarity ranking

-Make sure to setup the database for this sample using the `./python/00-setup-database.sql` script. Database can be either an Azure SQL DB or a SQL Server database.
+Make sure to setup the database for this sample using the `./python/00-setup-database.sql` script. The database can be either an Azure SQL DB or a SQL Server database.

 ## Conclusions

-Azure SQL database, has now support to perform vector operations directly in the database, making it easy to perform vector similarity search. Using vector search along with fulltext search and BM25 ranking, it is possible to build powerful search engines that can be used in a variety of scenarios.
+Azure SQL database, now has support to perform vector operations directly in the database, making it easy to perform vector similarity search. Using vector search along with fulltext search and BM25 ranking, it is possible to build powerful search engines that can be used in a variety of scenarios.

 > [!NOTE]
 > Vector Functions are in Early Adopter Preview. Get access to the preview via https://aka.ms/azuresql-vector-eap-announcement
````
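
The README text touched by this commit says embeddings imported as JSON arrays should be stored in a more compact binary form. As a rough illustration of the size difference only, the Python sketch below packs a JSON array of floats into little-endian float32 bytes with the standard `struct` module. It is an assumption-laden sketch: the helper names are hypothetical, and the byte layout is plain `struct` packing, not the actual on-disk format of the Azure SQL `VECTOR` type, which the README does not describe.

```python
import json
import struct


def json_array_to_float32_bytes(json_text: str) -> bytes:
    """Parse a JSON array of numbers and pack it as little-endian float32 values.

    Hypothetical helper for illustration; NOT the internal layout of the Azure SQL VECTOR type.
    """
    values = json.loads(json_text)
    return struct.pack(f"<{len(values)}f", *values)


def float32_bytes_to_list(blob: bytes) -> list[float]:
    """Unpack little-endian float32 bytes (4 bytes per value) back into a list."""
    count = len(blob) // 4
    return list(struct.unpack(f"<{count}f", blob))


if __name__ == "__main__":
    # Hypothetical 4-dimension embedding; real embeddings have far more dimensions.
    embedding_json = "[0.0123456, -0.0456789, 0.0789012, -0.0012345]"
    packed = json_array_to_float32_bytes(embedding_json)
    print(len(embedding_json), "bytes as JSON text")
    print(len(packed), "bytes as float32 binary")
    print(float32_bytes_to_list(packed))
```

Each float32 value takes a fixed 4 bytes, while its JSON text form needs a character per digit plus separators, which is why a binary representation is both smaller and cheaper to compute on.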
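
The README changed in the third hunk extracts the embedding with `json_query(@response, '$.result.data[0].embedding')` and then compares it with each article's vector using `vector_distance('cosine', ...)`. The Python sketch below mirrors those two steps outside the database. It assumes the response body has the nested `result.data[0].embedding` shape implied by that JSON path, and it assumes `vector_distance('cosine', ...)` returns cosine distance (1 minus cosine similarity), so smaller values mean more similar vectors. The sample response and article vectors are invented.

```python
import json
import math


def extract_embedding(response_text: str) -> list[float]:
    """Python counterpart of json_query(@response, '$.result.data[0].embedding').

    Assumes the shape {"result": {"data": [{"embedding": [...]}]}}.
    """
    payload = json.loads(response_text)
    return payload["result"]["data"][0]["embedding"]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of a and b divided by the product of their Euclidean norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance = 1 - cosine similarity: 0 for same direction, 2 for opposite."""
    return 1.0 - cosine_similarity(a, b)


if __name__ == "__main__":
    # Hypothetical REST response body and article vectors, for illustration only.
    response = '{"result": {"data": [{"embedding": [0.1, 0.2, 0.3]}]}}'
    query_vector = extract_embedding(response)
    articles = {
        "Article A": [0.1, 0.2, 0.31],
        "Article B": [-0.3, 0.1, 0.0],
    }
    # Sort by ascending distance, similar in spirit to ORDER BY vector_distance(...) in T-SQL.
    ranked = sorted(articles.items(), key=lambda kv: cosine_distance(query_vector, kv[1]))
    for title, vector in ranked:
        print(f"{title}: {cosine_distance(query_vector, vector):.4f}")
```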
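
The Python sample is described as re-ranking with Reciprocal Rank Fusion (RRF) to combine the BM25 ranking and the cosine similarity ranking. The sketch below shows the standard RRF formula, where each document scores the sum of 1 / (k + rank) over the ranked lists it appears in, with ranks starting at 1. The constant `k = 60` is the value commonly used in the RRF literature and is an assumption here; `./python/hybrid_search.py` may use a different value or implementation. The document ids are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several best-first ranked lists of document ids with RRF.

    A document's score is sum(1 / (k + rank)) over every list containing it
    (rank starts at 1); lists that do not contain it contribute nothing.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical result lists, each ordered best-first by its own method.
    bm25_ranking = ["doc3", "doc1", "doc7"]
    cosine_ranking = ["doc1", "doc5", "doc3"]
    for doc_id, score in reciprocal_rank_fusion([bm25_ranking, cosine_ranking]):
        print(f"{doc_id}: {score:.5f}")
```

Because RRF looks only at rank positions, it can merge BM25 scores and cosine distances without first normalizing them onto a common scale.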
